Internal validation of risk models in clustered data: a comparison of bootstrap schemes

W. Bouwmeester; K.G.M. Moons; T.H. Kappen; W.A. van Klei; J.W.R. Twisk; M.J.C. Eijkemans; Y. Vergouwe

DOI: https://doi.org/10.1093/aje/kws396

Epidemiology and Data Science (VUmc)

Research output: Contribution to journal › Article › Academic › peer-review

22 Citations (Scopus)

Abstract

Internal validity of a risk model can be studied efficiently with bootstrapping to assess possible optimism in model performance. Assumptions of the regular bootstrap are violated when the development data are clustered. We compared alternative resampling schemes in clustered data for the estimation of optimism in model performance. A simulation study was conducted to compare regular resampling on only the patient level with resampling on only the cluster level and with resampling sequentially on both the cluster and patient levels (2-step approach). Optimism for the concordance index and calibration slope was estimated. Resampling of only patients or only clusters showed accurate estimates of optimism in model performance. The 2-step approach overestimated the optimism in model performance. If the number of centers or intraclass correlation coefficient was high, resampling of clusters showed more accurate estimates than resampling of patients. The 3 bootstrap schemes also were applied to empirical data that were clustered. The results presented in this paper support the use of resampling on only the clusters for estimation of optimism in model performance when data are clustered. © The Author 2013. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
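
The three resampling schemes compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not give. The data layout (a list of `(cluster_id, record)` tuples) and the function names `group_by_cluster`, `resample_patients`, `resample_clusters`, and `resample_two_step` are assumptions made for illustration only. The sketch covers the resampling step; fitting the model and computing optimism for the concordance index or calibration slope are left out.

```python
import random
from collections import defaultdict


def group_by_cluster(rows):
    """Map each cluster id to the list of its (cluster_id, record) rows."""
    clusters = defaultdict(list)
    for cluster_id, record in rows:
        clusters[cluster_id].append((cluster_id, record))
    return clusters


def resample_patients(rows, rng):
    """Regular bootstrap: draw patients with replacement, ignoring clusters."""
    return rng.choices(rows, k=len(rows))


def resample_clusters(rows, rng):
    """Cluster bootstrap: draw whole clusters with replacement and keep
    every patient of each drawn cluster."""
    clusters = group_by_cluster(rows)
    ids = list(clusters)
    drawn = rng.choices(ids, k=len(ids))
    return [row for cid in drawn for row in clusters[cid]]


def resample_two_step(rows, rng):
    """2-step bootstrap: draw clusters with replacement, then draw patients
    with replacement within each drawn cluster."""
    clusters = group_by_cluster(rows)
    ids = list(clusters)
    drawn = rng.choices(ids, k=len(ids))
    sample = []
    for cid in drawn:
        members = clusters[cid]
        sample.extend(rng.choices(members, k=len(members)))
    return sample
```

Each function takes a `random.Random` instance so that a simulation can be seeded and repeated. With clusters of unequal size, the cluster-level and 2-step samples generally differ in total size from the original data, because the number of drawn clusters is fixed rather than the number of patients.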

Original language	English
Pages (from-to)	1209-1217
Journal	American Journal of Epidemiology
Volume	177
Issue number	11
DOIs	https://doi.org/10.1093/aje/kws396
Publication status	Published - 2013

Cite this

@article{b5147b201e1d4d91942209dc0494aac3,

title = "Internal validation of risk models in clustered data: a comparison of bootstrap schemes",

abstract = "Internal validity of a risk model can be studied efficiently with bootstrapping to assess possible optimism in model performance. Assumptions of the regular bootstrap are violated when the development data are clustered. We compared alternative resampling schemes in clustered data for the estimation of optimism in model performance. A simulation study was conducted to compare regular resampling on only the patient level with resampling on only the cluster level and with resampling sequentially on both the cluster and patient levels (2-step approach). Optimism for the concordance index and calibration slope was estimated. Resampling of only patients or only clusters showed accurate estimates of optimism in model performance. The 2-step approach overestimated the optimism in model performance. If the number of centers or intraclass correlation coefficient was high, resampling of clusters showed more accurate estimates than resampling of patients. The 3 bootstrap schemes also were applied to empirical data that were clustered. The results presented in this paper support the use of resampling on only the clusters for estimation of optimism in model performance when data are clustered. {\textcopyright} The Author 2013. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.",

author = "W. Bouwmeester and K.G.M. Moons and T.H. Kappen and {van Klei}, W.A. and J.W.R. Twisk and M.J.C. Eijkemans and Y. Vergouwe",

year = "2013",

doi = "10.1093/aje/kws396",

language = "English",

volume = "177",

pages = "1209--1217",

journal = "American Journal of Epidemiology",

issn = "0002-9262",

publisher = "Oxford University Press",

number = "11",

}

TY - JOUR

T1 - Internal validation of risk models in clustered data: a comparison of bootstrap schemes

AU - Bouwmeester, W.

AU - Moons, K.G.M.

AU - Kappen, T.H.

AU - van Klei, W.A.

AU - Twisk, J.W.R.

AU - Eijkemans, M.J.C.

AU - Vergouwe, Y.

PY - 2013

Y1 - 2013

AB - Internal validity of a risk model can be studied efficiently with bootstrapping to assess possible optimism in model performance. Assumptions of the regular bootstrap are violated when the development data are clustered. We compared alternative resampling schemes in clustered data for the estimation of optimism in model performance. A simulation study was conducted to compare regular resampling on only the patient level with resampling on only the cluster level and with resampling sequentially on both the cluster and patient levels (2-step approach). Optimism for the concordance index and calibration slope was estimated. Resampling of only patients or only clusters showed accurate estimates of optimism in model performance. The 2-step approach overestimated the optimism in model performance. If the number of centers or intraclass correlation coefficient was high, resampling of clusters showed more accurate estimates than resampling of patients. The 3 bootstrap schemes also were applied to empirical data that were clustered. The results presented in this paper support the use of resampling on only the clusters for estimation of optimism in model performance when data are clustered. © The Author 2013. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

U2 - https://doi.org/10.1093/aje/kws396

DO - 10.1093/aje/kws396

M3 - Article

C2 - 23660796

SN - 0002-9262

VL - 177

SP - 1209

EP - 1217

JO - American Journal of Epidemiology

JF - American Journal of Epidemiology

IS - 11

ER -