TY - JOUR
T1 - External validation of prognostic models for critically ill patients required substantial sample sizes
AU - Peek, N.
AU - Arts, D. G. T.
AU - Bosman, R. J.
AU - van der Voort, P. H. J.
AU - de Keizer, N. F.
PY - 2007
Y1 - 2007
N2 - OBJECTIVE: To investigate the behavior of predictive performance measures that are commonly used in external validation of prognostic models for outcome at intensive care units (ICUs). STUDY DESIGN AND SETTING: Four prognostic models (the Simplified Acute Physiology Score II, the Acute Physiology and Chronic Health Evaluation II, and the Mortality Probability Models II) were evaluated in the Dutch National Intensive Care Evaluation registry database. For each model, discrimination (AUC), accuracy (Brier score), and two calibration measures were assessed on data from 41,239 ICU admissions. This validation procedure was repeated with smaller subsamples randomly drawn from the database, and the results were compared with those obtained on the entire data set. RESULTS: Differences in performance between the models were small. The AUC and Brier score showed large variation with small samples. Standard errors of AUC values were accurate, but the power to detect differences in performance was low. Calibration tests were extremely sensitive to sample size. Direct comparison of performance, without statistical analysis, was unreliable with either measure. CONCLUSION: Substantial sample sizes are required for performance assessment and model comparison in external validation. Calibration statistics and significance tests should not be used in these settings. Instead, a simple customization method to repair lack-of-fit problems is recommended.
U2 - https://doi.org/10.1016/j.jclinepi.2006.08.011
DO - 10.1016/j.jclinepi.2006.08.011
M3 - Article
C2 - 17419960
SN - 0895-4356
VL - 60
SP - 491
EP - 501
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
IS - 5
ER -