Handling missing predictor values when validating and applying a prediction model to new patients

J. Hoogland; Marit van Barreveld; Thomas P.A. Debray; Johannes B. Reitsma; Tom E. Verstraelen; Marcel G.W. Dijkgraaf; Aeilko H. Zwinderman

doi:10.1002/sim.8682

Research output: Contribution to journal › Article › Academic › peer-review

27 Citations (Scopus)

Abstract

Missing data present challenges for development and real-world application of clinical prediction models. While these challenges have received considerable attention in the development setting, there is only sparse research on the handling of missing data in applied settings. The main unique feature of handling missing data in these settings is that missing data methods have to be performed for a single new individual, precluding direct application of mainstay methods used during model development. Correspondingly, we propose that it is desirable to perform model validation using missing data methods that transfer to practice in single new patients. This article compares existing and new methods to account for missing data for a new individual in the context of prediction. These methods are based on (i) submodels based on observed data only, (ii) marginalization over the missing variables, or (iii) imputation based on fully conditional specification (also known as chained equations). They were compared in an internal validation setting to highlight the use of missing data methods that transfer to practice while validating a model. As a reference, they were compared to the use of multiple imputation by chained equations in a set of test patients, because this has been used in validation studies in the past. The methods were evaluated in a simulation study where performance was measured by means of optimism corrected C-statistic and mean squared prediction error. Furthermore, they were applied in data from a large Dutch cohort of prophylactic implantable cardioverter defibrillator patients.

Original language	English
Pages (from-to)	3591-3607
Number of pages	17
Journal	Statistics in Medicine
Volume	39
Issue number	25
Early online date	2020
DOIs	https://doi.org/10.1002/sim.8682
Publication status	Published - 10 Nov 2020

Keywords

clinical prediction modeling
missing data
real-world application
validation

Cite this

@article{c4ea7c2ba01143d0a49d60cd1d9cdeca,

title = "Handling missing predictor values when validating and applying a prediction model to new patients",

abstract = "Missing data present challenges for development and real-world application of clinical prediction models. While these challenges have received considerable attention in the development setting, there is only sparse research on the handling of missing data in applied settings. The main unique feature of handling missing data in these settings is that missing data methods have to be performed for a single new individual, precluding direct application of mainstay methods used during model development. Correspondingly, we propose that it is desirable to perform model validation using missing data methods that transfer to practice in single new patients. This article compares existing and new methods to account for missing data for a new individual in the context of prediction. These methods are based on (i) submodels based on observed data only, (ii) marginalization over the missing variables, or (iii) imputation based on fully conditional specification (also known as chained equations). They were compared in an internal validation setting to highlight the use of missing data methods that transfer to practice while validating a model. As a reference, they were compared to the use of multiple imputation by chained equations in a set of test patients, because this has been used in validation studies in the past. The methods were evaluated in a simulation study where performance was measured by means of optimism corrected C-statistic and mean squared prediction error. Furthermore, they were applied in data from a large Dutch cohort of prophylactic implantable cardioverter defibrillator patients.",

keywords = "clinical prediction modeling, missing data, real-world application, validation",

author = "Hoogland, J. and {van Barreveld}, Marit and Debray, {Thomas P.A.} and Reitsma, {Johannes B.} and Verstraelen, {Tom E.} and Dijkgraaf, {Marcel G.W.} and Zwinderman, {Aeilko H.}",

note = "Funding Information: ZonMw (The Netherlands Organisation for Health Research and Development), 91617050; Zorginstituut Nederland (Dutch National Health Care Institute), 837004009. JH and JBR acknowledge financial support from the Netherlands Organisation for Health Research and Development (grant 91215058). TD acknowledges financial support from the Netherlands Organisation for Health Research and Development (grant 91617050) and the Dutch Heart Foundation (grant 2018B006). We want to thank Arthur Wilde, MD. Professor, Amsterdam UMC, for providing the DO-IT Registry data and for his comments on the manuscript. This research was supported by The Netherlands Organisation for Health Research and Development (ZonMw; grant number 91617050) and Dutch National Health Care Institute (Zorginstituut Nederland; grant number 837004009). Publisher Copyright: {\textcopyright} 2020 The Authors. Statistics in Medicine published by John Wiley \& Sons, Ltd.",

year = "2020",

month = nov,

day = "10",

doi = "10.1002/sim.8682",

language = "English",

volume = "39",

pages = "3591--3607",

journal = "Statistics in Medicine",

issn = "0277-6715",

publisher = "John Wiley and Sons Ltd",

number = "25",

}

TY - JOUR

T1 - Handling missing predictor values when validating and applying a prediction model to new patients

AU - Hoogland, J.

AU - van Barreveld, Marit

AU - Debray, Thomas P.A.

AU - Reitsma, Johannes B.

AU - Verstraelen, Tom E.

AU - Dijkgraaf, Marcel G.W.

AU - Zwinderman, Aeilko H.

N1 - Funding Information: ZonMw (The Netherlands Organisation for Health Research and Development), 91617050; Zorginstituut Nederland (Dutch National Health Care Institute), 837004009. JH and JBR acknowledge financial support from the Netherlands Organisation for Health Research and Development (grant 91215058). TD acknowledges financial support from the Netherlands Organisation for Health Research and Development (grant 91617050) and the Dutch Heart Foundation (grant 2018B006). We want to thank Arthur Wilde, MD. Professor, Amsterdam UMC, for providing the DO-IT Registry data and for his comments on the manuscript. This research was supported by The Netherlands Organisation for Health Research and Development (ZonMw; grant number 91617050) and Dutch National Health Care Institute (Zorginstituut Nederland; grant number 837004009). Publisher Copyright: © 2020 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.

PY - 2020/11/10

Y1 - 2020/11/10

AB - Missing data present challenges for development and real-world application of clinical prediction models. While these challenges have received considerable attention in the development setting, there is only sparse research on the handling of missing data in applied settings. The main unique feature of handling missing data in these settings is that missing data methods have to be performed for a single new individual, precluding direct application of mainstay methods used during model development. Correspondingly, we propose that it is desirable to perform model validation using missing data methods that transfer to practice in single new patients. This article compares existing and new methods to account for missing data for a new individual in the context of prediction. These methods are based on (i) submodels based on observed data only, (ii) marginalization over the missing variables, or (iii) imputation based on fully conditional specification (also known as chained equations). They were compared in an internal validation setting to highlight the use of missing data methods that transfer to practice while validating a model. As a reference, they were compared to the use of multiple imputation by chained equations in a set of test patients, because this has been used in validation studies in the past. The methods were evaluated in a simulation study where performance was measured by means of optimism corrected C-statistic and mean squared prediction error. Furthermore, they were applied in data from a large Dutch cohort of prophylactic implantable cardioverter defibrillator patients.

KW - clinical prediction modeling

KW - missing data

KW - real-world application

KW - validation

UR - http://www.scopus.com/inward/record.url?scp=85088169477&partnerID=8YFLogxK

U2 - https://doi.org/10.1002/sim.8682

DO - 10.1002/sim.8682

M3 - Article

C2 - 32687233

SN - 0277-6715

VL - 39

SP - 3591

EP - 3607

JO - Statistics in Medicine

JF - Statistics in Medicine

IS - 25

ER -
