Missing data in a multi-item instrument were best handled by multiple imputation at the item score level

I. Eekhout; H.C.W. de Vet; J.W.R. Twisk; J.P.L. Brand; M.R. de Boer; M.W. Heymans

doi:10.1016/j.jclinepi.2013.09.009

Epidemiology and Data Science (VUmc)

Research output: Contribution to journal (peer-reviewed academic article)

184 Citations (Scopus)

Abstract

Objectives: Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases where some, many, or all item scores are missing in a multi-item instrument.

Study Design and Setting: Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the optimal technique for each scenario. Fitted regression coefficients were compared using bias and coverage as performance parameters.

Results: Mean imputation caused biased estimates in every missing data scenario when data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score.

Conclusion: We recommend applying MI to the item scores to get the most accurate regression model estimates. Moreover, we advise against using any form of mean imputation to handle missing data.

© 2014 Elsevier Inc. All rights reserved.
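A minimal illustration, not the authors' simulation: the abstract contrasts handling missing data at the item score level with handling it at the total score level, and warns against mean imputation. The plain-Python sketch below uses hypothetical scores to show two mechanical facts behind that contrast. First, if the total score is defined as the sum of all items (an assumption; some instruments prorate instead), a single missing item makes the whole total missing and discards the items that were observed. Second, filling missing totals with the observed mean leaves the sum of squared deviations unchanged while increasing n, so the variance is understated. The names `total_score` and `mean_impute` are illustrative only, and no multiple imputation is performed.

```python
from statistics import mean, variance
from typing import Optional

# One subject's item scores; None marks a missing item (hypothetical layout).
Row = list[Optional[int]]


def total_score(row: Row) -> Optional[int]:
    """Sum all items; assume the total is missing if any single item is missing."""
    if any(item is None for item in row):
        return None
    return sum(row)


def mean_impute(values: list[Optional[int]]) -> list[float]:
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical 4-item instrument scored 1-5, six subjects; two items are missing.
items: list[Row] = [
    [3, 4, 2, 5],
    [1, None, 2, 2],
    [4, 4, 5, 3],
    [2, 3, None, 1],
    [5, 5, 4, 4],
    [2, 1, 3, 2],
]

totals = [total_score(row) for row in items]
n_items_observed = sum(v is not None for row in items for v in row)
n_totals_observed = sum(t is not None for t in totals)
observed_totals = [t for t in totals if t is not None]

print(f"item values observed:  {n_items_observed}/{len(items) * len(items[0])}")
print(f"total scores observed: {n_totals_observed}/{len(totals)}")
# Mean imputation adds values at the mean: same squared deviations, larger n.
print(round(variance(observed_totals), 2))
print(round(variance(mean_impute(totals)), 2))
```

With these made-up scores, 22 of 24 item values are observed but only 4 of 6 totals are, and mean imputation lowers the sample variance of the totals from about 18.67 to 11.2. Multiple imputation at the item level would instead draw plausible values for the two missing items, repeated across several imputed datasets whose estimates are then pooled; that step is not shown here.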

Original language	English
Pages (from-to)	335-342
Journal	Journal of Clinical Epidemiology
Volume	67
Issue number	3
DOIs	https://doi.org/10.1016/j.jclinepi.2013.09.009
Publication status	Published - 2014

Cite this

@article{5468f71b67e84f90909d4b370b069100,

title = "Missing data in a multi-item instrument were best handled by multiple imputation at the item score level",

abstract = "Objectives: Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases where some, many, or all item scores are missing in a multi-item instrument. Study Design and Setting: Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the optimal technique for each scenario. Fitted regression coefficients were compared using bias and coverage as performance parameters. Results: Mean imputation caused biased estimates in every missing data scenario when data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score. Conclusion: We recommend applying MI to the item scores to get the most accurate regression model estimates. Moreover, we advise against using any form of mean imputation to handle missing data. {\textcopyright} 2014 Elsevier Inc. All rights reserved.",

author = "Eekhout, I. and {de Vet}, H.C.W. and Twisk, J.W.R. and Brand, J.P.L. and {de Boer}, M.R. and Heymans, M.W.",

year = "2014",

doi = "10.1016/j.jclinepi.2013.09.009",

language = "English",

volume = "67",

pages = "335--342",

journal = "Journal of Clinical Epidemiology",

issn = "0895-4356",

publisher = "Elsevier USA",

number = "3",

}

TY - JOUR

T1 - Missing data in a multi-item instrument were best handled by multiple imputation at the item score level

AU - Eekhout, I.

AU - de Vet, H.C.W.

AU - Twisk, J.W.R.

AU - Brand, J.P.L.

AU - de Boer, M.R.

AU - Heymans, M.W.

PY - 2014

Y1 - 2014

AB - Objectives: Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases where some, many, or all item scores are missing in a multi-item instrument. Study Design and Setting: Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the optimal technique for each scenario. Fitted regression coefficients were compared using bias and coverage as performance parameters. Results: Mean imputation caused biased estimates in every missing data scenario when data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score. Conclusion: We recommend applying MI to the item scores to get the most accurate regression model estimates. Moreover, we advise against using any form of mean imputation to handle missing data. © 2014 Elsevier Inc. All rights reserved.

U2 - https://doi.org/10.1016/j.jclinepi.2013.09.009

DO - 10.1016/j.jclinepi.2013.09.009

M3 - Article

C2 - 24291505

SN - 0895-4356

VL - 67

SP - 335

EP - 342

JO - Journal of Clinical Epidemiology

JF - Journal of Clinical Epidemiology

IS - 3

ER -