TY - JOUR
T1 - Implications of resampling data to address the class imbalance problem (IRCIP)
T2 - an evaluation of impact on performance between classification algorithms in medical data
AU - Welvaars, Koen
AU - Oosterhoff, Jacobien H. F.
AU - van den Bekerom, Michel P. J.
AU - Doornberg, Job N.
AU - OLVG Urology Consortium
AU - van Haarst, Ernst P.
AU - Machine Learning Consortium
AU - van der Zee, J. A.
AU - van Andel, G. A.
AU - Lagerveld, B. W.
AU - Hovius, M. C.
AU - Kauer, P. C.
AU - Boevé, L. M. S.
AU - van der Kuit, A.
AU - Mallee, W.
AU - Poolman, R.
N1 - Funding Information: This work was supported by the OLVG Urology Consortium. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. Publisher Copyright: © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2023/7/1
Y1 - 2023/7/1
AB - Objective: When correcting for the “class imbalance” problem in medical data, the effects of resampling applied on classifier algorithms remain unclear. We examined the effect on performance over several combinations of classifiers and resampling ratios. Materials and Methods: Multiple classification algorithms were trained on 7 resampled datasets: no correction, random undersampling, 4 ratios of Synthetic Minority Oversampling Technique (SMOTE), and random oversampling with the Adaptive Synthetic algorithm (ADASYN). Performance was evaluated in Area Under the Curve (AUC), precision, recall, Brier score, and calibration metrics. A case study on prediction modeling for 30-day unplanned readmissions in previously admitted Urology patients was presented. Results: For most algorithms, using resampled data showed a significant increase in AUC and precision, ranging from 0.74 (CI: 0.69–0.79) to 0.93 (CI: 0.92–0.94), and 0.35 (CI: 0.12–0.58) to 0.86 (CI: 0.81–0.92) respectively. All classification algorithms showed significant increases in recall, and significant decreases in Brier score with distorted calibration overestimating positives. Discussion: Imbalance correction resulted in an overall improved performance, yet poorly calibrated models. There can still be clinical utility due to a strong discriminating performance, specifically when predicting only low and high risk cases is clinically more relevant. Conclusion: Resampling data resulted in increased performances in classification algorithms, yet produced an overestimation of positive predictions. Based on the findings from our case study, a thoughtful predefinition of the clinical prediction task may guide the use of resampling techniques in future studies aiming to improve clinical decision support tools.
KW - ADASYN
KW - RUS
KW - SMOTE
KW - class imbalance
KW - classification algorithms
KW - resampling
UR - http://www.scopus.com/inward/record.url?scp=85163135477&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163135477&partnerID=8YFLogxK
U2 - https://doi.org/10.1093/jamiaopen/ooad033
DO - https://doi.org/10.1093/jamiaopen/ooad033
M3 - Article
C2 - 37266187
SN - 2574-2531
VL - 6
SP - 1
EP - 9
JO - JAMIA Open
JF - JAMIA Open
IS - 2
M1 - ooad033
ER -