Implications of resampling data to address the class imbalance problem (IRCIP): an evaluation of impact on performance between classification algorithms in medical data

OLVG Urology Consortium; Machine Learning Consortium

doi:https://doi.org/10.1093/jamiaopen/ooad033

Urology (VUmc)

Research output: Contribution to journal › Article › Academic › peer-review

2 Citations (Scopus)

Abstract

Objective: When correcting for the “class imbalance” problem in medical data, the effects of resampling applied on classifier algorithms remain unclear. We examined the effect on performance over several combinations of classifiers and resampling ratios.

Materials and Methods: Multiple classification algorithms were trained on 7 resampled datasets: no correction, random undersampling, 4 ratios of Synthetic Minority Oversampling Technique (SMOTE), and random oversampling with the Adaptive Synthetic algorithm (ADASYN). Performance was evaluated in Area Under the Curve (AUC), precision, recall, Brier score, and calibration metrics. A case study on prediction modeling for 30-day unplanned readmissions in previously admitted Urology patients was presented.

Results: For most algorithms, using resampled data showed a significant increase in AUC and precision, ranging from 0.74 (CI: 0.69–0.79) to 0.93 (CI: 0.92–0.94), and 0.35 (CI: 0.12–0.58) to 0.86 (CI: 0.81–0.92) respectively. All classification algorithms showed significant increases in recall, and significant decreases in Brier score with distorted calibration overestimating positives.

Discussion: Imbalance correction resulted in an overall improved performance, yet poorly calibrated models. There can still be clinical utility due to a strong discriminating performance, specifically when predicting only low and high risk cases is clinically more relevant.

Conclusion: Resampling data resulted in increased performances in classification algorithms, yet produced an overestimation of positive predictions. Based on the findings from our case study, a thoughtful predefinition of the clinical prediction task may guide the use of resampling techniques in future studies aiming to improve clinical decision support tools.
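
The overestimation of positives reported in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the cohort size, the 10% prevalence, the helper names `random_undersample` and `prevalence`, and the trivial "model" that predicts its training prevalence for every case are all assumptions made for illustration. Only random undersampling is shown, and only Python's standard library is used.

```python
import random


def random_undersample(labels, seed=0):
    """Return sorted indices of a balanced subset of `labels` (0/1 values).

    All minority-class cases are kept; an equally sized random sample
    of the majority class is drawn without replacement.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


def prevalence(labels):
    """Fraction of positive (1) labels."""
    return sum(labels) / len(labels)


# Hypothetical cohort: 100 positives among 1000 cases (10% prevalence).
labels = [1] * 100 + [0] * 900
resampled = [labels[i] for i in random_undersample(labels, seed=42)]

# Deliberately trivial "model": predicts its training prevalence for everyone.
observed_rate = prevalence(labels)
predicted_risk = prevalence(resampled)

print(f"observed positive rate in the original data: {observed_rate:.2f}")
print(f"mean predicted risk after undersampling:     {predicted_risk:.2f}")
```

In this toy setting the mean predicted risk is 0.50 against an observed rate of 0.10. A real classifier trained on balanced data would be affected less uniformly, but the direction of the shift is the same as the overestimation described above.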

Original language	English
Article number	ooad033
Pages (from-to)	1-9
Number of pages	9
Journal	JAMIA Open
Volume	6
Issue number	2
Early online date	31 May 2023
DOIs	https://doi.org/10.1093/jamiaopen/ooad033
Publication status	Published - 1 Jul 2023

Keywords

ADASYN
RUS
SMOTE
class imbalance
classification algorithms
resampling

Access to Document

https://doi.org/10.1093/jamiaopen/ooad033

Cite this

@article{ea0c1c59de92406ca4656dc12c5f10cb,

title = "Implications of resampling data to address the class imbalance problem (IRCIP): an evaluation of impact on performance between classification algorithms in medical data",

abstract = "Objective: When correcting for the “class imbalance” problem in medical data, the effects of resampling applied on classifier algorithms remain unclear. We examined the effect on performance over several combinations of classifiers and resampling ratios. Materials and Methods: Multiple classification algorithms were trained on 7 resampled datasets: no correction, random undersampling, 4 ratios of Synthetic Minority Oversampling Technique (SMOTE), and random oversampling with the Adaptive Synthetic algorithm (ADASYN). Performance was evaluated in Area Under the Curve (AUC), precision, recall, Brier score, and calibration metrics. A case study on prediction modeling for 30-day unplanned readmissions in previously admitted Urology patients was presented. Results: For most algorithms, using resampled data showed a significant increase in AUC and precision, ranging from 0.74 (CI: 0.69–0.79) to 0.93 (CI: 0.92–0.94), and 0.35 (CI: 0.12–0.58) to 0.86 (CI: 0.81–0.92) respectively. All classification algorithms showed significant increases in recall, and significant decreases in Brier score with distorted calibration overestimating positives. Discussion: Imbalance correction resulted in an overall improved performance, yet poorly calibrated models. There can still be clinical utility due to a strong discriminating performance, specifically when predicting only low and high risk cases is clinically more relevant. Conclusion: Resampling data resulted in increased performances in classification algorithms, yet produced an overestimation of positive predictions. Based on the findings from our case study, a thoughtful predefinition of the clinical prediction task may guide the use of resampling techniques in future studies aiming to improve clinical decision support tools.",

keywords = "ADASYN, RUS, SMOTE, class imbalance, classification algorithms, resampling",

author = "Welvaars, Koen and Oosterhoff, {Jacobien H. F.} and {van den Bekerom}, {Michel P. J.} and Doornberg, {Job N.} and {OLVG Urology Consortium} and {van Haarst}, {Ernst P.} and {Machine Learning Consortium} and {van der Zee}, {J. A.} and {van Andel}, {G. A.} and Lagerveld, {B. W.} and Hovius, {M. C.} and Kauer, {P. C.} and Boev{\'e}, {L. M. S.} and {van der Kuit}, A. and Mallee, W. and Poolman, R.",

note = "Funding Information: This work was supported by the OLVG Urology Consortium. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. Publisher Copyright: {\textcopyright} The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.",

year = "2023",

month = jul,

day = "1",

doi = "10.1093/jamiaopen/ooad033",

language = "English",

volume = "6",

pages = "1--9",

journal = "JAMIA Open",

issn = "2574-2531",

publisher = "Oxford University Press",

number = "2",

}

TY - JOUR

T1 - Implications of resampling data to address the class imbalance problem (IRCIP)

T2 - an evaluation of impact on performance between classification algorithms in medical data

AU - Welvaars, Koen

AU - Oosterhoff, Jacobien H. F.

AU - van den Bekerom, Michel P. J.

AU - Doornberg, Job N.

AU - OLVG Urology Consortium

AU - van Haarst, Ernst P.

AU - Machine Learning Consortium

AU - van der Zee, J. A.

AU - van Andel, G. A.

AU - Lagerveld, B. W.

AU - Hovius, M. C.

AU - Kauer, P. C.

AU - Boevé, L. M. S.

AU - van der Kuit, A.

AU - Mallee, W.

AU - Poolman, R.

N1 - Funding Information: This work was supported by the OLVG Urology Consortium. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. Publisher Copyright: © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.

PY - 2023/7/1

Y1 - 2023/7/1

AB - Objective: When correcting for the “class imbalance” problem in medical data, the effects of resampling applied on classifier algorithms remain unclear. We examined the effect on performance over several combinations of classifiers and resampling ratios. Materials and Methods: Multiple classification algorithms were trained on 7 resampled datasets: no correction, random undersampling, 4 ratios of Synthetic Minority Oversampling Technique (SMOTE), and random oversampling with the Adaptive Synthetic algorithm (ADASYN). Performance was evaluated in Area Under the Curve (AUC), precision, recall, Brier score, and calibration metrics. A case study on prediction modeling for 30-day unplanned readmissions in previously admitted Urology patients was presented. Results: For most algorithms, using resampled data showed a significant increase in AUC and precision, ranging from 0.74 (CI: 0.69–0.79) to 0.93 (CI: 0.92–0.94), and 0.35 (CI: 0.12–0.58) to 0.86 (CI: 0.81–0.92) respectively. All classification algorithms showed significant increases in recall, and significant decreases in Brier score with distorted calibration overestimating positives. Discussion: Imbalance correction resulted in an overall improved performance, yet poorly calibrated models. There can still be clinical utility due to a strong discriminating performance, specifically when predicting only low and high risk cases is clinically more relevant. Conclusion: Resampling data resulted in increased performances in classification algorithms, yet produced an overestimation of positive predictions. Based on the findings from our case study, a thoughtful predefinition of the clinical prediction task may guide the use of resampling techniques in future studies aiming to improve clinical decision support tools.

KW - ADASYN

KW - RUS

KW - SMOTE

KW - class imbalance

KW - classification algorithms

KW - resampling

UR - http://www.scopus.com/inward/record.url?scp=85163135477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85163135477&partnerID=8YFLogxK

U2 - https://doi.org/10.1093/jamiaopen/ooad033

DO - https://doi.org/10.1093/jamiaopen/ooad033

M3 - Article

C2 - 37266187

SN - 2574-2531

VL - 6

SP - 1

EP - 9

JO - JAMIA Open

JF - JAMIA Open

IS - 2

M1 - ooad033

ER -
