TY - GEN
T1 - The effect of aggregating subtype performances depends strongly on the performance measure used
AU - Tax, David M.J.
AU - Sontrop, Herman M.J.
AU - Reinders, Marcel J.T.
AU - Moerland, Perry D.
PY - 2014/12/4
Y1 - 2014/12/4
N2 - For some classification tasks the data can be partitioned into disjoint subsets based on some attribute, for example a disease subtype. It then seems logical to train a classifier with the same classes as the original classification problem for each subtype separately, such that the performance per subtype is optimized. Unfortunately, the influence of the subtype performances on the aggregated overall performance depends strongly on the performance measure used and can be very counterintuitive. We show that for some performance measures (e.g., classification accuracy, precision, recall, F1) the aggregated performance is a simple linear combination of subtype performances. In these cases, improving the performance of a subtype-specific classifier implies that the overall performance improves. However, for other performance measures (e.g., balanced accuracy rate, area under the ROC curve) and also for performance measures in survival analysis (concordance index), additional cross terms appear in the aggregation of the subtype performances. These cross terms are heavily dependent on both the overall class imbalance and the subtype class imbalances. For these measures, improving subtype performances may actually result in a decrease of the overall performance.
AB - For some classification tasks the data can be partitioned into disjoint subsets based on some attribute, for example a disease subtype. It then seems logical to train a classifier with the same classes as the original classification problem for each subtype separately, such that the performance per subtype is optimized. Unfortunately, the influence of the subtype performances on the aggregated overall performance depends strongly on the performance measure used and can be very counterintuitive. We show that for some performance measures (e.g., classification accuracy, precision, recall, F1) the aggregated performance is a simple linear combination of subtype performances. In these cases, improving the performance of a subtype-specific classifier implies that the overall performance improves. However, for other performance measures (e.g., balanced accuracy rate, area under the ROC curve) and also for performance measures in survival analysis (concordance index), additional cross terms appear in the aggregation of the subtype performances. These cross terms are heavily dependent on both the overall class imbalance and the subtype class imbalances. For these measures, improving subtype performances may actually result in a decrease of the overall performance.
KW - AUC
KW - Balanced accuracy rate
KW - Class imbalance
KW - Classifier performance evaluation
KW - Error decomposition
UR - http://www.scopus.com/inward/record.url?scp=84919934071&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/ICPR.2014.639
DO - https://doi.org/10.1109/ICPR.2014.639
M3 - Conference contribution
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3720
EP - 3725
BT - Proceedings - International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd International Conference on Pattern Recognition, ICPR 2014
Y2 - 24 August 2014 through 28 August 2014
ER -