The effect of aggregating subtype performances depends strongly on the performance measure used

David M.J. Tax; Herman M.J. Sontrop; Marcel J.T. Reinders; Perry D. Moerland

doi:https://doi.org/10.1109/ICPR.2014.639

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

For some classification tasks the data can be partitioned into disjoint subsets based on some attribute, for example a disease subtype. It then seems logical to train a classifier with the same classes as the original classification problem for each subtype separately, such that the performance per subtype is optimized. Unfortunately, the influence of the subtype performances on the aggregated overall performance depends strongly on the performance measure used and can be very counterintuitive. We show that for some performance measures (e.g., classification accuracy, precision, recall, F1) the aggregated performance is a simple linear combination of subtype performances. In these cases, improving the performance of a subtype-specific classifier implies that the overall performance improves. However, for other performance measures (e.g., balanced accuracy rate, area under the ROC curve) and also for performance measures in survival analysis (concordance index), additional cross terms appear in the aggregation of the subtype performances. These cross terms are heavily dependent on both the overall class imbalance and the subtype class imbalances. For these measures, improving subtype performances may actually result in a decrease of the overall performance.
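
The contrast described in the abstract can be illustrated with a small numerical sketch. The toy scores, the subtype names `A` and `B`, the per-subtype thresholds, and the helper functions below are all hypothetical and are not taken from the paper; the AUC decomposition used is the standard pairwise identity (AUC as the fraction of correctly ordered positive/negative pairs), which may differ in notation from the authors' derivation.

```python
from itertools import product

def auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pairs = list(product(pos, neg))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

def accuracy(pos, neg, threshold):
    """Accuracy when scores >= threshold are predicted positive."""
    correct = sum(s >= threshold for s in pos) + sum(s < threshold for s in neg)
    return correct / (len(pos) + len(neg))

# Hypothetical classifier scores per subtype (assumption, for illustration only).
subtypes = {
    "A": {"pos": [0.9, 0.8], "neg": [0.7, 0.6, 0.5, 0.4]},
    "B": {"pos": [0.3, 0.2, 0.25, 0.35], "neg": [0.1, 0.05]},
}
thresholds = {"A": 0.65, "B": 0.28}  # hypothetical subtype-specific thresholds

n_total = sum(len(d["pos"]) + len(d["neg"]) for d in subtypes.values())
all_pos = [s for d in subtypes.values() for s in d["pos"]]
all_neg = [s for d in subtypes.values() for s in d["neg"]]

# Accuracy: the pooled value equals the size-weighted sum of subtype accuracies.
acc_sub = {k: accuracy(d["pos"], d["neg"], thresholds[k]) for k, d in subtypes.items()}
acc_weighted = sum(
    (len(d["pos"]) + len(d["neg"])) / n_total * acc_sub[k] for k, d in subtypes.items()
)
pooled = [(1, int(s >= thresholds[k])) for k, d in subtypes.items() for s in d["pos"]]
pooled += [(0, int(s >= thresholds[k])) for k, d in subtypes.items() for s in d["neg"]]
acc_pooled = sum(y == p for y, p in pooled) / len(pooled)

# AUC: the pooled value also needs cross terms that compare the positives of
# one subtype with the negatives of another subtype.
auc_sub = {k: auc(d["pos"], d["neg"]) for k, d in subtypes.items()}
auc_pooled = auc(all_pos, all_neg)
within, cross = 0.0, 0.0
for s, t in product(subtypes, subtypes):
    weight = len(subtypes[s]["pos"]) * len(subtypes[t]["neg"]) / (len(all_pos) * len(all_neg))
    term = weight * auc(subtypes[s]["pos"], subtypes[t]["neg"])
    if s == t:
        within += term
    else:
        cross += term

print("accuracy: pooled", acc_pooled, "weighted sum", acc_weighted)
print("AUC per subtype:", auc_sub)
print("AUC pooled:", auc_pooled, "= within", within, "+ cross", cross)
```

In this toy data both subtype AUCs equal 1.0, yet the pooled AUC is only 20/36 (about 0.56), because every positive of subtype B scores below every negative of subtype A. The subtype AUCs alone therefore do not determine the overall AUC, whereas the pooled accuracy is fully determined by the subtype accuracies and subtype sizes.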

Original language	English
Title of host publication	Proceedings - International Conference on Pattern Recognition
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	3720-3725
Number of pages	6
ISBN (Electronic)	9781479952083
DOIs	https://doi.org/10.1109/ICPR.2014.639
Publication status	Published - 4 Dec 2014
Event	22nd International Conference on Pattern Recognition, ICPR 2014 - Stockholm, Sweden (Duration: 24 Aug 2014 → 28 Aug 2014)

Publication series

Name	Proceedings - International Conference on Pattern Recognition

Conference

Conference	22nd International Conference on Pattern Recognition, ICPR 2014
Country/Territory	Sweden
City	Stockholm
Period	24/08/2014 → 28/08/2014

Keywords

AUC
Balanced accuracy rate
Class imbalance
Classifier performance evaluation
Error decomposition

Access to Document

https://doi.org/10.1109/ICPR.2014.639

Cite this

Tax, D. M. J., Sontrop, H. M. J., Reinders, M. J. T., & Moerland, P. D. (2014). The effect of aggregating subtype performances depends strongly on the performance measure used. In Proceedings - International Conference on Pattern Recognition (pp. 3720-3725). Article 6977351 (Proceedings - International Conference on Pattern Recognition). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPR.2014.639

Tax, David M.J. ; Sontrop, Herman M.J. ; Reinders, Marcel J.T. et al. / The effect of aggregating subtype performances depends strongly on the performance measure used. Proceedings - International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 3720-3725 (Proceedings - International Conference on Pattern Recognition).

@inproceedings{a0bcafe92ac2424e85a8ddf9b92406f1,
title = "The effect of aggregating subtype performances depends strongly on the performance measure used",
abstract = "For some classification tasks the data can be partitioned into disjoint subsets based on some attribute, for example a disease subtype. It then seems logical to train a classifier with the same classes as the original classification problem for each subtype separately, such that the performance per subtype is optimized. Unfortunately, the influence of the subtype performances on the aggregated overall performance depends strongly on the performance measure used and can be very counterintuitive. We show that for some performance measures (e.g., classification accuracy, precision, recall, F1) the aggregated performance is a simple linear combination of subtype performances. In these cases, improving the performance of a subtype-specific classifier implies that the overall performance improves. However, for other performance measures (e.g., balanced accuracy rate, area under the ROC curve) and also for performance measures in survival analysis (concordance index), additional cross terms appear in the aggregation of the subtype performances. These cross terms are heavily dependent on both the overall class imbalance and the subtype class imbalances. For these measures, improving subtype performances may actually result in a decrease of the overall performance.",
keywords = "AUC, Balanced accuracy rate, Class imbalance, Classifier performance evaluation, Error decomposition",
author = "Tax, {David M.J.} and Sontrop, {Herman M.J.} and Reinders, {Marcel J.T.} and Moerland, {Perry D.}",
year = "2014",
month = dec,
day = "4",
doi = "10.1109/ICPR.2014.639",
language = "English",
series = "Proceedings - International Conference on Pattern Recognition",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "3720--3725",
booktitle = "Proceedings - International Conference on Pattern Recognition",
address = "United States",
note = "22nd International Conference on Pattern Recognition, ICPR 2014 ; Conference date: 24-08-2014 Through 28-08-2014",
}

Tax, DMJ, Sontrop, HMJ, Reinders, MJT & Moerland, PD 2014, The effect of aggregating subtype performances depends strongly on the performance measure used. in Proceedings - International Conference on Pattern Recognition, 6977351, Proceedings - International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers Inc., pp. 3720-3725, 22nd International Conference on Pattern Recognition, ICPR 2014, Stockholm, Sweden, 24/08/2014. https://doi.org/10.1109/ICPR.2014.639

The effect of aggregating subtype performances depends strongly on the performance measure used. / Tax, David M.J.; Sontrop, Herman M.J.; Reinders, Marcel J.T. et al.
Proceedings - International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2014. p. 3720-3725 6977351 (Proceedings - International Conference on Pattern Recognition).

TY  - GEN
T1  - The effect of aggregating subtype performances depends strongly on the performance measure used
AU  - Tax, David M.J.
AU  - Sontrop, Herman M.J.
AU  - Reinders, Marcel J.T.
AU  - Moerland, Perry D.
PY  - 2014/12/4
Y1  - 2014/12/4
AB  - For some classification tasks the data can be partitioned into disjoint subsets based on some attribute, for example a disease subtype. It then seems logical to train a classifier with the same classes as the original classification problem for each subtype separately, such that the performance per subtype is optimized. Unfortunately, the influence of the subtype performances on the aggregated overall performance depends strongly on the performance measure used and can be very counterintuitive. We show that for some performance measures (e.g., classification accuracy, precision, recall, F1) the aggregated performance is a simple linear combination of subtype performances. In these cases, improving the performance of a subtype-specific classifier implies that the overall performance improves. However, for other performance measures (e.g., balanced accuracy rate, area under the ROC curve) and also for performance measures in survival analysis (concordance index), additional cross terms appear in the aggregation of the subtype performances. These cross terms are heavily dependent on both the overall class imbalance and the subtype class imbalances. For these measures, improving subtype performances may actually result in a decrease of the overall performance.
KW  - AUC
KW  - Balanced accuracy rate
KW  - Class imbalance
KW  - Classifier performance evaluation
KW  - Error decomposition
UR  - http://www.scopus.com/inward/record.url?scp=84919934071&partnerID=8YFLogxK
U2  - https://doi.org/10.1109/ICPR.2014.639
DO  - https://doi.org/10.1109/ICPR.2014.639
M3  - Conference contribution
T3  - Proceedings - International Conference on Pattern Recognition
SP  - 3720
EP  - 3725
BT  - Proceedings - International Conference on Pattern Recognition
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 22nd International Conference on Pattern Recognition, ICPR 2014
Y2  - 24 August 2014 through 28 August 2014
ER  -

Tax DMJ, Sontrop HMJ, Reinders MJT, Moerland PD. The effect of aggregating subtype performances depends strongly on the performance measure used. In Proceedings - International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc. 2014. p. 3720-3725. 6977351. (Proceedings - International Conference on Pattern Recognition). doi: https://doi.org/10.1109/ICPR.2014.639
