The effect of aggregating subtype performances depends strongly on the performance measure used

David M.J. Tax, Herman M.J. Sontrop, Marcel J.T. Reinders, Perry D. Moerland

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


For some classification tasks the data can be partitioned into disjoint subsets based on some attribute, for example a disease subtype. It then seems natural to train a separate classifier for each subtype, using the same classes as the original classification problem, so that the performance on each subtype is optimized. Unfortunately, how the subtype performances influence the aggregated overall performance depends strongly on the performance measure used, and the effect can be very counterintuitive. We show that for some performance measures (e.g., classification accuracy, precision, recall, F1) the aggregated performance is a simple linear combination of the subtype performances. In these cases, improving the performance of a subtype-specific classifier implies that the overall performance improves. However, for other performance measures (e.g., balanced accuracy rate, area under the ROC curve), and also for performance measures in survival analysis (concordance index), additional cross terms appear in the aggregation of the subtype performances. These cross terms depend heavily on both the overall class imbalance and the subtype class imbalances. For these measures, improving subtype performances may actually decrease the overall performance.
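The contrast between linear and non-linear aggregation can be illustrated with a small sketch. The confusion counts below are hypothetical, chosen only to illustrate the effect (they are not data from the paper), and the helper names `make_subtype`, `accuracy`, `balanced_accuracy` and `evaluate` are illustrative. Subtype A holds most of the positives and subtype B most of the negatives; the "new" classifiers improve the balanced accuracy of both subtypes, yet the pooled balanced accuracy drops, while pooled accuracy always equals the size-weighted mean of the subtype accuracies.

```python
# Minimal sketch (hypothetical counts, not data from the paper): compare how
# accuracy and balanced accuracy aggregate over two disjoint subtypes.

def make_subtype(n_pos, n_neg, tp, tn):
    """Build (labels, predictions) for one subtype from its confusion counts."""
    y = [1] * n_pos + [0] * n_neg
    yhat = [1] * tp + [0] * (n_pos - tp) + [0] * tn + [1] * (n_neg - tn)
    return y, yhat


def accuracy(y, yhat):
    """Fraction of correctly classified samples."""
    return sum(a == b for a, b in zip(y, yhat)) / len(y)


def balanced_accuracy(y, yhat):
    """Mean of true positive rate and true negative rate (both classes must be present)."""
    tp = sum(1 for a, b in zip(y, yhat) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(y, yhat) if a == 0 and b == 0)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def evaluate(subtypes):
    """Return per-subtype (accuracy, BA), pooled (accuracy, BA), and the
    size-weighted mean of the subtype accuracies."""
    per_subtype, y_all, yhat_all = [], [], []
    for y, yhat in subtypes:
        per_subtype.append((accuracy(y, yhat), balanced_accuracy(y, yhat)))
        y_all += y
        yhat_all += yhat
    pooled = (accuracy(y_all, yhat_all), balanced_accuracy(y_all, yhat_all))
    sizes = [len(y) for y, _ in subtypes]
    weighted_acc = sum(n * acc for n, (acc, _) in zip(sizes, per_subtype)) / sum(sizes)
    return per_subtype, pooled, weighted_acc


# Subtype A: 90 positives, 10 negatives. Subtype B: 10 positives, 90 negatives.
old = [make_subtype(90, 10, tp=81, tn=5), make_subtype(10, 90, tp=5, tn=81)]
new = [make_subtype(90, 10, tp=54, tn=10), make_subtype(10, 90, tp=10, tn=54)]

for name, clf in [("old", old), ("new", new)]:
    per_subtype, pooled, weighted_acc = evaluate(clf)
    subtype_ba = [round(ba, 2) for _, ba in per_subtype]
    print(f"{name}: subtype BA={subtype_ba}, pooled BA={pooled[1]:.2f}, "
          f"pooled acc={pooled[0]:.2f}, weighted subtype acc={weighted_acc:.2f}")
```

With these counts both subtype balanced accuracies rise from 0.70 to 0.80, while the pooled balanced accuracy falls from 0.86 to 0.64. The pooled true positive rate weights each subtype by its share of all positives, and the pooled true negative rate weights it by its share of all negatives, so when the subtypes have different class imbalances the subtype balanced accuracies no longer combine with a single set of weights.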

Original language: English
Title of host publication: Proceedings - International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781479952083
Publication status: Published - 4 Dec 2014
Event: 22nd International Conference on Pattern Recognition, ICPR 2014 - Stockholm, Sweden
Duration: 24 Aug 2014 - 28 Aug 2014

Publication series

Name: Proceedings - International Conference on Pattern Recognition


Conference: 22nd International Conference on Pattern Recognition, ICPR 2014


Keywords

  • AUC
  • Balanced accuracy rate
  • Class imbalance
  • Classifier performance evaluation
  • Error decomposition
