Objectives: To illustrate in-depth validation of prediction models developed on multicenter data.

Methods: For each hospital in a multicenter registry, we evaluated the predictive performance of a 30-day mortality prediction model for transcatheter aortic valve implantation (TAVI) using the Netherlands Heart Registration (NHR) dataset. We measured discrimination and calibration per hospital in a leave-center-out analysis (LCOA). Meta-analysis was used to calculate I² values per performance metric from the LCOA and to compute mean and confidence interval (CI) estimates. Case-mix differences between studies were inspected using the framework of Debray et al. for understanding external validation. We also aimed to discover subgroups (SGs) with high model prediction error (PE) and their distribution over the centers.

Results: We studied 16 hospitals with 11,599 TAVI patients and an early mortality of 3.7%. The models' areas under the curve (AUCs) ranged widely between hospitals, from 0.59 to 0.79, and miscalibration occurred in seven hospitals. The mean AUC from meta-analysis was 0.68 (95% CI 0.65-0.70). I² values were 0%, 74%, and 0% for AUC, calibration intercept, and calibration slope, respectively. Between-hospital case-mix differences were substantial, and model transportability was low. One SG was discovered with marked global PE and was associated with poor performance in validation centers.

Conclusion: The illustrated combination of approaches provides useful insights for inspecting multicenter-based prediction models, and it exposes their limitations in transportability and performance variability when applied to different populations.
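To make the heterogeneity step concrete, the sketch below shows one common way to turn per-hospital performance estimates into a pooled mean, a 95% CI, and an I² value. It is an illustration only: the abstract does not state which pooling method was used, so the DerSimonian-Laird random-effects estimator is an assumption here, as are the function name `pool_random_effects` and the example numbers, which are invented and not taken from the study.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool per-center estimates (hypothetical helper, not from the paper).

    Uses Cochran's Q, I^2, and the DerSimonian-Laird estimate of the
    between-center variance tau^2. This estimator is an assumption.
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # I^2: share of total variation attributed to between-center heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-center variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights, pooled estimate, and normal-approximation 95% CI.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

    return {"pooled": pooled, "ci": ci, "q": q, "i2": i2, "tau2": tau2}


# Invented calibration intercepts and standard errors for four centers.
example = pool_random_effects([-0.30, 0.10, 0.45, -0.05], [0.12, 0.15, 0.14, 0.20])
print(example)
```

In practice the same function would be called once per metric (AUC, calibration intercept, calibration slope), using the estimate obtained for each left-out hospital. AUCs are often pooled on the logit scale and back-transformed, which this sketch does not do.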
Original language: English
Pages (from-to): 13-21
Number of pages: 9
Journal: Journal of Clinical Epidemiology
Publication status: Published - 1 May 2023


  • Calibration
  • Discrimination
  • Heterogeneity
  • Multicenter
  • Prediction models
  • Subgroup discovery
