TY - JOUR
T1 - Measuring the performance of prediction models to personalize treatment choice
AU - Efthimiou, Orestis
AU - Hoogland, Jeroen
AU - Debray, Thomas P. A.
AU - Seo, Michael
AU - Furukawa, Toshiaki A.
AU - Egger, Matthias
AU - White, Ian R.
N1 - Funding Information: European Commission, Horizon 2020 Research and Innovation Programme; Medical Research Council, Grant/Award Number: Programme MC_UU_00004/07; Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung, Grant/Award Numbers: Ambizione grant number 180083, special project funding 189498; ZonMw, Grant/Award Number: grant 91215058. OE, MS and ME were supported by the Swiss National Science Foundation (Ambizione grant number 180083, special project funding 189498). IW was supported by the Medical Research Council Programme MC_UU_00004/07. TD is supported by the European Union's Horizon 2020 research and innovation programme under ReCoDID grant agreement no. 825746. JH is supported by ZonMw (grant 91215058). Publisher Copyright: © 2023 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
PY - 2023/4/15
Y1 - 2023/4/15
N2 - When data are available from individual patients receiving either a treatment or a control intervention in a randomized trial, various statistical and machine learning methods can be used to develop models for predicting future outcomes under the two conditions, and thus to predict treatment effect at the patient level. These predictions can subsequently guide personalized treatment choices. Although several methods for validating prediction models are available, little attention has been given to measuring the performance of predictions of personalized treatment effect. In this article, we propose a range of measures that can be used to this end. We start by defining two dimensions of model accuracy for treatment effects, for a single outcome: discrimination for benefit and calibration for benefit. We then amalgamate these two dimensions into an additional concept, decision accuracy, which quantifies the model's ability to identify patients for whom the benefit from treatment exceeds a given threshold. Subsequently, we propose a series of performance measures related to these dimensions and discuss estimating procedures, focusing on randomized data. Our methods are applicable for continuous or binary outcomes, for any type of prediction model, as long as it uses baseline covariates to predict outcomes under treatment and control. We illustrate all methods using two simulated datasets and a real dataset from a trial in depression. We implement all methods in the R package predieval. Results suggest that the proposed measures can be useful in evaluating and comparing the performance of competing models in predicting individualized treatment effect.
AB - When data are available from individual patients receiving either a treatment or a control intervention in a randomized trial, various statistical and machine learning methods can be used to develop models for predicting future outcomes under the two conditions, and thus to predict treatment effect at the patient level. These predictions can subsequently guide personalized treatment choices. Although several methods for validating prediction models are available, little attention has been given to measuring the performance of predictions of personalized treatment effect. In this article, we propose a range of measures that can be used to this end. We start by defining two dimensions of model accuracy for treatment effects, for a single outcome: discrimination for benefit and calibration for benefit. We then amalgamate these two dimensions into an additional concept, decision accuracy, which quantifies the model's ability to identify patients for whom the benefit from treatment exceeds a given threshold. Subsequently, we propose a series of performance measures related to these dimensions and discuss estimating procedures, focusing on randomized data. Our methods are applicable for continuous or binary outcomes, for any type of prediction model, as long as it uses baseline covariates to predict outcomes under treatment and control. We illustrate all methods using two simulated datasets and a real dataset from a trial in depression. We implement all methods in the R package predieval. Results suggest that the proposed measures can be useful in evaluating and comparing the performance of competing models in predicting individualized treatment effect.
KW - heterogeneous treatment effects
KW - personalized medicine
KW - prediction modelling
UR - http://www.scopus.com/inward/record.url?scp=85147289187&partnerID=8YFLogxK
U2 - https://doi.org/10.1002/sim.9665
DO - https://doi.org/10.1002/sim.9665
M3 - Article
C2 - 36700492
SN - 0277-6715
VL - 42
SP - 1188
EP - 1206
JO - Statistics in Medicine
JF - Statistics in Medicine
IS - 8
ER -