An omics-based machine learning approach to predict diabetes progression: a RHAPSODY study

Roderick C. Slieker; Magnus Münch; Louise A. Donnelly; Gerard A. Bouland; Iulian Dragan; Dmitry Kuznetsov; Petra J.M. Elders; Guy A. Rutter; Mark Ibberson; Ewan R. Pearson; Leen M. ’t Hart; Mark A. van de Wiel; Joline W.J. Beulens

doi:10.1007/s00125-024-06105-8

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Aims/hypothesis: People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing more quickly to insulin initiation than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement and whether newly identified markers have added predictive value.

Methods: In two prospective cohort studies as part of IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest). Clinical variables included age, sex, HbA1c, HDL-cholesterol and C-peptide. Models were run with unpenalised clinical variables (i.e. always included in the model without weights) or penalised clinical variables, or without clinical variables. Model development was performed in one cohort and the model was applied in a second cohort. Model performance was evaluated using Harrell’s C statistic.

Results: Of the 585 individuals from the Hoorn Diabetes Care System (DCS) cohort, 69 required insulin during follow-up (1.0–11.4 years); of the 571 individuals in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) cohort, 175 required insulin during follow-up (0.3–11.8 years). Overall, the clinical variables and proteins were selected in the different models most often, followed by the metabolites. The most frequently selected clinical variables were HbA1c (18 of the 36 models, 50%), age (15 models, 41.2%) and C-peptide (15 models, 41.2%). Base models (age, sex, BMI, HbA1c) including only clinical variables performed moderately in both the DCS discovery cohort (C statistic 0.71 [95% CI 0.64, 0.79]) and the GoDARTS replication cohort (C 0.71 [95% CI 0.69, 0.75]). A more extensive model including HDL-cholesterol and C-peptide performed better in both cohorts (DCS, C 0.74 [95% CI 0.67, 0.81]; GoDARTS, C 0.73 [95% CI 0.69, 0.77]). Two proteins, lactadherin and proto-oncogene tyrosine-protein kinase receptor, were most consistently selected and slightly improved model performance.

Conclusions/interpretation: Using machine learning approaches, we show that insulin requirement risk can be modestly well predicted by predominantly clinical variables. Inclusion of molecular markers improves the prognostic performance beyond that of clinical variables by up to 5%. Such prognostic models could be useful for identifying people with diabetes at high risk of progressing quickly to treatment intensification.

Data availability: Summary statistics of lipidomic, proteomic and metabolomic data are available from a Shiny dashboard at https://rhapdata-app.vital-it.ch.
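The abstract reports model performance as Harrell's C statistic. As an illustration of what that metric measures, here is a minimal Python sketch for right-censored time-to-event data. It is not the authors' code: the function name `harrell_c`, its arguments and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored data (illustrative sketch).

    times       -- follow-up time per person
    events      -- 1 if the event (e.g. insulin requirement) was observed, 0 if censored
    risk_scores -- model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Reorder so that person i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter time ends in an observed event.
        # Tied times are skipped here (simplifying assumption).
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0      # earlier event had the higher predicted risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5      # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values of 0.71 to 0.74 mean that, in roughly seven of ten comparable pairs, the person who required insulin earlier had the higher predicted risk.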

Original language	English
Pages (from-to)	885-894
Number of pages	10
Journal	Diabetologia
Volume	67
Issue number	5
DOIs	https://doi.org/10.1007/s00125-024-06105-8
Publication status	Accepted/In press - 2024

Keywords

Machine learning
Prediction model
Progression
Type 2 diabetes

Cite this

Slieker, R. C., Münch, M., Donnelly, L. A., Bouland, G. A., Dragan, I., Kuznetsov, D., Elders, P. J. M., Rutter, G. A., Ibberson, M., Pearson, E. R., ’t Hart, L. M., van de Wiel, M. A., & Beulens, J. W. J. (Accepted/In press). An omics-based machine learning approach to predict diabetes progression: a RHAPSODY study. Diabetologia, 67(5), 885-894. https://doi.org/10.1007/s00125-024-06105-8

@article{ff5831df92ee4479a5a56c6870072a4a,

title = "An omics-based machine learning approach to predict diabetes progression: a RHAPSODY study",

abstract = "Aims/hypothesis: People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing more quickly to insulin initiation than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement and whether newly identified markers have added predictive value. Methods: In two prospective cohort studies as part of IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest). Clinical variables included age, sex, HbA1c, HDL-cholesterol and C-peptide. Models were run with unpenalised clinical variables (i.e. always included in the model without weights) or penalised clinical variables, or without clinical variables. Model development was performed in one cohort and the model was applied in a second cohort. Model performance was evaluated using Harrell{\textquoteright}s C statistic. Results: Of the 585 individuals from the Hoorn Diabetes Care System (DCS) cohort, 69 required insulin during follow-up (1.0–11.4 years); of the 571 individuals in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) cohort, 175 required insulin during follow-up (0.3–11.8 years). Overall, the clinical variables and proteins were selected in the different models most often, followed by the metabolites. The most frequently selected clinical variables were HbA1c (18 of the 36 models, 50%), age (15 models, 41.2%) and C-peptide (15 models, 41.2%). Base models (age, sex, BMI, HbA1c) including only clinical variables performed moderately in both the DCS discovery cohort (C statistic 0.71 [95% CI 0.64, 0.79]) and the GoDARTS replication cohort (C 0.71 [95% CI 0.69, 0.75]). A more extensive model including HDL-cholesterol and C-peptide performed better in both cohorts (DCS, C 0.74 [95% CI 0.67, 0.81]; GoDARTS, C 0.73 [95% CI 0.69, 0.77]). Two proteins, lactadherin and proto-oncogene tyrosine-protein kinase receptor, were most consistently selected and slightly improved model performance. Conclusions/interpretation: Using machine learning approaches, we show that insulin requirement risk can be modestly well predicted by predominantly clinical variables. Inclusion of molecular markers improves the prognostic performance beyond that of clinical variables by up to 5%. Such prognostic models could be useful for identifying people with diabetes at high risk of progressing quickly to treatment intensification. Data availability: Summary statistics of lipidomic, proteomic and metabolomic data are available from a Shiny dashboard at https://rhapdata-app.vital-it.ch.",

keywords = "Machine learning, Prediction model, Progression, Type 2 diabetes",

author = "Slieker, {Roderick C.} and Magnus M{\"u}nch and Donnelly, {Louise A.} and Bouland, {Gerard A.} and Iulian Dragan and Dmitry Kuznetsov and Elders, {Petra J.M.} and Rutter, {Guy A.} and Mark Ibberson and Pearson, {Ewan R.} and {{\textquoteright}t Hart}, {Leen M.} and {van de Wiel}, {Mark A.} and Beulens, {Joline W.J.}",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",

year = "2024",

doi = "10.1007/s00125-024-06105-8",

language = "English",

volume = "67",

pages = "885--894",

journal = "Diabetologia",

issn = "0012-186X",

publisher = "Springer Verlag",

number = "5",

}

TY - JOUR

T1 - An omics-based machine learning approach to predict diabetes progression

T2 - a RHAPSODY study

AU - Slieker, Roderick C.

AU - Münch, Magnus

AU - Donnelly, Louise A.

AU - Bouland, Gerard A.

AU - Dragan, Iulian

AU - Kuznetsov, Dmitry

AU - Elders, Petra J.M.

AU - Rutter, Guy A.

AU - Ibberson, Mark

AU - Pearson, Ewan R.

AU - ’t Hart, Leen M.

AU - van de Wiel, Mark A.

AU - Beulens, Joline W.J.

PY - 2024

Y1 - 2024

N2 - Aims/hypothesis: People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing more quickly to insulin initiation than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement and whether newly identified markers have added predictive value. Methods: In two prospective cohort studies as part of IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest). Clinical variables included age, sex, HbA1c, HDL-cholesterol and C-peptide. Models were run with unpenalised clinical variables (i.e. always included in the model without weights) or penalised clinical variables, or without clinical variables. Model development was performed in one cohort and the model was applied in a second cohort. Model performance was evaluated using Harrell’s C statistic. Results: Of the 585 individuals from the Hoorn Diabetes Care System (DCS) cohort, 69 required insulin during follow-up (1.0–11.4 years); of the 571 individuals in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) cohort, 175 required insulin during follow-up (0.3–11.8 years). Overall, the clinical variables and proteins were selected in the different models most often, followed by the metabolites. The most frequently selected clinical variables were HbA1c (18 of the 36 models, 50%), age (15 models, 41.2%) and C-peptide (15 models, 41.2%). Base models (age, sex, BMI, HbA1c) including only clinical variables performed moderately in both the DCS discovery cohort (C statistic 0.71 [95% CI 0.64, 0.79]) and the GoDARTS replication cohort (C 0.71 [95% CI 0.69, 0.75]). A more extensive model including HDL-cholesterol and C-peptide performed better in both cohorts (DCS, C 0.74 [95% CI 0.67, 0.81]; GoDARTS, C 0.73 [95% CI 0.69, 0.77]). Two proteins, lactadherin and proto-oncogene tyrosine-protein kinase receptor, were most consistently selected and slightly improved model performance. Conclusions/interpretation: Using machine learning approaches, we show that insulin requirement risk can be modestly well predicted by predominantly clinical variables. Inclusion of molecular markers improves the prognostic performance beyond that of clinical variables by up to 5%. Such prognostic models could be useful for identifying people with diabetes at high risk of progressing quickly to treatment intensification. Data availability: Summary statistics of lipidomic, proteomic and metabolomic data are available from a Shiny dashboard at https://rhapdata-app.vital-it.ch.

KW - Machine learning

KW - Prediction model

KW - Progression

KW - Type 2 diabetes

UR - http://www.scopus.com/inward/record.url?scp=85185302533&partnerID=8YFLogxK

U2 - 10.1007/s00125-024-06105-8

DO - 10.1007/s00125-024-06105-8

M3 - Article

C2 - 38374450

SN - 0012-186X

VL - 67

SP - 885

EP - 894

JO - Diabetologia

JF - Diabetologia

IS - 5

ER -
