Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information

Matthew McTeer; Douglas Applegate; Peter Mesenbrink; Vlad Ratziu; Jörn M. Schattenberg; Elisabetta Bugianesi; Andreas Geier; Manuel Romero Gomez; Jean-Francois Dufour; Mattias Ekstedt; Sven Francque; Hannele Yki-Jarvinen; Michael Allison; Luca Valenti; Luca Miele; Michael Pavlides; Jeremy Cobbold; Georgios Papatheodoridis; Adriaan G. Holleboom; Dina Tiniakos; Clifford Brass; Quentin M. Anstee; Paolo Missier

doi:10.1371/journal.pone.0299487

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Aims: Metabolic dysfunction-associated steatotic liver disease (MASLD) outcomes such as MASH (metabolic dysfunction-associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive, invasive biopsies. We aimed to show that routine clinical tests offer sufficient information to predict these endpoints.

Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we created three feature combinations that differ in how readily their variables can be obtained, including a 19-variable feature set attainable through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were used to determine the relative importance of each clinical variable.

Results: Analysing nine biopsy-derived MASLD outcomes, with cohort sizes ranging from 5385 to 6673 subjects, our models achieved training-set AUCs ranging from 0.719 to 0.994, including an AUC of 0.899 for classifying individuals with at-risk MASH. Using two further feature combinations of 26 and 35 variables, which included composite scores known to be good indicators of MASLD endpoints and advanced specialist tests, we found that predictive performance did not improve substantially. We also present local and global explanations for each ML model, offering clinicians interpretability without sacrificing predictive performance.

Conclusions: This study developed a series of ML models, with training-set AUCs ranging from 0.719 to 0.994, that predict MASLD outcomes usually determined through highly invasive means, using only easily extractable and readily available clinical information.
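The Methods name SMOTE as the class-balancing step. As an illustration only (this record does not include the authors' code, and this is not their implementation), the stdlib-only Python sketch below shows the standard SMOTE interpolation idea: each synthetic minority-class sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function names `smote` and `_nearest_neighbours`, the default `k=5`, the fixed `seed` and the toy feature vectors are all hypothetical choices made for this sketch.

```python
import math
import random
from typing import List, Sequence

# Hypothetical illustration of SMOTE-style oversampling; not the study's code.
Vector = List[float]


def _nearest_neighbours(i: int, points: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k minority points closest (Euclidean) to points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority: Sequence[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_synthetic minority-class samples by SMOTE-style interpolation.

    Each synthetic point is x + lam * (nn - x), where x is a randomly chosen
    minority sample, nn is one of its k nearest minority neighbours and
    lam is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    rng = random.Random(seed)      # fixed seed for reproducible sketches
    neighbours = [_nearest_neighbours(i, minority, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        lam = rng.random()
        # Interpolate feature by feature along the segment from x towards nn.
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy, hypothetical two-feature minority-class samples.
    minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
    for point in smote(minority, n_synthetic=4, k=2):
        print(point)
```

In practice a maintained implementation, such as `SMOTE` from the imbalanced-learn package (`imblearn.over_sampling.SMOTE` with `fit_resample`), would normally be used instead, and oversampling is typically applied to training data only so that synthetic samples do not leak into evaluation data.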

Original language	English
Article number	e0299487
Journal	PLOS ONE
Volume	19
Issue number	2
DOIs	https://doi.org/10.1371/journal.pone.0299487
Publication status	Published - 1 Feb 2024

Cite this

McTeer, M., Applegate, D., Mesenbrink, P., Ratziu, V., Schattenberg, J. M., Bugianesi, E., Geier, A., Gomez, M. R., Dufour, J.-F., Ekstedt, M., Francque, S., Yki-Jarvinen, H., Allison, M., Valenti, L., Miele, L., Pavlides, M., Cobbold, J., Papatheodoridis, G., Holleboom, A. G., ... Missier, P. (2024). Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information. PLOS ONE, 19(2), Article e0299487. https://doi.org/10.1371/journal.pone.0299487

@article{09a90f2b6ad34e91938c0cb8ff19367c,

title = "Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information",

abstract = "Aims: Metabolic dysfunction-associated steatotic liver disease (MASLD) outcomes such as MASH (metabolic dysfunction-associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive, invasive biopsies. We aimed to show that routine clinical tests offer sufficient information to predict these endpoints. Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we created three feature combinations that differ in how readily their variables can be obtained, including a 19-variable feature set attainable through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were used to determine the relative importance of each clinical variable. Results: Analysing nine biopsy-derived MASLD outcomes, with cohort sizes ranging from 5385 to 6673 subjects, our models achieved training-set AUCs ranging from 0.719 to 0.994, including an AUC of 0.899 for classifying individuals with at-risk MASH. Using two further feature combinations of 26 and 35 variables, which included composite scores known to be good indicators of MASLD endpoints and advanced specialist tests, we found that predictive performance did not improve substantially. We also present local and global explanations for each ML model, offering clinicians interpretability without sacrificing predictive performance. Conclusions: This study developed a series of ML models, with training-set AUCs ranging from 0.719 to 0.994, that predict MASLD outcomes usually determined through highly invasive means, using only easily extractable and readily available clinical information.",

author = "Matthew McTeer and Douglas Applegate and Peter Mesenbrink and Vlad Ratziu and Schattenberg, {J{\"o}rn M.} and Elisabetta Bugianesi and Andreas Geier and Gomez, {Manuel Romero} and Jean-Francois Dufour and Mattias Ekstedt and Sven Francque and Hannele Yki-Jarvinen and Michael Allison and Luca Valenti and Luca Miele and Michael Pavlides and Jeremy Cobbold and Georgios Papatheodoridis and Holleboom, {Adriaan G.} and Dina Tiniakos and Clifford Brass and Anstee, {Quentin M.} and Paolo Missier",

note = "Publisher Copyright: {\textcopyright} 2024 McTeer et al.",

year = "2024",

month = feb,

day = "1",

doi = "10.1371/journal.pone.0299487",

language = "English",

volume = "19",

journal = "PLOS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "2",

}

McTeer, M, Applegate, D, Mesenbrink, P, Ratziu, V, Schattenberg, JM, Bugianesi, E, Geier, A, Gomez, MR, Dufour, J-F, Ekstedt, M, Francque, S, Yki-Jarvinen, H, Allison, M, Valenti, L, Miele, L, Pavlides, M, Cobbold, J, Papatheodoridis, G, Holleboom, AG, Tiniakos, D, Brass, C, Anstee, QM & Missier, P 2024, 'Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information', PLOS ONE, vol. 19, no. 2, e0299487. https://doi.org/10.1371/journal.pone.0299487

TY - JOUR

T1 - Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information

AU - McTeer, Matthew

AU - Applegate, Douglas

AU - Mesenbrink, Peter

AU - Ratziu, Vlad

AU - Schattenberg, Jörn M.

AU - Bugianesi, Elisabetta

AU - Geier, Andreas

AU - Gomez, Manuel Romero

AU - Dufour, Jean-Francois

AU - Ekstedt, Mattias

AU - Francque, Sven

AU - Yki-Jarvinen, Hannele

AU - Allison, Michael

AU - Valenti, Luca

AU - Miele, Luca

AU - Pavlides, Michael

AU - Cobbold, Jeremy

AU - Papatheodoridis, Georgios

AU - Holleboom, Adriaan G.

AU - Tiniakos, Dina

AU - Brass, Clifford

AU - Anstee, Quentin M.

AU - Missier, Paolo

PY - 2024/2/1

Y1 - 2024/2/1

N2 - Aims: Metabolic dysfunction-associated steatotic liver disease (MASLD) outcomes such as MASH (metabolic dysfunction-associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive, invasive biopsies. We aimed to show that routine clinical tests offer sufficient information to predict these endpoints. Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we created three feature combinations that differ in how readily their variables can be obtained, including a 19-variable feature set attainable through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were used to determine the relative importance of each clinical variable. Results: Analysing nine biopsy-derived MASLD outcomes, with cohort sizes ranging from 5385 to 6673 subjects, our models achieved training-set AUCs ranging from 0.719 to 0.994, including an AUC of 0.899 for classifying individuals with at-risk MASH. Using two further feature combinations of 26 and 35 variables, which included composite scores known to be good indicators of MASLD endpoints and advanced specialist tests, we found that predictive performance did not improve substantially. We also present local and global explanations for each ML model, offering clinicians interpretability without sacrificing predictive performance. Conclusions: This study developed a series of ML models, with training-set AUCs ranging from 0.719 to 0.994, that predict MASLD outcomes usually determined through highly invasive means, using only easily extractable and readily available clinical information.

AB - Aims: Metabolic dysfunction-associated steatotic liver disease (MASLD) outcomes such as MASH (metabolic dysfunction-associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive, invasive biopsies. We aimed to show that routine clinical tests offer sufficient information to predict these endpoints. Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we created three feature combinations that differ in how readily their variables can be obtained, including a 19-variable feature set attainable through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were used to determine the relative importance of each clinical variable. Results: Analysing nine biopsy-derived MASLD outcomes, with cohort sizes ranging from 5385 to 6673 subjects, our models achieved training-set AUCs ranging from 0.719 to 0.994, including an AUC of 0.899 for classifying individuals with at-risk MASH. Using two further feature combinations of 26 and 35 variables, which included composite scores known to be good indicators of MASLD endpoints and advanced specialist tests, we found that predictive performance did not improve substantially. We also present local and global explanations for each ML model, offering clinicians interpretability without sacrificing predictive performance. Conclusions: This study developed a series of ML models, with training-set AUCs ranging from 0.719 to 0.994, that predict MASLD outcomes usually determined through highly invasive means, using only easily extractable and readily available clinical information.

UR - http://www.scopus.com/inward/record.url?scp=85186311522&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0299487

DO - 10.1371/journal.pone.0299487

M3 - Article

C2 - 38421999

SN - 1932-6203

VL - 19

JO - PLOS ONE

JF - PLOS ONE

IS - 2

M1 - e0299487

ER -