Explainable machine learning model based on clinical factors for predicting the disappearance of indeterminate pulmonary nodules

Jingxuan Wang; Nikos Sourlos; Marjolein Heuvelmans; Mathias Prokop; Rozemarijn Vliegenthart; Peter van Ooijen

doi:10.1016/j.compbiomed.2023.107871

Pulmonary medicine (VUmc)

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: During lung cancer screening, indeterminate pulmonary nodules (IPNs) are a frequent finding. We aim to predict whether IPNs are resolving or non-resolving to reduce follow-up examinations, using machine learning (ML) models. We incorporated dedicated techniques to enhance prediction explainability.

Methods: In total, 724 IPNs (size 50–500 mm³, 575 participants) from the Dutch-Belgian Randomized Lung Cancer Screening Trial were used. We implemented six ML models and 14 factors to predict nodule disappearance. Random search was applied to determine the optimal hyperparameters on the training set (579 nodules). ML models were trained using 5-fold cross-validation and tested on the test set (145 nodules). Model predictions were evaluated by utilizing the recall, precision, F1 score, and the area under the receiver operating characteristic curve (AUC). The best-performing model was used for three feature importance techniques: mean decrease in impurity (MDI), permutation feature importance (PFI), and SHapley Additive exPlanations (SHAP).

Results: The random forest model outperformed the other ML models with an AUC of 0.865. This model achieved a recall of 0.646, a precision of 0.816, and an F1 score of 0.721. The evaluation of feature importance achieved consistent ranking across all three methods for the most crucial factors. The MDI, PFI, and SHAP methods highlighted volume, maximum diameter, and minimum diameter as the top three factors. However, the remaining factors revealed discrepant ranking across methods.

Conclusion: ML models effectively predict IPN disappearance using participant demographics and nodule characteristics. Explainable techniques can assist clinicians in developing understandable preliminary assessments.
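As a quick consistency check on the figures reported in the abstract, the F1 score can be recomputed as the harmonic mean of the stated precision and recall, and the training and test counts can be summed against the stated total. The sketch below is illustrative only: the function and variable names are assumptions made for this example and are not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the random forest model.
precision, recall = 0.816, 0.646
f1 = f1_score(precision, recall)

# Nodule counts reported in the abstract (training set and test set).
n_train, n_test = 579, 145
n_total = n_train + n_test
test_fraction = n_test / n_total

print(f"F1 = {f1:.3f}")
print(f"total nodules = {n_total}, test fraction = {test_fraction:.3f}")
```

The recomputed F1 rounds to the reported 0.721, and the two subsets add up to the 724 nodules stated in the Methods, which corresponds to roughly an 80/20 split.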

Original language	English
Article number	107871
Journal	Computers in Biology and Medicine
Volume	169
DOIs	https://doi.org/10.1016/j.compbiomed.2023.107871
Publication status	Published - 1 Feb 2024

Keywords

Clinical factor
Explainable machine learning
Feature importance
Indeterminate pulmonary nodule
Visualization

Access to Document

https://doi.org/10.1016/j.compbiomed.2023.107871

Cite this

@article{71de259ddb2e4d429d2cf255ec7c2548,

title = "Explainable machine learning model based on clinical factors for predicting the disappearance of indeterminate pulmonary nodules",

abstract = "Background: During lung cancer screening, indeterminate pulmonary nodules (IPNs) are a frequent finding. We aim to predict whether IPNs are resolving or non-resolving to reduce follow-up examinations, using machine learning (ML) models. We incorporated dedicated techniques to enhance prediction explainability. Methods: In total, 724 IPNs (size 50–500 mm³, 575 participants) from the Dutch-Belgian Randomized Lung Cancer Screening Trial were used. We implemented six ML models and 14 factors to predict nodule disappearance. Random search was applied to determine the optimal hyperparameters on the training set (579 nodules). ML models were trained using 5-fold cross-validation and tested on the test set (145 nodules). Model predictions were evaluated by utilizing the recall, precision, F1 score, and the area under the receiver operating characteristic curve (AUC). The best-performing model was used for three feature importance techniques: mean decrease in impurity (MDI), permutation feature importance (PFI), and SHapley Additive exPlanations (SHAP). Results: The random forest model outperformed the other ML models with an AUC of 0.865. This model achieved a recall of 0.646, a precision of 0.816, and an F1 score of 0.721. The evaluation of feature importance achieved consistent ranking across all three methods for the most crucial factors. The MDI, PFI, and SHAP methods highlighted volume, maximum diameter, and minimum diameter as the top three factors. However, the remaining factors revealed discrepant ranking across methods. Conclusion: ML models effectively predict IPN disappearance using participant demographics and nodule characteristics. Explainable techniques can assist clinicians in developing understandable preliminary assessments.",

keywords = "Clinical factor, Explainable machine learning, Feature importance, Indeterminate pulmonary nodule, Visualization",

author = "Jingxuan Wang and Nikos Sourlos and Marjolein Heuvelmans and Mathias Prokop and Rozemarijn Vliegenthart and {van Ooijen}, Peter",

note = "Funding Information: Jingxuan Wang is grateful for the PhD financial support from China Scholarship Council (CSC file No. 202006540020) and the University of Groningen. This funding source had no role in the design, data collection, management, analysis, interpretation, preparation, review, approval of the manuscript, and decision to submit the manuscript for publication. Publisher Copyright: {\textcopyright} 2023 The Authors",

year = "2024",

month = feb,

day = "1",

doi = "10.1016/j.compbiomed.2023.107871",

language = "English",

volume = "169",

journal = "Computers in Biology and Medicine",

issn = "0010-4825",

publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Explainable machine learning model based on clinical factors for predicting the disappearance of indeterminate pulmonary nodules

AU - Wang, Jingxuan

AU - Sourlos, Nikos

AU - Heuvelmans, Marjolein

AU - Prokop, Mathias

AU - Vliegenthart, Rozemarijn

AU - van Ooijen, Peter

N1 - Funding Information: Jingxuan Wang is grateful for the PhD financial support from China Scholarship Council (CSC file No. 202006540020) and the University of Groningen. This funding source had no role in the design, data collection, management, analysis, interpretation, preparation, review, approval of the manuscript, and decision to submit the manuscript for publication. Publisher Copyright: © 2023 The Authors

PY - 2024/2/1

Y1 - 2024/2/1

N2 - Background: During lung cancer screening, indeterminate pulmonary nodules (IPNs) are a frequent finding. We aim to predict whether IPNs are resolving or non-resolving to reduce follow-up examinations, using machine learning (ML) models. We incorporated dedicated techniques to enhance prediction explainability. Methods: In total, 724 IPNs (size 50–500 mm³, 575 participants) from the Dutch-Belgian Randomized Lung Cancer Screening Trial were used. We implemented six ML models and 14 factors to predict nodule disappearance. Random search was applied to determine the optimal hyperparameters on the training set (579 nodules). ML models were trained using 5-fold cross-validation and tested on the test set (145 nodules). Model predictions were evaluated by utilizing the recall, precision, F1 score, and the area under the receiver operating characteristic curve (AUC). The best-performing model was used for three feature importance techniques: mean decrease in impurity (MDI), permutation feature importance (PFI), and SHapley Additive exPlanations (SHAP). Results: The random forest model outperformed the other ML models with an AUC of 0.865. This model achieved a recall of 0.646, a precision of 0.816, and an F1 score of 0.721. The evaluation of feature importance achieved consistent ranking across all three methods for the most crucial factors. The MDI, PFI, and SHAP methods highlighted volume, maximum diameter, and minimum diameter as the top three factors. However, the remaining factors revealed discrepant ranking across methods. Conclusion: ML models effectively predict IPN disappearance using participant demographics and nodule characteristics. Explainable techniques can assist clinicians in developing understandable preliminary assessments.

AB - Background: During lung cancer screening, indeterminate pulmonary nodules (IPNs) are a frequent finding. We aim to predict whether IPNs are resolving or non-resolving to reduce follow-up examinations, using machine learning (ML) models. We incorporated dedicated techniques to enhance prediction explainability. Methods: In total, 724 IPNs (size 50–500 mm³, 575 participants) from the Dutch-Belgian Randomized Lung Cancer Screening Trial were used. We implemented six ML models and 14 factors to predict nodule disappearance. Random search was applied to determine the optimal hyperparameters on the training set (579 nodules). ML models were trained using 5-fold cross-validation and tested on the test set (145 nodules). Model predictions were evaluated by utilizing the recall, precision, F1 score, and the area under the receiver operating characteristic curve (AUC). The best-performing model was used for three feature importance techniques: mean decrease in impurity (MDI), permutation feature importance (PFI), and SHapley Additive exPlanations (SHAP). Results: The random forest model outperformed the other ML models with an AUC of 0.865. This model achieved a recall of 0.646, a precision of 0.816, and an F1 score of 0.721. The evaluation of feature importance achieved consistent ranking across all three methods for the most crucial factors. The MDI, PFI, and SHAP methods highlighted volume, maximum diameter, and minimum diameter as the top three factors. However, the remaining factors revealed discrepant ranking across methods. Conclusion: ML models effectively predict IPN disappearance using participant demographics and nodule characteristics. Explainable techniques can assist clinicians in developing understandable preliminary assessments.

KW - Clinical factor

KW - Explainable machine learning

KW - Feature importance

KW - Indeterminate pulmonary nodule

KW - Visualization

UR - http://www.scopus.com/inward/record.url?scp=85181245980&partnerID=8YFLogxK

U2 - https://doi.org/10.1016/j.compbiomed.2023.107871

DO - 10.1016/j.compbiomed.2023.107871

M3 - Article

C2 - 38154157

SN - 0010-4825

VL - 169

JO - Computers in Biology and Medicine

JF - Computers in Biology and Medicine

M1 - 107871

ER -
