Use of unstructured text in prognostic clinical prediction models: A systematic review

Tom M. Seinen; Egill A. Fridgeirsson; Solomon Ioannou; Daniel Jeannetot; Luis H. John; Jan A. Kors; Aniek F. Markus; Victor Pera; Alexandros Rekkas; Ross D. Williams; Cynthia Yang; Erik M. van Mulligen; Peter R. Rijnbeek

doi: https://doi.org/10.1093/jamia/ocac058

Research output: Contribution to journal › Review article › Academic › peer-review

17 Citations (Scopus)

Abstract

Objective: This systematic review aims to assess how information from unstructured text is used to develop and validate clinical prognostic prediction models. We summarize the prediction problems and methodological landscape and determine whether using text data in addition to more commonly used structured data improves the prediction performance. Materials and Methods: We searched Embase, MEDLINE, Web of Science, and Google Scholar to identify studies that developed prognostic prediction models using information extracted from unstructured text in a data-driven manner, published in the period from January 2005 to March 2021. Data items were extracted and analyzed, and a meta-analysis of the model performance was carried out to assess the added value of text to structured-data models. Results: We identified 126 studies that described 145 clinical prediction problems. Combining text and structured data improved model performance, compared with using only text or only structured data. In these studies, a wide variety of dense and sparse numeric text representations were combined with both deep learning and more traditional machine learning methods. External validation, public availability, and attention to the explainability of the developed models were limited. Conclusion: The use of unstructured text in the development of prognostic prediction models has been found beneficial in addition to structured data in most studies. The text data are a source of valuable information for prediction model development and should not be neglected. We suggest a future focus on explainability and external validation of the developed models, promoting robust and trustworthy prediction models in clinical practice.

Original language	English
Pages (from-to)	1292-1302
Number of pages	11
Journal	Journal of the American Medical Informatics Association
Volume	29
Issue number	7
DOIs	https://doi.org/10.1093/jamia/ocac058
Publication status	Published - 1 Jul 2022

Keywords

clinical prediction model
electronic health records
machine learning
natural language processing
prognostic prediction

Cite this

Seinen, T. M., Fridgeirsson, E. A., Ioannou, S., Jeannetot, D., John, L. H., Kors, J. A., Markus, A. F., Pera, V., Rekkas, A., Williams, R. D., Yang, C., van Mulligen, E. M., & Rijnbeek, P. R. (2022). Use of unstructured text in prognostic clinical prediction models: A systematic review. Journal of the American Medical Informatics Association, 29(7), 1292-1302. https://doi.org/10.1093/jamia/ocac058

@article{ec3fbc61beac4b41a5463aa2144ee0a1,

title = "Use of unstructured text in prognostic clinical prediction models: A systematic review",

abstract = "Objective: This systematic review aims to assess how information from unstructured text is used to develop and validate clinical prognostic prediction models. We summarize the prediction problems and methodological landscape and determine whether using text data in addition to more commonly used structured data improves the prediction performance. Materials and Methods: We searched Embase, MEDLINE, Web of Science, and Google Scholar to identify studies that developed prognostic prediction models using information extracted from unstructured text in a data-driven manner, published in the period from January 2005 to March 2021. Data items were extracted and analyzed, and a meta-analysis of the model performance was carried out to assess the added value of text to structured-data models. Results: We identified 126 studies that described 145 clinical prediction problems. Combining text and structured data improved model performance, compared with using only text or only structured data. In these studies, a wide variety of dense and sparse numeric text representations were combined with both deep learning and more traditional machine learning methods. External validation, public availability, and attention to the explainability of the developed models were limited. Conclusion: The use of unstructured text in the development of prognostic prediction models has been found beneficial in addition to structured data in most studies. The text data are a source of valuable information for prediction model development and should not be neglected. We suggest a future focus on explainability and external validation of the developed models, promoting robust and trustworthy prediction models in clinical practice.",

keywords = "clinical prediction model, electronic health records, machine learning, natural language processing, prognostic prediction",

author = "Seinen, {Tom M.} and Fridgeirsson, {Egill A.} and Solomon Ioannou and Daniel Jeannetot and John, {Luis H.} and Kors, {Jan A.} and Markus, {Aniek F.} and Victor Pera and Alexandros Rekkas and Williams, {Ross D.} and Cynthia Yang and {van Mulligen}, {Erik M.} and Rijnbeek, {Peter R.}",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association.",

year = "2022",

month = jul,

day = "1",

doi = "10.1093/jamia/ocac058",

language = "English",

volume = "29",

pages = "1292--1302",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "7",

}

Seinen, TM, Fridgeirsson, EA, Ioannou, S, Jeannetot, D, John, LH, Kors, JA, Markus, AF, Pera, V, Rekkas, A, Williams, RD, Yang, C, van Mulligen, EM & Rijnbeek, PR 2022, 'Use of unstructured text in prognostic clinical prediction models: A systematic review', Journal of the American Medical Informatics Association, vol. 29, no. 7, pp. 1292-1302. https://doi.org/10.1093/jamia/ocac058

TY - JOUR

T1 - Use of unstructured text in prognostic clinical prediction models

T2 - A systematic review

AU - Seinen, Tom M.

AU - Fridgeirsson, Egill A.

AU - Ioannou, Solomon

AU - Jeannetot, Daniel

AU - John, Luis H.

AU - Kors, Jan A.

AU - Markus, Aniek F.

AU - Pera, Victor

AU - Rekkas, Alexandros

AU - Williams, Ross D.

AU - Yang, Cynthia

AU - van Mulligen, Erik M.

AU - Rijnbeek, Peter R.

PY - 2022/7/1

Y1 - 2022/7/1

AB - Objective: This systematic review aims to assess how information from unstructured text is used to develop and validate clinical prognostic prediction models. We summarize the prediction problems and methodological landscape and determine whether using text data in addition to more commonly used structured data improves the prediction performance. Materials and Methods: We searched Embase, MEDLINE, Web of Science, and Google Scholar to identify studies that developed prognostic prediction models using information extracted from unstructured text in a data-driven manner, published in the period from January 2005 to March 2021. Data items were extracted and analyzed, and a meta-analysis of the model performance was carried out to assess the added value of text to structured-data models. Results: We identified 126 studies that described 145 clinical prediction problems. Combining text and structured data improved model performance, compared with using only text or only structured data. In these studies, a wide variety of dense and sparse numeric text representations were combined with both deep learning and more traditional machine learning methods. External validation, public availability, and attention to the explainability of the developed models were limited. Conclusion: The use of unstructured text in the development of prognostic prediction models has been found beneficial in addition to structured data in most studies. The text data are a source of valuable information for prediction model development and should not be neglected. We suggest a future focus on explainability and external validation of the developed models, promoting robust and trustworthy prediction models in clinical practice.

KW - clinical prediction model

KW - electronic health records

KW - machine learning

KW - natural language processing

KW - prognostic prediction

UR - http://www.scopus.com/inward/record.url?scp=85132050064&partnerID=8YFLogxK

U2 - https://doi.org/10.1093/jamia/ocac058

DO - https://doi.org/10.1093/jamia/ocac058

M3 - Review article

C2 - 35475536

SN - 1067-5027

VL - 29

SP - 1292

EP - 1302

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 7

ER -
