TY - JOUR
T1 - The added value of text from Dutch general practitioner notes in predictive modeling
AU - Seinen, Tom M.
AU - Kors, Jan A.
AU - van Mulligen, Erik M.
AU - Fridgeirsson, Egill
AU - Rijnbeek, Peter R.
N1 - Publisher Copyright: © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2023/11/17
Y1 - 2023/11/17
AB - OBJECTIVE: This work aims to explore the value of Dutch unstructured data, in combination with structured data, for the development of prognostic prediction models in a general practitioner (GP) setting. MATERIALS AND METHODS: We trained and validated prediction models for 4 common clinical prediction problems using various sparse text representations, common prediction algorithms, and observational GP electronic health record (EHR) data. We trained and validated 84 models internally and externally on data from different EHR systems. RESULTS: On average, over all the different text representations and prediction algorithms, models using only text data performed better than or similarly to models using structured data alone in 2 prediction tasks. Additionally, in these 2 tasks, the combination of structured and text data outperformed models using structured or text data alone. No large performance differences were found between the different text representations and prediction algorithms. DISCUSSION: Our findings indicate that the use of unstructured data alone can result in well-performing prediction models for some clinical prediction problems. Furthermore, the performance improvement achieved by combining structured and text data highlights the added value of text. Additionally, we demonstrate the significance of clinical natural language processing research in languages other than English and the possibility of validating text-based prediction models across various EHR systems. CONCLUSION: Our study highlights the potential benefits of incorporating unstructured data in clinical prediction models in a GP setting. Although the added value of unstructured data may vary depending on the specific prediction task, our findings suggest that it has the potential to enhance patient care.
KW - clinical prediction model
KW - electronic health records
KW - machine learning
KW - natural language processing
KW - prognostic prediction
UR - http://www.scopus.com/inward/record.url?scp=85177103064&partnerID=8YFLogxK
U2 - https://doi.org/10.1093/jamia/ocad160
DO - https://doi.org/10.1093/jamia/ocad160
M3 - Article
C2 - 37587084
SN - 1067-5027
VL - 30
SP - 1973
EP - 1984
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 12
ER -