TY - JOUR
T1 - Predicting future falls in older people using natural language processing of general practitioners' clinical notes
AU - Dormosh, Noman
AU - Schut, Martijn C
AU - Heymans, Martijn W
AU - Maarsingh, Otto R.
AU - Bouman, Jonathan
AU - van der Velde, Nathalie
AU - Abu-Hanna, Ameen
N1 - Funding Information: The authors are grateful to all participating GPs and the data managers of the Academic General Practitioner’s Network at Academic Medical Center (AHA AMC) for their time and effort in contributing routine care data for this study. Publisher Copyright: © The Author(s) 2023. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved.
PY - 2023/4/1
Y1 - 2023/4/1
AB - BACKGROUND: Falls in older people are common and morbid. Prediction models can help identify individuals at higher fall risk. Electronic health records (EHR) offer an opportunity to develop automated prediction tools that may help to identify fall-prone individuals and lower clinical workload. However, existing models primarily utilise structured EHR data and neglect information in unstructured data. Using machine learning and natural language processing (NLP), we aimed to examine the predictive performance provided by unstructured clinical notes, and their incremental performance over structured data to predict falls. METHODS: We used primary care EHR data of people aged 65 or over. We developed three logistic regression models using the least absolute shrinkage and selection operator: one using structured clinical variables (Baseline), one with topics extracted from unstructured clinical notes (Topic-based) and one by adding clinical variables to the extracted topics (Combi). Model performance was assessed in terms of discrimination using the area under the receiver operating characteristic curve (AUC), and calibration by calibration plots. We used 10-fold cross-validation to validate the approach. RESULTS: Data of 35,357 individuals were analysed, of whom 4,734 experienced falls. Our NLP topic modelling technique discovered 151 topics from the unstructured clinical notes. AUCs and 95% confidence intervals of the Baseline, Topic-based and Combi models were 0.709 (0.700-0.719), 0.685 (0.676-0.694) and 0.718 (0.708-0.727), respectively. All the models showed good calibration. CONCLUSIONS: Unstructured clinical notes are an additional viable data source to develop and improve prediction models for falls compared to traditional prediction models, but the clinical relevance remains limited.
KW - Accidental Falls/prevention & control
KW - Aged
KW - Electronic Health Records
KW - General Practitioners
KW - Humans
KW - Logistic Models
KW - Natural Language Processing
KW - accidental falls
KW - electronic health records
KW - fall prediction
KW - free text
KW - natural language processing
KW - older people
KW - topic modelling
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85151616194&origin=inward
UR - https://www.ncbi.nlm.nih.gov/pubmed/37014000
UR - http://www.scopus.com/inward/record.url?scp=85151616194&partnerID=8YFLogxK
U2 - https://doi.org/10.1093/ageing/afad046
DO - https://doi.org/10.1093/ageing/afad046
M3 - Article
C2 - 37014000
SN - 0002-0729
VL - 52
JO - Age and ageing
JF - Age and ageing
IS - 4
ER -