Early detection of colorectal cancer by leveraging Dutch primary care consultation notes with free text embeddings

Torec T. Luik; Ameen Abu-Hanna; Henk C. P. M. van Weert; Martijn C. Schut

doi:https://doi.org/10.1038/s41598-023-37397-2

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

We aimed to assess the added predictive performance that free-text Dutch consultation notes provide in detecting colorectal cancer (CRC) in primary care, compared with currently used models. We developed, evaluated and compared three prediction models for CRC in a large primary care database with 60,641 patients. The model combining known predictive features with free-text data (TabTxt, AUROC: 0.823) performs statistically significantly better (p < 0.05) than the model using only tabular data, as used nowadays (Tab, AUROC: 0.767), and the model using only text data (Txt, AUROC: 0.797). The specificities of the two models that use demographics and known CRC features (Tab: 0.321; TabTxt: 0.335) are higher than that of the model with only free text (Txt: 0.234). The Txt model and, to a lesser degree, the TabTxt model are well calibrated, while the Tab model shows slight underprediction at both tails. As expected with an outcome prevalence below 0.01, all models show poorly calibrated predictions in the extreme upper tail (top 1%). Free-text consultation notes show promise for improving predictive performance over established prediction models that use only structured features. A future clinical implication for our CRC use case is that such improvement may help lower the number of referrals for suspected CRC to medical specialists.
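For readers less familiar with the two metrics reported in the abstract, the following is a minimal sketch of how AUROC and specificity can be computed from predicted risks. It is not the authors' code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made purely for illustration. The abstract does not state the operating point at which specificity was measured, so none is implied here.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive case scores higher than a
    random negative case, counting ties as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """True negatives divided by all negatives, where a score at or
    above the threshold is flagged as positive."""
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not neg:
        raise ValueError("need at least one negative case")
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


# Hypothetical toy data, NOT taken from the study.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]

if __name__ == "__main__":
    print(auroc(labels, scores))             # 0.875
    print(specificity(labels, scores, 0.5))  # 0.75
```

The pairwise loop is quadratic in the number of cases. It is used here only because it mirrors the definition directly; for a cohort the size of the one described above, a rank-based computation would be the practical choice.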

Original language	English
Article number	10760
Journal	Scientific Reports
Volume	13
Issue number	1
DOIs	https://doi.org/10.1038/s41598-023-37397-2
Publication status	Published - 1 Dec 2023

Cite this

@article{0ea8cd6f81204b0ca4515e70e4214a8a,
title = "Early detection of colorectal cancer by leveraging {Dutch} primary care consultation notes with free text embeddings",
abstract = "We aimed to assess the added predictive performance that free-text Dutch consultation notes provide in detecting colorectal cancer (CRC) in primary care, compared with currently used models. We developed, evaluated and compared three prediction models for CRC in a large primary care database with 60,641 patients. The model combining known predictive features with free-text data (TabTxt, AUROC: 0.823) performs statistically significantly better (p < 0.05) than the model using only tabular data, as used nowadays (Tab, AUROC: 0.767), and the model using only text data (Txt, AUROC: 0.797). The specificities of the two models that use demographics and known CRC features (Tab: 0.321; TabTxt: 0.335) are higher than that of the model with only free text (Txt: 0.234). The Txt model and, to a lesser degree, the TabTxt model are well calibrated, while the Tab model shows slight underprediction at both tails. As expected with an outcome prevalence below 0.01, all models show poorly calibrated predictions in the extreme upper tail (top 1\%). Free-text consultation notes show promise for improving predictive performance over established prediction models that use only structured features. A future clinical implication for our CRC use case is that such improvement may help lower the number of referrals for suspected CRC to medical specialists.",
author = "Luik, {Torec T.} and Abu-Hanna, Ameen and {van Weert}, {Henk C. P. M.} and Schut, {Martijn C.}",
note = "Funding Information: TL and MCS were funded by the Dutch Cancer Society (KWF.nl) Programme Research \& Implementation call 2019-I (Project Number 12225: AI-DOC). TL received internal funding from the departments of Medical Informatics and Primary Care of the Amsterdam University Medical Centers, location AMC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Publisher Copyright: {\textcopyright} 2023, The Author(s).",
year = "2023",
month = dec,
day = "1",
doi = "10.1038/s41598-023-37397-2",
language = "English",
volume = "13",
number = "1",
pages = "10760",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Springer Nature",
}

TY  - JOUR
T1  - Early detection of colorectal cancer by leveraging Dutch primary care consultation notes with free text embeddings
AU  - Luik, Torec T.
AU  - Abu-Hanna, Ameen
AU  - van Weert, Henk C. P. M.
AU  - Schut, Martijn C.
N1  - Funding Information: TL and MCS were funded by the Dutch Cancer Society (KWF.nl) Programme Research & Implementation call 2019-I (Project Number 12225: AI-DOC). TL received internal funding from the departments of Medical Informatics and Primary Care of the Amsterdam University Medical Centers, location AMC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Publisher Copyright: © 2023, The Author(s).
PY  - 2023/12/1
Y1  - 2023/12/1
N2  - We aimed to assess the added predictive performance that free-text Dutch consultation notes provide in detecting colorectal cancer (CRC) in primary care, compared with currently used models. We developed, evaluated and compared three prediction models for CRC in a large primary care database with 60,641 patients. The model combining known predictive features with free-text data (TabTxt, AUROC: 0.823) performs statistically significantly better (p < 0.05) than the model using only tabular data, as used nowadays (Tab, AUROC: 0.767), and the model using only text data (Txt, AUROC: 0.797). The specificities of the two models that use demographics and known CRC features (Tab: 0.321; TabTxt: 0.335) are higher than that of the model with only free text (Txt: 0.234). The Txt model and, to a lesser degree, the TabTxt model are well calibrated, while the Tab model shows slight underprediction at both tails. As expected with an outcome prevalence below 0.01, all models show poorly calibrated predictions in the extreme upper tail (top 1%). Free-text consultation notes show promise for improving predictive performance over established prediction models that use only structured features. A future clinical implication for our CRC use case is that such improvement may help lower the number of referrals for suspected CRC to medical specialists.
UR  - http://www.scopus.com/inward/record.url?scp=85164017471&partnerID=8YFLogxK
U2  - https://doi.org/10.1038/s41598-023-37397-2
DO  - 10.1038/s41598-023-37397-2
M3  - Article
C2  - 37402757
SN  - 2045-2322
VL  - 13
JO  - Scientific Reports
JF  - Scientific Reports
IS  - 1
M1  - 10760
ER  - 
