Natural language processing in pathology: a scoping review

Gerard Burger; Ameen Abu-Hanna; Nicolette de Keizer; Ronald Cornet

doi:https://doi.org/10.1136/jclinpath-2016-203872

Research output: Contribution to journal › Review article › Academic › peer-review

43 Citations (Scopus)

Abstract

Background: Encoded pathology data are key for medical registries and analyses, but pathology information is often expressed as free text.

Objective: We reviewed and assessed the use of NLP (natural language processing) for encoding pathology documents.

Materials and methods: Papers addressing NLP in pathology were retrieved from PubMed, Association for Computing Machinery (ACM) Digital Library and Association for Computational Linguistics (ACL) Anthology. We reviewed and summarised the study objectives; NLP methods used and their validation; software implementations; the performance on the dataset used and any reported use in practice.

Results: The main objectives of the 38 included papers were encoding and extraction of clinically relevant information from pathology reports. Common approaches were word/phrase matching, probabilistic machine learning and rule-based systems. Five papers (13%) compared different methods on the same dataset. Four papers did not specify the method(s) used. 18 of the 26 studies that reported F-measure, recall or precision reported values of over 0.9. Proprietary software was the most frequently mentioned category (14 studies); General Architecture for Text Engineering (GATE) was the most applied architecture overall. Practical system use was reported in four papers. Most papers used expert annotation validation.

Conclusions: Different methods are used in NLP research in pathology, and good performances, that is, high precision and recall, high retrieval/removal rates, are reported for all of these. Lack of validation and of shared datasets precludes performance comparison. More comparative analysis and validation are needed to provide better insight into the performance and merits of these methods.

Original language	English
Pages (from-to)	949-955
Journal	Journal of Clinical Pathology
Volume	69
Issue number	11
Early online date	2016
DOIs	https://doi.org/10.1136/jclinpath-2016-203872
Publication status	Published - 2016

Cite this

@article{27534992075c496ab0977fbe7fff3d77,

title = "Natural language processing in pathology: a scoping review",

author = "Gerard Burger and Ameen Abu-Hanna and {de Keizer}, Nicolette and Ronald Cornet",

year = "2016",

doi = "10.1136/jclinpath-2016-203872",

language = "English",

volume = "69",

pages = "949--955",

journal = "Journal of Clinical Pathology",

issn = "0021-9746",

publisher = "BMJ Publishing Group",

number = "11",

}

TY - JOUR

T1 - Natural language processing in pathology: a scoping review

AU - Burger, Gerard

AU - Abu-Hanna, Ameen

AU - de Keizer, Nicolette

AU - Cornet, Ronald

PY - 2016

Y1 - 2016

AB - Background Encoded pathology data are key for medical registries and analyses, but pathology information is often expressed as free text. Objective We reviewed and assessed the use of NLP (natural language processing) for encoding pathology documents. Materials and methods Papers addressing NLP in pathology were retrieved from PubMed, Association for Computing Machinery (ACM) Digital Library and Association for Computational Linguistics (ACL) Anthology. We reviewed and summarised the study objectives; NLP methods used and their validation; software implementations; the performance on the dataset used and any reported use in practice. Results The main objectives of the 38 included papers were encoding and extraction of clinically relevant information from pathology reports. Common approaches were word/phrase matching, probabilistic machine learning and rule-based systems. Five papers (13%) compared different methods on the same dataset. Four papers did not specify the method(s) used. 18 of the 26 studies that reported F-measure, recall or precision reported values of over 0.9. Proprietary software was the most frequently mentioned category (14 studies); General Architecture for Text Engineering (GATE) was the most applied architecture overall. Practical system use was reported in four papers. Most papers used expert annotation validation. Conclusions Different methods are used in NLP research in pathology, and good performances, that is, high precision and recall, high retrieval/removal rates, are reported for all of these. Lack of validation and of shared datasets precludes performance comparison. More comparative analysis and validation are needed to provide better insight into the performance and merits of these methods

U2 - https://doi.org/10.1136/jclinpath-2016-203872

DO - 10.1136/jclinpath-2016-203872

M3 - Review article

C2 - 27451435

SN - 0021-9746

VL - 69

SP - 949

EP - 955

JO - Journal of Clinical Pathology

JF - Journal of Clinical Pathology

IS - 11

ER -