Development and validation of a machine-learning algorithm to predict the relevance of scientific articles within the field of teratology

Philippe C Habets; David Gp van IJzendoorn; Christiaan H Vinkers; Linda Härmark; Loes C de Vries; Willem M Otte

doi:https://doi.org/10.1016/j.reprotox.2022.09.001

Research output: Contribution to journal › Article › Academic › peer-review

2 Citations (Scopus)

Abstract

The Dutch Teratology Information Service Lareb counsels healthcare professionals and patients about medication use during pregnancy and lactation. To keep the evidence up to date, employees perform a standardized weekly PubMed query where relevant literature is identified manually. We aimed to develop an accurate machine-learning algorithm to predict the relevance of PubMed entries, thereby reducing the labor-intensive task of manually screening the articles. We fine-tuned a pre-trained natural language processing transformer model to identify relevant entries. We split 15,540 labeled entries into case-control-balanced train, validation, and test datasets. Additionally, we externally validated the model prospectively with 1,288 labeled entries obtained from weekly queries after developing the model. This dataset was also independently labeled by a team of six experienced human raters to evaluate our model's performance. The validation of our machine-learning model on the retrospectively collected held-out dataset obtained an area under the sensitivity-versus-specificity curve of 89.3% (CI: 88.2-90.4). In the prospective external validation of the model, our model classified relevant literature with a sensitivity-versus-specificity curve area of 87.4% (CI: 85.0-89.8). Our model achieved a higher sensitivity than the human raters' team without sacrificing too much specificity. The team of human raters showed weak to moderate levels of agreement in their article classifications (kappa range 0.40-0.64). The human selection of the latest relevant literature is indispensable to keep the teratology information up to date. We show that automatic preselection of relevant abstracts using machine learning is possible without sacrificing the selection performance.
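The abstract reports sensitivity, specificity, and inter-rater kappa. As a rough illustration of how such quantities are commonly computed for binary relevance labels, here is a minimal sketch. It is not the authors' code: the function names and the toy label lists are hypothetical, and Cohen's kappa for two raters is assumed here, whereas the abstract does not say which kappa variant was used.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels (1 = relevant)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, skipped
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, flagged
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning binary labels to the same items."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    p_a = sum(rater_a) / n  # share of items rater A marked relevant
    p_b = sum(rater_b) / n  # share of items rater B marked relevant
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, invented purely for illustration.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
rater_b = [1, 0, 1, 0, 1, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(truth, model)
kappa = cohen_kappa(rater_a, rater_b)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```

In this toy case the classifier catches every relevant entry at the cost of some false positives, which mirrors the kind of trade-off the abstract describes for a preselection step.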

Original language	English
Pages (from-to)	150-154
Number of pages	5
Journal	Reproductive Toxicology
Volume	113
Early online date	5 Sept 2022
DOIs	https://doi.org/10.1016/j.reprotox.2022.09.001
Publication status	Published - Oct 2022

Keywords

Deep learning
Literature screening
Pharmacovigilance
TIS

Access to Document

https://doi.org/10.1016/j.reprotox.2022.09.001

Cite this

@article{d9f8588c04934014b39bac114f2ad99d,

title = "Development and validation of a machine-learning algorithm to predict the relevance of scientific articles within the field of teratology",

abstract = "The Dutch Teratology Information Service Lareb counsels healthcare professionals and patients about medication use during pregnancy and lactation. To keep the evidence up to date, employees perform a standardized weekly PubMed query where relevant literature is identified manually. We aimed to develop an accurate machine-learning algorithm to predict the relevance of PubMed entries, thereby reducing the labor-intensive task of manually screening the articles. We fine-tuned a pre-trained natural language processing transformer model to identify relevant entries. We split 15,540 labeled entries into case-control-balanced train, validation, and test datasets. Additionally, we externally validated the model prospectively with 1,288 labeled entries obtained from weekly queries after developing the model. This dataset was also independently labeled by a team of six experienced human raters to evaluate our model's performance. The validation of our machine-learning model on the retrospectively collected held-out dataset obtained an area under the sensitivity-versus-specificity curve of 89.3% (CI: 88.2-90.4). In the prospective external validation of the model, our model classified relevant literature with a sensitivity-versus-specificity curve area of 87.4% (CI: 85.0-89.8). Our model achieved a higher sensitivity than the human raters' team without sacrificing too much specificity. The team of human raters showed weak to moderate levels of agreement in their article classifications (kappa range 0.40-0.64). The human selection of the latest relevant literature is indispensable to keep the teratology information up to date. We show that automatic preselection of relevant abstracts using machine learning is possible without sacrificing the selection performance.",

keywords = "Deep learning, Literature screening, Pharmacovigilance, TIS",

author = "Habets, {Philippe C} and {van IJzendoorn}, {David Gp} and Vinkers, {Christiaan H} and H{\"a}rmark, Linda and {de Vries}, {Loes C} and Otte, {Willem M}",

note = "Publisher Copyright: {\textcopyright} 2022 The Authors",

year = "2022",

month = oct,

doi = "10.1016/j.reprotox.2022.09.001",

language = "English",

volume = "113",

pages = "150--154",

journal = "Reproductive Toxicology",

issn = "0890-6238",

publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - Development and validation of a machine-learning algorithm to predict the relevance of scientific articles within the field of teratology

AU - Habets, Philippe C

AU - van IJzendoorn, David Gp

AU - Vinkers, Christiaan H

AU - Härmark, Linda

AU - de Vries, Loes C

AU - Otte, Willem M

PY - 2022/10

Y1 - 2022/10

AB - The Dutch Teratology Information Service Lareb counsels healthcare professionals and patients about medication use during pregnancy and lactation. To keep the evidence up to date, employees perform a standardized weekly PubMed query where relevant literature is identified manually. We aimed to develop an accurate machine-learning algorithm to predict the relevance of PubMed entries, thereby reducing the labor-intensive task of manually screening the articles. We fine-tuned a pre-trained natural language processing transformer model to identify relevant entries. We split 15,540 labeled entries into case-control-balanced train, validation, and test datasets. Additionally, we externally validated the model prospectively with 1,288 labeled entries obtained from weekly queries after developing the model. This dataset was also independently labeled by a team of six experienced human raters to evaluate our model's performance. The validation of our machine-learning model on the retrospectively collected held-out dataset obtained an area under the sensitivity-versus-specificity curve of 89.3% (CI: 88.2-90.4). In the prospective external validation of the model, our model classified relevant literature with a sensitivity-versus-specificity curve area of 87.4% (CI: 85.0-89.8). Our model achieved a higher sensitivity than the human raters' team without sacrificing too much specificity. The team of human raters showed weak to moderate levels of agreement in their article classifications (kappa range 0.40-0.64). The human selection of the latest relevant literature is indispensable to keep the teratology information up to date. We show that automatic preselection of relevant abstracts using machine learning is possible without sacrificing the selection performance.

KW - Deep learning

KW - Literature screening

KW - Pharmacovigilance

KW - TIS

UR - http://www.scopus.com/inward/record.url?scp=85138040185&partnerID=8YFLogxK

U2 - https://doi.org/10.1016/j.reprotox.2022.09.001

DO - https://doi.org/10.1016/j.reprotox.2022.09.001

M3 - Article

C2 - 36067870

SN - 0890-6238

VL - 113

SP - 150

EP - 154

JO - Reproductive Toxicology

JF - Reproductive Toxicology

ER -
