Data Extraction and Synthesis in Systematic Reviews of Diagnostic Test Accuracy: A Corpus for Automating and Evaluating the Process

Christopher Norman, Mariska Leeflang, Aurélie Névéol

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

BACKGROUND: Systematic reviews are critical for obtaining accurate estimates of diagnostic test accuracy (DTA), yet they require extracting information buried in free-text articles, an often laborious process.

OBJECTIVE: We create a dataset describing the data extraction and synthesis processes in 63 DTA systematic reviews, and demonstrate its utility by using it to replicate the data synthesis in the original reviews.

METHOD: We construct the dataset using a custom automated extraction pipeline, complemented with manual extraction, verification, and post-editing. We evaluate the dataset through manual assessment by two annotators and by comparison against data extracted from source files.

RESULTS: The constructed dataset contains 5,848 test results for 1,354 diagnostic tests from 1,738 diagnostic studies. We observe an extraction error rate of 0.06-0.3%.

CONCLUSIONS: This constitutes the first dataset describing the later stages of the DTA systematic review process, and it is intended to be useful for automating or evaluating that process.
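The abstract does not detail the synthesis computations, but DTA data synthesis conventionally starts from per-study 2x2 tables. As an illustrative aside (not taken from the paper), the following minimal Python sketch shows how per-study sensitivity and specificity would be derived from assumed TP/FP/FN/TN counts; the TestResult structure and its field names are hypothetical.

    # Illustrative sketch only (not the authors' pipeline): deriving per-study
    # sensitivity and specificity from an assumed 2x2 table of test results,
    # the kind of values DTA data synthesis typically builds on.
    from dataclasses import dataclass


    @dataclass
    class TestResult:
        """One extracted 2x2 table; field names are hypothetical."""
        tp: int  # true positives
        fp: int  # false positives
        fn: int  # false negatives
        tn: int  # true negatives

        @property
        def sensitivity(self) -> float:
            # Proportion of diseased cases the test correctly identifies.
            return self.tp / (self.tp + self.fn)

        @property
        def specificity(self) -> float:
            # Proportion of non-diseased cases the test correctly identifies.
            return self.tn / (self.tn + self.fp)


    # Example with made-up counts for a single diagnostic study.
    result = TestResult(tp=90, fp=10, fn=5, tn=95)
    print(f"sensitivity={result.sensitivity:.2f}, specificity={result.specificity:.2f}")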
Original language: English
Pages (from-to): 817-826
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Volume: 2018
Publication status: Published - 2018
