Diagnostic accuracy of an artificial intelligence algorithm versus radiologists for fracture detection on cervical spine CT

Gaby J. van den Wittenboer, Brigitta Y. M. van der Kolk, Ingrid M. Nijholt, Eline Langius-Wiffen, Rogier A. van Dijk, Boudewijn A. A. M. van Hasselt, Martin Podlogar, Wimar A. van den Brink, Gert Joan Bouma, Niels W. L. Schep, Mario Maas, Martijn F. Boomsma

Research output: Contribution to journal › Article › Academic › peer-review

3 Citations (Scopus)

Abstract

Objectives: To compare the diagnostic accuracy of a deep learning artificial intelligence (AI) algorithm for cervical spine (C-spine) fracture detection on CT with that of attending radiologists, and to assess which undetected fractures were injuries in need of stabilising therapy (IST).

Methods: This single-centre, retrospective diagnostic accuracy study included consecutive patients (age ≥18 years; 2007–2014) screened for C-spine fractures with CT. To validate ground truth, one radiologist and three neurosurgeons independently examined scans positive for fracture. Negative scans were followed up until 2022 through patient files, and two radiologists reviewed negative scans that were flagged positive by AI. The neurosurgeons determined which fractures were ISTs. The diagnostic accuracy of AI and attending radiologists (index tests) was compared using McNemar's test.

Results: Of the 2368 scans (median age, 48, interquartile range 30–65; 1441 men) analysed, 221 (9.3%) scans contained C-spine fractures with 133 ISTs. AI detected 158/221 scans with fractures (sensitivity 71.5%, 95% CI 65.5–77.4%) and 2118/2147 scans without fractures (specificity 98.6%, 95% CI 98.2–99.1%). In comparison, attending radiologists detected 195/221 scans with fractures (sensitivity 88.2%, 95% CI 84.0–92.5%, p < 0.001) and 2130/2147 scans without fractures (specificity 99.2%, 95% CI 98.8–99.6%, p = 0.07). Of the fractures undetected by AI, 30/63 were ISTs, versus 4/26 for radiologists. AI detected 22/26 fractures undetected by the radiologists, including 3/4 undetected ISTs.

Conclusion: Compared to attending radiologists, the artificial intelligence algorithm has a lower sensitivity and a higher miss rate of fractures in need of stabilising therapy; however, it detected most fractures undetected by the radiologists, including fractures in need of stabilising therapy.
Clinical relevance statement: The artificial intelligence algorithm missed more cervical spine fractures on CT than attending radiologists, but detected 84.6% of fractures undetected by radiologists, including fractures in need of stabilising therapy.

Key Points:

  • The impact of artificial intelligence for cervical spine fracture detection on CT on fracture management is unknown.
  • The algorithm detected fewer fractures than attending radiologists, but detected most fractures undetected by the radiologists, including almost all in need of stabilising therapy.
  • The artificial intelligence algorithm shows potential as a concurrent reader.
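
The reported accuracy figures can be checked from the counts in the abstract. The sketch below is illustrative only: the abstract does not state which confidence interval method or which McNemar variant was used, so the normal-approximation (Wald) interval and the chi-square form without continuity correction are assumptions, and the function names are hypothetical. With these assumptions the intervals appear to agree with the reported values to one decimal place. The discordant pairs are derived from the abstract for fracture-positive scans only (AI missed 63, radiologists missed 26, and AI detected 22 of those 26, so 4 scans were missed by both). The specificity comparison cannot be reconstructed this way because the overlap in false positives is not reported.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def proportion_with_wald_ci(successes, total, z=Z_95):
    """Return (proportion, lower, upper) using the Wald interval (assumed method)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def mcnemar_chi2(b, c):
    """McNemar chi-square without continuity correction (assumed variant).

    b and c are the two discordant-pair counts. The p-value is the
    survival function of a chi-square distribution with 1 degree of freedom.
    """
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts taken from the abstract
FRACTURE_SCANS = 221
NON_FRACTURE_SCANS = 2147

ai_sens = proportion_with_wald_ci(158, FRACTURE_SCANS)
ai_spec = proportion_with_wald_ci(2118, NON_FRACTURE_SCANS)
rad_sens = proportion_with_wald_ci(195, FRACTURE_SCANS)
rad_spec = proportion_with_wald_ci(2130, NON_FRACTURE_SCANS)

# Discordant pairs among fracture-positive scans, derived from the abstract
missed_by_ai = FRACTURE_SCANS - 158        # 63
missed_by_rad = FRACTURE_SCANS - 195       # 26
missed_by_rad_only = 22                    # detected by AI, missed by radiologists
missed_by_both = missed_by_rad - missed_by_rad_only
missed_by_ai_only = missed_by_ai - missed_by_both

stat, p_value = mcnemar_chi2(missed_by_ai_only, missed_by_rad_only)

for label, (p, lo, hi) in [
    ("AI sensitivity", ai_sens),
    ("AI specificity", ai_spec),
    ("Radiologist sensitivity", rad_sens),
    ("Radiologist specificity", rad_spec),
]:
    print(f"{label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
print(f"McNemar (sensitivity): chi2 = {stat:.2f}, p = {p_value:.2g}")
```

The same arithmetic gives the 84.6% figure in the clinical relevance statement, which is 22/26.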

Original language: English
Pages (from-to): 5041-5048
Number of pages: 8
Journal: European Radiology
Volume: 34
Issue number: 8
Early online date: 2024
Publication status: Published - Aug 2024

Keywords

  • Artificial intelligence
  • Cervical vertebrae
  • Diagnosis
  • Spinal injuries
  • Spiral computed tomography
