TY - JOUR
T1 - Diagnostic accuracy of an artificial intelligence algorithm versus radiologists for fracture detection on cervical spine CT
AU - van den Wittenboer, Gaby J.
AU - van der Kolk, Brigitta Y. M.
AU - Nijholt, Ingrid M.
AU - Langius-Wiffen, Eline
AU - van Dijk, Rogier A.
AU - van Hasselt, Boudewijn A. A. M.
AU - Podlogar, Martin
AU - van den Brink, Wimar A.
AU - Bouma, Gert Joan
AU - Schep, Niels W. L.
AU - Maas, Mario
AU - Boomsma, Martijn F.
N1 - Publisher Copyright: © The Author(s), under exclusive licence to European Society of Radiology 2024.
PY - 2024/8
Y1 - 2024/8
N2 - Objectives: To compare the diagnostic accuracy of a deep learning artificial intelligence (AI) algorithm for cervical spine (C-spine) fracture detection on CT with that of attending radiologists, and to assess which undetected fractures were injuries in need of stabilising therapy (IST). Methods: This single-centre, retrospective diagnostic accuracy study included consecutive patients (age ≥18 years; 2007–2014) screened for C-spine fractures with CT. To validate ground truth, one radiologist and three neurosurgeons independently examined scans positive for fracture. Negative scans were followed up until 2022 through patient files, and two radiologists reviewed negative scans that were flagged positive by AI. The neurosurgeons determined which fractures were ISTs. Diagnostic accuracy of AI and attending radiologists (index tests) was compared using McNemar's test. Results: Of the 2368 scans (median age, 48 years; interquartile range, 30–65; 1441 men) analysed, 221 (9.3%) scans contained C-spine fractures with 133 ISTs. AI detected 158/221 scans with fractures (sensitivity 71.5%, 95% CI 65.5–77.4%) and 2118/2147 scans without fractures (specificity 98.6%, 95% CI 98.2–99.1%). In comparison, attending radiologists detected 195/221 scans with fractures (sensitivity 88.2%, 95% CI 84.0–92.5%, p < 0.001) and 2130/2147 scans without fractures (specificity 99.2%, 95% CI 98.8–99.6%, p = 0.07). Of the fractures undetected by AI, 30/63 were ISTs versus 4/26 for radiologists. AI detected 22/26 fractures undetected by the radiologists, including 3/4 undetected ISTs. Conclusion: Compared to attending radiologists, the artificial intelligence algorithm has a lower sensitivity and a higher miss rate of fractures in need of stabilising therapy; however, it detected most fractures undetected by the radiologists, including fractures in need of stabilising therapy.
Clinical relevance statement: The artificial intelligence algorithm missed more cervical spine fractures on CT than attending radiologists, but detected 84.6% of fractures undetected by radiologists, including fractures in need of stabilising therapy. Key Points: The impact on fracture management of artificial intelligence for cervical spine fracture detection on CT is unknown. The algorithm detected fewer fractures than attending radiologists, but detected most fractures undetected by the radiologists, including almost all of those in need of stabilising therapy. The artificial intelligence algorithm shows potential as a concurrent reader.
AB - Objectives: To compare the diagnostic accuracy of a deep learning artificial intelligence (AI) algorithm for cervical spine (C-spine) fracture detection on CT with that of attending radiologists, and to assess which undetected fractures were injuries in need of stabilising therapy (IST). Methods: This single-centre, retrospective diagnostic accuracy study included consecutive patients (age ≥18 years; 2007–2014) screened for C-spine fractures with CT. To validate ground truth, one radiologist and three neurosurgeons independently examined scans positive for fracture. Negative scans were followed up until 2022 through patient files, and two radiologists reviewed negative scans that were flagged positive by AI. The neurosurgeons determined which fractures were ISTs. Diagnostic accuracy of AI and attending radiologists (index tests) was compared using McNemar's test. Results: Of the 2368 scans (median age, 48 years; interquartile range, 30–65; 1441 men) analysed, 221 (9.3%) scans contained C-spine fractures with 133 ISTs. AI detected 158/221 scans with fractures (sensitivity 71.5%, 95% CI 65.5–77.4%) and 2118/2147 scans without fractures (specificity 98.6%, 95% CI 98.2–99.1%). In comparison, attending radiologists detected 195/221 scans with fractures (sensitivity 88.2%, 95% CI 84.0–92.5%, p < 0.001) and 2130/2147 scans without fractures (specificity 99.2%, 95% CI 98.8–99.6%, p = 0.07). Of the fractures undetected by AI, 30/63 were ISTs versus 4/26 for radiologists. AI detected 22/26 fractures undetected by the radiologists, including 3/4 undetected ISTs. Conclusion: Compared to attending radiologists, the artificial intelligence algorithm has a lower sensitivity and a higher miss rate of fractures in need of stabilising therapy; however, it detected most fractures undetected by the radiologists, including fractures in need of stabilising therapy.
Clinical relevance statement: The artificial intelligence algorithm missed more cervical spine fractures on CT than attending radiologists, but detected 84.6% of fractures undetected by radiologists, including fractures in need of stabilising therapy. Key Points: The impact on fracture management of artificial intelligence for cervical spine fracture detection on CT is unknown. The algorithm detected fewer fractures than attending radiologists, but detected most fractures undetected by the radiologists, including almost all of those in need of stabilising therapy. The artificial intelligence algorithm shows potential as a concurrent reader.
KW - Artificial intelligence
KW - Cervical vertebrae
KW - Diagnosis
KW - Spinal injuries
KW - Spiral computed tomography
UR - http://www.scopus.com/inward/record.url?scp=85182221062&partnerID=8YFLogxK
U2 - https://doi.org/10.1007/s00330-023-10559-6
DO - https://doi.org/10.1007/s00330-023-10559-6
M3 - Article
C2 - 38206401
SN - 0938-7994
VL - 34
SP - 5041
EP - 5048
JO - European Radiology
JF - European Radiology
IS - 8
ER -