Diagnostic accuracy of an artificial intelligence algorithm versus radiologists for fracture detection on cervical spine CT

Gaby J. van den Wittenboer; Brigitta Y. M. van der Kolk; Ingrid M. Nijholt; Eline Langius-Wiffen; Rogier A. van Dijk; Boudewijn A. A. M. van Hasselt; Martin Podlogar; Wimar A. van den Brink; Gert Joan Bouma; Niels W. L. Schep; Mario Maas; Martijn F. Boomsma

doi:https://doi.org/10.1007/s00330-023-10559-6

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Objectives: To compare the diagnostic accuracy of a deep learning artificial intelligence (AI) algorithm for cervical spine (C-spine) fracture detection on CT with that of attending radiologists, and to assess which undetected fractures were injuries in need of stabilising therapy (IST).

Methods: This single-centre, retrospective diagnostic accuracy study included consecutive patients (age ≥18 years; 2007–2014) screened for C-spine fractures with CT. To validate ground truth, one radiologist and three neurosurgeons independently examined scans positive for fracture. Negative scans were followed up until 2022 through patient files, and two radiologists reviewed negative scans that were flagged positive by AI. The neurosurgeons determined which fractures were ISTs. The diagnostic accuracy of AI and of attending radiologists (index tests) was compared using McNemar's test.

Results: Of the 2368 scans analysed (median age 48 years, interquartile range 30–65; 1441 men), 221 (9.3%) contained C-spine fractures, of which 133 were ISTs. AI detected 158/221 scans with fractures (sensitivity 71.5%, 95% CI 65.5–77.4%) and 2118/2147 scans without fractures (specificity 98.6%, 95% CI 98.2–99.1%). In comparison, attending radiologists detected 195/221 scans with fractures (sensitivity 88.2%, 95% CI 84.0–92.5%, p < 0.001) and 2130/2147 scans without fractures (specificity 99.2%, 95% CI 98.8–99.6%, p = 0.07). Of the fractures undetected by AI, 30/63 were ISTs, versus 4/26 for radiologists. AI detected 22/26 fractures undetected by the radiologists, including 3/4 undetected ISTs.

Conclusion: Compared to attending radiologists, the artificial intelligence algorithm has a lower sensitivity and a higher miss rate of fractures in need of stabilising therapy; however, it detected most fractures undetected by the radiologists, including fractures in need of stabilising therapy.

Clinical relevance statement: The artificial intelligence algorithm missed more cervical spine fractures on CT than attending radiologists, but detected 84.6% of fractures undetected by radiologists, including fractures in need of stabilising therapy.

Key Points: The impact on fracture management of artificial intelligence for cervical spine fracture detection on CT is unknown. The algorithm detected fewer fractures than attending radiologists, but detected most fractures undetected by the radiologists, including almost all in need of stabilising therapy. The artificial intelligence algorithm shows potential as a concurrent reader.
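The arithmetic behind the reported figures can be checked from the counts in the abstract alone. The Python sketch below is illustrative and is not the authors' analysis code. It recomputes sensitivity and specificity with a normal-approximation (Wald) 95% interval, which happens to reproduce the reported bounds, although the abstract does not state which interval method was used. It also derives the discordant pairs for the sensitivity comparison, under the assumption that the "fractures undetected" counts (63 and 26) are per-scan counts, and runs an exact McNemar test on them. The function names are hypothetical.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the Wald (normal-approximation) interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts taken directly from the abstract.
ai_sens = proportion_with_wald_ci(158, 221)     # AI, scans with fractures
rad_sens = proportion_with_wald_ci(195, 221)    # radiologists, scans with fractures
ai_spec = proportion_with_wald_ci(2118, 2147)   # AI, scans without fractures
rad_spec = proportion_with_wald_ci(2130, 2147)  # radiologists, scans without fractures

# Discordant pairs among the 221 fracture scans.
# Assumption: the 63 AI misses and 26 radiologist misses are per-scan counts.
missed_by_both = 26 - 22             # radiologist misses that AI also missed
ai_only_missed = 63 - missed_by_both  # found by radiologists, missed by AI
rad_only_missed = 22                 # found by AI, missed by radiologists
p_sensitivity = mcnemar_exact_p(ai_only_missed, rad_only_missed)

for label, (p, lo, hi) in [
    ("AI sensitivity", ai_sens),
    ("Radiologist sensitivity", rad_sens),
    ("AI specificity", ai_spec),
    ("Radiologist specificity", rad_spec),
]:
    print(f"{label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
print(f"Discordant pairs: {ai_only_missed} vs {rad_only_missed}")
print(f"Exact McNemar p-value below 0.001: {p_sensitivity < 0.001}")
```

The same reasoning cannot be applied to the specificity comparison (p = 0.07), because the abstract does not report how many false positives the two readers shared.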

Original language	English
Journal	European Radiology
Early online date	2024
DOIs	https://doi.org/10.1007/s00330-023-10559-6
Publication status	E-pub ahead of print - 2024

Keywords

Artificial intelligence
Cervical vertebrae
Diagnosis
Spinal injuries
Spiral computed tomography

Access to Document

https://doi.org/10.1007/s00330-023-10559-6

Cite this

van den Wittenboer, G. J., van der Kolk, B. Y. M., Nijholt, I. M., Langius-Wiffen, E., van Dijk, R. A., van Hasselt, B. A. A. M., Podlogar, M., van den Brink, W. A., Bouma, G. J., Schep, N. W. L., Maas, M., & Boomsma, M. F. (2024). Diagnostic accuracy of an artificial intelligence algorithm versus radiologists for fracture detection on cervical spine CT. European Radiology. Advance online publication. https://doi.org/10.1007/s00330-023-10559-6

@article{7bc5a5c3b9a849989861b2c110332350,

title = "Diagnostic accuracy of an artificial intelligence algorithm versus radiologists for fracture detection on cervical spine CT",

keywords = "Artificial intelligence, Cervical vertebrae, Diagnosis, Spinal injuries, Spiral computed tomography",

author = "{van den Wittenboer}, {Gaby J.} and {van der Kolk}, {Brigitta Y. M.} and Nijholt, {Ingrid M.} and Eline Langius-Wiffen and {van Dijk}, {Rogier A.} and {van Hasselt}, {Boudewijn A. A. M.} and Martin Podlogar and {van den Brink}, {Wimar A.} and Bouma, {Gert Joan} and Schep, {Niels W. L.} and Mario Maas and Boomsma, {Martijn F.}",

note = "Publisher Copyright: {\textcopyright} 2024, The Author(s), under exclusive licence to European Society of Radiology.",

year = "2024",

doi = "10.1007/s00330-023-10559-6",

language = "English",

journal = "European Radiology",

issn = "0938-7994",

publisher = "Springer Verlag",

}

van den Wittenboer, GJ, van der Kolk, BYM, Nijholt, IM, Langius-Wiffen, E, van Dijk, RA, van Hasselt, BAAM, Podlogar, M, van den Brink, WA, Bouma, GJ, Schep, NWL, Maas, M & Boomsma, MF 2024, 'Diagnostic accuracy of an artificial intelligence algorithm versus radiologists for fracture detection on cervical spine CT', European Radiology. https://doi.org/10.1007/s00330-023-10559-6

TY - JOUR

T1 - Diagnostic accuracy of an artificial intelligence algorithm versus radiologists for fracture detection on cervical spine CT

AU - van den Wittenboer, Gaby J.

AU - van der Kolk, Brigitta Y. M.

AU - Nijholt, Ingrid M.

AU - Langius-Wiffen, Eline

AU - van Dijk, Rogier A.

AU - van Hasselt, Boudewijn A. A. M.

AU - Podlogar, Martin

AU - van den Brink, Wimar A.

AU - Bouma, Gert Joan

AU - Schep, Niels W. L.

AU - Maas, Mario

AU - Boomsma, Martijn F.

PY - 2024

Y1 - 2024

KW - Artificial intelligence

KW - Cervical vertebrae

KW - Diagnosis

KW - Spinal injuries

KW - Spiral computed tomography

UR - http://www.scopus.com/inward/record.url?scp=85182221062&partnerID=8YFLogxK

U2 - https://doi.org/10.1007/s00330-023-10559-6

DO - https://doi.org/10.1007/s00330-023-10559-6

M3 - Article

C2 - 38206401

SN - 0938-7994

JO - European Radiology

JF - European Radiology

ER -
