What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review

David W.G. Langerhuizen; Stein J. Janssen; Wouter H. Mallee; Michel P.J. Van Den Bekerom; David Ring; Gino M.M.J. Kerkhoffs; Ruurd L. Jaarsma; Job N. Doornberg

doi:https://doi.org/10.1097/CORR.0000000000000848

Research output: Contribution to journal › Review article › Academic › peer-review

92 Citations (Scopus)

Abstract

Background: Artificial-intelligence algorithms derive rules and patterns from large amounts of data to calculate the probabilities of various outcomes using new sets of similar data. In medicine, artificial intelligence (AI) has been applied primarily to image-recognition diagnostic tasks and to evaluating the probabilities of particular outcomes after treatment. However, the performance and limitations of AI in the automated detection and classification of fractures have not been examined comprehensively.

Question/purposes: In this systematic review, we asked (1) What is the proportion of correctly detected or classified fractures and the area under the receiver operating characteristic curve (AUC) of AI fracture detection and classification models? (2) What is the performance of AI in this setting compared with the performance of human examiners?

Methods: The PubMed, Embase, and Cochrane databases were systematically searched from the start of each respective database until September 6, 2018, using terms related to "fracture", "artificial intelligence", and "detection, prediction, or evaluation." Of 1221 identified studies, we retained 10 studies: eight studies involved fracture detection (ankle, hand, hip, spine, wrist, and ulna), one addressed fracture classification (diaphyseal femur), and one addressed both fracture detection and classification (proximal humerus). We registered the review before data collection (PROSPERO: CRD42018110167) and used the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA). We reported the range of the accuracy and AUC for the performance of the predicted fracture detection and/or classification task. An AUC of 1.0 would indicate perfect prediction, whereas 0.5 would indicate a prediction that is no better than a coin flip. We conducted quality assessment using a seven-item checklist based on a modified methodologic index for nonrandomized studies instrument (MINORS).

Results: For fracture detection, the AUC in five studies reflected near-perfect prediction (range, 0.95-1.0), and the accuracy in seven studies ranged from 83% to 98%. For fracture classification, the AUC was 0.94 in one study, and the accuracy in two studies ranged from 77% to 90%. In two studies, AI outperformed human examiners for detecting and classifying hip and proximal humerus fractures, and one study showed equivalent performance for detecting wrist, hand, and ankle fractures.

Conclusions: Preliminary experience with fracture detection and classification using AI shows promising performance. AI may enhance processing and communicating probabilistic tasks in medicine, including orthopaedic surgery. At present, inadequate reference standard assignments to train and test AI are the biggest hurdle before integration into clinical workflow. The next step will be to apply AI to more challenging diagnostic and therapeutic scenarios in which certitude is absent. Future studies should also seek to address legal regulation and better determine the feasibility of implementation in clinical practice.

Level of Evidence: Level II, diagnostic study.
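
The two metrics named in the Methods, accuracy and AUC, can be illustrated with a minimal Python sketch. The labels, scores, 0.5 decision threshold, and function names below are hypothetical and chosen only for illustration; they are not taken from the review or from any of the included studies. AUC is computed here through its pairwise interpretation: the probability that a randomly chosen fracture case receives a higher score than a randomly chosen non-fracture case, with ties counted as one half.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Proportion of cases whose predicted class matches the reference label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (fracture, non-fracture) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical reference labels (1 = fracture) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(labels, predictions):.2f}")  # accuracy = 0.67
print(f"AUC      = {auc(labels, scores):.2f}")            # AUC      = 0.89
```

With identical scores for every case the function returns 0.5 (the coin-flip level), and with perfectly separated scores it returns 1.0, matching the interpretation given in the Methods.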

Original language	English
Pages (from-to)	2482-2491
Number of pages	10
Journal	Clinical Orthopaedics and Related Research
Volume	477
Issue number	11
DOIs	https://doi.org/10.1097/CORR.0000000000000848
Publication status	Published - 1 Nov 2019

Cite this

Langerhuizen, D. W. G., Janssen, S. J., Mallee, W. H., Van Den Bekerom, M. P. J., Ring, D., Kerkhoffs, G. M. M. J., Jaarsma, R. L., & Doornberg, J. N. (2019). What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review. Clinical Orthopaedics and Related Research, 477(11), 2482-2491. https://doi.org/10.1097/CORR.0000000000000848

@article{d47b9a74a59c4fd6a0395692e28b2b2b,

title = "What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review",

abstract = "Background: Artificial-intelligence algorithms derive rules and patterns from large amounts of data to calculate the probabilities of various outcomes using new sets of similar data. In medicine, artificial intelligence (AI) has been applied primarily to image-recognition diagnostic tasks and to evaluating the probabilities of particular outcomes after treatment. However, the performance and limitations of AI in the automated detection and classification of fractures have not been examined comprehensively. Question/purposes: In this systematic review, we asked (1) What is the proportion of correctly detected or classified fractures and the area under the receiver operating characteristic curve (AUC) of AI fracture detection and classification models? (2) What is the performance of AI in this setting compared with the performance of human examiners? Methods: The PubMed, Embase, and Cochrane databases were systematically searched from the start of each respective database until September 6, 2018, using terms related to {"}fracture{"}, {"}artificial intelligence{"}, and {"}detection, prediction, or evaluation.{"} Of 1221 identified studies, we retained 10 studies: eight studies involved fracture detection (ankle, hand, hip, spine, wrist, and ulna), one addressed fracture classification (diaphyseal femur), and one addressed both fracture detection and classification (proximal humerus). We registered the review before data collection (PROSPERO: CRD42018110167) and used the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA). We reported the range of the accuracy and AUC for the performance of the predicted fracture detection and/or classification task. An AUC of 1.0 would indicate perfect prediction, whereas 0.5 would indicate a prediction that is no better than a coin flip. We conducted quality assessment using a seven-item checklist based on a modified methodologic index for nonrandomized studies instrument (MINORS). Results: For fracture detection, the AUC in five studies reflected near-perfect prediction (range, 0.95-1.0), and the accuracy in seven studies ranged from 83% to 98%. For fracture classification, the AUC was 0.94 in one study, and the accuracy in two studies ranged from 77% to 90%. In two studies, AI outperformed human examiners for detecting and classifying hip and proximal humerus fractures, and one study showed equivalent performance for detecting wrist, hand, and ankle fractures. Conclusions: Preliminary experience with fracture detection and classification using AI shows promising performance. AI may enhance processing and communicating probabilistic tasks in medicine, including orthopaedic surgery. At present, inadequate reference standard assignments to train and test AI are the biggest hurdle before integration into clinical workflow. The next step will be to apply AI to more challenging diagnostic and therapeutic scenarios in which certitude is absent. Future studies should also seek to address legal regulation and better determine the feasibility of implementation in clinical practice. Level of Evidence: Level II, diagnostic study.",

author = "Langerhuizen, {David W.G.} and Janssen, {Stein J.} and Mallee, {Wouter H.} and {Van Den Bekerom}, {Michel P.J.} and Ring, David and Kerkhoffs, {Gino M.M.J.} and Jaarsma, {Ruurd L.} and Doornberg, {Job N.}",

year = "2019",

month = nov,

day = "1",

doi = "10.1097/CORR.0000000000000848",

language = "English",

volume = "477",

pages = "2482--2491",

journal = "Clinical Orthopaedics and Related Research",

issn = "0009-921X",

publisher = "Springer New York",

number = "11",

}

Langerhuizen, DWG, Janssen, SJ, Mallee, WH, Van Den Bekerom, MPJ, Ring, D, Kerkhoffs, GMMJ, Jaarsma, RL & Doornberg, JN 2019, 'What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review', Clinical Orthopaedics and Related Research, vol. 477, no. 11, pp. 2482-2491. https://doi.org/10.1097/CORR.0000000000000848

What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review. / Langerhuizen, David W.G.; Janssen, Stein J.; Mallee, Wouter H. et al.
In: Clinical Orthopaedics and Related Research, Vol. 477, No. 11, 01.11.2019, p. 2482-2491.

TY - JOUR

T1 - What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review

AU - Langerhuizen, David W.G.

AU - Janssen, Stein J.

AU - Mallee, Wouter H.

AU - Van Den Bekerom, Michel P.J.

AU - Ring, David

AU - Kerkhoffs, Gino M.M.J.

AU - Jaarsma, Ruurd L.

AU - Doornberg, Job N.

PY - 2019/11/1

Y1 - 2019/11/1

AB - Background: Artificial-intelligence algorithms derive rules and patterns from large amounts of data to calculate the probabilities of various outcomes using new sets of similar data. In medicine, artificial intelligence (AI) has been applied primarily to image-recognition diagnostic tasks and to evaluating the probabilities of particular outcomes after treatment. However, the performance and limitations of AI in the automated detection and classification of fractures have not been examined comprehensively. Question/purposes: In this systematic review, we asked (1) What is the proportion of correctly detected or classified fractures and the area under the receiver operating characteristic curve (AUC) of AI fracture detection and classification models? (2) What is the performance of AI in this setting compared with the performance of human examiners? Methods: The PubMed, Embase, and Cochrane databases were systematically searched from the start of each respective database until September 6, 2018, using terms related to "fracture", "artificial intelligence", and "detection, prediction, or evaluation." Of 1221 identified studies, we retained 10 studies: eight studies involved fracture detection (ankle, hand, hip, spine, wrist, and ulna), one addressed fracture classification (diaphyseal femur), and one addressed both fracture detection and classification (proximal humerus). We registered the review before data collection (PROSPERO: CRD42018110167) and used the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA). We reported the range of the accuracy and AUC for the performance of the predicted fracture detection and/or classification task. An AUC of 1.0 would indicate perfect prediction, whereas 0.5 would indicate a prediction that is no better than a coin flip. We conducted quality assessment using a seven-item checklist based on a modified methodologic index for nonrandomized studies instrument (MINORS). Results: For fracture detection, the AUC in five studies reflected near-perfect prediction (range, 0.95-1.0), and the accuracy in seven studies ranged from 83% to 98%. For fracture classification, the AUC was 0.94 in one study, and the accuracy in two studies ranged from 77% to 90%. In two studies, AI outperformed human examiners for detecting and classifying hip and proximal humerus fractures, and one study showed equivalent performance for detecting wrist, hand, and ankle fractures. Conclusions: Preliminary experience with fracture detection and classification using AI shows promising performance. AI may enhance processing and communicating probabilistic tasks in medicine, including orthopaedic surgery. At present, inadequate reference standard assignments to train and test AI are the biggest hurdle before integration into clinical workflow. The next step will be to apply AI to more challenging diagnostic and therapeutic scenarios in which certitude is absent. Future studies should also seek to address legal regulation and better determine the feasibility of implementation in clinical practice. Level of Evidence: Level II, diagnostic study.

UR - http://www.scopus.com/inward/record.url?scp=85072351512&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072351512&partnerID=8YFLogxK

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85072351512&origin=inward

UR - https://www.ncbi.nlm.nih.gov/pubmed/31283727

U2 - https://doi.org/10.1097/CORR.0000000000000848

DO - https://doi.org/10.1097/CORR.0000000000000848

M3 - Review article

C2 - 31283727

SN - 0009-921X

VL - 477

SP - 2482

EP - 2491

JO - Clinical Orthopaedics and Related Research

JF - Clinical Orthopaedics and Related Research

IS - 11

ER -

Langerhuizen DWG, Janssen SJ, Mallee WH, Van Den Bekerom MPJ, Ring D, Kerkhoffs GMMJ et al. What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review. Clinical Orthopaedics and Related Research. 2019 Nov 1;477(11):2482-2491. doi: https://doi.org/10.1097/CORR.0000000000000848
