Investigating the potential of deep learning for patient-specific quality assurance of salivary gland contours using EORTC-1219-DAHANCA-29 clinical trial data

Hanne Nijhuis; Ward van Rooij; Vincent Gregoire; Jens Overgaard; Berend J. Slotman; Wilko F. Verbakel; Max Dahele

doi:https://doi.org/10.1080/0284186X.2020.1863463

Research output: Contribution to journal › Article › Academic › peer-review

7 Citations (Scopus)

Abstract

Introduction: Manual quality assurance (QA) of radiotherapy contours for clinical trials is time- and labor-intensive and subject to inter-observer variability. We therefore investigated whether deep learning (DL) can provide an automated solution to salivary gland contour QA.

Material and methods: DL models were trained to generate contours for the parotid (PG) and submandibular glands (SMG). The Sørensen–Dice coefficient (SDC) and Hausdorff distance (HD) were used to assess agreement between DL and clinical contours, and thresholds were defined to highlight cases as potentially sub-optimal. Three types of deliberate error (expansion, contraction and displacement) were gradually applied to a test set to confirm that SDC and HD were suitable QA metrics. DL-based QA was performed on 62 patients from the EORTC-1219-DAHANCA-29 trial. All highlighted contours were visually inspected.

Results: Increasing the magnitude of all three error types progressively lowered the average SDC and raised the average HD. 19/124 clinical PG contours were highlighted as potentially sub-optimal, of which 5 (26%) were actually deemed clinically sub-optimal; 2/19 non-highlighted contours were false negatives (11%). 15/69 clinical SMG contours were highlighted, of which 7 (47%) were deemed clinically sub-optimal; 2/15 non-highlighted contours were false negatives (13%). For most incorrectly highlighted contours, causes for low agreement could be identified.

Conclusion: Automated DL-based contour QA is feasible, but some visual inspection remains essential. The substantial number of false positives was caused by sub-optimal performance of the DL model. Improvements to the model will increase the extent of automation and reliability, facilitating the adoption of DL-based contour QA in clinical trials and routine practice.
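The agreement check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the thresholds, the HD variant or the software used. The toy 2D voxel sets, the function names (`dice`, `hausdorff`, `is_highlighted`, `square`, `displace`) and the threshold values `sdc_min` and `hd_max` are all assumptions chosen for illustration only.

```python
from math import dist


def dice(a, b):
    """Sørensen-Dice coefficient of two voxel sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty voxel sets.

    Brute force, in voxel units; fine for toy masks, too slow for real volumes.
    """
    def directed(p, q):
        # Largest distance from any point in p to its nearest point in q.
        return max(min(dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def is_highlighted(clinical, dl, sdc_min=0.7, hd_max=2.0):
    """Flag a contour for visual review. Both thresholds are hypothetical."""
    return dice(clinical, dl) < sdc_min or hausdorff(clinical, dl) > hd_max


def square(x0, y0, size):
    """Toy 2D mask: a size-by-size block of voxels."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


def displace(mask, dx, dy):
    """Simulated displacement error: shift every voxel by (dx, dy)."""
    return {(x + dx, y + dy) for x, y in mask}


# Gradually displace a stand-in "clinical" contour away from the DL contour.
dl_contour = square(0, 0, 4)
for shift in range(4):
    clinical = displace(dl_contour, shift, 0)
    print(shift, dice(clinical, dl_contour), hausdorff(clinical, dl_contour),
          is_highlighted(clinical, dl_contour))
```

For this toy mask, shifts of 0 to 3 voxels give SDC values of 1.0, 0.75, 0.5 and 0.25 and HD values of 0.0, 1.0, 2.0 and 3.0. With the assumed thresholds, the contour is highlighted from a shift of 2 voxels onward. This mirrors the behaviour reported in the results: larger deliberate errors lower SDC and raise HD.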

Original language	English
Pages (from-to)	575-581
Number of pages	7
Journal	Acta Oncologica
Volume	60
Issue number	5
DOIs	https://doi.org/10.1080/0284186X.2020.1863463
Publication status	Published - 2021

Keywords

Clinical trial
Deep learning
Quality assurance
Radiotherapy
Salivary glands
Segmentation

Cite this

@article{de03a8744d9f44cb880db2ea4e01a8c6,

title = "Investigating the potential of deep learning for patient-specific quality assurance of salivary gland contours using EORTC-1219-DAHANCA-29 clinical trial data",

abstract = "Introduction: Manual quality assurance (QA) of radiotherapy contours for clinical trials is time and labor intensive and subject to inter-observer variability. Therefore, we investigated whether deep-learning (DL) can provide an automated solution to salivary gland contour QA. Material and methods: DL-models were trained to generate contours for parotid (PG) and submandibular glands (SMG). S{\o}rensen–Dice coefficient (SDC) and Hausdorff distance (HD) were used to assess agreement between DL and clinical contours and thresholds were defined to highlight cases as potentially sub-optimal. 3 types of deliberate errors (expansion, contraction and displacement) were gradually applied to a test set, to confirm that SDC and HD were suitable QA metrics. DL-based QA was performed on 62 patients from the EORTC-1219-DAHANCA-29 trial. All highlighted contours were visually inspected. Results: Increasing the magnitude of all 3 types of errors resulted in progressively severe deterioration/increase in average SDC/HD. 19/124 clinical PG contours were highlighted as potentially sub-optimal, of which 5 (26%) were actually deemed clinically sub-optimal. 2/19 non-highlighted contours were false negatives (11%). 15/69 clinical SMG contours were highlighted, with 7 (47%) deemed clinically sub-optimal and 2/15 non-highlighted contours were false negatives (13%). For most incorrectly highlighted contours causes for low agreement could be identified. Conclusion: Automated DL-based contour QA is feasible but some visual inspection remains essential. The substantial number of false positives were caused by sub-optimal performance of the DL-model. Improvements to the model will increase the extent of automation and reliability, facilitating the adoption of DL-based contour QA in clinical trials and routine practice.",

keywords = "Clinical trial, Deep learning, Quality assurance, Radiotherapy, Salivary glands, Segmentation",

author = "Hanne Nijhuis and {van Rooij}, Ward and Vincent Gregoire and Jens Overgaard and Slotman, {Berend J.} and Verbakel, {Wilko F.} and Max Dahele",

note = "Funding Information: This work was supported by Varian Medical Systems, Palo Alto, CA, USA. We thank Varian Medical Systems for providing a research grant for this work. Publisher Copyright: {\textcopyright} 2021 The Author(s). Published by Informa UK Limited, trading as Taylor \& Francis Group. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.",

year = "2021",

doi = "10.1080/0284186X.2020.1863463",

language = "English",

volume = "60",

pages = "575--581",

journal = "Acta Oncologica",

issn = "0284-186X",

publisher = "Informa Healthcare",

number = "5",

}

TY - JOUR

T1 - Investigating the potential of deep learning for patient-specific quality assurance of salivary gland contours using EORTC-1219-DAHANCA-29 clinical trial data

AU - Nijhuis, Hanne

AU - van Rooij, Ward

AU - Gregoire, Vincent

AU - Overgaard, Jens

AU - Slotman, Berend J.

AU - Verbakel, Wilko F.

AU - Dahele, Max

N1 - Funding Information: This work was supported by Varian Medical Systems, Palo Alto, CA, USA. We thank Varian Medical Systems for providing a research grant for this work. Publisher Copyright: © 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.

PY - 2021

Y1 - 2021

N2 - Introduction: Manual quality assurance (QA) of radiotherapy contours for clinical trials is time and labor intensive and subject to inter-observer variability. Therefore, we investigated whether deep-learning (DL) can provide an automated solution to salivary gland contour QA. Material and methods: DL-models were trained to generate contours for parotid (PG) and submandibular glands (SMG). Sørensen–Dice coefficient (SDC) and Hausdorff distance (HD) were used to assess agreement between DL and clinical contours and thresholds were defined to highlight cases as potentially sub-optimal. 3 types of deliberate errors (expansion, contraction and displacement) were gradually applied to a test set, to confirm that SDC and HD were suitable QA metrics. DL-based QA was performed on 62 patients from the EORTC-1219-DAHANCA-29 trial. All highlighted contours were visually inspected. Results: Increasing the magnitude of all 3 types of errors resulted in progressively severe deterioration/increase in average SDC/HD. 19/124 clinical PG contours were highlighted as potentially sub-optimal, of which 5 (26%) were actually deemed clinically sub-optimal. 2/19 non-highlighted contours were false negatives (11%). 15/69 clinical SMG contours were highlighted, with 7 (47%) deemed clinically sub-optimal and 2/15 non-highlighted contours were false negatives (13%). For most incorrectly highlighted contours causes for low agreement could be identified. Conclusion: Automated DL-based contour QA is feasible but some visual inspection remains essential. The substantial number of false positives were caused by sub-optimal performance of the DL-model. Improvements to the model will increase the extent of automation and reliability, facilitating the adoption of DL-based contour QA in clinical trials and routine practice.

AB - Introduction: Manual quality assurance (QA) of radiotherapy contours for clinical trials is time and labor intensive and subject to inter-observer variability. Therefore, we investigated whether deep-learning (DL) can provide an automated solution to salivary gland contour QA. Material and methods: DL-models were trained to generate contours for parotid (PG) and submandibular glands (SMG). Sørensen–Dice coefficient (SDC) and Hausdorff distance (HD) were used to assess agreement between DL and clinical contours and thresholds were defined to highlight cases as potentially sub-optimal. 3 types of deliberate errors (expansion, contraction and displacement) were gradually applied to a test set, to confirm that SDC and HD were suitable QA metrics. DL-based QA was performed on 62 patients from the EORTC-1219-DAHANCA-29 trial. All highlighted contours were visually inspected. Results: Increasing the magnitude of all 3 types of errors resulted in progressively severe deterioration/increase in average SDC/HD. 19/124 clinical PG contours were highlighted as potentially sub-optimal, of which 5 (26%) were actually deemed clinically sub-optimal. 2/19 non-highlighted contours were false negatives (11%). 15/69 clinical SMG contours were highlighted, with 7 (47%) deemed clinically sub-optimal and 2/15 non-highlighted contours were false negatives (13%). For most incorrectly highlighted contours causes for low agreement could be identified. Conclusion: Automated DL-based contour QA is feasible but some visual inspection remains essential. The substantial number of false positives were caused by sub-optimal performance of the DL-model. Improvements to the model will increase the extent of automation and reliability, facilitating the adoption of DL-based contour QA in clinical trials and routine practice.

KW - Clinical trial

KW - Deep learning

KW - Quality assurance

KW - Radiotherapy

KW - Salivary glands

KW - Segmentation

UR - http://www.scopus.com/inward/record.url?scp=85099364901&partnerID=8YFLogxK

U2 - https://doi.org/10.1080/0284186X.2020.1863463

DO - https://doi.org/10.1080/0284186X.2020.1863463

M3 - Article

C2 - 33427555

SN - 0284-186X

VL - 60

SP - 575

EP - 581

JO - Acta Oncologica

JF - Acta Oncologica

IS - 5

ER -
