Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge

Hugo J Kuijf; J Matthijs Biesbroek; Jeroen de Bresser; Rutger Heinen; Simon Andermatt; Mariana Bento; Matt Berseth; Mikhail Belyaev; M Jorge Cardoso; Adria Casamitjana; D Louis Collins; Mahsa Dadar; Achilleas Georgiou; Mohsen Ghafoorian; Dakai Jin; April Khademi; Jesse Knight; Hongwei Li; Xavier Llado; Miguel Luna; Qaiser Mahmood; Richard McKinley; Alireza Mehrtash; Sebastien Ourselin; Bo-Yong Park; Hyunjin Park; Sang Hyun Park; Simon Pezold; Elodie Puybareau; Leticia Rittner; Carole H Sudre; Sergi Valverde; Veronica Vilaplana; Roland Wiest; Yongchao Xu; Ziyue Xu; Guodong Zeng; Jianguo Zhang; Guoyan Zheng; Christopher Chen; Wiesje van der Flier; Frederik Barkhof; Max A Viergever; Geert Jan Biessels

doi:10.1109/TMI.2019.2905770

Research output: Contribution to journal › Article › Academic › peer-review

161 Citations (Scopus)

Abstract

Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, the WMH Segmentation Challenge, in which developers could evaluate their methods on a standardized multi-center/-scanner image dataset, giving an objective comparison. Sixty T1 + FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. The segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: 1) Dice similarity coefficient; 2) modified Hausdorff distance (95th percentile); 3) absolute log-transformed volume difference; 4) sensitivity for detecting individual lesions; and 5) F1-score for individual lesions. In addition, the methods were ranked on their inter-scanner robustness. Twenty participants submitted their methods for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.
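
The five ranking metrics named in the abstract can be illustrated for a single pair of binary masks (reference and prediction) stored as NumPy arrays. This is a minimal, non-authoritative sketch, not the challenge's official evaluation code. Several details are assumptions: boundaries are taken as mask voxels with a face neighbour outside the mask, lesions are labelled with full connectivity, a lesion counts as detected when any of its voxels overlaps the other mask, and the volume metric is read as |log(V_pred) - log(V_ref)|. All function names are hypothetical.

```python
import math
from collections import deque

import numpy as np


def dice(ref: np.ndarray, pred: np.ndarray) -> float:
    """Dice similarity coefficient; returns 1.0 when both masks are empty (an assumption)."""
    ref, pred = ref.astype(bool), pred.astype(bool)
    denom = ref.sum() + pred.sum()
    return float(2.0 * np.logical_and(ref, pred).sum() / denom) if denom else 1.0


def abs_log_volume_difference(ref, pred, voxel_volume=1.0) -> float:
    """|log(V_pred) - log(V_ref)|, one assumed reading of the abstract's volume metric."""
    v_ref = float(ref.astype(bool).sum()) * voxel_volume
    v_pred = float(pred.astype(bool).sum()) * voxel_volume
    if v_ref == 0 or v_pred == 0:
        return math.inf  # the log of an empty volume is undefined
    return abs(math.log(v_pred) - math.log(v_ref))


def boundary_points(mask, spacing=None) -> np.ndarray:
    """Physical coordinates of mask voxels with at least one face neighbour outside the mask."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    interior = m.copy()
    for axis in range(m.ndim):
        # np.roll wraps around, but only the all-False padding is wrapped.
        interior &= np.roll(m, 1, axis) & np.roll(m, -1, axis)
    edge = (m & ~interior)[tuple(slice(1, -1) for _ in range(m.ndim))]
    scale = np.ones(mask.ndim) if spacing is None else np.asarray(spacing, dtype=float)
    return np.argwhere(edge) * scale


def hd95(ref, pred, spacing=None) -> float:
    """Symmetric 95th-percentile Hausdorff distance between the two mask boundaries."""
    a, b = boundary_points(ref, spacing), boundary_points(pred, spacing)
    if len(a) == 0 or len(b) == 0:
        return math.nan  # undefined when either mask is empty
    # Brute-force pairwise distances: fine for small masks, O(len(a) * len(b)) memory.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return float(max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95)))


def label_components(mask):
    """Label connected lesions with full connectivity (26-neighbourhood in 3D) via BFS."""
    mask = mask.astype(bool)
    centre = (1,) * mask.ndim
    offsets = [np.array(o) - 1 for o in np.ndindex(*([3] * mask.ndim)) if o != centre]
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in map(tuple, np.argwhere(mask)):
        if labels[start]:
            continue  # already part of an earlier lesion
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for off in offsets:
                q = tuple(int(v) for v in np.add(p, off))
                inside = all(0 <= v < s for v, s in zip(q, mask.shape))
                if inside and mask[q] and not labels[q]:
                    labels[q] = n
                    queue.append(q)
    return labels, n


def lesion_sensitivity_f1(ref, pred):
    """Lesion-wise sensitivity and F1; a lesion counts as found if any voxel overlaps."""
    ref, pred = ref.astype(bool), pred.astype(bool)
    ref_lab, n_ref = label_components(ref)
    pred_lab, n_pred = label_components(pred)
    detected = len(np.unique(ref_lab[ref & pred]))  # reference lesions hit by the prediction
    correct = len(np.unique(pred_lab[ref & pred]))  # predicted lesions that touch the reference
    sensitivity = detected / n_ref if n_ref else math.nan
    precision = correct / n_pred if n_pred else math.nan
    if not (sensitivity > 0 and precision > 0):  # also catches NaN
        return sensitivity, 0.0
    return sensitivity, 2 * precision * sensitivity / (precision + sensitivity)
```

For example, a prediction that exactly reproduces one of two equally sized reference lesions gets a lesion sensitivity of 0.5 and a Dice of 2/3. The pairwise-distance HD95 and the pure-Python labelling favour readability and are only practical for small masks; for full MR volumes a k-d tree and a library connected-component routine would be the usual substitutes.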

Original language	English
Pages (from-to)	2556-2568
Number of pages	13
Journal	IEEE Transactions on Medical Imaging
Volume	38
Issue number	11
DOIs	https://doi.org/10.1109/TMI.2019.2905770
Publication status	Published - 1 Nov 2019

Access to Document

https://doi.org/10.1109/TMI.2019.2905770

Cite this

Kuijf, H. J., Biesbroek, J. M., de Bresser, J., Heinen, R., Andermatt, S., Bento, M., Berseth, M., Belyaev, M., Cardoso, M. J., Casamitjana, A., Collins, D. L., Dadar, M., Georgiou, A., Ghafoorian, M., Jin, D., Khademi, A., Knight, J., Li, H., Llado, X., ... Biessels, G. J. (2019). Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge. IEEE Transactions on Medical Imaging, 38(11), 2556-2568. https://doi.org/10.1109/TMI.2019.2905770

@article{afef72bebca94e9bad59e73cf36d9e3f,

title = "Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge",

abstract = "Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, the WMH Segmentation Challenge, in which developers could evaluate their methods on a standardized multi-center/-scanner image dataset, giving an objective comparison. Sixty T1 + FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. The segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: 1) Dice similarity coefficient; 2) modified Hausdorff distance (95th percentile); 3) absolute log-transformed volume difference; 4) sensitivity for detecting individual lesions; and 5) F1-score for individual lesions. In addition, the methods were ranked on their inter-scanner robustness. Twenty participants submitted their methods for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.",

author = "Kuijf, {Hugo J} and Biesbroek, {J Matthijs} and {de Bresser}, Jeroen and Rutger Heinen and Simon Andermatt and Mariana Bento and Matt Berseth and Mikhail Belyaev and Cardoso, {M Jorge} and Adria Casamitjana and Collins, {D Louis} and Mahsa Dadar and Achilleas Georgiou and Mohsen Ghafoorian and Dakai Jin and April Khademi and Jesse Knight and Hongwei Li and Xavier Llado and Miguel Luna and Qaiser Mahmood and Richard McKinley and Alireza Mehrtash and Sebastien Ourselin and Bo-Yong Park and Hyunjin Park and Park, {Sang Hyun} and Simon Pezold and Elodie Puybareau and Leticia Rittner and Sudre, {Carole H} and Sergi Valverde and Veronica Vilaplana and Roland Wiest and Yongchao Xu and Ziyue Xu and Guodong Zeng and Jianguo Zhang and Guoyan Zheng and Christopher Chen and {van der Flier}, Wiesje and Frederik Barkhof and Viergever, {Max A} and Biessels, {Geert Jan}",

year = "2019",

month = nov,

day = "1",

doi = "10.1109/TMI.2019.2905770",

language = "English",

volume = "38",

pages = "2556--2568",

journal = "IEEE Transactions on Medical Imaging",

issn = "0278-0062",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "11",

}

Kuijf, HJ, Biesbroek, JM, de Bresser, J, Heinen, R, Andermatt, S, Bento, M, Berseth, M, Belyaev, M, Cardoso, MJ, Casamitjana, A, Collins, DL, Dadar, M, Georgiou, A, Ghafoorian, M, Jin, D, Khademi, A, Knight, J, Li, H, Llado, X, Luna, M, Mahmood, Q, McKinley, R, Mehrtash, A, Ourselin, S, Park, B-Y, Park, H, Park, SH, Pezold, S, Puybareau, E, Rittner, L, Sudre, CH, Valverde, S, Vilaplana, V, Wiest, R, Xu, Y, Xu, Z, Zeng, G, Zhang, J, Zheng, G, Chen, C, van der Flier, W, Barkhof, F, Viergever, MA & Biessels, GJ 2019, 'Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge', IEEE Transactions on Medical Imaging, vol. 38, no. 11, pp. 2556-2568. https://doi.org/10.1109/TMI.2019.2905770

TY - JOUR

T1 - Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge

AU - Kuijf, Hugo J

AU - Biesbroek, J Matthijs

AU - de Bresser, Jeroen

AU - Heinen, Rutger

AU - Andermatt, Simon

AU - Bento, Mariana

AU - Berseth, Matt

AU - Belyaev, Mikhail

AU - Cardoso, M Jorge

AU - Casamitjana, Adria

AU - Collins, D Louis

AU - Dadar, Mahsa

AU - Georgiou, Achilleas

AU - Ghafoorian, Mohsen

AU - Jin, Dakai

AU - Khademi, April

AU - Knight, Jesse

AU - Li, Hongwei

AU - Llado, Xavier

AU - Luna, Miguel

AU - Mahmood, Qaiser

AU - McKinley, Richard

AU - Mehrtash, Alireza

AU - Ourselin, Sebastien

AU - Park, Bo-Yong

AU - Park, Hyunjin

AU - Park, Sang Hyun

AU - Pezold, Simon

AU - Puybareau, Elodie

AU - Rittner, Leticia

AU - Sudre, Carole H

AU - Valverde, Sergi

AU - Vilaplana, Veronica

AU - Wiest, Roland

AU - Xu, Yongchao

AU - Xu, Ziyue

AU - Zeng, Guodong

AU - Zhang, Jianguo

AU - Zheng, Guoyan

AU - Chen, Christopher

AU - van der Flier, Wiesje

AU - Barkhof, Frederik

AU - Viergever, Max A

AU - Biessels, Geert Jan

PY - 2019/11/1

Y1 - 2019/11/1

N2 - Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, the WMH Segmentation Challenge, in which developers could evaluate their methods on a standardized multi-center/-scanner image dataset, giving an objective comparison. Sixty T1 + FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. The segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: 1) Dice similarity coefficient; 2) modified Hausdorff distance (95th percentile); 3) absolute log-transformed volume difference; 4) sensitivity for detecting individual lesions; and 5) F1-score for individual lesions. In addition, the methods were ranked on their inter-scanner robustness. Twenty participants submitted their methods for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.

AB - Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, the WMH Segmentation Challenge, in which developers could evaluate their methods on a standardized multi-center/-scanner image dataset, giving an objective comparison. Sixty T1 + FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. The segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: 1) Dice similarity coefficient; 2) modified Hausdorff distance (95th percentile); 3) absolute log-transformed volume difference; 4) sensitivity for detecting individual lesions; and 5) F1-score for individual lesions. In addition, the methods were ranked on their inter-scanner robustness. Twenty participants submitted their methods for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.

UR - http://www.scopus.com/inward/record.url?scp=85074378885&partnerID=8YFLogxK

U2 - https://doi.org/10.1109/TMI.2019.2905770

DO - 10.1109/TMI.2019.2905770

M3 - Article

C2 - 30908194

SN - 0278-0062

VL - 38

SP - 2556

EP - 2568

JO - IEEE Transactions on Medical Imaging

JF - IEEE Transactions on Medical Imaging

IS - 11

ER -
