TY - JOUR
T1 - Reliability of brain atrophy measurements in multiple sclerosis using MRI
T2 - an assessment of six freely available software packages for cross-sectional analyses
AU - van Nederpelt, David R.
AU - Amiri, Houshang
AU - Brouwer, Iman
AU - Noteboom, Samantha
AU - Mokkink, Lidwine B.
AU - Barkhof, Frederik
AU - Vrenken, Hugo
AU - Kuijer, Joost P. A.
N1 - Funding Information: The authors acknowledge ZonMw and Stichting MS Research for their support. Frederik Barkhof acknowledges support from the NIHR Biomedical Research Center at UCLH. This research has been executed within the MS Center Amsterdam, Amsterdam UMC. Funding Information: This research was funded by ZonMw & Stichting MS Research (446002506), Health Holland (LSHM19053), and Novartis (SP037.15/432282). Publisher Copyright: © 2023, The Author(s).
PY - 2023/10
Y1 - 2023/10
AB - Purpose: Volume measurement using MRI is important to assess brain atrophy in multiple sclerosis (MS). However, differences between scanners, acquisition protocols, and analysis software introduce unwanted variability in volumes. To quantify these effects, we compared within-scanner repeatability and between-scanner reproducibility across three different MR scanners for six brain segmentation methods. Methods: Twenty-one people with MS underwent scanning and rescanning on three 3 T MR scanners (GE MR750, Philips Ingenuity, Toshiba Vantage Titan) to obtain 3D T1-weighted images. FreeSurfer, FSL, SAMSEG, FastSurfer, CAT12, and SynthSeg were used to quantify brain, white matter, and (deep) gray matter volumes from both lesion-filled and non-lesion-filled 3D T1-weighted images. We used the intraclass correlation coefficient (ICC) to quantify agreement, repeated-measures ANOVA to analyze systematic differences, and variance component analysis to quantify the standard error of measurement (SEM) and smallest detectable change (SDC). Results: For all six software packages, both between-scanner agreement (ICC range: 0.4–1) and within-scanner agreement (ICC range: 0.6–1) were typically good, and good to excellent (ICC > 0.7) for large structures. No clear differences were found between filled and non-filled images. However, gray and white matter volumes did differ systematically between scanners for all software packages (p < 0.05). Variance component analysis yielded within-scanner SDCs ranging from 1.02% (SAMSEG, whole brain) to 14.55% (FreeSurfer, CSF), and between-scanner SDCs ranging from 4.83% (SynthSeg, thalamus) to 29.25% (CAT12, thalamus). Conclusion: Volume measurements of brain, GM, and WM showed high repeatability, and high reproducibility despite substantial differences between scanners. The smallest detectable change was high, especially between different scanners, which hampers the clinical implementation of atrophy measurements.
KW - Brain volumetry
KW - Multiple sclerosis
KW - Reliability
KW - Segmentation
UR - http://www.scopus.com/inward/record.url?scp=85165179611&partnerID=8YFLogxK
U2 - https://doi.org/10.1007/s00234-023-03189-8
DO - https://doi.org/10.1007/s00234-023-03189-8
M3 - Article
C2 - 37526657
SN - 0028-3940
VL - 65
SP - 1459
EP - 1472
JO - Neuroradiology
JF - Neuroradiology
IS - 10
ER -