Abstract
Background:
Magnetic resonance imaging (MRI) acquisition/processing techniques assess brain volumes to explore neurodegeneration in Alzheimer’s disease (AD).
Objective:
We examined the clinical utility of MSmetrix and investigated if automated MRI volumes could discriminate between groups covering the AD continuum and could be used as a predictor for clinical progression.
Methods:
The Belgian Dementia Council initiated a retrospective, multi-center study and analyzed whole brain (WB), grey matter (GM), white matter (WM), cerebrospinal fluid (CSF), cortical GM (CGM) volumes, and WM hyperintensities (WMH) using MSmetrix in the AD continuum. Baseline (
Results:
The cohort consisted of cognitively healthy controls (HC,
Conclusion:
WB and GM volumes extracted by MSmetrix could be used to define the clinical spectrum of AD accurately and, along with CGM, are able to predict cognitive impairment based on (decline in) MMSE scores. Therefore, MSmetrix can support clinicians in their diagnostic decisions, is able to detect clinical disease progression, and can help to stratify populations for clinical trials.
INTRODUCTION
In Alzheimer’s disease (AD), the spread of neurodegeneration, and especially tau pathology and synapse loss, is the most important pathological substrate of clinical symptoms [1]. Biomarkers of neurodegeneration, including volumetric analyses of relevant brain regions on magnetic resonance imaging (MRI), correlate better with the degree of cognitive impairment in AD patients as compared to biomarkers of amyloid-β (Aβ) deposition [1–3]. Volumetric brain imaging should include areas in which the neuropathological process of AD is prominent, such as the medial temporal lobe that includes the hippocampus. The atrophy can often be detected at the stage of mild cognitive impairment (MCI) [4–15]. Therefore, hippocampal volume has been proposed as a neuroimaging biomarker for early AD diagnosis in the revised diagnostic criteria of AD [16–20]. Other neuro-anatomical structures, such as the entorhinal cortex [7, 22] and cerebral cortex [23] are also prone to AD pathology. Their morphological characteristics such as volume, shape, and thickness can be used as biomarkers of the extent of neurodegeneration as well [24, 25]. Nevertheless, measuring brain atrophy has some limitations because evolution of disease-related regional atrophy does not necessarily follow the anatomical boundaries of structures. To overcome these limitations, the whole brain (WB) and the (whole) grey matter (GM) volumes have been used as neuroimaging biomarkers for AD diagnosis and evolution [3, 26–33]. Since these anatomical structures are considerably larger than the hippocampus or cortex, the automated quantification of their volumes is, in general, less prone to measurement errors.
To date, different MRI acquisition and processing techniques have been developed to measure volumes of specific brain regions. However, their accuracy is still limited by a moderate sensitivity and a rather low specificity for AD [34]. Manual segmentation approaches, which are time consuming, have been widely used and are considered a standard approach by experts in neuroanatomy [35, 36], though large-scale studies are limited [4, 37]. Semi-automated techniques are less time consuming; however, a priori information such as user-defined landmarks is needed, which also limits their usefulness for large clinical studies [38–40]. An increasing number of studies have investigated structural brain changes in AD populations with automated image-based brain morphometry analysis, as this is less time consuming and does not suffer from the large intra- and inter-observer variability of manual and semi-automated approaches. This trend relates to both the widespread availability of brain imaging equipment in clinical routine and research, and the concurrent development of image analysis software packages released over the past years [41–46]. MSmetrix is another fully automated, European Conformity (CE)-labelled and Food and Drug Administration (FDA)-cleared tool. It was specifically designed to measure atrophy in patients with multiple sclerosis (MS) and has been validated in these patients (we refer to the Supplementary Material for more detailed information on the validation process) [47–51]. As this method uses clinical brain MRI scans, in contrast to many other tools that apply MRI scans from selected clinical trial/research cohorts, we decided to use the output of MSmetrix, which included WB, (whole) GM, white matter (WM), cerebrospinal fluid (CSF), cortical GM (CGM) volumes, and WM hyperintensities (WMH), in a ‘retrospective Belgian multi-center MRI biomarker study in dementia’ (REMEMBER).
We examined whether MSmetrix is an accurate and reproducible segmentation approach to differentiate between healthy controls and AD patients, covering the entire AD continuum, and thus is of clinical utility for the diagnostic work-up and for clinical trials. Therefore, we set up a study to assess the diagnostic value of automated volumetry using MSmetrix in a population comprising the entire AD continuum (AD, MCI, subjective cognitive decline (SCD)), as well as cognitively healthy elderly (primary objective). The secondary objective of this study was to investigate whether automated volumetry using MSmetrix can serve as an early diagnostic marker for AD and as a possible predictor of clinical progression.
MATERIALS AND METHODS
Study design
The retrospective Belgian multi-center study cohort consists of participants who underwent a baseline brain MRI scan, in combination with a clinical neurological and neuropsychological evaluation for diagnostic purposes. A neuropsychological evaluation consisting of at least a screening instrument, such as the Mini-Mental State Examination (MMSE), was sufficient for inclusion in the study.
A subset of the population underwent a repeat brain MRI scan with or without a neuropsychological evaluation (time interval between baseline and follow-up investigations, for both MRI and neuropsychological evaluations, was at least three months).
The study was approved by the ethics committee of University of Antwerp/Universitair Ziekenhuis Antwerpen (N°16/2/18), Antwerp and by the ethics committees of Algemeen Ziekenhuis Sint-Jan Brugge-Oostende, Brugge (N°1992); Centre Hospitalier Universitaire Brugmann (CHU Brugmann), Brussels (N°2016/84); Centre Hospitalier Universitaire Liège (CHU Liège), Liège (N°2012/274); Cliniques Universitaires de Bruxelles (ULB), Hôpital Erasme, Brussels (N°P2016/187); Cliniques Universitaires Saint-Luc (UCL), Brussels (N°2016/07jui/261); Cliniques St-Pierre Ottignies, Ottignies (N°OM045); Universitair Ziekenhuis Brussel, Brussels (N°2016/183); and Ziekenhuis Netwerk Antwerp (ZNA), Antwerp (N°4730).
Study population
Patients and cognitively healthy controls were selected from existing cohorts in several memory clinics that are members of the Belgian Dementia Council (BeDeCo). SCD, MCI, and dementia due to AD patients were included in this retrospective multi-center study. The group of cognitively healthy elderly was selected among available (research) cohorts, like spouses of patients who visited the memory clinic and community-dwelling volunteers.
Clinical diagnostic criteria
In order to avoid a selection bias, all patients were considered for inclusion, irrespective of the severity of cognitive deterioration. The patients were diagnosed by an experienced clinician in each center. Diagnosis of dementia due to AD was based on NIA-AA criteria [17, 19]. At baseline, MCI due to AD was diagnosed by applying the NIA-AA criteria [16–18, 20]. When sufficient neuropsychological data were available, MCI patients were divided into four subgroups. MCI patients were categorized as single-domain (sd) or multi-domain (md) MCI based on the neurocognitive domains with a z-score below -1.5 SD. Z-scores were derived from population-based norms and were age-adjusted. In addition, MCI patients were categorized as amnestic MCI (aMCI) or non-amnestic MCI (naMCI), again based on z-scores below -1.5 SD per cognitive domain. If the delayed memory domain had a z-score below -1.5 SD, a patient was classified as aMCI, whereas naMCI was based on z-scores from one of the other cognitive domains, such as immediate memory, visuospatial/constructional abilities, language, or attention. SCD patients were diagnosed according to the criteria of Jessen et al. in the same way as the MCI patients but without an objective cognitive impairment, that is, with all neuropsychological subtests having a z-score above -1.5 [52]. All control subjects underwent at least a cognitive screening test to rule out cognitive impairment. The control subjects did not meet the SCD criteria of Jessen et al. [52].
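The subtype rules described above can be sketched in a few lines. This is only an illustration, assuming that age-adjusted z-scores are available per cognitive domain in a dictionary; the domain keys, the function name `classify_mci`, and the handling of a z-score of exactly -1.5 (treated here as not impaired) are hypothetical choices, not the procedure used by the participating centers.

```python
IMPAIRMENT_CUTOFF = -1.5  # z-score cut-off stated in the text


def classify_mci(z_scores, memory_domain="delayed_memory"):
    """Return (amnestic label, extent label) for one patient.

    z_scores maps a domain name (hypothetical keys) to an age-adjusted z-score.
    Returns None when no domain falls below the cut-off, i.e. when there is
    no objective impairment according to this rule.
    """
    impaired = [domain for domain, z in z_scores.items() if z < IMPAIRMENT_CUTOFF]
    if not impaired:
        return None
    # Amnestic if the delayed memory domain is among the impaired domains.
    amnestic = "aMCI" if memory_domain in impaired else "naMCI"
    # Single-domain versus multi-domain depends on how many domains are impaired.
    extent = "sd MCI" if len(impaired) == 1 else "md MCI"
    return amnestic, extent
```

For example, a patient with z-scores of -2.0 for delayed memory and -1.8 for language would be labelled amnestic, multi-domain MCI under these assumptions.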
Exclusion criteria for the total population consisted of brain tumors, large cerebral infarction/bleeding, strategic infarctions, other neurodegenerative diseases, severe head trauma, epilepsy, brain infections, severe depression, unregulated diabetes mellitus, untreated thyroid disorders, or any severe somatic co-morbidity that interferes with study participation.
Imaging
Image acquisition
All MRI scans were obtained from the respective Neuroimaging Departments to which subjects were referred. MRI scans from all scanner types were accepted. The minimal requirement for the MRI protocol was a 3D T1-weighted image, preferably with a voxel size of 1×1×1 mm. A FLAIR image was optional, with the same preferences as for the T1-weighted image described above.
Image analyses
The MRI data were processed using the CE-labelled and FDA-cleared software MSmetrix to extract WB, GM, WM, CSF (intra- and inter-ventricular CSF), CGM, and WMH volumes (MSmetrix-cross). To this end, the T1-weighted MRI images were segmented into WM, GM, and CSF using a probabilistic model, including bias field correction. If a FLAIR image was available, WMH were extracted using an outlier model, and lesion filling of the T1-weighted image was performed prior to segmentation into WM, GM, and CSF [50]. Finally, WB was computed as the sum of WM and GM, while CGM was extracted from the GM segmentation based on prior knowledge of the shape and appearance of the cortical area. The total duration was under 30 minutes per individual, comprising on average 22 minutes of processing and 7 minutes of QC per subject.
If a repeat brain MRI was available, a longitudinal registration-based approach was used, with the cross-sectional segmentations as input, to extract brain atrophy and lesion changes (MSmetrix-long). This ensures a low measurement error of the brain atrophy measurements [47] and a more consistent evaluation of lesion changes [49].
All extracted measures were corrected for head size; as a consequence, only normalized measures were used in further analyses. Head size was normalized by scaling the measured volume of the image with the determinant of the affine transformation matrix that describes the transformation between the Montreal Neurological Institute (MNI) atlas and the image.
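The head-size normalization described above amounts to multiplying a measured volume by a volume scaling factor. The sketch below illustrates the arithmetic only; it assumes a 4×4 affine matrix that maps the subject image onto the atlas, so that the determinant of its 3×3 linear part is the factor by which volumes change. The direction of the transformation, the function names, and the example numbers are assumptions and are not taken from the MSmetrix implementation.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def normalize_volume(volume_ml, affine_4x4):
    """Scale a measured volume by the determinant of the affine's linear part.

    Assumes affine_4x4 maps subject space to atlas space (hypothetical
    convention); the translation column does not affect the determinant.
    """
    linear = [row[:3] for row in affine_4x4[:3]]
    scale = abs(det3(linear))  # absolute value guards against axis flips
    return volume_ml * scale


# Hypothetical example: a head 10% smaller than the atlas along each axis
# needs a uniform scaling of 1.1 to match it.
affine = [
    [1.1, 0.0, 0.0, 2.0],
    [0.0, 1.1, 0.0, -3.0],
    [0.0, 0.0, 1.1, 5.0],
    [0.0, 0.0, 0.0, 1.0],
]
normalized_wb = normalize_volume(1000.0, affine)  # about 1331 mL
```

With this convention, subjects with smaller heads receive a scaling factor above one, so that normalized volumes become comparable across subjects.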
A quality control (QC) of the extracted measurements was performed per center by E.N. and H.S. Moreover, a visual assessment of the segmentations was performed for all ‘outlier’ measurements. The ‘outliers’ included scans with volumes below the 10th or above the 90th percentile within the cohort of the center. Volumes that were not correctly segmented were either completely rejected or approved with remarks. In detail, if the segmentation approach failed completely, the volumes were not trustworthy and the values were excluded from further analyses. If the scan was in part correctly segmented and the WB volumetric analysis could be trusted, the scan was classified as approved with remarks. All other scans, with a correct segmentation, were classified as approved. WMH underwent the same ‘outlier QC’ and were divided into two groups, namely approved or rejected. The QC of the scans was always dependent on the protocol used in each center. Therefore, scans were often rejected due to a low quality of the scans and/or large slice thickness (>3 mm) that led to missegmentation of the volumes.
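The percentile rule used to select scans for visual assessment can be sketched as follows. This is an illustration under stated assumptions: the percentile definition (here the "inclusive" method of Python's `statistics.quantiles`) is not specified in the text, the function name `flag_outliers` is hypothetical, and the function is meant to be called separately on the volumes of each center.

```python
import statistics


def flag_outliers(volumes):
    """Return indices of scans below the 10th or above the 90th percentile.

    volumes: list of one volumetric measure (e.g. WB in mL) for all scans
    from a single center.
    """
    # quantiles(n=10) returns the nine decile cut points; the first and
    # last are the 10th and 90th percentiles.
    deciles = statistics.quantiles(volumes, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    return [i for i, v in enumerate(volumes) if v < p10 or v > p90]
```

Scans flagged in this way would then be inspected visually; the rule itself does not decide whether a segmentation is rejected.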
Statistical analyses
The distribution of categorical variables, such as gender, within subject groups was analyzed with a Chi-Square test, and percentages were reported. Demographic comparisons and other analyses including MRI measures were based on ANOVA and/or ANCOVA tests with
For all analyses, two-tailed
RESULTS
Overview of the REMEMBER cohort
The study cohort, from eight Belgian centers, included cognitively healthy controls (
REMEMBER baseline study population
Legend: data are mean±SD and [IQR], percentages (%), and numbers (n). P-values are general Chi-Square for gender or ANCOVA test results (for all other measures), whereas the p-values described below are based on the differences of post hoc analyses. Normalized brain volumes (WB, GM, WM, CSF, CGM, and WMH) are reported. ¶SCD versus MCI p = 0.013. Other comparisons: p = 0.001. ¥SCD versus MCI p = 0.003; MCI versus AD dementia p = 0.043. Other comparisons: p = 0.001. £Controls versus SCD p = 0.018. Other comparisons: p = 0.001. ±Controls versus SCD p = 0.008; MCI versus AD dementia p = 0.007. Other comparisons: p = 0.001. §SCD versus MCI p = 0.004. Other comparisons: p = 0.001. *Significantly different compared to SCD. ∧Significantly different compared to MCI. #Significantly different compared to AD dementia. ‡Significantly different compared to controls. AD, Alzheimer’s disease; BL, baseline; CGM, cortical grey matter; CSF, cerebrospinal fluid; FU, follow-up; GM, grey matter; IQR, interquartile range; MCI, mild cognitive impairment; MMSE, Mini-Mental State Examination; NPE, neuropsychological examination; REMEMBER, retrospective Belgian multi-center MRI biomarker study in dementia; SCD, subjective cognitive decline; SD, standard deviation; WB, whole brain; WM, white matter; WMH, white matter hyperintensities; YOE, years of education.
Baseline brain MRI scans were available for all subjects and volumetric analyses were performed (MSmetrix-cross). Baseline scans were acquired at 1.5T (Siemens,
REMEMBER based on follow-up MRI: study population
Legend: data are mean and [IQR], percentages (%), and numbers (n).
Follow-up brain MRI scans were available in 95 subjects (11%) and all volumetric analyses were approved, for both MSmetrix-cross and MSmetrix-long. Follow-up scans were acquired at 1.5T (Siemens,
Study population
The demographic and clinical characteristics of the different diagnostic subgroups are given in Table 1. The AD dementia group consisted of significantly more females than males. Age at baseline was significantly different between all groups (
In total, 95 subjects had a follow-up MRI scan (Table 2). Gender, time between scans, and time between baseline and last clinical follow-up/conversion were not significantly different between the clinical diagnostic groups. The MCI and AD dementia patients were significantly older compared to the HC-SCD subjects (respectively,
Imaging analyses
All volumes and WMH discriminated AD dementia patients and cognitively healthy controls with a high diagnostic accuracy (sensitivity and specificity), as shown in Table 3 (for cut-offs, see Supplementary Table 4). Cross-sectional analyses, corrected for age at baseline and center, showed significant differences between all clinical diagnostic groups for WB (Fig. 1A). GM was significantly different between all groups, except between cognitively healthy controls and SCD patients (Fig. 1B). WM and CSF volumes were significantly different between controls and all other groups, and CSF volumes of AD dementia patients were significantly larger than those of SCD and MCI patients. MCI and AD dementia patients showed a significantly smaller CGM volume compared to controls and SCD (Fig. 1C). The largest WB, GM, WM, and CGM volumes were detected in the control group, and volumes decreased with increasing cognitive impairment (SCD<MCI<AD dementia), whereas the opposite was found for the CSF volume. No significant differences were detected for the WMH between the different clinical diagnostic groups. Stable MCI patients (

WB, GM, and CGM volumes across the different clinical diagnoses. Scatterplots of WB (A), GM (B), and CGM (C) volumes in mL per clinical diagnostic category with their corresponding mean±SD. Significant differences were reported between clinical diagnoses,
Diagnostic performance of volumetric measurements between cognitively healthy controls and AD dementia patients
Data are AUC, sensitivity, specificity, and 95% CI. AUCs were determined by ROC for all volumetric measurements. Highest accuracies were detected between HC and AD dementia patients (0.746–0.882) or MCI patients (0.703–0.820) and between SCD and AD dementia patients (0.629–0.706), with the highest accuracies for WB. However, differentiating in the earlier stages of the disease showed lower accuracies (HC versus SCD 0.629–0.796; SCD versus MCI 0.558–0.645) for all volumetric measurements. AD, Alzheimer’s disease; AUC, area under the curve; CGM, cortical grey matter; CI, confidence interval; CSF, cerebrospinal fluid; GM, grey matter; HC, cognitively healthy controls; MCI, mild cognitive impairment; ROC, receiver operating characteristic; SCD, subjective cognitive decline; sens, sensitivity; spec, specificity; WB, whole brain; WM, white matter; WMH, white matter hyperintensities.
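For readers less familiar with the AUC values reported above, the sketch below shows how an empirical AUC can be obtained for a single volumetric measure. It uses the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve; it assumes that smaller volumes indicate disease (as for WB, GM, WM, and CGM, but not CSF). The function name and the example volumes are hypothetical and are not study data, and the statistical software actually used for the ROC analyses is not reproduced here.

```python
def auc_smaller_is_patient(control_volumes, patient_volumes):
    """Empirical AUC for separating patients from controls.

    Counts the fraction of (patient, control) pairs in which the patient
    has the smaller volume; ties count as half a pair.
    """
    wins = 0.0
    for p in patient_volumes:
        for c in control_volumes:
            if p < c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_volumes) * len(control_volumes))


# Made-up normalized WB volumes in mL, for illustration only.
controls = [1500.0, 1450.0, 1400.0]
patients = [1350.0, 1300.0, 1420.0]
example_auc = auc_smaller_is_patient(controls, patients)  # 8 of 9 pairs
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is how the ranges in the legend above can be read.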
Prediction of disease severity based on extracted MRI measures by the MMSE score
Data are R square (R2), standardized regression coefficients (β-values) with 95% confidence intervals [95% CI], and
By dividing the MCI patients in sd MCI (
Longitudinal MSmetrix analyses (Table 2)
Longitudinal data analyses (
Neuropsychological data analyses in combination with volumetric MRI measures
An MMSE score was available for 93% of all subjects. Linear regression models per volume (model 1) significantly predicted the MMSE scores for all volumes (Table 4). When age and clinical diagnosis at baseline were introduced into the analyses (model 2), WB, GM, CSF, and CGM could significantly predict MMSE score.
Follow-up MMSE scores, with at least three months between baseline and last follow-up MMSE score, were available in 429 subjects (2.4 [1.1–3.4] years). The MMSE slopes were calculated as the difference in MMSE scores divided by the follow-up time. Subjects were divided into four groups: (1) subjects with an improvement in MMSE score (
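The single-variable models (model 1) can be illustrated with a short ordinary least squares sketch. It assumes one predictor (a normalized volume) and one outcome (the MMSE score); in that case the standardized regression coefficient equals the Pearson correlation and R² is its square. The function name and return keys are hypothetical, and the covariate-adjusted model 2 is not covered by this example.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares of y on a single predictor x.

    Returns the raw slope and intercept, the standardized coefficient
    (beta), and the coefficient of determination (r2).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    beta = sxy / math.sqrt(sxx * syy)  # equals Pearson r for one predictor
    return {"slope": slope, "intercept": intercept, "beta": beta, "r2": beta ** 2}
```

A positive beta for WB, GM, or CGM would indicate that larger volumes go together with higher MMSE scores, whereas a negative beta would be expected for CSF.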

WB, GM, and CGM volumes across slow and fast decliners in the total population based on MMSE. Scatterplots of WB (A), GM (B), and CGM (C) volumes in mL at baseline with their corresponding mean±SD. The MMSE slopes were calculated as the difference in MMSE scores divided by the follow-up time. Subjects were categorized as slow decliners if the MMSE slope was smaller than three, and as fast decliners if the slope was equal to or larger than three. Significant differences were found for all volumes shown. CGM, cortical grey matter; GM, grey matter; MMSE, Mini-Mental State Examination; SD, standard deviation; WB, whole brain.
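The slope calculation and the slow/fast cut-off described in this legend can be sketched as follows. The sketch assumes that the slope is expressed as MMSE points lost per year of follow-up, so that a positive value means decline; the unit and sign convention are not stated in the text, and the function names are hypothetical. The categorization is meant only for subjects who declined.

```python
def mmse_slope(baseline_score, followup_score, years_between):
    """MMSE points lost per year between baseline and last follow-up.

    Assumed convention: positive values indicate decline.
    """
    if years_between <= 0:
        raise ValueError("follow-up time must be positive")
    return (baseline_score - followup_score) / years_between


def decliner_category(slope, threshold=3.0):
    """Label a declining subject as 'slow' (slope < 3) or 'fast' (slope >= 3)."""
    return "fast" if slope >= threshold else "slow"


# Hypothetical example: 28 points at baseline, 22 points two years later.
example_slope = mmse_slope(28, 22, 2.0)               # 3.0 points per year
example_category = decliner_category(example_slope)   # "fast"
```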
DISCUSSION
In the present large multi-center study, including 887 subjects, we analyzed volumetric brain measurements with the CE-labelled and FDA-cleared software pipeline MSmetrix, which has been developed for clinical use in MS and has proven high accuracy for longitudinal analyses at the individual patient level [47–51]. Moreover, this study used this fully automated segmentation approach to further decrease the observer dependency as well as the time needed from the expert (<30 min/subject), and to standardize the quantification of MRI readings. Thus, the fully automated procedure reduces reliance on anatomical expertise and provides rapid results compatible with clinical practice. In addition, MSmetrix uses clinical brain MRI scans, in contrast to many other tools that apply MRI scans from selected clinical trial/research cohorts. Another advantage of MSmetrix is the use of Digital Imaging and Communications in Medicine (DICOM) format images, which are ideal to create a fast clinical workflow. Therefore, MSmetrix translates the volumetric measures from a research setting to routine clinical practice. Nevertheless, in differential dementia diagnosis the usefulness of these MRI measures is limited because atrophy is not AD-specific.
By applying this software in the AD continuum and cognitively healthy elderly, we found that brain volumes were significantly smaller with increased cognitive impairment. In addition, WB and GM accurately discriminated between clinical diagnostic groups, whereas WM, CSF, and CGM were less accurate. For this reason, the output measures of MSmetrix are able to support the clinical diagnostic work-up of AD. The diagnostic accuracies of the volumetric measures are comparable to or better than those reported in the current literature [3, 53–55]. WMH were not significantly different between diagnostic groups. A higher degree of WMH is known to be associated with poorer cognitive function [56, 57]. However, the lack of significant differences may reflect low variability, as a trend was found in the expected direction, with more WMH at greater cognitive impairment (Table 1).
In addition, the method failed to distinguish stable MCI patients from MCI patients who progressed to AD dementia, as no significant differences were detected for baseline volumetric measurements. A possible explanation is that volume change of specific brain regions over time may be more indicative than a static cross-sectional assessment. Moreover, the brain is affected by normal aging and disease progression, and there is individual variation in brain anatomy, which makes it difficult to discriminate between groups. The MCI subgroups are especially difficult to distinguish, as they are in the same disease stage of the continuum. Longitudinal data could avoid the limitations of cross-sectional analyses [58–61]; however, in our cohort the longitudinal analyses also showed no differences between the stable MCI patients and MCI converters. Probably, this is due to a small sample size of available longitudinal MRI scans in MCI patients (
The primary objective of this study was to identify which atrophy measures accurately discriminate between groups by using MSmetrix. Our cross-sectional data confirm that a high diagnostic accuracy was observed between AD dementia patients and cognitively healthy controls for the extracted volumes. However, when corrected for age and center, WB and GM were the most robust MRI biomarkers to distinguish between clinical groups. Those volumetric measures can support disease diagnosis and thus may assist clinicians in their decisions. The secondary objective was to investigate MSmetrix as a possible predictor for clinical progression. The single variable linear regression models showed that increased CSF volumes and atrophy of WB, GM, and CGM predicted cognitive impairment, independent of age and baseline diagnosis, which is confirmed by other studies [65–67]. Next, disease evolution based on the decline in MMSE score was also predicted by WB, GM, and CGM: slow and fast MMSE decliners showed significant differences for those volumes, with smaller baseline volumes in the fast decliners, in line with the literature [3, 30–32]. Here, the volumetric measures can help clinicians to track the clinical progression of the disease, as increased CSF volumes and brain atrophy correlate with clinical decline.
This study focused on the output measurements of MSmetrix, which are large volumes (WB, GM, WM, CSF, CGM) that can overcome the limitations of disease-related regional atrophy. Nevertheless, many groups investigated regional atrophy and especially medial temporal lobe, including hippocampal atrophy, as this is a sensitive AD biomarker [18, 34]. A potential benefit of these specific AD-pathological regions is that they possibly allow the discrimination between sd MCI and md MCI patients [13–15, 55]. This discrimination, which is expected from the proposed models [18, 20], was not observed in the current study. Indeed, sd MCI patients and SCD subjects had comparable volumes, and no significant difference was found between md MCI and AD dementia patients either. A possible explanation could be that the neuropsychological examinations showed inter- and intra-rater and/or center variability, as the neuropsychological tests were administered in different centers by different neuropsychologists. To take this limitation into account, a correction for center was applied in the statistical analyses.
Another limitation of this study was the significant difference in age between diagnostic groups at baseline, although we corrected the statistical analyses to overcome this problem. Next, few follow-up MRI scans were available, which probably led to fewer significant differences in the longitudinal volumetric measurements. Lastly, analyzing the different volumes based on only T1 images versus T1 in combination with FLAIR sequences could be seen as another limitation of this study, as differences in results could be found in case larger WMH volumes were present. Those WMH volumes could be wrongly detected as GM volumes. Nevertheless, the WB volume is usually robust because WMH volumes are classified as CSF, as has been observed in the validation of MSmetrix in MS patients [47–51]. Despite these disadvantages, the large amount of data included in this multi-center study, the short time between baseline measures, and the mean clinical follow-up time of 2.1 years [IQR: 0.6–3.2] are certainly strengths of this study. Even though we have used MRI scans that were acquired in routine clinical practice, and thus from different scanner types, which could be seen as a limitation, only 15.9% of the scans were rejected, which demonstrates the clinical usefulness of this method. Moreover, MSmetrix can be integrated easily in the clinical workflow as it uses DICOM images from all scanner types without any restrictions of the image sequences and produces results in an accurate and rapid way.
In conclusion, volumes measured with a fully automated tool (MSmetrix) accurately discriminated between clinical diagnostic groups in an AD population, and thus can support clinicians in their decisions. In particular, WB, GM, and CGM are MRI biomarkers that distinguished between clinical diagnostic groups and were possible predictors for clinical progression based on (decline in) MMSE scores. Those volumes could therefore be used to define the clinical spectrum of AD more accurately, are able to track clinical progression of disease, can help to select appropriate populations for clinical trials, and can be applied for assessment of the efficacy of (future disease-modifying) treatments.
Footnotes
ACKNOWLEDGMENTS
This research was in part supported by the University Research Fund of the University of Antwerp. For the University of Liège center, this work was supported by a French Speaking Community Concerted Research Action (ARC-06/11-340) and a Belgian InterUniversity Attraction Pole (P6/29). We acknowledge the contribution of Ellen De Roeck, Naomi De Roeck, Charisse Somers, Peter Mariën, Jos Saerens, and Nore Somers.
MSmetrix is proprietary software, developed by ico
