Sage Journals: Discover world-class research

Abstract

Background

MRI-based hippocampus volume (HV) is widely used as neurodegeneration marker in Alzheimer's disease.

Objective

An easy-to-use and easy-to-interpret method to categorize T1-weighted MR sequences with respect to test-retest stability of hippocampus volumetry based on general image quality metrics (IQM).

Methods

The study included 446 3D T1-weighted MRI scans of one healthy middle-aged man obtained during 32 months in 122 scanning sessions performed with 96 different scanners at 76 different sites. Each scanning session represented a different acquisition sequence of ≥2 back-to-back repeat scans (3.7 ± 0.7 on average). Unilateral HVs were determined with 18 different tools for automatic volumetry. An acquisition sequence was considered “poor” if the z-score of the within-session coefficient-of-variation of the HV estimates from the session, averaged across all volumetry tools and both hemispheres, exceeded one standard deviation. General IQM were computed for each scanning session using the freely available MRI Quality Control Tool. A classification-and-regression tree (CART) was trained to discriminate between good and poor acquisition sequences using the IQM as input.

Results

The CART selected the left-right width of the acquisition field-of-view and the contrast-to-noise ratio as predictor variables. Overall accuracy of the CART was 79.5%. CART-based classification increased the ratio of good-to-poor acquisition sequences from 3.5 among all sequences to 7.4 among the sequences predicted to be good. This was at the expense of losing 15% of the good sequences.

Conclusions

The IQM-based decision tree model provides useful performance for the differentiation of T1-weighted sequences associated with good versus poor test-retest stability of hippocampus volumetry.

Keywords

Alzheimer's disease classification-and-regression tree contrast-to-noise ratio field-of-view hippocampus image quality outlier structural MRI volumetry

Introduction

Detection (or exclusion) of global and regional atrophy using MRI-based brain volumetric estimates is increasingly used in everyday clinical routine to assist the diagnosis, differentiation and monitoring of neurodegenerative diseases.^1–4 In the diagnostic workflow of Alzheimer's disease (AD), hippocampus volume (HV) estimated from high-resolution 3-dimensional (3D) gradient-echo T1-weighted structural MRI (sMRI) is used as a supportive diagnostic biomarker,^5–10 to stage neuronal damage,^11,12 and to predict progression to dementia from prodromal disease stages.^13–15

However, MRI-based volumetry is sensitive to the setting (scanner platform & acquisition sequence), resulting in non-biological variability of no interest.^16–21 In particular, the utility of MRI-based brain volumetry can be affected by low image quality. Relevant degradation of image quality in individual scans can be caused by subject-related factors including motion during the acquisition.^22–24 Systematically low image quality of all scans from a given source can be associated with hardware-specific factors (that cannot be changed) and with details of the acquisition sequence (that can be adjusted). Low image quality can be reflected by poor signal-to-noise ratios,²⁵ intensity non-uniformity, spatial image distortion, and substantial deviation from the standard white-to-gray matter contrast.¹⁷

In large scale imaging research collecting MRI scans across numerous sites,^26–28 the statistical power may be reduced by large test-retest variability of some “poor” acquisition sequences with low image quality. In clinical practice, large variability of HV estimates can lead to erroneous conclusions²⁹ that may negatively impact patient management. Thus, identification and exclusion of acquisition sequences associated with large test-retest variability of HV estimates may have considerable impact in both research and clinical practice.

Low image quality associated with hardware- and sequence-specific factors can be detected from scans of physical/geometrical phantoms.^30–32 However, these phantoms may not allow reliable prediction of test-retest stability with respect to very specific tasks such as hippocampus volumetry. Scanning healthy subjects, serving as human phantoms, can be more appropriate.^18,33–40 General image quality metrics (IQM) derived from normal human scans have been used for retrospective harmonization of volumetric MRI data across scanners and acquisition sequences.⁴¹ It is plausible, therefore, to assume that general IQM are useful also for the characterization of acquisition sequences with respect to test-retest stability of hippocampus volumetry.

Against this background, the aim of the current study was to develop an easy-to-use and easy-to-interpret method that uses general IQM derived from human phantom scans to distinguish between sMRI acquisition sequences with high versus low test-retest stability of hippocampus volumetry.

There was a strong focus on generalizability to data from unseen scanners and unseen volumetry tools. To address this, the study used 18 different volumetry tools, including commercial and research tools, and a subset of the frequently traveling human phantom (FTHP) dataset. The latter provides 557 sMRI scans of a single healthy male volunteer who completed 176 scanning sessions on 116 different MRI scanners at 96 different sites.⁴² In each scanning session, all back-to-back repeat scans were acquired with exactly the same sequence parameters. Thus, each scanning session represents a different acquisition setting (MRI scanner & acquisition sequence). The MRI Quality Control Tool (MRIQC),²⁹ which is freely available both for local implementation and as an online service, was used to compute general IQM as input to the prediction model.

Methods

Frequently traveling human phantom (FTHP) dataset

The FTHP dataset has been recently introduced and made freely available by our group⁴² (https://www.kaggle.com/datasets/ukeppendorf/frequently-traveling-human-phantom-fthp-dataset). It provides 557 sMRI scans of a single healthy male (48.6 years old at the first scan) who completed 176 scanning sessions on 116 different MRI scanners at 96 different sites. Most of the sites were private radiological practices and small hospitals. The scanning sessions were performed during a period of 32 months (07/2017–02/2020). In each scanning session, 1–6 (median 4) sMRI were acquired consecutively without repositioning and without delay using the same acquisition sequence. Different scanning sessions from the same MRI scanner differed with respect to the acquisition sequence. Thus, each scanning session represents a different acquisition setting, that is, a different combination of MRI scanner and acquisition sequence. In the following, we will use “acquisition sequence” to denote “combination of MRI scanner and acquisition sequence” to simplify the notation.

All imaging sites were informed that the scans were being acquired for the purpose of MRI-based volumetry. They were asked to use acquisition sequences recommended by the scanner manufacturer. Thus, the FTHP dataset can be considered representative of MRI-based volumetry in everyday clinical routine at non-academic sites. At the time of writing this manuscript (02/2025), more than 4 years after the last scan, the volunteer is still in excellent health.

For the analyses reported in this manuscript, 26 of the 557 sMRI scans were excluded because they were secondary images from the same acquisition (derived from the primary images by post-processing), 3 scans were excluded because of inconsistent DICOM header information (slice location attribute), and one scan was excluded because it was not acquired with a 3D sequence (Supplemental “Excluded scans”). The remaining 527 sMRI scans were submitted to automatic hippocampus volumetry.

The subject provided written informed consent for the retrospective use of the FTHP dataset for research purposes. This was approved by the ethics review board of the general medical council of the state of Hamburg, Germany (reference number PV5930).

The cost of travel and MR scanning for the FTHP dataset was covered by a private company, specializing in quantitative analyses of medical images (MRI, SPECT and PET) for use in clinical practice and drug trials, in order to qualify the structural MRI acquisition sequences used by their customers for MRI-based brain volumetric analyses. As the FTHP dataset was used retrospectively in the current study, no funding was required.

Automatic hippocampus volumetry

The HV in left and right hemispheres were determined for each scan independently with 19 different tools for automatic MRI-based volumetry, including both commercial and research tools (Table 1). The MRI-based volumetry was performed non-centrally by the providers/developers of the volumetry tools. For this purpose, the sMRI scans were pseudonymized in fully randomized order (i.e., without grouping of scans from the same session) and made available to the providers/developers of the volumetry tools in both DICOM and NIFTI format. Each provider/developer returned the HV estimates as a spreadsheet table for central analysis.

Table 1.

Tools used for automatic hippocampus volumetry. The tools are listed in random order not matching the alphabetical order of the pseudonyms A, …, V.

Tool	Method	Publications	Website
mdbrain AI	convolutional neural network (deep learning)	^43–45	www.mediaire.de/en/home
AI-Rad Companion Brain MR version VA40	single-atlas	^46,47	www.siemens-healthineers.com/en-sg/digital-health-solutions/digital-solutions-overview/clinical-decision-support/ai-rad-companion/brain-mr
MINC Toolkit 2	multi-atlas (patch-based)	⁴⁸	github.com/BIC-MNI/minc-toolkit-v2
KAIBA	automatic landmark detection and histogram analysis on the VOI	^49–51	www.nitrc.org/projects/art
ABV (Atlas-based Volumetry)	single-atlas	^19,52,53	https://kliniklengg.ch/schweizerisches-epilepsie-zentrum/diagnostik/medizinische-bildverarbeitung-mri/volumetrische-mri-analyse
CurAIBL	multi-atlas	^54,55	milxcloud.csiro.au/tools/curaibl
mdbrain atlas	atlas-based	⁵⁶	www.mediaire.de/en/home
icobrain dm 5.4	multi-atlas	⁵⁷	www.icometrix.com
Quibim Precision	multi-atlas	^58,59	www.quibim.com
SACHA	multi-template	^60–62
CAT12	multi-atlas		neuro-jena.github.io/cat/
volbrain	multi-atlas	⁶³	volbrain.net/
Neuroreader	multi-atlas	⁶⁴	brainreader.net
icobrain dl 5.15	convolutional neural network (deep learning)	⁶⁵	www.icometrix.com
MALPEM	multi-atlas	^66–68	github.com/ledigchr/MALPEM
Biometrica 1.1	single-atlas	^8,15	www.jung-diagnostics.de/
Freesurfer 7	single-atlas	^69,70	surfer.nmr.mgh.harvard.edu/
NeuroQuant	dynamic atlas	^71,72	www.cortechs.ai/
MAGeT-Brain	modified multi-atlas	⁷³	github.com/CoBrALab/MAGeTbrain

All volumetry tools provided HV in ml (except one tool that provided the hippocampal parenchymal fraction). The raw volumes in ml were used throughout this study without any correction or scaling (e.g., with respect to the total intracranial volume, age or sex).

Within-session test-retest variability of HV estimates

First, the following two steps were performed separately for each volumetry tool and each hemisphere:

For each scanning session, the within-session coefficient of variation (CoV = 100 * standard deviation / mean) of the HV estimates was computed across the 2 to 6 back-to-back scans in this session.

The within-session CoVs were transformed to z-scores relative to the mean and standard deviation of the within-session CoV across the different scanning sessions. Outlier sessions with respect to the within-session CoV were excluded from the computation of mean and standard deviation. A scanning session was considered an outlier if its within-session CoV was greater than CoV₃ + 1.5 × IQR or less than CoV₁ − 1.5 × IQR, where CoV₁ and CoV₃ denote the first and third quartiles of the within-session CoV across all scanning sessions and IQR (interquartile range) is CoV₃-CoV₁.⁷⁴ The Box-Cox transformation determined on the non-outlier sessions was applied prior to the z-score transformation to reduce non-normality effects.

Second, the z-scores were averaged across all volumetry tools and both hemispheres, separately for each scanning session. The resulting mean within-session CoV z-score of a given scanning session characterizes the acquisition sequence used in this scanning session with respect to the within-session test-retest variability of HV estimates independent of a specific volumetry tool.

Poor scanning sessions/acquisition sequences

The distribution of the mean within-session CoV z-score across the scanning sessions was fitted by a Gaussian function. The mean value of the Gaussian was fixed at z-score = 0. The standard deviation (SD) of the resulting Gaussian was used to define a cutoff on the mean within-session CoV z-score to discriminate between “poor” and “good” scanning sessions. More precisely, a scanning session was considered “poor” if its mean within-session CoV z-score was larger than +1.0*SD (of the Gaussian), otherwise it was considered “good” (mean within-session CoV z-score ≤ + 1.0*SD). An acquisition sequence was considered “poor” or “good” according to the categorization of the corresponding scanning session. “Poor” acquisition sequences result in worse (within-session) stability of HV estimates compared with “good” acquisition sequences.

General image quality control metrics

The MRI Quality Control Tool (MRIQC, version 22.0.6)^29,75 was used to compute the contrast-to-noise ratio (CNR) and the full-width-at-half-maximum according to the Analysis of Functional NeuroImages software package⁷⁶ (AFNI-FWHM) for each sMRI scan (average of 3dFWHMx, 3dFWHMy, 3dFWHMz). The CNR characterizes the separation of gray matter and white matter tissue signals relative to the noise level (larger values indicate better quality).⁷⁷ It is computed as $C N R = | {\hat{μ}}_{G M} - {\hat{μ}}_{W M} | /$ $\sqrt{{\hat{σ}}_{G M}^{2} + {\hat{σ}}_{W M}^{2} + {\hat{σ}}_{A i r}^{2}}$ , where $\hat{μ}$ and ${\hat{σ}}^{2}$ indicate the estimated mean value and variance of the signal intensity in the gray matter (GM), white matter (WM), and in the air background surrounding the head (mriqc.readthedocs.io/en/latest/iqms/t1w.html). The CNR has been used previously to assess the association between image quality and brain volumetric measurements.^33,56 The AFNI-FWHM characterizes the noise smoothness of the images (smaller values indicate better quality).

The within-session means of CNR and AFNI-FWHM across all back-to-back repeat scans in a given scanning session were used to characterize the image quality of the corresponding acquisition sequence.

In addition, the width of the acquisition field-of-view (FOV) in left-right direction (FOVx) and in anterior-posterior direction (FOVy) were used to characterize acquisition sequences. The rationale for this was that a narrow field-of-view just barely including the whole head might be associated with edge artefacts that negatively impact automatic volumetry.

Statistical analysis

The aim was to discriminate between good and poor acquisition sequences based on the session means of CNR and AFNI-FWHM and on FOVx and FOVy.

Decision tree analysis using a classification and regression tree (CART) was utilized for this purpose. The growth method of this decision tree variant aims at minimum intra-node impurity, that is, maximum intra-node homogeneity (optimally, all cases of a terminal node belong to the same category). The measure of impurity to be minimized by the CART was the Gini-Simpson diversity index, which is the standard for categorical dependent variables. The Gini-Simpson index of a node characterizes the rate of false categorization when randomly selected cases are randomly categorized according to the distribution of the categories within the node.⁷⁸ The minimum improvement in the Gini-Simpson index required to split a node was set to 0.02. The minimum size of parent and child nodes at each tree branching was set to 20 and 3, respectively. The maximum depth of the CART was set to 2. The CART was free to select the predictor for the first split (CNR, AFNI-FWHM, FOVx or FOVy), that is, it was not forced to select a specific predefined predictor first. The CART was trained with 5-fold cross-validation.

The impact of the CART-based categorization of acquisition sequences on the within-session stability of hippocampus volume estimates was tested by repeated measures analysis of variance (ANOVA) of the within-session CoV using volumetry tool (A, …, V) and hemisphere (left, right) as within-session factors and the CART-based categorization (good session, poor session) as between-session factor.

Results

Twenty-two (4.2%) of the 527 sMRI scans were excluded from the analyses due to non-acceptable image quality according to visual inspection (Supplemental “Excluded scans”): 5 (0.9%) scans were excluded due to very low gray-to-white matter contrast (Supplemental Figure 1a), 12 (2.3%) due to a severe bias field (Supplemental Figure 1b), and 5 (0.9%) sMRI scans were excluded due to a narrow field-of-view not including the entire head (Supplemental Figure 1c). From the remaining 505 sMRI scans, 35 (6.9%) were excluded because HV estimates were not provided for all of the 19 volumetry tools (HV estimates missing for 4/22/4/2/8 scans for tool C/O/P/Q/U). From the remaining 470 sMRI scans, 24 (5.1%) single scans were excluded (no back-to-back repeat scans in the same scanning session). The remaining 446 sMRI scans were included in the analyses. These had been acquired in 122 different scanning sessions with 96 different MRI scanners at 76 different sites (Supplemental Table 1). The number of back-to-back scans was 2 in 8 (6.6%) of the 122 scanning sessions, 3 in 37 (30.3%) sessions, 4 in 67 (54.9%) sessions, 5 in 9 (7.4%) sessions, and 6 in one (0.8%) scanning session. The MRI scanner manufacturer was Siemens in 74 scanning sessions (60.7%), Philips in 36 sessions (29.5%), and GE in 12 sessions (9.8%). The magnetic field strength (B₀) was 1.5 Tesla in 87 scanning sessions (71.3%), 3 Tesla in 34 sessions (27.9%), and 1 Tesla in one session (0.8%). Slice orientation was sagittal in 83 scanning sessions (68.0%), axial in 35 sessions (28.7%), and coronal in 4 sessions (3.3%).

The distribution of the within-session CoV of the left HV across the 122 included scanning sessions is shown in Figure 1, separately for each volumetry tool. The corresponding distributions of the within-session CoV of the right HV are shown in Supplemental Figure 2. The distributions of the within-session mean of left and right HV across the 122 scanning sessions are shown in Supplemental Figures 3 and 4, respectively.

Figure 1.

Within-session coefficient of variation (CoV) of hippocampus volume (HV) estimates in the left hemisphere: histogram of the within-session CoV (=100 × standard deviation / mean across all back-to-back repeat scans in the session) of the HV in the left hemisphere across the 122 included scanning sessions, separately for each volumetry tool (A, B, …, V). The (number of) outliers is shown in red color. Mean ± standard deviation of the CoV estimates were computed across the non-outlier sessions (blue). The scale of the horizontal axis is the same for all volumetry tools except tool I. As a consequence, not all outliers are displayed for all tools. Histograms of the HV within-session CoV in the right hemisphere are shown in Supplemental Figure 2.

The mean value of the within-session CoV across all non-outlier scanning sessions is shown in Figure 2, separately for each volumetry tool. Volumetry tool I showed considerably higher within-session variability compared with all other tools and, therefore, was excluded from the computation of the mean value of the within-session CoV z-score across volumetry tools.

Figure 2.

Mean within-session coefficient of variation (CoV): mean value ± standard error of the mean (SE) of the within-session CoV across all non-outlier scanning sessions, separately for each volumetry tool (A, B, …, V) and both hemispheres (L = left, R = right).

The distribution of the mean (across the remaining 18 volumetry tools and both hemispheres) CoV z-score across the 122 scanning sessions is shown in Figure 3. The Gaussian fit of the distribution revealed a SD of 0.360 z-score points. Using +1.0 SD as cutoff on the mean CoV z-score identified 95 (77.9%) acquisition sequences to be good and 27 (22.1%) to be poor (Supplemental Figure 5).

Figure 3.

Mean value of the within-session CoV z-score across all volumetry tools (except tool I) and both hemispheres: the histogram shows the distribution of the mean z-score across the 122 scanning sessions. The continuous line represents the fit of a Gaussian function with its mean value fixed at z-score = 0. The standard deviation (SD = 0.360) of the Gaussian fit was used to distinguish between “good” acquisition sequences (high within-session stability of HV estimates on average across all volumetry tools) and “poor” acquisition sequences (low within-session stability): “good” if z-score ≤ + 1 SD, “poor” if z-score > + 1 SD.

The number of back-to-back repeat scans within a session did not differ between good and poor acquisition sequences (3.6 ± 0.7 versus 3.7 ± 0.8, t-test p = 0.707). There was a trend towards a higher proportion of poor acquisition sequences at ≤1.5T compared with 3T (26.1% versus 11.8%, Pearson chi-square p = 0.086). The proportion of poor acquisition sequences did not depend on the scan orientation (sagittal versus other, Pearson chi-square p = 0.219). Good and poor acquisition sequences did not differ with respect to the duration of a single scan (279 ± 59 s versus 296 ± 96 s, heteroscedastic t-test p = 0.398).

Histograms of the 4 IQMs (FOVx, FOYy, CNR, AFNI-FWHM) across the 122 scanning sessions are shown in Figure 4.

Figure 4.

Image quality metrics: histograms of the acquisition field-of-view in left-right and anterior-posterior direction (FOVx, FOVy) and the session means of the MRIQC CNR and AFNI-FWHM across the 122 scanning sessions. “Good” and “poor” acquisition sequences are indicated by different colors.

The CART to discriminate between “poor” and “good” acquisition sequences is shown in Figure 5. The FOVx was selected for the first branching, the CNR was selected for a second branching of the cases with adequate FOVx. FOVy and AFNI-FWHM were not included in the model. Overall accuracy, sensitivity and specificity of the CART for the identification of good sessions were 79.5%, 85.3% and 59.2%, respectively (Figure 6). CART-based classification increased the ratio of good-to-poor acquisition sequences from 3.5 among all sequences to 7.4 among the sequences predicted to be good. This was at the expense of losing 15% of the good acquisition sequences. Representative cases are shown in Figure 7.

Figure 5.

Classification and regression tree (CART) to discriminate between “poor” and “good” acquisition sequences: the field-of-view in left-right direction (FOVx) and the contrast-to-noise ratio (CNR) were included in the model. The field-of-view in anterior-posterior direction and the AFNI full-width-at-half-maximum were not included. Correctly identified “poor” acquisition sequences are shown in green, “good” acquisition sequences falsely classified as “poor” are shown in red. The 2 × 2 contingency tables of the CART prediction performance is shown at the bottom.

Figure 6.

Performance of the classification and regression tree (CART) to discriminate between good and poor acquisition sequences: scatter plot of the within-session mean of the MRIQC contrast-to-noise ratio (CNR) versus the field-of-view in left-right direction (FOVx) across the 122 scanning sessions. The dashed lines represent the cutoffs on FOVx and CNR according to the CART (Figure 5).

Figure 7.

Representative acquisition sequences: the acquisition sequence used in scanning session 119 (left) passed the quality control by the classification and regression tree (CART, Figure 5): the field-of-view in left-right direction (FOVx) was above the cutoff of 150 mm on FOVx and the contrast-to-noise ratio (CNR) was above the cutoff 3.147 on the CNR. The within-session coefficient of variation (CoV) z-score (mean across 18 volumetry tools and both hemispheres) was negative, indicating better than average within-session test-retest stability of HV estimates. The acquisition sequence used in scanning session 121 (middle) was characterized by good CNR but poor FOVx. Thus, it was categorized as poor by the CART, in line with the mean within-session CoV of 0.44 clearly above the threshold of 0.36 (Figure 3). The acquisition sequence used in scanning session 70 (right) was acquired with an acceptable FOVx, but the CNR was clearly below the cutoff. Thus, it was categorized as poor, in line with the large mean within-session CoV above the threshold. For each session, the first scan is shown.

The repeated measures ANOVA of the within-session CoV revealed the impact of the volumetry tool to be highly significant (p < 0.001), whereas the tool * CART-based categorization interaction effect did not reach statistical significance (p = 0.595). The estimated marginal mean of the within-session CoV was 0.987% (standard error 0.022%) for acquisition sequences categorized as good by the CART versus 1.134% (0.038%) for acquisition sequences categorized as poor by the CART. The within-session CoV was smaller for acquisition sequences categorized as good by the CART compared with the acquisition sequences categorized as poor for 16 (88.9%) of the 18 included volumetry tools (Figure 8). For the remaining 2 (11.1%) volumetry tools, the within-session CoV was slightly larger for acquisition sequences predicted to be good. For the excluded volumetry tool I, the within-session CoV was significantly smaller for acquisition sequences categorized as good by the CART compared with the acquisition sequences categorized as poor in both hemispheres (left: 5.3% versus 12.3%, p = 0.038; right: 5.7% versus 14.1%, p = 0.015).

Figure 8.

Impact of the classification and regression tree (CART)-based categorization on within-session test-retest stability of hippocampus volume estimates. Within-session coefficient of variation (CoV) of the volume estimates of the left hippocampus in scanning sessions categorized as good by the CART (mean CoV across all good sessions) versus scanning sessions categorized as poor by the CART (mean CoV across all poor sessions), separately for each of the 18 included volumetry tools. The corresponding results for the right hippocampus volume are shown in Supplemental Figure 6.

Discussion

The primary finding of this study was that a simple CART model for binary classification of acquisition sequences based on only two general IQM (estimated from human phantom scans acquired with the considered acquisition sequences) provides useful accuracy for the discrimination between acquisition sequences with good versus poor (within-session) test-retest stability of HV estimates (Figures 5 and 6). This was relatively independent of the volumetry tool (Figure 8), suggesting that the same CART is beneficial with different volumetry tools.

A typical application scenario is as follows. For a given acquisition sequence at a given MR scanner, the freely available MRIQC tool would be used to obtain FOVx and CNR from human phantom scans acquired in this setting. If the CART predicts the acquisition sequence to be poor with respect to within-session stability of hippocampus volumetry, the acquisition parameters of the sequence should be adjusted, particularly with the aim of improving FOVx and/or CNR. We hypothesize that this will improve the stability of hippocampus volumetry in clinical routine across sites and scanners.

IQM

From the whole set of 68 IQM provided by the MRIQC tool, only two were considered for the discrimination between good and poor acquisition sequences with respect to test-retest stability of hippocampus volumetry, CNR and AFNI-FWHM, because these were expected to be particularly useful for this task.^33,56,79 The aim was to keep the prediction model not only easy-to-use but also easy-to-interpret by avoiding possible collinearity issues among a larger set of predictors. Collinearity does not affect the performance of the whole model but strongly limits its interpretability with respect to individual predictors. MRIQC IQM that may be particularly sensitive to specific characteristics of the individual human phantom's brain were not used. For example, MRIQC IQM that characterize the overlap of the tissue probability maps from the individual image with the corresponding normal tissue probability maps were disregarded for this reason. MRIQC signal-to-noise ratio metrics using the air background as a sole noise reference were also disregarded,⁸⁰ since the noise characteristics of the air signal often is not a reliable measure of the image quality of the tissue segments.^75,81,82 The coefficient of joint variation ( $CJV = ({\hat{σ}}_{G M}^{2} + {\hat{σ}}_{W M}^{2}) / | {\hat{μ}}_{G M} - {\hat{μ}}_{W M} |$ , mriqc.readthedocs.io/en/latest/iqms/t1w.html)⁸³ was disregarded, because it showed a very strong inverse correlation with the CNR (Spearman correlation coefficient of the session means across the 122 included sessions = −0.992, p < 0.001).

IQM and multi-variable IQM-based models for automatic or semi-automatic quality assessment of structural brain MRI have been proposed by various groups^{26,79,84–88} (systematic review⁸¹). Some of the volumetry tools used in the current study provide their own IQM. The rationale for using MRIQC IQM was that the MRIQC software is freely available (both for local implementation and as online service²⁹) so that users can have access to these IQM independent of the volumetry tool they are using. Furthermore, IQM provided by a given volumetry tool may depend on that tool resulting in unwanted bias.

The proposed application scenario assumes that the variability of the IQM between sMRI scans of different human phantoms acquired with the same scanner and the same acquisition sequence is smaller than their difference between “poor” and “good” scanning sessions acquired with the same human phantom. Pouwels and co-workers assessed the between-site variability of MRIQC IQM in 3D T1-weighted scans between 5 different 3T scanners from 3 different manufacturers (Philips, GE, Siemens) at 5 different sites.⁸⁹ The acquisition sequences had been previously harmonized between the sites according to ADNI-3 recommendations.⁹⁰ At each site, 28 healthy subjects were selected and scanned at this site. The subjects were matched for age, sex and education between the sites. The mean (across the 5 sites) of the between-subjects CoV was smaller than the between-sites CoV of the site-specific mean values for each of the MRIQC IQM tested by Pouwels and co-workers.⁸⁹ For the AFNI-FWHM, the mean (across the 5 sites) of the between-subjects CoV (3.05%) was about 3 times smaller than the between-sites CoV of the site-specific mean values (9.8%). For the CJV, the mean of the between-subjects CoV (7.5%) was about 1.4 times smaller than the between-sites CoV of the site-specific mean values (10.6%). A similar relationship can be assumed for the CNR due to its strong inverse correlation with the CJV (see above). This demonstrates that the variability of the MRIQC CNR IQM is smaller between different healthy subjects than between sites/scanners even in case of strict harmonization of the T1 acquisition sequences. Thus, it can be safely assumed that the variability of these IQM between different human phantoms is much smaller than the difference between “poor” and “good” acquisition sequences in non-harmonized settings.

We hypothesize that the exclusion of poor acquisition sequences is useful also for retrospective multi-site harmonization of sMRI scans, either using conventional approaches such as ComBat⁹¹ or deep learning-based methods such as SynthSR,⁹² MURD¹⁸ or CALAMITI.⁹³ Retrospective image harmonization may not only be less effective for poor acquisition sequences, but including poor acquisition sequences in the procedure may affect the model's performance also in good sequences.

Volumetry tools

A secondary finding of this study was the rather large difference between volumetry tools regarding the within-session stability of HV estimates. Among the 19 considered volumetry tools, the average of the HV within-session CoV across all non-outlier scanning sessions ranged between 0.3 and 5.5% (Figure 2).

Another secondary finding was the rather large difference between the volumetry tools regarding their sensitivity with respect to the acquisition sequence.^17,20 The between-sessions CoV of the within-session mean of the HV estimates varied between about 1% and almost 7% (Supplemental Figure 7). The tight correlation of the between-sessions CoV of left and right hemisphere (Supplemental Figure 7) suggests that the between-sessions CoV was mainly driven by differences in the image characteristics between the scanners and acquisition protocols used for the different scanning sessions. In multi-site/multi-scanner settings, volumetry tools with low between-sessions CoV may be preferred.

A third secondary finding was the rather large difference of HV estimates between different volumetry tools, in line with previous studies.⁹⁴ Among the 18 volumetry tools that provide HV estimates in ml (all except tool G), the average of the HV within-session mean across all non-outlier scanning sessions ranged between 2.15 and 5.99 ml for the left hemisphere and between 1.93 and 5.55 ml for the right hemisphere (Supplemental Figures 3 and 4). These differences are largely explained by different anatomical hippocampus definitions (e.g., with or without subiculum). The latter prompted an international group of experts to propose a harmonized protocol for the segmentation of the hippocampus on MR images both manually^95,96 and with software tools for fully automatic segmentation.⁹⁷ The fact that not all tools for automatic HV estimation are in accordance with this recommendation complicates the interpretation of HV estimates, not only because it limits the utility of longitudinal HV estimates obtained with different volumetry tools, but also because age-related decline and disease-related atrophy differ between hippocampal subregions.^98,99

Limitations

Limitations of the current study include the following. First, a CART and only four general IQM were tested for the discrimination between good and poor acquisition sequences with respect to test-retest stability of hippocampus volumetry. More complex classification methods often provide better performance compared to low-dimensional decision trees, but at the expense of lower transparency (“black-box” methods).¹⁰⁰ Low-dimensional decision tree models allow the user to verify that the classification decision made by the algorithm is plausible and coherent, which improves their acceptance for widespread clinical use.¹⁰¹ The CART variant among decision tree models uses a binary split at each node, which further simplifies the interpretation.

Second, the within-session CoV z-score averaged across all volumetry tools (and both hemispheres) was used for the definition of good versus poor acquisition sequences. Only one of the 19 considered volumetry tools was excluded, because it resulted in 3 to 16 times higher within-session variability compared with all other tools (Figure 2). The rationale for averaging over different volumetry tools was to generate a model for the classification of acquisition sequences that is independent of a specific volumetry tool. However, it is expected that a model trained for a specific volumetry tool can provide greater improvement of test-retest stability of HV estimates compared with the generic CART proposed here. The added value most likely is most pronounced for those volumetry tools that did not benefit from the generic CART (Figure 8, Supplemental Figure 6). All included volumetry tools have been validated technically and/or clinically for use in clinical practice and/or research (although the corresponding data may not have been published for each tool¹).

Third, test-retest stability of HV estimates was characterized by the CoV z-score across consecutive back-to-back repeat scans that were acquired without delay and without repositioning during a single scanning session. The FTHP dataset does not allow testing test-retest stability across scans with repositioning of the head.

Fourth, short-term test-retest stability of HV estimates across back-to-back repeat scans in the same scanning session may be improved by longitudinal processing.^102,103 This could not be tested here, as the participating developers/providers of the volumetry tools were blinded with respect to the scanning session.

Fifth, the threshold on the CoV z-score for the discrimination between “good” and “poor” scanning sessions was derived from the distribution of the mean within-session CoV z-score across all volumetry tools, that is, from the same data to which it then was applied. This approach will always identify “poor” sessions, even if all considered sequences are actually adequate. In the latter case, the “poor” sequences are not really bad but just “worse” than the “good” sequences.

Sixth, the study was blinded with respect to the volumetry tools. The developers/providers of the volumetry tools had been assured that the identity of the volumetry tools is not disclosed to avoid potential conflicts of interest, particularly among the commercial participants. However, we do not consider this a major limitation of the study, since the FTHP dataset does not allow direct comparison of volumetry tools with respect to clinically relevant tasks, e.g., detection of Alzheimer's disease or prediction of progression from mild cognitive impairment to dementia. Furthermore, the findings regarding within-session and between-sessions variability of HV estimates do not allow reliable conclusions regarding the performance of the tools in these clinical tasks, because volumetry tools that have the strongest inbuilt biases are likely to appear to be the most stable (a tool that simply returns a value of 3.0 ml for all hippocampi is perfectly stable but it is clearly useless). Most volumetry tools involve some kind of regularization, and increasing the amount of regularization will reduce variance but often at the expense of increasing bias.

Seventh, the decision tree model was derived from the same dataset on which it was tested. This might have resulted in overly optimistic performance estimates due to overfitting, despite the use of 5-fold cross-validation. Thus, the proposed decision tree (including its thresholds) requires validation in independent (out-of-distribution) test datasets.

Eighth, this study had an exclusive focus on test-retest stability of hippocampus volumetry. Thus, the decision tree model may not be transferred to MRI-based volumetry of other brain regions without adjustment.

Finally, motion artefacts present a significant problem in clinical practice and also cannot be ruled out in human phantom scans. This could result in the erroneous categorization of an acquisition sequence as “good” or as “poor” with respect to test-retest stability of hippocampus volumetry. Therefore, it is crucial to strictly avoid motion in human phantom scans intended for characterizing the acquisition sequence used (as implied by the term “phantom”). The IQM employed in the current study to train the CART model (CNR, AFNI-FWHM, FOVx, FOVy) are not particularly sensitive to motion and are therefore not useful for detecting or excluding (subtle) motion. Of the general IQM provided by the MRIQC software package, the entropy-focus criterion (EFC), which is based on the Shannon entropy of the voxel intensities, is sensitive to ghosting and blurring induced by head motion.²⁹ Thus, the following two-step approach could be used to characterise an acquisition sequence as either “good” or “poor” with respect to test-retest stability in hippocampus volumetry based on human phantom scans: First, the EFC is used to check for relevant motion artefacts in the given human phantom scans. If the scans pass this first step, indicating that there are no relevant motion artefacts, the proposed CART model can be used in the second step to classify the acquisition sequence with respect to test-retest stability in hippocampus volumetry. If the human phantom scans do not pass the first step, indicating relevant motion, human phantom scanning should be repeated using the same scanner with the same acquisition sequence. In the current study, the EFC (within-session mean) did not differ significantly between the 95 “good” scanning sessions and the 27 “poor” scanning sessions when assessed using the heteroscedastic independent samples t-test (EFC = 0.601 ± 0.085 versus EFC = 0.615 ± 0.063, p = 0.358). This suggests that motion was not a major factor in discriminating between “good” and “poor” scanning sessions in the FTHP dataset. This was to be expected, given that the healthy man who served as the human phantom for the FTHP dataset was particularly motivated to avoid any head motion during the MR scans.

In conclusion, an easy-to-use and easy-to-interpret IQM-based decision tree model provides useful performance for the discrimination between T1-weighted sequences associated with good versus poor test-retest stability of hippocampus volumetry. A narrow field-of-view in left-right direction (<150 mm) and low contrast-to-noise ratio were identified as risk factors for low test-retest stability of HV volumetry. This might be useful to increase the statistical power for the detection of cross-sectional differences or longitudinal changes in large-scale multicenter hippocampal volumetry research.

Supplemental Material

sj-docx-1-alz-10.1177_13872877251380301 - Supplemental material for Easy-to-use and easy-to-interpret quality control of 3D gradient echo T1-weighted MR acquisition sequences for improved test-retest stability of MRI-based hippocampus volumetry

Supplemental material, sj-docx-1-alz-10.1177_13872877251380301 for Easy-to-use and easy-to-interpret quality control of 3D gradient echo T1-weighted MR acquisition sequences for improved test-retest stability of MRI-based hippocampus volumetry by Ralph Buchert, Per Suppa, Babak A Ardekani, Fuensanta Bellvís Bataller, Pierrick Bourgeat, Pierrick Coupé, Robert Dahnke, Gabriel A Devenyi, Simon Fristed Eskildsen, Clara Fischer, Jose Vincente Manjón Herrera, Christian Ledig, Andreas Lemke, Bénédicte Maréchal, Roland Opfer, Diana M Sima, Lothar Spies, Aziz M Ulug and Hans-Jürgen Huppertz in Journal of Alzheimer's Disease

Footnotes

Acknowledgements

The authors have no acknowledgments to report.

ORCID iDs

Ralph Buchert

Gabriel A Devenyi

Ethical considerations

This study was approved by the ethics review board of the general medical council of the state of Hamburg, Germany (reference number PV5930).

Consent to participate

The volunteer provided written informed consent for the retrospective use of the FTHP dataset for research purposes.

Consent for publication

Not applicable

Author contribution(s)

Ralph Buchert: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Software; Validation; Writing – original draft.

Per Suppa: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Validation; Writing – original draft.

Babak A Ardekani: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Fuensanta Bellvís Bataller: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Pierrick Bourgeat: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Pierrick Coupé: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Robert Dahnke: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Gabriel A Devenyi: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Simon Fristed Eskildsen: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Clara Fischer: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Jose Vincente Manjón Herrera: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Christian Ledig: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Andreas Lemke: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Bénédicte Maréchal: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Roland Opfer: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Diana M Sima: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Lothar Spies: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Aziz M Ulug: Formal analysis; Investigation; Methodology; Resources; Software; Writing – review & editing.

Hans-Jürgen Huppertz: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Software; Validation; Writing – original draft.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Fuensanta Bellvís Bataller reports fees from Quibim related to this work. Andreas Lemke is CEO and shareholder of mediaire GmbH. Bénédicte Maréchal is employee of Siemens Healthineers. Roland Opfer is employee of jung diagnostics. Diana M. Sima is employee of icometrix. Lothar Spies is employee of jung diagnostics. Aziz M. Ulug is employee of Cortechs Labs. Ralph Buchert and Babak Ardekani are Editorial Board Members of this journal but were not involved in the peer-review process of this article nor had access to any information regarding its peer-review. All other authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The FTHP dataset is freely available at . Hippocampus volume estimates (anonymized with respect to the volumetry tools) are available from the corresponding author on reasonable request.

Supplemental material

Supplemental material for this article is available online.

References

Pemberton

Zaki

LAM

Goodkin

, et al. Technical and clinical validation of commercial automated volumetric MRI tools for dementia diagnosis-a systematic review. Neuroradiology 2021; 63: 1773–1789.

Raji

Benzinger

TLS

. Overview of MR imaging volumetric quantification in neurocognitive disorders. Top Magn Reson Imaging 2019; 28: 311–315.

Risacher

Saykin

. Neuroimaging biomarkers of neurodegenerative diseases and dementia. Semin Neurol 2013; 33: 386–416.

Vernooij

Pizzini

Schmidt

, et al. Dementia imaging in clinical practice: a European-wide survey of 193 centres and conclusions by the ESNR working group. Neuroradiology 2019; 61: 633–642.

Dubois

Feldman

Jacova

, et al. Research criteria for the diagnosis of Alzheimer's disease: revising the NINCDS-ADRDA criteria. Lancet Neurol 2007; 6: 734–746.

Frisoni

Fox

Jack

, et al. The clinical use of structural MRI in Alzheimer disease. Nat Rev Neurol 2010; 6: 67–77.

McEvoy

Brewer

. Quantitative structural MRI for early detection of Alzheimer's disease. Expert Rev Neurother 2010; 10: 1675–1688.

Suppa

Anker

Spies

, et al. Fully automated atlas-based hippocampal volumetry for detection of Alzheimer's disease in a memory clinic setting. J Alzheimers Dis 2015; 44: 183–193.

Teipel

Kilimann

Thyrian

, et al. Potential role of neuroimaging markers for early diagnosis of dementia in primary care. Curr Alzheimer Res 2018; 15: 18–27.

10.

Ten Kate

Barkhof

Boccardi

, et al. Clinical validity of medial temporal atrophy as a biomarker for Alzheimer's disease in the context of a structured 5-phase development framework. Neurobiol Aging 2017; 52: 167–182.e1.

11.

Jack

Bennett

Blennow

, et al. NIA-AA research framework: toward a biological definition of Alzheimer's disease. Alzheimers Dement 2018; 14: 535–562.

12.

Therriault

Schindler

Salvadó

, et al. Biomarker-based staging of Alzheimer disease: rationale and clinical applications. Nat Rev Neurol 2024; 20: 232–244.

13.

Buchert

Lange

Suppa

, et al.

Magnetic resonance imaging-based hippocampus volume for prediction of dementia in mild cognitive impairment: why does the measurement method matter so little?

Alzheimers Dement 2018; 14: 976–978.

14.

Hill

DLG

Schwarz

Isaac

, et al. Coalition against major diseases/European medicines agency biomarker qualification of hippocampal volume for enrichment of clinical trials in predementia stages of Alzheimer's disease. Alzheimers Dement 2014; 10: 421–429.

15.

Suppa

Hampel

Spies

, et al. Fully automated atlas-based hippocampus volumetry for clinical routine: validation in subjects with mild cognitive impairment from the ADNI cohort. J Alzheimers Dis 2015; 46: 199–209.

16.

Cavedo

Suppa

Lange

, et al. Fully automatic MRI-based hippocampus volumetry using FSL-FIRST: intra-scanner test-retest stability, inter-field strength variability, and performance as enrichment biomarker for clinical trials using prodromal target populations at risk for Alzheimer's disease. J Alzheimers Dis 2017; 60: 151–164.

17.

Kruggel

Turner

Muftuler

, et al. Impact of scanner hardware and imaging protocol on image quality and compartment volume precision in the ADNI cohort. Neuroimage 2010; 49: 2123–2133.

18.

Liu

Yap

. Learning multi-site harmonization of magnetic resonance images without traveling human phantoms. Commun Eng 2024; 3: 6.

19.

Opfer

Suppa

Kepp

, et al. Atlas based brain volumetry: how to distinguish regional volume changes due to biological or physiological effects from inherent noise of the methodology. Magn Reson Imaging 2016; 34: 455–461.

20.

Takao

Hayashi

Ohtomo

. Effects of the use of multiple scanners and of scanner upgrade in longitudinal voxel-based morphometry studies. J Magn Reson Imaging 2013; 38: 1283–1291.

21.

Wolz

Schwarz

, et al. Robustness of automated hippocampal volumetry across magnetic resonance field strengths and repeat images. Alzheimers Dement 2014; 10: 430–438.

22.

Alexander-Bloch

Clasen

Stockman

, et al. Subtle in-scanner motion biases automated measurement of brain anatomy from in vivo MRI. Hum Brain Mapp 2016; 37: 2385–2397.

23.

Backhausen

Herting

Buse

, et al. Quality control of structural MRI images applied using FreeSurfer-a hands-on workflow to rate motion artifacts. Front Neurosci 2016; 10: 558.

24.

Reuter

Tisdall

Qureshi

, et al. Head motion during MRI acquisition reduces gray matter volume and thickness estimates. Neuroimage 2015; 107: 107–115.

25.

Kaufman

Kramer

Crooks

, et al. Measuring signal-to-noise ratios in MR imaging. Radiology 1989; 173: 265–267.

26.

Alfaro-Almagro

Jenkinson

Bangerter

, et al. Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 2018; 166: 400–424.

27.

Miller

Alfaro-Almagro

Bangerter

, et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat Neurosci 2016; 19: 1523–1536.

28.

Van Horn

Toga

. Human neuroimaging as a “Big Data” science. Brain Imaging Behav 2014; 8: 323–331.

29.

Esteban

Birman

Schaer

, et al. MRIQC: advancing the automatic prediction of image quality in MRI from unseen sites. PLoS One 2017; 12: e0184661.

30.

Gunter

Bernstein

Borowski

, et al. Measurement of MRI scanner performance with the ADNI phantom. Med Phys 2009; 36: 2193–2205.

31.

Keenan

Ainslie

Barker

, et al. Quantitative magnetic resonance imaging phantoms: a review and the need for a system phantom. Magn Reson Med 2018; 79: 48–61.

32.

Price

Axel

Morgan

, et al. Quality assurance methods and phantoms for magnetic-resonance-imaging – report of Aapm nuclear-magnetic-resonance task group No-1. Med Phys 1990; 17: 287–295.

33.

Duchesne

Chouinard

Potvin

, et al. The Canadian dementia imaging protocol: harmonizing national cohorts. J Magn Reson Imaging 2019; 49: 456–465.

34.

Gouttard

Styner

Prastawa

, et al. Assessment of reliability of multi-site neuroimaging via traveling phantom study. Med Image Comput Comput Assist Interv 2008; 11: 263–270.

35.

Hawco

Dickie

Herman

, et al. A longitudinal multi-scanner multimodal human neuroimaging dataset. Sci Data 2022; 9: 332.

36.

Hawco

Viviano

Chavez

, et al. A longitudinal human phantom reliability study of multi-center T1-weighted, DTI, and resting state fMRI data. Psychiatry Res Neuroimaging 2018; 282: 134–142.

37.

Palacios

Martin

Boss

, et al. Toward precision and reproducibility of diffusion tensor imaging: a multicenter diffusion phantom and traveling volunteer study. AJNR Am J Neuroradiol 2017; 38: 537–545.

38.

Shinohara

Nair

, et al. Volumetric analysis from a harmonized multisite brain MRI study of a single subject with multiple sclerosis. AJNR Am J Neuroradiol 2017; 38: 1501–1509.

39.

Treit

Stolz

Rickard

, et al. Lifespan volume trajectories from non-harmonized T1-weighted MRI do not differ after site correction based on traveling human phantoms. Front Neurol 2022; 13: 826564.

40.

Zhu

Qiu

, et al. Quantification of accuracy and precision of multi-center DTI measurements: a diffusion phantom and human brain study. Neuroimage 2011; 56: 1398–1411.

41.

Garcia-Dias

Scarpazza

Baecker

, et al. Neuroharmony: a new tool for harmonizing volumetric MRI data from unseen scanners. Neuroimage 2020; 220: 117127.

42.

Opfer

Krüger

Spies

, et al. Automatic segmentation of the thalamus using a massively trained 3D convolutional neural network: higher sensitivity for the detection of reduced thalamus volume by improved inter-scanner stability. Eur Radiol 2023; 33: 1852–1861.

43.

Rudolph

Rueckel

Dopfert

, et al. Artificial intelligence-based rapid brain volumetry substantially improves differential diagnosis in dementia. Alzheimers Dement (Amst) 2024; 16: e70037.

44.

Haase

Lehnen

Schmeel

, et al. External evaluation of a deep learning-based approach for automated brain volumetry in patients with huntington's disease. Sci Rep 2024; 14: 9243.

45.

Purrer

Pohl

Lueckel

, et al. Artificial-intelligence-based MRI brain volumetry in patients with essential tremor and tremor-dominant Parkinson's disease. Brain Commun 2023; 5: fcad271.

46.

Schmitter

Roche

Marechal

, et al. An evaluation of volume-based morphometry for prediction of mild cognitive impairment and Alzheimer's disease. Neuroimage Clin 2015; 7: 7–17.

47.

Kober

and Siemens Healthineers Morphometry R&D Team . Letter to the editor regarding article “technical and clinical validation of commercial automated volumetric MRI tools for dementia diagnosis-a systematic review” Neuroradiology 2022; 64: 847–848.

48.

Coupe

Manjon

Fonov

, et al. Patch-based segmentation using expert priors: application to hippocampus and ventricle segmentation. Neuroimage 2011; 54: 940–954.

49.

Ardekani

Izadi

Hadid

, et al. Effects of sex, age, and apolipoprotein E genotype on hippocampal parenchymal fraction in cognitively normal older adults. Psychiatry Res Neuroimaging 2020; 301: 111107.

50.

Goff

Zeng

Ardekani

, et al. Association of hippocampal atrophy with duration of untreated psychosis and molecular biomarkers during initial antipsychotic treatment of first-episode psychosis. JAMA Psychiatry 2018; 75: 370–378.

51.

Ardekani

Convit

Bachman

. Analysis of the MIRIAD data shows sex differences in hippocampal atrophy progression. J Alzheimers Dis 2016; 50: 847–857.

52.

Huppertz

Kroll-Seger

Kloppel

, et al. Intra- and interscanner variability of automated voxel-based volumetry based on a 3D probabilistic atlas of human cerebral structures. Neuroimage 2010; 49: 2216–2224.

53.

Huppertz

Möller

Südmeyer

, et al. Differentiation of neurodegenerative parkinsonian syndromes by volumetric magnetic resonance imaging analysis and support vector machine classification. Mov Disord 2016; 31: 1506–1517.

54.

Bourgeat

Dore

Fripp

, et al. Web-based automated PET and MR quantification. AIBL research group. P3-178. Alzheimers Dement 2015; 11: 698.

55.

Acosta

Fripp

Dore

, et al. Cortical surface mapping using topology correction, partial flattening and 3D shape context-based non-rigid registration for use in quantifying atrophy in Alzheimer's disease. J Neurosci Methods 2012; 205: 96–109.

56.

Dieckmeyer

Roy

Senapati

, et al. Effect of MRI acquisition acceleration via compressed sensing and parallel imaging on brain volumetry. MAGMA 2021; 34: 487–497.

57.

Struyfs

Sima

Wittens

, et al. Automated MRI volumetry as a diagnostic tool for Alzheimer's disease: validation of icobrain dm. Neuroimage Clin 2020; 26: 102243.

58.

Franco

Granata

Fusco

, et al. Artificial intelligence and radiation effects on brain tissue in glioblastoma patient: preliminary data using a quantitative tool. Radiol Med 2023; 128: 813–827.

59.

Zaki

LAM

Vernooij

Smits

, et al. Comparing two artificial intelligence software packages for normative brain volumetry in memory clinic imaging. Neuroradiology 2022; 64: 1359–1366.

60.

Chupin

Mukuna-Bantumbakulu

Hasboun

, et al. Anatomically constrained region deformation for the automated segmentation of the hippocampus and the amygdala: method and validation on controls and patients with Alzheimer's disease. Neuroimage 2007; 34: 996–1019.

61.

Chupin

Hammers

Liu

, et al. Automatic segmentation of the hippocampus and the amygdala driven by hybrid constraints: method and validation. Neuroimage 2009; 46: 749–761.

62.

Fillon

Colliot

Hasboun

, et al. Multi-template approaches for segmenting the hippocampus: the case of the SACHA software. In: organization for human brain mapping (OHBM), Honolulu, United States, Jun 2015, 2015, p.5.

63.

Manjón

Coupé

. Volbrain: an online MRI brain volumetry system. Front Neuroinform 2016; 10: 30.

64.

Ahdidan

Raji

DeYoe

, et al. Quantitative neuroimaging software for clinical assessment of hippocampal volumes on MR imaging. J Alzheimers Dis 2016; 49: 723–732.

65.

Simarro

Meyer

Van Eyndhoven

, et al. A deep learning model for brain segmentation across pediatric and adult populations. Sci Rep 2024; 14: 11735.

66.

Ledig

Heckemann

Hammers

, et al. Robust whole-brain segmentation: application to traumatic brain injury. Med Image Anal 2015; 21: 40–58.

67.

Heckemann

Ledig

Gray

, et al. Brain extraction using label propagation and group agreement: pincram. PLoS One 2015; 10: e0129211.

68.

Ledig

Schuh

Guerrero

, et al. Structural brain imaging in Alzheimer's disease and mild cognitive impairment: biomarker analysis and shared morphometry database. Sci Rep 2018; 8: 11258.

69.

Fischl

Salat

Busa

, et al. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 2002; 33: 341–355.

70.

Fischl

. Freesurfer. Neuroimage 2012; 62: 774–781.

71.

Brewer

Magda

Airriess

, et al. Fully-automated quantification of regional brain volumes for improved detection of focal atrophy in Alzheimer disease. Am J Neuroradiol 2009; 30: 578–580.

72.

Azab

Carone

Ying

, et al. Mesial temporal sclerosis: accuracy of NeuroQuant versus neuroradiologist. Am J Neuroradiol 2015; 36: 1400–1406.

73.

Pipitone

Park

Winterburn

, et al. Multi-atlas segmentation of the whole hippocampus and subfields using multiple automatically generated templates. Neuroimage 2014; 101: 494–512.

74.

Tuckey

. Exploratory data analysis. Philippines: Addison-Wesley, 1977.

75.

Reguig

Chupin

Dary

, et al. Learning brain MRI quality control: a multi-factorial generalization problem. arXiv 2022; 2205.15898.

76.

Cox

. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res 1996; 29: 162–173.

77.

Magnotta

Friedman

and FIRST BIRN . Measurement of signal-to-noise and contrast-to-noise in the fBIRN multicenter imaging study. J Digit Imaging 2006; 19: 140–147.

78.

Simpson

. Measurement of diversity. Nature 1949; 163: 688–688.

79.

Kim

Irimia

Hobel

, et al. The LONI QC system: a semi-automated, web-based and freely-available environment for the comprehensive quality control of neuroimaging data. Front Neuroinform 2019; 13: 60.

80.

Dietrich

Raya

Reeder

, et al. Measurement of signal-to-noise ratios in MR images: influence of multichannel coils, parallel imaging, and reconstruction filters. J Magn Reson Imaging 2007; 26: 375–385.

81.

Hendriks

Mutsaerts

Joules

, et al. A systematic review of (semi-)automatic quality control of T1-weighted MRI scans. Neuroradiology 2024; 66: 31–42.

82.

Marques

Kober

Krueger

, et al. MP2RAGE, A self bias-field corrected sequence for improved segmentation and T1-mapping at high field. Neuroimage 2010; 49: 1271–1281.

83.

Ganzetti

Wenderoth

Mantini

. Intensity inhomogeneity correction of structural MR images: a data-driven approach to define input algorithm parameters. Front Neuroinform 2016; 10: 10.

84.

Klapwijk

van de Kamp

van der Meulen

, et al. Qoala-T: a supervised-learning tool for quality control of FreeSurfer segmented MRI data. Neuroimage 2019; 189: 116–129.

85.

Mortamet

Bernstein

Jack

, et al. Automatic quality assessment in structural brain magnetic resonance imaging. Magn Reson Med 2009; 62: 365–372.

86.

Shehzad

Giavasis

, et al. The preprocessed connectomes project quality assessment protocol – a resource for measuring the quality of MRI data. Front Neurosci Conference Abstract: Neuroinformatics 2015. doi: 10.3389/conf.fnins.2015.91.00047

87.

Smith

Jenkinson

Woolrich

, et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 2004; 23(Suppl 1): S208–S219.

88.

Woodard

Carley-Spencer

. No-reference image quality metrics for structural MRI. Neuroinformatics 2006; 4: 243–262.

89.

Pouwels

PJW

Vriend

Liu

, et al. Global multi-center and multi-modal magnetic resonance imaging study of obsessive-compulsive disorder: harmonization and monitoring of protocols in healthy volunteers and phantoms. Int J Methods Psychiatr Res 2023; 32: e1931.

90.

Gunter

Thostenson

Borowski

, et al. ADNI-3 MRI Protocol, 2017.

91.

Fortin

Cullen

Sheline

, et al. Harmonization of cortical thickness measurements across scanners and sites. Neuroimage 2018; 167: 104–120.

92.

Iglesias

Billot

Balbastre

, et al. SynthSR: a public AI tool to turn heterogeneous clinical brain scans into high-resolution T1-weighted images for 3D morphometry. Sci Adv 2023; 9: eadd3607.

93.

Zuo

Dewey

Liu

, et al. Unsupervised MR harmonization by learning disentangled representations using information bottleneck theory. Neuroimage 2021; 243: 118569.

94.

Næss-Schmidt

Tietze

Blicher

, et al. Automatic thalamus and hippocampus segmentation from MP2RAGE: comparison of publicly available methods and implications for DTI quantification. Int J Comput Ass Rad 2016; 11: 1979–1991.

95.

Boccardi

Bocchetta

Morency

, et al. Training labels for hippocampal segmentation based on the EADC-ADNI harmonized hippocampal protocol. Alzheimers Dement 2015; 11: 175–183.

96.

Frisoni

Jack

Bocchetta

, et al. The EADC-ADNI harmonized protocol for manual hippocampal segmentation on magnetic resonance: evidence of validity. Alzheimers Dement 2015; 11: 111–125.

97.

Wolf

Bocchetta

Preboske

, et al. Reference standard space hippocampus labels according to the European Alzheimer's disease consortium-Alzheimer's disease neuroimaging initiative harmonized protocol: utility in automated volumetry. Alzheimers Dement 2017; 13: 893–902.

98.

De Flores

La Joie

Chételat

. Structural imaging of hippocampal subfields in healthy aging and Alzheimer's disease. Neuroscience 2015; 309: 29–50.

99.

Zhang

Xie

Cheng

, et al. Hippocampal subfield volumes in mild cognitive impairment and Alzheimer's disease: a systematic review and meta-analysis. Brain Imaging Behav 2023; 17: 778–793.

100.

Mudali

Teune

Renken

, et al. Classification of Parkinsonian syndromes from FDG-PET brain data using decision trees with SSM/PCA features. Comput Math Methods Med 2015; 2015: 136921.

101.

Nazari

Kluge

Apostolova

, et al. Explainable AI to improve acceptance of convolutional neural networks for automatic classification of dopamine transporter SPECT in the diagnosis of clinically uncertain parkinsonian syndromes. Eur J Nucl Med Mol Imaging 2022; 49: 1176–1186.

102.

Ardekani

and Alzheimer’s Disease Neuroimaging Initiative . A new approach to symmetric registration of longitudinal structural MRI of the human brain. J Neurosci Methods 2022; 373: 109563.

103.

Ashburner

. A fast diffeomorphic image registration algorithm. Neuroimage 2007; 38: 95–113.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

1.92 MB