Abstract
Background:
Multiple neurological disorders, including Alzheimer’s disease (AD), mesial temporal sclerosis, and mild traumatic brain injury, manifest with volume loss on brain MRI. Subtle volume loss is particularly seen early in AD. While prior research has demonstrated the value of the additional information provided by quantitative neuroimaging, very few applications have been approved for clinical use. Here we describe a US FDA-cleared software program, Neuroreader™, for the clinical assessment of hippocampal volume on brain MRI.
Objective:
To present the validation of hippocampal volumetrics on a clinical software program.
Method:
Subjects (n = 99) were drawn from the Alzheimer’s Disease Neuroimaging Initiative study. Volumetric brain MR imaging was acquired on 1.5 T (n = 59) and 3.0 T (n = 40) scanners in participants with manual hippocampal segmentation. Fully automated hippocampal segmentation and measurement were done using a multiple-atlas approach. The Dice Similarity Coefficient (DSC) measured the spatial overlap between Neuroreader™ and gold standard manual segmentation on a scale from 0 to 1, with 0 denoting no overlap and 1 representing complete agreement. DSC values from 1.5 T and 3.0 T scanners were compared using independent-samples t-tests.
Results:
In the bilateral hippocampus, mean DSC was 0.87, with a range of 0.78–0.91 (right hippocampus) and 0.76–0.91 (left hippocampus). Agreement between automated and manual segmentation was similar at 1.5 T (DSC = 0.879) and 3.0 T (DSC = 0.872).
Conclusion:
This work provides a description and validation of a software program that can be applied in measuring hippocampal volume, a biomarker that is frequently abnormal in AD and other neurological disorders.
INTRODUCTION
The hippocampus is a temporal lobe structure vital to memory [1, 2] and loses volume in multiple brain disorders, including Alzheimer’s disease (AD), depression, schizophrenia, traumatic brain injury, post-traumatic stress disorder, and mesial temporal sclerosis from temporal lobe epilepsy [3–7]. There are therefore multiple diseases in which clinicians could derive actionable information from hippocampal volume. Of the disorders listed above, AD has received the most attention with respect to potentially applying hippocampal volumes in clinical practice.
Currently, for any person suspected of having AD, the standard of care includes obtaining magnetic resonance imaging (MRI) of the brain [8]. However, the main purpose of doing so is to exclude other causes of cognitive impairment (e.g., tumor, large stroke) rather than to directly assess patterns of atrophy indicative of AD. The hippocampus is one of the first structures affected by the neuropathology of AD that is visible with MR imaging [9, 10], and the extent of its atrophy correlates strongly with general and domain-specific tests of cognitive function [11]. One pathology study found that visual assessment of hippocampal size on MR imaging by two experienced neuroradiologists was insensitive for detecting early stage AD, with only 27% sensitivity [12]. Thus, while visual assessment largely fails to detect early hippocampal pathology, such identification is possible with quantitative approaches [13]. An Alzheimer’s Disease Neuroimaging Initiative (ADNI) study also showed that quantitative hippocampal volumetrics with FreeSurfer in 189 subjects (49 controls, 89 with mild cognitive impairment (MCI), and 50 with AD) were superior to visual ratings both for distinguishing controls from persons with MCI and for tracking progression from MCI to AD over 3.2 years [14]. Hippocampal volumetrics can therefore assess progression longitudinally in those at earlier stages of neurodegeneration.
Hippocampal volumetrics in AD are relevant not only to accurate and early diagnosis but also to prevention, given the increasing recognition of lifestyle factors in AD risk reduction [15–20]. The hippocampus is increasingly identified as a target of risk modification in AD [21, 22]. Lifestyle factors ranging from obesity to physical activity and diet have been shown to influence hippocampal structure measured with quantitative MR imaging volumetrics [23, 24]. Such prevention strategies will be needed, as the number of persons with AD is projected to increase from 5.1 million in 2015 to 13.8 million by 2050 [25].
Hippocampal volumetrics can also provide important information for the diagnosis and treatment of other disorders. Mild traumatic brain injury, for example, can present with hippocampal atrophy [4, 26], which can be detected with MR imaging in collegiate football players. Longitudinal assessment of hippocampal volume in a prospective study of 62 patients with moderate to severe traumatic brain injury (TBI) also showed abnormally low volumes on MR imaging at 3 and 12 months. Hippocampal volume loss is also related to combat service and post-traumatic stress disorder [27, 28]. Hippocampal volume has been suggested as a viable treatment biomarker in major depressive disorder [29, 30], and it can aid in distinguishing bipolar from unipolar depression [31]. Hippocampal volume loss is also informative in other psychiatric diseases: schizophrenia shows a larger degree of MRI-quantified volume loss than bipolar disorder, particularly in the presubiculum and subiculum [32]. Mesial temporal sclerosis, which is seen in 65% of persons with temporal lobe epilepsy, can present with hippocampal atrophy [33–36]. Asymmetry of such hippocampal atrophy has been shown to distinguish persons with temporal lobe epilepsy, the cause of 60% of all epilepsy cases, from controls with 94% accuracy [13, 37]. Hippocampal volumetric asymmetry may also be used to predict the laterality of seizure activity in medial temporal lobe epilepsy [38]. Imaging the hippocampus is therefore important in both neurodegenerative and non-neurodegenerative diseases.
Multiple quantitative methods exist for measuring hippocampal volume; the original method was hand tracing of the hippocampal borders on serial MR images by a trained operator with knowledge of hippocampal anatomy [39]. While this method is considered the most rigorous for hippocampal volumetrics, the time required to trace one scan makes routine clinical use impractical. This has given rise to automated and semi-automated quantitative algorithms, of which the boundary shift integral was one of the earlier examples [40]. Voxel-based methods have also been developed for hippocampal quantitation [41–43]. However, despite hippocampal volumetrics being available since the 1980s and meta-analytic evidence of their added value in AD [44], automated hippocampal assessments are not part of the routine standard of care.
Commercial hippocampal segmentation algorithms have recently been developed, but they have yet to gain widespread use [45]. However, the importance of developing such tools is recognized, and guidelines have been proposed [46–48] for incorporating hippocampal volumetrics into clinical assessments for AD and into drug trials. Such a clinical application could also be extended to other disorders known to affect the hippocampus, such as epilepsy, TBI, and depression. This work describes the validation of an automated hippocampal volumetric measurement program on 99 subjects from the ADNI study with manually segmented hippocampi. For this work, we specifically drew images from scans on which Delphi consensus criteria were reached on the boundaries and standards for gold standard manual hippocampal volumetry [48]. This consensus provides the single best available manual volumetry standard against which to compare a new fully automated brain MR imaging program, Neuroreader™.
MATERIALS AND METHODS
Subjects
All analyses of the de-identified human data were done in accordance with the Helsinki Declaration of 1975. Subjects were drawn from the European Alzheimer’s Disease Consortium - Alzheimer’s Disease Neuroimaging Initiative Harmonized Protocol (EADC-ADNI HarP) sub-study of the larger ADNI [48]. The EADC-ADNI HarP for Manual Hippocampal Segmentation project provided 100 manually segmented ADNI MR images for this study. The scans were obtained on a variety of commercial 1.5 T and 3.0 T MR imaging scanners in subjects across the continuum of cognitive health, including healthy controls, MCI, and AD, as described in recent work [49]. Experts designated as “master tracers” segmented the hippocampi based upon standardized HarP guidelines for anatomical landmarks of the hippocampus on MR imaging, detailed in a separate user manual (http://www.centroalzheimer.it/public/SOPs/online/HarmonizedProtocol_ACPC_UserManual_Biblio.Pdf). To be considered a master tracer, an intra- and inter-rater intra-class correlation coefficient of at least 0.9 was required for all segmented hippocampal volumes [42].
We downloaded the hippocampal expansion labels produced by that manual segmentation and used them as ground truth in assessing the quality of our automated hippocampal segmentation. The list of the 100 original images used for the segmentation and the labels were retrieved from the EADC-ADNI HarP webpage: http://www.hippocampal-protocol.net/SOPs/index.php.
MRI technique
All MR images were produced as part of the ADNI study as fully described in prior work [41, 51]. Table 1 describes the scanning protocol of the ADNI MRI dataset, specifically the T1-weighted axial 3D MRIs used in the EADC-ADNI HarP project.
Image quality pre-processing
Image pre-processing was done using standard methodology from ADNI as described in prior work [51] and on the LONI website (http://adni.loni.usc.edu/methods/mri-analysis/mri-pre-processing/). To summarize, the images acquired using GE scanners received the following correction steps: GradWarp to correct geometric distortion, B1 non-uniformity correction to correct intensity non-uniformity, and N3 correction to sharpen the image by removing residual intensity non-uniformity. The images acquired using the Philips scanner received only N3 non-uniformity correction.
Neuroreader™ automated hippocampal and brain MRI segmentation
Figure 1 describes the steps of the Neuroreader™ processing algorithms.
The following analyses were performed twice: once at Brainreader ApS (J.A.) and a second time at the University of Wisconsin (E.D.; J.M.). There were no differences between the results of these separate analyses. This work is based in part upon hippocampal volumetric algorithms applied in earlier work [52].
We downloaded the original ADNI images from the Laboratory of Neuroimaging Image Data Archive (https://ida.loni.usc.edu/login.jsp) by following the instructions provided on the website. Agreement between manual segmentation and automated quantitation was assessed with the Dice Similarity Coefficient (DSC), first described by Dice [53]. The DSC is defined as the volume of the intersection of the two segmentations divided by the mean of their two volumes, i.e., DSC = 2|A ∩ B| / (|A| + |B|) for segmentations A and B. DSC values range from 0 (no overlap) to 1 (complete or perfect agreement) [54]. Only one image showed poor Dice similarity [53, 55] between the manually segmented hippocampal structure and Neuroreader™. Visual inspection of the manually produced hippocampal mask for this image showed poor-quality segmentation, so the image was omitted from the DSC calculations.
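To make the DSC definition concrete, the following minimal Python sketch computes it for two binary segmentations. It assumes, for illustration only, that each mask is stored as a set of (x, y, z) voxel indices labeled as hippocampus; the internal representation used by Neuroreader™ is not described here, and the function name `dice` is hypothetical.

```python
# Illustrative assumption: a mask is the set of (x, y, z) voxel indices labeled hippocampus.
Voxel = tuple[int, int, int]


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice Similarity Coefficient: |A ∩ B| divided by the mean of |A| and |B|."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example: a 10 x 10 x 4 voxel block and the same block shifted by one voxel in x.
manual = {(x, y, z) for x in range(10) for y in range(10) for z in range(4)}
auto = {(x, y, z) for x in range(1, 11) for y in range(10) for z in range(4)}
print(dice(manual, auto))  # 2 * 360 / (400 + 400) = 0.9
```

In the example, each mask contains 400 voxels and they share 360, so the DSC is 720/800 = 0.9; identical masks give 1 and disjoint masks give 0.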
The Neuroreader™ image processing pipeline is based on multi-atlas segmentation through the use of non-linear registration. All original code for the following steps was written in C++ and combined into a single program with a graphical user interface for potential clinical use. Images to be segmented are first run through N4 correction, which adds bias correction not addressed by N3 to further optimize non-uniformity correction [56]. Hippocampal segmentation was achieved with a multi-atlas approach, as this provides superior segmentation performance compared with segmentation based on a single atlas [57]. In a multiple-atlas approach, probabilistic information is incorporated from multiple templates to account for anatomical variability in different populations [58]. The bias-corrected image was linearly registered to the ICBM 2009c Nonlinear Symmetric 1×1×1 mm template using a block matching algorithm [59]. From a cohort of 200 images also registered to this template, the 10 atlases that corresponded best with the input image were selected based upon the highest normalized correlation coefficient between the atlas and the input image. Each of these 10 atlases was non-linearly registered to the input image using an inverse-consistent symmetric free form deformation method [60] running on a graphics processing unit. Based on the deformation fields computed by the non-linear registration, the hippocampus segmentations from each atlas were transferred to the input image and hippocampus probability maps were created. Local intensity and gradient information from the input image were then used in the segmentation algorithm to resolve which voxels to include in the hippocampus segmentation.
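The atlas-selection and label-fusion steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the Neuroreader™ implementation: it assumes each atlas has already been brought into the input image's space and flattened into equal-length intensity and 0/1 label lists (collapsing the template-space selection and the non-linear label transfer into one space), uses the Pearson form of the normalized correlation coefficient, and replaces the final intensity- and gradient-based voxel decision with a plain 0.5 threshold on the probability map. All function and field names are hypothetical.

```python
import math


def normalized_correlation(x: list[float], y: list[float]) -> float:
    """Pearson-style normalized correlation coefficient between two intensity lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0


def select_atlases(image: list[float], atlases: list[dict], k: int = 10) -> list[dict]:
    """Keep the k atlases whose intensities correlate best with the input image."""
    ranked = sorted(atlases, key=lambda at: normalized_correlation(image, at["intensity"]), reverse=True)
    return ranked[:k]


def fuse_labels(selected: list[dict]) -> list[float]:
    """Voxel-wise fraction of selected atlases labeling the voxel as hippocampus."""
    n = len(selected)
    return [sum(votes) / n for votes in zip(*(at["label"] for at in selected))]


def segment(image: list[float], atlases: list[dict], k: int = 10, threshold: float = 0.5):
    """Probability map plus a thresholded mask (a stand-in for the refinement step)."""
    prob = fuse_labels(select_atlases(image, atlases, k))
    return prob, [p >= threshold for p in prob]


# Toy 4-voxel example (hypothetical data): atlases A and B follow the input's intensity
# pattern, C is inverted, so with k=2 only A and B contribute labels.
image = [0.0, 1.0, 2.0, 3.0]
atlases = [
    {"name": "A", "intensity": [0.0, 1.0, 2.0, 3.0], "label": [0, 1, 1, 0]},
    {"name": "B", "intensity": [0.0, 2.0, 4.0, 6.0], "label": [0, 1, 1, 1]},
    {"name": "C", "intensity": [3.0, 2.0, 1.0, 0.0], "label": [1, 0, 0, 0]},
]
prob, mask = segment(image, atlases, k=2)
# prob == [0.0, 1.0, 1.0, 0.5]; mask == [False, True, True, True]
```

Averaging labels from several well-matched atlases, rather than copying one atlas, is what lets a multi-atlas method express uncertainty at the hippocampal boundary (the 0.5 voxel above), which a subsequent refinement step can then resolve.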
Processing times for Neuroreader™ range from 3 to 7 minutes, depending on image size, for 64 brain structure volumes including both hippocampi, irrespective of magnetic field strength. Hippocampal volumetry by expert manual tracers takes 30 minutes, 15 minutes for each hippocampus [61]. A list of the brain structures segmented by Neuroreader™ is provided in Supplementary Table 1; the results and analysis detailed here concern hippocampal volume, the main focus of this work.
Statistical analysis
A general linear model was used to statistically predict hippocampal volumes. Age- and gender-adjusted predictions are then used to compute z-scores, allowing an individual image to be compared with a normative database. The normative database used for z-score calculations in Neuroreader™ is derived from MP-RAGE 3.0 T volumetric sequences obtained in cognitively normal persons from the ADNI-GO study [62]. It included 231 individuals: 113 women (age range 62–90 years) and 118 men (age range 60–88 years). Lower z-scores denote hippocampal volume loss compared to the normative data. The statistical model is used to calculate the confidence interval and the z-score for each measured volume, as presented in a sample patient report from Neuroreader™ in Fig. 2. Output includes hippocampal volume in milliliters, hippocampal volume as a proportion of estimated head size measured as total intracranial volume (gray matter volume + white matter volume + cerebrospinal fluid volume + dura), and a z-score relative to the normative data.
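A minimal sketch of how an age- and gender-adjusted z-score can be derived from a general linear model fitted to normative data is shown below. The model form (volume = b0 + b1·age + b2·male), the use of the residual standard deviation as the scale, and the synthetic numbers are assumptions for illustration; the exact model and coefficients used by Neuroreader™ are not given in the text.

```python
import math


def _solve(A: list[list[float]], y: list[float]) -> list[float]:
    """Solve A x = y by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(y)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_glm(ages, males, volumes):
    """OLS fit of volume = b0 + b1*age + b2*male via the normal equations (X^T X) b = X^T y."""
    X = [[1.0, float(a), float(m)] for a, m in zip(ages, males)]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * v for row, v in zip(X, volumes)) for i in range(3)]
    b = _solve(xtx, xty)
    resid = [v - sum(bi * xi for bi, xi in zip(b, row)) for row, v in zip(X, volumes)]
    sd = math.sqrt(sum(e * e for e in resid) / (len(volumes) - 3))  # residual SD, n - p dof
    return b, sd


def z_score(volume, age, male, b, sd):
    """Negative values indicate a volume below the normative prediction (volume loss)."""
    predicted = b[0] + b[1] * age + b[2] * male
    return (volume - predicted) / sd


# Synthetic normative sample (hypothetical numbers, not ADNI data), volumes in mL.
ages = list(range(60, 90))
males = [i % 2 for i in range(len(ages))]
noise = [0.05 * ((i % 3) - 1) for i in range(len(ages))]
volumes = [4.5 - 0.02 * a + 0.3 * m + e for a, m, e in zip(ages, males, noise)]
b, sd = fit_glm(ages, males, volumes)
print(z_score(3.0, 75, 1, b, sd))  # prints a negative z-score: smaller than predicted for age/sex
```

Adjusting for age and sex before standardizing means that the same absolute volume can yield different z-scores for, say, a 62-year-old woman and an 88-year-old man, which is the purpose of comparing against a covariate-matched normative expectation.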
Additional analyses were performed with independent-samples t-tests to assess whether the DSC varied as a function of field strength. Statistically significant results were further assessed with Cohen’s d to measure effect size [63].
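The field-strength comparison can be illustrated with Student’s independent-samples t statistic and Cohen’s d, both computed with the Python standard library. The pooled-variance form of both statistics is an assumption, since the text does not state which variants were used, and the DSC lists below are hypothetical placeholders rather than study data. The p-value, which requires the t distribution, is left to a statistics package such as SciPy (`scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, variance


def pooled_sd(a: list[float], b: list[float]) -> float:
    """Pooled standard deviation from the two sample variances (n - 1 denominators)."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's t for two independent samples with equal-variance assumption, plus df."""
    se = pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b))
    return (mean(a) - mean(b)) / se, len(a) + len(b) - 2


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


# Hypothetical per-subject DSC values, for illustration only.
dsc_15t = [0.88, 0.87, 0.89, 0.86]
dsc_30t = [0.87, 0.86, 0.88, 0.85]
t, df = t_statistic(dsc_15t, dsc_30t)
d = cohens_d(dsc_15t, dsc_30t)
```

Reporting d alongside p separates whether a difference is detectable from whether it is large: with many subjects, a small standardized difference such as d = 0.3 can still reach statistical significance.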
RESULTS
Table 2 displays the subject characteristics and quantitative results from Neuroreader™ as a function of field strength. There were no statistically significant differences between the 1.5 T and 3.0 T groups in age, gender, or total intracranial volume.
Table 2 also shows that the DSC differed significantly between the 1.5 T and 3.0 T groups (p = 0.03), with a small effect size (Cohen’s d = 0.3).
Table 3A shows DSC values in control, MCI, and AD subjects. Statistically significant differences in DSC values are seen between control and AD subjects in both hippocampi, and between MCI and AD subjects in the right hippocampus. Table 3B shows that Neuroreader™ segments the hippocampus with an average DSC of 0.87 for both the right and left hippocampus across 1.5 T and 3.0 T field strengths. The DSC reaches a maximum of 0.91 in both samples.
Figure 3 displays a row of color-coded hippocampal images.
DISCUSSION
This work describes a hippocampal volumetric technique, Neuroreader™, that can be applied in a routine clinical environment. Our results demonstrate overlap with gold standard manual volumetry approaching near-perfect agreement, with average and maximal DSC values of 0.87 and 0.91 in an ADNI sample. These results can be generalized across both 1.5 and 3.0 T scanners, as we included data from both field strengths. Neuroreader™ is a Class I medical device with Food and Drug Administration (FDA) 510(k) clearance (http://www.accessdata.fda.gov/cdrh_docs/pdf14/K140828.pdf) for the automated segmentation and labeling of brain structures on MR images. FDA clearance refers to a specific process whereby the FDA permits the marketing of medical devices, including software, after a careful review process more fully described elsewhere (http://www.fda.gov/MedicalDevices/ProductsandMedicalProcedures/DeviceApprovalsandClearances/510kClearances). For Neuroreader™, this included a review of the segmentation process, as seen in Fig. 1, and an overview of the data output for clinical use, as summarized in Fig. 2. FDA clearance implies that clinicians may apply Neuroreader™ safely and effectively to patient MR images of the brain.
While Neuroreader™ is the second FDA-cleared MR imaging volumetric software program, no information is available on DSC values for the other FDA-cleared tool, NeuroQuant™. However, multiple studies have compared manual volumetry of the hippocampus with FreeSurfer, the progenitor software program of NeuroQuant™ [64]. One study of 10 healthy controls, 10 persons with AD, and 10 with semantic dementia found a similarity index range of 0.45–0.59 between FreeSurfer and manual volumetry in the hippocampus [65]. Another study, in a Spanish cohort of 41 healthy controls, 23 with MCI, and 25 persons with AD, found hippocampal overlap between FreeSurfer and manual volumetry of 0.74–0.81 [66]. An ADNI cohort study of 80 subjects, including healthy controls, MCI, and AD subjects, found a similarity index of 0.82 between FreeSurfer-calculated hippocampal volumes and manual volumetry [67]. In the context of this literature, Neuroreader™ performs well in computing hippocampal volumes. Although the DSC is slightly lower in persons with AD, the overall DSC still compares favorably with the values reported in the literature described above. However, future studies should compare Neuroreader™ directly with other automated volumetric tools available for clinical application.
A misdiagnosis of AD incurs high costs [68], ranging from $9,500 to $14,000 per year, which highlights the potential cost savings from imaging approaches that improve the correct identification of AD. One suggested approach for structural imaging is therefore to add hippocampal volumetrics to MRI scans that are already indicated by current practice standards, and to use the results to determine who should obtain additional biomarkers, such as cerebrospinal fluid amyloid or other imaging tests, when an underlying neurodegenerative process is suspected [69]. Fast, efficient, and accurate hippocampal volumetric algorithms such as the one detailed in the current study can therefore provide important information to clinicians in the workup and care of persons with AD [70]. Although molecular imaging biomarkers with amyloid imaging exist [71], they have several practical limitations. First, such methods involve the use of ionizing radiation. Second, no reimbursement from the Centers for Medicare & Medicaid Services exists for amyloid imaging, although this issue remains controversial [72]. Third, even if reimbursement were available for amyloid imaging, the costs of PET imaging are high, at between $2,700 and $5,000 [73, 74]. FDG-PET, which supports a broader range of differential diagnoses than amyloid imaging, including the ability to identify frontotemporal and Lewy body dementia, costs an average of approximately $1,300 per scan [74]. By contrast, a non-contrast volumetric MR imaging scan of the brain costs $437.20 on average [74]. The additional cost of using Neuroreader™ or a similar program is approximately $80 on average [74]. Thus, adding quantitative MR imaging volumetrics can provide useful information at modest financial cost.
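Using only the average costs quoted above, a back-of-envelope comparison of volumetric MR imaging with the two PET options works out as follows. It deliberately ignores that the MRI is usually already indicated and any other downstream costs.

```latex
\begin{aligned}
C_{\text{MRI+vol}} &= \$437.20 + \$80 = \$517.20 \\
\frac{C_{\text{FDG-PET}}}{C_{\text{MRI+vol}}} &= \frac{1300}{517.20} \approx 2.5 \\
\frac{C_{\text{amyloid PET}}}{C_{\text{MRI+vol}}} &\in \left[\frac{2700}{517.20},\ \frac{5000}{517.20}\right] \approx [5.2,\ 9.7]
\end{aligned}
```

On these figures, MR imaging with volumetrics costs roughly 40% of an FDG-PET scan and between about one-tenth and one-fifth of an amyloid PET scan.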
While the focus of this study was to describe the overlap between Neuroreader™ automated hippocampal volumetrics and manual volumetry, this program is also FDA cleared for computing the volumes of other brain structures. This has implications for other disorders featuring volume loss in non-hippocampal regions. A recent meta-analysis of 193 studies comprising 15,892 individuals across six diverse diagnostic groups, including addiction, obsessive compulsive disorder, and anxiety, found gray matter volume loss, particularly in the frontal lobes [75]. Longitudinal changes in whole brain volume can also be clinically important: change in total gray matter and white matter volumes over time is predictive of treatment response in multiple sclerosis [76]. Because Neuroreader™ measures many brain structures, it has considerable potential to evolve for clinical application in other disorders.
Advantages of the work presented here are the use of a rigorous algorithm validated against manually segmented images in a well-characterized and validated cohort with MR imaging on different scanners and field strengths. We have also shown that agreement between automated and manual hippocampal segmentation is high at both 1.5 T and 3 T MR imaging. Thus, older 1.5 T MR scanners will still allow highly effective segmentation and volumetric assessment. This work therefore supports adding hippocampal volumetrics to the existing clinical workflow in which persons with memory complaints and dementia already undergo MR imaging. The main area for future improvement is testing on subjects with serial MR imaging over time. Future directions therefore include applying this algorithm in community cohorts with longitudinal information to see whether it can identify persons who will later convert to mild cognitive impairment and AD. Cohorts with other diseases such as TBI, depression, and epilepsy can also be tested with this algorithm. Such work will improve the identification and subsequent care of persons with hippocampus-specific brain disease.
We have shown in this work that Neuroreader™ quantifies hippocampal volume in close agreement with manual tracings. The software performs well regardless of the field strength used, either 1.5 or 3 T. Such a program has the potential to be used in the clinical assessment of hippocampal disorders, with particular applicability to AD and possible extension to other neuropsychiatric disorders.
Footnotes
ACKNOWLEDGMENTS
No external funding was used to support this work.
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at:
.
