Abstract
INTRODUCTION
Alzheimer’s disease (AD), the most common type of dementia in elderly individuals, slowly and progressively diminishes cognitive function, impairs activities of daily living, and imposes physical, mental, and economic burdens on patients and their caregivers [1]. The pathophysiological process in AD begins years before the onset of cognitive symptoms [2]. Patients in the predementia phase of AD, called mild cognitive impairment (MCI) due to AD, convert to AD dementia at a rate of 16.1% over 1 year [3] and 36.3% over 2 years [4] according to studies by the Alzheimer’s Disease Neuroimaging Initiative (ADNI). According to a systematic review by Ward et al. [5], conversion rates from MCI or amnestic MCI to AD dementia over 5 or more years were greater than 33% in most clinic-based and community-based studies [6–9]. Although disease-modifying treatments for AD dementia or MCI due to AD (therapeutic agents that can inhibit progression of the disease by acting on the pathophysiological process and delaying neurodegeneration or neuronal loss) have yet to be approved, novel disease-modifying treatments are being vigorously developed and tested in clinical trials.
Clinical trials of AD-modifying treatments require longer periods of time and larger sample sizes than those of symptomatic drugs (e.g., acetylcholinesterase inhibitors or N-methyl-D-aspartate receptor antagonists) [10]. To reduce the duration of clinical trials and sample sizes required, it is essential to establish a valid biomarker suitable for tracking disease progression that has higher precision and lower variance than the current gold-standard outcome measures based on neuropsychological examinations such as the Clinical Dementia Rating Scale Sum of Boxes (CDR-SB) [11] or Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) [12, 13]. Furthermore, the biomarker should have high sensitivity for disease progression, high biological plausibility, and a strong relationship with the clinical features of AD [14].
Quantitative brain atrophy measurement over months or years calculated from serial magnetic resonance imaging (MRI) scans is one of the most promising progression biomarkers being explored. This biomarker could enable physicians to track disease progression and the therapeutic effects of disease-modifying treatments. Such approaches are a potential alternative to conventional neuropsychological measures and have shown greater statistical power to detect longitudinal changes than neuropsychological measures in ADNI studies [15–21]. Such a surrogate biomarker may play a key role in clinical trials and accelerate the development of novel drugs, as noted by Cummings et al. [22].
Among patients with AD and MCI, several studies showed greater atrophic rates of change in the hippocampus and temporal lobe of carriers of the apolipoprotein E gene allele
Although many reports have been published by the ADNI estimating sample sizes using cerebral atrophic rates or amounts derived from serial structural MRI [15–21], no reports are available on power calculations or estimates of sample size in Japan. While about 90% of the ADNI participants were white [3], all of the Japanese ADNI (J-ADNI) participants were Asian (Japanese). Thus, it is crucial to estimate how large a sample size would be needed for future clinical trials in Japan using atrophy measures from serial MRI as a surrogate biomarker. Accordingly, in the present study, we used an automated segmentation technique for the whole brain and hippocampus and the k-means normalized boundary shift integral (KN-BSI) to calculate the atrophy rates and estimate the sample sizes from serial MRI scans in the J-ADNI study, with the aim of supporting the development of AD-modifying treatments. The automated segmentation of the hippocampus conformed to the standard segmentation protocol (the harmonized protocol) that was recently developed by the ADNI and European Alzheimer’s Disease Consortium working group [26]. The BSI was adopted in the present study because it has been used in several clinical trials for AD therapies, including those of the first anti–β-amyloid vaccine (AN1792) [27], acetylcholinesterase inhibitors [28–30], and an N-methyl-D-aspartate receptor antagonist [31]. In addition, we examined whether
METHODS
Participants
Participants were recruited in the J-ADNI study. The J-ADNI was a multicenter study assessing neuroimaging in diagnosis and longitudinal monitoring that was started in 2008 in Japan by the New Energy and Industrial Technology Development Organization (NEDO) and the Ministry of Health, Labour and Welfare (MHLW). All of the participants were recruited at 38 Japanese clinical sites. They were followed up for 2–3 years using 1.5-T MRI, positron emission tomography (PET), biological fluid analysis, and neuropsychological batteries. All of the protocols were designed to be as compatible as possible with those of the ADNI. For additional details about the J-ADNI, see the previous article by the J-ADNI [32].
Participants were 60 to 84 years of age and generally healthy, spoke Japanese, lived at home, and had a study partner. Details of the J-ADNI inclusion and exclusion criteria can be found at https://upload.umin.ac.jp/cgi-open-bin/ctr_e/ctr_view.cgi?recptno=R000001668. Briefly, the inclusion criteria for cognitively normal (CN) participants included the following: a score of 24–30 on the Mini-Mental State Examination (MMSE) [33], Japanese version; a global score of 0 on the CDR, Japanese version; and an education-adjusted score above the cutoff level on the Wechsler Memory Scale-Revised (WMS-R) Logical Memory II [34], Japanese version (≥3 for 0–9 years of education, ≥5 for 10–15 years, and ≥9 for >15 years). The inclusion criteria for the MCI subjects were a score of 24–30 on the MMSE, memory disturbance identified by the study partner with or without the subjective complaint of the participant, a score of 0.5 on the CDR, and an education-adjusted score below the cutoff level on the WMS-R Logical Memory II (≤2 for 0–9 years of education, ≤4 for 10–15 years, and ≤8 for >15 years). The inclusion criteria for AD subjects were a score of 20–26 on the MMSE, a score of 0.5 or 1 on the CDR, and an education-adjusted score below the cutoff level on the WMS-R Logical Memory II (same as for MCI). AD subjects also had to meet the criteria of the NINCDS-ADRDA (the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association) [35] for probable AD. Exclusion criteria included brain lesions on screening or baseline MRI, neurological and psychiatric disorders other than AD, addiction to alcohol or other drugs, and use of psychoactive drugs or warfarin.
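The education-adjusted cutoffs and score ranges above can be summarized as a small decision rule. The following is a minimal sketch only: the function and parameter names are hypothetical, years of education are assumed to be whole numbers, and the exclusion criteria and the remaining J-ADNI requirements (age, study partner, and so on) are deliberately left out.

```python
def lm2_cutoffs(education_years):
    """Return (minimum score for CN, maximum score for MCI/AD) on
    WMS-R Logical Memory II for a given number of years of education."""
    if education_years < 0:
        raise ValueError("education_years must be non-negative")
    if education_years <= 9:
        return 3, 2
    if education_years <= 15:
        return 5, 4
    return 9, 8


def screening_group(mmse, cdr_global, lm2, education_years,
                    partner_reports_memory_problem=False,
                    meets_probable_ad=False):
    """Simplified group assignment; returns 'CN', 'MCI', 'AD', or None."""
    cn_min, impaired_max = lm2_cutoffs(education_years)
    # AD: MMSE 20-26, CDR 0.5 or 1, impaired memory score, probable AD.
    if (meets_probable_ad and 20 <= mmse <= 26
            and cdr_global in (0.5, 1) and lm2 <= impaired_max):
        return "AD"
    # MCI: MMSE 24-30, CDR 0.5, impaired memory score, partner report.
    if (partner_reports_memory_problem and 24 <= mmse <= 30
            and cdr_global == 0.5 and lm2 <= impaired_max):
        return "MCI"
    # CN: MMSE 24-30, CDR 0, memory score at or above the cutoff.
    if 24 <= mmse <= 30 and cdr_global == 0 and lm2 >= cn_min:
        return "CN"
    return None
```

Because the MMSE and CDR ranges for MCI and AD overlap, the sketch uses the probable-AD criterion to separate the two groups; this ordering is an assumption of the sketch rather than a documented J-ADNI procedure.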
The institutional review boards at all participating sites approved the data collection procedures, and written informed consent was obtained from all participants. If a participant was not capable of giving consent, the study partner signed the informed consent form on the participant’s behalf.
A total of 750 participants were first recruited at the 38 clinical sites in Japan. Those who provided written informed consent and passed screening based on the above inclusion/exclusion criteria were enrolled in the J-ADNI study. Finally, 537 participants were enrolled. The 537 participants underwent brain MRI at baseline. Follow-up MRI was performed at 6, 12, and 24 months for all participants and at 36 months only for MCI and CN participants. MCI participants additionally underwent MRI at 18 months. Clinical and cognitive assessments were also performed for all participants at the time of the baseline and follow-up scans. These assessments included MMSE, ADAS-Cog, and CDR-SB. Data were used for analysis from 149 AD, 234 MCI, and 154 CN participants. Clinical and demographic data are shown in Table 1. The participants’ IDs and visits used in the present study are listed in Supplementary Material A.
Data for the automated segmentation atlas set
Data used in the preparation of the atlas set for the automated segmentation described in Supplementary Material B were obtained from the ADNI database (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, see http://www.adni-info.org.
MRI acquisition and image correction
Structural MR images were acquired on 1.5-T MRI scanners from three vendors (GE Healthcare, Milwaukee, WI; Siemens Medical Solutions, Erlangen, Germany; and Philips Medical Systems, Best, The Netherlands) using a three-dimensional sagittal magnetization-prepared rapid gradient-echo imaging (MPRAGE) sequence. Typical acquisition parameters were an inversion time of 1000 ms, repetition time of 2400 ms, minimum echo time, flip angle of 8°, field of view of 240×240 mm2, and in-plane resolution of 192×192 (1.25×1.25 mm2) or 256×256 (0.94×0.94 mm2) with slice thickness of 1.2 mm.
In this study, 3D MR images were acquired at 38 sites using scanners from the three different vendors. We took the following three-step approach to minimize variations among scanners. First, we used a consistent MRI pulse sequence for MPRAGE over time. The MPRAGE sequence was used for all scanners to enhance gray/white matter contrast for superior gray/white matter segmentation. The parameters of the MPRAGE were chosen to be as close as possible to those of the MRI sequence of the US-ADNI [36]. For GE scanners, we installed a customized MPRAGE sequence with the permission of the University of Virginia.
Second, we checked whether any images suffered serious degradation due to motion artifacts, aliasing artifacts inside the skull, low signal-to-noise ratio, signal loss, or metal artifacts. Seriously degraded images were excluded to alleviate the influences of degradation on the results of the longitudinal and cross-sectional analyses.
Third, original MR images were pre-processed with the N3 intensity inhomogeneity correction [37] for all scanners and the B1 correction for scanners with a phased array receive coil to reduce intensity inhomogeneity due to non-uniform sensitivity of the receive coil [38]. Subsequently, phantom-based distortion correction [39] was performed to correct geometric distortion caused by the gradient non-linearity and static magnetic field inhomogeneity of each scanner.
Image processing
Our fully automated measurement procedure for the assessment of whole brain and hippocampal atrophy in serial MRI scans consists of two components: (1) automated segmentation of the whole brain and hippocampus using the multi-atlas image segmentation approach [40] and the corrective learning technique [41]; and (2) KN-BSI using multi-time-point symmetric affine registration with symmetric differential bias correction [19, 42–46]. For full details and assessments of this quantification procedure, see Supplementary Material B. Moreover, for head-to-head comparison with this procedure using the same dataset of the J-ADNI, one of the current state-of-the-art image analysis methods, the FreeSurfer version 5.3 cross-sectional and longitudinal streams [47–49], was used to estimate the atrophic changes of specific regions, including the hippocampal volume, lateral ventricle volume, and entorhinal cortical thickness, from serial MRI scans. The lateral ventricle consisted of the left and right lateral ventricles and inferior lateral ventricles.
Recently, FreeSurfer has been shown to have reproducibility for atrophy measurements similar to that of manual hippocampal segmentation [50]. In the present study, no manual editing or exclusion due to processing failure was done at any stage of our procedure using KN-BSI and FreeSurfer. Note that FreeSurfer did not complete the cross-sectional stream for one participant (ID = JADNI0563, at 18 months) and the longitudinal stream for two participants (ID = JADNI0048, at 24 months and ID = JADNI0602, at 6 months).
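To make the boundary shift integral concrete, the following is a minimal sketch of the classic BSI computation on two already registered scans. It is not the KN-BSI pipeline used in this study: the k-means based intensity normalization and window selection, the symmetric registration, and the differential bias correction are not reproduced, and the fixed intensity window, the precomputed boundary mask, and the flat voxel lists are simplifying assumptions. All names are hypothetical.

```python
def clip(value, low, high):
    """Clamp an intensity to the window [low, high]."""
    return max(low, min(high, value))


def boundary_shift_integral(baseline, repeat, boundary,
                            i_low, i_high, voxel_volume):
    """Classic BSI: the clipped intensity difference between baseline and
    repeat scans, summed over boundary voxels and scaled to a volume.

    baseline, repeat : flat sequences of normalized voxel intensities
    boundary         : flat sequence of booleans marking the boundary region
    i_low, i_high    : intensity window spanning the tissue/CSF transition
    voxel_volume     : volume of one voxel (e.g., in mm^3)
    A positive value corresponds to volume loss between the two scans.
    """
    if not i_high > i_low:
        raise ValueError("i_high must be greater than i_low")
    total = 0.0
    for b, r, in_boundary in zip(baseline, repeat, boundary):
        if in_boundary:
            total += clip(b, i_low, i_high) - clip(r, i_low, i_high)
    return voxel_volume * total / (i_high - i_low)
```

In a one-dimensional toy profile in which the tissue edge recedes by exactly one voxel between scans, the sketch returns one voxel volume, which is the intended interpretation of the integral.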
ApoE genotyping
Of the 537 participants, 534 agreed to blood sampling for
Statistical analyses
We used sample size estimation to evaluate the neuropsychological examination scores (CDR-SB, ADAS-Cog (the modified 13-item version) [13], and MMSE) and the measures from serial MRI using KN-BSI. Statistical analyses were separately performed for the MRI brain atrophy measures and cognitive measures. At each visit, if an MPRAGE scan was judged not to be suitable for image analysis at quality control assessments (e.g., due to gross motion artifacts) or the image processing resulted in failure, only cognitive measures at the same visit of the same participant were included in the statistical analysis (Fig. 1). Note that cognitive measures were obtained at every visit for all participants. Adopting a previous method [25], we considered two-arm and equal allocation trials for a hypothetical AD-modifying treatment versus placebo with the above scores and measures as the longitudinal outcomes and with two durations, 1 year and 2 years. In this setting, we calculated sample sizes to detect a 25% reduction in the mean rate of change (annual change) in the outcomes during the trial period with 80% power and a two-sided significance level of 5% with and without comparison to normal aging.
The power analysis is based on a linear mixed-effects model with random intercepts and slopes. Let
where β0 and β1 are a fixed intercept and slope, respectively, α0 and α1 are a random intercept and slope, respectively, assuming bivariate normal distribution, and
For the 1-year trial, data at baseline, 6 months, and 12 months were used. For the 2-year trial, data at baseline, 6 months, 12 months, and 24 months were analyzed. In addition, data at 18 months were included in the analyses of the MCI participants. Furthermore, sample sizes were separately estimated for
In addition, to perform head-to-head comparisons of sample sizes between cognitive and MRI-derived measures, the confidence intervals of the paired differences in the sample sizes from the two measures were calculated using a bootstrap sampling procedure. Samples were drawn randomly from the original samples with replacement and the number of samples was the same as the original. The sampling was repeated 10000 times, that is, 10000 bootstrap samples were obtained. When the 95% confidence interval (from the 2.5th to the 97.5th percentiles) did not include the null value of zero, the difference was assessed to be statistically significant at the 5% level.
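The bootstrap comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study: each participant is represented by one hypothetical annual slope per measure instead of a refitted mixed-effects model, the sample size is computed from the mean and variance of those slopes, and all function names are invented for the example. Participants are resampled in pairs so that both measures are evaluated on the same bootstrap sample.

```python
import random
from math import ceil
from statistics import NormalDist, mean, variance


def n_per_arm(slopes, effect=0.25, alpha=0.05, power=0.80):
    """Simplified per-arm sample size from per-participant slopes."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    delta = effect * mean(slopes)  # change in slope to be detected
    return ceil(2 * z ** 2 * variance(slopes) / delta ** 2)


def bootstrap_diff_ci(slopes_a, slopes_b, n_boot=10000, seed=0):
    """Percentile 95% CI of n_per_arm(A) - n_per_arm(B), resampling
    participants with replacement (same indices for both measures)."""
    if len(slopes_a) != len(slopes_b):
        raise ValueError("both measures must cover the same participants")
    rng = random.Random(seed)
    n = len(slopes_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a = [slopes_a[i] for i in idx]
        b = [slopes_b[i] for i in idx]
        diffs.append(n_per_arm(a) - n_per_arm(b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper
```

If the returned interval excludes zero, the two measures would be judged to require significantly different sample sizes at the 5% level, mirroring the decision rule stated above.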
All statistical analyses were performed using R version 3.2.1 [55] and the “longpower” package [56].
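For orientation, the sample size calculation can be illustrated with a widely used closed-form expression for a linear mixed-effects model with random intercepts and slopes: the per-arm sample size is 2(z_{1−α/2} + z_{power})² (σ_slope² + σ_resid² / Σ(t_j − t̄)²) / Δ², where Δ is 25% of the mean slope, or 25% of the difference between the patient and CN slopes when controlling for normal aging. The sketch below assumes this formula and is not necessarily the exact computation performed by the “longpower” package; the function name and the numerical inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(slope, var_slope, var_resid, times,
                        slope_normal=0.0, effect=0.25,
                        alpha=0.05, power=0.80):
    """Per-arm sample size to detect a proportional slowing of the slope.

    slope        : mean annual rate of change in the patient group
    var_slope    : variance of the random slope
    var_resid    : residual (within-subject) variance
    times        : visit times in years, e.g. [0, 0.5, 1.0]
    slope_normal : mean slope in normal aging (0.0 = no adjustment)
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    t_mean = sum(times) / len(times)
    ss_t = sum((t - t_mean) ** 2 for t in times)
    delta = effect * (slope - slope_normal)  # treatment effect on slope
    if delta == 0 or ss_t == 0:
        raise ValueError("need a non-zero effect and at least two visit times")
    n = 2 * z ** 2 * (var_slope + var_resid / ss_t) / delta ** 2
    return ceil(n)


# Hypothetical inputs for a 1-year trial with visits at 0, 6, and 12 months.
n_unadjusted = sample_size_per_arm(-4.0, 4.0, 1.0, [0, 0.5, 1.0])
n_adjusted = sample_size_per_arm(-4.0, 4.0, 1.0, [0, 0.5, 1.0],
                                 slope_normal=-1.0)
```

Subtracting the normal-aging slope shrinks the detectable effect Δ, which is why sample sizes estimated while controlling for normal aging are larger than those estimated without it.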
RESULTS
Sample size comparison between neuropsychological measures and MRI-derived measures
The rates of change of the hippocampal volume using KN-BSI consistently provided a smaller sample size than the neuropsychological examination scores in AD and MCI with and without controlling for normal aging, assuming a 12-month trial (see Tables 2 and 3) and a 24-month trial (see Supplementary Tables C1 and C2 in Supplementary Material C). Although the rates of change estimated from the whole brain volume provided a smaller sample size than those from CDR-SB, ADAS-Cog, and MMSE in AD and MCI patients without controlling for normal aging, they provided comparable or larger sample sizes than those from the cognitive measures CDR-SB and ADAS-Cog after controlling for normal aging. The sample sizes after controlling for normal aging in a 12-month trial were 2.5 to 3.5 times larger for the atrophic changes of the whole brain and about 1.5 times larger for the atrophic changes of the hippocampus than those obtained without controlling for normal aging in AD and MCI patients. KN-BSI, FreeSurfer, and cognitive measures are compared in Tables 5 and 6. As shown in Table 5, the best measure among MRI-derived measures using KN-BSI and FreeSurfer was the rate of change in the hippocampus using KN-BSI, followed by the rates of change in the cortical thickness in the entorhinal cortex and hippocampus by the FreeSurfer longitudinal stream and the whole brain using KN-BSI after controlling for normal aging. Table 6 shows the sample size comparison of KN-BSI, FreeSurfer, and cognitive measures using a bootstrap sampling procedure. KN-BSI hippocampus offered significantly smaller sample sizes than almost all of the FreeSurfer-derived and cognitive measures in AD and MCI patients with and without controlling for normal aging.
Overall, the hippocampal atrophy rates obtained using KN-BSI provided a smaller sample size than the other MRI and cognitive measures in a 12-month trial after controlling for normal aging in AD and MCI patients with statistical significance based on the bootstrap sampling procedure, except for the hippocampal atrophy rates obtained using the FreeSurfer longitudinal stream in AD.
Effects of ApoE ε4 status on sample sizes and atrophic rates of change
Table 4 presents the mean rates of change of the whole brain volume and hippocampal volume with 95% confidence intervals estimated from baseline, 6-month, and 12-month scans using the linear mixed-effects model in
DISCUSSION
Our results indicate that atrophic changes in the hippocampus measured using KN-BSI on serial MRI offer a significantly smaller sample size for detecting reduced disease progression under a hypothetical AD- and MCI-modifying treatment than that estimated from neuropsychological examination scores in ethnic Japanese. In addition, the results show that assessment of
In a 12-month trial for AD, the KN-BSI hippocampus offered 40.8% and 15.7% of the sample size of ADAS-Cog and 40.8% and 30.2% of that of CDR-SB with and without controlling for normal aging, respectively (see Tables 2 and 3). Holland et al. [17] reported that the longitudinal hippocampal measure in AD patients provided 40.8% and 17.8% of the sample size of that of ADAS-Cog and 38.6% and 23.2% of that of CDR-SB with and without controlling for normal aging, respectively. In a 12-month trial for MCI, the KN-BSI hippocampus offered 26.7% and 13.7% of the sample size of ADAS-Cog and 42.0% and 30.1% of that of CDR-SB with and without controlling for normal aging, respectively (see Tables 2 and 3). In Holland et al. [17], the authors reported that the longitudinal hippocampal measure in MCI patients provided 34.8% and 5.6% of the sample size of ADAS-Cog and 64.9% and 26.9% of that of CDR-SB with and without controlling for normal aging, respectively. Taken together, the sample sizes estimated from the KN-BSI hippocampus in AD patients with and without controlling for normal aging showed reductions relative to the cognitive measures similar to those reported by Holland et al. However, the sample sizes estimated from the KN-BSI hippocampus in MCI patients while controlling for normal aging showed larger reductions relative to the cognitive measures than those in Holland et al. One possible interpretation is that the proportion of MCI patients with more advanced disease status was larger in the present study than in the study by Holland et al. because the sample size reduction rates by the KN-BSI hippocampus in MCI patients compared with the cognitive measures were similar to those in AD patients.
In the present study, atrophic changes in structures in the medial temporal lobe, including the KN-BSI hippocampus and FreeSurfer longitudinal stream entorhinal cortex, offered smaller sample sizes than those estimated from other brain regions in AD and MCI patients (see Table 5). These findings support the view that the medial temporal lobe exhibits the first atrophic changes during the progression of AD [58]. In contrast, sample sizes estimated by the rates of change in the KN-BSI whole brain and FreeSurfer longitudinal stream lateral ventricle were not as small as those in the KN-BSI hippocampus or FreeSurfer longitudinal stream entorhinal cortex after controlling for normal aging. Although the sample sizes for these regions were relatively small without controlling for normal aging, they became considerably larger after controlling for it, possibly because the regions did not show specific AD-related atrophy.
In comparison with the FreeSurfer longitudinal stream, the longitudinal volume change in the KN-BSI hippocampus offered a significantly smaller sample size after controlling for normal aging (see Table 6). Both the FreeSurfer longitudinal stream [48] and the KN-BSI hippocampus compute brain volume changes of serial scans on a subject-specific template to minimize within-subject variability and maximize statistical power. However, the BSI “directly” computes volume changes by calculating voxel intensity differences between two serial scans at the boundary region of the whole brain or hippocampus, whereas the FreeSurfer longitudinal stream “indirectly” computes volume changes by separately calculating segmentations of the whole brain or hippocampus at each time point. The direct measurement has been reported to reduce within-group variability and increase statistical power to a greater extent than the indirect measurement [20, 59]. Moreover, the segmentation accuracies of our method using the multi-atlas image segmentation approach (see Supplementary Table B1 in Supplementary Material B), measured by the Dice similarity coefficient, are higher than those of FreeSurfer for the hippocampus, even though the validation data sets were different from each other. That is, the accuracies of our method versus those of FreeSurfer were 0.899±0.016 versus 0.82±0.015 for the left hippocampus and 0.894±0.016 versus 0.82±0.028 for the right hippocampus [60]. These factors might have caused the significant differences in sample size estimates between our method and the FreeSurfer longitudinal stream.
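The Dice similarity coefficient quoted above measures the overlap between an automated segmentation A and a reference segmentation B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). A minimal sketch follows, assuming that each segmentation is given as a collection of voxel indices; the function name and the convention for two empty segmentations are choices made for this example.

```python
def dice_coefficient(segmentation, reference):
    """Dice similarity coefficient between two sets of voxel indices."""
    a, b = set(segmentation), set(reference)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree fully
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: two 4-voxel segmentations sharing 2 voxels.
overlap = dice_coefficient({1, 2, 3, 4}, {3, 4, 5, 6})  # 2*2 / (4+4) = 0.5
```

Because the coefficient depends on the reference segmentations used, values obtained on different validation data sets, as in the comparison above, should be interpreted with caution.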
To date, brain atrophy measurement using serial MRI scans has not been qualified as a surrogate endpoint for AD-modifying trials. However, the recent US Food and Drug Administration (FDA) draft guidance on the development of drugs for early-stage AD [63] stated that they “are open to considering the argument that a positive biomarker result (generally included as a secondary outcome measure in a trial) in combination with a positive finding on a primary clinical outcome measure may support a claim of disease modification in AD”, given that there is “widespread evidence-based agreement in the research community that the chosen biomarker reflects a pathophysiologic entity that is fundamental to the underlying disease process”. Although brain atrophy measurement does not reflect the molecular pathophysiologic processes of AD, it could serve as an approximate surrogate biomarker of the severity of neuronal loss, neuronal shrinkage, and synaptic loss [64]. Because the present study was a longitudinal observational study without any preventive or curative interventions, we could not reveal the effect of a disease-modifying therapy on brain atrophy measurement. In past clinical trials of drugs for mild-to-moderate AD patients using a brain atrophy measure as an imaging endpoint, an unexpected paradoxical treatment effect (increased brain volume loss) was found in some study arm patients [27, 65]. In future clinical trials, it will be necessary to examine the effects of disease-modifying treatments on brain atrophy measurement and whether the paradoxical effect is transitory by long-term follow-up using MRI [65].
The present study has several strengths. First, it includes a large number of participants who were followed up for 2 or 3 years using identical protocols for neuropsychological examinations and image acquisitions across 38 clinical sites in Japan. Second, it has
Conversely, the present study has several limitations. First, the participants’ diagnoses were not based on neuropathological confirmation. Therefore, some participants assigned to the AD and MCI groups may have had cognitive decline due to causes other than AD. Cerebrospinal fluid biomarkers and/or brain PET imaging of amyloid and tau could help to exclude participants with other causes of cognitive decline. Second, we did not take into account the attrition rate in the sample size estimation. In a clinical trial of an AD-modifying treatment, some participants would drop out of the trial due to a large time commitment, a lack of incentive to continue the trial, or health problems. Sample sizes estimated when accounting for attrition are larger than those estimated without accounting for attrition. In the present study, the sample sizes were estimated from data that included images for which automated image processing had failed, in order to inflate the sample sizes and partly reflect the effect of attrition [66]. Third, we did not perform manual editing or exclusion due to processing failure at any stage of our procedure using KN-BSI and FreeSurfer. If treatment and placebo arms are not equally balanced across MR scanners, lower segmentation quality due to scanner-specific susceptibility artifacts would induce an artifactual difference of treatment effect between the arms.
In conclusion, this study demonstrates the potential of longitudinal atrophic changes of the hippocampus using automated segmentation and the KN-BSI on serial MRI as a progression biomarker that could offer a significantly smaller sample size than cognitive measures in a clinical trial of an AD-modifying treatment in a Japanese population. Moreover,
DATA AVAILABILITY
Access to the original data of the J-ADNI is available on request from the NBDC Human Database (http://humandbs.biosciencedbc.jp/en/) hosted by the National Bioscience Database Center (NBDC) of the JST.
