Abstract
Background:
Use of the NIA-AA Research Framework requires dichotomization of tau pathology. However, because tau-PET imaging is relatively new, there is no consensus on how to categorize scans as “positive” or “negative” (T+ or T–). In response, several topographical tau pathologic staging schemes have been developed.
Objective:
The aim of the current study was to establish criterion validity for these recently developed staging schemes.
Methods:
Tau-PET data from 465 participants (aged 55 to 90) from the Alzheimer’s Disease Neuroimaging Initiative were classified as T+ or T– using decision rules for the Temporal-Occipital Classification (TOC), Simplified TOC (STOC), and Lobar Classification (LC) tau pathologic schemes of Schwarz and colleagues, and for the Chen staging scheme. The resulting dichotomization was compared against memory and learning slope performances, and against diagnostic accuracy using actuarial diagnostic methods.
Results:
Tau positivity was associated with worse cognitive performance across all staging schemes. Cognitive measures were nearly all categorized as having “fair” sensitivity at classifying tau status using the TOC, STOC, and LC schemes. Results were comparable across the Schwarz schemes, though ease of use and better data fit favored the STOC and LC schemes. While some evidence supported Chen’s scheme, its validity lagged behind the others, likely because of elevated false positive rates.
Conclusions:
Tau-PET staging schemes appear to be valuable for Alzheimer’s disease diagnosis, tracking, and screening for clinical trials. Their validation supports them as options for tau pathologic dichotomization, as required for use of the NIA-AA Research Framework. Future research should consider other staging schemes and validation with other outcome benchmarks.
INTRODUCTION
While the etiology of Alzheimer’s disease (AD) has long been associated with the accumulation of amyloid-β (Aβ) plaques and neurofibrillary tau tangles [1], only within the past couple decades have imaging substrates been developed to identify deposition of these AD biomarkers
In response to this need for dichotomization, a handful of staging schemes have been developed to characterize tau positivity (T+) versus tau negativity (T–). Each of these topographical staging schemes considers the hierarchical spreading of tau pathology by examining the quantity and regional involvement of tau deposition in participants undergoing PET imaging. These quantity × location staging schemes result in profiles of tau deposition, which generally follow a consistent pattern of spread from the inferior and lateral temporal cortices, to parietal and frontal cortices, followed by primary visual cortices [8, 9].
The first set of topographical pathologic staging schemes examined in this study was created by Schwarz and colleagues [10, 11] based on 18F-Flortaucipir PET imaging. Two of these schemes (
A fourth topographic staging scheme using 18F-Flortaucipir examined here was recently created by Chen and colleagues [12] and appears to build on earlier models [13, 14]. Similar to the Schwarz models, the
The current study sought to advance the use of the NIA-AA Research Framework in future research by establishing criterion validity for these different tau pathologic staging schemes in a sample of participants across the AD continuum. Direct comparison of these four schemes is also needed as each scheme differs in terms of size and number of ROIs, complexity of staging rules and number of stages, and standardized uptake value ratio (SUVR) thresholds used for dichotomization. Traditional measures of episodic memory have been used in the current study given their known association with AD pathology [16], and learning slopes have also been incorporated owing to their association with both episodic-memory-related and working memory/attention-related aspects of cognition [17], along with hippocampal [18], ventrolateral prefrontal [17], and dorsolateral prefrontal atrophy [19]. Including both traditional and process-related scores permits potential validation with a wider range of cognitive benchmarks than traditional memory measures alone. It is hypothesized that all four staging schemes would display concordance with both traditional measures of memory and process-related learning slope metrics, but that schemes incorporating a greater number of ROIs and stages, and more detailed scoring rules (e.g., TOC and Chen schemes), would reflect greater sensitivity to cognitive deficit and better diagnostic accuracy. By establishing concordance between neuroanatomical and clinical outcome markers of AD in these schemes, we hope to enhance their utility in identifying tau-PET positivity and increase confidence in their application within the ATN framework. Given the associations between cerebral tau deposition and early memory changes in AD [20], such validation may similarly inform predictive capacity of future cognitive decline.
Please note that because of the high number of abbreviations used frequently throughout the manuscript, a list of abbreviations is included in Supplementary Table 1.
METHODS
We obtained participant data for the current study from the ADNI multi-center longitudinal study (http://adni.loni.usc.edu). ADNI [21] was launched in 2003 and is a public-private partnership with scientific goals of examining progression of mild cognitive impairment (MCI) and early AD dementia using magnetic resonance imaging, PET, other biological markers, and clinical and neuropsychological assessments. See http://www.adni-info.org for up-to-date information. Written informed consent was obtained from study participants or authorized representatives, and Institutional Review Board approval has been obtained for each multi-center site within the ADNI consortium. All conducted research is in accord with the Helsinki Declaration of 1975.
Data were available for 2,373 ADNI participants enrolled in various ADNI protocols [15, 22–24] as of April 26, 2021. Participant data collection began on August 23, 2005, with enrolled participants being followed cognitively for up to 180 months. Tau-PET investigation was initiated in 2017. Inclusion criteria for these ADNI protocols included: being between the ages of 55 and 90 at baseline; the presence of a reliable study partner; having ≥6 years of education; absence of significant head trauma, depression, or neurologic disease; stability on permitted medications; and fluency in either English or Spanish. For the current study, 1,901 participants were excluded for not having tau-PET data from their baseline visit, and 7 participants were excluded for missing baseline cognitive data. Consequently, 465 participants across all disease stages were included in the current study. Please see Fig. 1 for a schematic representation of the current study’s participant utilization from the ADNI.

Fig. 1. Flow diagram of participants recruited into the current study from the total sample of ADNI participants.
Tau-PET scan preprocessing
All participants in the current study underwent tau-PET imaging using 18F-Flortaucipir, as per standard ADNI protocols [22, 23]. See the ADNI protocols for further details about tau-PET methods. Pre-processed ADNI 18F-Flortaucipir 80–100-min smoothed static scans were downloaded (http://adni.loni.usc.edu) and processed using standard techniques in Statistical Parametric Mapping-12. Briefly, static tau-PET scans were rigidly co-registered to the closest in-time structural magnetic resonance imaging scan. Then, the structural magnetic resonance imaging scans were segmented using voxel-based morphometry in Statistical Parametric Mapping-12 to generate the transformation matrices needed to normalize to standard Montreal Neurologic Institute space. Next, the matrices were applied to the co-registered tau-PET images to normalize them to standard Montreal Neurologic Institute space. Finally, SUVR images were generated by intensity normalization to a cerebellar crus ROI.
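As a rough illustration of the final step, intensity normalization amounts to dividing every voxel by the mean uptake in the reference region. The sketch below is not the Statistical Parametric Mapping-12 pipeline used in the study: it assumes the normalized PET image and the cerebellar crus mask have already been flattened to equal-length Python lists, and the function name `to_suvr` is hypothetical.

```python
from statistics import mean


def to_suvr(pet_voxels, reference_mask):
    """Divide every voxel by the mean uptake inside the reference region.

    pet_voxels: flat list of tracer uptake values (assumed input layout).
    reference_mask: flat list of booleans, True inside the reference ROI.
    """
    if len(pet_voxels) != len(reference_mask):
        raise ValueError("image and mask must have the same number of voxels")
    reference_values = [v for v, inside in zip(pet_voxels, reference_mask) if inside]
    if not reference_values:
        raise ValueError("reference mask is empty")
    reference_mean = mean(reference_values)
    return [v / reference_mean for v in pet_voxels]


# Toy example: the last two voxels form the reference region (mean uptake 2.0).
suvr = to_suvr([3.0, 2.5, 1.5, 2.5], [False, False, True, True])
```

In this toy case the reference voxels average 2.0, so a voxel with uptake 3.0 receives an SUVR of 1.5.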
Tau-PET topographical staging schemes
The classification of tau positivity from the baseline visit was conducted using a series of tau-PET topographical staging schemes for AD. Briefly, mean 18F-Flortaucipir SUVR was extracted with grey matter masking from the ROIs detailed below and used to generate tau stage classification for all participants. Further details about the methods used by the original authors for deriving SUVR cutoffs can be found in the Supplementary Material. The criteria and resulting classifications for the staging schemes adhered to the developers’ protocols, as follows:
Schwarz staging schemes: Schwarz and colleagues [10] characterized three separate pathologic staging schemes based on pre-defined patterns of tau burden.
The TOC model used small ROIs in the anterior temporal and occipital lobes, along with classification rules designed for consistency with Braak and colleagues’ [9] six-stage operationalized neuropathologic staging scheme. Specifically, the TOC scheme incorporated the following brain regions, each with an associated SUVR cutoff (in parentheses): hippocampus (SUVR threshold of ≥1.222), transentorhinal cortex (≥1.310), fusiform gyrus (≥1.352), middle temporal gyrus (≥1.296), superior temporal gyrus (≥1.219), extrastriate visual cortex (≥1.308), and primary visual cortex (≥1.268). Resultant patterns of positivity led to a Pathologic Staging score ranging from 0–6, with scores of 0–3 considered T– and scores of 4–6 considered T+ [10]. Specifically, T+ corresponds to PET positivity in the middle temporal gyrus and extrastriate visual cortex, whereas scans with regional positivity restricted to medial temporal regions (transentorhinal cortex, hippocampus, and fusiform gyrus) were considered T–.
The STOC model used larger ROIs associated with fewer regions within the anterior temporal and occipital lobes, along with less complex decision rules. Specifically, the STOC scheme incorporated the following brain regions, each with an associated SUVR cutoff (in parentheses): medial temporal lobe (SUVR threshold of ≥1.222), lateral temporal lobe (≥1.306), superior temporal gyrus (≥1.255), and primary visual cortex (≥1.310). Resultant patterns of positivity led to a Pathologic Staging score ranging from 0–4, with scores of 0–1 considered T– and scores of 2–4 considered T+ [10]. Consistent with the TOC decision rule, T+ corresponds to regional positivity in the lateral temporal lobe, whereas scans with regional positivity restricted to the medial temporal lobe ROI were considered T–.
The LC model used the largest and fewest ROIs, and the least complex decision rules, of the Schwarz schemes. Specifically, the LC scheme incorporated whole-lobe data, each lobe with an associated SUVR cutoff (in parentheses): temporal lobe (SUVR threshold of ≥1.263), parietal lobe (≥1.297), and frontal lobe (≥1.290). Resultant patterns of positivity led to a Pathologic Staging score ranging from 0–3, with a score of 0 considered T– and scores of 1–3 considered T+ [10]. In this scheme, since there is only a single ROI for the entire temporal lobe, T– scans correspond to below-threshold SUVR values in all ROIs.
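The regional cutoffs for the three Schwarz schemes can be collected into a small lookup, sketched below in Python. This is only an illustration: the ROI key names and function names are hypothetical labels, and the full stage-assignment rules for TOC and STOC are defined in [10] and are not reproduced here. Only the LC dichotomization is implemented, because the text states it completely (T– when every lobe is below threshold).

```python
# Regional SUVR cutoffs quoted in the text for each Schwarz scheme.
# ROI key names are illustrative labels, not identifiers from the original work.
THRESHOLDS = {
    "TOC": {
        "hippocampus": 1.222, "transentorhinal": 1.310, "fusiform": 1.352,
        "middle_temporal": 1.296, "superior_temporal": 1.219,
        "extrastriate_visual": 1.308, "primary_visual": 1.268,
    },
    "STOC": {
        "medial_temporal": 1.222, "lateral_temporal": 1.306,
        "superior_temporal": 1.255, "primary_visual": 1.310,
    },
    "LC": {"temporal": 1.263, "parietal": 1.297, "frontal": 1.290},
}


def regional_positivity(scheme, roi_suvr):
    """Flag each ROI as positive when its mean SUVR meets or exceeds the cutoff."""
    return {roi: roi_suvr[roi] >= cut for roi, cut in THRESHOLDS[scheme].items()}


def lc_is_tau_positive(roi_suvr):
    """LC dichotomization: T- only if every lobe is below its threshold."""
    return any(regional_positivity("LC", roi_suvr).values())


# Hypothetical participant with elevated temporal-lobe uptake only.
example = {"temporal": 1.30, "parietal": 1.10, "frontal": 1.05}
example_is_positive = lc_is_tau_positive(example)
```

Keeping the cutoffs in one table makes it easy to check them against the published values before applying any stage rules.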
Chen staging scheme: Pre-defined patterns of tau burden were identified according to ROIs localized around transentorhinal/hippocampal, limbic, and neocortical regions, which mapped onto Braak stages I/II, III/IV, and V/VI, respectively [12]. Stage 4 was assigned to participants with Braak V/VI SUVR > 1.873. Subthreshold participants with Braak III/IV SUVR < 1.873 and > 1.523 were assigned to Stage 3. Subthreshold participants with Braak III/IV SUVR < 1.523 and > 1.307 were assigned to Stage 2. Subthreshold participants with Braak I/II SUVR > 1.130 were assigned to Stage 1. All remaining participants were assigned to Stage 0. Resultant patterns of positivity led to a Pathologic Staging score ranging from 0–4, with a score of 0 considered T– and scores of 1–4 considered T+ [12]. Consequently, T+ corresponds to PET positivity anywhere in the transentorhinal, limbic, and neocortical regions, and T– scans correspond to below-threshold SUVR values in all ROIs.
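The Chen decision rules can be read as a top-down cascade, sketched below in Python. The function and argument names are hypothetical, and the sketch applies the inequalities literally as written above. The text does not say how values exactly equal to a cutoff are handled, nor what happens when Braak III/IV SUVR is at or above 1.873 while Braak V/VI is not above it; in this sketch such cases simply fall through to the next rule, which is an assumption.

```python
def chen_stage(braak_1_2, braak_3_4, braak_5_6):
    """Assign Chen stage 0-4 from mean SUVR in the three composite ROIs."""
    if braak_5_6 > 1.873:
        return 4
    if 1.523 < braak_3_4 < 1.873:
        return 3
    if 1.307 < braak_3_4 < 1.523:
        return 2
    # Values not caught above (including boundary values) fall through
    # to the Braak I/II check; this fall-through is an assumption.
    if braak_1_2 > 1.130:
        return 1
    return 0


def chen_is_tau_positive(braak_1_2, braak_3_4, braak_5_6):
    """Stages 1-4 are treated as T+, stage 0 as T-."""
    return chen_stage(braak_1_2, braak_3_4, braak_5_6) >= 1


# Hypothetical participant with elevation only in the Braak I/II ROI.
example_stage = chen_stage(braak_1_2=1.20, braak_3_4=1.10, braak_5_6=1.05)
```

Because Stage 1 already counts as T+, a participant with elevation confined to the Braak I/II ROI is classified as tau positive under this scheme.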
Actuarial diagnostic classification
For further characterization of the AD biomarker status groups, participants were classified into diagnostic groups (cognitively normal, MCI, or dementia due to AD). Because of recent critique of ADNI’s diagnostic classification [25], a modified version of Jak/Bondi and colleagues’ [26, 27] actuarial model of diagnosis for MCI was used in the current study. Of note, as the modified Jak/Bondi criteria are only used to distinguish participants with MCI from those with normal cognition, the ADNI diagnostic classification of participants with AD dementia was unaltered. For participants with an ADNI diagnosis of either MCI or normal cognition, age-, education-, and sex-adjusted normative scores were generated using published normative data from the National Alzheimer’s Coordinating Center neuropsychological battery [28, 29]. Normative scores were generated for the following measures and domains: Logical Memory I and II (“Story A”) from the Wechsler Memory Scale – Revised [30] for the memory domain, Trail-Making Test Parts A and B [31] for the speed/executive functioning domain, and Category Fluency – Animals [32] and Multi-Lingual Naming Test [33] for the language domain. Participants were classified as having an actuarial diagnosis of MCI if they possessed an ADNI diagnosis of normal cognition or MCI and any of the following criteria were met: the presence of 1) impaired scores (>1
To ensure validity of these actuarial diagnoses, cognitive performance on select variables, hippocampal volumes, and Aβ positivity were examined between diagnostic groups. As seen in Supplementary Table 2, the AD dementia group performed worse than the MCI group (
Procedure
All participants underwent an extensive clinical and neuropsychological battery at a baseline visit upon their enrollment in ADNI. For the current study, the relevant neuropsychological measures were as follows.
The Rey Auditory Verbal Learning Test (RAVLT) is a verbal memory task with 15 words learned across 5 trials, with the number of correct words summed for the Total (or Immediate) Recall score (range = 0–75). The Delayed Recall score is the number of correct words recalled after a 30-min delay (range = 0–15). All RAVLT scores reflect raw scores, with higher values indicating better performance.
Logical Memory I and II from the Wechsler Memory Scale – Revised are immediate and delayed (20–30 min) memory measures for a verbally presented short story. Only “Story A” was provided to participants based on the ADNI-3 protocol, so the score range for both Logical Memory I and II is 0–23. All values reflect raw scores, with higher values indicating better performance.
The Alzheimer’s Disease Assessment Scale – Cognitive Subscale (ADAS-Cog) comprises 13 subtests pertaining to learning and memory, language production and comprehension, constructional praxis, ideational praxis, orientation, and executive skills. The Total Score ranges from 0–85, with higher scores indicating worse performance. For the present study we focused on two subtests. The Word Recall subtest (officially titled “Question 1” of the ADAS-Cog) is a verbal list-learning task of 10 words learned over 3 trials. Words from this list cannot be easily clustered into semantic categories. The Delayed Recall subtest (officially titled “Question 4” of the ADAS-Cog) is the recall of those words after a 10-min delay. For the purpose of the current study, the test developer’s scoring procedures were modified for consistency with all other memory measures in the study (i.e., higher values reflecting better memory performance). Specifically, Immediate Recall in the current study was the number of correct words identified across trials (range = 0–30), and Delayed Recall was the number of correct words recalled after the delay (range = 0–10).
Additional neuropsychological test measures used in ADNI are common to most dementia clinicians and researchers, therefore they will not be described here. Readers are referred to ADNI protocols [15, 22–24] for neuropsychological test descriptions and psychometric properties. Additional measures include American National Adult Reading Test [41], Mini-Mental State Examination [42], Montreal Cognitive Assessment, Clinical Dementia Rating Scale – Sum of Boxes, Functional Activity Questionnaire, and the 15-item Geriatric Depression Scale [43]. Higher scores indicated better performance for American National Adult Reading Test, Mini-Mental State Examination, and Montreal Cognitive Assessment. Lower scores indicated better performance for Clinical Dementia Rating Scale – Sum of Boxes, Functional Activity Questionnaire, and Geriatric Depression Scale (cutoff for depression > 5).
Calculation of learning slopes
Learning slopes were calculated from the Immediate Recall subtest of the RAVLT and the Word Recall subtest of the ADAS-Cog. Specifically, the Raw Learning Score (RLS) was computed as the highest performance (on Trials 2 through the Final Trial) minus Trial 1. The Learning Ratio (LR) [44] was represented as a proportion: the RLS score in the numerator, and the total points available for a trial minus Trial 1 in the denominator. Please note that the “Total Points Available for a Trial” is 15 for the RAVLT and 10 for the Word Recall subtest. The Learning Over Trials (LOT) score was computed as the total information learned (sum of Trials 1 through the Final Trial) minus the weighted information learned by Trial 1 (value of Trial 1 multiplied by the number of trials presented). There were five trials presented for the RAVLT, and three trials presented for Word Recall (modified from [45]). Written out from these definitions, with \(T_i\) the score on Trial \(i\), \(n\) the number of trials (5 for the RAVLT, 3 for Word Recall), and \(P\) the total points available for a trial (15 for the RAVLT, 10 for Word Recall), the formulas for RLS, LR, and LOT are as follows:
\[ \mathrm{RLS} = \max(T_2, \ldots, T_n) - T_1 \]
\[ \mathrm{LR} = \frac{\mathrm{RLS}}{P - T_1} \]
\[ \mathrm{LOT} = \sum_{i=1}^{n} T_i - n\,T_1 \]
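A minimal Python sketch of the three learning slope definitions given in this section follows. The function name `learning_slopes` and the example trial scores are hypothetical. The handling of a perfect Trial 1 score (where the LR denominator would be zero) is an assumption of this sketch, not a rule stated in the text.

```python
def learning_slopes(trials, max_points):
    """Return (RLS, LR, LOT) for a list of per-trial recall scores.

    trials: words recalled on Trial 1..n (n = 5 for RAVLT, 3 for Word Recall).
    max_points: total points available per trial (15 for RAVLT, 10 for Word Recall).
    """
    if len(trials) < 2:
        raise ValueError("at least two learning trials are required")
    first = trials[0]
    # Raw Learning Score: best of Trials 2..n minus Trial 1.
    rls = max(trials[1:]) - first
    # Learning Ratio: RLS relative to what was left to learn after Trial 1.
    # Returning None for a perfect Trial 1 is an assumption of this sketch.
    lr = rls / (max_points - first) if max_points > first else None
    # Learning Over Trials: total recalled minus Trial 1 weighted by trial count.
    lot = sum(trials) - first * len(trials)
    return rls, lr, lot


# Hypothetical participants, for illustration only.
ravlt = learning_slopes([5, 7, 9, 10, 12], max_points=15)
word_recall = learning_slopes([4, 6, 7], max_points=10)
```

For the hypothetical RAVLT scores above, RLS is 12 − 5 = 7, LR is 7 / (15 − 5) = 0.7, and LOT is 43 − 5 × 5 = 18.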
Data analysis
For demographic comparisons and to determine the appropriateness of covariates, independent samples
For the primary criterion validity analyses, multivariate analyses of covariance were conducted comparing tau biomarker status groups on cognitive performance, using all four tau pathologic staging methodologies. Separate multivariate analyses of covariance were conducted for immediate and delayed memory scores (RAVLT, Logical Memory, ADAS-Cog Word Recall), and learning slope performances (LR, RLS, LOT) derived from the RAVLT and ADAS-Cog Word Recall subtests. Subsequent one-way analyses of covariance were conducted for differences in individual cognitive measures within the omnibus test.
For consideration of test operating characteristics for immediate and delayed memory, and learning slope metrics, receiver operating characteristic area under the curve (ROC-AUC) analyses were conducted between participants in the T– and the T+ groups separately for each pathologic staging scheme. For the interpretation of ROC-AUC values, the current study followed guidelines suggested by Hosmer and colleagues [46] of ROC-AUC values < 0.600 being a “failure”, values between 0.600 and 0.699 being “poor”, values between 0.700 and 0.799 being “fair”, values between 0.800 and 0.899 being “good”, and values 0.900 or greater being “excellent”. Cut scores for cognitive performances were determined based on optimal sensitivity and specificity for detecting the presence of tau pathology.
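The interpretive bands and the meaning of a cut score can be sketched in Python as below. The function names and the example scores are hypothetical. The sketch assumes lower scores indicate worse memory, so scores at or below the cut are treated as predicting T+. It does not choose the cut score, since the text does not name the exact optimization criterion.

```python
def interpret_auc(auc):
    """Map a ROC-AUC value onto the qualitative bands quoted in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc < 0.600:
        return "failure"
    if auc < 0.700:
        return "poor"
    if auc < 0.800:
        return "fair"
    if auc < 0.900:
        return "good"
    return "excellent"


def sensitivity_specificity(scores_t_pos, scores_t_neg, cut):
    """Treat scores at or below `cut` as predicting T+ (lower score = worse memory)."""
    sensitivity = sum(s <= cut for s in scores_t_pos) / len(scores_t_pos)
    specificity = sum(s > cut for s in scores_t_neg) / len(scores_t_neg)
    return sensitivity, specificity


# Hypothetical delayed-recall scores, for illustration only.
sens, spec = sensitivity_specificity([10, 12, 15, 18], [16, 19, 20, 22], cut=17.5)
```

In this toy example three of four T+ scores fall at or below the cut and three of four T– scores fall above it, giving a sensitivity and specificity of 0.75 each.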
Finally, diagnostic accuracy metrics (e.g., false positive rate, positive predictive power, negative predictive power) for each pathologic staging scheme were examined by comparing tau positivity rates relative to actuarial diagnoses of cognitively normal and cognitively impaired (either AD dementia or MCI).
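These accuracy metrics follow from a 2 × 2 table crossing tau status with actuarial diagnosis. The Python sketch below uses a hypothetical function name and made-up counts purely to show the arithmetic; the counts are not the study's data.

```python
def diagnostic_accuracy(tp, fp, tn, fn):
    """Compute accuracy metrics from a 2 x 2 table.

    tp: T+ and cognitively impaired    fp: T+ and cognitively normal
    tn: T- and cognitively normal      fn: T- and cognitively impaired
    """
    return {
        # Probability of true impairment given a T+ result.
        "positive_predictive_power": tp / (tp + fp),
        # Probability of intact cognition given a T- result.
        "negative_predictive_power": tn / (tn + fn),
        # Probability of a T+ result among cognitively normal participants.
        "false_positive_rate": fp / (fp + tn),
    }


# Made-up counts, for illustration only.
metrics = diagnostic_accuracy(tp=60, fp=30, tn=170, fn=40)
```

With these made-up counts, the false positive rate is 30 / (30 + 170) = 0.15.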
Measures of effect size were expressed as Cohen’s d (
RESULTS
Demographics
The sample was composed of 465 participants who underwent tau-PET from ADNI (Table 1). The mean age of the sample was 70.9 (SD = 7.1; range 55–90) years old, averaging 16.5 (SD = 2.3; range 10–20) years of education. There was a slight female predominance (55.3% female), with most participants being Caucasian (84.3%). Mean intellect at baseline according to American National Adult Reading Test Verbal Intellect was estimated to be high average (
Table 1. Demographic variables for the biomarker status groups and total sample
TOC, Temporal-Occipital Classification; STOC, Simplified Temporal-Occipital Classification; LC, Lobar Classification Scheme; Chen, Chen Classification; T–, Tau negative; T+, Tau positive; CDR, Clinical Dementia Rating Scale – Global; CDR-SB, Clinical Dementia Rating Scale – Sum of Boxes; MMSE, Mini-Mental State Examination; MoCA, Montreal Cognitive Assessment; ADAS-Cog, Alzheimer’s Disease Assessment Scale – Cognitive Subscale; FAQ, Functional Activities Questionnaire; AMNART, American National Adult Reading Test. All scores are raw scores, and all values are Mean (SD) unless listed otherwise. 1 Denotes significant difference between T– and T+ groups.
The results of classification by the TOC staging scheme led to 363 participants being categorized as T– and 102 as T+. STOC staging led to 341 participants categorized as T– and 124 as T+. LC staging led to 356 participants categorized as T– and 109 as T+. Finally, Chen staging led to 236 participants categorized as T– and 229 categorized as T+. When comparing demographic differences across T+/– groups, Table 1 shows that the T+ group was consistently older than the T– group across pathologic staging schemes (TOC: t(463) = –2.69,
Criterion validity analyses
When comparing learning and memory scores between the T+/– groups (Table 2 and Fig. 2), significant differences were observed across pathologic staging schemes after controlling for age (
Table 2. Learning and memory variables for the Tau+ and Tau– groups for each of the pathologic staging schemes
TOC, Temporal-Occipital Classification; STOC, Simplified Temporal-Occipital Classification; LC, Lobar Classification Scheme; Chen, Chen Classification; T–, Tau negative; T+, Tau positive; RAVLT, Rey Auditory Verbal Learning Test; LR, Learning Ratio; RLS, Raw Learning Score; LOT, Learning Over Trials. All scores are raw scores, with values being

Fig. 2. Comparison of performances on the Rey Auditory Verbal Learning Test (RAVLT) Immediate Recall (A) and Learning Ratio (B) variables between the Tau+ and Tau– groups for each of the pathologic staging schemes. TOC, Temporal-Occipital Classification; STOC, Simplified Temporal-Occipital Classification; LC, Lobar Classification Scheme; Chen, Chen Classification; T–, Tau negative; T+, Tau positive. *T+ versus T– comparisons significant,
Table 3. Effect sizes and compatibility intervals for Tau positive and Tau negative group differences among neuropsychological variables using all four pathologic staging schemes
TOC, Temporal-Occipital Classification; STOC, Simplified Temporal-Occipital Classification; LC, Lobar Classification Scheme; Chen, Chen Classification; 95%
Similarly, omnibus multivariate analyses of covariance indicated that differences also existed in learning slope metrics between T+/– groups across pathologic staging schemes after controlling for age (
ROC-AUC analyses
Tables 4 and 5 display the ROC-AUC values for both the memory and learning slope measures when differentiating individuals between the T+ and T– groups for the TOC, STOC, LC, and Chen pathologic staging schemes. For the traditional learning and memory subtests, fair AUC values were observed for the TOC, STOC, and LC schemes (0.714 to 0.792; 95%
Table 4. Receiver operating characteristic area under the curve, cut scores, and sensitivity/specificity when differentiating Tau negative from Tau positive biomarker groups for learning and memory variables using the TOC, STOC, LC, and Chen classification schemes
TOC, Temporal-Occipital Classification; STOC, Simplified Temporal-Occipital Classification; LC, Lobar Classification Scheme; Chen, Chen Classification; AUC,
Table 5. Receiver operating characteristic area under the curve, cut scores, and sensitivity/specificity when differentiating Tau negative from Tau positive biomarker groups for learning slope variables using the TOC, STOC, LC, and Chen classification schemes
TOC, Temporal-Occipital Classification; STOC, Simplified Temporal-Occipital Classification; LC, Lobar Classification Scheme; Chen, Chen Classification; AUC,
Additionally, we derived cut scores for the learning and memory scores, and learning slope metrics, to produce the best balance of sensitivity and specificity for each scheme. Where different cut scores were present across schemes for a cognitive variable, we selected the cut score that displayed the most agreement across schemes. As can be seen in Tables 4 and 5, for a given cut score for a measure, the TOC, STOC, and LC schemes generated comparable sensitivity and specificity metrics, with the Chen scheme frequently generating lower sensitivity. For example, for Logical Memory Delayed Recall, a cut score of ≤17.50 had a sensitivity of 0.714 to 0.753 for TOC, STOC, and LC, and a sensitivity of 0.567 for Chen. Similarly, for RAVLT LR, a cut score of ≤0.4495 had a sensitivity of 0.623 to 0.673 for TOC, STOC, and LC, and a sensitivity of 0.467 for Chen. Across measures, the sensitivity and specificity data similarly tended to be stronger for the LR metrics than for either the RLS or LOT metrics, with comparable results between the RAVLT LR metric and traditional learning and memory measures.
Rates of tau positivity and clinical diagnostic accuracy
Finally, the diagnostic composition of participants from each biomarker status group after applying actuarial methods [26, 27] was examined. The TOC, STOC, and LC schemes resulted in tau negativity for 86% to 89% of participants classified as cognitively normal, but only 61% tau negativity for the Chen scheme (Fig. 3). Conversely, tau positivity was observed in 61% to 69% of AD dementia participants using the TOC, STOC, and LC schemes, whereas tau positivity was observed in 90% of AD dementia participants using Chen.

Fig. 3. Proportions of Tau positivity and negativity among participants diagnosed as cognitively normal (NL; top row) and Alzheimer’s disease (AD; bottom row) dementia using actuarial criteria among the four pathologic staging schemes.
Further, using an actuarial diagnosis of MCI or ADNI diagnosis of AD dementia to designate cognitive impairment, we examined the diagnostic utility of the various pathologic staging schemes. Specifically, positive predictive power, which indicates how likely someone is to be truly clinically impaired given a T+ test result, ranged from 0.644 to 0.683 for the TOC, STOC, and LC schemes, but was 0.473 for the Chen scheme (Table 6). Similarly, the false positive rate, the probability of a T+ result when the true result is normal cognition, ranged from 10.5% to 14.3% for the TOC, STOC, and LC schemes, whereas it was 39.5% for the Chen scheme. Finally, negative predictive power, which indicates how likely someone is to be truly intact given a T– test result, was between 0.754 and 0.791 for all schemes.
Table 6. Accuracy of the pathologic staging schemes based on actuarial diagnosis
TOC, Temporal-Occipital Classification; STOC, Simplified Temporal-Occipital Classification; LC, Lobar Classification Scheme; Chen, Chen Classification; T–, Tau negative; T+, Tau positive; NL, Normal Cognition; MCI, Mild Cognitive Impairment; AD, Alzheimer’s Disease. Values for the Baseline Diagnosis variable reflect raw frequencies.
DISCUSSION
Our results reflect the first attempt to examine criterion validity of both the Schwarz (TOC, STOC, and LC) and Chen tau topographical pathologic staging schemes, as developed from 18F-Flortaucipir following ADNI protocols [15, 22–24]. In Schwarz and colleagues’ [10] original manuscript, the authors developed the TOC, STOC, and LC staging schemes in 35 scans and then compared results to amyloid status in 98 participants from ADNI-2 [23]. They observed high concordance (90% to 94%) between the TOC, STOC, and LC schemes, with 82% to 96% of participants classified as T+ also being Aβ positive; conversely, 35% to 41% of participants classified as T– were Aβ positive. Our findings with an expanded dataset (
When attempting to discriminate between the utility of the three Schwarz schemes relative to cognitive and clinical outcomes, our results suggest high overlap. For example, the proportion of the sample classified as T– was highly similar (ranging from 73% to 78%; Table 1), the magnitudes of effect for differences between memory and learning slope measures were comparable (mean
In addition to the Schwarz staging schemes, the current study also examined the validity of Chen and colleagues’ [12] pathologic staging scheme. We found that dichotomization of tau using the Chen scheme resulted in T+ classified individuals having significantly worse cognitive performances than their T– peers; in particular, T+ participants performed on average 0.61
While these findings offer support for the Chen scheme, several other observed results were less favorable. First, ROC-AUC analyses revealed that cognitive measures were categorized as having “poor” sensitivity at classifying tau biomarker status using the Chen scheme (Tables 4 and 5). Second, across both multivariate analysis of covariance and ROC-AUC analyses, examination of 95%
As a result, the Chen scheme may have lower utility for classifying tau pathology than the Schwarz schemes. Given that the two sets of schemes generally incorporated comparable regions of interest with similar staging processes, one possible explanation for these differential findings is the choice of cut point for the Chen scheme. Specifically, the authors observed that because the earliest cognitive decline was detected by the memory composite in stage 1 of their staging process, “the SUVR threshold in Braak I/II ROI classifying stage 0 and stage 1 might be considered as the cutoff of tau biomarker to define Alzheimer’s disease” [12]. However, the higher false positive rates observed here, relative to the TOC, STOC, or LC schemes, suggest that this staging cutoff may have been too liberal. When applying these stages to neuroanatomical correlates, the suggested cutoff for tau positivity at Chen stage 1 (range 0 to 4) equates to “a dominating tau elevation in medial temporal regions (Braak I/II ROIs)” [12]. Conversely, the suggested cutoff for tau positivity at TOC stage 4 (range 0 to 6) equates to tau accumulation in the hippocampus, transentorhinal cortex, fusiform gyrus, middle temporal gyrus, and extrastriate visual cortex (though hippocampal or extrastriate visual sparing was possible). Consequently, a cutoff of 1 for Chen involves notably less tau accumulation in AD-specific neuroanatomical regions than a cutoff of 4 for TOC (see Fig. 4), which may explain the discrepancy in T+ rates between the schemes. While it is tempting to suggest that a more conservative cutoff for tau positivity for the Chen scheme, moving from stage 1 to stage 2, would lead to findings more in line with the Schwarz schemes, 169 of the current sample of 465 participants were classified as Chen stage 1.
This means that moving those participants from T+ (as in the current decision tree) to T– would drop the T+ rate from 49% to 13%, which may be too extreme a compensation (leading to positive predictive power improving from 0.459 to 0.857, and the false positive rate declining from 40.1% to 2.7%) at the expense of sensitivity. Adjustment of Chen’s SUVR thresholds for stages 1 or 2 may be more advisable, and future research is required to properly consider the strategy of modifying thresholds or cutoff scores for the Chen scheme to optimize its utility when applied to clinical outcomes.
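The drop in the T+ rate quoted here can be checked from the counts reported in this study (229 participants classified as T+ by the Chen scheme, 169 at Chen stage 1, 465 in total). The Python sketch below only reproduces that arithmetic; the variable and function names are hypothetical.

```python
TOTAL_PARTICIPANTS = 465
CHEN_T_POSITIVE = 229   # Chen stages 1-4 under the current cutoff
CHEN_STAGE_1 = 169      # participants who would move to T- under a stage 2 cutoff


def percent(count, total=TOTAL_PARTICIPANTS):
    """Express a count as a percentage of the sample."""
    return 100 * count / total


current_rate = percent(CHEN_T_POSITIVE)                  # stage >= 1 counts as T+
stricter_rate = percent(CHEN_T_POSITIVE - CHEN_STAGE_1)  # stage >= 2 counts as T+
print(f"T+ rate: {current_rate:.0f}% -> {stricter_rate:.0f}%")
```

Only 60 participants would remain T+ under the stricter cutoff, which corresponds to the 13% figure.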

Fig. 4. Mean right hemispheric 18F-Flortaucipir SUVR maps of participants with normal cognition, mild cognitive impairment, and Alzheimer’s disease in ADNI. A) Tau map for a participant classified as T– by both TOC and Chen pathologic staging schemes. B) Tau map for a participant classified as T+ by the Chen staging scheme but T– by the TOC staging scheme. C) Tau map for a participant classified as T+ by both TOC and Chen staging schemes. SUVR, standardized uptake value ratio; ADNI, Alzheimer’s Disease Neuroimaging Initiative.
Finally, although the focus of this manuscript was on the tau topographic staging schemes, it should not be overlooked that select learning slope metrics performed comparably to more established memory measures. In particular, the LR metric derived from the RAVLT displayed similar magnitudes of effect (Table 3) and AUC values (Table 5) relative to memory measures from the RAVLT, Logical Memory, and the ADAS-Cog. These findings correspond with the limited research investigating learning slopes and AD biomarkers of Aβ [18] and tau [52]. As the LR metric from the RAVLT appeared to outperform that derived from Word Recall of the ADAS-Cog, it is possible that the greater number of trials (5 versus 3) and words per trial (15 versus 10) improved the sensitivity of the former, though future research is needed to thoroughly investigate the effect of trial number and length on learning slope performance.
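To illustrate how list length enters a learning slope metric, the following sketch assumes that LR denotes the learning ratio in its commonly used form: the gain from the first to the last learning trial, divided by the points still available after the first trial. This definition, the function name, and the example scores are assumptions for illustration and are not taken from the present data.

```python
def learning_ratio(first_trial, last_trial, max_per_trial):
    """Gain across trials as a proportion of the room left to improve.

    Assumed form: (last - first) / (max_per_trial - first).
    Returns None when the first trial is already at ceiling.
    """
    room_to_improve = max_per_trial - first_trial
    if room_to_improve <= 0:
        return None
    return (last_trial - first_trial) / room_to_improve


# Hypothetical scores: RAVLT uses 5 trials of 15 words,
# ADAS-Cog Word Recall uses 3 trials of 10 words.
ravlt_lr = learning_ratio(first_trial=5, last_trial=12, max_per_trial=15)
adas_lr = learning_ratio(first_trial=4, last_trial=7, max_per_trial=10)
print(ravlt_lr, adas_lr)
```

With a longer word list, the denominator spans more possible values, which is one way the 15-word RAVLT could yield a finer-grained metric than the 10-word ADAS-Cog task.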
Our current study is not without limitations. First, these results are specific to the Schwarz and Chen tau pathologic staging schemes as applied to 18F-Flortaucipir PET imaging according to ADNI protocols, using SUVR thresholds as developed by the original authors. Appropriateness of other tau staging schemes or decision rules cannot be inferred from these results, particularly as they relate to different imaging radioligands or scanning protocols. Second, our use of a sample from ADNI has resulted in a disproportionately large number of highly educated non-Hispanic white adults, who have met stringent exclusion criteria specific to ADNI and industry-sponsored clinical trials. Future development of tau staging schemes outside of the ADNI framework (e.g., [54]) will be necessary to broaden generalizability of these findings. Third, our use of ADNI data led to the incorporation of neuropsychological test measures that have been modified specifically for ADNI (e.g., Logical Memory only includes “Story A”). Relatedly, the original Chen et al. [12] manuscript included the ADNI-Memory composite [55] in its validation study, which incorporates some, but not all, of the memory measures used in the present study. Although the overlap may have led to overly similar results between studies, the Chen study focused on stages of tau pathology (range 0 to 4) whereas ours examined overall tau positivity/negativity. Our contrasting findings for the Chen scheme, particularly in relation to the Schwarz schemes, suggest that our results were not confounded by the memory measures used. Fourth, it could be argued that the high concordance between the TOC, STOC, and LC schemes may be due to shared method variance, given that they were all developed by Schwarz and colleagues on the same cohort.
While this likely contributes to some of the differences in concordance between these schemes and the Chen scheme, the Chen scheme developed pathologic staging dichotomization based on
Limitations notwithstanding, the current study provides evidence of criterion validity for these different tau pathologic staging schemes, when examined in the context of traditional learning and memory measures, learning slope metrics, and actuarial diagnoses. Although results were comparable between the TOC, STOC, and LC schemes of Schwarz, ease of use and better data fit favored the STOC and LC schemes. While some evidence was supportive for the Chen scheme, validity lagged behind the other schemes, likely due to elevated false positive rates. Tau-PET staging schemes appear to be valuable for AD diagnosis, tracking, and screening for clinical trials. The validation of these schemes provides support for their use as options for tau pathologic dichotomization (T+ versus T–), which will advance the use of the NIA-AA Research Framework (“ATN” model) when using tau-PET techniques. Future research should consider other staging schemes and validation with other outcome benchmarks.
Footnotes
ACKNOWLEDGMENTS
The authors have no acknowledgments to report.
FUNDING
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
CONFLICT OF INTEREST
Shannon L. Risacher is an Editorial Board Member of this journal but was not involved in the peer-review process and did not have access to any information regarding its peer review. Adam J. Schwarz is an employee and minor shareholder of Takeda Pharmaceuticals, Ltd.
DATA AVAILABILITY
The data supporting the findings of this study are available on request from the corresponding author. The data are also publicly available through ADNI.
