Abstract
Background:
The validity of neuropsychological tests for the differential diagnosis of degenerative dementias may depend on the clinical context. We constructed a series of logistic models taking into account this factor.
Methods:
We retrospectively analyzed the demographic and neuropsychological data of 301 patients with probable Alzheimer’s disease (AD), frontotemporal lobar degeneration (FTLD), or dementia with Lewy bodies (DLB). Nine models were constructed taking into account the diagnostic question (eg, AD vs DLB) and subpopulation (incident vs prevalent).
Results:
The AD versus DLB model for all patients, including memory recovery and phonological fluency, was highly accurate (area under the curve = 0.919, sensitivity = 90%, and specificity = 80%). The results were comparable in incident and prevalent cases. The FTLD versus AD and DLB versus FTLD models were both inaccurate.
Conclusion:
The models constructed from basic neuropsychological variables allowed an accurate differential diagnosis of AD versus DLB but not of FTLD versus AD or DLB.
Introduction
The differential diagnosis of degenerative dementias is an increasingly important task in clinical practice. 1-3 Accurate identification of the distinct disorders may have practical implications for both prognosis and management. 1-3 The most promising tests are those based on structural, functional, and molecular brain imaging 4-6 and cerebrospinal fluid biomarkers, 7,8 but their widespread use may be limited by technical and economic factors.
Neuropsychological tests are routine tools for the evaluation of patients with cognitive complaints. These tests may accurately distinguish normal individuals from patients with cognitive impairment (eg, 9). However, it is uncertain whether they are also useful for the differential diagnosis of degenerative dementias. In a recent review, Karantzoulis and Galvin concluded that although typical cases show characteristic neuropsychological patterns, there is substantial overlap between the different disorders. 10 Nevertheless, several authors have obtained good results using basic tests or combinations thereof. 11-23 Mathuranath et al developed a composite score from the Addenbrooke’s Cognitive Examination that was highly discriminating between Alzheimer’s disease (AD) and frontotemporal dementia. 11 Kramer et al identified 5 variables that correctly classified 89.2% of their cases, comprising 30 patients with AD, 21 with frontotemporal dementia, and 14 with semantic dementia. 14 Diehl et al constructed a logistic model with animal fluency and the Boston Naming Test, which correctly classified 90.5% of their patients with AD and frontotemporal dementia. 15 Marra et al showed that clusters of cognitive and behavioral disorders could correctly identify 84.5% of their cases, including 20 patients with AD, 22 patients with frontal dementia, and 11 patients with nonfluent aphasia. 17 A combination of the Frontal Assessment Battery (FAB) and selected items from a novel questionnaire accurately classified 97% of 35 patients with frontotemporal dementia, 46 patients with AD, and 36 normal individuals. 19 Taken together, these data suggest that neuropsychological tests may be useful for the differential diagnosis of degenerative dementias, but their validity may depend strongly on the clinical context.
In this study, we performed a retrospective analysis of the demographic and neuropsychological findings in 301 patients with clinically diagnosed AD, frontotemporal lobar degeneration (FTLD), or dementia with Lewy bodies (DLB). From these data, we developed a series of logistic regression models with diagnostic aims. In order to facilitate their clinical application, we only included a series of neuropsychological variables that are easily collected in practice. Furthermore, instead of constructing an all-purpose tool, we estimated different models taking into account the specific clinical question (AD vs FTLD, AD vs DLB, and FTLD vs DLB) and patient subpopulation (incident vs prevalent cases).
Methods
For the purpose of this study, we retrospectively analyzed 301 patients with degenerative dementias seen at the General Neurology Unit of the Hospital Ruber Internacional between January 1998 and January 2009. We included all patients with probable AD, FTLD, or DLB who had undergone formal neuropsychological testing. For the sake of consistency, we applied the diagnostic criteria available in 1998. 21-23 The data were extracted from the clinical records. We registered the demographic and neuropsychological variables at the first visit and the final clinical diagnosis. The study followed the ethical requirements of our institution and the principles of the Declaration of Helsinki.
We collected the following demographic data: age, sex, and level of education (categorized into 4 groups: none to preprimary level, primary level, secondary level, and bachelor to higher levels). Cases were also divided into 2 categories: incident (newly diagnosed) or prevalent (previously diagnosed).
The neuropsychological tests registered for analysis were the Miniexamen Cognoscitivo (MEC, a Spanish version of the Mini-Mental State Examination with a maximum score of 35), 24 the selective reminding test and the clock test included in the 7 Minute screening battery, 25 the categorical (animals) and phonological fluency (letter p) tests, 26 the Trail Making Test (TMT) parts A and B, 27 the Geriatric Depression Scale with 15 items (GDS-15), 28 the Shortened Spanish-Informant Questionnaire on Cognitive Decline in the Elderly (SS-IQCODE), 29 and the Functional Assessment Questionnaire (FAQ). 30 The selective reminding test was coded into 4 variables: naming, free recall, facilitated recall, and memory recovery. Memory recovery was calculated as facilitated recall/(16 − free recall), where 16 is the total number of items in the reminding test. Results of the TMT were categorized into 7 groups according to percentiles.
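As an illustration only, the memory recovery ratio defined above can be sketched in a few lines of Python. The function name and the handling of a perfect free recall score (where the denominator is zero) are assumptions of this sketch; the text does not state how such cases were treated.

```python
from typing import Optional

TOTAL_ITEMS = 16  # number of items in the selective reminding test (from the text)


def memory_recovery(free_recall: int, facilitated_recall: int) -> Optional[float]:
    """Return facilitated recall / (16 - free recall).

    Hypothetical helper: returns None when all 16 items are freely recalled,
    because the ratio is undefined (assumption, not stated in the source).
    """
    if not 0 <= free_recall <= TOTAL_ITEMS:
        raise ValueError("free_recall must be between 0 and 16")
    if facilitated_recall < 0:
        raise ValueError("facilitated_recall must be non-negative")
    missed = TOTAL_ITEMS - free_recall  # items not retrieved spontaneously
    if missed == 0:
        return None
    return facilitated_recall / missed


# A patient who freely recalls 4 items and retrieves 9 of the remaining 12 with cues
print(memory_recovery(4, 9))  # 0.75
```

Under this definition, a value close to 1 means that cues recovered almost everything that free recall missed, whereas a value close to 0 means that cues helped very little.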
Statistical analyses were performed with SPSS 19 (IBM SPSS Statistics, Armonk, New York) and R 2.10.1 software. 31 We constructed a series of logistic regression models taking the clinical diagnosis as the dependent variable (AD, FTLD, or DLB), and the demographic and neuropsychological data as the independent variables. The models were selected with a backward stepwise elimination method according to the likelihood ratio test and applying the standard P values for variable inclusion (.05) and exclusion (.1). We used an automated selection method because our aim was predictive and not explanatory. Since the size of the smallest group was limited (37 patients with DLB), we only included in the initial models those variables with P < .2 in the bivariable tests (Kruskal-Wallis and Wilcoxon tests for ordinal and quantitative data and Pearson’s chi-square test for nominal data). The individual significance of the variables was estimated according to their odds ratios (ORs) with 95% confidence intervals (95% CIs). Goodness of fit was evaluated with the Nagelkerke R² coefficient and the Hosmer-Lemeshow test. Internal validity was analyzed according to the area under the receiver operating characteristic (ROC) curve (AUC), the sensitivity, and the specificity. The optimal cutoff points were selected from the ROC curves. Among all possible values, we chose those that best balanced the sensitivity and specificity of the models at around 80% or above. The sensitivity and specificity of the models for incident and prevalent cases were compared with Pearson’s chi-square test.
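The internal validity measures described above can be illustrated with a minimal Python sketch. It is not the SPSS or R procedure used in the study. The function names and the toy data are hypothetical, and the cutoff rule (maximizing the smaller of sensitivity and specificity) is only one possible reading of "best balanced".

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a predicted probability >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed rule: pick the observed score that maximizes min(sensitivity, specificity)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: min(sens_spec(scores, labels, c)))


# Toy predicted probabilities (hypothetical, not study data); 1 = target diagnosis
toy_scores = [0.20, 0.30, 0.60, 0.12, 0.05, 0.10, 0.08, 0.40]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

cut = balanced_cutoff(toy_scores, toy_labels)
print(auc(toy_scores, toy_labels), cut, sens_spec(toy_scores, toy_labels, cut))
```

Because one diagnostic group is much smaller than the other in most comparisons, the best-balanced cutoff can fall well below 0.5, as in this toy example.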
Results
We identified 301 patients with clinically diagnosed degenerative dementias, including 199 (66.1%) patients with AD, 65 (21.6%) patients with FTLD, and 37 patients (12.3%) with DLB. The FTLD cases comprised 38 (58.5%) patients with the behavioral variant, 6 (9.2%) patients with progressive nonfluent aphasia, 3 (4.6%) patients with semantic dementia, and 18 (27.7%) patients with mixed manifestations. Therefore, our patients with FTLD were mainly representative of the (frontal) behavioral variant (fv-FTLD). There were 181 incident cases, including 122 (67.4%) patients with AD, 42 (23.2%) patients with FTLD, and 17 (9.4%) patients with DLB, and 120 prevalent cases, including 77 (64.2%) patients with AD, 23 (19.2%) patients with FTLD, and 20 (16.7%) patients with DLB. The demographic and neuropsychological data of the 3 groups, and the significance of the overall (3 groups) bivariable tests, are shown in Table 1.
Table 1. Demographic and Neuropsychological Findings in 301 Patients With Clinically Diagnosed Alzheimer’s Disease, Frontotemporal Lobar Degeneration, or Dementia With Lewy Bodies.a
Abbreviations: MEC, Miniexamen Cognoscitivo; TMT, Trail Making Test; SS-IQCODE, Shortened Spanish-Informant Questionnaire on Cognitive Decline in the Elderly; FAQ, Functional Assessment Questionnaire; GDS-15, Geriatric Depression Scale with 15 items.
a The results are shown as mean (standard deviation) in the first line and median (range) in the second line, except for female sex.
b Overall comparisons of the 3 groups with the Kruskal-Wallis test for ordinal and quantitative data, and the Pearson chi-square test for nominal data.
The DLB versus AD (2 groups) bivariable tests showed significant differences in sex (P = .001), phonological fluency (P < .001), clock test (P = .003), free recall (P = .035), facilitated recall (P = .008), and memory recovery (P = .005). The FTLD versus AD tests showed significant differences in age (P < .001), level of education (P = .005), MEC (P = .016), free recall (P < .001), facilitated recall (P = .024), memory recovery (P = .001), and SS-IQCODE (P = .03). The DLB versus FTLD tests showed significant differences in age (P = .035), phonological fluency (P < .001), and clock test (P < .001).
The logits and goodness-of-fit parameters of the 9 multivariable logistic regression models are shown in Table 2. The model for the differential diagnosis between DLB and AD constructed from all cases was composed of memory recovery (OR = 1.73; 95% CI = 1.4-2.1) and phonological fluency (OR = 0.491; 95% CI = 0.375-0.642). The OR values for memory recovery correspond to an increase of 0.1 points. The model for incident cases was also composed of memory recovery (OR = 1.6; 95% CI = 1.2-2.1) and phonological fluency (OR = 0.510; 95% CI = 0.355-0.732). The corresponding values for prevalent cases were memory recovery (OR = 2.0; 95% CI = 1.4-2.8) and phonological fluency (OR = 0.472; 95% CI = 0.309-0.720).
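Since the OR for memory recovery is expressed per 0.1 points, it may help to see how an OR reported for one increment translates to another. The following sketch relies only on the standard relation OR = exp(beta × increment) in logistic regression. The function name is hypothetical and the calculation is illustrative, not part of the original analysis.

```python
import math


def rescale_odds_ratio(odds_ratio: float, reported_step: float, new_step: float) -> float:
    """Convert an OR reported per `reported_step` units into an OR per `new_step` units."""
    beta_per_unit = math.log(odds_ratio) / reported_step  # logit coefficient per 1 unit
    return math.exp(beta_per_unit * new_step)


# OR = 1.73 per 0.1 points of memory recovery (all cases, DLB vs AD model)
or_per_02 = rescale_odds_ratio(1.73, reported_step=0.1, new_step=0.2)
print(round(or_per_02, 2))  # 2.99, ie, 1.73 squared
```

Odds ratios multiply across increments, so a difference of 0.2 points in memory recovery corresponds to an OR of about 3 in that model, assuming the effect is linear on the logit scale.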
Table 2. Logits, Nagelkerke R² Coefficients, and Hosmer-Lemeshow (H-L) P Values of 9 Logistic Regression Models for the Differential Diagnosis Between Alzheimer’s Disease (AD), Frontotemporal Lobar Degeneration (FTLD), and Dementia With Lewy Bodies (DLB).
Abbreviation: Ph. fluency, Phonological fluency.
The model for the differential diagnosis between FTLD and AD constructed from all cases included 3 variables: age (OR = 0.928; 95% CI = 0.890-0.967), level of education (OR = 1.412; 95% CI = 1.007-1.981), and facilitated recall (OR = 1.180; 95% CI = 1.057-1.316). The model for incident cases was composed of 2 variables: age (OR = 0.923; 95% CI = 0.874-0.975) and facilitated recall (OR = 1.081; 95% CI = 0.953-1.227). The model for prevalent cases also included age (OR = 0.940; 95% CI = 0.886-0.997) and facilitated recall (OR = 1.200; 95% CI = 1.027-1.403).
The model for the differential diagnosis between DLB and FTLD constructed from all cases was composed only of phonological fluency (OR = 0.815; 95% CI = 0.717-0.926). The model for incident cases included age (OR = 1.024; 95% CI = 0.956-1.096) and facilitated recall (OR = 0.973; 95% CI = 0.796-1.189). The model for prevalent cases included the same variables: age (OR = 1.083; 95% CI = 0.983-1.194) and facilitated recall (OR = 1.096; 95% CI = 0.913-1.316).
Since memory recovery is derived from free and facilitated recall, the initial steps of the logistic regression could be affected by multicollinearity. Therefore, we also calculated the AUC of the models composed of free and facilitated recall instead of memory recovery (DLB vs AD) and memory recovery instead of facilitated recall (FTLD vs AD). None of these models improved on the previous results.
The parameters of internal validity of the multivariable models are shown in Table 3, and the ROC curves of the 3 models constructed from all patients are depicted in Figure 1. The cutoff values for estimation of sensitivity and specificity were 0.150 for the 3 DLB versus AD models, 0.200 for the 3 FTLD versus AD models, and 0.450, 0.250, and 0.500 for the DLB versus FTLD models obtained from all patients, incident cases, and prevalent cases, respectively. The comparison of the validity of the models for incident and prevalent cases did not show statistically significant differences, except for the specificity of FTLD versus AD: (1) DLB vs AD: sensitivity: chi-square = 0.754, P = .385 and specificity: chi-square = 0.022, P = .883; (2) FTLD vs AD: sensitivity: chi-square = 7.446 for specificity aside, chi-square = 0.010, P = .921 and specificity: chi-square = 7.446, P = .006; and (3) DLB vs FTLD: sensitivity: chi-square = 2.358, P = .125 and specificity: chi-square = 3.728, P = .054.
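The comparison of sensitivity or specificity between incident and prevalent cases amounts to a Pearson chi-square test on a 2 × 2 table (subgroup by correct vs incorrect classification). The sketch below shows that calculation without continuity correction, which is an assumption because the text does not say whether a correction was applied. The function names and the example counts are hypothetical. The conversion from statistic to P value uses 1 degree of freedom.

```python
import math
from typing import Tuple


def p_value_1df(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows could be incident and prevalent cases; columns could be correctly
    and incorrectly classified patients (hypothetical layout).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, p_value_1df(chi2)


# Hypothetical counts, not taken from the study
chi2, p = chi_square_2x2(10, 20, 20, 10)
print(round(chi2, 3), round(p, 3))

# The statistic reported in the text for FTLD vs AD specificity
print(round(p_value_1df(7.446), 3))  # approximately .006, in line with the reported P value
```

The reported pairs of chi-square statistics and P values are consistent with this 1 degree of freedom conversion, which supports reading each comparison as a 2 × 2 test.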

Figure 1. Receiver operating characteristic curves of 3 logistic regression models for the differential diagnosis between dementia with Lewy bodies and Alzheimer’s disease (upper panel), frontotemporal lobar degeneration and Alzheimer’s disease (middle panel), and dementia with Lewy bodies and frontotemporal lobar degeneration (lower panel).
Table 3. Parameters of Internal Validity of 9 Logistic Regression Models for the Differential Diagnosis Between Alzheimer’s Disease (AD), Frontotemporal Lobar Degeneration (FTLD), and Dementia With Lewy Bodies (DLB).
Abbreviations: AUC, area under the curve; 95% CI, 95% confidence interval.
Discussion
The validity of neuropsychological tests for the differential diagnosis of degenerative dementias is still uncertain. On one hand, there is ample evidence for the existence of typical neuropsychological patterns in the distinct disorders. 10,14,32 On the other hand, there is substantial overlap in the distribution of neuropsychological variables in the different groups, which may limit their diagnostic accuracy in particular clinical contexts. 10,33 With this point in mind, we constructed a series of multivariable logistic regression models taking into account the specific clinical question (eg, DLB vs AD) and patient subpopulation (incident vs prevalent cases).
The resulting model for the differential diagnosis of DLB and AD obtained from all cases was highly accurate: AUC = 0.919, sensitivity = 90.3%, and specificity = 80.2%. These results were comparable to those obtained separately for incident and prevalent cases. The 3 models were composed of memory recovery and phonological fluency. Levy and Chelune concluded that DLB and Parkinson’s disease with dementia are contrasted with AD by defective processing of visual information, better performance on executively supported verbal learning tasks, greater attentional variability, poorer qualitative executive functioning, and the presence of mood-congruent visual hallucinations. 32 In our series, patients with AD had the lowest scores in free recall, facilitated recall, and memory recovery. This pattern of memory impairment is typical of AD from its early stages. 34 In contrast, patients with DLB had the lowest scores in phonological fluency and clock drawing test, accounting for the executive and visuospatial deficits observed in this disorder. 35 In agreement with our data, other authors have shown that memory deficits are more severe in AD than in DLB, 36 and executive and visuospatial deficits are more severe in DLB than in AD. 35 Concerning demographic variables, Kraybill et al did not find significant differences between AD and DLB. 37 In our series, the AD group included a higher proportion of females than the DLB group (59.8% vs 29.7%; P = .001). Using a different approach, other authors have analyzed particular cognitive or behavioral functions, such as qualitative performance, 38 personality traits, 39 pentagon drawing, 40,41 clock drawing, 42 the Bender Gestalt Test, 43 and action fluency tasks. 44 In contrast to a previous report, 45 we did not find significant differences in GDS scores between patients with AD and DLB.
In comparison to the DLB versus AD models, the multivariable models for the differential diagnosis of FTLD versus AD and FTLD versus DLB showed low accuracy. Although their corresponding AUC values were moderate, most of the sensitivity and specificity values were under 80%. These results suggest that the behavioral and executive changes characteristic of our patients with FTLD, mainly fv-FTLD, were not adequately addressed by our battery of neuropsychological tests. This test battery may be representative of those used in primary care or general neurology settings 1-3 but not in specialized memory clinics.
With regard to the differential diagnosis of AD and FTLD, a previous meta-analysis of 94 studies, comprising 2936 patients with AD and 1748 patients with FTLD, showed that the most discriminating cognitive tests were measures of orientation, memory, language, visuomotor function, and general cognitive ability. 33 Although there were large and significant differences between groups on these measures, there was substantial overlap in the scores of the AD and FTLD groups, even for executive functions. 33,46 In this line, our patients with FTLD did not show a particular pattern of impairment compared to the patients with AD. This result might be explained by the mixture of patients with FTLD having different lesional distribution 47 and the major behavioral problems of these patients. In spite of the apparent overlap, several authors have obtained good results with multivariable models constructed from simple tests, such as a logistic model combining phonemic fluency, Rey-Osterrieth complex figure recall, oral apraxia, and cube analysis 12 ; a logistic model composed of social conduct disorders, hyperorality, akinesia, absence of amnesia, and absence of perceptual disorders 13 ; a discriminant function derived from a brief neuropsychological battery 14 ; a logistic model comprising animal fluency and the Boston Naming Test 15 ; a model derived from the Frontal Behavioural Inventory, the Rey Auditory Verbal Learning Test, and the TMT part B 16 ; a discriminant function based on 3 factors (amnesic, behavioral, and linguistic) 17 ; a logistic model derived from the Philadelphia Brief Assessment of Cognition 18 ; a linear discriminant function combining the FAB with selected items from a novel behavioral and cognitive questionnaire 19 ; and a model combining 3 executive tasks (Hayling Test of Inhibitory Control, Digit Span Backward, and Letter Fluency). 20 Other authors have instead analyzed the performance of patients with AD and FTLD in single tests or scales, which makes it difficult to compare their data to our findings. 48-66 Of note, the accuracy of the FAB for the differential diagnosis of AD and FTLD has been shown to be moderate 67 to low, 68,69 but some FAB subscores could still be useful. 70 The Addenbrooke’s Cognitive Examination seems to be more promising to this end. 11,71
Concerning the differential diagnosis of DLB and FTLD, there is a paucity of information on direct comparisons of their neuropsychological features. In our series, patients with DLB were older and obtained lower scores in the clock drawing test and the phonological fluency test than those with FTLD. Engelborghs et al reported that patients with FTLD had a high prevalence (70%) of apathy, whereas delusions and hallucinations were rare. 72 In contrast, patients with DLB showed a high prevalence of disinhibition (65%) and frontal lobe involvement according to the Middelheim Frontality Score. Piguet et al studied the frequency of FTLD and DLB in a cohort of 170 patients with probable or possible AD. They concluded that the presence of core clinical features of non-AD dementia syndromes is common in AD and that the concordance between clinical and pathological diagnoses of dementia was variable, reflecting the need for further improvement in current diagnostic criteria. 73 Other authors have described an interesting case series of 6 patients with signs and symptoms suggestive of both FTLD and DLB. 74 Histologic examination of 2 of them was consistent with a Transactive Response DNA Binding Protein 43 kDa (TDP-43) proteinopathy, showing again the difficulty in predicting the pathological substrates from the clinical manifestations.
Additionally, we identified some interesting data when the models for the differential diagnosis of FTLD versus AD or DLB were evaluated. First, age at first examination and facilitated recall remained in most of the final models, which suggests that they could be valuable components of future multivariable models. The inclusion of age in these models most likely reflects the younger age of patients with FTLD in comparison to patients with AD and DLB. 23 Second, the patients with FTLD had a higher level of education than those with AD even after controlling for other demographic and neuropsychological variables, which suggests a differential effect of cognitive reserve on both disorders (see also 33).
The main limitation of our study is the lack of pathological confirmation. In order to increase the accuracy of clinical diagnosis, we considered the last available diagnosis in medical records. In this way, we took into account a consistent clinical course. A second limitation is the retrospective nature of data collection. However, we registered the first neuropsychological evaluation, conducted before reaching the final diagnosis, which lends a prospective direction to the subsequent data analysis. A third potential problem is the possibility of incorporation bias that may lead to a kind of circular argument. The risk of this bias is especially high for AD, which is clinically defined by its neuropsychological features. In contrast, DLB and fv-FTLD are mainly identified on the basis of noncognitive manifestations, such as visual hallucinations and parkinsonism in the former and behavioral changes in the latter. Therefore, strictly speaking, our results show the value of the demographic and neuropsychological variables in predicting the clinical diagnoses and not the pathologically defined disorders. A fourth debatable point is the analysis of the patients with FTLD. In spite of the existence of obvious clinical subtypes, we decided to merge all our cases, as we were most interested in detecting group differences between the main degenerative disorders. However, this approach might reduce the ability of the models to differentiate FTLD from atypical presentations of AD and DLB, such as the logopenic variant of AD. In any case, since most of our patients with FTLD corresponded to the behavioral variant, our results mainly apply to this particular subtype. Finally, we also noticed a low prevalence of DLB cases in our study population. This finding might be caused by the fact that these patients were primarily evaluated in the movement disorders unit of our institution, and it suggests that the patients with DLB who were finally analyzed were those with prominent cognitive or behavioral manifestations and mild motor symptoms.
From the previous data, we can draw 2 suggestions for clinical practice. First, the differential diagnosis of AD and DLB may be significantly aided by the evaluation of simple neuropsychological variables, such as memory recovery and phonological fluency. Second, the differential diagnosis of fv-FTLD from AD and DLB most likely needs the application of behavioral scales and/or executive tests other than the TMT or the FAB.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: PC was supported by a Ramon y Cajal Fellowship from the Spanish Ministry of Science and Innovation (RYC-2010-05748).
