Abstract
Objective:
Limited research exists to explain differential executive functioning impairment in clinical populations, particularly between the Wisconsin Card Sorting Task (WCST) and the Trail Making Test (TMT).
Methods:
The distribution of clinical diagnoses was examined in patients failing none, one, or both tasks, and executive task performance was compared among dementia-related diagnoses. Two hundred and sixty-six participants received evaluations through an Alzheimer’s Disease Research Center, which included executive tasks. Dementia-related diagnoses were established through consensus.
Results:
Chi-square analyses indicated that TMT failure, with or without WCST failure, was more strongly associated with dementia diagnoses. Repeated measures analysis of variance similarly indicated that participants with dementia, especially those of mild and moderate severity, performed worse on the TMT.
Conclusions:
Executive dysfunction was observed in dementia-related diagnoses, and TMT failure was implicated in dementia in higher proportions than WCST impairment. The Trail Making Test appears more sensitive than the WCST for assessing executive impairment across diagnoses, especially when time and resources are limited in screening and clinical settings.
Introduction
As evidenced by the 2011 updates to the National Institute of Neurological and Communicative Disorders and Stroke/Alzheimer’s Disease and Related Disorders Association diagnostic criteria for Alzheimer’s disease (AD), 1,2 impairments in executive functioning have increasingly been acknowledged as an early sign of many forms of dementia. 3 Tasks of executive functioning, which evaluate domains related to abstraction, mental flexibility, strategy generation for solving complex problems, and altering strategies in response to changing contingencies, 4 have been associated with the clinical manifestation of mild cognitive impairment (MCI) 5 and have been implicated in the conversion from MCI to AD. 6,7 Results from brain imaging studies have consistently associated executive functioning with the frontal lobes in healthy 8 and dementia-related (AD and behavioral variant frontotemporal dementia [bv-FTD]) populations, 9 -13 and executive dysfunction is also common in dementia with Lewy Bodies (DLB). 14 The Wisconsin Card Sorting Task 15 (WCST) and the Trail Making Test, Part B 16 (TMT-B) are neuropsychological measures commonly used to assess executive functioning in AD evaluations. 17 -20 The WCST is considered to evaluate problem solving, strategy formation, cognitive set-shifting, and the implementation of verbal feedback. 21 Poor performance has been suggested to reflect reduced inhibition of previous mental sets, perseveration, and reduced problem-solving skills. 22 While the TMT-A has been associated with visual search, perceptual/motor speed, speed of processing, working memory, and general intelligence, Sanchez-Cubillo and colleagues proposed that the TMT-B is related primarily to working memory and secondarily to task-switching ability. 23 Although both the WCST and the TMT-B reflect executive functioning, differential impairment in these tasks has been observed in AD 24 and nondementia 25 clinical populations.
For example, while evaluating the heterogeneity of executive dysfunction within participants with very mild AD using 7 different executive functioning tasks, Stokholm and colleagues found a higher proportion of impairment (−2.0 standard deviation [SD] or greater) on the TMT-B relative to the WCST. 24 In their study, 47% of the participants in the mild dementia group were impaired on the TMT-B, whereas only 6% were impaired on the WCST. The authors suggested that in a given clinical setting where only 1 to 2 executive functioning tests are administered, test selection is of critical importance and that the TMT-B appeared to be more sensitive to decline in the early stages of AD. 24
In addition, among healthy populations, it is not uncommon for participants to perform poorly on the WCST. A meta-analysis determined that age and education were both significant moderators pertaining to WCST performance among neurologically healthy populations, possibly due to declines in working memory. They noted in particular that WCST performance became more variable as healthy participants aged. 26 Similarly, research has provided evidence that healthy older adults exhibit inefficient feedback utilization during the WCST, which has been proposed as a proximate cause for WCST failure among neurologically intact individuals. 27 Given that the mean age of the above-mentioned studies is typical of patients seeking consultation for cognitive difficulties, increased variability in performance due to working memory or feedback utilization inefficiency may contribute to why, among elderly populations, the WCST appears to provide higher variability in performance in healthy controls (HCs) in the context of AD evaluations.
Limited research exists to explain the cause of this differential impairment on executive functioning tasks, such as the TMT-B and WCST, not only in AD populations but also in populations with other types of dementia. The current study sought to identify the diagnostic classifications of those who fail these particular tasks. Specifically, the present study evaluated the distribution of specific dementia diagnoses of patients failing both the WCST and the TMT-B (hereafter referred to as the TMT), the WCST alone, the TMT alone, or neither task. Such a design is clinically relevant because some patients display discrepant executive functioning on neuropsychological testing; thus, identifying performance characteristics for particular dementia types could be of considerable clinical benefit. Given the results of prior research, the study’s first hypothesis was that TMT failure either alone or combined with WCST failure would be exhibited in dementia (both AD and other dementias) in a higher proportion than WCST failure alone. Second, it was anticipated that the WCST would overidentify clinical impairment for the HC and MCI populations relative to the TMT, whereas in the more severe diagnoses (AD and other dementias), the task would underidentify impairment. Lastly, using age- and education-normed T-scores, performance differences across diagnoses were expected, with mild to moderately severe dementia groups performing worse on the TMT.
Method
Sample and Design
A convenience sample of 266 participants was utilized for this study from a longitudinal cohort (University of Michigan-Memory and Aging Project) of the University of Michigan Alzheimer’s Disease Research Center (ADRC). The participants were recruited through a number of avenues, including the Cognitive Disorders Clinic in the Department of Neurology, the Neuropsychology Section at the University of Michigan or through the community. Participants were enrolled in the Alzheimer’s Center as part of the Memory and Aging Project following a brief screening, excluding individuals with a history of stroke, traumatic brain injury, Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition, Text Revision) psychiatric disorder, or intellectual disability. Participants were evaluated by a neurologist and underwent neuropsychological testing with a trained technician. The Memory and Aging Project has been approved by the Institutional Human Use Review Board of the University of Michigan Health System.
Diagnosis of participants was carried out at a consensus meeting consisting of at least 1 neuropsychologist and 2 neurologists, as well as other support staff. Participants were diagnosed using neurological and neuroanatomical impressions, and diagnostic criteria established by the National Alzheimer’s Coordinating Center, the Uniform Data Set, 28 and included HCs (n = 81), MCI (n = 89), probable AD (n = 66), bv-FTD (n = 9), and DLB (n = 10). Since the study’s recruitment was heavily reliant on the Alzheimer’s Center cohort, few individuals with vascular-related cognitive deficits were available because individuals with possible stroke or multi-infarct states would not be eligible for most clinical drug trials conducted at the ADRC; specifically, of the approximately 350 participants in the cohort at the time of data collection, only 2 confirmed cases with vascular dementia were identified at consensus. Each participant diagnosis was derived from conclusions of the most recent consensus meeting, and diagnoses were temporally linked to the most recent participant neuropsychological test data.
Neuropsychological Measures
Related to their participation in the Michigan ADRC, all participants were administered a battery of neuropsychological tests as specified by the Uniform Data Set test battery, 28 along with additional neuropsychological measures. In order to categorize participants according to executive performance, age- and education-normed T-scores for completion time on the TMT 29 and the Total Number of Correct Matches variable from the 64-card version of the WCST 30 were used. For participants who were unable to complete the TMT in the standard 300 second maximum administration time or discontinued the task prior to completion, a T-score of 20 was assigned (based on Heaton, Grant, and Matthews’s practice of 301+ seconds on the TMT equating to a Scaled Score of 1 31 ). On the WCST, the Total Number of Correct Matches variable was utilized because it was inversely related to the Total Error variable, yet was felt to provide a better distribution of scores than the Total Error variable while still being better able to account for frontal pathology than the Perseverative Errors variable alone. 32 The Uniform Data Set included the Mini-Mental State Examination (MMSE 33 ) for an examination of global mental status and the Geriatric Depression Scale-short form (GDS 34 ) to screen for depressed mood, using the standard cutoff in the literature for the short form of endorsing 5 of 15 items. 34
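The scoring rule for incomplete TMT administrations can be illustrated with a minimal sketch. It assumes the normed T-score comes from a separate norms lookup that is not reproduced here; the function and parameter names are hypothetical and are not part of the study's procedures.

```python
from typing import Optional

TMT_MAX_SECONDS = 300  # standard maximum administration time stated in the text
FLOOR_T_SCORE = 20     # T-score assigned when the task could not be completed


def tmt_t_score(completion_seconds: Optional[float],
                normed_t: Optional[float]) -> float:
    """Return the TMT T-score used for grouping.

    completion_seconds is None when the task was discontinued.
    normed_t is the age- and education-normed T-score from a norms
    table (hypothetical input; the lookup itself is not shown).
    """
    # Discontinued or over the time limit: assign the floor value.
    if completion_seconds is None or completion_seconds > TMT_MAX_SECONDS:
        return FLOOR_T_SCORE
    if normed_t is None:
        raise ValueError("a normed T-score is required for completed tasks")
    return normed_t
```

A completed administration simply passes its normed score through, so only discontinued or over-time cases are affected by the floor.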
Procedure
The entire cognitive testing process took approximately 2 to 3 hours to complete. Participants were oriented to the testing process and were then administered the neuropsychological battery associated with the Uniform Data Set, including the TMT, MMSE, and GDS, as well as select additional tasks such as the WCST. Participants were provided breaks at their leisure.
Organization of participants into groups according to executive dysfunction was established based on performance on the WCST and the TMT-B tasks, with performance below a clinical T-score of 37 (10th percentile) for age and education being considered impaired for the current study. Groups were organized into having no impairment on either executive functioning task, impairment on both tasks, or selective impairment on 1 executive functioning task (but not the other). Specifically, Group Trails↓/WCST↓ was impaired on both the WCST and TMT, Group WCST↓ was impaired on the WCST, but not the TMT, Group Trails↓ was impaired on the TMT, but not the WCST, and Group No Impairment was not impaired on either the WCST or TMT.
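The grouping rule described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes both inputs are already age- and education-normed T-scores.

```python
IMPAIRMENT_CUTOFF_T = 37  # T-scores below this value are treated as impaired


def is_impaired(t_score: float) -> bool:
    """A score is impaired when it falls below the cutoff (37 itself is not)."""
    return t_score < IMPAIRMENT_CUTOFF_T


def executive_group(wcst_t: float, tmt_t: float) -> str:
    """Assign one of the four executive functioning groups."""
    wcst_low = is_impaired(wcst_t)
    tmt_low = is_impaired(tmt_t)
    if wcst_low and tmt_low:
        return "Trails↓/WCST↓"
    if wcst_low:
        return "WCST↓"
    if tmt_low:
        return "Trails↓"
    return "No Impairment"
```

For example, a participant with a WCST T-score of 45 and a TMT T-score of 30 would fall in Group Trails↓.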
Data Analysis
For the analyses associated with the organization of participants into groups according to executive dysfunction, multivariate analysis of variance (MANOVA) was performed for the continuous demographic variables using executive functioning group as the independent variable. For any significant omnibus tests, post hoc comparisons were performed using least squares means to determine specific group differences. For categorical demographic variables (gender, handedness, and clinical diagnosis), Chi-square analyses were performed using the executive functioning group as the independent variable, and secondary post hoc comparisons were also performed. Significant demographic variables were covaried in the primary analyses using analysis of covariance (ANCOVA), as appropriate.
For the analyses related to clinical diagnostic group membership, repeated measures ANOVA was performed evaluating executive task T-score performance across diagnostic groups. The main effects for these analyses were diagnostic group and executive task performance, and the interaction effect was a diagnostic group × task interaction; for any significant interaction effects, post hoc comparisons were performed using paired samples
To evaluate the difference between TMT and WCST performance across levels of dementia severity, severity was categorized in two ways. First, MMSE severity staging 35 (intact/questionable = 26 to 30, mild = 21 to 25, moderate = 11 to 20, and severe = 0 to 9) was used to categorize dementia severity; of note, no participants in the sample were categorized as severe (MMSE below 10). Second, a cognitive composite T-score was calculated to categorize dementia severity, using the mean performance on the Wechsler Memory Scale III (WMS-III 36 ) Logical Memory II Delayed Recall, the WMS Visual Reproduction II Delayed Recall, the Controlled Oral Word Association Test, 21 and the Visual Form Discrimination Test 37,38 ; severity staging of intact/questionable (T > 37.0), mild (T = 30.0-36.9), moderate (T = 25.0-29.9), and severe (T < 24.9) was utilized; because of the small number of participants in the Severe staging group, the Moderate and Severe groups were combined. For both sets of categorizations, paired samples
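The two severity stagings can be sketched in code as written. This is a hedged illustration, not the study's scoring software: the function names are hypothetical, the composite is assumed to be an unweighted mean rounded to one decimal place, and values that fall between the stated ranges (an MMSE of 10, or a composite of exactly 37.0) are returned as "unassigned" rather than guessed.

```python
from statistics import mean
from typing import Sequence


def mmse_severity(score: int) -> str:
    """Stage dementia severity from an MMSE total using the stated ranges."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE totals range from 0 to 30")
    if score >= 26:
        return "intact/questionable"
    if score >= 21:
        return "mild"
    if 11 <= score <= 20:
        return "moderate"
    if score <= 9:
        return "severe"
    return "unassigned"  # a score of 10 is not covered by the stated ranges


def composite_t(test_t_scores: Sequence[float]) -> float:
    """Unweighted mean of the component T-scores (assumed weighting)."""
    return mean(test_t_scores)


def composite_severity(t_score: float) -> str:
    """Stage severity from the composite, with Moderate and Severe combined."""
    t = round(t_score, 1)
    if t > 37.0:
        return "intact/questionable"
    if 30.0 <= t <= 36.9:
        return "mild"
    if t < 30.0:
        return "moderate/severe"
    return "unassigned"  # exactly 37.0 is not covered by the stated ranges
```

Combining the Moderate and Severe composite groups means a single threshold at T = 30.0 separates them from the mild group.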
Results
Of the 266 overall participants (mean age = 71.8 ± 9.3 years, range 50-94 years, mean education = 15.6 ± 2.9 years, 50% male), 60 were impaired on both the WCST and TMT (Group Trails↓/WCST↓), 35 were impaired on the WCST but not the TMT (Group WCST↓), 45 were impaired on the TMT but not the WCST (Group Trails↓), and 127 were not impaired on either the WCST or TMT (Group No Impairment). Table 1 shows the results for the demographic variables. The MANOVA indicated that demographic differences existed between executive impairment groups, Wilks’ λ = .89,
Table 1. Demographic and Neuropsychological Variables for Each Group. a
Abbreviations: Group Trails↓/WCST↓, impaired on both Wisconsin Card Sorting Test (WCST) and Trail Making Test Part B (TMT); Group WCST↓, impaired on WCST alone; Group Trails↓, impaired on TMT alone; Group No Impairment, impaired on neither WCST nor TMT; Gender, percentage male; GDS, Geriatric Depression Scale (short form, cutoff 5/15); MMSE, Mini-Mental State Examination.
a Effect sizes were measured using partial η² values or Cramer’s V.
b Significant differences on post hoc multiple comparisons from Group No Impairment,
Executive Group Analyses
Chi-square analyses were performed to evaluate the distribution of clinical diagnoses across the four groups in the current study. Results suggest that the clinical diagnoses were not distributed equally across the executive functioning groups, χ²(9) = 114.51,
Table 2. Distribution of Clinical Diagnoses Across Groups.
Abbreviations: Group Trails↓/WCST↓, impaired on both Wisconsin Card Sorting Test (WCST) and Trail Making Test Part B (TMT); Group WCST↓, impaired on WCST alone; Group Trails↓, impaired on TMT alone; Group No Impairment, impaired on neither WCST nor TMT; HC, healthy control; MCI, mild cognitive impairment; AD, probable Alzheimer’s disease; DLB, dementia with Lewy Bodies; FTD, behavioral-variant frontotemporal dementia.
Clinical Diagnosis Analyses
Performance on the executive functioning tasks was also examined across the diagnostic groups using the continuous dependent variables of respective norm-based T-scores. As can be observed in Figure 1, repeated measures ANOVA indicated that there was a significant executive functioning task by diagnostic group interaction effect, Wilks’ λ = .93,

Figure 1. Executive functioning performance across diagnostic groups. TMT indicates Trail Making Test Part B; WCST, Wisconsin Card Sorting Test; HC, healthy control; MCI, mild cognitive impairment; probable AD, probable Alzheimer’s disease.
Severity Analyses
Finally, norm-based T-score performance on the WCST and TMT tasks was also examined across the level of dementia severity, using severity categories of intact/questionable, mild, and moderate or moderate/severe based on the MMSE 35 and a cognitive composite; the results were comparable when using the two severity stagings. For the intact/questionable severity group using the MMSE, there were no differences between executive task performance (WCST T-score 48.5 [13.5] vs TMT T-score 48.6 [13.1];
Discussion
Given the evidence of differential impairment on the WCST and the TMT, both in the clinic and in research, the current study was undertaken to identify the clinical characteristics of participants who fail these tasks. Participants were organized into groups based on their performance on the executive functioning tasks (failing both the WCST and TMT, the WCST alone, the TMT alone, or neither task). Participants also were organized into clinical diagnoses, based on neurological and neuroanatomical impressions, along with their performance on tests from the Uniform Data Set, in order to evaluate the distribution of dementia-related diagnoses within these groups. Finally, participants were organized into dementia severity groups, based on MMSE scores and cognitive composite T-scores, in order to evaluate performance on the executive functioning tasks as a function of severity.
The results of the current study suggest that executive dysfunction, either by impairment on the WCST alone, the TMT alone, or by impairment on both tasks, was observed in AD or other dementia to a greater extent than in HCs or MCI groups. These results are consistent with the literature indicating that executive dysfunction is associated with other areas of cognitive decline among dementia populations 3,5 -7,17 -20 and also with research suggesting a close relationship across clinical populations between executive functioning and other cognitive domains, such as general intelligence, memory, and perceptual and processing speed. 39
In addition, our findings suggested that TMT failure either alone (Group Trails↓) or combined with WCST failure (Group Trails↓/WCST↓) was apparent in dementia at a higher proportion than WCST failure alone (Group WCST↓). As observed in Table 2, three quarters of the participants who failed both executive tasks were diagnosed as having either AD or another form of dementia, as were almost two thirds of the participants who selectively failed the TMT. These groups also displayed worse performances on a measure of mental status, and when categorizing their MMSE performance (using categories correlated with Clinical Dementia Rating Scale performance 35 ) and an overall cognitive composite T-score into a level of dementia severity, the results indicated that TMT performance was significantly worse than WCST performance for both mild and moderately severely impaired participants, but not for intact/questionable severity participants. In contrast, only a quarter of the selective WCST failure participants were diagnosed with a dementing disorder, as were, as expected, only one-tenth of the executively intact participants. TMT failure appeared to exhibit a unique contribution toward poor clinical outcomes in our study.
Further, our results suggested that the TMT and WCST performed differently between participants diagnosed with dementia (AD, DLB, and FTD) and MCI/control groups, and that a discrepancy existed between participants’ performance on the two executive tasks. While, not surprisingly, both executive tasks reflected reduced performance in groups with cognitive impairment, participants with more clinically severe diagnoses (AD or other dementia) and worse severity (mild and moderate severity based on MMSE and cognitive composite) displayed worse performance on the TMT relative to the WCST using clinically normed T-scores, whereas no differences across tasks were observed for either milder clinical diagnoses (HC and MCI) or less severe (intact/questionable) participant groups.
These results were anticipated for the groups with a dementia diagnosis given previous research suggesting that in a direct comparison between the two tasks, the TMT was impaired in a larger proportion of a mild AD sample than was the WCST. 24 The authors in that study speculated that although the frontal lobes were implicated in both tasks, the specific brain region involved in each executive task may explain this discrepancy in performance. As has been suggested, the ventromedial prefrontal cortex, which is primarily implicated in TMT performance, is affected earlier in the AD course than the dorsolateral prefrontal cortex, which is associated with WCST performance. 40 As such, TMT failure may be more apparent in AD, as seen in the current study. In addition to differential frontal lobe involvement, in a recent meta-analysis evaluating voxel-based morphometry in the prediction of AD, it was observed that individuals with MCI who progressed to AD displayed significant morphometric changes in their inferior parietal lobe and precuneus. 41 Given the involvement of these brain regions in visual spatial attention, motor coordination, and oculomotor skills, 42,43 it would be expected that the TMT task would be compromised under such conditions; consequently, it is not surprising that poor TMT performance may reflect reduced cognitive capacity and possible progression to AD.
The HCs (and participants with Intact MMSE
The finding that no differences existed in the clinical diagnostic distribution among the groups primarily comprising the “other dementia” diagnosis was somewhat unexpected (Table 2). While executive dysfunction can be seen in DLB, the diagnosis of bv-FTD relies on changes in “frontal-related” behaviors such as increased perseveration, inflexibility, and emotional disinhibition, along with decreases in performance on “frontal-lobe tests.” 47 It was thought that executive impairment would be distributed differently between the two diagnoses because executive dysfunction is the core feature of FTD; however, the distributions were comparable. It is possible that degradations to hippocampal projections to the frontal lobe, previously observed in DLB, 48 are resulting in equal rates of executive dysfunction between participants with DLB and FTD in our sample. Problems with parkinsonism in DLB and subsequent manual dexterity issues may also impact TMT performance, as could altered attentional/working memory networks in DLB. Consequently, our results may provide neurobehavioral support to prior imaging research suggesting frontal disconnections associated with DLB. In addition, the TMT may be identifying global impairment better than the WCST, as identified by the worse performance on the MMSE for the participants also impaired on the TMT. However, rates of decline on the MMSE vary as a function of dementia diagnosis, 49,50 making this connection less clear.
The current study is not without limitations. The small number of participants who were diagnosed with DLB or FTD in the current sample may have influenced our findings between the DLB and FTD groups. As small samples increase the possibility of spurious findings and reduce the power to find true statistical differences, future studies should evaluate a larger number of participants in these diagnostic groups. However, given the challenge of recruiting and maintaining DLB and FTD populations in research, it was felt that their positive impact on clinical relevance outweighed the concerns related to statistical power, and the analyses were consequently included.
In addition, the potential use of these executive functioning measures in consensus conference deliberations is another possible limitation, which may have potentially confounded our diagnostic analyses. The TMT and WCST are part of the overall neuropsychological battery, used to help establish clinical diagnoses; however, the participants in the current study received extensive neurological and neuropsychological assessments, as well as neuroimaging over time with annual consensus conferences. Consequently, the 2 executive measures comprise only a small piece of data that contributes to an eventual consensus diagnosis, which was felt to limit the concern of circularity. In addition, other selected cognitive and computerized cognitive test measures were included that also sampled executive functioning at each annual assessment. The authors felt that the influence of the TMT and WCST variables on the overall diagnoses, and in particular the diagnosis of dementia versus no dementia, was minimal enough to not influence the validity of the analyses. Further studies should be undertaken to confirm our results using consensus diagnoses that did not include these measures.
A further potential limitation is the cutoff score used in the current study to denote impairment. Performances on the WCST and TMT below a T-score of 37 (−1.4 SD) were considered to be impaired; consequently, we grouped borderline-impaired and clinically impaired performances together using standard Wechsler nosology. Previous studies have used a more conservative cutoff of −2.0 SD to reduce the likelihood of false positive cases of AD, 24 although support for our chosen cutoff was provided by an Ashendorf and McCaffrey study 26,27 that used a 10th percentile (T = 37; −1.4 SD) cutoff on the TMT to denote “extreme executive difficulties.” In addition, given the clinical relevance of the current study’s aims, and the fact that criteria for MCI recommend a deficit of −1 to −1.5 SD as a guideline for diagnosis, 51 we sought to use executive-score cutoffs that were most consistent with our clinical practices. However, it is possible that our less conservative cutoff score may explain why a few participants who were diagnosed as HCs displayed deficits on one of the two executive functioning measures using our criteria, which supports the argument for an extensive battery of tests for diagnosing dementia. While it is known that the frequency of MCI diagnoses is inversely related to the stringency of the cutoffs used clinically, 52 future studies evaluating an independent sample using multiple cutoffs could help determine whether the diagnostic correlates of executive impairment remain consistent across cutoffs.
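The relationship between the T-score cutoff, SD units, and percentiles can be checked with a short sketch. It assumes the conventional T-score metric (mean 50, SD 10) and a normal distribution; the function names are hypothetical.

```python
import math


def t_to_z(t_score: float) -> float:
    """Convert a T-score (mean 50, SD 10) to SD units."""
    return (t_score - 50.0) / 10.0


def z_to_percentile(z: float) -> float:
    """Percentile of a standard normal deviate via the error function."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for t in (37, 36, 30):
    z = t_to_z(t)
    print(f"T = {t}: z = {z:+.1f}, percentile = {z_to_percentile(z):.1f}")
```

Under these assumptions, T = 37 sits 1.3 SD below the mean, at roughly the 10th percentile, and the first whole-number score below the cutoff (T = 36) sits 1.4 SD below the mean. The more conservative −2.0 SD criterion corresponds to T = 30.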
Finally, the manner in which the current study incorporated participants that could not complete the executive measures may be raised as a limitation. As indicated in the Methods section, participants who discontinued one of the executive functioning tasks prior to its completion were assigned a value of T = 20. This practice presumed that poor performance on an executive functioning task was due solely to executive dysfunction, as compared to other cognitive weaknesses impacting the test administration and subsequent performance. In contrast, it is possible that reduced episodic memory, attention/working memory, visual spatial skills, and/or processing speed may have influenced the results of the current study, which is consistent with other research suggesting that these factors are inter-related with executive abilities 53 and that no executive functioning tasks appear to be “process pure.” 54 Despite this possible limitation, however, it was felt that inclusion of such cases was preferable to exclusion, as the alternative would have resulted in a possible selection bias for the study.
In conclusion, the current study indicated that executive dysfunction was associated with increased dementia diagnoses. In particular, TMT failure was implicated in a higher proportion of dementia-related cases than were problems with the WCST alone, and worse TMT performance was observed in mild and moderately severe dementia cases relative to WCST performance. These findings are of clinical relevance given the frequency of discrepancy between the TMT and WCST during clinical evaluations. Our study suggested that when faced with divergent performances on tasks of executive functioning during an evaluation for dementia, the results on the TMT tended to be more in line with clinical impressions and increasing diagnostic severity than the WCST, as the TMT produced more true negatives in dementia populations and was less ambiguous in healthy populations. Additionally, given the time difference between test administration for the TMT and WCST tasks (approximately 10 minutes vs 30 minutes, respectively 55 ) and the fact that poor performance on the WCST is more likely to heighten frustration levels than on the TMT, the TMT appears to have advantages over the WCST during relatively brief dementia assessments or screens. The use of the TMT may also be of benefit when considering broad community screenings for dementia, which by their nature are often required to be fast and noninvasive. Future research should consider evaluating the sensitivity and specificity of a battery comprising the TMT and measures of mental status and memory for use in such circumstances.
Footnotes
This article was accepted under the editorship of the former Editor-in-Chief, Carol F. Lippa.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Research supported by grant NIH-NIA P50 AG08671 and the Michigan Alzheimer’s Disease Research Center. No authors have reported conflicts of interest.
