Abstract
Background:
The classic version of the Wisconsin Card Sorting Test (WCST) consists of correctly sorting 128 cards according to changing sorting criteria. Its application is costly in terms of the time employed, with all the negative consequences this entails (decrease in motivation, frustration, and fatigue).
Method:
The main objective of this study was to test the usefulness of the shortened version of the WCST as compared to the full test by analyzing the equivalence between the two decks comprising the full 128-card version on a sample of patients diagnosed with sporadic late onset Alzheimer disease (SLOAD) and to check its clinical usefulness.
Results:
The variables showed equivalence between the two decks and their ability to differentiate between the control group (CG) and the Alzheimer disease (AD) group.
Conclusion:
The scores obtained suggest equivalence between decks and that the application of only the first deck is sufficient.
Introduction
The term “executive function” (EF) is frequently utilized to refer to a series of high-level cognitive activities performed by a complex and multifaceted mental system with its neurological basis in the prefrontal cortex, particularly the cingulate and anterior neocortical areas. It is a system that deals with the operation of “controlled” processes with any type of information. 1 A great variety of skills has been included within the so-called EFs, such as the ability to set goals, develop plans of action, flexible thinking, inhibition of automatic responses, self-regulation of behavior, and verbal fluency. 2,3 Thus, any disturbance of these functions can limit an individual’s ability to lead an independent and productive life, even if other cognitive abilities remain intact. 4
In Alzheimer disease (AD), the EFs are altered early, such that the patient shows an inability to plan and execute goal-oriented actions as well as a marked lack of cognitive flexibility.
Generally, the first indications of AD are cognitive complaints, particularly in relation to memory. However, problems are also detected in the EFs and must be taken into account. 5 Some studies even suggest that executive impairments are present in the predementia stages of the disease. 6 This type of deficit in the frontal functions is among those with the greatest impact on an individual’s functioning and his or her ability to remain independent. The capacity of patients with AD for abstraction is also affected, making it hard for them to understand metaphorical or figurative language. Also typical in the more advanced stages is the lack of inhibition of motor and verbal responses, which can give rise to socially inappropriate behavior, such as sexual disinhibition or vulgar language. 7-9
The literature has made available many instruments for measuring the EFs, and one of them is the Wisconsin Card Sorting Test (WCST). The main objective of this test is to assess mental flexibility. It was developed by Berg and Grant 10,11 to evaluate a person’s capacity for abstraction in response to a change in environmental contingencies. But it was not until 1981 that Heaton utilized it as a clinical instrument and the first normative data and rules for correction were published in 1993.
Performing the test correctly requires strategic planning, organized searching, the ability to utilize feedback from the environmental surroundings to make changes on the cognitive test, goal-oriented behavior, and the ability to modulate impulsive answers. 12
The classic version of this test consists of correctly sorting 128 cards according to changing sorting criteria (color–form–number). Although the test has been widely implemented in the clinical setting, its application is costly in terms of the time employed, with all the negative consequences this entails (decrease in motivation, frustration, fatigue, etc). 13
Authors such as Nelson 14 and Vayalakkara 15 proposed several shortened versions. Among them, the one most highlighted is a 64-card version (WCST-64), whose application is similar to that of the standard version (WCST-128) and allows clinicians to obtain the same information from the neuropsychological point of view, with the advantages entailed in using only half of the cards. 16
Different studies have shown that performance on the full version of the test is much worse in patients with a diagnosis of dementia than in controls without dementia and furthermore that the WCST-64 version is sensitive to the impairments in EF typical of dementia, thus supporting the usage of the short version. 17 Among the different versions, the 64-card version stands out because its application is similar to the standard version, and it allows clinicians to obtain the same neuropsychological information with the advantage of using only 1 deck. 18-21 Other authors, such as Sherer, 22 found a high association between the WCST-128 and the WCST-64, as well as high agreement in classifying patients as “normal” or “pathological,” although the WCST-128 turned out to be more sensitive in detecting impairments, since the patients showed greater difficulty in performing with the second deck of cards.
In this study, we aim to compare these 2 versions by comparing the 2 decks that form the complete version of the test and thus assess the diagnostic effectiveness of the shortened version in a sample of patients diagnosed with sporadic late onset Alzheimer disease (SLOAD).
Participants and Methods
Participants
The sample utilized in this study comprised 213 patients who took part in the study voluntarily and were divided into 2 groups: 1 of 141 patients diagnosed with SLOAD and another of 72 patients who, at the time of examination, showed no type of neurological or psychiatric disorder and had no record of alcoholism or drug addiction. The latter were thus utilized as the control group (CG).
The patients with SLOAD were selected in the Neurology Service of the University Clinical Hospital of Salamanca (Spain). Selection of the patients in the CG began when the clinical group was already underway; the patients included in this group were chosen from among the relatives of the patients attending this Service.
In order to be included in the clinical group, the patients had to satisfy the following criteria: (1) neurological and neuropsychological diagnosis of AD according to Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria 23 and (2) cutoff age of 60 in order to differentiate between early and late onset according to parameters established by the researchers and based on the observations of early and late presentation in the population studied by the group. For the patients with SLOAD, diagnostic tests were utilized in order to exclude those individuals with possible secondary dementia. Included in these tests were thyroid function, luetic serology, and the levels of vitamin B12 and folic acid. Table 1 shows the descriptive data of the socioeconomic and clinical factors in both groups.
Descriptive and Comparative Data of the Socioeconomic and Clinical Variables of the 2 Groups.
Abbreviations: SD, standard deviation; t exp, t-test; χ2, Chi-square test.
Significant differences were found between the groups in the following factors: age, years of schooling, mental status, and overall IQ.
No significant differences were detected in relation to gender (P = .072), socioeconomic level (P = .995), or educational level (P = .163).
Procedure
The evaluation of each patient in the clinical group consisted of a full neurological examination performed at the Neurology Service of the Clinical Hospital of Salamanca, a semistructured interview on sociodemographic and clinical aspects and a neuropsychological evaluation with a battery of tests designed for this study.
Sociodemographic and clinical evaluation
A sociodemographic and clinical questionnaire was applied to each patient and to the patients in the CG. This questionnaire included, according to the patient’s own self-report, information on his or her sociodemographic profile as well as medical, neurological, and psychiatric variables. The information thus obtained was contrasted and expanded by consulting the hospital clinical record where the patient had been selected. The main purpose of this register was to establish a control over these variables and at the same time guarantee a thorough description of the sample.
Neuropsychological evaluation
The tests comprising the neuropsychological examination battery designed for this study were selected for their validity and because they had been shown to be sensitive to neuropsychological deterioration in patients with AD.
Neuropsychological testing
Mental Status: Mini-Examen Cognoscitivo (the Spanish version of the MMSE). 24
Intelligence: Wechsler Adult Intelligence Scale (WAIS-III). 25
EFs: WCST. 26
The neuropsychological evaluation, both the administering of the tests and their correction, was performed according to the norms for each of the tests. In the case of the WCST, the complete WCST of 128 cards was administered to all the patients according to the norms set out in the manual by Heaton. 26
Data analysis
In the descriptive analysis, the means and standard deviations of sociodemographic and clinical variables were calculated. One-way analysis of covariance (ANCOVA) was utilized to analyze the differences in the global WCST scores to observe what best differentiated test performance between the patients in the AD group and the CG. Once the variables that best differentiated test performance between CG and AD were identified, those variables that did not differentiate were eliminated from the analysis. The percentage of nonperseverative errors, failure to maintain the category, and learning to learn were eliminated from the analysis. It is customary to utilize a smaller number of the scores obtained on this test, such as perseverative responses, errors, and complete categories. 27 Only the significant variables were utilized, and where an item was also expressed as a percentage, we worked only with the percentage. Mixed between–within patients ANCOVA was conducted to analyze these significant variables. This technique analyzes whether there are differences between the decks of 64 cards (within-patients factor), between the study groups (between-groups factor), and in the interaction between both factors. If the interaction was significant, the differences in scores between the AD group and the CG would be analyzed by 1-way, between-patients ANCOVA for each deck of cards separately. Likewise, the differences in scores between the decks of cards would be analyzed by 1-way within-patients ANCOVA for each study group. If the interaction was not significant, the main effects of each factor would be interpreted. The chosen covariables for all the analyses were age and years of education, the same as occurs with the standardized scores in the Heaton manual. 26 Cohen effect size (d) was used as a measure of magnitude of effect between the AD group and the CG and between the first and second decks of cards. The level of significance chosen was 5%. IBM-SPSS 23 was used to analyze the results.
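The effect-size measure mentioned above can be illustrated with a minimal Python sketch. The text does not state which formula for Cohen d was applied, so the following is an assumption: the between-group version uses the pooled standard deviation, and the between-deck version uses the standard deviation of the paired differences. The function names are hypothetical, and the sketch ignores the covariate adjustment (age and years of education) used in the ANCOVA models.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cohens_d_independent(group_a, group_b):
    """Cohen d for two independent groups, using the pooled SD (assumed formula)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    # The sign depends on the order of subtraction (here: group_a minus group_b).
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def cohens_d_paired(first, second):
    """Cohen d for paired scores: mean difference divided by the SD of the differences."""
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs) / stdev(diffs)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not data from the study).
    ad_hits = [30, 28, 35, 25, 32]
    cg_hits = [45, 50, 41, 47, 44]
    print(cohens_d_independent(ad_hits, cg_hits))

    deck1_hits = [40, 38, 45, 36]
    deck2_hits = [37, 36, 44, 31]
    print(cohens_d_paired(deck1_hits, deck2_hits))
```

Because the sign of d depends only on which mean is subtracted from which, positive and negative values of the same magnitude describe the same size of effect.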
Results
Table 2 shows the variables in which significant differences were found, the patients in the CG having performed better than the patients with AD.
Descriptive Statistics From the Study Groups and P Value From 1-Factor ANCOVA Analysis.
Abbreviations: ANCOVA, analysis of covariance; AD, Alzheimer disease; CG, control group; SD, standard deviation.
Based on these results, the variables selected to analyze the differences in mean scores between the 2 decks were as follows: number of hits, percentage of errors, percentage of perseverative responses, percentage of perseverative errors, percentage of conceptual-level responses, sorting category, and trials to complete first category.
With these variables, we set out to see how the patients in both groups behaved with each of the versions and to observe whether the scores obtained with deck 1 differed from those obtained with deck 2 or whether the 2 decks provided the same information, thus showing that the usage of only 1 deck is sufficient for obtaining the scores in this test.
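The deck-by-deck comparison described here presupposes that each participant's 128-card record is scored separately for cards 1 to 64 and cards 65 to 128. A minimal Python sketch of that split is given below for two of the simpler scores (number of hits and percentage of errors). The data layout (one boolean per trial) and the function names are hypothetical and are not taken from the test manual; perseverative scores would require the full scoring rules and are not attempted here.

```python
DECK_SIZE = 64  # cards per deck in the 128-card WCST


def score_deck(trials):
    """Score one deck's trials.

    `trials` is a list of booleans: True for a correct sort, False for an error.
    Returns the number of hits and the percentage of errors; the percentage is
    None when no trial was administered in this deck.
    """
    n_trials = len(trials)
    hits = sum(trials)
    pct_errors = 100.0 * (n_trials - hits) / n_trials if n_trials else None
    return {"hits": hits, "pct_errors": pct_errors}


def score_by_deck(trials):
    """Split one participant's trial sequence into deck 1 and deck 2 and score each."""
    return {
        "deck1": score_deck(trials[:DECK_SIZE]),
        "deck2": score_deck(trials[DECK_SIZE:2 * DECK_SIZE]),
    }


if __name__ == "__main__":
    # Hypothetical record: 48 hits in deck 1 and 32 hits in deck 2.
    record = [True] * 48 + [False] * 16 + [True] * 32 + [False] * 32
    print(score_by_deck(record))
```

Computing the percentage over the trials actually administered in each deck is a design choice made in this sketch, so that a deck that was not completed is not scored as if all 64 cards had been used.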
Number of hits
Analysis of variance of this variable showed that there was no interaction between group and deck, meaning that behavior in the 2 decks is similar in both groups (P = .948; F 1,207 = .004). There were statistically significant differences between the decks (P = .02; F 1,207 = 5.35; d = .49), and the differences between the CG and the AD group were indeed significant (P < .001; F 1,207 = 49.93; d = −.41). These findings indicate that for the “number of hits” variable, although the behavior is similar in the 2 groups, there is a significant decrease in the number of hits with the second deck. This decrease is the same in both groups, although the scores are lower in the AD group, as can be observed in Figure 1. In this variable, the hits tend to decrease as the test progresses, affecting both groups in the same way.

Differences between the decks in the “number of hits” variable.
Percentage of errors
Interaction was detected in the case of percentage of errors (P = .015; F 1,207 = 6.008), that is, the difference between groups is different in each deck. In the first deck, the standardized mean difference is .55 (P = .004, F 1,207 = 8.53), and for the second deck, it is .89 (P < .0001; F 1,207 = 22.23). The differences were observed above all in the scores of the second deck as can be seen in Figure 2.

Differences between decks in the percentage of errors.
Analysis of these differences between the 2 decks for each group showed that the difference was not significant in either group (CG: F 1,67 = .48, P = .490, d = −.05; AD: F 1,138 = .57, P = .452; d = .39), indicating that both decks contribute the same information in both groups, despite the between-group differences, the difference in means, and the variability being higher in AD than in CG.
Percentage of perseverative responses
No interaction was found, and therefore, the behavior in both groups was similar (F 1,207 = 2.16, P = .144). There were no differences between the decks (F 1,207 = .65, P = .423; d = .07), but there were indeed significant differences between the 2 groups of patients (F 1,207 = 14.14, P < .0001, d = −.46). Thus, the information offered by both decks of the test for this variable is equivalent, and both groups show similar behavior; furthermore, the scores between the groups are very different. This is a good indicator of the alterations in EF in patients with AD.
Percentage of perseverative errors
This same pattern is found in the percentage of perseverative errors. No interaction was detected (F 1,207 = 1.23, P = .269). There were no significant differences between the decks (F 1,207 = 1.58, P = .211, d = −.12), but there were indeed significant differences between AD and CG (F 1,207 = 14.29, P < .0001; d = .46). As in the previous variable, both decks are equivalent, offering the same information and differentiating the control patients from the patients with AD.
Percentage of conceptual-level responses
Interaction was not significant for this variable either (F 1,207 = .33, P = .569). No significant differences were found between the decks, although in this case the differences were close to significance (F 1,207 = 3.35, P = .069, d = .28). Significant differences were detected between the 2 groups of patients (F 1,207 = 30.18, P < .0001, d = −.61).
Sorting category
No significant interaction was detected in the case of the number of categories (F 1,207 = 1.27, P = .260); and there were no significant differences found between the decks (F 1,207 = .37, P = .54; d = .11), although they were indeed found between the groups (F 1,207 = 24.43; P < .0001; d = −.6), indicating that the control patients showed higher values than the patients with AD.
Trials to complete the first category
This pattern is also repeated in trials to complete the first category, there being no interaction (F 1,207 = 2.69, P = .103); the decks show equivalence (F 1,207 = .004, P = .953, d = −.21) and the groups are differentiated according to diagnosis, making it a good tool to differentiate between groups (F 1,207 = 38.58, P < .00001, d = .75).
Discussion
The main objective of this study was to compare the complete version of the WCST-128 with the shortened 64-card version by analyzing the equivalence of the 2 decks of 64 cards each in the complete version of the test on a sample of patients diagnosed with SLOAD and to test its clinical utility.
After analysis of variance, the results indicate that the scores obtained suggest equivalence between deck 1 and deck 2 and that the application of only the first deck is sufficient. This would be compatible with the statistical justification given by the WCST-128 test manual, 26 which states that there is good reliability based on the coefficient of homogeneity in the 2 halves of the test, thus justifying the usage of only 1 deck, given the equivalence between the 2 decks.
Our findings are in line with many studies that have demonstrated the utility of the short version (WCST-64), both in the clinical population 13,17,20,28 and in the population without the pathology. 16,18
In our study, we selected the items that showed significant differences between the performance of the CG and the patients diagnosed with AD on the complete 128-card test. The variables that showed differences between these 2 groups were the following: number of hits, percentage of errors, percentage of perseverative responses, percentage of perseverative errors, percentage of conceptual-level responses, sorting category, and trials to complete first category. These results are in line with other studies of similar characteristics. 16,29 Furthermore, all the variables adequately differentiated the AD group from the CG, this being a good indicator of alterations in the EFs.
As regards the comparison of performance between the CG and the AD group in both decks, the results obtained suggest the equivalence between the decks for most of the variables mentioned above, in which no significant differences were found between the 2 decks in terms of the performance of either group. These findings are compatible with those published in a study comparing the validity of the 2 versions of the test, the shortened and the complete versions, in a sample of patients who had undergone cranioencephalic trauma; in 3 of the variables in which we found equivalence between the decks, in that study, there was a correlation in the scores between the 2 versions greater than .7 (perseverative responses, sorting category, and trials to complete first category). 13
In the case of the “number of hits” variable, significant differences were found in the scores between the 2 decks; it was observed that the number of hits in the second deck decreased significantly in both groups. A possible explanation for this is the long duration of the test, which places an increased cognitive demand on the patients and thus leads to lower scores. 13 “Percentage of errors” increases in the second deck, but only in the AD group, although the differences are not significant when they are analyzed for each of the groups. This decrease in performance as the test progresses could be explained by the long time it takes to administer (30–60 minutes), which is particularly demanding for the patients with AD 30 or CG participants. 31 This situation justifies the need to utilize the shortened version of the WCST as a reliable test to evaluate the EFs in elderly populations or populations with neurodegenerative diseases.
The WCST has been shown to be a good tool for measuring EF, 16 and furthermore, there seems to be a direct relation between performance on this test and a patient’s clinical condition, contributing relevant information for a diagnosis of AD. 30 As can be seen in our findings, this test is a tool that is sensitive to AD and allows clinicians to differentiate patients with the pathology from patients without it. The main objective of our study was to demonstrate the utility of the shortened version in patients with AD; a similar study aimed at learning the utility of the WCST-64 in patients with AD and PD 17 concluded that the usage of the shortened version is advisable, and our findings can be associated with the ones found in that study.
Other shortened versions of the test have also been used, such as Nelson’s modified version. 14 Some authors express caution when comparing the results between the 2 versions, emphasizing that it is important to have specific data available for each pathology 19 and not to generalize the results obtained solely from the nonclinical population. This supports the need to prove the validity of the test for each pathology, which is why we have focused on AD in our study.
Furthermore, both from the neuropsychological context and the research perspective, an increasing need has been observed to have shorter protocols that facilitate the task, since a protocol that is too long can consume resources that in some cases are very limited. 31
By way of conclusion, we can say that our findings indicate that the use of only 1 deck (the WCST-64) provides the same information in patients with AD as the use of both decks (the full version), as we found the 2 decks to be equivalent for all the items with the exception of “number of hits,” which decreases with the second deck, possibly as a result of the long duration of the test. 13 It is also important to underscore the sensitivity of the test, since it found significant differences between the CG and the experimental group for all the test items. This shows that it is a good indicator of AD, which can be extremely useful in clinical practice when accompanied by other diagnostic tests.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflict of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
