Abstract
Patients with Alzheimer’s disease typically have initial deficits in memory. Memory tests can be categorized as verbal or nonverbal according to the modality of the stimuli used. We compared the discriminative validity of selected verbal and nonverbal memory tests for distinguishing patients with Alzheimer's disease from non-dementia individuals in Taiwan. Ninety-eight patients with mild Alzheimer's disease and 269 non-dementia individuals underwent the story recall test (immediate and delayed recall) and the constructional praxis test (copy and delayed recall). Receiver-operating characteristic curves and the area under the curve were used to compare the tests. Patients with Alzheimer's disease performed poorly on all memory tests, and receiver-operating characteristic curve analysis indicated that story recall immediate and delayed recall and constructional praxis delayed recall had good classification accuracy, with areas under the curve of .90, .87 and .87, respectively. These results support the use of both verbal and nonverbal memory tests as reliable measures for screening patients with Alzheimer's disease.
Significance Statement
(1) Both immediate and delayed story recall tests show good diagnostic accuracy for differentiating Alzheimer's disease from non-dementia. (2) Constructional praxis delayed recall helps differentiate Alzheimer's disease from non-dementia, while constructional praxis copy may not. (3) Education level affects test performance, so using different cutoff values for different education levels is reasonable.
Introduction
Alzheimer’s disease (AD) is the most common type of dementia and is characterized by progressive cognitive impairment and behavioral disturbance. 1 Although the development of AD is multifactorial, the primary risk factor for AD is advanced age. 2 As average life expectancy has increased globally, 3 the number of patients with AD has also increased rapidly, with far-reaching effects on individuals, families and society.
Memory impairment, especially of episodic memory, is generally the first and most prominent symptom in patients with AD. 4 When clinical neuropsychological tests are used to assess memory in AD patients, performance on both verbal and nonverbal memory measures is clearly impaired. 5 Common verbal memory tests include the story recall test (SRT), which assesses short- and long-term retention of a meaningful short story or paragraph (such as the logical memory subtest of the Wechsler Memory Scale). 6 Wordlist learning tests, which measure recall and recognition of a word list across multiple trials, are also commonly used to assess verbal memory.7,8 Most nonverbal memory tests rely on visual stimuli, such as drawing figures from memory and figure recognition.9,10
Imaging studies of AD patients have shown hypometabolism in the bilateral temporal lobes, and have shown that material-specific memory functions are associated with medial temporal lobe volumes: memory for verbal stimuli with the left medial temporal lobe and memory for visual stimuli with the right.11–13
In previous studies, both verbal and non-verbal memory tests were effective in detecting early-stage AD. 14 The literature suggests that verbal episodic memory tests better identify people who go on to develop AD.15–17 There is also increasing support for the utility of non-verbal memory tests in dementia assessment: various studies found that visual memory tests could predict memory decline and the development of AD.18–20
Research that directly compares the ability of verbal and nonverbal memory tests to discriminate AD from non-demented individuals is scarce, especially in Taiwanese populations. A meta-analysis of tests used to differentiate AD patients from the cognitively normal population reported that both visual and verbal memory tests showed good sensitivity and specificity. 21 However, most studies have been conducted in Western populations. Because these findings could be influenced by a variety of factors, including language and culture, it is important to replicate them in other cultures.
This study aimed to investigate the sensitivity and specificity of verbal and nonverbal memory tests for discriminating between AD and non-dementia individuals in Taiwan, and to determine the corresponding cutoff values.
Methods
Participants
Participants were recruited from the Neurology Outpatient Department of Kaohsiung Municipal Ta-Tung Hospital between June 2021 and December 2022. The study was approved by the Kaohsiung Medical University Hospital Institutional Review Board. The inclusion criteria were: (1) age ≥60 years; (2) presence of cognitive complaints; (3) ability to complete all neuropsychological tests used in the present study. The exclusion criteria were: (1) presence or history of major psychiatric illness or any central nervous system disorder other than dementia; (2) serious physical disease with cerebral impact; (3) uncorrectable visual or auditory deficits. All participants received a medical evaluation that included a medical history, physical and neurological examinations, neuropsychological examinations, laboratory testing, and neuroimaging (cerebral computed tomography or magnetic resonance imaging).
A diagnosis of AD was made according to the 2011 National Institute on Aging and Alzheimer’s Association (NIA-AA) criteria for “probable AD”. 22 Based on the Clinical Dementia Rating (CDR), 23 patients with AD were further classified as having either very mild AD (CDR .5) or mild AD (CDR 1.0). Participants were classified as non-dementia if they did not meet the NIA-AA criteria for all-cause dementia, their estimated Mini-Mental State Examination (MMSE) score was above their education-adjusted cutoff score, and their CDR score was 0.
Assessment Methods
Verbal memory was assessed using the SRT and non-verbal memory was assessed using the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) constructional praxis (CP) task. 24
Story Recall Test
The modified Chinese version of the SRT includes immediate story recall and delayed story recall. The story is similar in structure to the Wechsler Memory Scale logical memory test, 6 but its items have been modified to better suit local customs. The story is about a woman being robbed and contains 25 details. The story is read to the participant, who is then asked to recall it immediately, including as many details as possible. After 30 minutes, the participant is asked to recall the same story again. For both immediate and delayed recall, the correctly recalled details are summed, yielding a maximum score of 25.
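As a minimal sketch of the scoring rule just described, the following Python counts the distinct story details recalled out of 25. The detail labels are hypothetical placeholders (the actual checklist is not reproduced here), and treating repeated details as counting once and intrusions as earning no credit is an assumption, not a rule stated in the text.

```python
from typing import Iterable

# Hypothetical placeholders for the 25 scoreable story details; the real
# checklist of the modified Chinese SRT is not reproduced in the text.
STORY_DETAILS = frozenset(f"detail_{i:02d}" for i in range(1, 26))
MAX_SRT_SCORE = len(STORY_DETAILS)  # 25


def score_srt(recalled: Iterable[str]) -> int:
    """Return the number of distinct story details recalled correctly (0-25).

    Assumption: a detail repeated several times counts once, and items that
    are not on the checklist (intrusions) earn no credit.
    """
    return len(STORY_DETAILS.intersection(recalled))


# The same function scores the immediate and the 30-minute delayed recall.
immediate = score_srt(["detail_01", "detail_02", "detail_02", "detail_07"])
delayed = score_srt(["detail_01", "wrong_detail"])
print(immediate, delayed)
```

Using a set intersection makes the score insensitive to the order in which details are recalled, which matches the text's description of simply summing correctly remembered details.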
Consortium to Establish a Registry for Alzheimer's Disease Constructional Praxis task
The CP task is a measure of nonverbal memory in the CERAD Neuropsychological Battery.24,25 It includes CP copying and CP delayed recall. Participants are asked to copy a circle, a diamond, overlapping rectangles and a Necker cube, yielding a maximum score of 11 (score range 0-11). After the 10-item wordlist recall and recognition tests are given as filler tasks, participants are asked to draw the 4 figures again from memory, and the drawings are scored with the same system. Participants are not informed about the second recall task at the start of the test.
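The CP total can be sketched in the same way. The per-figure maxima below (circle 2, diamond 3, overlapping rectangles 2, cube 4) are assumed from the standard CERAD scoring rather than listed in this text, although they do sum to the stated maximum of 11; the dictionary keys are hypothetical names.

```python
from typing import Mapping

# Assumed per-figure maxima from standard CERAD constructional praxis scoring;
# they sum to the maximum score of 11 stated in the text.
CP_MAX_POINTS = {
    "circle": 2,
    "diamond": 3,
    "overlapping_rectangles": 2,
    "cube": 4,
}


def score_cp(points: Mapping[str, int]) -> int:
    """Sum the points awarded to the four figures (0-11).

    The same function applies to the copy trial and the delayed recall trial.
    A figure missing from `points` (not drawn) scores 0; unknown keys are ignored.
    """
    total = 0
    for figure, max_points in CP_MAX_POINTS.items():
        awarded = points.get(figure, 0)
        if not 0 <= awarded <= max_points:
            raise ValueError(f"{figure}: {awarded} is outside 0-{max_points}")
        total += awarded
    return total


copy_score = score_cp({"circle": 2, "diamond": 3, "overlapping_rectangles": 2, "cube": 3})
recall_score = score_cp({"circle": 2, "diamond": 1})  # two figures not recalled
print(copy_score, recall_score)
```

Validating each figure against its maximum catches data-entry errors before they inflate a total.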
Other Assessments
The Cognitive Abilities Screening Instrument (CASI) was administered to all participants and the Mini-Mental State Examination (MMSE) score was derived from the CASI (ie, the estimated MMSE).26,27 For AD participants, the CDR was additionally utilized to assess the severity of AD. 23
Procedures
A modified Chinese version of the 10-item wordlist learning, recall and recognition test from the CERAD battery was used as a filler test. 24 The neuropsychological measures were administered in the following order: CASI, immediate SRT, 10-item wordlist learning, CP copying, 10-item wordlist recall and recognition, CP delayed recall and delayed SRT. All participants completed the tests in the same order.
Statistical Analysis
Continuous variables are presented as the mean ± standard deviation (M ± SD), and categorical variables as number and percentage (n, %). Student’s t-test and Pearson's chi-square test were used to compare groups on continuous and categorical variables, respectively. Receiver-operating characteristic (ROC) curves were generated and the area under the curve (AUC) was calculated to compare the ability of the SRT and CP to discriminate between non-dementia and AD participants. Statistical significance was set at P < .05 for all tests.
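The ROC analysis can be illustrated with a short sketch. Because lower memory scores indicate AD, and the Results report cutoffs in the form "score ≤ c", the AUC below is the Mann-Whitney probability that a randomly chosen AD participant scores lower than a randomly chosen non-dementia participant (ties count one half). The text does not state how the optimal cutoffs were selected; maximizing Youden's J (sensitivity + specificity - 1) is assumed here as one common choice, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def auc_lower_is_positive(ad: Sequence[float], non_dementia: Sequence[float]) -> float:
    """Empirical AUC when lower scores indicate AD (Mann-Whitney form)."""
    wins = 0.0
    for a in ad:
        for n in non_dementia:
            if a < n:
                wins += 1.0  # AD participant correctly ranked lower
            elif a == n:
                wins += 0.5  # ties count as half
    return wins / (len(ad) * len(non_dementia))


def youden_cutoff(ad: Sequence[float],
                  non_dementia: Sequence[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J
    for the rule 'score <= cutoff => classify as AD'."""
    best = None
    for c in sorted(set(ad) | set(non_dementia)):
        sens = sum(s <= c for s in ad) / len(ad)                      # AD flagged
        spec = sum(s > c for s in non_dementia) / len(non_dementia)  # controls cleared
        j = sens + spec - 1
        if best is None or j > best[0]:  # strict '>' keeps the lowest cutoff on ties
            best = (j, c, sens, spec)
    return best[1:]


ad_scores = [1, 2, 3, 5]             # toy data, not study data
non_dementia_scores = [3, 6, 7, 8]
print(auc_lower_is_positive(ad_scores, non_dementia_scores))
print(youden_cutoff(ad_scores, non_dementia_scores))
```

On this toy data the AUC is 14.5/16 ≈ .906 and the selected rule is "score ≤ 5" (sensitivity 1.0, specificity .75). Scanning every observed score as a candidate cutoff is enough because sensitivity and specificity only change at observed values.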
Results
Demographic Data of the Participants
The Demographic Characteristics of Recruited Participants.
Abbreviations: AD: Alzheimer's disease, CASI: Cognitive Abilities Screening Instrument, CDR: Clinical Dementia Rating, SD: standard deviation.
Results of Four Cognitive Tests Among All Participants.
Abbreviations: AD: Alzheimer's disease, CP: constructional praxis, SD: standard deviation, SRT: story recall test.
Cutoff Values and Area Under the Curve of the Four Subtests
Performance of Four Cognitive Tests for Discriminating Dementia From Non-dementia.
Abbreviations: AD: Alzheimer's disease, AUC: area under the curve, CP: constructional praxis, SRT: story recall test.

ROC curves of the SRT and CP for detecting dementia in different education groups: (A) all education levels; (B) low education group; (C) high education group. ROC, receiver operating characteristic; SRT, story recall test; CP, constructional praxis.
Previous studies have shown that education level affects performance on neuropsychological tests.28,29 We therefore analyzed whether the discriminating ability of the SRT and CP differed between the 2 education groups (Table 3). In the low education group (≤6 years of education), the cutoff values and AUCs for distinguishing dementia from non-dementia were as follows: the AUC of SRT immediate recall was .88 (95% CI .80 to .94) at a cutoff score of ≤3; the AUC of SRT delayed recall was .81 (95% CI .72 to .88) at a cutoff score of ≤0; and the AUC of CP delayed recall was .82 (95% CI .73 to .89) at a cutoff score of ≤3. CP copying did not significantly differentiate dementia from non-dementia (P = .28) (Figure 1B).
In the high education group (>6 years of education), the cutoff values and AUCs for distinguishing dementia from non-dementia were as follows: the AUC of SRT immediate recall was .90 (95% CI .86 to .93) at a cutoff score of ≤6; the AUC of SRT delayed recall was .88 (95% CI .84 to .92) at a cutoff score of ≤4; the AUC of CP copying was .62 (95% CI .56 to .68) at a cutoff score of ≤9; and the AUC of CP delayed recall was .87 (95% CI .82 to .91) at a cutoff score of ≤6. The AUC of CP copying remained significantly lower than those of the other three subtests (Figure 1C).
Finally, we compared the diagnostic accuracy of the four tests between the high and low education groups. For none of the tests did diagnostic accuracy differ significantly between the groups (all P > .05).
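The text does not say how AUCs were compared between the education groups. One common approach for AUCs estimated in two independent samples is a z-test using the Hanley and McNeil (1982) standard error; the sketch below implements that approach and should not be read as the authors' procedure. The per-group sample sizes in the usage lines are hypothetical, since they are not reported in this section.

```python
import math
from typing import Tuple


def hanley_mcneil_se(auc: float, n_ad: int, n_non_dementia: int) -> float:
    """Standard error of an empirical AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_ad - 1) * (q1 - auc ** 2)
           + (n_non_dementia - 1) * (q2 - auc ** 2)) / (n_ad * n_non_dementia)
    return math.sqrt(var)


def compare_independent_aucs(auc1: float, se1: float,
                             auc2: float, se2: float) -> Tuple[float, float]:
    """Two-sided z-test for AUCs from two independent samples.

    Returns (z, p); p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Reported SRT immediate recall AUCs (.88 low, .90 high education) with
# HYPOTHETICAL group sizes, for illustration only.
se_low = hanley_mcneil_se(0.88, n_ad=50, n_non_dementia=50)
se_high = hanley_mcneil_se(0.90, n_ad=50, n_non_dementia=200)
z, p = compare_independent_aucs(0.90, se_high, 0.88, se_low)
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

Because the two education groups contain different participants, the AUCs are independent and their variances simply add; comparing two tests within the same participants would instead require a method that accounts for their correlation.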
Discussion
In the present study, SRT immediate recall (sensitivity 87.76%; specificity 79.18%), SRT delayed recall (sensitivity 92.86%; specificity 74.85%) and CP delayed recall (sensitivity 83.67%; specificity 72.76%) had good discriminating power for differentiating dementia from non-dementia in our participants. However, CP copying (sensitivity 66.33%; specificity 62.69%) was not useful for distinguishing dementia from non-dementia. These results confirm the usefulness of both verbal memory tests (ie, SRT) and non-verbal memory tests (ie, CP) for discriminating dementia from non-dementia in a Taiwanese population. We also investigated the effect of education level on the discriminating properties of the SRT and CP.
In a meta-analysis of the diagnostic accuracy of memory tests, including immediate memory, delayed memory and associative learning, a test was considered to have “adequate” diagnostic accuracy if both sensitivity and specificity were 70% or higher. 21 By this criterion, the SRT (both immediate and delayed recall), one of the most commonly used verbal episodic memory tests, and CP delayed recall can be considered to have adequate diagnostic accuracy for discriminating mild AD from non-demented elderly individuals. This finding is consistent with previous studies.30–33
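The 70% criterion can be applied mechanically to the sensitivity and specificity values reported in this study. Youden's J (sensitivity + specificity - 1) is added here only as a single-number summary for comparison; it is not used in the text, and the names below are illustrative.

```python
# Sensitivity and specificity (%) reported above for all education levels.
REPORTED = {
    "SRT immediate recall": (87.76, 79.18),
    "SRT delayed recall": (92.86, 74.85),
    "CP copying": (66.33, 62.69),
    "CP delayed recall": (83.67, 72.76),
}

ADEQUATE_THRESHOLD = 70.0  # both values must be >= 70% (meta-analysis criterion)


def is_adequate(sensitivity: float, specificity: float,
                threshold: float = ADEQUATE_THRESHOLD) -> bool:
    """True when both sensitivity and specificity reach the threshold."""
    return sensitivity >= threshold and specificity >= threshold


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J from percentages, expressed on a 0-1 scale."""
    return (sensitivity + specificity) / 100 - 1


for name, (sens, spec) in REPORTED.items():
    print(f"{name}: adequate={is_adequate(sens, spec)}, J={youden_j(sens, spec):.3f}")
```

Only CP copying fails the criterion, since both its sensitivity and its specificity fall below 70%.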
The CERAD CP task, which comprises copying and delayed recall, is commonly used to assess visuoconstructional ability and visual memory in patients with neuropsychiatric disorders. In the present study, CP delayed recall discriminated well between mild AD and non-dementia, but CP copying was not adequate for discriminating between the 2 groups. In the original CERAD study, CP copying was helpful for staging AD severity but not for detecting mild AD. 24 All participants in our AD group had mild-stage AD (CDR .5 or 1.0). Our findings are therefore in line with previous studies indicating that CP copying can detect moderate AD but is not useful for detecting mild AD.32,33 This difference may be explained by the fact that deterioration of visuospatial ability, as measured by CP copying, typically appears later in AD than episodic memory impairment. 34 Another possible explanation is that copying figures involves abilities other than visuospatial ability, such as motor skills and organization.35,36 Because these abilities also decline in non-demented elderly individuals, CP copying is less sensitive for discriminating AD from non-demented elderly individuals. 37
Clinically, verbal tests are the most commonly used measures of memory. Studies have suggested that word-retrieval difficulty can confound performance on verbal cognitive measures and may create the impression of memory impairment.38,39 Hence, verbal memory test results should be interpreted with caution and integrated with other forms of evidence.
In our study, the effect of education level on the cognitive tests was reflected in the different optimal cutoff values for the high and low education groups, which differed by 3-5 points across the four tests. These findings are consistent with previous studies showing that both SRT and CP performance can be poorer in less educated groups.40,41 It is therefore reasonable to set different cutoff points for people with different levels of education.
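To make education-specific cutoffs concrete, the sketch below encodes the cutoffs reported in the Results for the three tests that discriminated AD in both education groups, with "score ≤ cutoff" treated as screen positive. The low education group is assumed to mean ≤6 years, the complement of the reported >6-year high group. CP copying is omitted because no useful low-education cutoff was found. Function names are hypothetical, and these are research cutoffs from a single memory-clinic sample, not validated clinical thresholds.

```python
# Cutoffs reported in the Results: score <= cutoff => screen positive for AD.
# "high" = >6 years of education; "low" is assumed to be <=6 years.
CUTOFFS = {
    "SRT immediate recall": {"low": 3, "high": 6},
    "SRT delayed recall": {"low": 0, "high": 4},
    "CP delayed recall": {"low": 3, "high": 6},
}


def education_group(years: int) -> str:
    """Map years of education to the study's two education groups."""
    return "high" if years > 6 else "low"


def screen_positive(test_name: str, score: int, years_of_education: int) -> bool:
    """Apply the education-specific cutoff for one test."""
    cutoff = CUTOFFS[test_name][education_group(years_of_education)]
    return score <= cutoff


# The same SRT immediate recall score of 5 is read differently by education.
print(screen_positive("SRT immediate recall", 5, years_of_education=12))
print(screen_positive("SRT immediate recall", 5, years_of_education=4))
```

A single score of 5 falls at or below the high-education cutoff (≤6) but above the low-education cutoff (≤3), which shows why one cutoff for everyone would over-flag less educated people or under-flag more educated people.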
A strength of our study is that we reported the psychometric properties and discriminating abilities of SRT, CP copying and CP delayed recall for AD in a Taiwanese population. However, this study has some limitations. First, all participants were enrolled from a memory clinic. Thus, even participants in the non-dementia group had subjective memory complaints and may not be representative of cognitively healthy people. Nevertheless, the study reflects the reality of people seeking help for cognitive impairment in memory clinics in Taiwan. Second, our sample size was relatively small compared with previous studies, and we did not have enough participants to determine separate cutoff values for different age and education levels. Third, all our dementia participants had AD, so the results may not apply to other dementia subtypes.
Conclusions
In summary, the present study showed that both verbal and non-verbal memory tests, namely SRT immediate recall, SRT delayed recall and CP delayed recall, can be used to detect mild AD in elderly individuals in Taiwan. The influence of education should be considered when using these memory tests, and using different cutoff values for different education levels is reasonable. Further studies with larger samples, community-based data and participants with other dementia subtypes are warranted to determine whether these results can be replicated.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by National Health Research Institutes (NHRI-11A1-CG-CO-06-2225-1, NHRI-12A1-CG-CO-06-2225-1), Kaohsiung Medical University Research Center (KMU-TC112B02), and the Department of Neurology, Kaohsiung Municipal Ta-Tung Hospital, Kaohsiung, Taiwan (KMTTH-111-004).
