Abstract
Background:
Findings from language sample analyses can provide efficient and effective indicators of cognitive impairment in older adults.
Objective:
This study used newly automated core lexicon analyses of Cookie Theft picture descriptions to assess differences in typical use across three groups.
Methods:
Participants included adults without diagnosed cognitive impairments (Control), adults diagnosed with Alzheimer’s disease (ProbableAD), and adults diagnosed with mild cognitive impairment (MCI). Cookie Theft picture descriptions were transcribed and analyzed using CLAN.
Results:
Results showed that the ProbableAD group used significantly fewer core lexicon words overall than the MCI and Control groups. For core lexicon content words (nouns, verbs), however, both the MCI and ProbableAD groups produced significantly fewer words than the Control group. The groups did not differ in their use of core lexicon function words. The ProbableAD group was also slower to produce most of the core lexicon words than the MCI and Control groups. The MCI group was slower than the Control group for only two of the core lexicon content words. All groups mentioned a core lexicon word in the top left quadrant of the picture early in the description. The ProbableAD group was then significantly slower than the other groups to mention a core lexicon word in the other quadrants.
Conclusions:
This standard and simple-to-administer task reveals group differences in overall core lexicon scores and the amount of time until the speaker produces the key items. Clinicians and researchers can use these tools for both early assessment and measurement of change over time.
Speech and language abilities are important factors in detecting cognitive impairments in older adults.1–5 They are especially important given that evaluation of speech and language can be simple, convenient, and non-invasive compared with many other diagnostic procedures. 6 Traditionally, using discourse-level language to investigate the lexical/semantic system in this group has been challenging due to the time and expertise required to collect, transcribe, and analyze large numbers of discourse samples.7,8 However, shared databases and advances in computer technology, natural language processing, and machine learning procedures have vastly improved our ability to use connected speech as an efficient and non-invasive classification and measurement tool.9,10
Automatic detection of dementia from connected speech has achieved varying degrees of accuracy depending on which classifiers and features are used. Most studies report accuracies ranging from mid-70% to mid-80%, and the best performing analyses reach close to 90% accuracy (see reviews).6,11–13 Many of these studies analyze Cookie Theft picture descriptions from the Pitt corpus in DementiaBank 1 .14,15 These techniques use various speech and language features (e.g., lexical, acoustic, temporal) from relatively short language samples to produce algorithms that can successfully detect dementia. It would be advantageous to find ways to translate some of that machine learning knowledge and expertise to the clinic.
Core lexicon is a discourse analysis tool developed from large, shared databases that use standard elicitation protocols, standard transcription formats (CHAT, https://talkbank.org/manuals/CHAT.pdf), and automated language analysis programs (CLAN, https://dali.talkbank.org/clan/). 16 Core lexicon analysis measures the “typicality” of words used in discourse based on normative data (words produced by 50% or more of controls who did the task). 17 It has the advantage of being straightforward and low-tech for busy clinicians. In fact, core lexicon can be reliably scored without transcription, alleviating one of the primary barriers to discourse analysis in clinical practice. 18 Core lexicon may be particularly useful in dementia, since adults with Alzheimer’s disease (AD) and mild cognitive impairment (MCI) often use less precise language than adults without cognitive impairments and would likely score lower on the total number of core lexicon items produced. The lexical/semantic system is most vulnerable to cognitive impairment in dementia, and language is often described as “empty”.19–21 Thus, early indications of impairment from a simple language task could facilitate early intervention such as clinical drug trials, external memory aid treatments, counseling, or lifestyle changes.
Slegers et al. conducted a systematic review of connected speech features from picture descriptions by individuals with Alzheimer’s disease across languages and reported that 61% used the Cookie Theft picture. 22 A core lexicon checklist for this task was developed by Dalton et al. using transcripts from 45 healthy controls in the Pitt corpus in DementiaBank. 23 The list consists of 26 words, including 12 function words (determiners, pronouns, prepositions, etc.) and 14 content words (nouns, verbs). Croisile et al. created a list of 23 information units for the Cookie Theft picture, based on the four categories of subjects, place, objects, and actions, that has been used to analyze differences between groups with and without dementia. 20 Another list of 23 information units was derived from Swedish and English speakers’ Cookie Theft picture descriptions using cluster models. 24 The difference between these lists and the core lexicon list is that the core lexicon list contains additional function words.
Core lexicon scoring is straightforward, with 1 point given for each word produced from the checklist, regardless of the number of times it is produced. Importantly, inflected forms of core lexicon items receive credit (e.g., if
Fraser et al. (2019) reviewed speech and language findings for the Cookie Theft picture task in this population, focusing specifically on measures of information content, which are consistently reported to be reduced in AD. 24 Previous studies have shown that groups with and without dementia differ in the production of information units on the Cookie Theft description task, with AD groups producing fewer information units.20,25,26 Bschor et al. found that German-speaking individuals with MCI did not differ significantly from the control group on the number of persons and objects, localizations, and actions mentioned. 27 Measures of efficiency (e.g., information units per second) are also reported to be reduced in AD, as individuals with AD use more time to produce fewer relevant words.3,25,28,29 Fraser et al. (2019) reported that MCI group participants produced relevant information units at a significantly slower rate than controls, while not differing significantly in overall speaking rate. 24 The analysis was based on computing information efficiency (number of relevant information units, divided by the total time). Newly available automated word alignment analyses may allow for a related temporal measure of the elapsed time until essential lexical information is produced. This measure would give clinicians an additional tool to use when assessing individuals with complaints of cognitive impairments. 30 Recent work on pauses in connected speech reinforces the use of temporal features in early diagnosis of AD. 31
The analysis of core lexicon (both word production and timing) can also incorporate visuospatial aspects of the picture to explore dementia-related changes such as figure-ground analysis, agnosia, and visual perceptual organization.32–34 Quadrants and methods such as spatio-semantic graphs and various eye-tracking and image-text alignment procedures have been used in conjunction with linguistic analyses of the Cookie Theft picture in this population.35–43 These studies have reported significant differences between cognitively impaired and unimpaired groups as well as improved results for automatic classification when adding visual processing features.
The goal of this project was to better understand the linguistic behaviors that underlie the performance of these automated classification models. To accomplish this, we asked the following questions: Do the groups differ in total number of core lexicon words produced? Do the groups differ in the elapsed time before core lexicon content words (nouns, verbs) are produced? In terms of visuospatial processing, do the groups differ in elapsed time before core lexicon content words are produced for each quadrant of the picture?
METHODS
Participants
Participants with diagnoses of ProbableAD
2
(
Participant demographics and language sample data
Procedures
Language samples were transcribed in CHAT format and analyzed using automated commands from the CLAN program. CLAN is freely downloadable software (https://dali.talkbank.org/clan/) that includes the CHAT editor and allows for automated analysis of linguistic and discourse structures. 49
Two new CLAN commands were used: 1) the basic CORELEX command, which searches for lemmas on the morphological tier of the CHAT transcript (%mor) to compute the total number of Cookie Theft core lexicon words (see the Supplementary Material) produced in the sample at least once; 17 and 2) a modified version of the basic CORELEX command that searches on the word alignment tier of the transcript (%wor) to compute the time until a given word from the core lexicon was used. The example below shows a Participant’s (PAR) utterance from a CHAT transcript. The main speaker tier, *PAR, shows the speaker’s output with the utterance time stamp (in ms); the %wor tier has timestamps for each word; and the %mor tier has part-of-speech and morphological tagging for the utterance. Once a language sample is transcribed and linked to the media file, the %wor tier is created automatically using an automated Batchalign command and the %mor tier is created automatically with the MOR command in CLAN. 30
*PAR: mom &-uh washing or drying dishes. •9650_13350•
%wor: mom •9650_10260• &-uh •10260_10660• washing •11400_11810• or •11810_12010• drying •12010_12570• dishes •12570_13350• .
%mor: n|mom part|wash-PRESP coord|or part|dry-PRESP n|dish-PL.
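As a rough illustration of the scoring logic (not the CORELEX implementation itself), the lemmas on the %mor tier above could be matched against a checklist as sketched below. The checklist is a hypothetical stand-in, since the actual 26-word list appears only in the Supplementary Material, and the parsing is simplified: it ignores clitics, compounds, and other details of %mor coding.

```python
import re

# Hypothetical checklist for illustration only; the real Cookie Theft core
# lexicon list is in the Supplementary Material and is not reproduced here.
HYPOTHETICAL_CORE_LEXICON = {"wash", "dry", "dish", "cookie"}


def lemmas_from_mor(mor_tier):
    """Extract lemmas from a %mor string such as 'n|mom part|wash-PRESP'."""
    lemmas = []
    for token in mor_tier.split():
        if "|" not in token:
            continue  # skip bare punctuation tokens
        stem = token.split("|", 1)[1].rstrip(".?!")
        # Drop anything after the first suffix marker ('-' or '&').
        lemmas.append(re.split(r"[-&]", stem, maxsplit=1)[0])
    return lemmas


def core_lexicon_score(mor_tiers, checklist):
    """One point per checklist word produced at least once, however often."""
    produced = set()
    for tier in mor_tiers:
        produced.update(lemmas_from_mor(tier))
    return len(produced & checklist)


example_mor = "n|mom part|wash-PRESP coord|or part|dry-PRESP n|dish-PL."
lemmas = lemmas_from_mor(example_mor)
score = core_lexicon_score([example_mor], HYPOTHETICAL_CORE_LEXICON)
```

Because matching is done on lemmas rather than surface forms, inflected forms such as "washing" and "dishes" are credited to "wash" and "dish", and using a set means repeated productions add nothing to the score.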
For this utterance, the CORELEX command would score one point each for the production of
Timing data were computed in msec from the beginning of the participant’s description of the picture until the time when each content word was first produced using this command:
Finally, the content words were broken into quadrants, such that each word unambiguously belonged to a particular quadrant, as can be seen in the Cookie Theft picture in Fig. 1: Top Left included

Cookie Theft picture quadrants.
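The timing measure described in this section can be approximated outside CLAN by reading the word-level timestamps on the %wor tier. The sketch below is an assumption-laden illustration rather than the modified CORELEX command: it works on surface words instead of lemmas, skips tokens beginning with "&" (such as the filled pause "&-uh"), and treats the onset of the example utterance as the start of the description purely for demonstration.

```python
import re

# A %wor token is a word followed by its bulleted begin_end time in ms.
WOR_TOKEN = re.compile(r"([^\s•]+)\s*•(\d+)_(\d+)•")


def first_onsets_ms(wor_tiers, start_ms):
    """Map each word to the ms elapsed from start_ms until its first onset."""
    onsets = {}
    for tier in wor_tiers:
        for word, begin, _end in WOR_TOKEN.findall(tier):
            if word.startswith("&"):
                continue  # skip fillers and other non-word material
            word = word.lower()
            if word not in onsets:  # keep only the first production
                onsets[word] = int(begin) - start_ms
    return onsets


example_wor = ("mom •9650_10260•&-uh •10260_10660•washing •11400_11810•"
               "or •11810_12010•drying •12010_12570•dishes •12570_13350•.")

# Assumption for illustration: the description starts at this utterance.
onsets = first_onsets_ms([example_wor], start_ms=9650)
```

With this input, "washing" is first produced 1750 ms after the assumed start and "dishes" 2920 ms after it. A quadrant-level measure would take the minimum onset over the words assigned to that quadrant.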
Statistical analysis
ANOVA was used to determine if there was a difference in the means of total number of core lexicon words produced at least once by each group. We used survival analysis to investigate the rate at which each core lexicon word was expressed by a given point in time (i.e., the hazard rate) and used the survival curve, a function of the hazard rate, to display the distribution of the time until a core lexicon word was said. Kaplan-Meier estimates of the survival curves were used to visualize the distribution of the time (
RESULTS
Analysis of group differences
ANOVA tests and Tukey
The average number of words used by each group in the picture description task and the average number of seconds spent on the task appear in Table 1. For number of words, the groups were not significantly different (F = 2.33486,
Overall core lexicon differences across groups
Figure 2 shows the empirical cumulative distribution function (cdf) of the total number of

Cumulative distribution of core lexicon words produced.
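For readers less familiar with this kind of plot, an empirical cdf gives, for each possible score k, the proportion of a group that produced k or fewer core lexicon words. A minimal sketch follows; the scores are made up for illustration and are not study data, and the maximum of 26 reflects the length of the checklist.

```python
def ecdf(scores, max_score=26):
    """Return [(k, proportion of scores <= k)] for k = 0..max_score."""
    n = len(scores)
    return [(k, sum(s <= k for s in scores) / n) for k in range(max_score + 1)]


# Hypothetical core lexicon totals for four speakers (not study data).
hypothetical_scores = [10, 18, 18, 22]
curve = ecdf(hypothetical_scores)
```

A group whose curve rises earlier (further to the left) contains more speakers with low core lexicon totals.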
Figure 2 reveals another interesting feature about the performance of these groups. After about 17 or 18 words, the MCI group’s cdf curve begins to look more like the ProbableAD group and less like the Control group, suggesting that the MCI group may be a mixture of individuals, some of whom performed more like the ProbableAD group. A one-way ANOVA test was conducted to analyze this feature. Participants from each of the three groups were divided into two subgroups based on the number of unique core lexicon words they produced: subgroup 1 included participants who produced 18 or fewer core lexicon words; subgroup 2 included those who produced more than 18 core lexicon words. For subgroup 1, no significant difference was found across participant groups in the number of core lexicon words produced (F = 0.861;
To explore these results further, we examined the function words and the content words separately. Figure 3 shows the cdf of the total number of unique core lexicon

Cumulative distribution of core lexicon function words produced.

Cumulative distribution of core lexicon content words produced.
Timing of core lexicon word production
For each core lexicon content word, Kaplan-Meier survival curves were used to visualize the proportion of individuals who produced the word by a given time. The curves appear in Figs. 5–8, grouped by where they appear in the picture. For each curve, the x-axis shows time in seconds beginning at time 0, and the y-axis shows the proportion of individuals in each group who produced the core lexicon word. At time

Elapsed time until core lexicon content word from Top Left quadrant is produced.

Elapsed time until core lexicon content word from Top Right quadrant is produced.

Elapsed time until core lexicon content word from Bottom Left quadrant is produced.

Elapsed time until core lexicon content word from Bottom Right quadrant is produced.
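The quantity plotted in these figures is the proportion of speakers who have produced the word by time t, which is one minus the Kaplan-Meier survival estimate. The sketch below shows the product-limit computation in plain Python. It assumes that speakers who never produced the word are treated as censored at the end of their description, and the input values are invented for illustration.

```python
def kaplan_meier(times, produced):
    """Return [(t, S(t))] at each distinct time where the word was produced.

    times:    seconds until the word was produced, or until the description
              ended if the word was never produced (assumed censoring rule)
    produced: True if the word was produced, False if censored
    """
    data = sorted(zip(times, produced))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Gather everyone whose time equals t (events and censored alike).
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented example: five speakers, two of whom never produced the word.
km = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
proportion_produced = [(t, 1 - s) for t, s in km]
```

Censored speakers still count in the at-risk denominator up to the time they stop speaking, which is why the estimate differs from a simple running proportion.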
The Kaplan-Meier survival curves make it easy to visualize the elapsed time until the groups produced the core lexicon content words. As an example, the first graph in Fig. 5 shows the distribution of the time it took participants in each group to say the word
Figure 6 shows a pattern where the ProbableAD group is slower to produce three of the four words (
For the majority of the core lexicon content words the same trend occurred, with the ProbableAD group differentiating itself early and the MCI and Control groups displaying varying differences of a lesser degree. A dramatic example is the word
Table 2 displays the results of the log-rank test for differences in the elapsed time to production for each of the 14 core lexicon content words, respectively. The groups differed significantly in the time it took to produce 11 of these words. The only words that showed no difference in time to produce were
Global test for differences in the elapsed time to production of core lexicon word among groups
Results of the Cox proportional hazards regression model: Differences between groups in the time to production of core lexicon content words controlling for age, sex, and education
*Hazard ratio estimate with 95% confidence interval and
Pairwise comparisons revealed that the ProbableAD group produced 10 of the 14 words more slowly (longer elapsed time until production) than the Control group, and the MCI group produced 1 word more slowly than the Control group. The MCI group produced 8 of the words more quickly than the ProbableAD group. It was never the case that MCI or Control groups produced any words more slowly than ProbableAD, or that the Control group produced any words more slowly than the MCI group.
Analysis of picture quadrants
To analyze where the individuals in each group were looking, we analyzed the elapsed time until the first word for each quadrant was said for each group. Figure 9 shows the survival curves for all four quadrants of the picture. With the exception of the Top Left (log-rank

Elapsed time until any quadrant content word is produced.
DISCUSSION
Overall, the data suggest that the measure of
The finding of significantly lower core lexicon scores in the ProbableAD group compared to Controls was expected, given the vast literature on word-finding problems in this population.19–21 More specifically, the results are consistent with reports of fewer information units in AD groups than control groups in both English and French speakers for the same Cookie Theft picture description task.20,25 Information unit scoring differs slightly from core lexicon scoring in that it includes synonyms (e.g.,
Although core lexicon is a measure of typicality rather than informativeness, it seems likely that the underlying basis for the observed group differences reported here and in previous research is the same. It is generally accepted that reduced informativeness in MCI and AD groups compared to Controls is a result of degradations in the semantic network that impede lexical access. This would be supported by the significant difference found between the Controls and both groups in the total number of unique core lexicon content words produced. It is likely that reduced typicality of word usage is driven by some degree of lexical access impairment. However, it is also possible to imagine a scenario in which typicality is maintained while informativeness is impaired (e.g., reduced overall output, but intact typicality of words). Therefore, future research should investigate the extent to which informativeness and typicality represent shared or distinct mechanisms of lexical and semantic access.
The MCI group’s performance resembled that of the Control group until about 17 of the 26 core lexicon words were produced, at which point they began to resemble the ProbableAD group. Though not quantitatively dramatic, the difference was significant and identified a clear shift in the MCI group’s performance. This shift likely reflects the variability among individuals in the MCI group, which is consistent with many reports in the literature, such as the “mixed evidence” (p. 926) described by Mueller et al. in their review of picture description tasks in this population and Fraser et al.’s description of MCI as a “heterogeneous condition with varying etiologies” (p. 15).2,52 A good way to learn more about this group’s behavior would be to increase sample sizes and follow the groups longitudinally. Given that some proportion of the MCI group is likely to convert to a diagnosis of dementia, further analyses may show that core lexicon analysis could be a potentially simple diagnostic and predictive tool.53–55
Another consideration regarding the shift in the curve after the production of 17 core lexicon words likely involves the relatively intact syntax in individuals with MCI and dementia on picture description tasks.1,56 Though grammatical complexity may be reduced, such that sentences are simpler and include fewer clauses, the simple function words from the core lexicon list such as the conjunction (
The new word alignment tier in CHAT files allowed for the analysis of elapsed time until a core lexicon word was produced. This analysis measures an aspect of efficiency slightly differently from the way it has been reported in the literature for this task: dividing the total number of information units produced by the total sample time, yielding a result like 0.30 information units per second. Those studies report significant differences in efficiency between AD and control groups, and even between early MCI and control groups. However, that computation does not necessarily reflect the actual timing of the production of the information units.3,28,53 In this study, the Control group produced most of the core lexicon content words (10/14) in significantly less time than the ProbableAD group, as did the MCI group (8/14). Compared with the MCI group, the Control group produced one core lexicon content word in significantly less time. The MCI group’s curves for these 14 words can be seen to either fall between the other two groups (e.g., for
It is important to consider these results in light of the fact that rate of speech differs across these three groups. If the results were simply due to the ProbableAD group’s slower rates of speech (average of 89 words per minute versus 128 and 120 in the MCI and control groups, respectively), the Kaplan-Meier curves would show similar curves for each group shifted slightly to the right for the speakers with slower rates. As this was not a timed task, individuals had as much time as they needed to describe the picture. Instead, the curves show different patterns across words, where sometimes, as in the word
Finally, Kaplan-Meier curves were created to show the time until any word in a given quadrant was said, using this automated temporal analysis as a low-tech way to track visual processing of the picture. Eye tracking data using a variety of devices has proven to be effective in automatic dementia detection using the Cookie Theft picture, with better classification accuracy occurring with the addition of language features.36,37,39 In this study, the goal was to find visuospatial patterns based on word alignment analyses that could be used clinically to differentiate the groups. All groups were quickest to mention a core lexicon content word in the Top Left. From there, the Control and MCI groups were quite similar in their time to each quadrant, moving mostly down to the Bottom Left, then Top Right and Bottom Right, which makes sense given the actions and layout of the picture. For example, the boy’s face, arms, and action are all in the Top Left, with his lower body in the Bottom Left. The same is true of the mother in the Upper and Lower Right. The only real connection between the left and right sides of the picture is that of the children’s devious behavior (left) being missed by the otherwise distracted mother (right). The ProbableAD group was significantly slower than the other two groups in the time it took them to get to each of those other three quadrants. The slower rate for the ProbableAD group on the specific, individual words as well as any word in a quadrant is consistent with many reports about the importance of temporal features of connected speech (e.g., hesitations and pauses) for dementia detection and diagnosis.22,31,41,58,59
The core lexicon analysis method is a straightforward, accessible tool for clinicians and researchers to use with this population for both early assessment and measurement of change in lexical skills over time. Results of these analyses provide benchmarks for performance on a picture description task commonly used to elicit discourse in these populations and shed light on the lexical deficits encountered in MCI and AD. The need for large, shared datasets cannot be overstated. Fraser et al. made a strong case for data sharing, and plenty of review articles have summarized the small sample sizes in studies and the need for international databases.1,6,11,61,62 A next direction for research should include using these kinds of results in a novel, large dataset to assess their accuracy in using changes in core lexicon as a diagnostic biomarker to predict group classification.
Limitations of the present study should be taken into consideration in interpreting the results and helping to guide future work in this area. For instance, the size of the study sample, which for the MCI group specifically was only 48, is relatively small. Furthermore, the MCI group comprised groups from two datasets that were collected almost 40 years apart. While state-of-the-art diagnostic guidelines were used for each corpus, and the guidelines did not change in any meaningful way during that time period, it is worth raising as a caution.
AUTHOR CONTRIBUTIONS
Davida Fromm (Conceptualization; Data curation; Investigation; Methodology; Project administration; Resources; Supervision; Validation; Writing – original draft; Writing – review & editing); Sarah Grace Dalton (Methodology; Writing – original draft; Writing – review & editing); Alexander Brick (Investigation; Methodology; Validation; Visualization); Gbenuola Olaiya (Formal analysis; Methodology; Validation; Visualization); Sophia Hill (Formal analysis; Methodology); Joel Greenhouse, Ph.D. (Formal analysis; Investigation; Methodology; Supervision; Validation; Visualization; Writing – original draft; Writing – review & editing); Brian MacWhinney (Funding acquisition; Resources; Software; Supervision; Writing – original draft; Writing – review & editing).
ACKNOWLEDGMENTS
The authors wish to thank the participants whose data were used in this study and the researchers who collected and contributed the data to the shared DementiaBank database.
FUNDING
This work was supported in part by an NIA DementiaBank supplement to the NIDCD AphasiaBank grant DC008524. Original acquisition of the Pitt corpus was supported by NIA grants AG005133 and AG003705 to the University of Pittsburgh.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
DATA AVAILABILITY
The data reported here are available to members of the DementiaBank consortium – https://dementia.talkbank.org/. Established researchers and clinicians working with dementia who are interested in joining the consortium should read the Ground Rules (
) and then send email to
