Abstract
Background
Memory loss is a core feature of typical Alzheimer's disease (AD) and amnestic mild cognitive impairment (aMCI). Standard memory tests such as word lists assess verbal episodic memory with delayed recall and recognition. However, actual memory fidelity is likely variable, continuous, and has a subjective component.
Objective
We investigated dual-processing models of episodic memory (recollection versus familiarity) using confidence ratings in a “judgment of knowing” paradigm (JOK).
Methods
This paradigm was applied to the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) memory test as part of neuropsychological evaluation at University of Pittsburgh Alzheimer's Disease Research Center (ADRC), to generate novel indices of memory function to improve sensitivity to early memory problems and provide a memory awareness metric. On recognition testing, participants rated how sure they were of their yes/no responses to each item. We derived novel variables related to memory and metacognition, including an Accuracy-Certainty Index and the Relative Certainty Index.
Results
In this sample of 347 participants (185 with AD, 51 with MCI, 111 cognitively unimpaired), CERAD Delayed Recall was the best single variable for discriminating groups, although multiple certainty variables also discriminated groups well.
Conclusions
The addition of certainty indices to a standard verbal memory task increased discriminative power between groups, particularly between cognitively normal controls and MCI or AD.
Introduction
Many neurologic conditions involve some degree of impaired awareness of neurologic deficits, which has been termed lack of insight, unawareness of deficit, or anosognosia. Anosognosia is seen frequently in patients with dementia and has a significant impact on the person, their family, and their care. 1 Up to 80% of patients with Alzheimer's disease (AD) dementia have some degree of anosognosia 2 and numerous studies have found this deficit to be even more severe in behavioral variant frontotemporal dementia (bvFTD).3–5
Anosognosia has also been described in patients with mild cognitive impairment (MCI), 6 although results are mixed, with some studies failing to find an association. 7 The mechanism of anosognosia in MCI, like other conditions, can be varied. Some studies have found an association with learning and memory 8 while others have shown associations with behavioral symptoms 9 and deficits in self-appraisal. 10 Mechanistically, it is both possible and likely that lack of awareness for a deficit could arise from either memory-related or memory-independent processes. Every major cognitive framework2,11,12 includes both memory and executive processes, but beyond that, theories may disagree. Accordingly, when researchers 13 performed a systematic review of structural and functional investigations of the basis of anosognosia they found a predictable degree of heterogeneity. Specifically, some studies have found associations with frontal regions, others with temporo-parietal, and others found areas within both regions that were associated with anosognosia in patients with dementia.14,15
Despite being important for practical purposes, anosognosia measures are less commonly included in standard neuropsychological batteries. Anosognosia is associated with higher levels of caregiver burden,16–18 poor medication adherence, 19 impaired financial abilities20,21 and scam susceptibility, 22 and worse ability to predict one's own competency at a driving simulation task. 23
Anosognosia has historically been studied as a distinct concept from metacognition, although the two are clearly related and should likely be conceptualized as interconnected. 24 Metacognition, the process by which we monitor and update our self-representation of various cognitive abilities, is an important and well-studied concept within cognitive psychology. 25 Similar paradigms have been used to study the self-awareness of neurologic deficits in a number of neurologic disorders.11,12,26 Due to the complex nature of self-awareness, it is likely that different assessment methodologies tap into different metacognitive processes. Some of these methodologies include feeling of knowing 27 (FOK; participants predict their future ability to recall learned information) and retrospective confidence ratings 28 (CONF; participants rate their confidence in the accuracy of a previously made recognition decision). When FOK and CONF have been evaluated, they show activation in different regions of the brain, further supporting the idea that while both are measurements of memory monitoring, they are the result of different cognitive processes. 29 The same researchers noted that retrospective metamemory judgments, which have been studied extensively in cognitively unimpaired participants, rely more on the strength of the memory itself, ease of retrieval, and cue-related factors. While it may seem intuitive that memory-dependent processes would be more discriminative in disorders that primarily involve memory failure, the literature on task-specific metamemory judgments in AD is mixed.
Some studies have demonstrated that patients with AD have impaired global metacognitive abilities but intact task-specific metacognitive abilities when compared to age-matched controls,30,31 but others have demonstrated clear deficits in task-related metamemory judgments.32,33 Given the compelling clinical and functional implications of poor metacognition, and this mixed literature on task-specific metacognition, clarification of these mixed results is needed.
Many studies of metamemory in both normal and impaired individuals use long or difficult tasks that are designed to differentiate between related cognitive components of a complex task such as metamemory judgment. However, to introduce the concept of metamemory assessment into routine clinical practice, it would be best if the task were familiar to clinicians, required minimal addition of time to the clinical encounter, and were useful in detecting neurodegeneration. Thus, our aim was to create a simple addition to a standard verbal learning task (the CERAD word memory list) 34 that could increase discriminative power among cognitively unimpaired controls, those with MCI, and those with AD dementia. In our modified CERAD recognition task, we asked participants to indicate whether they had encountered the word previously and how certain they were of the accuracy of their answer. We use the term ‘certainty’ in place of confidence to highlight the pragmatic clinical re-purposing of the retrospective confidence rating strategy frequently employed in metamemory tasks. This approach brings metamemory assessment into the clinical encounter and helps to increase the discriminative power of standard memory tasks.
Methods
Participants/collection
Study design for data acquisition
All data collection was done in compliance with guidelines on human participants research and approved by University of North Carolina and University of Pittsburgh institutional review boards, and a data use agreement was approved. The certainty-weighted recognition paradigm was implemented on the CERAD word-list memory test as part of the standard neuropsychological evaluation at University of Pittsburgh Alzheimer's Disease Research Center (ADRC). On delayed recognition testing, participants rated how sure they were of each of their yes/no responses to the 20-item recognition list (10 targets, 10 foils) on a 4-point scale: Very sure (4), More than 50% sure (3), Less than 50% sure (2) and Not sure at all (1). Four points were used rather than a dichotomous “certain versus uncertain” decision in order to extend the scale for weighting answers. These data were used to derive several novel variables related to recognition memory and metacognition. These included the Accuracy-Certainty Index (ACI), Total Correct Certainty (TCC), Total Incorrect Certainty (TIC), Mean Correct Certainty (MCC), and Mean Incorrect Certainty (MIC). Additionally, a putative metacognitive metric for AD dementia, the Relative Certainty Index (RCI), was generated by dividing the MIC score by the MCC score. Terms and definitions are provided in Table 1. Gathering certainty ratings from participants added an average of 4.15 min to administration time for the CERAD, and an average of 3.24 min scoring time per administration.
List of certainty-weighted memory variables with their calculations.
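As an illustration only, the certainty-weighted variables could be computed from item-level data along the following lines. This is a minimal sketch and not the scoring procedure used in the study: it assumes that TCC and TIC are sums of the certainty ratings given on correct and incorrect recognition responses, and that MCC and MIC are the corresponding means. Only the RCI definition (MIC divided by MCC) is taken directly from the text. The ACI is omitted because its calculation is given only in Table 1. The names `Response` and `certainty_indices` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Response:
    correct: bool   # whether the yes/no recognition answer was correct
    certainty: int  # 1 (Not sure at all) to 4 (Very sure)


def certainty_indices(responses: List[Response]) -> Dict[str, Optional[float]]:
    """Compute certainty summaries under the assumptions stated above."""
    for r in responses:
        if r.certainty not in (1, 2, 3, 4):
            raise ValueError("certainty must be an integer from 1 to 4")

    correct = [r.certainty for r in responses if r.correct]
    incorrect = [r.certainty for r in responses if not r.correct]

    tcc = sum(correct)    # assumed: total certainty on correct responses
    tic = sum(incorrect)  # assumed: total certainty on incorrect responses
    # Means are undefined when a participant has no responses of that type.
    mcc = tcc / len(correct) if correct else None
    mic = tic / len(incorrect) if incorrect else None
    # RCI = MIC / MCC, as described in the text.
    rci = mic / mcc if mcc is not None and mic is not None else None

    return {"TCC": tcc, "TIC": tic, "MCC": mcc, "MIC": mic, "RCI": rci}
```

For a hypothetical participant with 18 correct responses all rated 4 and 2 incorrect responses both rated 2, this sketch gives TCC = 72, TIC = 4, MCC = 4.0, MIC = 2.0, and RCI = 0.5. A participant with no incorrect responses has no defined MIC or RCI, which any analysis using these variables would need to handle explicitly.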
The test was administered to consecutive participants between February 2002 and April 2004: 111 normal controls (NC), 51 participants with MCI, and 185 patients with probable AD dementia per the original McKhann criteria. 35 These criteria were applied for each participant at the time of the study. Of note, five participants in the NC group had Clinical Dementia Ratings of 0.5 but were still considered unimpaired and kept in the NC group.
As part of the assessment battery, clinicians working with the participants completed the Hamilton Depression Scale 36 after a semi-structured clinical interview with the participant. This scale shows strong reliability and validity for use with older adults. 37 Study partners (e.g., caregivers, close family members) also completed the Blessed Rating Scale, 38 a scale of 22-items which can be divided into three sections: A) changes in performance of everyday activities; B) changes in habits including self-care; and C) changes in personality, interest, and drives. This scale also shows strong reliability and validity for use with older adults. 38 Unfortunately, we do not have records as to the informant's relationship to the study participants in each case.
Statistical methods
Data were analyzed using SPSS version 23. Assumptions of all tests were evaluated and considered met (for example, on our main ANCOVA we evaluated homogeneity of slopes, normality of residuals, variable collinearity, homogeneity of variance, and normal distribution of variables, along with assessment of outliers). We assessed differences in demographic and other factors among diagnostic groups of AD, MCI, and NC using one-way between-subjects analysis of variance (ANOVA) for continuous variables and Chi-square tests for categorical variables. Pearson correlations were used to evaluate relationships between these variables and outcome variables.
Analysis of covariance (ANCOVA) compared outcome measures between groups (MCI, AD dementia, NC), adjusting for age and education level. Effect sizes were evaluated using partial η². For significant results, pairwise group comparisons were made with Tukey's Test.
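For reference, partial η² is conventionally defined as the proportion of variance attributable to an effect after excluding variance explained by the other terms in the model. The notation below (SS for sums of squares) is the standard textbook form rather than anything specified in this study:

```latex
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```

Here the effect of interest would be diagnostic group, with age and education entered as covariates.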
To compare accuracy alone (Total Correct Recognition) directly against accuracy combined with certainty (ACI), logistic regressions and ROC curves examined how well the two predictors predicted group membership pairwise (NC versus MCI, NC versus AD, MCI versus AD).
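To make the ROC comparison concrete, the area under the curve (AUC) for a single predictor equals the probability that a randomly chosen member of one group scores higher than a randomly chosen member of the other, with ties counted as one half. The sketch below computes it directly from that definition. It is an illustration, not the SPSS procedure used here, and the function name `auc` and the example scores are hypothetical.

```python
from typing import Sequence


def auc(scores_group1: Sequence[float], scores_group0: Sequence[float]) -> float:
    """AUC as the proportion of (group1, group0) pairs in which the
    group1 score is higher, counting ties as half a win."""
    if not scores_group1 or not scores_group0:
        raise ValueError("both groups need at least one score")

    wins = 0.0
    for s1 in scores_group1:
        for s0 in scores_group0:
            if s1 > s0:
                wins += 1.0
            elif s1 == s0:
                wins += 0.5
    return wins / (len(scores_group1) * len(scores_group0))


# Hypothetical scores: 8.5 of the 9 possible pairs favor group 1.
example = auc([3, 4, 5], [1, 2, 3])
```

When lower scores on a measure indicate membership in the group coded 1, the scores can be negated before calling the function so that an AUC above 0.5 still reflects better-than-chance discrimination.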
Pearson correlations examined potential relationships between Hamilton Depression Scale (Ham-D) scores with recognition certainty outcomes, to explore any potential relationship and establish a type of discriminant validity, verifying that certainty recognition is something separate from confidence related to depression. Correlations also assessed for relationships between Blessed Dementia Rating Scale subscale scores and certainty outcomes, to establish a convergent and ecological validity for certainty outcomes.
Results
Preliminary analyses
The clinical groups (AD dementia, MCI, and NC) differed on multiple demographic and other variables (Table 2). Pearson correlations determined that of those demographic variables, only age and education showed significant (p < 0.05) relationships with CERAD Word List variables (which included nearly every variable, including learning trial scores, accuracy scores for recognition, and certainty ratings).
Demographics and other characteristics in sample by cognitive diagnosis.
Mini-Mental State Exam (Folstein, 1975).
ANCOVAs
Results of ANCOVAs comparing various CERAD Word List outcomes between clinical groups are provided in Table 3.
CERAD word list learning task performance by group.
ANCOVAs are age/education adjusted.
As expected, the clinical groups differed in performance in an order which replicates the well-known hierarchy of NC > MCI > AD for most outcomes. This was true for learning trial scores, delayed spontaneous recall, and various aspects of delayed recognition certainty. Certainty showed a similar pattern (the NC group was most certain of correct answers and the AD dementia group least certain), as did variables relying on a combination of accuracy and certainty (such as TCC, TIC, ACI). The RCI, designed to indicate potential anosognosia (as it is a ratio of certainty on incorrect versus correct items and thus may represent lack of awareness of correctness), was higher in AD dementia than in the other two groups. The largest effect sizes (magnitude of group differences) were found for later learning trial scores, delayed free recall, Total Correct Recognition, TCC, and ACI. Notably, the effect size was larger for ACI than for Total Correct Recognition, suggesting that the addition of certainty ratings added to group discrimination over accuracy alone.
Group classification
A logistic regression predicting group membership between NC (0) and MCI (1) using recognition memory accuracy (Total Correct Recognition) versus accuracy combined with certainty (ACI) was significant, χ²(2) = 59.798, p < 0.001, Nagelkerke R² = 0.453. The ACI was a much stronger predictor (β = -0.238, SE = 0.079, p = 0.003, Exp[B] = 1.268) than Total Correct Recognition (β = 1.020, SE = 0.622, p > 0.05, Exp[B] = 0.361). An ROC curve (Figure 1) showed stronger overall diagnostic classification accuracy for ACI (AUC = 0.823) than Total Correct Recognition (AUC = 0.798).

ROC curve discriminating NC and MCI.
A logistic regression predicting group membership between NC (0) and AD (1) using recognition memory accuracy (Total Correct Recognition) versus accuracy combined with certainty (ACI) was significant, χ²(2) = 279.722, p < 0.001, Nagelkerke R² = 0.861. The ACI was a much stronger predictor (β = -0.413, SE = 0.097, p < 0.001, Exp[B] = 1.511) than Total Correct Recognition (β = 1.886, SE = 0.694, p = 0.007, Exp[B] = 0.115). An ROC curve (Figure 2) showed slightly stronger overall classification accuracy for ACI (AUC = 0.981) than Total Correct Recognition (AUC = 0.968).

ROC curve discriminating NC and AD.
A logistic regression predicting group membership between MCI (0) and AD (1) using recognition memory accuracy (Total Correct Recognition) versus accuracy combined with certainty (ACI) was significant, χ²(2) = 70.226, p < 0.001, Nagelkerke R² = 0.397. The ACI was a stronger predictor (β = -0.077, SE = 0.030, p = 0.011, Exp[B] = 1.080) than Total Correct Recognition (β = 0.088, SE = 0.233, p > 0.05, Exp[B] = 0.916). An ROC curve (Figure 3) showed slightly stronger overall classification accuracy for ACI (AUC = 0.852) than Total Correct Recognition (AUC = 0.834).

ROC curve discriminating MCI and AD.
Additional analyses
We found no significant correlations (p > 0.05) between Ham-D scores and any memory outcomes in our sample but found significant (p < 0.05) negative relationships between all Blessed scales and all CERAD Word List variables. When these correlations were repeated separately within clinical groups, correlations between Ham-D and outcomes were again non-significant (p > 0.05), and relationships between Blessed subscales and CERAD Word List outcomes were significant (p < 0.05) only for the following. In NC: none. In MCI: Blessed A with Total Certainty, TCC, and MCC. In AD: Blessed A with each learning trial, Delayed Recall, Total Correct, Correct Targets, Correct Target Certainty, TCC, TIC, and ACI, and Blessed B with each learning trial. Please see the correlation matrices in Table 4.
Correlations (Pearson r values) between Blessed and Hamilton D scores and memory variables, by diagnosis group.
*p < 0.05; **p < 0.01.
Discussion
Typical memory assessment involves use of free recall and yes/no recognition testing, with each item being considered only correct or incorrect. However, memory is a more complex process, and true memory recognition likely involves degrees of recognition, or memory “strength.” The current study presents a methodology for assessing memory certainty, combining it with accuracy scoring, and utilizing these certainty-weighted recognition scores for better diagnostic classification.
Our results show that nearly all aspects of the CERAD Word List memory test differed between NC, MCI, and AD dementia groups, including typical accuracy scores along with certainty rating variables. This is in line with prior literature that has demonstrated the sensitivity of the CERAD word list for distinguishing MCI and AD dementia. 39 The greatest effect sizes were found for later learning trial scores, delayed free recall, Total Correct Recognition, TCC, and ACI. Notably, the effect size was larger for ACI than for Total Correct Recognition, suggesting that the addition of certainty ratings adds to group discrimination over accuracy alone. This was supported by logistic regressions and ROC curves showing that ACI had a stronger ability to discriminate groups than Total Correct Recognition. This difference was most pronounced when discriminating NC from the MCI and AD groups, rather than when discriminating MCI from AD.
Our findings were consistent with previous research showing delayed free recall is the most discriminating measure for detecting memory impairment. 40 Recognition memory testing adds to simple free recall by accounting for relative storage versus retrieval deficits. 41 The dual-process model of recognition memory suggests that two distinct, independent processes contribute to recognition: recollection and familiarity. Recollection is closely related to recall and requires similar neuroanatomical substrates, particularly the hippocampus and prefrontal cortex. Likewise, familiarity is more closely related to implicit memory and relies more on regions surrounding the hippocampus. 41
Our results show that certainty ratings provide higher resolution to recognition findings. Thus, especially in only subtly impaired patients, there are some cases where delayed recall and recognition memory alone may not be optimal. For a patient/client on the borderline of impairment, a low degree of certainty in their responses could tip the scales and suggest relatively weak memory recognition despite a reasonable accuracy score. It is important to note that the addition of certainty ratings added only limited value to discriminating MCI from AD. While it appears that the addition of certainty ratings may be clinically useful in “dementia” differential overall, it may be less vital when discriminating among more impaired patients. We attribute this to the already-shown effectiveness of traditional memory testing in identifying Alzheimer's dementia. That is not to say that adding certainty ratings to assessments with these populations is unjustified, just that it will add less than it does in the discrimination between NC and other groups. Another justification for still including certainty ratings in more impaired groups, aside from any addition to group classification, could be the addition of different information (e.g., if further research validates these as a measure of insight, or if they correlate with real-world functioning) to drive clinical care recommendations.
Correct certainty may be particularly important in patients with milder cognitive impairment, who may have fewer incorrect responses. However, certainty with wrong answers may play a different but critical role. More research is needed to clarify this, but it is possible that in addition to just weakened strength of a memory (and thus decreasing certainty in general), a patient could be worsening if they strongly believe they know the answer while they do not. Thus, they may have unawareness of their memory problem (anosognosia) versus confabulation due to impairment in temporal consciousness. 42
In addition to assisting diagnosis of MCI, certainty combined with accuracy (ACI) was related to “real-world” functioning as well. In MCI participants, informant report of performance in everyday activities (Blessed Rating Scale A) was related only to certainty measures, and not to standard memory measures (learning, delayed recall, normal recognition). Certainty ratings were not related to performance of healthy controls, but several (certainty and/or accuracy) measures were related to everyday performance for those with AD. This is in line with current literature that suggests that informant report of current functioning has standalone utility for predicting cognitive impairment. 43 Again, this suggests a specific relationship between certainty performance and everyday performance in the group that these measures are most effective at categorizing: people with MCI.
Our methodology discriminated between groups and separately was related to functional impairment ratings as well, but the reason why is still not completely clear. One explanation could be that weakening of memory traces is not necessarily detected with simple yes/no responses. It seems less likely that those with impairment have noticed their decline and have thus lost confidence in their memory in general, resulting in decreased certainty. Since certainty ratings also related to informant report of everyday functioning, the “loss of confidence” affecting certainty performance would have to extend well beyond testing circumstances and result in everyday errors, which would be observable by others. That theory is still possibly true, although in this study, certainty ratings were not at all correlated with Ham-D depression scores, and were often not related to informant reports of changes in personality, drives, and interests. Thus, if certainty ratings were only affected by a loss of confidence, it would have to be a very specific type of confidence but also severe enough to impact daily functioning, while not resulting in significant depression. Together, these findings suggest that certainty ratings more likely reflect a true memory process rather than personality, behavioral, or emotional factors, although this also needs more research.
This study has some notable strengths. The sample size is large for each diagnostic group. The CERAD list is similar to many other commonly used list-based verbal episodic memory tests (e.g., Rey Auditory Verbal Learning Test, California Verbal Learning Test, Hopkins Verbal Learning Test, word list from the Repeatable Battery for the Assessment of Neuropsychological Status) in its use of learning trials, delayed recall, and yes/no recognition. It is reasonable to expect that adding certainty ratings to recognition testing for those measures would add diagnostic power as it does with the CERAD. Future studies could also assess use of certainty for memory measures other than word lists that involve recognition testing (e.g., stories, visual designs), and how these compare to word list certainty recognition. We also note that while the certainty-weighted variables may seem complicated at first, their calculation is simple and can be done manually by clinicians or researchers if needed.
There are also some limitations with this study, including the sample characteristics and external validity. The ADRCs are known to be highly selected samples which have historically often been more educated, less ethnoracially diverse, and of greater socioeconomic status than target populations of community-dwelling older adults. Targeted efforts have been made to improve this, 44 although at the time these current data were collected, the difference was likely greater than now. Thus, while the ethnoracial characteristics of this sample from the Pitt ADRC were on par with the local community population, these results may not be applicable to communities with greater ethnoracial diversity or community-dwelling older adults with less education.
The lack of biological classification of etiologic diagnosis through cerebrospinal fluid analysis or postmortem neuropathological confirmation is a further limitation that could be rectified in future studies of this type of item-by-item certainty on memory tasks. This would allow a more granular distinction between those meeting clinical criteria for an amnestic dementia syndrome and amnestic MCI with and without biological evidence of Alzheimer's disease neuropathologic change. We also must mention the data were collected between 2002–2004, with McKhann criteria used at the time for diagnostic classification, and that these criteria could be seen as out-of-date today. Future studies should validate certainty ratings with more updated classification criteria.
Future studies may also consider evaluating latency or reaction time for correct and incorrect recognition responses, as these may provide additional information about awareness.
Footnotes
Acknowledgements
The authors have no acknowledgments to report.
Ethical considerations
All ethical guidelines for research were followed. This study was approved by the University of North Carolina Institutional Review Board, study # 13-0429.
Consent to participate
Written consent to participate in ADRC testing was obtained as per ADRC procedures. IRB waived additional consent for this study as data were considered archival.
Consent for publication
Not applicable
Author contribution(s)
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data not made available.
