Abstract
Fluctuating cognition (FC) is a core feature of dementia with Lewy bodies (DLB) but is challenging to assess. This study assessed the reliability and validity of the Clinician Assessment of Fluctuation (CAF), which assesses FC in patients with dementia. Interrater agreement on CAF outcomes (FC present and FC severe) was evaluated between physicians and nonphysicians in 141 patients with Alzheimer’s disease (AD) or DLB. The frequency of CAF outcomes by clinical and neuropathological diagnosis was also examined. Interrater reliability was fair for FC present and almost perfect for FC severe. Both outcomes were endorsed more frequently in patients with clinical DLB than in those with clinical AD and were qualitatively more often endorsed in cases with neuropathological evidence of Lewy bodies. We conclude that the CAF is a reliable measure of FC and can be valuable in differential dementia diagnosis.
Background
Dementia with Lewy bodies (DLB) has been estimated to account for as many as 30.5% of dementia cases, making it potentially the second most common cause of dementia. 1 One of the three core diagnostic features of DLB is the presence of fluctuating cognition (FC). 2 Fluctuating cognition is characterized by alternating periods of cognitive impairment and “normal or near-normal performance” and by “pronounced variations in attention and alertness.” 3 Although FC can occur during the course of other neurodegenerative illnesses, such as Alzheimer’s disease (AD) 4 and Parkinson’s disease dementia, 5 it occurs with greater frequency and severity in DLB. 4 Thus, the presence and severity of FC are important factors in the diagnosis of DLB. 2,4,6
The clinical diagnosis of DLB has historically had low sensitivity and poor interrater reliability. 7-9 A reliable method of assessing these diagnostic criteria would therefore have important clinical implications. The identification and evaluation of FC have been challenging and have relied primarily on subjective clinical judgment. In fact, FC has been identified as the most problematic of the DLB diagnostic criteria with regard to interrater agreement. 4,7 Additionally, FC may be related to other comorbid medical or psychiatric conditions, such as psychosis or delirium. As such, systematically assessing FC is essential for accurate diagnosis and effective clinical care. Indeed, the current diagnostic guidelines of the DLB Consortium recommend including a formal measure of FC. 2 The guidelines suggest the following measures: the Clinician Assessment of Fluctuation (CAF), 10 the One Day Fluctuation Assessment Scale, 10 and the Mayo Fluctuations Composite Scale. 11 The CAF rating scale was designed to provide experienced clinicians with a structured method for assessing FC associated with DLB, 4 and it uniquely queries both the frequency and the duration of endorsed symptoms. Measuring variations in cortical activity with electroencephalography (EEG) provides important objective data that can help determine the presence of FC in dementia 12,13; however, interview-based clinical measures such as the CAF can be a more cost-effective and efficient method of assessing this symptom.
Methods developed to assess delirium are also a consideration, as there is symptom overlap between delirium and DLB. For example, the Confusion Assessment Method Instrument (CAM) 14 provides an important means of assessing the presence of delirium in older adults. Despite the similarities between delirium and FC, these syndromes are distinct, with different underlying etiologies, and the assessment of each should involve specific targeted measures. The CAM, for instance, also captures symptoms relevant to delirium that are not typically relevant to the assessment of FC (eg, memory impairment, perceptual disturbances, psychomotor agitation). Thus, efficient targeted tools such as the CAF can play an important role in dementia assessment, and studies have supported the validity of the CAF in patient populations. Clinician ratings on the CAF have been found to be significantly correlated with fluctuations in neural activity on EEG recordings. 10 They have also been found to be significantly correlated with variability on neuropsychological measures of attention and processing speed, with the strongest correlations among patients with DLB compared with those with AD or vascular dementia. 6,10 Finally, strong correlations were found between clinician ratings on the CAF and another FC measure designed to be completed by a nonphysician rater. 10 Despite this evidence supporting the validity of the CAF, no prior studies have examined its interrater reliability or its utility when used by nonphysician raters. It has also been proposed that determining the scale’s relevance to established diagnostic criteria would be valuable. 6
The current study contributes to our ability to assess FC by examining the interrater reliability of the CAF and the relationship between CAF scores and subjectively assessed FC as part of applying the diagnostic criteria for DLB. We also explored the potential differential use of the CAF by physicians and nonphysician research assistants. Finally, we qualitatively examined CAF scores among patients with neuropathological evidence of AD, Lewy body disease, or both.
Methods
Participants
One hundred forty-one participants were identified from the Multicenter Study of Predictors of Disease Course in Alzheimer’s Disease. 15 This multicenter longitudinal study was designed to collect clinical, neuropsychiatric, functional, and neuropsychological indices to characterize probable Alzheimer’s disease (pAD) and predict outcomes. The methodology of the Predictors Study has been described previously. 15,16 In brief, patients were drawn from the second cohort of the Predictors Study. Patients with AD and mild dementia (defined as a modified Mini-Mental State Examination [MMSE] 17 score above 29) were recruited from 3 centers: Columbia University, Johns Hopkins University School of Medicine, and Massachusetts General Hospital. Patients with a history of certain psychotic or substance abuse disorders were excluded, as were patients with a history of electroconvulsive treatment or any evidence of prior stroke. 15 In the second Predictors Study cohort, patients with DLB were recruited and followed using the same procedures as patients with pAD, except that an initial modified MMSE score of 20 or above was allowed. For the purposes of this study, we selected only those participants who received both the physician and the nonphysician evaluations at the same visit interval and examined data collected at the first visit at which both took place. In approximately 80% of cases, the visit selected for analysis was the baseline visit or a follow-up visit within 1 year of baseline. Most of the remaining visits (approximately 16%) took place within 3 years of baseline, and approximately 4% of the selected visits took place more than 4 years after baseline. To our knowledge, no factors relevant to the proposed analyses varied meaningfully and systematically with the visit at which both evaluations took place; rather, the baseline visit appeared to be unusable because of errors of omission or administration.
Patient demographics are summarized in Table 1. All patients had received a diagnosis of either pAD in accordance with the 1984 National Institute of Neurological and Communicative Disorders and Stroke–Alzheimer’s Disease and Related Disorders Association Work Group criteria 19 or DLB in accordance with the 1996 Consortium on DLB consensus diagnostic criteria. 3 Among patients with DLB, in 3 cases only FC and one other diagnostic criterion were met to achieve a diagnosis of DLB. Of the 141 patients, 98 died during the course of this study and 35 came to autopsy, and we examined these groups for possible demographic differences. We found no gender or race differences between those who came to autopsy and those who did not. Those who came to autopsy had more years of education (, standard deviation [SD] = 3.12) than those who did not (, SD = 3.28;
Neuropathological criteria included the National Institute on Aging (NIA)-Reagan Institute criteria for AD, requiring intermediate likelihood 20 (1 case included as AD had only “senile change, Alzheimer’s type” and not intermediate likelihood), and the McKeith et al criteria for diffuse Lewy body disease, 3 requiring at least limbic Lewy bodies. Neuropathological findings in 3 cases did not suggest AD or diffuse Lewy body disease, and these cases were excluded from the analyses (their neuropathological diagnoses included Parkinson’s disease, stroke, progressive supranuclear palsy, corticobasal degeneration, atherosclerosis, and amyloid angiopathy). The remaining 32 participants were grouped by the following neuropathological findings: (1) pathological evidence of AD with no evidence of diffuse Lewy body disease (NAD), (2) pathological evidence of diffuse Lewy body disease with no evidence of AD (NLB), and (3) pathological evidence of both AD and diffuse Lewy body disease (NAD + NLB). Among these 32 participants, there was additional neuropathological evidence of stroke (n = 8), multisystem atrophy (n = 4), amyloid angiopathy (n = 12), and/or atherosclerosis (n = 23).
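The grouping rule above can be sketched as a small classification function. This is an illustrative sketch, not the study’s code: the enum and function names are hypothetical, and the two yes/no inputs assume that each case has already been judged against the NIA-Reagan criterion for AD (including the one exception noted above) and the McKeith et al criterion for diffuse Lewy body disease. Additional findings such as stroke or amyloid angiopathy do not affect group assignment.

```python
from enum import Enum
from typing import Optional


class NeuropathGroup(Enum):
    NAD = "NAD"              # AD pathology, no diffuse Lewy body disease
    NLB = "NLB"              # diffuse Lewy body disease, no AD pathology
    NAD_NLB = "NAD + NLB"    # both AD and diffuse Lewy body disease


def neuropath_group(meets_ad: bool, meets_dlbd: bool) -> Optional[NeuropathGroup]:
    """Assign an autopsy case to an analysis group (hypothetical helper).

    meets_ad:   case met the neuropathological criterion for AD.
    meets_dlbd: case met McKeith et al criteria for diffuse Lewy body disease.
    Returns None when neither applies; such cases were excluded.
    """
    if meets_ad and meets_dlbd:
        return NeuropathGroup.NAD_NLB
    if meets_ad:
        return NeuropathGroup.NAD
    if meets_dlbd:
        return NeuropathGroup.NLB
    return None


print(neuropath_group(meets_ad=True, meets_dlbd=True).value)  # NAD + NLB
```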
Measures
In addition to the assessments described in previous reports, 16 all patients were evaluated with the CAF 4 and a DLB diagnostic criteria evaluation form adapted from the 1996 Consortium on DLB consensus diagnostic criteria. 3 Participants were asked to rate whether symptoms had been present during the few months before the assessment. The CAF is a 2-item questionnaire developed to assess fluctuations in attention and alertness, a core symptom of DLB. The first item captures the presence of a fluctuating level of consciousness, and the second captures the presence of fluctuating cognitive impairment. If either item is endorsed, frequency is rated on a scale from 1 to 4 (4 = most frequent) and duration on a scale from 0 to 4 (4 = longest duration). These 2 values are multiplied to yield a severity score ranging from 0 to 16.
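As a minimal sketch of the CAF scoring described above, the hypothetical CAFRating class below derives the severity score (frequency × duration) and the two dichotomous outcomes, using the FC severe definition given in the table notes (severity score ≥ 5). Two details are assumptions rather than statements from the study: a case with neither item endorsed is assigned a severity of 0, and an item rated “unknown” (recorded as None) or a missing frequency or duration leaves the affected outcome unavailable, as with the assessments for which the severity score could not be calculated.

```python
from dataclasses import dataclass
from typing import Optional

# FC severe threshold taken from the table notes (severity score >= 5).
FC_SEVERE_CUTOFF = 5


@dataclass
class CAFRating:
    """One rater's CAF responses (hypothetical structure, for illustration)."""
    item_a: Optional[bool]           # fluctuating level of consciousness; None = unknown
    item_b: Optional[bool]           # fluctuating cognitive impairment; None = unknown
    frequency: Optional[int] = None  # 1-4, rated only if an item is endorsed
    duration: Optional[int] = None   # 0-4, rated only if an item is endorsed

    def fc_present(self) -> Optional[bool]:
        answers = (self.item_a, self.item_b)
        if True in answers:
            return True          # either item endorsed
        if None in answers:
            return None          # nothing endorsed, but at least one item unknown
        return False

    def severity(self) -> Optional[int]:
        present = self.fc_present()
        if present is None:
            return None
        if not present:
            return 0             # assumption: no endorsed item scores 0
        if self.frequency is None or self.duration is None:
            return None          # missing data: score cannot be calculated
        if not (1 <= self.frequency <= 4 and 0 <= self.duration <= 4):
            raise ValueError("frequency must be 1-4 and duration 0-4")
        return self.frequency * self.duration   # range 0-16

    def fc_severe(self) -> Optional[bool]:
        score = self.severity()
        return None if score is None else score >= FC_SEVERE_CUTOFF


rating = CAFRating(item_a=True, item_b=False, frequency=2, duration=3)
print(rating.fc_present(), rating.severity(), rating.fc_severe())  # True 6 True
```

Returning None rather than False for incomplete ratings keeps “unknown” distinct from “not endorsed,” which matters when counting UNKN responses separately.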
The DLB diagnostic criteria form is a 9-item questionnaire designed to document the symptoms of DLB. The first 3 items are core criteria (ie, FC, visual hallucinations, and parkinsonism), followed by 6 items related to supportive features. This study examined responses to the FC item, which consisted of 3 examples of FC (ie, episodes of going blank or switching off; periods of apparent spontaneous remission during which cognitive functions improve; and excessive daytime drowsiness with transient confusion on awakening) and 2 exclusionary conditions (ie, fluctuations typically occur in the late afternoon/early evening, and fluctuations are associated with a change in medication). If any of the 3 examples was endorsed and both exclusionary conditions were denied, the participant met criteria for the FC item of the DLB diagnostic criteria form. See Table 2 for the specific CAF items and the FC diagnostic criteria item.
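The decision rule for the FC item can be written as a single predicate. The sketch below is illustrative only: the function name is hypothetical, and it assumes that each example and each exclusionary condition was recorded as a simple yes/no response, with no allowance for “unknown” answers.

```python
from typing import Sequence


def meets_fc_criterion(examples: Sequence[bool], exclusions: Sequence[bool]) -> bool:
    """FC item of the DLB diagnostic criteria form (illustrative sketch).

    examples:   the 3 FC examples (going blank/switching off; spontaneous
                remission periods; daytime drowsiness with transient
                confusion on awakening), True if endorsed.
    exclusions: the 2 exclusionary conditions (fluctuations typically in the
                late afternoon/early evening; fluctuations associated with a
                medication change), True if endorsed.
    """
    if len(examples) != 3 or len(exclusions) != 2:
        raise ValueError("expected 3 examples and 2 exclusionary conditions")
    # Met if at least one example is endorsed and both exclusions are denied
    return any(examples) and not any(exclusions)


print(meets_fc_criterion([True, False, False], [False, False]))  # True
print(meets_fc_criterion([True, False, False], [False, True]))   # False
```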
Assessment Items.
Abbreviations: CAF, Clinician Assessment of Fluctuation; DLB, dementia with Lewy bodies; FC, fluctuating cognition.
Procedure
This study was approved by the institutional review boards (IRBs) at the participating institutions. All participants underwent an IRB-approved informed consent process prior to enrollment in the study. Participants were administered the CAF and DLB diagnostic criteria form by 2 types of independent raters: one of the study research assistants (RA) and one of the study physicians (MD). Evaluation of FC was completed prior to assessment of other features of DLB. Both raters’ assessments were typically completed within a 1-month window, and 86% were completed on the same day.
Data Analysis
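Two dichotomous outcome variables were derived from the CAF: (1) FC present, defined as endorsement of either or both CAF items (ie, “yes” to item A and/or B), and (2) FC severe, defined as a CAF severity score ≥ 5.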
For each rater (RA and MD), we evaluated the relationship of pAD versus DLB clinical diagnosis to each CAF variable using χ2 or Fisher’s exact test as appropriate. There were 5 RA assessments and 58 MD assessments in which the CAF severity score was not calculated due to missing data.
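The text does not state how “as appropriate” was decided. A common convention, assumed here, is to use Fisher’s exact test when any expected cell count in the 2 × 2 table is below 5 and Pearson’s χ2 test otherwise. The standard-library sketch below implements that convention; it applies no continuity correction (also an assumption), the function names are hypothetical, and the counts in the usage lines are invented for illustration, not study data.

```python
import math
from typing import List, Sequence, Tuple


def expected_counts(t: Sequence[Sequence[int]]) -> List[List[float]]:
    """Expected cell counts under independence for a 2 x 2 table."""
    (a, b), (c, d) = t
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def pearson_chi2(t: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic (df = 1, no continuity correction) and p-value."""
    exp = expected_counts(t)
    stat = sum((t[i][j] - exp[i][j]) ** 2 / exp[i][j] for i in range(2) for j in range(2))
    # With 1 df, chi-square is a squared standard normal: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_two_sided(t: Sequence[Sequence[int]]) -> float:
    """Two-sided Fisher's exact p-value: sum of table probabilities <= observed."""
    (a, b), (c, d) = t
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def compare_groups(t: Sequence[Sequence[int]]) -> Tuple[str, float]:
    """Use Fisher's exact test if any expected count < 5, else chi-square."""
    if min(min(row) for row in expected_counts(t)) < 5:
        return "fisher", fisher_exact_two_sided(t)
    return "chi2", pearson_chi2(t)[1]


# Rows: clinical diagnosis (DLB, pAD); columns: FC present (yes, no). Invented counts.
print(compare_groups([[3, 1], [1, 3]]))
print(compare_groups([[20, 10], [10, 20]]))
```

In practice a statistics package would be used; the point here is only to make the test-selection rule and the two p-value calculations explicit.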
We also examined intrarater agreement between each CAF variable and the FC item on the DLB diagnostic criteria form using Cohen’s κ. Six cases were excluded because of missing item data.
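Cohen’s κ compares the observed proportion of agreement, p_o, with the agreement expected by chance from each rating source’s marginal proportions, p_e: κ = (p_o − p_e)/(1 − p_e). The sketch below computes it for paired categorical ratings, whether from two raters on the same CAF variable (interrater) or from one rater on the CAF and on the DLB criteria form (intrarater). Dropping pairs in which either rating is missing is an assumption modeled on the case exclusions described above; the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def cohens_kappa(r1: Sequence[Optional[Hashable]], r2: Sequence[Optional[Hashable]]) -> float:
    """Cohen's kappa for two paired sets of categorical ratings."""
    if len(r1) != len(r2):
        raise ValueError("ratings must be paired case by case")
    # Complete-case analysis: drop cases where either rating is missing
    pairs = [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no complete pairs")
    n = len(pairs)
    p_o = sum(a == b for a, b in pairs) / n          # observed agreement
    c1 = Counter(a for a, _ in pairs)
    c2 = Counter(b for _, b in pairs)
    # Chance agreement from the product of each source's marginal proportions
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


ra = [True, True, False, False, None]   # e.g., FC present per rater; None = missing
md = [True, False, False, False, True]
print(cohens_kappa(ra, md))  # 0.5
```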
For each of the 3 neuropathology groups, we report the frequency of each clinical diagnosis and of the 2 CAF variables for each rater. All analyses were performed at the
Results
Clinician Assessment of Fluctuation
Interrater reliability of the FC present variable was fair, κ = .22,
CAF Interrater Agreement.
Abbreviations: CAF, Clinician Assessment of Fluctuation; FC, fluctuating cognition; FC present, fluctuating cognition is endorsed (ie, “yes” to item A and/or B); FC severe, fluctuating cognition score ≥ 5; MD, study physician; RA, research assistant.

Interrater agreement plot for FC present, with 95% confidence intervals.

Interrater agreement plot for FC severe, with 95% confidence intervals.
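The verbal labels used for κ values in this report (slight, fair, moderate, substantial, almost perfect) match the widely used Landis and Koch benchmarks. Assuming those are the benchmarks applied, the mapping can be sketched as follows; the function name is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement descriptor."""
    if kappa < 0:
        return "poor"
    # Upper bounds of each band: 0.00-0.20, 0.21-0.40, 0.41-0.60, 0.61-0.80
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"   # 0.81-1.00


print(landis_koch_label(0.22))  # fair
print(landis_koch_label(0.47))  # moderate
```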
Clinical Diagnosis
When examining the relationship between CAF outcome variables and clinical diagnosis, FC present was endorsed for a greater proportion of cases with DLB compared to cases with pAD (RA: χ2 = 45.74,
CAF Variables by Clinical Diagnosis.
Abbreviations: CAF, Clinician Assessment of Fluctuation; DLB, dementia with Lewy bodies; FC, fluctuating cognition; FC present, fluctuating cognition is endorsed (ie, “yes” to item A and/or B); FC severe, fluctuating cognition score ≥ 5; MD, study physician; NA, not available; pAD, probable Alzheimer’s disease; RA, research assistant; UNKN, unknown.
Dementia With Lewy Bodies Diagnostic Criteria
Moderate agreement was found between RAs’ ratings on the CAF and on the DLB diagnostic criteria form: FC present, κ = .47,
CAF Variables by FC Diagnostic Criterion.
Abbreviations: CAF, Clinician Assessment of Fluctuation; FC, fluctuating cognition; FC present, fluctuating cognition is endorsed (ie, “yes” to item A and/or B); FC severe, fluctuating cognition score ≥ 5; MD, study physician; NA, not available; RA, research assistant; UNKN, unknown.
Neuropathological Results
Thirty-two cases came to autopsy, and the number of years between their clinical evaluation and autopsy ranged from 0 to 7 (mean = 2.94 ± 1.95 years). Of the 24 cases with a clinical diagnosis of pAD, 19 received a neuropathological diagnosis of AD, 1 received a neuropathological diagnosis of diffuse Lewy body disease, and 4 received a neuropathological diagnosis of both AD and diffuse Lewy body disease. Of the 8 cases with a clinical diagnosis of DLB, 3 received a neuropathological diagnosis of AD, 2 received a neuropathological diagnosis of diffuse Lewy body disease, and 3 received a neuropathological diagnosis of both. Each rater’s CAF outcome variables for each neuropathology group are presented in Table 6. Overall, only AD pathology was found in 22 cases, only diffuse Lewy body disease in 3 cases, and both diffuse Lewy body disease and AD pathology in 7 cases. Compared with cases with only AD pathology, a greater proportion of cases with neuropathological evidence of diffuse Lewy body disease, alone or with AD pathology, were rated FC present by one or both raters.
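The autopsy counts above can be cross-checked arithmetically. The sketch below uses only the counts stated in this paragraph (the dictionary keys are illustrative labels). It confirms that the clinical-by-neuropathological breakdown reproduces the group totals of 22, 3, and 7 and the overall total of 32, and it computes the share of clinically diagnosed pAD cases with any Lewy body pathology.

```python
# Counts stated in the text: clinical diagnosis -> neuropathology group -> n
crosstab = {
    "pAD": {"NAD": 19, "NLB": 1, "NAD + NLB": 4},
    "DLB": {"NAD": 3, "NLB": 2, "NAD + NLB": 3},
}
groups = ("NAD", "NLB", "NAD + NLB")

row_totals = {dx: sum(counts.values()) for dx, counts in crosstab.items()}
col_totals = {g: sum(crosstab[dx][g] for dx in crosstab) for g in groups}

assert row_totals == {"pAD": 24, "DLB": 8}
assert col_totals == {"NAD": 22, "NLB": 3, "NAD + NLB": 7}
assert sum(row_totals.values()) == 32

# Clinically diagnosed pAD cases with any Lewy body pathology (NLB or NAD + NLB)
pad_with_lewy = crosstab["pAD"]["NLB"] + crosstab["pAD"]["NAD + NLB"]
print(f"{pad_with_lewy}/{row_totals['pAD']} = {pad_with_lewy / row_totals['pAD']:.1%}")
# 5/24 = 20.8%
```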
CAF Variables by Neuropathological Diagnosis.
Abbreviations: CAF, Clinician Assessment of Fluctuation; FC, fluctuating cognition; FC present, fluctuating cognition is endorsed (ie, “yes” to item A and/or B); FC severe, fluctuating cognition score ≥ 5; MD, study physician; NA, not available; NAD, pathological evidence of AD and not diffuse Lewy body disease; NAD + NLB, pathological evidence of both AD and diffuse Lewy body disease; NLB, pathological evidence of diffuse Lewy body disease and not AD; RA, research assistant; UNKN, unknown.
Discussion
Fluctuating cognition is a core symptom in the diagnosis of DLB, and clinicians have historically struggled to identify and quantify FC. 4 The current study is the first to examine the interrater reliability of the CAF in a sample of adults with dementia. We found fair interrater reliability for FC present scores and almost perfect interrater reliability for FC severe scores. Our findings indicate that the CAF can provide a standardized measure with good interrater reliability that can be used by either physicians or trained nonphysician providers. The severity score cutoff (>5) 4 produced very high agreement between raters and also proved better at differentiating patients diagnosed with DLB from patients with pAD among MD raters. Thus, use of the severity cutoff may yield a more reliable and specific measure of FC than endorsement of FC presence alone on the CAF.
Although interrater agreement was sufficient on both CAF outcomes, agreement was higher on FC severe than on FC present. Examination of the data revealed that each rater group applied the CAF somewhat differently. The MD raters appeared less likely than the RAs to endorse FC and more commonly rated the presence of FC as “unknown,” even among cases that ultimately received a clinical diagnosis of DLB. Examination of agreement on both measures showed that RA ratings had comparable levels of agreement between the DLB diagnostic criterion for FC and either CAF outcome. In contrast, MD ratings showed only slight agreement between the FC diagnostic criterion and FC present scores but substantial agreement with FC severe scores. These findings may reflect greater stringency among MD raters, who may confidently endorse the presence of FC only at higher levels of symptom severity. In light of this finding, research or clinical use of the CAF may benefit from recording the assessor’s role so that scores can be interpreted accordingly.
Examination of neuropathological data revealed that, compared to those patients with only AD pathology, a greater proportion of patients with Lewy body pathology (either alone or in combination with Alzheimer’s pathology) were rated as FC present or FC severe. Endorsement of FC present or FC severe was comparable between rater groups in this sample. These findings further support that assessing FC with the CAF can play an important role in differential dementia diagnosis. With a larger sample, future studies could investigate the relationship between the clinical outcomes of the CAF and neuropathological evidence of DLB and AD at varying stages of disease burden.
This study has limitations. First, the clinical DLB sample was relatively small, reflecting the difficulty of identifying patients with DLB in the clinic whose cognitive impairment was mild enough to meet the entry criteria. The small sample size may also reflect referral biases and the relative prevalence of each dementia type. Additionally, although we did not find any significant differences between dementia groups on cognitive screening measures, the lower entry cutoff for DLB cases nonetheless raises the possibility of more advanced disease in the DLB group, which may have affected FC ratings. To examine this possibility, we conducted follow-up analyses and found no relationship between MMSE and CAF scores for either rater, across the sample or within diagnostic groups. It is also important to note that in most cases both raters completed the CAF on the same day, which may have inflated reliability estimates. CAF raters were also not blinded to other clinical variables, including clinical diagnosis at study entry and other clinical symptoms, and this knowledge may have influenced the obtained scores. Finally, Lewy body pathology is not uncommonly discovered at autopsy in cases with a clinical presentation of AD, 22,23 a potential limitation of our examination of FC in cases that came to autopsy.
Overall, our results support the use of the CAF rating scale as a reliable tool for capturing FC when used by either a physician or a nonphysician evaluator. Our findings indicate that the CAF can play a valuable role in differential dementia diagnosis and also support the use of the >5 cutoff score. Fluctuating cognition is a core symptom of DLB and one that has historically been difficult to capture. The CAF rating scale provides a means of assessing the presence and severity of FC and can be a valuable tool in both clinical and research settings.
Footnotes
This article was accepted under the editorship of the former Editor-in-Chief, Carol F. Lippa.
Acknowledgments
The authors thank Prabha Siddarth for her help with the figures.
Authors’ Note
The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The sponsors had no role in the study design or data interpretation.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Predictors study is supported by NIA R01 AG007370 to Dr Stern; Dr Zahodne is supported by NIA T32 AG000261. This publication was also supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant Number UL1 TR000040, formerly the National Center for Research Resources, Grant Number UL1 RR024156.
