Abstract
While the impact of cognitive impairment is well recognized in patients with dementia [1], it is increasingly accepted that cognitive impairment is a key contributor to functional impairment in patients with major mood disorders [2], schizophrenia [3], obsessive–compulsive disorder [4] and post-traumatic stress disorder [5]. The additional burden that cognitive dysfunction adds to the overall burden of illness has been well described in patients with schizophrenia, for whom cognitive dysfunction is as significant as positive symptoms in determining functional outcome [6–8]. Psychiatric symptomatology can, however, limit cognitive assessments that rely on standard written and verbal assessment methods [9].
Behavioural disturbances such as agitation, disinhibition and psychomotor retardation may impede patients’ ability to sit with a clinician and concentrate on the task at hand. Active psychosis may impede engagement with the clinician through persecutory ideation or misinterpretation of assessment tasks. Patients with physical and verbal communication impairment can be disadvantaged or inaccurately rated due to difficulty fulfilling the requirements of standardized tests. Psychologically driven presentations, such as conversion or factitious disorders, can be unsuited to formal testing because of the distorting influence of illness behaviour on the validity of standard cognitive tests [10].
In these circumstances, clinicians rely on their observation of a patient's behaviour and functional capacity to infer the level of cognitive function. For patients who refuse or cannot undertake cognitive assessments, a combination of behavioural and functional observation within a clinical environment provides invaluable information regarding underlying cognitive function. Within the Neuropsychiatry Unit at Royal Melbourne Hospital we have found such observational information highly useful for the assessment of patients and sought a means to standardize and measure it. A literature review utilizing Medline, PsycINFO, CINAHL, EMBASE and OTSeeker identified behavioural assessment and activities of daily living (ADL) tools that captured some, but not all, of the breadth of information that we had been collecting within the clinical environment.
The Behavioural Assessment Scale (BAS) [11] measures daily living skills, social skills and problem behaviour. The BAS has proven useful in monitoring decline in dementia sufferers and neuropsychiatric inpatients [12] but correlates poorly with cognitive tools such as the Mini-Mental Status Examination (MMSE) [11]. The Clifton Assessment Procedures for the Elderly (CAPE) [13] consists of a brief patient interview in addition to rating of a range of behaviours such as walking, bathing, sleep cycle and level of confusion. The sensitivity and specificity of the tool for detecting dementia were found to be significantly lower than for the MMSE [14]. The lack of correlation with cognitive assessments is not surprising given that these tools were principally developed for the measurement of behaviour, not cognition. Reisberg et al. examined the construct and criterion validity and reliability of using behaviour-rating instruments in cognitively impaired individuals and recommended that non-cognitive behavioural measures be excluded from tools of this type to avoid the confounding effects of behavioural features of psychosis, anxiety and mood disturbance on behavioural measures of cognition [15].
An alternative means of using observational information is through ADL assessments. The assessment of ADL and the relationship to cognition have been extensively studied in dementia populations. Functional performance on structured multi-step ADL is considered to be more representative of underlying cognitive abilities than simple overlearned ADLs such as bathing and toileting [16]. Standard ADL tools are not effective as stand-alone screening instruments for dementia [17] while more ‘cognitively weighted’ functional tools have been shown to correlate strongly with formal neuropsychological tests and cognitive screening tools such as the MMSE [18–20].
For patients who refuse or are unable to be directly cognitively assessed, a combination of behavioural and functional observation may provide a mechanism of assessing underlying cognitive function. Operationalizing this through the construction of a standardized instrument allows for performance to be quantified, and for this to be correlated against other measures of cognition to determine its construct validity and utility. For the neuropsychiatric population, who may have atypical presentations of illness, subtle cognitive disturbance or a number of comorbid illness processes, we have previously developed cognitive assessment [21] and informant questionnaire [22] tools. Because of the lack of a behavioural observational tool specifically designed to rate cognitive function, we sought to develop a non-intrusive behavioural assessment tool, which would not require direct engagement with patients. The Behavioural Assessment Tool for Cognition and Higher function (BATCH) records observations of patient functioning under subheadings that reflect cognitive domains, with the aim of providing an indirect but complementary method of cognitive assessment.
The primary aim of the current study was to determine whether informal ward behaviours can be grouped to represent semi-discrete domains of underlying cognitive function. We hypothesized that observations of patient functioning using BATCH cognitive domains would correlate with cognitive domains found on the Neuropsychiatry Unit Cognitive Screening Tool (NUCOG), a validated tool in use at the Neuropsychiatry Unit at Royal Melbourne Hospital [21].
Methods
Tool content
The BATCH is a clinician-rated tool composed of 60 items in 10 semi-independent observational domains: orientation, attention/concentration, personal responsibility, volition, adaptation, problem-solving/judgement, executive function, memory, language, and visuospatial function. The examiner rates the patient on 5–8 key behaviours within each domain (Table 1) after a defined observational period (in the present study, the fourth day of an inpatient admission). The frequency of each behaviour is rated on a 5-point Likert scale: 1 = never (behaviour does not occur during rating period); 2 = rarely (behaviour occurring once or twice during rating period); 3 = sometimes (behaviour occurring up to 50% of occasions during rating period); 4 = usually (behaviour occurring in up to 80% of occasions during rating period); and 5 = always (behaviour occurring in 100% of occasions during rating period). The total BATCH score is out of 300. The higher the score, the more cognitively intact the subject is judged to be on observational data.
Table 1. Behaviours rated in each behavioural/functional domain on BATCH
BATCH, Behavioural Assessment Tool for Cognition and Higher Function.
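The scoring rule described for the BATCH (item ratings of 1 to 5, summed within domains and across the whole tool) can be sketched in code. This is a minimal illustration only: the function name `score_batch`, the dictionary layout and the item counts in the example are assumptions made for the sketch, not part of the published tool.

```python
# Hypothetical sketch of BATCH scoring; names and data layout are assumptions.
LIKERT_MIN, LIKERT_MAX = 1, 5  # 1 = never ... 5 = always


def score_batch(ratings_by_domain):
    """Return (domain_totals, total_score, max_possible) for one patient.

    ratings_by_domain maps a domain name to a list of 1-5 item ratings.
    """
    domain_totals = {}
    n_items = 0
    for domain, ratings in ratings_by_domain.items():
        for r in ratings:
            if not (isinstance(r, int) and LIKERT_MIN <= r <= LIKERT_MAX):
                raise ValueError(f"invalid rating {r!r} in domain {domain!r}")
        domain_totals[domain] = sum(ratings)
        n_items += len(ratings)
    total = sum(domain_totals.values())
    return domain_totals, total, n_items * LIKERT_MAX


# Illustrative ratings for two domains only (item counts are placeholders).
example = {
    "orientation": [5, 4, 5, 5, 3],
    "memory": [2, 3, 3, 4, 2, 3],
}
domain_totals, total, max_possible = score_batch(example)
```

With all 60 items rated 5, the same function returns the stated maximum total of 300.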
Study subjects (n = 76) comprised a consecutive sample of adults admitted for >4 days to the Neuropsychiatry Unit at the Royal Melbourne Hospital. The BATCH was rated by the unit occupational therapist (KM) and a member of nursing staff on day 4 of admission. Demographic data were collected and included age, gender and years of completed education. Clinical data collected included the MMSE [23], NUCOG [21], Health of the Nation Outcome Scale (HoNOS) [24], carer-rated ADL data (Bristol Activities of Daily Living Scale; BADLS) [25], staff-rated physical dysfunction (Barthel Index) [26] and psychopathology (Neuropsychiatric Inventory; NPI) [27]. Upon discharge a consensus DSM-IV diagnosis was made by two neuropsychiatrists and one behavioural neurologist. For the purposes of analysis, these diagnoses were divided into three broad groups: psychiatric, neurological, and dementia diagnoses.
Statistical analyses
Demographic variables were compared using the Kruskal–Wallis test for age and education, χ2 tests for gender proportion, and one-way analysis of variance (ANOVA) for cognitive test data. Criterion validity was assessed with receiver-operating characteristic (ROC) curves comparing BATCH performance with that of the NUCOG and MMSE in separating demented and non-demented patients. The influence of demographic variables upon BATCH scores was calculated using Spearman's correlation coefficient. Concurrent validity was assessed using Spearman's correlation coefficient between BATCH scores and NUCOG and MMSE scores. A multivariate linear regression analysis using total BATCH score as the dependent variable and cognitive (NUCOG), functional (BADLS), physical disability (Barthel Index) and psychiatric symptoms (NPI) as independent variables was used to determine the relative contributions of cognitive function, psychopathology and physical states to the BATCH score. Construct validity of the multi-dimensional structure of the BATCH was assessed using Spearman's correlation coefficient between BATCH symptom subdomains and the corresponding NUCOG subdomains. Internal consistency was measured with Cronbach's α. Separation of diagnostic groups was assessed using one-way ANOVA for between-group comparisons of individual measures and repeated-measures analysis of covariance (ANCOVA), using Huynh–Feldt degrees of freedom reduction to account for non-independence of cognitive symptom subdomains and covarying for age, for a subdomain×group effect. Statistical analyses were undertaken with SPSS 15.0 (SPSS Inc., Chicago, IL, USA) and MedCalc 8.2 (Medcalc Software, Mariakerke, Belgium) software for Windows.
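Several of the analyses above rest on Spearman's correlation coefficient, which is the Pearson correlation computed on rank-transformed data. The study used SPSS for this; the following standard-library sketch only illustrates the calculation, and the function names `average_ranks` and `spearman_rho` are chosen for the example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

Tied ratings, which are likely with Likert-type items, receive the mean of the ranks they span. The sketch does not handle a variable with no variation at all, which would cause a division by zero.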
Results
Demographic measures
Seventy-six patients were included in the study, of whom 51 were male and 25 female. Gender distribution did not differ between diagnostic categories (Table 2). Mean age was 50.74±14.80 years, and this differed significantly across groups (p < 0.005). The dementia group was approximately 10 years older than the neurological and psychiatric groups, which did not differ significantly from each other. Educational status did not differ significantly across groups (p = 0.362). Neither age (p = 0.200) nor education (p = 0.487) correlated with total BATCH scores. The mean time taken for completion of the BATCH was 15 min.
Table 2. Comparison of diagnostic groups on demographic, cognitive, symptom and functional variables
BADLS, Bristol Activities of Daily Living Scale; BATCH, Behavioural Assessment Tool for Cognition and Higher Function; HoNOS, Health of the Nation Outcome Scale; MMSE, Mini-Mental Status Examination; NPI, Neuropsychiatric Inventory; NUCOG, Neuropsychiatry Unit Cognitive Screening Tool.
∗p < 0.05; ∗∗p < 0.005; †trend to significance.
Clinical measures
Mean MMSE and NUCOG scores were significantly lower in the dementia group than in the other two groups (p < 0.005). There were no significant differences between groups on NPI, HoNOS, BADLS or Barthel Index scores.
BATCH scores and diagnostic groups
The mean total BATCH score for the sample was 235.84±47.53. The dementia group mean score at 220.75±52.73 was lower, but not significantly (p = 0.098), than the neurological (248.90±48.88) and psychiatric (243.53±41.11) groups. Repeated-measures ANCOVA showed a significant BATCH domain×diagnostic group effect (df = 13.765, F = 3.67, p < 0.001), with the dementia group showing significant reductions on attention/concentration, orientation, executive function, memory and language compared to the other two patient groups (Figure 1). While the MMSE and NUCOG showed a significant diagnostic group separation that the BATCH did not achieve, a repeated-measures ANOVA showed no diagnostic group×cognitive tool interaction (df = 2.275, F = 0.861, p = 0.440).
Figure 1. Plot of Behavioural Assessment Tool for Cognition and Higher Function (BATCH) cognitive profile across subscales for each diagnostic group. Significant reductions in the dementia group compared to psychiatric and neurological groups were seen in subscales 1 and 2 (attention and orientation), 5–7 (executive functions), 8 (memory) and 9 (language).
Figure 2. Receiver-operating characteristic curves for Behavioural Assessment Tool for Cognition and Higher Function (BATCH), Neuropsychiatry Unit Cognitive Screening Tool (NUCOG) and Mini-Mental Status Examination (MMSE), for which the areas under the curve (AUCs) did not differ significantly. Diagonal reference line represents an AUC of 0.50.

ROC analysis
The ROC analysis of the performance of the BATCH in classifying patients into dementia and non-dementia groups produced an area under the curve (AUC) of 0.638, with a cut-off score of >256 at which sensitivity was 55.3% and specificity 76.9%. For the MMSE, AUC was 0.687 and for the NUCOG, 0.728. Although the AUC for the BATCH was lower, pair-wise comparisons between the tools showed that its performance did not differ significantly from that of the NUCOG (p = 0.114) or the MMSE (p = 0.437) in separating demented from non-demented patients.
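The quantities reported here (sensitivity and specificity at a cut-off, and the AUC) can be illustrated with a short sketch. It assumes that a score above the cut-off counts as test-positive and leaves it to the caller to decide which clinical group is the reference-positive class; the data are invented for illustration and are not study data. The study itself used MedCalc for these analyses.

```python
def sensitivity_specificity(scores, is_positive, cutoff):
    """Classify score > cutoff as test-positive and compare with the truth."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_positive):
        predicted = score > cutoff
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, is_positive):
    """AUC as the probability that a random positive case outscores a random
    negative case, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, t in zip(scores, is_positive) if t]
    neg = [s for s, t in zip(scores, is_positive) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for illustration only; True marks the reference-positive class.
scores = [280, 270, 260, 250, 240, 230, 220, 210]
truth = [True, True, False, True, False, True, False, False]
sens, spec = sensitivity_specificity(scores, truth, cutoff=256)
area = auc(scores, truth)
```

An AUC of 0.50 corresponds to the diagonal reference line on an ROC plot, that is, to chance-level separation of the two groups.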
Construct and concurrent validity
Total BATCH scores correlated highly (Figure 3) with NUCOG scores (Spearman's ρ = 0.530, p < 0.0001) and MMSE scores (ρ = 0.520, p < 0.0001). BATCH domain totals were correlated against the five subdomains of the NUCOG (Attention, Visuoconstructional, Memory, Executive, and Language). Each BATCH subscale correlated strongly with the matched NUCOG subscale, most at the p < 0.001 level (Table 3).
Figure 3. Scatterplot of Neuropsychiatry Unit Cognitive Screening Tool (NUCOG) vs Behavioural Assessment Tool for Cognition and Higher Function (BATCH) total scores, for the total group and low- and high-symptom groups, split according to median Neuropsychiatric Inventory score for the sample.
Table 3. Non-parametric correlations between BATCH and matched NUCOG subscale scores
BATCH, Behavioural Assessment Tool for Cognition and Higher Function; NUCOG, Neuropsychiatry Unit Cognitive Screening Tool.
To determine whether BATCH scores correlated highly with measures of cognition in both high- and low-symptom patients, the total sample was split into two groups based on median NPI total score. The BATCH correlated strongly with the NUCOG in both the low-symptom (ρ = 0.495, p < 0.005) and high-symptom groups (ρ = 0.595, p = 0.001), and these correlations did not differ significantly (p = 0.567, Figure 3).
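The text does not state which test was used to compare the two correlations. One common approach for independent groups is Fisher's r-to-z transformation, sketched below purely as an assumption. It is derived for Pearson correlations and is only an approximation when applied to Spearman's ρ. The group sizes in the usage lines are hypothetical, since the sizes of the two NPI-defined groups are not reported.

```python
from math import atanh, erfc, sqrt


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided test that two correlations from independent groups differ.

    Returns (z, p) using Fisher's r-to-z transformation.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 observations")
    z1, z2 = atanh(r1), atanh(r2)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value from the standard normal
    return z, p


# Reported correlations with hypothetical group sizes of 38 each.
z, p = compare_independent_correlations(0.495, 38, 0.595, 38)
```

Under these assumed group sizes the difference between the two coefficients is well short of conventional significance, in line with the conclusion reported above.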
Internal consistency
All 60 items of the BATCH were entered into a reliability analysis. Cronbach's α was 0.979 across the sample, indicating a very high level of internal consistency.
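Cronbach's α compares the sum of the individual item variances with the variance of the total score: α = k/(k − 1) × (1 − Σ item variances / total-score variance), where k is the number of items. The sketch below illustrates this calculation with the Python standard library; the function name `cronbach_alpha` and the row-per-subject layout are assumptions for the example, and the study's own figure was produced in SPSS.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-subject rows of item ratings.

    item_scores[s][i] is the rating of item i for subject s.
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 subjects")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every subject must have the same number of items")
    # Sample variance of each item across subjects.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each subject's total score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For the BATCH, each row would hold one patient's 60 item ratings.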
Relationship to behavioural and functional data
The BATCH correlated with the NPI (ρ = −0.402, p < 0.001), the BADLS (ρ = 0.582, p < 0.001) and the Barthel Index (ρ = 0.477, p < 0.001) but not the HoNOS (p = 0.858). Regression analysis showed that the most significant determinant of BATCH score was cognitive function measured on the NUCOG (β = 0.400, p < 0.001) followed by functional status (BADLS; β = −0.373, p < 0.01) and symptom status (NPI; β = −0.281, p < 0.01). These four measures predicted 63% of the variance in BATCH scores; cognition alone contributed 35.5% of the total variance. Physical disability (Barthel Index) did not affect BATCH scores (p = 0.535).
Discussion
In certain clinical circumstances cognition can be very difficult to measure by confrontational testing. The use of observational measures of behaviour and function provides clinicians with a proxy measure of cognitive performance in those patients who are otherwise unable to be assessed as a result of psychiatric or neurological impairment or behavioural disturbance. The BATCH was developed as a standardized means of collecting clinical observation and functional information that was being routinely used in an inpatient setting to provide a proxy measure of cognition. The current project has shown that in a diagnostically heterogeneous neuropsychiatric inpatient sample, total BATCH scores correlated very highly with cognitive performance on the MMSE and NUCOG. BATCH subscale scores correlated strongly with relevant subscales of the NUCOG, and the tool demonstrated very high internal consistency.
Patients with dementia had a different BATCH cognitive profile compared to psychiatric or neurologic patients, although the tool was not able to statistically discriminate demented from non-demented patients with the same degree of effectiveness as the NUCOG. The most likely reason for this is that, as pointed out by Reisberg et al. [15], psychiatric disturbance and cognitive impairment may lead to a similar type and degree of behavioural and functional impairment. This explanation is supported by the regression analysis, which identified that both cognition and symptomatology contribute strongly to overall BATCH scores. The clinical diagnosis of a dementia relies on multiple assessment modalities (e.g. clinical, cognitive, imaging) and, while the BATCH was not very sensitive for the presence of dementia, it exhibited moderately good specificity in that it was able to correctly identify approximately 77% of patients without dementia. In the context of the reasons for which the BATCH has been developed, the ability to detect the absence of dementia is possibly more important than the ability to detect its presence. For example, a patient who refuses or cannot undertake cognitive assessment and who scores >256 on the BATCH has an approximately 77% chance of not having a dementia.
There are several potential limitations of the current study. The data were collected within a tertiary referral neuropsychiatry unit that assesses a group of patients with atypical and rare conditions [21, 22] who are not representative of most psychiatry, aged care, rehabilitation or neurology units. However, the primary aim of the study was not to investigate the diagnostic ability of the BATCH but to assess its ability to detect cognitive deficits across a wide range of neuropsychiatric conditions. As such, the varied and unique population of the Neuropsychiatry Unit at Royal Melbourne Hospital provided a breadth of cognitive, psychiatric, neurological and physical disability that was ideal for the purposes of the study, and we were also able to demonstrate the validity of the BATCH in patients with a high level of psychiatric symptomatology. The second limitation of the study was that all data were collected on a clinical population and we do not have data regarding the performance of normal subjects on the BATCH. Given that the data are based on observations on an inpatient ward by health professionals over several days, it is unlikely that we will be able to collect such data. The final and possibly most important limitation is that the BATCH was developed in order to assess cognition in patients who cannot undertake routine cognitive assessments. In order to determine construct validity, however, we assessed it in patients who could undertake cognitive assessments and have made the assumption that the psychometric properties described here are applicable to the target population.
While the BATCH was developed to assist in the assessment of patients who could not undertake cognitive assessment, its performance in this sample suggests that it is of value in a range of patients. The BATCH has provided a way to formalize the interactions and observations between staff involved in the day-to-day care of neuropsychiatric patients during time spent in ‘incidental’ activities. Observations that may have previously not been formally documented and brought to the attention of the wider team are now routinely available for integration into clinical formulation and diagnosis.
The present study has shown that the BATCH correlates strongly with, and is determined by, measures of cognition, symptomatology and functional status, and that scores from the BATCH can be integrated with other clinical information to aid clinicians in the assessment of cognitive function. In those individuals who cannot undergo formal cognitive testing, performance on the BATCH may act as a proxy measure of cognition, although levels of psychiatric disturbance would need to be taken into account when interpreting a low score on the BATCH. A high score on the BATCH in this patient group is moderately predictive of the absence of dementia. Staff involved in the day-to-day management of cognitively impaired patients routinely make observations on which they base inferences about the cognitive status of their patients. To our knowledge, the BATCH is the first tool to provide a standardized format for capturing and rating such observations with high construct validity and internal reliability, and it adds a further mechanism of cognitive assessment to the current repertoire of cognitive assessment methods.
