Abstract
Numerous reviews (e.g. [1, 2]) have indicated that depression is increased in those with certain medical illnesses. This could be true or merely reflect the reality that many depression inventory items (e.g. anergia, loss of interest, insomnia) may be affirmed as a consequence of the medical illness itself. To redress inflated case identification rates and severity estimates, various options have been proposed (see overview [3]), including ‘inclusive’, ‘exclusive’, ‘substitutive’ and ‘aetiological’ approaches. No one approach has been accepted with confidence, as each has its own limitations, while most options involve making judgements that remain problematic in practice (e.g. is any ‘fatigue’ due to the medical illness and/or to the depression?).
A second approach has been to design measures that assess depression independently of potentially confounding items. While many have been developed, two will be noted: the Hospital Anxiety and Depression Scale or HADS [4] and the Beck Depression Inventory for Primary Care or BDI-PC [5]. The HADS was developed specifically to screen for symptoms of depression and anxiety in the hospital setting, and remains the most commonly used measure in the medically ill [6]. Its 7-item depression scale (now referred to as the HDS) has five items addressing anhedonia, which the authors considered to be ‘the central pathological feature of that form of depression that responds well to antidepressant drug treatment’ [4]. The suggested cut-off score for a definite case of depression is ≥ 11, a doubtful case is defined by scores 8–10 and non-cases by a score less than 8. The HDS has recently been accused of poor validity in both medically unwell and psychiatric patients [7], while Hermann [6] noted that there ‘is still no comprehensive documentation of its psychometric properties’.
The BDI-PC is a seven-item screening instrument designed on a rational-empirical basis by limiting consideration to non-somatic items from an updated version of the well-known Beck Depression Inventory, and only described recently. ‘Sadness’ and ‘loss of pleasure’ (anhedonia) were included on an a priori basis, as one of these two symptoms is necessary for a DSM-IV diagnosis of depression. ‘Suicidal ideation’ was similarly chosen on an a priori basis as an important clinical indicator of suicidal risk. The remaining four items (‘pessimism’, ‘past failure’, ‘self-dislike’ and ‘self-criticalness’) were determined empirically from data obtained from 500 psychiatric patients [5]. A cut-off score of ≥ 4 is used to define depression caseness, with quantified high sensitivity and specificity (reports ranged from 82% to 99%) for medical inpatients and outpatients [5, 8, 9]. In the key report [5], the utility of the BDI-PC and that of the HDS were directly compared in terms of their capacity to differentiate depressed and non-depressed patients who had been referred to a consultation–liaison service (19 following a drug overdose and 31 with medical illnesses), with the BDI-PC shown to be superior.
We similarly seek to develop a valid measure of depression in the medically ill via a focus on cognitive symptoms. Our approach differs from Beck's in that, rather than adapting or ‘somatic stripping’ an accepted depression measure, we adopt a ‘bottom-up’ approach. That is, we seek to define the quintessential cognitive mood state of depression by assembling a set of constructs that define both depressive mood state and associated cognitive constructs, ideally establish that they are independent of medical illness and its impact (assessed by similarly obtaining a set of central illness constructs) and then refine the depression set.
Method
Development of the test measure
We selected 81 items assessing two central components. The first was the ‘impact’ of a medical illness on the individual, as described by Cassell [10] and other writers. Thus, items addressed experiential issues such as disconnection and loss of mastery, as well as more physical manifestations such as fatigue, sleep problems, lack of vigour, exhaustion, inactivity, listlessness and weariness. We hypothesized, subject to the medical disorder having an illness ‘impact’, that both depressed and non-depressed subjects would score similarly on such items, and that it would therefore be wise to identify any such generic illness items and separate them from those specific to depression.
The majority of the items, however, sought to capture cognitive depressive constructs, and were selected from a range of theoretical expositions and empirical studies. Items were generated to capture the following domains: depressive mood state, anhedonia, suicidal ideation, pessimism, perceived failure, loss of control, hopelessness/helplessness, self-reproach, non-reactive mood, irritability, fearfulness, threat, loss of humour, tearfulness, self-esteem, self-efficacy, mistrust, dissatisfaction with life, grief/loss, social withdrawal, guilt, loss of concentration, perceived stress, significant worry and brooding.
We elected to administer the questionnaire verbally, both to check on interpretation of items and to assist patients with concentration difficulties. Respecting our design objective of developing a state measure, subjects were requested to answer on the basis of ‘How you have been feeling in the last two to three days, and compared to how you usually or normally feel’. Rating options for each item (e.g. ‘Are you feeling depressed?’) were ‘not true at all’, ‘true to some degree’ and ‘very true’, scored 0, 1 and 2 respectively.
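The item scoring just described can be sketched in a few lines of Python. This is an illustration only, not the software used in the study: the function name `score_responses`, the input format and the example answers are all hypothetical.

```python
# Response options and their scores, as described in the text.
RESPONSE_SCORES = {
    "not true at all": 0,
    "true to some degree": 1,
    "very true": 2,
}


def score_responses(responses):
    """Convert verbal ratings (one string per item) into item scores and a total.

    The list-of-strings input format is an assumption made for this sketch.
    Raises ValueError if a rating is not one of the three permitted options.
    """
    scores = []
    for answer in responses:
        key = answer.strip().lower()
        if key not in RESPONSE_SCORES:
            raise ValueError(f"Unrecognised rating: {answer!r}")
        scores.append(RESPONSE_SCORES[key])
    return scores, sum(scores)


# Hypothetical answers to three items (illustration only, not study data).
item_scores, total = score_responses(
    ["very true", "not true at all", "true to some degree"]
)
print(item_scores, total)
```

Rejecting unrecognised ratings, rather than silently scoring them as zero, keeps a transcription error from lowering a patient's total.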
Recruitment procedures
The study was undertaken at a large general teaching hospital in Sydney (Prince of Wales). To ensure heterogeneity of medical disorders, patients were recruited from a number of departments. To ensure severity of medical disorder, we focused on inpatients, but included outpatients receiving radiotherapy on a daily basis. To minimize effects of hospitalization on mood, inpatients were not recruited until at least 2 days after admission, while radiotherapy outpatients were not screened in their first week.
Inclusion criteria were: age between 18 and 65, English language competence, and well enough (judged by the designated medical registrar) to be interviewed. Patients were excluded if they showed any evidence of cognitive disturbance, or if cerebral pathology had been diagnosed by their treating medical team, with the concern here being to exclude those whose capacity for self-reporting might be invalid. Relevant medical registrars were contacted daily for identification of eligible medically ill patients. To respect confidentiality, all patients were interviewed in a private area.
Phase I components: recruitment and study procedures
The research assistant (TH) explained to eligible patients that the study was voluntary, and that it aimed to assess ‘the stresses and emotions that are associated with having a medical condition’, and detailed all study logistics and potential components. If the patient refused participation (then or subsequently), their reasons were recorded.
At interview, sociodemographic details were sought, as well as details about the principal medical diagnosis (e.g. length of time since being informed of their diagnosis, current medication) and any current use of antidepressants. After completion of our questionnaire, alternate patients were requested to complete either the BDI-PC or the HDS.
Phase II components: assessment by a psychiatrist
We elected for a one-in-three subsample of the patients to be interviewed by a research psychiatrist (JB) experienced in the assessment of depression (via our Mood Disorders Unit). Interviewing occurred within 2 days of the research assistant assessment, and with the psychiatrist blind to measure scores.
The psychiatrist undertook a detailed interview focusing on the impact of the medical illness and seeking to determine any depressive symptomatology, any lifetime history of depression and past use of antidepressants. The psychiatrist then completed a modified DSM-IV depression checklist focusing on current functioning, modified by judgements as to whether symptoms were likely due to physical illness/effects of hospitalization or due to depression, using guidelines developed by Cavanaugh [11]. The psychiatrist derived two principal criterion measures of depression. First, a dimensional clinically derived estimate of depression (1 = not depressed, 2 = mildly depressed, 3 = moderately depressed and 4 = severely depressed) was made for all patients. Second, a judgement about the nature of any clinically significant, current depression (i.e. 1 = no psychiatric condition, 2 = adjustment disorder with depressed mood, 3 = major depressive episode or MDE). We did not impose a minimum of 2 weeks’ duration, as we were developing a state measure.
Following clinical interview, the Depression section of the Composite International Diagnostic Interview (CIDI) version 2.1 [12] was administered, generating diagnoses of MDE (past and current), dysthymia, and information on severity and recurrence of episodes (single/recurrent). As some patients having a history of dysthymia also met criteria for MDE, the data from the CIDI were coded into two categories ‘current’ and ‘non-current’ MDE, with current referring to the 2 preceding weeks. CIDI judgements were used to check on the validity of the psychiatrist's judgement.
Results
Logistic issues
Of 156 subjects referred by their registrars, 28 were discharged from hospital before our assessment. Another 30 declined to participate; of these, 18 appeared depressed or extremely stressed, 10 declined for logistic reasons (e.g. had visitors or were about to be discharged) while two stated that they did not want to talk about their condition. Another 23 referrals met our exclusion criteria. Eight questionnaires were left incomplete due to hospital procedures intruding. Recruitment ceased after 4 months when we judged that we had sufficient data for item refinement.
Sample characteristics
The sample comprised 67 medical patients (52% male), with a mean age of 47.6 (SD = 13.0) years. Principal recruitment sources were: 19.4%, cardiology; 17.9%, gastroenterology; 14.9%, respiratory; 13.4%, radiation oncology; 10.4%, haematology; 10.4%, nephrology; 7.5%, rheumatology; and 6.0%, endocrinology.
Sociodemographic data established that 54% of patients were currently married or living with their partner, 27% were single, 10% were divorced and 9% separated. The median duration of education was 12.7 years. Prior to admission, 72% had been employed full-time or part-time, 15% received the pension or sickness benefits, 9% were retirees, 3% were students and 1% was unemployed. On a seven-point occupational scale [13], the mean ranking was intermediate at 4.4 (SD = 1.6).
Depression data: screening measures and CIDI
The HDS was completed by 32 patients, and generated a mean depression subscale score of 5.2 (SD = 4.1). Adopting the cut-off criteria noted earlier, 12.5% rated as definite and 15.6% as doubtful cases of depression. The mean score for the 35 patients receiving the BDI-PC was 3.5 (SD = 4.3), with 34.3% scoring at or above the depression cut-off point.
Twenty-nine (43.3%) of the overall sample were interviewed by the research psychiatrist, with only one selected patient declining. Seven (24.1%) of this subsample were judged by the psychiatrist to be suffering from a clinically significant depressed mood: four (13.8%) received a DSM-IV diagnosis of major depression (MDE) and three (10.3%) were judged to have an adjustment disorder with depressed mood. Twenty-eight of these patients also received a CIDI interview (one not completing this component for logistical reasons), with the CIDI identifying 21.4% of this subset as currently suffering from an MDE.
Agreement between estimates of major depression
Agreement between psychiatrist-rated MDE and CIDI-diagnosed MDE was high (κ = 0.76, p < 0.01). As the CIDI probes the somatic features of depression, such as sleep and appetite disturbance, without taking into account medical symptomatology, it risks overgenerating MDE diagnoses. Additionally, it does not produce a DSM-IV diagnosis of adjustment disorder with depressed mood, thus risking generating an alternate MDE diagnosis for such patients, as occurred for two. When we re-examined interrater reliability after including our psychiatrist's diagnosis of adjustment disorder with depressed mood in the analysis, the reliability coefficient improved further (κ = 0.90, p < 0.01).
Refining our 81-item questionnaire
A few items were deleted because of difficulty in their interpretation or understanding. Refinement then proceeded by first comparing item means for depressed and non-depressed subsamples as defined by the HDS and BDI-PC cut-off scores, and then checking each item's discriminating capacity by using the psychiatrist's judgement as the criterion standard.
As noted earlier, we first sought to determine items ubiquitous to a medically ill sample and thus affirmed (ideally equally) by the depressed and non-depressed groups alike. We anticipated that such items would reflect a more general construct, perhaps the ‘impact’ of a medical illness, or even define ‘illness’ itself. Several such items were suggested (namely inactivity and insomnia) in the sense of having both similar and high affirmation rates across depressed and non-depressed subsets, although each tended to be rated more strongly by the depressed subjects. While not ubiquitous, several other items [16 and 64 (non-reactive mood), 1 (concentration problems) and 48 (self-reproach), all noted in Table 1] showed similar distributions across depressed and non-depressed subsets.
Table 1. Items deleted from the final measure and rationale for deletion (items abbreviated here)
We next sought to identify items affirmed only by the depressed medically ill. Theoretically, the ideal item would be affirmed by all depressed subjects and by none of the non-depressed subjects (i.e. have absolute sensitivity and specificity), with item affirmation possibilities allowing us to focus only on ‘very true’ responses or to also accept the ‘true to some degree’ option. After inspecting multiple analyses, we selected on the basis of ‘very true’ responses only. In addition, while some constructs may be obligatory to the depressed state, others may be rare (e.g. suicidal thinking). We elected to reject low-prevalence items in the depressed subjects even if they were discriminating (as was shown for suicidal ideation). However, we later established that subjects who had affirmed the suicidal ideation item had all rated the more central depressive items, so that they subsequently scored above the cut-off on the refined measure, thus obviating any need to include the suicidal ideation item.
Again using the HDS/BDI-PC and psychiatrist judgements as two independent measures, we quantified overrepresentation of each remaining item in the depressed subsets by use of odds ratio (OR) analyses and imposed a minimum OR of 12.0. We then examined an intercorrelation matrix of all so-identified items, looking for high coefficients in order to exclude synonymous items, with 11 additional items being deleted on that last basis. Table 1 details reasons for excluding specific items. The final refined set of 16 items is tabled (Table 2), together with data on their relative discrimination across assigned depressed and non-depressed subjects.
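As a minimal sketch of the odds ratio screen described above, the Python below computes an item's OR from a 2 × 2 table of ‘very true’ affirmations and applies the minimum of 12.0 stated in the text. The function names and the example counts are hypothetical, and the 0.5 added to every cell when a cell is empty (the Haldane-Anscombe adjustment) is an assumption of this sketch; the text does not say how empty cells were handled.

```python
MIN_OR = 12.0  # minimum odds ratio stated in the text


def odds_ratio(dep_yes, dep_no, nondep_yes, nondep_no, correction=0.5):
    """Odds of affirming an item in depressed versus non-depressed subjects.

    dep_yes / dep_no: depressed subjects who did / did not affirm the item.
    nondep_yes / nondep_no: the same counts for non-depressed subjects.
    If any cell is zero, `correction` is added to every cell (an assumption
    of this sketch, not a procedure reported in the paper).
    """
    cells = [dep_yes, dep_no, nondep_yes, nondep_no]
    if any(c < 0 for c in cells):
        raise ValueError("Counts must be non-negative")
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


def retain_item(dep_yes, dep_no, nondep_yes, nondep_no):
    """True if the item's odds ratio meets the minimum threshold."""
    return odds_ratio(dep_yes, dep_no, nondep_yes, nondep_no) >= MIN_OR


# Hypothetical counts for one item (illustration only, not study data):
# 5 of 7 depressed and 3 of 22 non-depressed subjects rate it 'very true'.
print(odds_ratio(5, 2, 3, 19), retain_item(5, 2, 3, 19))
```

With subsets as small as those in this study, empty cells are likely, so whichever zero-cell convention is chosen can decide whether an item passes the threshold.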
Table 2. Refined final item set and differentiation across depressed and non-depressed subsets
Internal consistency
The coefficient alpha of internal consistency for our 16-item scale was very high (α = 0.95). The corrected item-total correlations appear in Table 2, and were significant beyond the level for a one-tailed test even after Bonferroni adjustment (α/16) was used to control for the family-wise error rate.
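For readers unfamiliar with coefficient alpha, the sketch below computes it from a subjects-by-items score matrix using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the small data matrix are hypothetical; the matrix is not study data and will not reproduce the reported value of 0.95.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of subjects, each a list of item scores.

    Uses sample variances (n - 1 denominator) for both the items and the
    total scores, which is the usual convention.
    """
    if len(rows) < 2:
        raise ValueError("Need at least two subjects")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("Need at least two items and equal-length rows")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("Total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1/2 ratings for five subjects on four items.
example = [
    [2, 2, 1, 2],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [2, 1, 2, 2],
    [0, 0, 1, 0],
]
print(round(cronbach_alpha(example), 2))
```

Alpha rises with both inter-item correlation and the number of items, so a very high value in a 16-item set can partly reflect overlapping item content.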
Convergent validity
We created a total score on the 16-item set (preserving the original scoring) and used Spearman correlations to quantify the convergent validity of our measure with the BDI-PC and the HDS. The total score of our measure correlated strongly with total BDI-PC scores (r = 0.80, p < 0.01, n = 35) and with total HDS scores (r = 0.72, p < 0.01, n = 32), for the relevant subsets.
Receiver operating characteristic analysis
A receiver operating characteristic (ROC) analysis derived a cut-off score (18 or more) for our measure yielding high sensitivity (100%) and specificity (96%) rates in relation to the psychiatrist's judgement of clinically significant depression (i.e. MDE or adjustment disorder with depressed mood), with the area under the curve indicating that our measure was highly accurate in differentiating the depressed from the non-depressed patient (AUC = 0.99).
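The quantities reported above can be illustrated with a short Python sketch that applies the cut-off of 18 or more to a set of total scores and compares the result with a criterion judgement. The function names and the scores are hypothetical, not study data. The AUC is computed here as the proportion of depressed/non-depressed pairs in which the depressed patient scores higher (ties counting one half), which equals the area under the empirical ROC curve.

```python
CUT_OFF = 18  # cut-off reported in the text (a score of 18 or more)


def sensitivity_specificity(scores, depressed, cut_off=CUT_OFF):
    """Sensitivity and specificity of `score >= cut_off` against the criterion."""
    pairs = list(zip(scores, depressed))
    tp = sum(1 for s, d in pairs if d and s >= cut_off)
    fn = sum(1 for s, d in pairs if d and s < cut_off)
    tn = sum(1 for s, d in pairs if not d and s < cut_off)
    fp = sum(1 for s, d in pairs if not d and s >= cut_off)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Need both depressed and non-depressed subjects")
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, depressed):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    pos = [s for s, d in zip(scores, depressed) if d]
    neg = [s for s, d in zip(scores, depressed) if not d]
    if not pos or not neg:
        raise ValueError("Need both depressed and non-depressed subjects")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical total scores and criterion judgements (illustration only).
scores = [25, 20, 18, 19, 10, 5, 0, 12]
depressed = [True, True, True, False, False, False, False, False]
print(sensitivity_specificity(scores, depressed), auc(scores, depressed))
```

Because a cut-off chosen on a sample is tuned to that sample, sensitivity and specificity estimated this way tend to be optimistic until checked in independent data.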
Criterion-related validity
Spearman correlations quantified the association between test measure scores and depression severity as rated by the psychiatrist. Our 16-item measure returned a high coefficient (r = 0.74, p < 0.001, n = 29), which compared favourably with the BDI-PC (r = 0.68, p = 0.03, n = 17) and the HDS (r = 0.54, p = 0.07, n = 12). To ensure that the high correlation for our measure was not an artefact of unequal sample sizes, we repeated the analysis in the separate subsets receiving alternative depression measures. Limiting consideration to the BDI-PC subset (n = 17), our measure had a similar strength of association with depression severity assessed by the psychiatrist (0.67 vs 0.68). In the 12 patients receiving the HDS, our measure was more strongly associated than the HDS (0.81 vs 0.54) with psychiatrist-rated depression severity.
Further evidence of criterion-related validity was obtained by examining the capacity of each measure's cut-off point to predict the psychiatrist's judgement of clinically significant depression. The agreement of our test (cut-off point ≥ 18) was first scrutinized for all 29 patients who received a psychiatric diagnosis, and generated a very high kappa of 0.91 (p < 0.001), superior to the values for the BDI-PC and the HDS (κs of 0.68 and 0.57 respectively). In addition, in matched subsets, our measure was superior to the BDI-PC (κs of 0.82 and 0.68 respectively) and to the HDS (κs of 1.00 and 0.57 respectively).
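The kappa statistic used above can be sketched as follows. The 2 × 2 counts in the example are not reported in the text: they are one reconstruction that is consistent with the reported figures (29 patients, seven judged depressed by the psychiatrist, 100% sensitivity and 96% specificity at the cut-off) and should be read as an assumption. The function and argument names are likewise hypothetical.

```python
def cohens_kappa(both_pos, test_only, criterion_only, both_neg):
    """Cohen's kappa for agreement between a test cut-off and a criterion.

    both_pos: positive on test and criterion; test_only: positive on test only;
    criterion_only: positive on criterion only; both_neg: negative on both.
    """
    n = both_pos + test_only + criterion_only + both_neg
    if n == 0:
        raise ValueError("Empty table")
    observed = (both_pos + both_neg) / n
    test_pos = both_pos + test_only
    crit_pos = both_pos + criterion_only
    # Agreement expected by chance from the marginal totals.
    expected = (
        test_pos * crit_pos + (n - test_pos) * (n - crit_pos)
    ) / (n * n)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Assumed reconstruction (not reported counts): 7 true positives,
# 1 false positive, 0 false negatives, 21 true negatives.
kappa = cohens_kappa(both_pos=7, test_only=1, criterion_only=0, both_neg=21)
print(round(kappa, 2))
```

Under that assumed table, kappa works out to 294/323, or about 0.91, which matches the reported coefficient. A single reclassified patient would shift the value noticeably at this sample size.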
A stepwise multiple regression analysis was performed to determine the most highly discriminating items using the psychiatrist's ratings of depression severity as the dependent variable for the 16 items. The most economical solution was a linear combination of three items: ‘stewing over things’ (brooding), ‘feeling guilt about things in your life’ (guilt) and ‘feeling more distant from other people’ (social withdrawal), with the three accounting for 80.8% of variance in psychiatrist-rated depression severity.
Discussion
The distorting impact of medical illness itself on assessing depression in the medically ill is well recognized. While the HDS measure has long been used for such a purpose, we earlier noted reference to its low validity. Beck's BDI-PC comprises a set of non-somatic items that show some correspondence to DSM-IV-defined major depression, and perhaps as a consequence, has been shown to be associated more strongly than the HDS with a diagnosis of major depression [5]. The present report describes a relatively uncommon strategy, that of incorporating a psychiatrist's judgement as the principal criterion measure as against merely using formalized diagnostic criteria for depression (albeit ensuring the correspondence of the latter with the psychiatrist's judgement).
As noted, we hypothesized that the distorting effects of medical illness might be circumvented by use of items assessing cognitive manifestations of depression. Such an approach nevertheless requires establishing that any contributing ‘cognitive’ item has some specificity to depression and not to illness and its impact. To that end, we incorporated items that have been described [10] as central to the latter, such as a sense of disconnection and a loss of mastery. We had anticipated that the final measure might allow a ‘trunk and branch’ model, whereby all medically ill subjects would affirm truncal impact of medical illness items, and then affirm depression branch items only if they were depressed. However, we failed to find similar affirmation rates of truncal items by the depressed and non-depressed subjects. Instead, the depressed subjects affirmed all such items more strongly, indicating either that such items actually did have a depressive construct component or that depressed subjects were more likely to affirm most items irrespective of their construct domain.
We therefore focused on identifying a set of items that best differentiated depressed from non-depressed medically ill patients, using comparison measures and psychiatrist judgements to refine the lengthy item set. The final set is somewhat long (i.e. 16 items) but will be reduced after application studies. Against design objectives, the set does not appear limited to ‘pure’ depressive items such as depressed mood components. In addition, there is a brooding item and at least two items that have a distinct ‘anxiety’ connotation (i.e. ‘fearfulness’ and ‘insecurity’). Other Australian researchers have identified [14] a set of relevant constructs in patients with medical illness (i.e. demoralization, anhedonia, autonomic anxiety, somatic symptoms and grief), with our item set seemingly relating most closely to their identified demoralization items, but also including ‘brooding’ in addition to clearly depressive constructs.
We were particularly struck by the failure of some items (e.g. assessing loss of concentration, non-reactive mood, sleeping problems and anticipatory anhedonia) to differentiate depressed and non-depressed subjects, especially when some are also in the criteria set for DSM-IV major depression. Such items may, in the medically ill, then be generated as much by the medical illness as by any depression, as occurs with the more ‘somatic’ depression items. As five of the seven items in the HDS (as against only one of the BDI-PC items) assess anhedonia, this may explain why the HDS does not appear so discriminating of depression in the medically ill. It is also of interest that we deleted several ‘depressed mood’ synonym items from the set as they were not as discriminating as the direct item ‘Are you feeling depressed?’
The properties of the 16-item set are encouraging. We demonstrated high internal consistency, with a coefficient alpha of 0.95, compared to 0.86 for the BDI-PC. Convergent validity analyses established that our measure correlated highly with both the BDI-PC (r = 0.80) and HDS (r = 0.72). Respecting published cut-off scores for those latter two measures, and determining one for our measure, we established that our measure was distinctly superior to the HDS and somewhat superior to the BDI-PC when compared with a psychiatrist's independent clinical judgement of depression severity and case status. As noted, it is rare for screening measures to be tested against a psychiatrist's judgement, but we argue for the importance of this strategy as it has the capacity to reflect a pluralistic assessment process based on clinical judgement. Clearly, it would be unwise to rely on a psychiatrist's judgement alone due to the risk of potential rater bias, and thus we had the psychiatrist operate to the structured guidelines developed by Cavanaugh [11] and, additionally, incorporated the CIDI.
Our psychiatrist identified four cases of MDE compared with six by the CIDI, presumably due to the CIDI probing the vegetative symptoms of MDE and so risking overdiagnosis of mood disorders in the medically ill. The CIDI does not generate a diagnosis if patients attribute their sadness and anhedonia as being always due to the effects of medical illness or medication. However, many of our patients found it very difficult to judge whether their medical illness was ‘always’ the cause of any depression and were confused as to whether ‘cause’ implied a direct physical or a psychological impact mechanism. Such nuances argue for clinician judgements as a criterion measure of any case-establishing measure under development.
Our study had a relatively small sample size, and certainly the percentage assessed by the psychiatrist was quite small. We ceased recruitment, however, when we judged that the sample size was adequate for principal analyses, but also as we became increasingly aware that the process was placing excessive demands on many subjects. Again, using certain comparison measures both to define discriminating items and then to examine the level of discrimination has a level of circularity, although we sought to minimize that issue by examining for consistency of discrimination across multiple approaches.
Any measure of depression in the medically ill clearly has to be acceptable, with key components including brevity and minimal intrusiveness. In relation to the latter, we deleted certain items, and specifically one assessing suicidal ideation, as several subjects reacted strongly on being read that item. It became clear that many did not identify with a state defined in part by suicidal ideation or other markers of a graver condition. Importantly, we established that all those who did affirm some level of ‘suicidality’ were highly likely to affirm less ‘intrusive’ items and to rate above our cut-off score, so that its inclusion appeared quite unnecessary, although it is included in the BDI-PC.
We conclude that screening and identification of depression in the medically ill may be best pursued by a focus on cognitive nuances of depression, with our current measure (and multivariate analyses) identifying items that show promise. While our 16-item set appears to have quite impressive properties, it will require replication. We are currently examining the utility of the items derived in this study in an independent sample, with preliminary analyses again suggesting distinct superiority to the HDS and superior sensitivity to the BDI-PC, and with an additional objective of determining a brief screening version.
Acknowledgements
We thank Kerrie Eyers for manuscript assistance, Aaron Beck and the following medical consultants at Prince of Wales Hospital (Roger Allan, Robert Lindeman, David Mackenzie, Vic Duncombe, Jim Bertouch, Bruce Pussell, Denize Lonergan, Chris Milross and Stephen Colagiuri) for study assistance, as well as the NHMRC (Program Grant 993208) and the NSW Department of Health (Infrastructure Grant) for funding assistance.
