Abstract
Psychiatric morbidity commonly accompanies early stage breast cancer; as many as a third of patients may suffer a depressive or anxiety reaction [1, 2]. Prompt treatment is desirable to forestall or ameliorate adverse effects, and well-established interventions, including medication and psychotherapy, are available [3]. To be treated effectively, the disorder must first be accurately identified. One possibility is to screen as part of a routine assessment. After completing a systematic review of the literature, Gilbody et al. [4] concluded that screening improves the recognition of emotional disorders but does not necessarily translate into improved management. They asserted that clinicians in routine care are most interested in positive predictive values (PPV), which indicate the proportion of those scoring above a cut-off who do have the disorder, and that higher PPVs might reduce this discrepancy. We argue that other indices, such as negative predictive value (NPV), sensitivity and specificity, are also important in evaluating the efficiency of a screening tool.
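For reference, the four indices named above have the following standard definitions. The symbols TP, FP, TN and FN (true positives, false positives, true negatives and false negatives in the cross-classification of screening result against diagnosis) are notation introduced here for illustration and do not appear in the cited sources.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{PPV} = \frac{TP}{TP + FP}, \qquad
\text{NPV} = \frac{TN}{TN + FN}
\]
```

Sensitivity and specificity are conditioned on true diagnostic status, whereas PPV and NPV are conditioned on the screening result, which is why the latter two depend on how common the disorder is in the population screened.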
The Hospital Anxiety and Depression Scale (HADS) [5] has been widely employed for screening medical patients. It was designed with medical patients in mind, by avoiding any reference to somatic symptoms. Responses are framed in terms of the preceding week, which allows for possible changes over time. It is brief (14 items), taking minutes to complete. The authors recommend cut-off scores of 11, for both the Anxiety (A) and Depression (D) subscales, to indicate probable caseness, with scores between 8 and 10 suggestive of possible diagnoses [6].
A self-report questionnaire's psychometric properties are not fixed entities and must be reassessed with different clinical populations [7]. Factors such as the base-rate of a disorder can markedly affect the instrument's performance [6]. Previous studies of the HADS as a screening tool for patients with cancer have yielded varied results. For instance, Ibbotson et al. [8] studied patients with various cancers and found the HADS useful for those who were disease free or stable, but less accurate for those with progressive disease. The authors established an optimal cut-off score of 14 for the combined scale (indicating a level of general distress) but did not explore the subscales. A similar study [9] derived an optimal cut-off of 19 for the composite scale, which was recommended for use. Ramirez et al. [10] found that adequate accuracy was only achieved with cut-off scores of 5 and 6 for the D and A scales, respectively. Recently, a study of early stage breast cancer patients compared the HADS with Present State Examination-derived diagnoses [11]. At the recommended cut-off, sensitivity of the A scale was .24 and of the D scale was .14. When the cut-off was reduced to 7, sensitivity improved for the A scale but remained low for the D scale. The authors observed that, with the population in question, the HADS might fail to detect clinically important changes in psychiatric outcomes of clinical trials. Many have commented that the HADS is designed to identify major, but not minor, depressive disorders, which may contribute to its poorer performance in recent studies [6]. Altogether, these results suggest that further investigation of the HADS with specific patient populations is warranted.
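The dependence of a screening tool's performance on the base-rate, noted at the start of the preceding paragraph, can be made concrete with a minimal sketch based on Bayes' theorem. The function name `predictive_values` and the sensitivity, specificity and prevalence figures below are illustrative assumptions, not values taken from any of the studies cited.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by Bayes' theorem for a given base-rate."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative values only (assumed for this sketch, not study data).
SENSITIVITY, SPECIFICITY = 0.80, 0.90

for prevalence in (0.05, 0.10, 0.20, 0.40):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"base-rate {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as the disorder becomes more common, so predictive values established in one clinical population need not carry over to another.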
Most of the early, and more supportive, validation studies correlated the HADS with other self-report scales or with clinicians’ ratings of severity [5, 12, 13]. Comparisons with operationally defined diagnoses using a structured psychiatric interview, although preferable [7], were less common. In this investigation, the Monash Interview for Liaison Psychiatry (MILP) was selected as the structured interview, as it offers distinct advantages: it was developed specifically for medical patients, was carefully validated, and produces diagnoses in terms of standard criteria, including DSM-IV [14]. It was used to establish the psychometric properties of the HADS for a sample of women diagnosed with early stage breast cancer.
Method
Sample
The sample comprised 303 women, with either stage II (TNM classification T1 N1 M0; T2 N1 M0; T2 N0 M0) breast cancer, or stage I (T1 N0 M0) breast cancer and a poor prognostic factor (hormone receptor status or histology grade). They were taking part in a randomized controlled trial of cognitive-existential group therapy, but only pre-intervention data are considered here. Potential participants were receiving chemotherapy or radiotherapy at day centres of participating hospitals and they were approached to take part after their clinicians agreed and confirmed eligibility. Exclusion criteria included age over 65, geographical inaccessibility, prior diagnosis of cancer, minimal command of English, dementia, active psychosis or intellectual disability. After obtaining informed consent, we conducted a structured psychiatric interview and administered the HADS.
Instruments
This study used two instruments:
1. The Monash Interview for Liaison Psychiatry (MILP) is designed for medical patients and it provides standard DSM-IV diagnoses. The authors have reported that its psychometric properties include good interrater reliability and procedural validity [14].
2. The HADS consists of two subscales, Depression (D) and Anxiety (A), each of seven items. It is reported to have good reliability and validity by its developers [5].
Results
The 303 participants represented 62% of those eligible to take part in the trial. Reasons given by non-participants included: ‘too busy’ (54; 29%), ‘coping satisfactorily’ (47; 25%), ‘not a “group” person’ (19; 10%) and ‘wanting to move on and leave the illness behind’ (17; 9%). An assortment of other reasons was given by the remaining quarter. Most respondents were middle-aged, employed full-time (49%) or on sick-leave (16%), in professional occupations (52%), married (76%), Australian-born (73%), and educated to senior high school level or beyond (70%). Interviews took place a median of 92 days (range, 18–352 days) after surgery for the breast cancer.
Clinical features
The characteristics of the breast cancer were as follows: Most (83%) were stage II, 45% were tumours of between 11 and 20 mm in size, 49% were histological grade III, and two-thirds were positive for hormone receptors. Eighty-seven women (29%) had three or more axillary nodes involved, indicating greater risk of recurrence. Conservative breast surgery was performed in 164 women (54%) and mastectomy in 139 (46%). Radiotherapy was received by 174 women (57%); 287 (95%) had chemotherapy. Hormone therapy was added later in 145 women (48%).
Psychiatric morbidity
Data obtained with the MILP have been published previously [2]. For this study, only diagnoses that the HADS might reasonably be expected to detect – namely depression or anxiety – were investigated. Other diagnoses, such as phobias or non-current anxiety disorders and substance dependence, were excluded. Altogether, 111 (36.6%) women met criteria for depression: 29 (9.6%) with major depressive disorder; the remaining 82 (27.1%) with minor depression, including 75 cases of adjustment disorder with depressed mood and seven with dysthymic disorder. There were 32 (10.6%) cases of anxiety, six of which were dual diagnoses of depression and anxiety. Of these anxiety cases, 18 (5.9%) were adjustment disorders with anxious mood, five (1.6%) were Generalized Anxiety Disorder, four (1.3%) were Panic Disorder, and five (1.6%) were Post Traumatic Stress Disorder. Diagnoses of either depression or anxiety totalled 137 (45.2%).
Diagnostic efficiency
Table 1 summarizes the statistics for the two scales. At the recommended [5] cut-off of 11, the D scale identified eight (3%) possible cases. When compared with the depression diagnoses from the MILP, six were true positives, two were false positives, 190 were true negatives and 105 cases were missed. Sensitivity was thus .05, specificity, .99, NPV, .64, and PPV, .75. These results suggested that the cut-off might be too high, so they were recalculated with lower cut-off scores (see Table 1). Sensitivity was .23 and PPV was .74 at the cut-off score of 8. Thus, almost one quarter of true cases were identified and three quarters of those above cut-off were true cases. Specificity was .95 and NPV, .68. Alternatively, the results possibly indicated that the diagnostic criteria used were too broad, so we recalculated the statistics considering only the major depressive disorders. A cut-off of 11 successfully identified two cases of major depression and misidentified a further six cases, yielding sensitivity of .07, specificity of .98, NPV of .91, and PPV of .25.
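As a check on the arithmetic, the indices for the D scale at a cut-off of 11 can be recomputed from the four counts reported above. This is a minimal sketch rather than the analysis code used in the study; the function name `diagnostic_indices` is an assumption introduced for illustration, and no guard against empty cells is included.

```python
def diagnostic_indices(tp, fp, tn, fn):
    """Compute the four indices from the cells of a 2x2 screening table."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases scoring at or above cut-off
        "specificity": tn / (tn + fp),  # non-cases scoring below cut-off
        "ppv": tp / (tp + fp),          # screen-positives who are true cases
        "npv": tn / (tn + fn),          # screen-negatives who are non-cases
    }


# Counts reported in the text for the D scale at a cut-off of 11.
d_scale_11 = diagnostic_indices(tp=6, fp=2, tn=190, fn=105)

for name, value in d_scale_11.items():
    print(f"{name}: {value:.2f}")
# sensitivity: 0.05
# specificity: 0.99
# ppv: 0.75
# npv: 0.64
```

The four counts sum to the 303 participants, and the 111 true cases (six detected, 105 missed) match the number of depression diagnoses reported under Psychiatric morbidity.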
Table 1. Sensitivity, specificity, negative predictive value, positive predictive value and proportion screening positive for the Hospital Anxiety and Depression Scale (HADS) Depression, Anxiety and Composite Scales with different cut-off scores, using DSM-IV diagnoses of depression and anxiety as the criteria
In order to examine the possibility that a smaller number of items might prove more efficient than the seven-item subscales, a stepwise discriminant function analysis was performed, with the items as predictors and diagnostic status as the criterion. At the univariate level, mean scores were significantly different on all seven items. However, correlations between the items made much of this information redundant. When entered into the analysis, two variables met criteria for selection and the resultant function was significant, F(2, 200) = 29.5, p < 0.001. The two items were: ‘I feel cheerful’ (reverse scored), standardized discriminant function coefficient of .78 and correlation with the function of .91; and ‘I feel as if I am slowed down’, coefficient of .44 and correlation of .67. Classification analysis (based on actual proportions) revealed that the function correctly identified 91% of true non-cases, 38% of true cases, and misidentified 9% as false positives and 62% as false negatives.
Including six comorbid diagnoses, 32 (10.6%) cases met the criteria for anxiety. A cut-off of 11 on the A scale identified six true cases, 26 false negatives, and 33 false positives (sensitivity, .19, specificity, .88, PPV, .15, NPV, .90). A cut-off of 8 correctly identified 11 (34%) of the true cases and generated 74 false positives. Despite this improvement, PPV of .13 indicated that only 13% of supposed hits were true cases. Sensitivity was .34, specificity, .73, and NPV, .90.
One hundred and thirty-seven (45%) women had either depression or anxiety. Composite scores were calculated by combining D and A scores (see Table 1). With a cut-off of 22, 12 true positives and one false positive were identified. False negatives totalled 125 and true negatives, 165 (sensitivity, .09, specificity, .99, PPV, .92, NPV, .57). Decreasing the cut-off to 19 raised the number screening positive: 24 true positives scored 19 or more, as did two false positives; 113 true cases were missed (sensitivity, .17, specificity, .98, PPV, .92, NPV, .59).
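The recalculation at successive cut-offs described in this section can be sketched as follows. The scores and diagnoses below are hypothetical toy data invented for illustration; they are not the study data, and the function name `screen_at_cutoff` is likewise an assumption. A score at or above the cut-off is treated as screening positive, in line with the wording "scored 19 or more" above.

```python
def screen_at_cutoff(scores, is_case, cutoff):
    """Tally (tp, fp, tn, fn), counting score >= cutoff as screening positive."""
    tp = fp = tn = fn = 0
    for score, case in zip(scores, is_case):
        positive = score >= cutoff
        if positive and case:
            tp += 1
        elif positive:
            fp += 1
        elif case:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical toy data: ten subscale scores and interview-based case status.
toy_scores = [3, 5, 7, 8, 9, 11, 12, 4, 6, 10]
toy_is_case = [False, False, True, True, False, True, True, False, True, False]

for cutoff in (11, 8):
    tp, fp, tn, fn = screen_at_cutoff(toy_scores, toy_is_case, cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"cut-off {cutoff}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
```

In this toy example, lowering the cut-off from 11 to 8 detects one more true case but also produces two false positives, the same kind of trade-off between sensitivity and specificity that is reported for the subscales.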
Discussion
Our results do not support use of the HADS as a screening tool for depression and anxiety in women with early stage breast cancer. As in other studies with this population [10, 11], diagnostic efficiency of the two subscales was low at the recommended cut-off scores, and when cut-off scores were reduced, increases in sensitivity invariably incurred a trade-off in specificity. For example, with the D scale, although the PPV was good at the original cut-off score, many true cases were missed, resulting in a low NPV. Thus deciding not to intervene on the basis of a low HADS score could mean that a substantial proportion of those requiring appropriate treatment would not receive it. Combining the two subscales did not improve matters. At the recommended cut-off, diagnostic efficiency was again low. Reducing the threshold improved case identification but increased false positives.
The findings highlight the difficulties of designing a generic psychiatric screening instrument for medical populations. One problem is the complexity of psychiatric categories, which are mostly syndromal. Their range of symptoms is broad and specific items may be neither necessary nor sufficient for diagnosis. One group of patients may be depressed because they suffer intractable pain and feel helpless about reducing it. Another group, such as our sample, may be preoccupied with existential concerns about life and death [15]. Our aim would be to identify patients with adjustment disorders because these problems contribute to a poorer quality of life for women with breast cancer. Timely therapeutic intervention could help women cope more effectively. A screening tool such as the HADS can identify depression marked by anhedonia reasonably well, but other depressive conditions, such as those in the present study, may be overlooked. Careful screening has to capture an array of symptoms in order to identify psychiatric morbidity associated with physical illnesses.
Typical methods used to develop these instruments militate against this aim. Often, factor analysis is used to identify a homogeneous group of items that will reliably identify an underlying construct, such as depression. This helps create a scale that has high internal consistency, reflected in substantial coefficient alphas. For example, the HADS D subscale comprises mostly items related to anhedonia, because the authors [5] wished to identify ‘the form of depression which responds well to antidepressant drug treatment’ (p.362). As a consequence, the HADS apparently does not identify depressions not characterized by anhedonia.
This problem arises because the many different symptoms, while they are indicators of the latent construct, should not be expected to covary [16]. A different strategy, which recognizes this distinction, is to select more heterogeneous items and create a brief instrument that can be followed by a more substantial assessment, if indicated. Indeed, it has been suggested that a single question such as ‘Are you depressed?’ may be all that is needed [17]. In this study, reversed scores on the item ‘I feel cheerful’ made the biggest contribution to, and correlated strongly with, the two-item discriminant function. The function correctly identified more true cases than the seven-item scale, although it also increased the number of false positives. This result is consistent with previous research comparing combinations of selected items with total score on a screening instrument [18].
This study has several strengths. The sample size is large and the diagnoses were carefully derived on the basis of structured interviews. The frequency of psychiatric disorders resembled that reported in other studies of breast cancer [19–21]. While it might be argued that the results are attributable to an unsuitable ‘gold standard’, this objection is difficult to sustain, given that the MILP is a structured method of making DSM diagnoses and has been well validated. The results of the study are comparable with other recent findings in relation to the HADS with this population [11].
Examination of the items’ content supports the interpretation that the HADS has limited construct validity. Its depression items do emphasize anhedonia [5, 22]. DSM criteria for depression include other symptoms such as indecisiveness, guilt, helplessness, worthlessness, and suicidality. These may also indicate depression in medically ill patients. The women in this study were more likely to report feeling hopeless, discouraged, and despairing than to report problems of anhedonia. Other depression scales have superior construct validity in this respect [23, 24]. They provide comprehensive coverage of the broad syndrome and allow for better estimations of severity. The disadvantage of these measures is the time taken to complete them, although short versions of some scales are available [25]. One remedy we have outlined may be to develop shorter instruments capturing the particular clinical features of a specific patient group.
Conclusions
Given the current usage of the HADS in oncology research, this report provides important findings concerning diagnostic concordance. Based on this study's results, and those of others, we do not recommend adoption of the HADS as a screening instrument for this population. Future research might focus on developing more efficient screening methods.
Acknowledgements
This study was supported by a grant from the Bethlehem Griffiths Research Foundation; the larger background study was funded by RADGAC and NHMRC. The authors would like to thank two anonymous reviewers for their constructive comments.
