Abstract
Breast cancer patients often experience psychological distress severe enough to satisfy diagnostic criteria for psychiatric disorders [1]. Minor depression, including adjustment disorder with depressed mood, is particularly common. Although interventions that can alleviate unnecessary morbidity [2] and improve medical outcomes [3] are available, depression can easily escape detection [4]. Routine screening with self-report measures helps ensure that therapeutic interventions are offered to those in need [5]. While screening to detect depression in medical patients has a controversial history [6], [7], raising some of the concerns addressed below, the approach has gained increasing support. However, selecting a screening measure is not a simple task: established depression scales have been evaluated, but the results are mixed.
Several factors complicate the task. Some reflect theoretical controversies relating to depression; for example, researchers have employed different definitions of depression for developing and for validating scales. Adopting standardized criteria, such as DSM-IV, as the reference standard should assist comparisons between screening measures. Other factors relate to characteristics of medically ill patients. First, early and advanced stages of breast cancer are associated with different physical symptoms and treatment side-effects, which affect psychological responses [5], [8]; hence, homogeneous samples are essential. Second, measures rarely allow for the overlap between psychiatric symptoms and physical symptoms or treatment side-effects. For example, because appetite loss is common in both cancer and depression, many non-depressed patients endorse appetite disturbance items on standard depression inventories. Scales designed for use with medically ill patients should therefore be deployed [9]. Finally, methodological choices, such as concurrent validation against other self-report measures, may affect estimates of diagnostic accuracy. Researchers therefore need to set a reference standard for comparisons, first estimating the prevalence of operationally defined psychiatric disorders by administering a structured psychiatric interview [10]. Screening measures can then be assessed against this standard.
These factors were addressed in an investigation of screening measures for depression in women with metastatic breast cancer. We compared two self-report scales devised for the medically ill, examining their concordance with diagnoses derived from a psychiatric interview designed for medical patients [11]. The first, the Hospital Anxiety and Depression Scale (HADS) [9], was created by excluding somatic symptoms and focusing on psychological features, particularly anhedonia. Early validation studies, which invariably correlated the HADS with other scales or with clinicians' ratings, were favourable [9], [12]. Investigations using structured psychiatric interviews have been less supportive, particularly of the depression scale [13–15]. Thus, its validity in oncology remains controversial.
The second was the 13-item short form of the Beck Depression Inventory (BDI-SF) [16], [17]. Although developed for use with the medically ill, it includes three ‘somatic’ items. Its authors provided cut-off scores, yet others have calculated different figures, emphasizing the need to establish local norms. Because its value as a screening tool in oncology remains unclear, we compared its performance with that of the depression scale of the HADS (HADS-D) in women with metastatic breast cancer.
Method
Sample
The sample of 227 women with stage IV breast cancer (TNM classification: any T, any N, M1) took part in a randomised controlled trial of group therapy; only pretreatment data are considered here. Women were approached to participate after their clinicians confirmed their eligibility. Exclusion criteria included age over 70, geographical inaccessibility, prior diagnosis of another cancer (except basal cell skin cancer), minimal command of spoken English, dementia, active psychosis and intellectual disability. After obtaining informed consent, we conducted a structured psychiatric interview and then administered the self-report questionnaires.
Instruments
The Monash Interview for Liaison Psychiatry (MILP), designed for medical patients, provides DSM-IV diagnoses and has very good interrater reliability and procedural validity [11]. The HADS consists of two seven-item scales, Depression (D) and Anxiety (A), and is reported to have good reliability and validity [9]; only the HADS-D is considered here. Patients rate each item (0–3) with reference to symptoms experienced during the preceding week, and the ratings are summed. The recommended cut-off score for probable caseness is 11 on both scales. The BDI-SF has 13 items covering depressive symptoms. Its scores correlate 0.61 with clinicians' ratings of severity and 0.96 with the full-length BDI, indicating adequate reliability and validity [16]. Patients rate the severity of each symptom (0–3) over a seven-day period, and the ratings are summed. A score above 4 is considered indicative of depression.
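The summed scoring described above can be sketched as follows. This is a minimal illustration, not the scales' published scoring keys: the function and constant names are hypothetical, ratings are assumed to be already keyed so that higher values mean more symptoms, and a respondent is assumed to screen positive when the total is at or above the cut-off (so the BDI-SF criterion of "a score above 4" becomes a cut-off of 5).

```python
from typing import Sequence

# Hypothetical scoring helpers. Ratings are assumed to be pre-keyed (0-3,
# higher = more symptomatic); the published questionnaires assign scores to
# response options item by item, which this sketch does not model.
HADS_D_ITEMS, HADS_D_CUTOFF = 7, 11   # probable caseness at 11
BDI_SF_ITEMS, BDI_SF_CUTOFF = 13, 5   # "a score above 4"


def total_score(ratings: Sequence[int], n_items: int) -> int:
    """Sum the item ratings after checking the item count and rating range."""
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 3")
    return sum(ratings)


def screens_positive(score: int, cutoff: int) -> bool:
    """Assumed convention: a score at or above the cut-off is a positive screen."""
    return score >= cutoff


hads_d = total_score([2, 1, 2, 1, 2, 2, 1], HADS_D_ITEMS)
bdi_sf = total_score([1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0], BDI_SF_ITEMS)
print("HADS-D", hads_d, screens_positive(hads_d, HADS_D_CUTOFF))
print("BDI-SF", bdi_sf, screens_positive(bdi_sf, BDI_SF_CUTOFF))
```

Checking the item count and range up front keeps an incomplete questionnaire from being silently scored as a low total.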
Results
Participants represented 47% of the 485 women eligible for the therapy trial. Reasons given by non-participants (some gave more than one) were: ‘too busy’ (61; 27%); ‘coping satisfactorily’ (41; 18%); ‘health and the demands of treatment’ (46; 20%); ‘practical issues such as child care and transport’ (35; 16%); ‘not being a “group” person’ (20; 9%); and ‘wanting to get on with life’ (22; 10%). The remaining 55 (24%) gave assorted other reasons. Respondents were middle-aged (mean = 52 years, SD = 9), married or living with a partner (161, 71%), Australian-born (158, 70%), and educated to senior high school or beyond (143, 63%). One-third were in paid employment (76, 34%), one-quarter disabled or ill (56, 25%), and one-quarter retired (55, 24%).
Clinical features
Over half the patients had stage II breast cancer at initial diagnosis, while 37 (16%) already had metastatic disease. In relation to their metastatic disease, 124 (54.6%) of the women had visceral metastases and 41 (18.1%) had three or more metastatic sites. The most common sites were bone (156, 68.7%), lung (73, 32.2%) and liver (71, 31.3%). Of the sample, 140 (61.7%) received chemotherapy for their metastatic disease, while 149 (65.6%) received hormone therapy, 118 (52.0%) received radiotherapy, and 67 (29.5%) received bisphosphonates for their bone metastases.
Psychiatric morbidity
Although the MILP provided comprehensive diagnostic assessments, only depressive-type diagnoses that the self-report scales might reasonably be expected to detect were considered. Other conditions, such as phobias and substance dependence, were excluded. Altogether, 74 (32.6%) women met criteria for depression: 16 (7.0%) had a major depressive disorder and 58 (25.5%) minor depression, including 55 with adjustment disorder with depressed mood and three with dysthymic disorder.
Performance of the scales
Table 1 summarizes the performance indices of the BDI-SF and the HADS-D against DSM-IV diagnoses of major and minor depression. With a cut-off score of 5, the BDI-SF identified 54 of 74 true positive and 114 of 153 true negative cases, for an overall agreement of 0.74; corrected for chance agreement, this yielded a kappa coefficient of 0.45. There were 39 false positives and 20 false negatives. These figures yielded a sensitivity of 0.73, specificity of 0.74, negative predictive value (NPV) of 0.85 and positive predictive value (PPV) of 0.58. Thus, almost three-quarters of the DSM-IV depression cases were correctly identified, and fewer than half of those with high scores were false alarms. Raising the cut-off to 6 reduced sensitivity to 0.65, with 48 true positives and 26 false negatives. Although the higher cut-off identified fewer true positives, it also reduced the number of false positives to 25, so PPV improved to 0.66. Lowering the cut-off to 4 improved sensitivity (0.84), with 62 of 74 true positives identified. However, specificity declined to 0.63, with 56 false positives. PPV was 0.52, since just over half of those screening positive were true positives. Overall agreement was 0.70; adjusted for chance agreement, kappa was 0.41.
Table 1. Overall agreement, kappa coefficient, sensitivity, specificity, positive predictive value and negative predictive value of the Hospital Anxiety and Depression Scale-Depression and the Beck Depression Inventory Short Form at different cut-off scores, with DSM-IV-defined diagnoses of major and minor depression as the criterion
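Each index reported for the BDI-SF follows from the four cells of a 2 × 2 table of screening result against interview diagnosis. The sketch below recomputes them for a cut-off of 5; the class and function names are illustrative, and kappa is computed as Cohen's kappa, with chance agreement taken from the marginal totals. The results agree with the reported values up to rounding (specificity, 114/153, is 0.745).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Screening result against interview diagnosis (illustrative container)."""
    tp: int  # screen positive, interview case
    fp: int  # screen positive, non-case
    fn: int  # screen negative, interview case
    tn: int  # screen negative, non-case


def accuracy_indices(t: TwoByTwo) -> dict:
    n = t.tp + t.fp + t.fn + t.tn
    screen_pos, screen_neg = t.tp + t.fp, t.fn + t.tn
    cases, noncases = t.tp + t.fn, t.fp + t.tn
    observed = (t.tp + t.tn) / n
    # Chance agreement: product of the marginal proportions, summed over both categories
    expected = (screen_pos * cases + screen_neg * noncases) / n**2
    return {
        "sensitivity": t.tp / cases,
        "specificity": t.tn / noncases,
        "ppv": t.tp / screen_pos,
        "npv": t.tn / screen_neg,
        "agreement": observed,
        "kappa": (observed - expected) / (1 - expected),  # Cohen's kappa
    }


bdi_sf_cut5 = TwoByTwo(tp=54, fp=39, fn=20, tn=114)
for name, value in accuracy_indices(bdi_sf_cut5).items():
    print(f"{name:>11}: {value:.3f}")
```

The same function applies unchanged to any other cut-off or scale once its four cell counts are known.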
At the recommended cut-off of 11, 16 women had high scores on the HADS-D. Compared with the MILP depression diagnoses, 12 of the 74 cases were true positives, four were false positives, 149 of the 153 non-cases were true negatives and 62 were false negatives. Sensitivity was thus 0.16; specificity, 0.97; NPV, 0.71; and PPV, 0.75. Overall agreement was 0.71 and kappa, 0.17. These indices were recalculated for other cut-off scores (see Table 1). With the cut-off set at 7, sensitivity was 0.50 and PPV was 0.67: half of the true cases were identified, and two-thirds of those above the cut-off were true cases. With 135 true negatives and 18 false positives, specificity was 0.88; with 37 false negatives, NPV was 0.78. Overall agreement was 0.76 and kappa, 0.41.
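The gap between overall agreement (0.71) and kappa (0.17) at the HADS-D cut-off of 11 arises because nearly everyone screened negative, so agreement expected by chance is already high. Working through Cohen's kappa with the counts reported above (N = 227) shows this; here $p_o$ is observed agreement and $p_e$ is chance agreement from the marginal totals.

```latex
\begin{aligned}
p_o &= \frac{TP + TN}{N} = \frac{12 + 149}{227} \approx 0.709 \\
p_e &= \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{N^2}
     = \frac{16 \times 74 + 211 \times 153}{227^2}
     = \frac{33\,467}{51\,529} \approx 0.649 \\
\kappa &= \frac{p_o - p_e}{1 - p_e} \approx \frac{0.709 - 0.649}{1 - 0.649} \approx 0.17
\end{aligned}
```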
Figure 1 shows the receiver operating characteristic (ROC) curves across all possible cut-off scores for the two scales, illustrating the trade-off between identifying true cases and generating false positives. The better a measure performs, the closer its curve approaches the top left-hand corner; the total area under the curve (AUC) thus represents the global diagnostic accuracy of the questionnaire [18]. The BDI-SF (AUC = 0.82) performed better overall than the HADS-D (AUC = 0.78). The ROC curve also allows simple calculation of the likelihood ratio for a positive test result [19]. With a cut-off of 4 on the BDI-SF, a little over half the sample screened positive, and a high score was more than twice as likely in a patient who was truly depressed as in one who was not. The ratio rises to almost three with a cut-off of 5. With a HADS-D cut-off of 11, a high score was six times as likely in a truly depressed patient as in a non-depressed one, although only 7% of the sample screened positive.

Fig. 1. Receiver operating characteristic curves for the Hospital Anxiety and Depression Scale-Depression and Beck Depression Inventory Short Form scales, with DSM-IV-defined diagnoses of major and minor depression as the criterion.
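The likelihood ratio for a positive result is sensitivity divided by the false-positive rate (1 − specificity), which is the slope of the line from the origin to a cut-off's point on the ROC curve. The sketch below recomputes it from the 2 × 2 counts reported for DSM-IV major or minor depression (74 cases, 153 non-cases); the dictionary labels are illustrative, and scores at or above the cut-off are assumed to screen positive.

```python
# (true positives, false positives, false negatives, true negatives),
# taken from the counts reported for DSM-IV major or minor depression
COUNTS = {
    "BDI-SF cut-off 4": (62, 56, 12, 97),
    "BDI-SF cut-off 5": (54, 39, 20, 114),
    "HADS-D cut-off 11": (12, 4, 62, 149),
}


def positive_likelihood_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity / false_positive_rate


for label, (tp, fp, fn, tn) in COUNTS.items():
    screened_positive = (tp + fp) / (tp + fp + fn + tn)
    print(f"{label}: LR+ = {positive_likelihood_ratio(tp, fp, fn, tn):.2f}, "
          f"screened positive = {screened_positive:.0%}")
```

Run as written, this gives ratios of about 2.3, 2.9 and 6.2, in line with the "more than twice", "almost three" and "six times" figures in the text, alongside the share of the sample screening positive at each cut-off.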
Because the large number of false negatives on the HADS-D might indicate that the scale is insensitive to minor depressive disorders, the analyses were repeated for diagnoses of major depression only. With a cut-off score of 5, the BDI-SF had a sensitivity of 0.94, correctly identifying 15 of the 16 cases. There were 78 false positives, for a PPV of 0.16. Overall agreement was 0.65 and kappa, 0.18. Raising the cut-off to 8 reduced sensitivity to 0.62, with 10 true positives; PPV rose to 0.21, as there were 37 false positives. At this cut-off, overall agreement was 0.81 and kappa, 0.24 (see Table 2).
Table 2. Overall agreement, kappa coefficient, sensitivity, specificity, positive predictive value and negative predictive value of the Hospital Anxiety and Depression Scale-Depression and the Beck Depression Inventory Short Form at different cut-off scores, with DSM-IV-defined diagnoses of major depression as the criterion
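The low PPVs for major depression largely reflect its lower prevalence (16 of 227 women, against 74 of 227 for major or minor depression). By Bayes' theorem, PPV depends on sensitivity ($Se$), specificity ($Sp$) and prevalence ($P$). For the BDI-SF at a cut-off of 5, the same 93 women screened positive in both analyses, yet the PPV falls from 0.58 to 0.16:

```latex
\begin{aligned}
\mathrm{PPV} &= \frac{Se \cdot P}{Se \cdot P + (1 - Sp)(1 - P)} \\[4pt]
\text{major or minor depression:}\quad
\mathrm{PPV} &= \frac{\tfrac{54}{74}\cdot\tfrac{74}{227}}
  {\tfrac{54}{74}\cdot\tfrac{74}{227} + \tfrac{39}{153}\cdot\tfrac{153}{227}}
  = \frac{54}{54 + 39} \approx 0.58 \\[4pt]
\text{major depression only:}\quad
\mathrm{PPV} &= \frac{\tfrac{15}{16}\cdot\tfrac{16}{227}}
  {\tfrac{15}{16}\cdot\tfrac{16}{227} + \tfrac{78}{211}\cdot\tfrac{211}{227}}
  = \frac{15}{15 + 78} \approx 0.16
\end{aligned}
```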
For the HADS-D, a cut-off score of 11 yielded six of 16 true positives and 10 false positives, for a sensitivity of 0.38 and a PPV of 0.38, with an overall agreement of 0.91 and kappa of 0.33. Lowering the cut-off to 7 increased sensitivity to 0.81, with 13 cases correctly identified, together with 42 false positives, yielding a PPV of 0.24. Overall agreement was 0.80 and kappa, 0.29.
The ROC curves are represented in Fig. 2. Total area under the curve for the HADS-D was 0.84 and for the BDI-SF, 0.86.

Fig. 2. Receiver operating characteristic curves for the Hospital Anxiety and Depression Scale-Depression and Beck Depression Inventory Short Form scales, with DSM-IV-defined diagnosis of major depression as the criterion.
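The AUC has a convenient interpretation: it equals the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The sketch below checks this equivalence on a small set of hypothetical scores (not study data), computing the AUC both as the trapezoidal area under the empirical ROC curve and from all case/non-case pairs. The text does not state how the published AUCs were calculated, so this illustrates the concept rather than the authors' procedure.

```python
from itertools import product


def roc_points(case_scores, noncase_scores):
    """(false-positive rate, sensitivity) for every cut-off, treating score >= cut-off as positive."""
    cutoffs = sorted(set(case_scores) | set(noncase_scores), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        tpr = sum(s >= c for s in case_scores) / len(case_scores)
        fpr = sum(s >= c for s in noncase_scores) / len(noncase_scores)
        points.append((fpr, tpr))
    return points  # the lowest cut-off classes everyone positive, ending at (1, 1)


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(case_scores, noncase_scores):
    """Share of case/non-case pairs in which the case scores higher (ties count 0.5)."""
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c, n in product(case_scores, noncase_scores))
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical toy scores for illustration only
cases = [3, 5, 6, 8, 9]
noncases = [0, 1, 2, 3, 4, 5, 7]
print(round(auc_trapezoid(roc_points(cases, noncases)), 3),
      round(auc_pairwise(cases, noncases), 3))
```

Both calculations give 29/35 here; the pairwise form makes clear why an AUC of 0.5 corresponds to a scale that ranks cases no better than chance.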
Discussion
In this comparison of two self-report measures, the BDI-SF performed better than the HADS-D in identifying cases of DSM-IV major or minor depression, especially at lower cut-off scores. As the ROC curves illustrate, screening requires a trade-off between identifying as many true positives as possible (maximizing sensitivity) and minimizing false positives [18]. The optimal cut-off for a screening tool will therefore depend on the requirements of the setting. Because psychosocial distress detracts from patients' quality of life and depression has many negative consequences, including poorer adherence and response to medical treatment [20–22], a lower cut-off that increases sensitivity may be preferable even though more non-cases screen positive. The BDI-SF with a cut-off of 4 proved the most efficient tool in this respect, with satisfactory sensitivity despite a relatively high rate of false positives. A score at or above this cut-off could therefore be taken as an indication of possible ‘caseness’, requiring further assessment.
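One way to make the requirements of a setting explicit is to fix a minimum acceptable sensitivity and choose the highest cut-off that achieves it, which keeps false positives as low as that target allows. The sketch below applies this rule to the BDI-SF counts reported for major or minor depression; the 0.80 target and the function name are assumptions for illustration, not criteria used in the study.

```python
# BDI-SF counts for DSM-IV major or minor depression, keyed by cut-off
# (score >= cut-off screens positive): (true positives, false positives)
BDI_SF = {4: (62, 56), 5: (54, 39), 6: (48, 25)}
CASES, NONCASES = 74, 153


def highest_cutoff_meeting(target_sensitivity, table, cases=CASES):
    """Highest cut-off whose sensitivity reaches the target, or None if none does."""
    eligible = [c for c, (tp, _) in table.items() if tp / cases >= target_sensitivity]
    return max(eligible) if eligible else None


target = 0.80  # assumed minimum sensitivity for this illustration
chosen = highest_cutoff_meeting(target, BDI_SF)
if chosen is not None:
    tp, fp = BDI_SF[chosen]
    print(f"cut-off {chosen}: sensitivity {tp / CASES:.2f}, "
          f"specificity {1 - fp / NONCASES:.2f}")
```

With a target of 0.80 the rule selects a cut-off of 4; a less demanding target such as 0.70 would select 5, trading sensitivity for fewer false positives.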
Sensitivity of the HADS-D was low at the recommended cut-off of 11. It increased as the cut-off was lowered, but at the cost of PPV, indicating that many scores above the lowered cut-offs were false positives. These relatively modest results underscore the importance of establishing local norms and cut-off scores for screening measures, and they are consistent with those of recent studies of women with early-stage breast cancer [13], [15]. Other work, for example by Razavi and colleagues [23], has provided stronger support for the HADS. However, major differences between the studies help explain the discrepancy, including Razavi and colleagues' use of the combined scales to screen for minor depression and their use of a French version of the measure.
Interestingly, the studies of women with early-stage breast cancer [13], [15] used operationally defined diagnostic criteria, such as DSM-IV diagnoses of depression, as reference standards. The HADS-D was not developed to identify these cases. It was created to pinpoint depressed patients who ‘may be helped by antidepressants’ [24, p. 393], as distinct from depressed medically ill patients likely to benefit from counselling and psychotherapy [25]. Snaith recommended using the HADS for this purpose. The scale assesses the pervasiveness of anhedonia, which its authors considered the hallmark of conditions requiring antidepressant medication. In the DSM-IV system, anhedonia is not necessary for a diagnosis of major depression, nor is it a criterion of adjustment disorder, so it is unsurprising that the HADS missed many cases of adjustment disorder, resulting in lower accuracy than the BDI-SF when depression was broadly defined. For major depression alone, the two scales performed comparably, as indicated by the areas under their respective ROC curves.
The performance of the HADS-D thus raises questions about the construct validity of the DSM-IV criteria for depression used as the reference standard in this study. Many theorists argue that depression is poorly defined in DSM-IV and that alternative systems could have superior construct validity [26], [27]. For example, a recent study of the latent structure of psychiatric symptoms in patients with medical illness distinguished, among other dimensions, anhedonia and demoralization [28]. Items similar to those in the HADS-D identify the former dimension, while the latter resembles adjustment disorder with depressed mood. Further work replicating this latent structure in women with breast cancer is needed to assess the construct validity of this taxonomic model.
Our findings illustrate the difficulty of selecting a suitable screening instrument for depression in women with breast cancer. Researchers and clinicians should bear in mind several considerations when evaluating research reports on this question. First, the definitions of depression adopted as the reference standard must be stated explicitly and their relative merits acknowledged. Second, scales must assess the adopted criteria appropriately. Questionnaire scoring methods usually treat all items as equivalent and require respondents to rate each one for frequency or severity, so a high score can be obtained with low ratings on many items or with more extreme ratings on a few. Operationally defined syndromes, by contrast, often require a minimum number of criteria from a longer list, so that any of several subsets is sufficient for diagnosis but no particular subset is always necessary. A DSM-IV diagnosis of major depression, for example, requires either depressed mood or loss of interest or pleasure. Ideally, a screening tool would include items for both these symptoms, and at least one would need to be endorsed before a person could screen positive. On most measures, however, a high score can be obtained without endorsing either item. Unless scales reflect current diagnostic categories in this way, they cannot distinguish symptom permutations that produce a high score without satisfying diagnostic criteria. The HADS concentrates on anhedonia and does not include other DSM-IV symptoms such as indecisiveness, guilt, helplessness, worthlessness and suicidality [1], [9]. Most scales share this tendency to focus on fewer symptoms and give them equal weight. One reason lies in the way scales are developed: factor analysis is commonly used to identify groups of items with good internal consistency, indicated by high coefficient alphas. This technique usually improves reliability but can compromise comparisons with reference standards such as DSM-IV diagnoses [29].
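The contrast between conventional summed scoring and a rule that mirrors the DSM-IV core-symptom requirement can be sketched as follows. The item names, the requirement that a core item be rated at least 1, and the cut-off of 5 are all hypothetical; neither the HADS-D nor the BDI-SF is scored this way.

```python
from typing import Mapping

# Hypothetical item names, not those of any published scale
CORE_ITEMS = ("depressed_mood", "loss_of_interest")


def summed_screen(ratings: Mapping[str, int], cutoff: int) -> bool:
    """Conventional scoring: all items weighted equally, total compared with the cut-off."""
    return sum(ratings.values()) >= cutoff


def gated_screen(ratings: Mapping[str, int], cutoff: int, min_core_rating: int = 1) -> bool:
    """Positive only if the total reaches the cut-off AND at least one core item is endorsed."""
    core_endorsed = any(ratings.get(item, 0) >= min_core_rating for item in CORE_ITEMS)
    return core_endorsed and summed_screen(ratings, cutoff)


# High total driven entirely by non-core items; neither core symptom endorsed
somatic_only = {"depressed_mood": 0, "loss_of_interest": 0,
                "fatigue": 3, "sleep": 3, "appetite": 2, "concentration": 1}
# Same total, but with low mood endorsed
with_low_mood = {**somatic_only, "depressed_mood": 1, "concentration": 0}

for label, ratings in (("somatic only", somatic_only), ("with low mood", with_low_mood)):
    print(label, summed_screen(ratings, cutoff=5), gated_screen(ratings, cutoff=5))
```

Under summed scoring both hypothetical respondents screen positive with a total of 9; under the gated rule only the one endorsing low mood does, which is the distinction that equally weighted totals cannot make.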
Although our study leaves unanswered certain questions concerning screening for depression in patients with advanced breast cancer, we can conclude that while both scales identify major depression with comparable accuracy, the evidence supports the BDI-SF as a more suitable measure than the HADS-D for identifying those women in this patient population who meet criteria for DSM-IV major or minor depression.
Acknowledgements
This study was supported by the Kathleen Cuningham Foundation (National Breast Cancer Foundation), the Cancer Council of Victoria and the National Health and Medical Research Council.
