Abstract
In a series of publications [1, 2] we have demonstrated the clinical importance of distinguishing melancholic from non-melancholic depression, and have also argued that the distinction is improved by assessing signs rather than symptoms. Specifically, our data analyses established that the so-called ‘endogeneity symptoms’ (e.g. anhedonia, non-reactive mood, early morning wakening) are relatively non-specific: while commonly reported by those with melancholic depression, they are also frequently reported by those with non-melancholic depressive disorders. Following developmental work, an 18-item CORE measure of observable psychomotor disturbance was developed, with a central ‘non-interactiveness’ scale (essentially measuring cognitive processing disturbance) and two motoric scales assessing ‘retardation’ and ‘agitation’, respectively. After it was established that those with psychotic and melancholic depression return significantly higher CORE scores than those with non-melancholic depression, the CORE measure was validated extensively as a measure of depressive subtype. Such work raises the possibility that assessing signs (rather than symptoms) has other advantages, whether for subtyping purposes or for assessing the severity of depression.
There are, however, several difficulties associated with any depressive subtyping system that relies on clinician-driven observation. First, valid assessment requires individuals to be assessed at or near episode nadir. Second, it requires clear-cut decision rules for deciding when a feature is present or absent and, if present, for rating its severity. Third, it requires clinicians to rate in a standardized way, which argues for training.
Alternative strategies include self-report ratings, but these have intrinsic response-bias limitations and are particularly problematic in depressed patients with cognitive problems, psychotic symptoms and other severe clinical features. Another option is to have observers judge important clinical features: for example, nurses could rate patients while they are in hospital and, for outpatients, family members could rate their depressed relative. To that end, we elected to develop such observational measures, and here describe the development of the Recent Appearance of Depression Assessed by Relatives (RADAR) measure. This paper also assesses the capacity of relatives to distinguish clinically diagnosed depressive subtypes and to rate the severity of the depressive condition.
Method
Development of the RADAR
The provisional measure effectively comprised two subscales, assessing 25 signs and six mood state items, respectively. A number of items were weighted to psychomotor disturbance (e.g. postural slumping, slowed movement, delay in responding verbally to questions, length of verbal responses) and were thus judged likely to discriminate psychotic and melancholic depression from non-melancholic depression. Others (i.e. appetite and sleep disturbance, dysphoria) were judged more likely to assess depression severity.
A definition was provided for each of the 25 signs. For example, the first item, ‘depressive appearance’, was defined as follows: ‘Appearance is the observable expression of a person's emotional state, as distinct from mood, which is the person's reported or experienced emotional state’. Relatives were asked to rate the individual ‘in recent times’. For the great majority of items, a 4-point rating scale was used: a score of ‘0’ denoted absence of the feature, and scores of ‘1’, ‘2’ or ‘3’ denoted its presence with increasing severity, frequency or persistence. Severity items generally required the rater to judge whether the feature was evident to a slight, moderate or severe degree; frequency items involved rating whether the feature was observed occasionally, most of the time or all of the time; and persistence items required rating whether the feature was present occasionally, frequently or persistently.
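The scoring rule for the standard items can be made concrete with a small lookup table. The sketch below is a hypothetical Python encoding, not the published RADAR form: the item-type keys, the anchor wording and the `score_item` helper are illustrative assumptions based only on the severity, frequency and persistence anchors described above.

```python
# Hypothetical encoding of the 4-point RADAR rating anchors described above.
# Item-type keys and anchor strings are illustrative, not the published form.
ANCHORS = {
    "severity": ["absent", "slight", "moderate", "severe"],
    "frequency": ["absent", "occasionally", "most of the time", "all of the time"],
    "persistence": ["absent", "occasionally", "frequently", "persistently"],
}


def score_item(item_type: str, rating: str) -> int:
    """Return 0 for an absent feature and 1-3 for increasing severity,
    frequency or persistence; raise ValueError for an unknown anchor."""
    anchors = ANCHORS[item_type]
    if rating not in anchors:
        raise ValueError(f"{rating!r} is not a valid {item_type} anchor")
    # The list position of the anchor is its score (0 = absent ... 3 = maximal).
    return anchors.index(rating)


print(score_item("severity", "moderate"))          # 2
print(score_item("frequency", "all of the time"))  # 3
print(score_item("persistence", "absent"))         # 0
```

Using the anchor's position in an ordered list keeps the three item types on the same 0–3 metric, which is what allows them to be summed into a scale score.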
For most items, scoring was theoretically predicated on those with psychotic and melancholic depression generating higher scores (i.e. 1–3), and non-melancholic subjects returning ‘0’ or, at most, ‘1’ scores. Some depressive constructs, however, are not readily captured along linear dimensions. For instance, ‘diurnal variation’ may be present or absent and, if present, may be reflected by mood and energy being worse at a particular time of day. Clinical observation suggests that those with psychotic depression are highly likely to have depressed mood across the whole day (i.e. without any diurnal variation), and we therefore scored this response pattern for diurnal variation of mood as ‘4’. In melancholic depression, the classic picture is of mood being worse in the morning and improving as the day progresses, and we therefore accorded this pattern a score of ‘3’. By contrast, in non-melancholic depression mood is more likely to worsen as the day goes on, and we therefore gave this pattern a score of ‘2’. For those whose mood varied across the day without any clear pattern, a score of ‘1’ was recorded, while a score of ‘0’ was recorded for those whose mood was not clearly depressed.
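Because diurnal variation is scored as an ordinal recoding of a categorical response pattern rather than as a linear rating, its rule can be sketched as an explicit mapping. The enumeration names and the `score_diurnal_variation` function below are hypothetical; only the score attached to each pattern is taken from the text.

```python
from enum import Enum


class DiurnalPattern(Enum):
    # Hypothetical labels for the five response patterns described in the text.
    NOT_DEPRESSED = "mood not clearly depressed"
    NO_CLEAR_PATTERN = "mood varies without a clear pattern"
    WORSE_LATER_IN_DAY = "mood worsens as the day goes on"
    WORSE_IN_MORNING = "mood worse in the morning, improving later"
    CONSTANT_ALL_DAY = "mood depressed across the whole day"


# Non-linear item: the highest scores go to the patterns hypothesized to be
# most typical of melancholic (3) and psychotic (4) depression.
DIURNAL_SCORES = {
    DiurnalPattern.NOT_DEPRESSED: 0,
    DiurnalPattern.NO_CLEAR_PATTERN: 1,
    DiurnalPattern.WORSE_LATER_IN_DAY: 2,  # typical of non-melancholic depression
    DiurnalPattern.WORSE_IN_MORNING: 3,    # classic melancholic pattern
    DiurnalPattern.CONSTANT_ALL_DAY: 4,    # hypothesized psychotic pattern
}


def score_diurnal_variation(pattern: DiurnalPattern) -> int:
    """Return the RADAR score (0-4) assigned to an observed diurnal pattern."""
    return DIURNAL_SCORES[pattern]


print(score_diurnal_variation(DiurnalPattern.WORSE_IN_MORNING))  # 3
```

An explicit table makes the theoretical ordering visible: the score reflects how typical a pattern is of the more severe subtypes, not how much the mood varies.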
In addition to these 25 signs, we asked raters to assess the individual's ‘depressed mood’ on six differing constructs (e.g. ‘depression’, ‘self-criticism’). We hypothesized that these mood items would discriminate depressive subtype less well than the observable signs.
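The Results report separate sign, mood and total scores. A minimal sketch of how such scores might be derived, under our assumption (not stated in the source) that each subscale score is the unweighted sum of its item ratings and that the total is the sum of the two subscales; the `radar_scores` function is hypothetical.

```python
N_SIGNS, N_MOOD = 25, 6


def radar_scores(signs: list[int], mood: list[int]) -> dict[str, int]:
    """Hypothetical RADAR scoring: each subscale is the unweighted sum of its
    item ratings, and the total is the sum of the two subscales. Validation
    only checks the overall 0-4 range (0-3 for most items, 0-4 for the
    diurnal variation item)."""
    if len(signs) != N_SIGNS or len(mood) != N_MOOD:
        raise ValueError(f"expected {N_SIGNS} sign and {N_MOOD} mood ratings")
    if any(not 0 <= r <= 4 for r in signs + mood):
        raise ValueError("item ratings must lie between 0 and 4")
    sign_score, mood_score = sum(signs), sum(mood)
    return {"signs": sign_score, "mood": mood_score, "total": sign_score + mood_score}


print(radar_scores([1] * 25, [2] * 6))
# {'signs': 25, 'mood': 12, 'total': 37}
```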
Instructions indicated that the measure should only be completed by someone who had been in close contact with the individual in recent times, and that the observer should rate the individual at his or her worst in the last few weeks. Raters were asked to judge only what they had observed, rather than offering speculation or interpretation, and it was emphasized that the measure rated features that vary across differing types of depression rather than depression severity itself. The aim was to reduce any tendency of raters to ‘rate up’ if they were concerned about the severity of their relative's condition.
The sample
Subjects attending our tertiary referral Mood Disorders Unit (MDU) were asked to complete a number of measures prior to assessment by a research psychiatrist. When a relative accompanied the patient, we asked the relative to complete the RADAR without showing it to, or discussing it with, the patient. Of 212 consecutive patients so assessed, 101 had a RADAR completed: 59 by a spouse or partner, 28 by a parent, six by a child, one by a sibling and the remaining seven by ‘close friends’.
Other data
As part of our general database, each patient completed a number of self-report measures assessing clinical features of anxiety and depression. Partway through consecutive recruitment to the study, we added two self-report measures, so that the later subset of patients also completed the Beck Depression Inventory (BDI) [3] and the DMI state depression measure [4]. A research psychologist interviewed the patient, obtaining additional information on clinical depressive features.
The research psychiatrist collected additional data on clinical depressive features, particularly those over-represented in psychotic and melancholic depression (e.g. delusions and hallucinations, overvalued ideas, guilt, anhedonia, non-reactive mood), and made a clinical diagnosis of depressive subtype: psychotic, melancholic or non-melancholic depression. A diagnosis of ‘psychotic depression’ required a primary depressive condition with ‘clear evidence of delusions and/or hallucinations’, while ‘melancholic depression’ required the absence of psychotic features and the patient admitting to ‘anhedonia and a distinctly non-reactive mood, and reporting a number of vegetative features (e.g. mood worse in the mornings, appetite and weight loss) and has evident psychomotor retardation or agitation’. DSM-IV decision rules were also applied to generate DSM-IV psychotic, melancholic and non-melancholic diagnoses.
In addition, the psychiatrist completed the 21-item Hamilton Depression Rating Scale (HDRS) [5], rated the patient's level of psychomotor disturbance using the CORE measure [1], completed the DSM-IV Global Assessment of Functioning (GAF) scale, and made an overall ‘clinical judgement’ (CJ) as to whether the patient was severely, moderately or slightly depressed, or not depressed at all. The psychiatrist also rated whether the patient was currently at or near the depression ‘nadir’.
Results
Sample descriptors
The 101 patients had a mean age of 38.6 (± 13.0) years and 60% were female. MDU clinical diagnosis classified 10 patients as having psychotic depression and 34 as having melancholic depression, with the remaining 57 allocated to the non-melancholic depression group. The mean RADAR mood scale score was 13.6 (± 4.3), the sign scale score 38.9 (± 13.3) and the total score 52.3 (± 16.5). The mean HDRS score was 16.6 (± 6.1) and the mean CORE score 6.1 (± 6.7). For the 51 and 39 patients who completed the BDI and DMI, respectively, mean scores were 29.0 (± 11.9) and 38.9 (± 17.2). Clinical judgement rated 22 patients as severely depressed, 57 as moderately depressed, 20 as slightly depressed and one as not (currently) depressed. Forty-five patients were rated by the psychiatrist as currently being at episode nadir.
Agreement between severity measures
The three psychiatrist-rated measures showed moderate levels of agreement. Specifically, the four-point rating of clinically judged depression severity correlated −0.66 with GAF scores and +0.47 with total HDRS scores, with the last two intercorrelating at −0.49 (all p < 0.001). The two self-report depression severity measures (BDI and DMI) were strongly associated with each other (r = 0.70, p < 0.001), and correlated modestly with Hamilton (r = 0.43, p < 0.01; and r = 0.49, p < 0.05) and CJ (r = 0.44 and 0.39, both p < 0.05) severity judgements.
RADAR scores (mood and sign scales) were minimally associated with scores on the other measures. Specifically, RADAR mood scores correlated 0.23 and 0.22 (both p < 0.05) with DMI and BDI self-report scores, respectively, and were unassociated with clinician-rated HDRS (0.06), CJ (0.17), CORE (0.07) and GAF (−0.16) scores. The RADAR sign score correlated 0.40 (p < 0.05) and 0.30 (p < 0.05) with DMI and BDI self-report scores, and 0.24 with HDRS, 0.28 with CJ, 0.23 with CORE and −0.23 with GAF scores (all p < 0.05). Thus, the RADAR scales could not be validated as acceptable measures of severity or of psychomotor disturbance in depressed individuals when compared against self-report and clinician-rated severity estimates.
Could RADAR scores differentiate depressive subtypes?
Table 1 reports mean RADAR item scores for those assigned as having psychotic, melancholic or non-melancholic depression. We found very few item score differences when non-melancholic and melancholic subjects were compared, or when a combined group of melancholic and psychotic depressed subjects was contrasted with the residual non-melancholic group. Indeed, after applying a Bonferroni correction for 25 multiple comparisons (corrected threshold p = 0.002), none of the differences remained significant.
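The corrected threshold follows from dividing the conventional α = 0.05 by the number of comparisons: 0.05/25 = 0.002. A minimal Python sketch of the rule; the helper names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_tests


def survives_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which p-values remain significant once the family-wise alpha
    is split evenly across all len(p_values) comparisons."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


print(bonferroni_threshold(0.05, 25))  # 0.002
```

With 25 item comparisons, an uncorrected p = 0.01 would therefore not survive correction, whereas p = 0.001 would.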
Table 1. Mean RADAR item scores for patients in each depressive subtype, with tests of significance comparing non-melancholic versus melancholic patients, and non-melancholic versus combined melancholic and psychotic patients
Examining depressive subtypes, the mean total RADAR sign score failed to differentiate the psychotic and melancholic from the non-melancholic subjects, whether comparison used clinical diagnoses (40.5 vs 38.2; t = 0.86; df = 95; p = 0.39) or DSM-IV diagnostic assignment (40.5 vs 38.3; t = 0.80; df = 94; p = 0.43). In addition, we undertook a discriminant function analysis to examine the utility of the total set of signs. Correct assignment of group membership (i.e. psychotic/melancholic vs non-melancholic) was modest at 69.9% for the signs, with a sensitivity of 72% (31/43) and a specificity of 68% (34/50). As hypothesized, the mood state items were even less discriminating, with a sensitivity of 49% (21/43), a specificity of 63% (32/51) and an overall classification rate of 56.4%.
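The reported classification rates can be checked directly from the counts given above, treating the psychotic/melancholic group as ‘positive’. The helper below is an illustrative sketch; note that the denominators differ slightly between the sign analysis (43 + 50 = 93) and the mood analysis (43 + 51 = 94), as reported.

```python
def classification_summary(tp: int, n_pos: int, tn: int, n_neg: int) -> dict[str, float]:
    """Sensitivity, specificity and overall hit rate, as percentages to 1 dp.

    tp: psychotic/melancholic patients correctly classified (out of n_pos)
    tn: non-melancholic patients correctly classified (out of n_neg)
    """
    return {
        "sensitivity": round(100 * tp / n_pos, 1),
        "specificity": round(100 * tn / n_neg, 1),
        "overall": round(100 * (tp + tn) / (n_pos + n_neg), 1),
    }


print(classification_summary(31, 43, 34, 50))  # sign items
# {'sensitivity': 72.1, 'specificity': 68.0, 'overall': 69.9}
print(classification_summary(21, 43, 32, 51))  # mood items
# {'sensitivity': 48.8, 'specificity': 62.7, 'overall': 56.4}
```

The overall rate is the pooled proportion correctly classified, so it lies between sensitivity and specificity and is weighted toward the larger group.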
By comparison, other measures did show differentiation. Thus, comparisons of those with psychotic and melancholic depression against those with non-melancholic depression (using MDU clinical diagnoses) established differential Hamilton (18.3 vs 14.7; t = 3.0; df = 95; p < 0.005), CORE (10.3 vs 3.0; t = 5.9; df = 95; p < 0.001), GAF (49.7 vs 59.0; t = 3.7; df = 95; p < 0.001) and CJ (2.4 vs 1.7; t = 5.1; df = 95; p < 0.001) scores, and a small, non-significant difference in BDI scores (30.3 vs 26.6; t = 0.7; df = 24; p = 0.46), but no differences in DMI scores. Comparison of melancholic and non-melancholic subjects also established differential Hamilton (18.3 vs 14.7; t = 2.7; df = 85; p < 0.01), CORE (9.3 vs 3.0; t = 4.8; df = 44; p < 0.001), GAF (50.3 vs 59.0; t = 3.1; df = 85; p < 0.01) and CJ (2.3 vs 1.7; t = 4.2; df = 85; p < 0.001) scores. There were no significant group differences in BDI or DMI self-report scores.
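Readers wishing to check reported p-values against the t statistics and degrees of freedom can do so with a short standard-library sketch. It assumes two-sided Student's t-tests (the sidedness is not stated in the text) and integrates the t density numerically with Simpson's rule rather than calling a statistics library; the function names are our own.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: int, n_steps: int = 2000) -> float:
    """Two-sided p-value for a t statistic, integrating the density on
    [0, |t|] with composite Simpson's rule (assumes a two-sided test)."""
    b = abs(t)
    if b == 0:
        return 1.0
    if n_steps % 2:  # Simpson's rule needs an even number of intervals
        n_steps += 1
    h = b / n_steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


# (comparison, t, df, reported p) as given in the Results
REPORTED = [
    ("RADAR signs, clinical diagnosis", 0.86, 95, "p = 0.39"),
    ("RADAR signs, DSM-IV diagnosis", 0.80, 94, "p = 0.43"),
    ("HDRS, psychotic/melancholic vs non-melancholic", 3.0, 95, "p < 0.005"),
    ("CORE, psychotic/melancholic vs non-melancholic", 5.9, 95, "p < 0.001"),
    ("HDRS, melancholic vs non-melancholic", 2.7, 85, "p < 0.01"),
    ("CORE, melancholic vs non-melancholic", 4.8, 44, "p < 0.001"),
]

for label, t, df, reported in REPORTED:
    print(f"{label}: computed p = {two_sided_p(t, df):.4f} (reported {reported})")
```

For example, t = 0.86 with 95 df gives p ≈ 0.39, in line with the value reported for the clinical-diagnosis comparison of total RADAR sign scores.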
Discussion
In an earlier study [6], we noted that there have been few published studies examining levels of agreement between clinicians and corroborative witnesses in rating clinical depressive features. In that study, we similarly asked family members to rate their depressed relatives on a set of 22 ‘signs’ weighted to the assessment of psychomotor disturbance and ‘endogeneity’ symptoms, and found poor levels of agreement in rating the presence and severity of both symptoms and signs. Because that result might have reflected overly complex item descriptions, we took considerable care in developing the RADAR to ensure simple and straightforward descriptors for each item. As in the previous study, however, we found no support for the utility of corroborative witness reports: item and total scores were only weakly associated with self-report and psychiatrist-rated measures of depression severity, and those with psychotic/melancholic depression were not distinguished from those with non-melancholic depression. As we established that self-report measures and psychiatrist-rated measures did show agreement (albeit with agreement within each class of measure superior to agreement between classes), each of those approaches appears to return valid information, about depression severity at least. The failure of the corroborative witness data to correspond with either approach, however, argues against use of such a reference group. In addition, we established that those with psychotic and melancholic depression scored significantly more severely on a number of clinician-generated measures (i.e. Hamilton, CORE, GAF, CJ), indicating that the RADAR did not fail merely because such groups did not differ in intrinsic severity.
Our findings are in line with other studies considered in our earlier report [6] and suggest that the lack of utility of corroborative witness reports is unlikely to reflect methodological problems. We again assume that the observational reference points of corroborative witnesses differ distinctly from those of clinicians (who have a wide comparative base of clinical observations on which to draw), and that relatives presumably rate in response to a range of subjective influences, including individual concepts of severity and biases (perhaps seeking to ensure that their relative is perceived as having a ‘severe’ disorder). Thus, clinicians rate in relation to many other patients, while relatives lack the advantage of comparative referencing when rating an individual. This interpretation could have been advanced even more firmly had the RADAR also been completed by informed clinicians, but we judge that we have provided enough evidence to suggest that this step would be unnecessary.
Thus, while an approach such as the RADAR has theoretical appeal in giving a clinician a pre-assessment (or even review) of the presence of certain clinical depressive features and their overall severity, such a method appears to lack validity. The findings also raise questions about the level of clinical training that may be required before lay interviewers can be expected to generate valid information about the clinical features of depression. Importantly, the results argue indirectly for the value of ‘clinical assessment’ and for the extensive training and experience required to acquire such assessment skills. However, we must concede that informed relatives might be trained to generate valid ratings on observational measures such as the RADAR. The RADAR may therefore have potential as a functional ‘screen’ that is constrained by its reliance on untrained ‘operators’. Alternatively, RADAR scores may relate to indirect depression parameters (e.g. family burden of care, service utilization) not considered here.
If true depressive ‘types’ do exist, and can be differentiated by pattern recognition of discriminating clinical features, identifying those subtypes clearly requires valid assessment. Relatives appear to lack the capacity to discriminate either the severity of depressed mood or the behavioural features that distinguish the depressive disorders.
Acknowledgements
This study was supported by NHMRC Program Grant 993208 and an NSW Department of Health Infrastructure grant. We thank Kerrie Eyers and Christine Boyd; Marie-Paule Austin, Jay Bains, Kathryn Lovric and Jagdeep Sachdev for conducting psychiatric assessment interviews; and Kay Roy, Therese Hilton and Penelope Irvine for conducting adjunct assessment interviews.
