Depression outcome studies can assist clinicians and researchers in several ways, including identification of prognostic factors, assessment of the impact of intervening variables and clarification of effective treatments. Such studies, however, frequently involve samples of patients who have heterogeneous expressions of depression and who receive varying treatments, with varying levels of compliance and for varying periods of time, so that outcome is likely to reflect an admixture of natural and treatment-related factors. Such studies tend either to produce a potpourri of multiple predictors (few of which replicate consistently) or to identify few predictors. Two contrasting examples are illustrative. Keitner
Less narrowly focused studies tend to produce more consistent results. For example, a Sydney study of 145 patients [3] and a London study of 89 patients with depression [4] produced rather similar findings over their lengthy follow-up periods of some 15 years. Noting the fable of the ‘tortoise and the hare’, Lee and Murray [4] observed that their subjects with psychotic depression did very well in the short term but were more likely to be severely and chronically disabled over time, while those with ‘neurotic depression’ tended to have a more chronic but less severe course.
A key limitation to addressing outcome is its very definition. Outcome is commonly assessed by comparing status at two assessment periods, whether measured by improvement in raw or percentage scores on a dimensional measure, or by indices of ‘caseness’ or of improvement categories. Interval trajectory measures may also be used. Thus, Prien
We now report a 1-year follow-up study examining a range of possible determinants of progress and their variable influence on differing markers of outcome. By using multiple measures, we can examine their comparative utility (in terms of identifying predictors) and test whether predictors are consistent across measures, with any such consistency arguing for the utility of those predictors.
Some potential advantages and disadvantages of the contrasting approaches will first be noted. In recent years, antidepressant studies have tended to define ‘recovery’ as a 50% improvement on a dimensional depression measure, whether self-reported or observer-rated. One obvious problem is that a 50% improvement may mean quite different things for those who are severely and those who are minimally depressed: the former may still remain ‘clinically depressed’, while for the latter such improvement may be trivial and of no clinical importance. Again, phenomena such as ‘regression to the mean’ may dispose the more severely depressed to improve the most and thus satisfy the outcome criterion, a concern only partially addressed by controlling for baseline severity. In contrast to such a quantitative approach, qualitative assessments can be undertaken, such as determining at baseline and at appropriate review points whether an individual meets depression ‘case’ criteria, but this can be problematic as there will always be those at or near the criterion cutoff. Clinical global improvement (CGI) measures are flexible in allowing both quantitative (e.g. no change, somewhat improved, distinctly improved) and qualitative assessment (e.g. remission achieved or not achieved), and in allowing patient-volunteered and observer-rated information to be incorporated; however, these measures depend on valid ratings being made by the same rater. Most importantly, while all three approaches are common in studies determining the efficacy of acute treatments, none appears intuitively sensible for determining medium- and long-term outcome.
Neither do they appear sensible in situations where individuals may move in and out of depressive states over the interval, with the result that their status on the day of review may or may not correspond with their general progress from baseline to follow-up.
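The severity dependence of the 50% improvement criterion discussed above can be made concrete with simple arithmetic. The sketch below is illustrative only: the two pairs of BDI scores are hypothetical, and the cutoff of 15 used to flag residual ‘clinical’ depression is an assumption for the example, not a validated threshold.

```python
def meets_50pct_criterion(baseline: float, follow_up: float) -> bool:
    """True if the follow-up score is at least 50% lower than the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - follow_up) / baseline >= 0.5


# Assumed threshold for residual 'clinical' depression (illustrative only).
SYMPTOMATIC_CUTOFF = 15

# Hypothetical (baseline, follow-up) BDI scores.
cases = {
    "severe": (40, 20),   # halves, but remains well above the cutoff
    "minimal": (10, 5),   # halves, yet the absolute change is small
}

for label, (before, after) in cases.items():
    met = meets_50pct_criterion(before, after)
    still_symptomatic = after >= SYMPTOMATIC_CUTOFF
    print(f"{label}: 50% criterion met={met}, still above cutoff={still_symptomatic}")
```

Both hypothetical patients satisfy the percentage rule, yet the first remains well above the assumed cutoff while the second has improved by only five points.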
If prediction of immediate outcome is not the key objective (although it may be the focus in efficacy studies), then indicators are required that measure both the course of the initial episode (e.g. partial or full remission) and subsequent progress over the follow-up interval (e.g. whether there has been any relapse or recurrence). Such methods do, however, require precise operational definitions to avoid rating variation and to ensure a ‘shared language’ for raters and any published results.
Methods
Baseline assessment
We studied a consecutive series of inpatients and outpatients of our tertiary referral Mood Disorders Unit (MDU). To redress the unit's weighting towards those with more chronic and treatment-resistant depressive disorders, we also recruited patients assessed by our consultant psychiatrists in non-MDU facilities, as previously described in this journal [7]. All subjects were required to meet DSM-III-R criteria for a major depressive episode of less than 2 years' duration, and for the depression to be essentially primary (i.e. not secondary to conditions such as schizophrenia or alcoholism). Patients completed a self-report questionnaire including sociodemographic questions, a set of state anxiety questions and the Beck Depression Inventory (BDI) [8], a state depression measure. Patients were also required to rate the severity of life event ‘stress’ experienced during the 12 months prior to depression onset, using a 6-point scale (0, no stress at all; 1, mild; 2, moderate; 3, severe; 4, extreme; 5, catastrophic).
A detailed baseline assessment was undertaken by a psychiatrist and a research psychologist, as detailed previously [9]. Details of lifetime depressive and anxiety disorders were obtained, with the latter being confirmed by diagnoses generated by the Composite International Diagnostic Interview (CIDI-A), Version 1.2 [10]. Several clinician-rated measures were administered, including the 21-item Hamilton depression scale [11] and the CORE measure of psychomotor disturbance [12]. The psychiatrist made a set of diagnoses, including DSM-IV assignment of melancholia or non-melancholia, and an MDU ‘clinical diagnosis’; the rules for allocating subjects under the latter to psychotic (PD), endogenous (ED), neurotic (ND) or reactive (RD) depressive groups are detailed elsewhere [12]. Using 3-point scales, the psychiatrist also rated the patient's own judgement of their trait anxiety on the dimensions ‘nervy’, ‘a worrier’, ‘tense’, ‘anxious’ and ‘keyed up and on edge’.
To assess aspects of personality disorder, the psychiatrist rated the patient's degree of functioning (‘functional’, ‘probably dysfunctional’ or ‘definitely dysfunctional’) across five interpersonal domains (e.g. intimate relationships) and eight parameters of expression (e.g. ‘inflexibility’), as previously described [13], allowing total domain and total parameter scores to be generated. The psychiatrist was additionally required to rate the degree (on a scale of 0–5) to which the patient's long-term personality style matched each of 15 personality disorder class descriptors. Patients were later allocated scores on three personality factors, derived from a factor analysis of the 15 class descriptors (see [14] for details) and corresponding to the three-cluster model of recent DSM editions. These personality factors were labelled ‘eccentric’ (cluster A), ‘dramatic’ (cluster B) and ‘sensitive’ (cluster C). The psychiatrist asked the patient a series of questions about ‘acting in’ and ‘acting out’ behaviours that might be engaged in under stress, allowing total ‘act in’ and ‘act out’ scores to be generated.
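The total domain and total parameter scores can be sketched as simple sums over the item ratings. This is a minimal illustration under stated assumptions: the numeric coding of the three rating options (0, 1, 2) is assumed, since it is not given here, the example patient is hypothetical, and the factor-derived cluster scores described in [14] are not reproduced.

```python
from enum import IntEnum


class Rating(IntEnum):
    # ASSUMED numeric coding; the source does not state the values used.
    FUNCTIONAL = 0
    PROBABLY_DYSFUNCTIONAL = 1
    DEFINITELY_DYSFUNCTIONAL = 2


N_DOMAINS = 5      # interpersonal domains, e.g. intimate relationships
N_PARAMETERS = 8   # parameters of expression, e.g. inflexibility


def total_score(ratings: list[Rating], expected: int) -> int:
    """Sum one patient's ratings after checking that every item was rated."""
    if len(ratings) != expected:
        raise ValueError(f"expected {expected} ratings, got {len(ratings)}")
    return sum(int(r) for r in ratings)


# Hypothetical patient: two domains definitely dysfunctional, one probably.
domains = [Rating.DEFINITELY_DYSFUNCTIONAL, Rating.DEFINITELY_DYSFUNCTIONAL,
           Rating.PROBABLY_DYSFUNCTIONAL, Rating.FUNCTIONAL, Rating.FUNCTIONAL]
parameters = [Rating.PROBABLY_DYSFUNCTIONAL] * 3 + [Rating.FUNCTIONAL] * 5

total_domain = total_score(domains, N_DOMAINS)            # 2 + 2 + 1
total_parameter = total_score(parameters, N_PARAMETERS)   # 3 x 1
print(total_domain, total_parameter)
```

Under this assumed coding, total domain scores would range from 0 to 10 and total parameter scores from 0 to 16, with higher totals indicating greater personality dysfunction.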
The interviewing psychiatrist also used the structured DSM-III-R ‘severity of stressors’ measure to rate acute and enduring life events (none, mild, moderate, severe, extreme or catastrophic, coded 1–6). Subsequently, a consensus meeting involving all interviewing MDU psychiatrists was held to rate life events, during which consensus scores for ‘acute’, ‘post-onset’ and ‘chronic’ life events were derived (using the same rating options as the patient's self-report form).
Follow-up assessment
All those interviewed at baseline were invited to attend a 12-month review, at which the research assistant obtained details of depressive episodes and treatment over that year and assessed intervening life event stressors. Patients again completed several depression measures, including the BDI, and the psychiatrist determined whether they met DSM-IV criteria for major depression at the time of review.
Our four ‘outcome’ measures were: (i) a 50% reduction in BDI scores from baseline to follow-up, (ii) no longer meeting DSM-IV (or DSM-III-R) criteria for major depression, (iii) clinical global improvement (CGI) scores, and (iv) change point indicators. The last were derived from definitions proposed by Frank
The five-point CGI effectively took account of both the course over the interval and current status, with the options being: (i) patient more depressed/less functional, (ii) no change in depression/level of functioning, (iii) mild improvement in depression/functioning, (iv) moderate improvement in depression/functioning, and (v) marked improvement or complete remission.
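The four dichotomous ‘recovered’ definitions can be expressed as a small classification function, which also makes explicit how missing data propagate. This is a hypothetical sketch: the record layout and field names are invented for illustration, CGI options (i) to (v) are assumed to be coded 1 to 5, and the change point flags are taken as already derived, because the operational change point definitions are not reproduced here. Following the Results, change point ‘recovery’ and ‘full remission’ are pooled into one ‘recovered’ group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    # Hypothetical record layout for one followed-up patient.
    bdi_baseline: Optional[float]   # None if the BDI was not (validly) completed
    bdi_12m: Optional[float]
    meets_mde_12m: bool             # DSM-IV major depression at the 12-month review
    cgi: Optional[int]              # ASSUMED coding: 1 = worse ... 5 = marked improvement/remission
    cp_recovery: bool               # change point 'recovery' during follow-up
    cp_full_remission: bool         # change point 'full remission' during follow-up


def recovered_flags(p: FollowUp) -> dict[str, Optional[bool]]:
    """Return the four 'recovered' classifications; None marks missing data."""
    bdi = None
    if p.bdi_baseline is not None and p.bdi_baseline > 0 and p.bdi_12m is not None:
        # At least a 50% reduction from baseline.
        bdi = p.bdi_12m <= 0.5 * p.bdi_baseline
    return {
        "bdi_50pct": bdi,
        "no_mde": not p.meets_mde_12m,
        "cgi_marked": None if p.cgi is None else p.cgi == 5,
        "change_point": p.cp_recovery or p.cp_full_remission,
    }


example = FollowUp(bdi_baseline=30, bdi_12m=12, meets_mde_12m=False,
                   cgi=5, cp_recovery=True, cp_full_remission=False)
print(recovered_flags(example))
```

Returning None rather than False for a missing BDI keeps ‘non-BDI responders’ out of the BDI-based comparison instead of silently counting them as non-recovered.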
Statistical analyses
Groups were compared using chi-squared tests and t-tests. As a large number of comparisons were undertaken, all tests were subjected to Bonferroni correction to control the family-wise error rate.
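Bonferroni correction amounts to comparing each p-value with alpha/m, where m is the number of tests in the family. The sketch below shows both the threshold form and the equivalent adjusted-p-value form; the p-values are hypothetical, and exactly how the family of tests was defined in this study is not stated here.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag tests that remain significant after Bonferroni correction.

    Each p-value is compared with alpha / m, where m is the number of tests
    in the family; this bounds the family-wise error rate at alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Equivalent formulation: multiply each p-value by m, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values from four comparisons (not results from this study).
raw = [0.001, 0.004, 0.02, 0.3]
print(bonferroni_reject(raw))   # threshold 0.05 / 4 = 0.0125 -> [True, True, False, False]
print(bonferroni_adjust(raw))
```

The correction is conservative, particularly when many correlated comparisons are made, so it trades statistical power for protection against false-positive findings.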
Results
Of the 270 assessed at baseline (64% female, mean age 43.3 years), 182 (67%) agreed to the 12-month follow-up. Their mean age was 44.4 years (SD = 14.9), and 113 (62%) were female. The mean baseline Hamilton and BDI depression scores were 22.5 (SD = 7.6) and 28.6 (SD = 11.7) respectively. Over the 12-month period, 162 (89%) had received antidepressant medication and 132 (73%) some form of psychotherapy, while 52% had been hospitalised for their depression. Comparison of respondents and non-respondents on all baseline variables revealed few significant differences (after Bonferroni correction). Non-respondents were less likely to be in a stable relationship and scored higher on the total parameter measure, indicating greater personality disturbance.
BDI scores were available on both occasions for 128 respondents. Only 162 patients completed a BDI at baseline; the remainder did not for a variety of reasons, including depression severe enough to prevent completion or to prevent a seemingly valid completion. Without a baseline BDI, the follow-up BDI could not be used to determine percentage improvement. However, when we compared the 54 ‘non-BDI responders’ with the 128 for whom we had BDI scores on both occasions, there was only one difference across baseline measures, the BDI completers having higher socioeconomic scores (t = 3.28, p < 0.01). Of the 128 BDI completers, 55 (43%) were classified as ‘recovered’ on the basis of a 50% improvement in BDI scores at 12 months.
In relation to the change point parameters, 103 (57%) met criteria for ‘recovery’, reached on average 21 weeks after baseline. Ten patients (5%) met criteria for full remission, on average 34 weeks after baseline assessment, while two subjects met both criteria sets. These two groups were amalgamated to generate the second ‘recovered’ group of 111 subjects for comparison against the remaining subjects. Parenthetically, 42 patients (23%) ‘relapsed’ into a major depressive episode (MDE) from a partial or full remission, and 18 (10%) experienced a ‘recurrence’ of depression (a new MDE) after a period of recovery.
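The size of the amalgamated change point group follows from inclusion-exclusion on the counts just reported: subjects meeting both criteria sets must be counted only once. The short check below simply reproduces that arithmetic and the corresponding share of the 182 followed-up subjects.

```python
# Counts as reported for the change point measure.
n_followed_up = 182
n_recovery = 103         # met change point 'recovery' criteria
n_full_remission = 10    # met change point 'full remission' criteria
n_both = 2               # met both criteria sets

# Inclusion-exclusion: those in both sets would otherwise be counted twice.
n_recovered = n_recovery + n_full_remission - n_both
share = n_recovered / n_followed_up

print(n_recovered, f"{share:.0%}")  # -> 111 61%
```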
At the 12-month review, 70% did not meet criteria for major depression, our third measure of ‘recovered’ status. Finally, CGI scores, which effectively assessed both longitudinal course and current status, were available for all but six subjects. Three patients (2%) were judged as worse, 17 (9%) as unchanged, 20 (11%) as showing mild improvement, 35 (19%) as showing moderate improvement and 101 (56%) as markedly improved or completely remitted; this last group forms our fourth ‘recovered’ group for contrast with the remaining subjects.
While varying percentages of subjects were allocated to the four ‘recovered’ groups, the overlap between groups was substantial. Of those rated as ‘recovered’ by the Beck measure, 93% were also so allocated by the MDE criterion, 87% by the change point criteria and 83% by the CGI criteria. Of those assigned as ‘recovered’ by change point criteria, 87% were also so assigned by MDE and 83% by CGI criteria. Of those assigned by MDE criteria, 77% were similarly assigned by CGI criteria. Agreement in allocation to the ‘non-recovered’ groups was slightly less impressive, so that the rates of overall agreement for the six pairwise comparisons, in the order listed above, were 73%, 75%, 77%, 72%, 84% and 82%. Thus, the highest agreement linked CGI status with change point and MDE status, respectively.
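The distinction between the one-directional overlaps (e.g. the share of Beck ‘recovered’ subjects also ‘recovered’ by MDE criteria) and the lower overall agreement rates can be illustrated with two helper functions. The classifications below are hypothetical and chosen only to show that one measure's ‘recovered’ group can be fully contained in another's while the two still disagree on a sizeable share of subjects; the function names are illustrative.

```python
def conditional_overlap(a: list[bool], b: list[bool]) -> float:
    """Among subjects 'recovered' on measure a, the share also 'recovered' on b."""
    b_given_a = [y for x, y in zip(a, b) if x]
    if not b_given_a:
        raise ValueError("no subjects classified as recovered by measure a")
    return sum(b_given_a) / len(b_given_a)


def overall_agreement(a: list[bool], b: list[bool]) -> float:
    """Share of subjects classified identically (recovered or not) by both measures."""
    if len(a) != len(b) or not a:
        raise ValueError("measures must classify the same, non-empty set of subjects")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical classifications of ten subjects (True = 'recovered').
strict = [True] * 4 + [False] * 6    # a demanding criterion
lenient = [True] * 7 + [False] * 3   # a more permissive criterion

print(conditional_overlap(strict, lenient))   # all 4 strict 'recovered' also lenient
print(conditional_overlap(lenient, strict))   # only 4 of 7 the other way round
print(overall_agreement(strict, lenient))     # 7 of 10 classified identically
```

Conditional overlap is asymmetric, whereas overall agreement is symmetric; neither is corrected for agreement expected by chance, as Cohen's kappa would be.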
Mean follow-up BDI scores for variably assigned recovered and non-recovered subjects (respectively) by the four systems were: 8.7 versus 25.9 (CGI), 10.7 versus 28.8 (MDE), 10.3 versus 25.9 (change point) and 5.6 versus 25.9 (BDI), all differences being significant at the p < 0.001 level.
Table 1 considers the capacity of a substantial list of baseline predictors to discriminate the variably defined ‘recovered’ groups; we examine both consistency of trends across measures and formal differences. Those in the non-recovered group consistently had higher baseline BDI scores (significant for the MDE, change point and CGI measures). Assignment of ‘recovered’ status was consistently associated with higher baseline CORE scores (significant for the BDI and CGI measures). A diagnosis of psychotic depression (PD) was also associated with a greater likelihood of assignment to the recovered group, but the small number of PD subjects precluded demonstrating significance. Conversely, there was a consistent trend for those with a diagnosis of neurotic depression (ND) to be in the non-recovered group (significant for CGI assignment only). CGI assignment also suggested a better outcome for those with a clinical diagnosis of endogenous depression (ED) or a DSM-IV diagnosis of melancholia, and a poorer outcome for those with ‘non-melancholic’ depression.
Table 1. Set of baseline predictor variables for patients experiencing recovery or non-recovery during or at the 12-month review, according to four different measures
The most consistent trends across the varying definitions were for those who at baseline judged themselves as generally ‘keyed up and on edge’ (significant for MDE and CGI allocations) or ‘nervy’ to be assigned to the non-recovered groups, and there was a consistent trend (significant for CGI and change point allocations) for those with a higher baseline anxiety score to be in the non-recovered group. Examining formal lifetime anxiety disorders, those with a diagnosis of panic disorder were more likely to be in the non-recovered change point group (a significant difference for this measure only), while those meeting criteria for obsessive–compulsive disorder or social phobia tended to be less likely to recover (no differences being significant).
A clear and relatively consistent finding was for those with higher total domain and parameter scores to be in the non-recovered group, indicating that the presence of personality disorder was distinctly associated with a worse outcome. A cluster A or ‘eccentric’ personality style was consistently, though non-significantly, associated with a greater chance of allocation to the non-recovered groups, as was a cluster C or ‘sensitive’ personality style (significant for CGI allocation). ‘Acute’ and ‘enduring’ life event stress (as rated at baseline) were relatively consistent predictors of a poor outcome, the latter being significant for all allocations except change point.
Finally, a reviewer of this paper requested that we contrast those who at follow-up scored below 15 on the BDI (taken as ‘the upper bound of the normal BDI range’). We compared those 59 subjects with the remaining 69 subjects with available data, again applying Bonferroni corrections to the analyses. Those who met this recovery criterion were, at baseline, more likely than those who did not to have had lower BDI (i.e. 25.1
Discussion
For our follow-up sample, 61% had met the change point definition of full remission and/or recovery at some stage during the 12-month follow-up, but the respective mean intervals before resolution (34 and 21 weeks) indicate a slow overall process. The other principal measures produced varying estimates of recovery (43% by Beck percentage improvement, 55% on the CGI and 70% for no longer meeting MDE criteria), demonstrating the predictable impact of differing strategies for measuring outcome. We nevertheless established general agreement in estimating ‘recovery’ across the several measures, with the highest agreement involving the CGI (examined against change point and MDE status). The latter association is open to the criticism of rater bias, in that the psychiatrist derived both the CGI and MDE status data. However, as the research assistants derived the change point data, the agreement between those judgements and the psychiatrist-rated CGI data is an important finding in support of the change point approach. The BDI percentage measure would seem the most problematic, as it is clearly influenced by initial depression severity and returned a lower recovery estimate than all other strategies. Examining those who returned a low BDI score (of 15 or less) did identify a substantial number of predictors, and was clearly more ‘successful’ than using percentage reduction in Beck scores. However, as occurred in this study, not all severely depressed patients are able to complete such a self-report measure successfully, some not wishing to complete it and others completing it in ways that make valid responses unlikely.
The other three measures (change point, MDE and CGI), by identifying marked improvement or recovery within a relatively narrow band of 55–70%, suggest a likely recovery rate of about two-thirds of our sample. While this rate is superior to the two estimates noted in the introduction, sample and treatment nuances will clearly influence outcome estimates, so the level of improvement in this sample has no distinct intrinsic importance.
Examination of baseline predictors is of importance in and of itself, but can also clarify the comparative properties of each outcome measure. Those with a high CORE score and those with diagnoses of psychotic and of endogenous depression were consistently more likely to be rated as ‘recovered’, while there was a weak trend, across all definitions except change point, for those with DSM-IV-defined melancholia to be rated as recovered. High CORE scores, indicative of severe psychomotor disturbance, are distinctly more likely in those with psychotic and melancholic depression [12], so the CORE score may well have acted as a proxy for those diagnostic subtypes. Those with psychotic and melancholic depression may have a superior outcome because of the suggested greater specificity of response to physical treatments for such subtypes (see [12]); alternatively, those with non-melancholic disorders may do less well because of their greater likelihood of comorbid anxiety and personality conditions.
Non-recovery was also relatively consistently associated with several broad baseline variables, viz. higher state anxiety and depression scores, trait anxiety constructs (i.e. being generally ‘keyed up and on edge’ and ‘nervy’), a lifetime anxiety disorder, higher ratings on our measures of disordered personality functioning, cluster A and C personality disorder styles, and reporting (at baseline) acute and enduring life event stressors. The identification of state and trait anxiety expressions is compatible with the long-standing identification of neuroticism as a predictor of poor outcome [6,16].
In other analyses of this sample [9], we have established the impact of anxiety and disordered personality style as vulnerability factors to onset of depression. Such factors may then not only predispose to onset (and have therefore influenced relapse in some sample members) but may also act as depression-propagating factors.
Does our study assist decisions as to how medium-term outcome should be assessed? As noted earlier, it appears theoretically more logical to use categorical change points, as they allow the longitudinal course to be charted with some precision and because the alternative approach of relying on a single cross-sectional point in time as a ‘defining’ moment is theoretically and clinically problematic. Thus, an individual may be depressed over most of a follow-up interval and be euthymic on the day of the follow-up, or the converse. This limits approaches such as no longer meeting MDE criteria, percentage BDI improvement and even a low BDI score (despite the suggested utility of the last here) and other ‘state’ measures.
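The limitation of a single cross-sectional ‘defining’ moment can be shown with two hypothetical weekly courses. This is an illustrative sketch only: weekly MDE status and the 52-week grid are assumptions for the example, and the proportion of weeks spent depressed is a simple longitudinal summary, not the change point method used in this study.

```python
WEEKS = 52

# Hypothetical weekly courses (True = meets MDE criteria in that week).
mostly_depressed = [True] * 50 + [False] * 2   # euthymic only at the end
mostly_well = [False] * 48 + [True] * 4        # becomes depressed just before review


def recovered_on_review_day(weekly: list[bool]) -> bool:
    """Cross-sectional rule: status in the final week alone decides outcome."""
    return not weekly[-1]


def proportion_of_weeks_depressed(weekly: list[bool]) -> float:
    """Longitudinal summary: fraction of the interval spent in an episode."""
    return sum(weekly) / len(weekly)


for name, course in [("mostly_depressed", mostly_depressed), ("mostly_well", mostly_well)]:
    assert len(course) == WEEKS
    print(name,
          recovered_on_review_day(course),
          round(proportion_of_weeks_depressed(course), 2))
```

The cross-sectional rule labels the patient who was depressed for almost the whole year as ‘recovered’ and the patient who was well for most of it as ‘not recovered’, which is the inversion a longitudinal measure is designed to avoid.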
Change point allocations generated by the research psychologists were validated by reference to the other measures, and importantly against measures rated by the psychiatrists. The change point approach respects an operationalised lexicon [15] and thus assists the reliability and validity of assessment in individual studies, as well as the interpretation of published studies. If, however, we compare our principal measures on their capacity to identify significant baseline predictors of outcome, the CGI was the superior measure, identifying significant differences on 13 variables (compared with 4–5 variables for the other measures). CGI assessment, like change point definition, has the advantage of respecting longitudinal course, but it also builds in the overall judgement of the assessing clinician, who may weigh improvement and change across a range of parameters. Thus, in comparison with the change point approach, subjective CGI judgements may or may not be reliable and valid. Combining the two approaches would appear to address this concern, while also allowing consistency across study findings to be examined.
Predicting medium-term outcome is complicated by the heterogeneity imposed by differing depressive subtypes, severity of depression, antecedent and ongoing stressors, and comorbid anxiety and personality disorders, particularly as this study indicates their salience as influences on outcome. We suggest that this study identifies many contributors to outcome that appear to correspond to accepted risk factors for depression onset. More importantly, the results indicate the comparative utility of alternative measures of outcome, and support a recommendation to include both CGI and change point components, while weighing the advantages and disadvantages of a range of approaches.
Acknowledgements
This study was supported by the NHMRC (Program Grant 953208) and a NSW Department of Health Infrastructure Program Grant. We also thank Dusan Hadzi-Pavlovic, Kerrie Eyers, Christine Taylor, Yvonne Foy and Heather Brotchie for study assistance.
