Abstract
…clinical trials, owing to features of design and implementation, have important limitations for informing clinical practice [1].
…current clinical trial procedures do not seem able to pick up differences between treatments, either in terms of effectiveness or in terms of clinical distinctiveness. One must wonder whether the problem hasn't partly to do with the methods of measuring effectiveness [2].
It is commonly suggested that all antidepressant drugs show similar efficacy in controlled studies and are thus equally ‘effective’. For example, Janicak and colleagues [3] evaluated some 400 studies of standard antidepressants and a large set of ‘second generation antidepressants’ (e.g. selective serotonin re-uptake inhibitors [SSRIs], reversible monoamine oxidase inhibitors [MAOIs], monocyclic and tetracyclic drugs), and concluded that, for the latter, ‘efficacy has been comparable to that of their predecessors’. A 1999 review by the US-based Agency for Health Care Policy and Research (AHCPR) comes to a similar conclusion. That review [4] examined 206 studies comparing newer and older antidepressants, principally as treatments of major depression but also including a sizeable number of dysthymia studies. The reviewers concluded that: ‘Newer antidepressants are equally efficacious compared to first and second generation tricyclic antidepressants’, and that while the number of studies comparing differing classes of newer antidepressants was small, the ‘studies show no difference in overall efficacy’.
Such efficacy studies, particularly those designed to be presented to regulatory authorities such as the US Food and Drug Administration, are constrained by a relatively standardised protocol. Patients are generally selected on the basis of having a major depressive episode but without significant comorbidity (particularly personality disorder, chronic depression, drug and alcohol abuse). A new drug may be compared against a placebo or against a comparator antidepressant. Assessment of depression severity usually occurs on a weekly basis, and most drug trials last only 4–8 weeks. Group data are generally plotted, and when these plots are compared across studies, the impression that most antidepressant drugs show a similar improvement pattern is difficult to reject. Again, when an approved antidepressant of one class is compared to one of another class, it is rare for any significant difference in improvement pattern to be demonstrated using such study designs.
Clinical practice often challenges the view that all antidepressant classes are equally ‘effective’, with Schatzberg [5] noting that ‘some physicians still treat severely depressed patients with the older tricyclic antidepressants because of conflicting reports about the efficacy of the newer agents’. Why the discordance? Differences certainly emerge from the differing patient profiles and depressive subtypes of those who take part in drug trials and those seen in the ‘real world’ of clinical practice [1]. In the clinical environment, pristine episodes of major depression are rare, as patients are likely to have significant comorbid problems, whether secondary to anxiety, personality style, medical problems or substance misuse, and more severe and more ‘biological’ depressive disorders are also more likely. In formal drug trials, patients with psychotic depression are almost invariably excluded, while the majority of subjects are outpatients, and therefore less likely to have melancholic depression.
Thus, controlled trials inform us about an antidepressant's efficacy in certain restricted circumstances, but we also require effectiveness studies that, by contrast, evaluate ‘effects of treatments on health outcomes under conditions approximating usual care’ [1].
The question as to whether some of the new antidepressants are as ‘effective’ as the older ones in treating melancholia is not new. The Quality Assurance Project (QAP) [6] undertook a meta-analysis of published studies. For patients with ‘endogenous depression’, irreversible MAOI therapy was less effective than placebo, although the analysis involved only three studies. This result could well have reflected the early MAOI trials [7,8], which identified phenelzine and isocarboxazid as less effective than placebo, a finding later judged to be a consequence of the trialled doses of the MAOI being too low [9].
For endogenous depression, the QAP analyses established that antidepressant therapies had more than double the placebo effect, with electroconvulsive therapy (ECT) slightly more effective than tricyclic antidepressants (TCAs), and the TCAs in turn slightly more effective than the (then) newer antidepressants such as mianserin. For ‘neurotic depression’, the placebo response was much greater, with the tricyclic antidepressants, MAOIs, and (then) newer antidepressants having some greater effect than placebo but no more than established for sedative and antipsychotic drugs.
Since then, and especially in the last 10 years, a series of new antidepressant drugs has been released. Despite certain distinct tolerability and safety advantages, whether they are as effective as the older antidepressants (i.e. TCAs and irreversible MAOIs) for melancholic depression remains unclear. The Danish University Antidepressant Group (DUAG) has conducted two double-blind studies comparing response rates of a TCA to an SSRI (clomipramine vs paroxetine and vs citalopram) in those hospitalised with ‘endogenous depression’. In both studies [10,11] the SSRIs had a response rate approximately half that of the TCAs. A similar finding was reported when moclobemide was compared with clomipramine [12]. Roose and colleagues [13] examined outcome in hospitalised unipolar depression patients, with 22 fluoxetine-treated compared to 42 nortriptyline-treated patients, and assigned subjects to melancholic and non-melancholic groups by DSM-III-R criteria. The response rate of the melancholic patients was 10% in the fluoxetine group and 83% in the nortriptyline group. While the trend is striking, sample members were quite elderly and not randomly assigned to the comparative treatments. Other studies (e.g. [14,16]) have not, however, identified differential response rates across DSM-defined melancholic and non-melancholic subtypes.
We have challenged recent DSM definitions of melancholia and argued [17] that their criteria measure severe depression more than a categorical melancholic subtype per se. Thus, the issue as to whether antidepressants have varying effectiveness for differing depressive subtypes is contingent on a valid approach to distinguishing melancholic and non-melancholic depression. Here, we therefore pursue the issue of differential antidepressant effectiveness in a sample where we firstly differentiate those with ‘melancholia’ by formal DSM-IV criteria, and, secondly, by an empirically refined set of clinical features.
Our approach is unusual in that we rely on patients' judgements, whereas formal studies rely on standard observer-based or self-report questionnaires. We suggest, however, that patient-generated data may contribute an important perspective in ‘effectiveness’ studies, particularly as such information approximates the clinician's approach. Thus, in assessing a new patient, most clinicians would regard it as informative to know what treatments had been tried for the present and any past episodes, and the extent to which the patient judged such interventions as effective, as such judgements may well determine the management of the current episode. In addition, such an approach is implicit in many published clinical guidelines and algorithms. For example, the American Psychiatric Association's [18] General Clinical Guidelines for ‘Use of Antidepressants’ states that selection, in part, should be on ‘the basis of previous response or family history of a response to a particular antidepressant’. The recently published Texas Medication Algorithm Project [19] states, in an algorithm, that a ‘patient's previous response to antidepressant treatments should always be considered…. If a patient responded well and tolerated a specific pharmacotherapy or other treatment intervention during a previous episode of depression, the same treatment is recommended again’. Despite such a logic, the strategy of undertaking naturalistic studies of antidepressant use in clinical practice to assess outcome is rarely adopted, with the few exceptions being to test economic outcomes predicted by decision-analytic models [20].
Method
A clinical panel of 27 Australian and New Zealand psychiatrists was recruited (the Australasian Data Base or ADB). These psychiatrists collected data on patients who presented to their clinical practice with a depressive disorder (either first episode or first presentation of a new episode). Recruitment was terminated when we judged that we had an adequate sample size (of 385, and with incomplete data for 16, generating a final sample of 369 subjects). While psychiatrists were encouraged to accrue patients in a consistent way (be it consecutively based, on one particular day of the week, etc.), we did not establish implementation strategies. No exclusion criteria were provided to the participants. The ADB study design has been described elsewhere [22] but essentially involved patients and their assessing clinicians completing standardised questionnaires and interviews with precoded rating options. Clinicians were not required to give a clinical diagnosis but the data set included all DSM-IV features to allow depressive diagnoses to be derived.
To define melancholia empirically from clinical features, a set of ‘endogeneity symptoms’ (refined in several previous studies) were examined, together with observed psychomotor disturbance as rated by the CORE measure [17]. After excluding 28 subjects with psychotic depression, a cluster analysis of the clinical feature set identified 98 in putative melancholic (‘MEL’) and 243 in putative non-melancholic (‘Non-MEL’) clusters. Cluster members were best distinguished [22] on the basis of CORE scores (mean scores of 16.4 for the MELs and 4.1 for the Non-MELs). In addition, the MELs were more likely to report a non-reactive mood, loss of interest and diurnal variation.
The assessing psychiatrist was required to check which of a list of specific treatments had been received by the patients (i.e. antidepressant drugs, ECT, antipsychotics, mood stabilisers) for previous depressive episodes, and to record the patient's estimate of the effectiveness of any such treatments with ratings being: totally effective, 3; moderately effective, 2; somewhat effective, 1; and not at all, 0. Thus, we did not include this component in the patient's self-report questionnaire, preferring for the psychiatrist to obtain such information by clinical questioning (albeit standardised across defined treatments). The coding options allowed dimensional and categorical analyses, addressed, respectively, by t-tests and Chi-squared analyses.
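The coding scheme above can be sketched in a few lines. This is a minimal illustration only, not the study's analysis code: the two rating lists are invented, the helper names are hypothetical, and the use of Welch's form of the t statistic and an uncorrected Pearson chi-squared are assumptions, since the text does not say which variants were applied.

```python
from math import sqrt
from statistics import mean, variance

# Rating codes as described in the Method section.
RATING_CODES = {"not at all": 0, "somewhat": 1, "moderately": 2, "totally": 3}


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def responder_count(ratings, threshold=2):
    """Count ratings of 'moderately' (2) or 'totally' (3) effective."""
    return sum(r >= threshold for r in ratings)


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical ratings for one treatment, invented purely for illustration.
mel = [3, 2, 2, 1, 0, 2, 3, 1]
non_mel = [1, 0, 2, 1, 0, 1, 3, 0]

# Dimensional analysis: compare mean ratings (0-3) between the groups.
t_stat = welch_t(mel, non_mel)

# Categorical analysis: responders vs non-responders in each group.
mel_yes = responder_count(mel)
non_mel_yes = responder_count(non_mel)
chi2 = chi_squared_2x2(mel_yes, len(mel) - mel_yes,
                       non_mel_yes, len(non_mel) - non_mel_yes)

print(f"t = {t_stat:.2f}, chi-squared = {chi2:.2f}")
```

Dichotomising at a rating of 2 mirrors the 'moderately or totally effective' threshold used for the categorical comparisons; p-values are omitted here because they would require a statistics library or distribution tables.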
Results
As described elsewhere [22], the average sample member was moderately depressed (with a mean Hamilton score of 21.0). All met DSM-IV symptom criteria for major depression, and we established that, apart from missing data for three, all bar one had been depressed for a minimum of two weeks. After exclusion of the 28 subjects with psychotic depression, the current sample of 341 comprised 216 (63%) females and 125 males, with a mean age of 44 (SD = 16) years. The average duration of their current depressive episode was 79 (SD = 155) weeks, the mean age at having their first clinically significant depression was 31 (SD = 15) years, for first receiving treatment was 32 (SD = 14) years, and the mean number of previous lifetime episodes was 15 (SD = 25).
Two-thirds of the subjects were assigned by both systems to either the designated melancholic or the non-melancholic categories. The level of disagreement (i.e. 28%) in assignment between the systems argued for examining those separately assigned by the cluster and by the DSM-IV subtyping allocations, as total or near total agreement would have made examination of both systems unnecessary.
Table 1 provides data on judged effectiveness of all treatments by recipients of those treatments across the whole sample. Electroconvulsive therapy receives the highest ratings, followed by antipsychotic medication, followed rather equally by irreversible MAOIs and tricyclic drugs (the old antidepressants), followed by the SSRI and SNRI classes, and last by moclobemide and mianserin. Table 1 also compares the judged relative effectiveness of differing treatments across the two definitions of melancholic and non-melancholic depression. First, we consider data across the rows. Those with melancholia were twice as likely as assigned non-melancholic subjects to receive ECT. Bilateral and unilateral ECT were judged as the most effective antidepressant interventions by those assigned to either the melancholic or non-melancholic categories. While those assigned to the melancholic groups were somewhat more likely to have received antipsychotic medication, the non-melancholic subjects rated it as more effective than the melancholic subjects. Across the melancholic and non-melancholic groups, the TCAs and irreversible MAOIs were rated as moderately to highly effective (in comparison to other treatments). There were two significant differences between defined melancholic and non-melancholic groups. First, mianserin was rated as more effective for cluster-differentiated ‘melancholic’ than for ‘non-melancholic’ depression. Second, the SSRIs were rated as more effective for DSM-differentiated ‘non-melancholic’ than ‘melancholic’ depression.
Table 1. Rated effectiveness of previous treatments, for subjects assigned as having melancholic and non-melancholic depression by two systems
Our principal objective requires comparing column ‘mean effectiveness’ scores, here assessing effectiveness dimensionally. We focus on the TCAs (as representing a major ‘older antidepressant class’) and the SSRIs (as representing a major ‘newer antidepressant class’). For cluster-defined ‘melancholia’, the TCAs were judged as more effective than the SSRIs (i.e. 1.25 vs 0.70 or 80% more effective), similar to results in relation to DSM-defined ‘melancholia’ (i.e. 1.10 vs 0.66 or 67% more effective). By contrast, for ‘non-melancholic’ depression, the TCAs and the SSRIs returned strikingly comparable ratings (i.e. 0.98 vs 0.94, 1.03 vs 1.01) for those assigned according to each method.
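The percentage comparisons quoted above follow from a simple relative-difference calculation on the reported means. The sketch below reproduces that arithmetic; the function name is hypothetical, and the pairing of the two non-melancholic figures with cluster and DSM-IV assignment (in that order) is an assumption based on the order used earlier in the paragraph.

```python
# Mean effectiveness ratings (0-3 scale) as reported in the text: (TCA, SSRI).
# The order of the two non-melancholic pairs is assumed, not stated.
reported_means = {
    ("cluster", "melancholic"): (1.25, 0.70),
    ("DSM-IV", "melancholic"): (1.10, 0.66),
    ("cluster", "non-melancholic"): (0.98, 0.94),
    ("DSM-IV", "non-melancholic"): (1.03, 1.01),
}


def relative_advantage(tca_mean, ssri_mean):
    """Percentage by which the TCA mean rating exceeds the SSRI mean rating."""
    return (tca_mean - ssri_mean) / ssri_mean * 100


for (system, subtype), (tca, ssri) in reported_means.items():
    pct = relative_advantage(tca, ssri)
    print(f"{system:8s} {subtype:16s} TCA {tca:.2f} vs SSRI {ssri:.2f}: {pct:+.0f}%")
```

Because the published means are rounded to two decimal places, the cluster-defined figure recomputed this way comes out slightly under the quoted 80%; the difference is within rounding error.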
Table 2 examines rates of patient-judged treatments as moderately or totally effective (i.e. now rated categorically rather than dimensionally). In the whole sample, the ranking of treatments is identical to the ranking derived in the dimensional analyses, with bilateral and unilateral ECT ranked highest, and moclobemide and mianserin ranked lowest. When effectiveness was examined across the melancholic and non-melancholic classes, ECT ranks highly for both depressive subtypes. Antipsychotic medication was rated as more effective by those with non-melancholic depression (significant in the case of DSM-IV assignment). For those assigned by the cluster analysis, the TCAs and mianserin were significantly more effective for ‘melancholia’, but this was not confirmed in relation to DSM-IV assignment, where the SSRIs were significantly more highly rated for those assigned as having non-melancholic depression by DSM-IV criteria. Comparison of the TCAs and SSRIs down the columns again suggests that the TCAs are the more effective of the two for melancholic depression (i.e. 47% vs 23% for cluster assignment; 38% vs 22% for DSM-IV assignment). However, for non-melancholic depression, effectiveness rates are again comparable for the TCAs and SSRIs (32% vs 33% for cluster; 36% vs 36% for DSM-IV assignment).
Table 2. Percentage of sample rating a previous treatment as moderately or totally effective, contrasting those with melancholic and non-melancholic depression
Discussion
First, methodological issues: while sample contribution was from many psychiatrists in Australia and New Zealand, there was a weighting to those attending tertiary facilities and academic units, patients had had many episodes of depression, and the average episode was both severe and persistent. Such weightings would clearly influence estimates of any overall ‘effectiveness’ of any antidepressant treatment. To the extent that subjects were any more likely to have treatment-resistant disorders, whether due to any incapacity to respond to an antidepressant or due to an antidepressant being inappropriate, such factors would contribute to a disproportionately higher rate of ineffectiveness in first-line antidepressant drugs. Clinical panel data, such as these, also have a number of shortcomings. Here, remembered effectiveness is assessed in relation to previous episodes; we have no data on dosages of the prescribed antidepressants, on how long such medications were taken or on the patient's compliance; and self-report data are open to a range of response biases. Nevertheless, and as noted in the introduction, such data are used by most clinicians. While not replacing formalised drug trials or other controlled study designs, such clinical panel data may complement or help shape studies addressing such questions as posed here.
Second, the numbers receiving some treatments were few, risking skewed data, erroneous interpretations and non-replication; we therefore focus on data for the TCAs and SSRIs, where numbers were substantive. Rosenbaum and Hylan [21] have recently noted some of the problems in generating causal inferences from analyses of retrospective data. They argue that results from any retrospective data analysis require augmenting, which can be accomplished in a number of ways, including replication and prospective study designs. We note then that, despite acknowledged limitations of our current study, we have recently completed a 1-year prospective naturalistic study of another sample, with preliminary analyses indicating virtually identical results to those presented here: the TCAs and MAOIs were more likely to be rated as effective by those with melancholic depression, and those drugs and the SSRIs were judged as of comparable effectiveness for non-melancholic depression.
Third, we note the high effectiveness ratings returned for ECT and antipsychotic medication, surprising for several reasons. It is a rare patient who does not express antipathy to ECT, while a significant percentage report distressing side-effects from both ECT and antipsychotic drugs. Here, sample members nevertheless rated ECT (and antipsychotic medication) as highly effective in comparative terms, suggesting their judgements of effectiveness overrode concerns about side-effects or stigma associated with such treatments. Again, ECT was rated as of comparable effectiveness by those with melancholic and non-melancholic depression, a finding consistent with both a review by Abrams [23] and an APA Task Force [24]. Antipsychotic medication is rarely recommended for the treatment of depression (other than for psychotic or delusional depression) and its judged effectiveness (here more distinct for non-melancholic than melancholic depression) is therefore surprising. It may be that low-dose antipsychotic medication was prescribed for a previous episode of psychotic depression, for anxiolytic purposes or for those with certain personality disorders, but we sought no data concerning clinical reasoning.
Fourth, the importance of valid delineation of depressive subtypes. In examining for any differential effectiveness for the TCAs and SSRIs, there were theoretical advantages to using more than one definition of ‘melancholia’, particularly if the DSM definition measures ‘severity’ of depression more than it distinguishes the melancholic subtype. Consistency in findings across subtyping definitional systems would support interpretation of any differential effect. We examined those assigned by DSM-IV criteria and by an empirically derived solution focusing on clinical features.
Fifth, the comparative analyses, both assessing the judged effectiveness dimensionally as well as categorically, returned consistent findings when TCAs and SSRIs (as representative ‘older’ and ‘newer’ antidepressant classes) were compared. For those assigned to ‘melancholic’ classes, the TCAs were judged as more effective than the SSRIs, while for those assigned to ‘non-melancholic’ classes, the two drug types were judged to be of strikingly similar effectiveness. Such clinical panel data thus add to several formalised studies suggesting that the SSRIs (and other newer antidepressants) may not be as effective as the TCAs (and irreversible MAOIs) for melancholic depression. Such a general conclusion is based here on comparisons of overall drug classes. Of course, the tricyclic class contains individual drugs, which have quite varying relative effects on serotonin and noradrenaline neurotransmission, while the SSRIs also vary in their individual effects on neurotransmitters (including dopaminergic systems). This leaves open the possibility that some SSRI drug class members may be more effective (and others less effective) for ‘melancholia’ but any such possibility remains to be formally established. If the SSRIs are comparatively less effective, how does that inform us about the nature of ‘melancholia’ and what TCA ingredient, missing in the SSRIs and other newer antidepressants, might have been ‘lost’? These and other questions should be pursued.
We speculate that a number of the newer antidepressants have a reasonable claim as first-line options for the management of non-melancholic depression. The suggestion that such drugs should be the first-line treatment for managing depression per se is clearly under challenge here. We join with Boyce and Judd [25], who stated that ‘It is premature to write off the tricyclics or relegate them to second-line treatment for depression. We argue that they have a place as a first-line treatment for severe (melancholic) depression where they seem to have a therapeutic advantage over the newer agents’, although we would add the MAOIs to the tricyclic list. The controversial recent article by Beerworth and Tiller [26], which argued that a practitioner could be liable for a negligence claim for using a TCA or MAOI, was predicated, in part, on the argument that the newer and older antidepressants are equally efficacious. For melancholia this may not hold (at least in regard to their clinical effectiveness), and it would be of interest to obtain a legal view about any failure to trial a TCA or MAOI for a patient with melancholic depression who was not responding to a newer antidepressant.
Studies of this nature need to be repeated in other settings, including out-patient psychiatry and general practice in particular, because of the possibility that effectiveness gradients may be quite sensitive to varying levels of depression severity and varying prevalences of depressive subtypes across settings. If gradients and depressive subtype differences are confirmed, not only can the suggested myth that all antidepressant drugs are equally effective be dispelled but also treatment decisions may become more rational, and avenues for development of further new antidepressant drugs facilitated.
Acknowledgements
Our sincere thanks to all other psychiatrists contributing to the ADB study (Marie-Paule Austin, Gary Barnes, Philip Boyce, Simon Byrne, Robert Eidus, Scott Henderson, Ian Hickie, Alison Hickey, Bernard Hughson, Adrien Keller, Cathy Mason, Mark Montebello, Russell Pargiter, Robert Parker, Elizabeth Scott, Cathy Stringer and Kipling Walker) and to Kerrie Eyers, Heather Brotchie, Christine Taylor and Yvonne Foy for study assistance. Funding support was provided by the NHMRC (Program Grant 993208) and three pharmaceutical companies (Pfizer, Wyeth and Eli Lilly).
