Abstract
Depression is a leading cause of disability, and a recent estimate from the Global Burden of Disease study indicates that unipolar depressive disorders account for 4.4% of the global disease burden [1]. Antidepressants are effective in the acute treatment of depressive disorders [2,3]. Continuation therapy with antidepressants is recommended for depressed patients who respond to initial treatment. Treatment guidelines recommend this should be for an additional 4–9 months after initial response [4,5].
In contrast to the large number of drug trials in acute depression [6], there are relatively few studies assessing antidepressant efficacy in preventing relapse of depression during continuation treatment. In this type of study, patients who have responded to an initial course of therapy are randomized to remain on blinded active drug or to be switched to blinded placebo, and rates of relapse are subsequently compared across treatment arms (Figure 1). Two recent meta-analyses in this area reported that continuation treatment reduced relapse by approximately 70% compared with placebo [5,7]. Neither analysis was able to determine an optimal duration of therapy or to show preferential effects for any class of antidepressant.

Basic design of relapse prevention studies. Only patients responding to initial open label antidepressant (AD) are eligible for randomization, either to remain on double blind (DB) antidepressant, or to switch to double blind placebo.
The objective of this meta-analysis was to evaluate the effect of continuation treatment on the risk of relapse in depressive disorders. This analysis was based on a larger cohort of studies (n = 54) and patients (n = 9268), which allowed analysis of the effects of patient age and diagnostic system on relapse prevention, and re-analysis of optimal duration of continuation therapy. We also assessed the comparability of results calculated using different statistical methods.
Methods
Search strategy
A search was performed to identify all placebo-controlled, double-blind antidepressant randomized-withdrawal studies (published or unpublished) that assessed relapse in patients who had previously responded to initial drug therapy. Studies were identified and obtained between 5 June 2008 and 15 August 2008 using electronic databases including Embase (1974 to present), MEDLINE (1950 to present), the clinical trial results website http://www.clinicalstudyresults.org, and pharmaceutical industry trial registers. In addition, reference lists of identified articles were examined. The following Boolean phrases were used when searching electronic databases: relapse + prevention + (depression diagnosis) and discontinuation + (depression diagnosis) and continuation + (depression diagnosis). The depression diagnoses were major depressive disorder, dysthymia, bipolar II, bipolar not otherwise specified and/or double depression. When full-text information was not available, attempts were made to contact primary authors. Full manuscripts of studies of interest were screened according to inclusion criteria and selected based upon the consensus of both reviewers (P.G. and M.D.).
Inclusion criteria
Two reviewers (P.G. and M.D.) independently performed the literature search. Relapse-prevention studies with a randomized, double-blind continuation treatment phase that included a placebo arm were included. Patients entering the continuation phase of each study were responders to acute treatment and met response criteria prior to randomization. Only studies using approved pharmacological treatments for depression at labelled doses were included. Crossover designs and multiple presentations of identical data sets were excluded.
Data extraction
Data extraction and quality assessments were independently performed by each reviewer according to a protocol. All corrections were agreed upon by both parties. The primary variables of interest were the number of patients randomized to each group (active and placebo) and the number of patients who relapsed in each group during the continuation phase of the trial. Definitions of initial diagnostic criteria, baseline illness severity, criteria for response pre-randomization, and relapse criteria post-randomization were documented for each study. Critical assessments of several methodological domains were made to assess the risk of bias in the included studies based on the Cochrane Collaboration's guidelines (http://www.cochrane.org/reviews/revstruc.htm).
Data analysis
The primary measure for comparing relapse with active treatment versus placebo was the odds ratio (OR). The odds of an event occurring are defined as the probability that the event will occur divided by the probability that the event will not occur. Thus the odds ratio comparing relapse for active treatment versus placebo is defined as the ratio of the odds of relapse under the two treatments (an OR of 1 means that the odds of relapse are identical regardless of treatment). In the analysis to assess relapse, studies were categorized into the following sub-groups: (i) pharmacological antidepressant class; (ii) elderly and non-elderly adult patients; (iii) diagnostic classification/depressive subtype; and (iv) duration of treatment pre- and post-randomization. The OR (and 95%CI) was estimated within each sub-group. Review Manager (version 5.0) was used to estimate Mantel-Haenszel ORs and 95% confidence intervals (CIs) using a fixed-effects method. The chi-squared test for identifying statistical heterogeneity between studies was also performed (a small p value for this test indicates evidence of heterogeneity).
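As an illustration of the definitions above, the following minimal Python sketch computes a single-study OR with an approximate 95%CI and a Mantel-Haenszel fixed-effects pooled OR. The function names and the study counts are hypothetical and are not taken from the included trials; the single-study interval uses the standard log-OR (Woolf) approximation, which is an assumption here and not necessarily the interval reported by Review Manager.

```python
from math import exp, log, sqrt

def odds_ratio(relapse_active, n_active, relapse_placebo, n_placebo):
    """OR of relapse for active treatment versus placebo in one study."""
    a, b = relapse_active, n_active - relapse_active      # active: relapsed / not relapsed
    c, d = relapse_placebo, n_placebo - relapse_placebo   # placebo: relapsed / not relapsed
    return (a * d) / (b * c)

def woolf_ci(relapse_active, n_active, relapse_placebo, n_placebo, z=1.96):
    """Approximate 95% CI for a single-study OR on the log scale (Woolf method)."""
    a, b = relapse_active, n_active - relapse_active
    c, d = relapse_placebo, n_placebo - relapse_placebo
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or - z * se), exp(log_or + z * se)

def mantel_haenszel_or(studies):
    """Fixed-effects Mantel-Haenszel pooled OR.

    Each study is (relapse_active, n_active, relapse_placebo, n_placebo).
    """
    numerator = denominator = 0.0
    for a, n1, c, n2 in studies:
        b, d = n1 - a, n2 - c
        total = n1 + n2
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator

# Hypothetical counts for illustration only.
example_studies = [(10, 100, 30, 100), (15, 150, 40, 150), (8, 60, 20, 60)]
print(odds_ratio(*example_studies[0]))
print(woolf_ci(*example_studies[0]))
print(mantel_haenszel_or(example_studies))
```

A pooled OR below 1 in this sketch corresponds to lower odds of relapse on active treatment than on placebo.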
The relationship between year of publication and OR was also examined using Spearman's rank order correlation coefficient.
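Spearman's coefficient is the Pearson correlation of the ranks of the two variables. A minimal Python sketch is given below; the function names and the year/OR pairs are hypothetical, chosen only to show the calculation, and tied values receive the mean of their ranks.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical publication years and study ORs, for illustration only.
years = [1975, 1984, 1992, 1999, 2006]
ors = [0.10, 0.30, 0.20, 0.50, 0.40]
print(spearman_rho(years, ors))
```

A positive coefficient in this setting means that later studies tend to report higher ORs.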
Furthermore, a secondary comparison was performed to assess the effect of using different statistical methods when estimating the OR (and 95%CI) for relapse (comparing treatment versus placebo) across all studies. These included fixed-effects (Mantel-Haenszel, Peto, Logistic Regression) and random-effects (Peto and DerSimonian-Laird) methods for estimating the OR from a meta-analysis. The Cochrane Collaboration open learning materials (http://www.cochrane-net.org/openlearning/HTML/mod0.htm) give considerable explanation of the statistical methods used in this paper, a brief summary of which is provided here. Fixed-effects meta-analyses make the mathematical assumption that there is one identical true treatment effect common to every study. Alternatively, random-effects models of meta-analysis do not assume that a common (‘fixed’) treatment effect exists. More specifically, in a random-effects meta-analysis there is no ‘fixed’ treatment-effect parameter to estimate; instead, there is a distribution of treatment effects, and the meta-analysis estimates the mean and standard deviation of that distribution. Thus, a random-effects assumption may be better when there is statistical heterogeneity between the studies. The Mantel-Haenszel methods have been shown to be more reliable with limited data (i.e. few small trials). The Peto method is also known to perform well with sparse data. The software packages used to implement the statistical methods described here are R (http://www.r-project.org/), SAS version 8 (SAS Institute, Cary, NC), and Comprehensive Meta Analysis (http://www.meta-analysis.com).
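For readers unfamiliar with the Peto method, a minimal Python sketch of the fixed-effects one-step estimator is given below. It pools the observed-minus-expected relapse counts in the active arm and their hypergeometric variances. The function name and the study counts are hypothetical; this is a sketch of the textbook formula, not the code used in the software packages listed above.

```python
from math import exp, sqrt

def peto_or(studies, z=1.96):
    """Fixed-effects Peto (one-step) pooled OR with an approximate 95% CI.

    Each study is (relapse_active, n_active, relapse_placebo, n_placebo).
    """
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, n1, c, n2 in studies:
        total = n1 + n2
        events = a + c
        expected = n1 * events / total  # expected active-arm relapses under no effect
        variance = n1 * n2 * events * (total - events) / (total ** 2 * (total - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    log_or = sum_o_minus_e / sum_v
    se = 1 / sqrt(sum_v)
    return exp(log_or), (exp(log_or - z * se), exp(log_or + z * se))

# Hypothetical counts for illustration only.
example_studies = [(10, 100, 30, 100), (15, 150, 40, 150), (8, 60, 20, 60)]
pooled, (lower, upper) = peto_or(example_studies)
print(pooled, lower, upper)
```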
The probability of an event (and consequently the estimated OR) can also be estimated from a logistic regression model. Briefly, a logistic regression model is a regression model which allows modelling of the natural log of the odds (assuming a fixed-effects model) as a linear function of predictors.
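One common way to write such a model for a meta-analysis is shown below. The specific predictors (a study-specific intercept and a treatment indicator) are an assumption made for illustration; the text does not state which predictors were used.

$$
\log\frac{p_{ij}}{1 - p_{ij}} = \alpha_i + \beta\, x_{ij}, \qquad \mathrm{OR} = e^{\beta}
$$

Here $p_{ij}$ is the probability of relapse for patient $j$ in study $i$, $\alpha_i$ is the intercept for study $i$, and $x_{ij}$ equals 1 for active treatment and 0 for placebo. Under these assumptions, a single coefficient $\beta$ shared by all studies corresponds to the fixed-effects assumption of one common treatment effect.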
The DerSimonian-Laird approach [8] for meta-analyses acknowledges that heterogeneity of treatment effects across studies is common and should be incorporated into the meta-analysis. This method estimates the magnitude of the heterogeneity and assigns a greater variability to the estimate of the overall treatment effect to account for this heterogeneity.
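The steps described above can be sketched in Python as follows. The sketch works on log ORs with inverse-variance weights, computes Cochran's Q as the heterogeneity statistic, derives the between-study variance (tau squared) by the method of moments, and re-weights the studies. Function names and study counts are hypothetical, and the sketch assumes at least two studies with no zero cells.

```python
from math import exp, log, sqrt

def log_or_and_variance(a, n1, c, n2):
    """Log OR and its approximate variance for one study (no zero cells assumed)."""
    b, d = n1 - a, n2 - c
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def dersimonian_laird(studies, z=1.96):
    """Random-effects pooled OR using the DerSimonian-Laird estimate of tau^2.

    Each study is (relapse_active, n_active, relapse_placebo, n_placebo).
    Requires at least two studies.
    """
    effects = [log_or_and_variance(*s) for s in studies]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]
    w = [1 / vi for vi in v]  # fixed-effects inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k = len(studies)
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / scale)  # between-study variance, floored at 0
    w_star = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    return {
        "or": exp(pooled),
        "ci": (exp(pooled - z * se), exp(pooled + z * se)),
        "q": q,
        "tau2": tau2,
    }

# Hypothetical, deliberately heterogeneous counts for illustration only.
example_studies = [(5, 100, 40, 100), (30, 100, 35, 100), (10, 100, 30, 100)]
print(dersimonian_laird(example_studies))
```

When the estimated tau squared is zero, the random-effects weights reduce to the fixed-effects weights and the two pooled estimates coincide.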
Results
The initial literature search identified 78 potentially appropriate studies for inclusion in this meta-analysis (Figure 2). Twenty-four studies were excluded on the basis that they did not meet the inclusion criteria, were duplicate presentations, and/or data were not in a usable form.

Study selection flow diagram.
There were 54 studies included in this review which included 9268 randomized patients [9–62] (Supplementary Table 1). In 45 studies [9–53] (7980 patients) a diagnosis of a primary depressive disorder was made. The remaining nine studies [54–62] (1288 patients) included other depressive states and included primary diagnoses of bipolar II, bipolar not otherwise specified, dysthymia, and/or double depression. In 31 of the 45 studies in primary depression, patients were diagnosed using Diagnostic and Statistical Manual criteria (DSM-III/-IIIR/-IV) [63–65] and the remaining 14 used other criteria for diagnosis (e.g. Research Diagnostic Criteria, Medical Research Council, Feighner). Of the nine studies in the mixed depressive state category, eight used DSM diagnostic criteria.
The studies included in this review used broadly similar criteria for defining response and relapse (see Supplementary Table 1). The Hamilton Depression Rating Scale (HAM-D) was most frequently used for defining response (28/54 studies), and scores for defining response varied based on the version used (e.g. 17 or 24 item) [66]. Studies using the Clinical Global Impression Scale (CGI) (18/54) typically used a response criterion of ≤2 [67]. The majority of studies using the Montgomery-Asberg Depression Rating Scale (MADRS) (11/54) used a response criterion of ≤12 [68]. HAM-D was also most frequently used for defining relapse (18/54 studies). Studies using CGI (13/54) typically used a score of ≥4 to define relapse. Most studies using MADRS used a relapse criterion of ≥22. Of the 54 studies, eight used clinical judgement (alone or in combination with other criteria) to determine relapse.
The initial analysis examined the effects of covariates (class of antidepressant, age of patients, diagnostic system/depressive subtype, and duration of treatment pre- and post-randomization) on relapse rates.
Antidepressant class
The most commonly used interventions were mixed serotonin and norepinephrine reuptake inhibitors (SNRIs; 21 studies, 2420 patients) and selective serotonin reuptake inhibitors (SSRIs; 21 studies, 4447 patients). There were two studies that tested selective norepinephrine reuptake inhibitors (NRIs; 1406 patients), five studies with monoamine oxidase inhibitors (MAOIs; 129 patients), and eight studies (866 patients) that assessed antidepressants with other pharmacologies (e.g. gepirone, mianserin, bupropion). All drug classes reduced the risk of relapse during continuation treatment (p <0.00001; Figure 3A). Individual ORs for each drug class ranged from 0.05 to 0.44, and the pooled OR (95%CI) was 0.38 (0.34–0.41). There was statistically significant heterogeneity between antidepressant classes (χ2 = 19.27, p <0.01). When the MAOI data were excluded, the test for heterogeneity was non-significant (χ2 = 1.11, p = 0.78).

Effect of (a) antidepressant drug class, (b) patient age, (c) diagnostic criteria/depressive subgroup, and (d) duration of antidepressant treatment pre- and post-randomization on relapse prevention.
Patient age
There were eight studies in elderly patients (775 patients) and 46 studies in non-elderly adult patients (8503 patients, Figure 3B). One study in adolescents [12] was not included in this analysis. Odds ratios were similar between elderly (OR = 0.30, 95%CI = 0.22–0.41) and non-elderly groups (OR = 0.39, 95%CI = 0.35–0.42). The statistical test for heterogeneity was non-significant (χ2 = 2.08, p = 0.15).
Diagnostic system/depression subtype
We examined the influence of diagnostic system/depressive subtype on relapse prevention. The OR for the 14 studies of patients with primary depression using earlier, non-DSM definitions of depression was lower than that for the 31 studies enrolling patients with DSM-III/IIIR/IV major depression (0.24 vs 0.39, with non-overlapping 95%CIs; Figure 3C). The OR for the nine studies enrolling patients with other depressive disorders was similar to that of studies enrolling DSM-diagnosed patients with major depression (0.43 vs 0.39; Figure 3C).
Treatment duration
Pre-randomization treatment duration was divided into three categories: short (≤ 2 months), intermediate (3–5 months) and long (≥ 6 months). The median and ranges of these three categories were short: 8 weeks (6–8); intermediate: 12 weeks (12–20), and long: 24 weeks (24–156), respectively. Duration of post-randomization continuation treatment was categorized as either short (1–6 months) or long (> 6 months). The median and ranges of these two categories were short: 24 weeks (7–24), and long: 52 weeks (26–156), respectively. Combining pre- and post-randomization duration categories gave five clusters of studies (there were no studies with short pre- and long post-randomization treatment). Odds ratios for all clusters were broadly similar, ranging between 0.31 and 0.49 (Figure 3D). Testing for heterogeneity was significant (χ2 = 10.43, p = 0.03). There was no apparent trend for duration of treatment pre- or post-randomization on ORs (e.g. ORs for short pre- and short post-randomization durations were similar to those in the long pre- and long post-randomization clusters (0.33 and 0.31, respectively)).
Year of publication and OR
There was a significant positive relationship between OR and year of publication (Spearman's r = 0.331, p = 0.01; Figure 4), indicating a tendency for earlier studies to have lower ORs.

Relationship between OR and year of publication for all studies. Spearman's r = 0.331, p = 0.01.
Comparison of statistical methods
All studies were included in a comparison of statistical methods on calculation of ORs and associated 95%CIs (Table 1). Random-effect methods produced slightly lower ORs than fixed-effect methods. However, ORs and their 95%CIs obtained using different statistical methods assuming a fixed-effect model were almost identical. This was also true in the case of different statistical methods assuming a random-effects model.
Odds ratios and 95% confidence intervals for all studies, calculated using fixed and random effects statistical models
Discussion
This meta-analysis of antidepressant drug trials in patients with depressive disorders identified a robust and consistent effect of treatment on relapse prevention. Continuing antidepressant treatment after initial response reduced the odds of relapse by approximately two thirds. These findings are consistent with two earlier meta-analyses [5,7], and with current treatment guidelines for depression [2,3] that support the recommendation that continuation antidepressant treatment is effective in preventing relapse following initial response. The present analysis extends these earlier findings in terms of the size of the database analysed (9268 patients, compared with ∼5000/study in earlier publications), and by specifically assessing the effects of patient age and type of diagnostic system/depressive subtype on relapse prevention by antidepressants.
The majority of assessed variables did not appear to influence the effect of antidepressants in preventing relapse of depression. With the exception of MAOIs, we did not identify differences in efficacy in relapse prevention between different classes of antidepressants. A recent analysis reported efficacy differences between some antidepressants in acute treatment trials [69]; however, no such trends were evident in relapse prevention trials, either between classes (Figure 3A; also [3]), or for agents within classes (data not shown). While it is possible that the lower OR noted for MAOIs indicates particular potency for this class of medication, it should be noted that all MAOI studies were older (pre-DSM-III-R; Supplementary Table 1), and there was a positive relationship between year of study publication and OR (Figure 4). This relationship may also explain the observation that studies enrolling depressed patients diagnosed with earlier, pre-DSM-III diagnostic criteria had lower ORs compared with later studies with patients diagnosed with DSM-III and later criteria (Figure 3C). It is not possible to determine whether this might reflect differences in population characteristics, site differences (e.g. older studies typically were carried out in academic settings), increasing placebo response rates over time [70], or other aspects that might influence initial or continuation responsiveness to antidepressants, or responsiveness in the placebo control arm.
The effects of continuation treatment in preventing depressive relapse were similar in elderly and non-elderly patients (Figure 3B). This augments the earlier observation that the acute efficacy of antidepressants is similar in elderly and in non-elderly adults [71]. The present findings, along with the single study in adolescents [12], underscore the importance of ensuring continuation treatment in all age groups of depressed patients.
One aspect of continuation antidepressant treatment that was not clarified by this meta-analysis is how long it should last. Evidence-based treatment guidelines recommend continuation antidepressant therapy for an additional 4–9 months after initial response [4,5]. Our analysis did not identify systematic OR differences based on duration of treatment pre- or post-randomization (Figure 3D), and this has been noted previously [5,7]. A parsimonious explanation for these consistent findings is that the overall treatment duration for all studies was sufficient to demonstrate the benefits of continuation therapy; i.e. that patients enrolled in the briefest studies in this analysis received an adequate duration of therapy to demonstrate the benefits of relapse prevention, and that longer treatment durations did not provide additional benefit. In the cohort with short pre- and short post-randomization treatment durations, 10/13 studies had a combined duration of treatment of 24–32 weeks (Table 1), which is in line with current treatment guidelines for prevention of relapse. Only three studies [10,40,48] had a total duration of treatment <24 weeks (12, 12 and 14 weeks, respectively), and their ORs for relapse prevention were 0.26, 0.22 and 0.71. Thus the active arms of almost all studies included in this analysis represent outcomes of a population who have been treated for an appropriate duration of time to prevent recurrence of an index episode.
Another novel aspect of this analysis is the comparison of different statistical methods for calculating ORs. OR estimates were generally consistent across the different statistical methods and software packages. The only consistent difference between OR estimates was between fixed-effect and random-effects approaches (ORs approximately 0.36 vs 0.33 respectively). This might suggest that there is heterogeneity of treatment effects across the studies.
Given the nature of relapse-prevention trial designs, it is possible that the higher relapse rate in placebo arms might reflect drug withdrawal rather than recurrence of the underlying depressive disorder. If this were the case, one might predict lower ORs in studies with abrupt discontinuation of drug compared with those where there was down-titration after randomization. Information on titration was reported in only approximately half of the studies (28/54), limiting the applicability of any findings. Of the studies that did provide this information, two reported an abrupt switch from active to placebo, and the remainder had titration durations of 1 to 16 weeks (median = 2.5 weeks). Inspection of Kaplan-Meier curves in studies that provided this information did not show an obvious increase in patients relapsing early in the post-randomization phase. The seven studies using fluoxetine were not considered in these comparisons because of this drug's long half-life. Kaymaz [7] explored this question using meta-regression on a subset of studies and did not find differences in ORs between studies with <1 and ≥1 week down-titration rates, suggesting that drug-placebo differences are not exaggerated by drug withdrawal in the placebo arms.
Overall, most studies reported statistical significance in favour of active treatment, which raises the possibility of publication bias (i.e. only studies with positive outcomes were published). Diligent searches were made of pharmaceutical company and non-commercial clinical trial databases in an attempt to discover all potential trials. Despite this, all studies included in this meta-analysis showed at least numerical advantage for active over placebo. Therefore it appears plausible that the high proportion of statistically significant studies reflects the robustness of this type of study design. The consistency of these findings may seem initially surprising, given that acute studies in major depression typically have high failure rates or show considerable heterogeneity in response rates. For example, in acute antidepressant trials, 40–70% of active treatment arms did not separate statistically from placebo [6]. The most likely explanation for the difference between acute and relapse prevention designs is the enrichment of patients entering the randomized relapse prevention phase. In contrast to acute trials, where there is greater population heterogeneity (true responders, placebo responders and non-responders), non-responders are identified and removed prior to randomization in relapse prevention designs (Figure 1). Thus, reduced population heterogeneity may account for the consistency of results.
While this review includes a large sample of well-characterized patients, there are several limitations to this analysis that must be acknowledged. As mentioned above, while this analysis shows a benefit for continuation therapy in treatment responders, it cannot identify an optimal duration of therapy or the most effective intervention. Additionally, in most studies patients were flexibly dosed pre-randomization, and this dose was maintained during the blinded randomization phase, so it is not possible to determine if dose might be a relevant factor in determining relapse or tolerability during continuation treatment. Some studies did not include details about the patients’ previous psychiatric history and/or the role of non-pharmacological interventions in their treatment (either past or present), both of which might contribute to the potential for relapse. Another possible source of heterogeneity may be the criteria used to define response and relapse, and the inclusion of patients diagnosed with primary depression by several diagnostic classification systems. Finally, as most studies did not permit enrolment of patients with co-morbid psychiatric disorders, the results reported may not be generalizable to more typical patients seen in clinical practice.
In conclusion, this meta-analysis of relapse prevention studies confirms the importance of continuation treatment in depressive disorders, in patients who have responded to initial antidepressant therapy. The magnitude of effect was similar for all classes of treatment and in adult and elderly patients, and was more robust than that noted in acute treatment trials, presumably reflecting population enrichment. The studies analysed could not further define the optimal conditions for continuation treatment (e.g. dose, duration, patient characteristics), and this will require further research.
