Abstract
Objective:
We investigated the comparative efficacy and tolerability of augmentation strategies for bipolar depression.
Data Sources:
We conducted a systematic review and network meta-analysis of 8 electronic databases for double-blind, randomized controlled trials of adjunctive pharmacotherapies for acute bipolar depression.
Data Extraction and Synthesis:
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines and applied the Cochrane risk of bias tool for study quality appraisal. Two reviewers independently abstracted data. We resolved all discrepancies by consensus.
Main Outcomes and Measures:
Primary outcomes were response and completion of treatment. We estimated summary rate ratios (RRs) and standardized mean differences (SMDs) relative to placebo controls using frequentist random-effects network meta-analysis.
Results:
We identified 69 trials meeting eligibility criteria (8,007 participants, 42.8 years, 58.0% female). Adjunctive racemic intravenous ketamine, coenzyme Q10, pramipexole, fluoxetine, and lamotrigine were more effective than placebo. Summary RRs for response ranged between 1.51 (95% confidence interval [CI], 1.11 to 2.06) for fluoxetine and 12.49 (95% CI, 3.06 to 50.93) for racemic intravenous ketamine. For completion of treatment, risperidone appeared less tolerable than placebo (RR = 0.59; 95% CI, 0.38 to 0.94), while fluoxetine seemed more tolerable than placebo (RR = 1.13; 95% CI, 1.02 to 1.24). None of the investigated agents were associated with increased treatment-emergent mood switches.
Conclusions and Relevance:
The evidence for augmentation strategies in bipolar depression is limited to a handful of agents. Fluoxetine appeared to have the most consistent evidence base for both efficacy and tolerability. There remains a need for additional research exploring novel treatment strategies for bipolar depression, particularly head-to-head studies.
Introduction
Bipolar disorder (BD) is a severe and persistent mental illness characterized by recurrent episodes of depression and mania (bipolar I disorder, BD-I) or hypomania (bipolar II disorder, BD-II). 1 The overall global prevalence of BD is approximately 1%, and population growth and aging are leading to an increasing burden from BD over time. 2 The most recent global estimates of the lifetime prevalence of BD-I, BD-II, and BD spectrum were 0.6%, 0.4%, and 1.4%, respectively. 3 Among people with BD, there is a high prevalence of psychiatric and medical comorbidities. 2 Due to its early onset, severity, and chronicity, BD is a primary cause of disability among young people, often leading to severe cognitive and functional impairment, and higher mortality—particularly death by suicide. 2
Accordingly, we must direct resources toward improving the coverage of evidence-based intervention strategies for BD. 4 In the United States, the total costs of BD-I were over $200 billion in 2015, corresponding to an average of roughly $80,000 per person. 5 There is also a need for improved diagnosis, 2 effective treatments, 6–8 identification of biomarkers, 9 and greater treatment access. 9
While all phases of BD can cause significant impairment, the depressive phase accounts for the most substantial proportion of the illness. The depressive phase of BD is also often the most challenging stage of the disease to treat. 10 Because depression is often the first episode of the disease, misdiagnosis and delays in treatment are common. Pharmacotherapies are the mainstay of BD treatment and represent the standard of care. Therefore, reliable estimates of comparative efficacy and acceptability are clinically and economically advantageous. However, several methodological problems and idiosyncrasies have introduced challenges in ascertaining comparative treatment performance. Chiefly, a shortage of head-to-head trials complicates efforts to support clinical decision-making in psychiatry. While several recent reviews and meta-analyses have attempted to synthesize the available evidence, there is still controversy about the comparative performance of augmentation strategies for managing bipolar depression. 11–13 Few previous reviews have explored add-on treatments or considered both antidepressants and “nonantidepressant” adjuncts for bipolar depression. 6,14,15
Fortunately, a novel approach can yield useful information about the relative performance of different therapies that have not entered head-to-head studies. 16 This method is called network meta-analysis (NMA). 6,16,17 In brief, an NMA is a meta-analysis of multiple treatments. In the absence of direct comparisons between all available pharmacotherapies, an NMA can synthesize all the possible direct and indirect evidence across trials. 18 Although NMA requires close similarity of compared trials, including their design and patient characteristics, it is a potentially powerful tool for understanding the comparative performance of treatments in psychiatry. 16
In our previous NMA, we demonstrated that divalproex, olanzapine, quetiapine, cariprazine, and lamotrigine were effective monotherapies for bipolar depression. 19 The present study aimed to determine the comparative effectiveness of adjunctive pharmacotherapies for acute bipolar depression.
Methods
Protocol and Registration
We registered this study with PROSPERO (CRD42019122172). We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) extension statement for reporting systematic reviews incorporating network meta-analyses. 20 We have provided a PRISMA checklist in Online Appendix 1.
Eligibility Criteria
We defined our eligibility criteria using the Population–Intervention–Comparison–Outcome–Study Design framework.
Population. We restricted eligibility to patients with a primary diagnosis of BD, currently in the depressed phase, established using diagnostic criteria such as the Diagnostic and Statistical Manual of Mental Disorders or the International Classification of Diseases (ICD). We excluded studies of participants in a nondepressed mood episode, including manic, hypomanic, and mixed states.
Intervention. The present study is an intentionally selective review of augmentation therapies for bipolar depression. As such, we excluded trials involving monotherapies or studies where there were no clearly defined augmentation strategies. To best inform clinical practice, we purposefully included multiple adjunctive psychotropics classes such as antidepressants, antipsychotics, mood-stabilizing agents, stimulants, N-methyl-
Comparison. We included trials involving either placebo or active comparator conditions. We excluded trials comparing medications to nonpharmacologic therapies such as neurostimulation or psychotherapies.
Outcomes. We restricted eligibility to studies reporting at least 1 measure of efficacy or acceptability of treatment (defined below).
Study designs. We restricted eligibility to randomized controlled trials (RCTs) to boost methodological rigor by minimizing performance and ascertainment biases. 21 We excluded prophylaxis or relapse prevention studies.
Information Sources
This review closely follows the approach taken by previous network meta-analyses of treatments for mood disorders. 6,14,15,17,19,22,23 We searched Cochrane Central Register of Controlled Trials, CINAHL and Pre-CINAHL, Embase, LILACS database, MEDLINE, PsycINFO, and PubMed from database inception to April 2020 with no language restrictions. We updated our search in July 2020. We supplemented our electronic search strategy by reviewing ongoing RCTs in the World Health Organization’s International Clinical Trials Registry Platform and ClinicalTrials.gov using the search term “bipolar depression.” Finally, we examined the reference lists of all eligible articles and previous reviews for additional studies.
Search
We have described our full search strategy in Online Appendix 2.
Study Selection
Three investigators (A.B., C.S., and D.E.) independently selected the studies, reviewed the main reports and supplementary materials, extracted the relevant information from the included trials, and assessed the bias risk. We resolved discrepancies by consensus and arbitration by a panel of investigators within the review team (A.B., C.S., D.E., E.H., and G.V.).
Data Collection Process
Three reviewers (A.B., C.S., and D.E.) independently extracted data using Cochrane’s Covidence, a web-based systematic review manager. 24 Where necessary, we contacted corresponding authors of articles to confirm data. We used a standardized instrument to extract information about authors, study objectives, sample characteristics, eligibility criteria, study design, experimental processes, treatment protocols, outcome variables, and analytic strategy.
Outcome Measures
Our primary outcomes were response and acceptability. We defined response as the proportion of study participants who achieved a reduction of at least 50% from their baseline depression severity at the primary study end point. We considered any depression instrument for this purpose, such as the Montgomery–Åsberg Depression Rating Scale (MADRS) 25 or the Hamilton Depression Rating Scale (HDRS). 26 When response data were not reported or available in supplemental material, we calculated response using a previously validated imputation method. 27 We defined acceptability as the proportion remaining in the study until its primary end point.
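The response definition above can be illustrated with a minimal sketch. The function names and the example scores below are hypothetical and are not drawn from any included trial; the sketch only shows how a 50% reduction criterion translates into a response proportion.

```python
def is_responder(baseline: float, endpoint: float) -> bool:
    """Return True when the end point score is at least 50% below baseline."""
    if baseline <= 0:
        raise ValueError("baseline severity must be positive")
    return (baseline - endpoint) / baseline >= 0.5


def response_rate(scores) -> float:
    """Proportion of (baseline, endpoint) pairs meeting the response criterion."""
    flags = [is_responder(b, e) for b, e in scores]
    return sum(flags) / len(flags)


# Hypothetical depression scores (baseline, end point) for four participants.
example_scores = [(30, 14), (28, 20), (32, 16), (26, 25)]
example_rate = response_rate(example_scores)  # two of four participants respond
```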
Our 5 secondary outcomes were as follows: (1) remission, (2) reduction in depression severity, (3) all-cause treatment discontinuation, (4) discontinuation due to adverse events, and (5) treatment-emergent mood switches. We defined remission as the proportion with a depression severity of
Risk of Bias within Individual Studies
Two coauthors (A.B. and C.S.) independently appraised all included trials against the Cochrane risk of bias tool for RCTs. 28 Briefly, this tool considers 6 domains of bias: randomization, concealment of allocation, blinding, loss to follow-up, selective reporting, and other sources. We assigned a rating of “low,” “high,” or “unclear” risk of bias to each domain. We set an overall risk of bias classification based on the count of “high risk of bias” domains per study: studies with zero high-risk domains had a “low overall risk,” those with 1 or 2 high-risk domains had a “moderate overall risk,” and those with 3 or more high-risk domains had a “high overall risk.”
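The overall classification rule described above can be sketched as follows. The domain keys and function name are illustrative assumptions; only the counting rule (zero, 1 to 2, or 3 or more high-risk domains) comes from the text.

```python
# Six domains named in the text; the key spellings are hypothetical.
DOMAINS = (
    "randomization",
    "allocation_concealment",
    "blinding",
    "loss_to_follow_up",
    "selective_reporting",
    "other",
)


def overall_risk(ratings: dict) -> str:
    """Map per-domain ratings ('low', 'high', 'unclear') to an overall rating."""
    high_count = sum(1 for domain in DOMAINS if ratings[domain] == "high")
    if high_count == 0:
        return "low"
    if high_count <= 2:
        return "moderate"
    return "high"


# Hypothetical trial with two high-risk domains.
example = dict.fromkeys(DOMAINS, "low")
example["blinding"] = "high"
example["loss_to_follow_up"] = "high"
example_rating = overall_risk(example)
```

Note that under this rule "unclear" ratings do not raise the overall classification; only "high" ratings are counted.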
Summary Measures
We performed our analyses on an intention-to-treat basis using data derived at the primary study end point. For each pairwise comparison of dichotomous outcomes (response, remission, completion of treatment, all-cause dropouts, withdrawal due to adverse events, and affective switch), we calculated rate ratios (RRs) with their 95% confidence intervals (CIs). For each pairwise comparison of continuous outcomes (e.g., reduction in the severity of depression), we calculated standardized mean differences (SMDs) with their 95% CI. We assumed a two-sided P < 0.05 to indicate statistical significance.
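As an illustration of these summary measures, the sketch below computes an RR with a 95% CI on the log scale and an SMD using a pooled standard deviation. It assumes the RR is a ratio of proportions and uses the standard large-sample standard error; the exact estimators applied by the analysis software may differ, and the function names and inputs are hypothetical.

```python
import math


def rate_ratio(events_t, n_t, events_c, n_c, z=1.96):
    """RR (treatment vs. control) with a large-sample CI on the log scale.

    No continuity correction is applied, so event counts must be nonzero.
    """
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """SMD (Cohen's d form) using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


# Hypothetical trial: 30/60 responders on active treatment vs. 20/60 on placebo.
rr, lower, upper = rate_ratio(30, 60, 20, 60)
# Hypothetical end point severity: mean 10 (SD 4) vs. mean 14 (SD 4), n = 20 per arm.
smd = standardized_mean_difference(10, 4, 20, 14, 4, 20)
```

In this hypothetical example the CI spans 1, so the comparison would not meet the two-sided P < 0.05 threshold despite an RR of 1.5.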
Planned Methods of Analysis
We conducted all statistical analyses in the open-source statistical environment R (version 3.5.1), run in RStudio. We used the pairwise function to transform our data into a contrast-based format and the netmeta package to conduct network meta-analyses (Online Appendix 3). 29 The netmeta package uses a frequentist random-effects model, which we selected to preserve randomization across trials. 18 We opted for the random-effects model to account for high between-study heterogeneity. We assumed a jointly randomizable network of pharmacotherapies, where eligible study participants were equally likely to be randomized to any of the interventions in the comparator set.
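To convey the idea behind a random-effects model, the following sketch pools effect sizes for a single pairwise comparison using the DerSimonian-Laird estimator of between-study variance. This is a simplified stand-in chosen for illustration; it is not the network model fitted by netmeta, and the function name and inputs are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate for one pairwise comparison.

    effects: per-study effect sizes (e.g., log RRs); variances: their variances.
    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w**2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2


# Two hypothetical studies with discordant effects and equal precision.
pooled, se, tau2 = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
```

When studies disagree more than their sampling error would predict, tau^2 grows and the pooled standard error widens, which is why a random-effects model is the more conservative choice under high heterogeneity.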
Assessment of Transitivity
NMA enables the indirect comparison of treatments that have not yet entered head-to-head trials using a common comparator (e.g., placebo). 30 Such comparisons assume transitivity, 30 which means that there are no systematic differences between the available comparisons other than the compared treatments. 31 To reduce intransitivity, we excluded RCTs evaluating monotherapies for bipolar depression. We abstracted information on potential effect modifiers that could violate the transitivity assumption, including population characteristics, treatment resistance, study design, risk of bias, participant age, and baseline depression severity. We conducted a qualitative synthesis to assess sources of clinical and methodological heterogeneity. Finally, we quantified heterogeneity using forest plots and the I² statistic. 32 We considered I² values below 50% to indicate low heterogeneity, values between 50% and 75% moderate heterogeneity, and values above 75% high heterogeneity. 33
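Two ideas from this section can be sketched briefly: an indirect comparison of two agents through a common comparator, and the I² thresholds. The indirect estimate below is the simple adjusted indirect comparison on the log scale, shown for illustration only; it is not the full network model, and all names and inputs are hypothetical.

```python
import math


def indirect_log_rr(log_rr_a_vs_p, se_a, log_rr_b_vs_p, se_b):
    """Indirect log RR of A vs. B via a common comparator P (e.g., placebo)."""
    estimate = log_rr_a_vs_p - log_rr_b_vs_p
    se = math.sqrt(se_a**2 + se_b**2)  # variances add for independent trials
    return estimate, se


def i_squared(q: float, df: int) -> float:
    """I² (%) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def classify_heterogeneity(i2: float) -> str:
    """Apply the thresholds stated in the text (<50, 50 to 75, >75)."""
    if i2 < 50:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "high"


# Hypothetical inputs: A vs. placebo RR 1.5, B vs. placebo RR 1.2.
est, se = indirect_log_rr(math.log(1.5), 0.3, math.log(1.2), 0.4)
indirect_rr = math.exp(est)
```

The indirect standard error is always larger than either direct one, which is one reason indirect evidence alone tends to be imprecise.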
Assessment of Consistency
We considered mixed evidence for each comparison in the network as we analyzed all available direct and indirect evidence. 30 Such syntheses assume consistency, which is the degree of congruence between direct and indirect evidence. We used 2 methods to compare a conventional NMA model assuming consistency with a model that does not assume consistency (i.e., a series of pairwise meta-analyses analyzed jointly). 34,35 We utilized the decomp.design command, which provides Q-statistics for between-study heterogeneity; this functions as a measure of consistency.
Risk of Bias across Studies
To evaluate the overall network quality and risk of bias, we followed the Grading of Recommendations Assessment, Development and Evaluation recommendations. We evaluated imprecision by the width of CIs for each effect size estimate. We assessed publication bias by inspecting funnel plots of the trial effect sizes for each outcome. 36 We assessed funnel plot symmetry with Egger’s, 37 adjusted rank correlation, and regression asymmetry tests. 38,39 For asymmetric plots, we applied the trim and fill method, acknowledging that other factors, such as trial quality or study heterogeneity, could reduce plot symmetry. 38,39
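For readers unfamiliar with funnel plot asymmetry testing, the sketch below shows the classic form of Egger's regression: the standardized effect (effect divided by its standard error) is regressed on precision (the reciprocal of the standard error), and an intercept far from zero suggests small-study effects. This is an illustrative implementation with hypothetical names and data, not the routine used in our analyses.

```python
import math


def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope, standard error of the intercept).
    """
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are required")
    x = [1.0 / s for s in std_errors]                   # precision
    y = [e / s for e, s in zip(effects, std_errors)]    # standardized effect
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = residual_ss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / n + mean_x**2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical pattern in which smaller (less precise) studies report larger effects.
intercept, slope, se_intercept = egger_regression(
    [0.9, 0.7, 0.5, 0.3, 0.2], [0.5, 0.4, 0.3, 0.2, 0.1]
)
```

The ratio of the intercept to its standard error can be compared against a t distribution with n − 2 degrees of freedom; with few studies per agent, such a test has little power.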
Additional Analyses
To assess the stability of the network, sensitivity of the results, and unexplained heterogeneity, we conducted a series of post hoc subgroup analyses for the following variables: BD subtype (BD-I only, BD-II only), treatment resistance, study sample size (N > 49), multisite studies (i.e., excluding small proof-of-concept trials), and intervention class (antidepressant only, antipsychotic only).
Results
Study Selection
The systematic search provided a total of 4,130 unique citations (Figure 1). We identified 70 trials comprising 8,007 patients (58.0% female), including 1 unpublished study, NCT00562861 40 (Online Appendix 4).

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses systematic review flow diagram.
Summary of Network Geometry
After we excluded closed-loop networks from 1 study (Juruena et al., 2009), 69 trials—including 50 interventions and 91 comparisons—were eligible for NMA (Figure 2).

Figure 2. Network graph of the included studies to enable visualization of the geometry of the treatment network.
Study Characteristics
Most trials were from the United States (n = 45), Europe (n = 13), and Canada (n = 5). The mean study sample size was 116 (SD = 131). A total of 4,414 participants received adjuvant pharmacotherapy, while 3,593 received placebo (Table 1). The mean age was 42.2 years (SD = 6.3), while the median treatment duration was 8 weeks (range 6 to 12 weeks).
Table 1. Summary of Characteristics of Randomized Controlled Trials for Bipolar Depression Treatment.
Note. N = 70. DSM = Diagnostic and Statistical Manual of Mental Disorders (third, fourth, fifth editions); TR = text revision; I = bipolar I disorder; II = bipolar II disorder; NOS = bipolar disorder not otherwise specified; I/P = inpatient; O/P = outpatient; HDRS = Hamilton Depression Rating Scale; MADRS = Montgomery–Åsberg Depression Rating Scale; QIDS-SR16 = Quick Inventory of Depressive Symptoms Self-Report, 16 items; IDS-C = Inventory of Depressive Symptoms; IU = international units; CDRS = Childhood Depression Rating Scale; SUM-D = Clinical Monitoring Form, Depression Subscale; AGO = agomelatine; AMI = amisulpride; ARI = aripiprazole; ARM = armodafinil; ASA = aspirin; BW = bodyweight; BD = bipolar disorder; BUP = bupropion; CEL = celecoxib; CIT = citalopram; CRE = creatine; DES = desipramine; DEX = dextromethorphan; D3 = vitamin D; EPA = eicosapentaenoic acid; FLX = fluoxetine; ICD = International Classification of Disease; IMI = imipramine; INF = infliximab; INO = inositol; KET = intravenous racemic ketamine; LAM = lamotrigine; LEV = levetiracetam; LIS = lisdexamfetamine; LUR = lurasidone; MEM = memantine; MINO = minocycline; MOC = moclobemide; NAC = N-acetylcysteine; Omega-3 = omega-3 fatty acids; PAR = paroxetine; PBO = placebo; PIN = pindolol; PIO = pioglitazone; PRAM = pramipexole; PREG = pregnenolone; QUE = quetiapine; Q10 = coenzyme Q10; RISP = risperidone; SAMe = S-adenosyl methionine; SERT = sertraline; TOP = topiramate; TRAN = Tranylcypromine; T3 = triiodothyronine; T4 = levothyroxine; VEN = venlafaxine; ZIP = ziprasidone.
a For both ketamine studies (Zarate 2012 and Diazgranados 2010), which were double-blind, randomized, crossover controlled trials, study participants received a single intravenous infusion of either ketamine hydrochloride (0.5 mg/kg) or placebo on 2 test days, 2 weeks apart. Given the nature of the treatment, we reported the maximal proportion who responded to ketamine and to placebo at some point during the trial rather than a fixed proportion at the 2-week mark.
Treatment Characteristics
There were 45 distinct agents across trials, which are described further in Online Appendix 5:
Antidepressants (n = 12): agomelatine, amitriptyline, bupropion, citalopram, desipramine, imipramine, fluoxetine, moclobemide, paroxetine, sertraline, tranylcypromine, and venlafaxine;
Mood stabilizers (n = 4): lamotrigine, levetiracetam, topiramate, and lithium;
Antipsychotics (n = 6): quetiapine, ziprasidone,
Stimulants (n = 2): lisdexamfetamine and armodafinil;
NMDA glutamate receptor antagonists (n = 3): racemic intravenous ketamine, dextromethorphan, and memantine; and
Other agents (n = 18): aspirin, celecoxib, coenzyme Q10, combination nutraceutical, creatine, infliximab, inositol,
Risk of Bias within Studies
The overall risk of bias was “low” for 14 RCTs, “moderate” for 48 trials, and “high” for 7 trials (Online Appendix 6).
Synthesis of Results
Primary outcomes
Racemic intravenous ketamine (RR = 12.49; 95% CI, 3.06 to 50.93), coenzyme Q10 (RR = 5.96; 95% CI, 2.03 to 17.48), pramipexole (RR = 4.17; 95% CI, 1.32 to 13.18), fluoxetine (RR = 1.51; 95% CI, 1.11 to 2.06), and lamotrigine (RR = 1.43; 95% CI, 1.00 to 2.04) had greater response rates than placebo (Figure 3). Retention in treatment, relative to placebo, was better with fluoxetine and worse with risperidone (Figure 4).

Figure 3. Contrast plots for rate ratios of depression response at primary study end point versus placebo.

Figure 4. Contrast plots for rate ratios of completion of treatment at primary study end point versus placebo.
Secondary Outcomes
Remission. Intravenous racemic ketamine (RR = 4.92; 95% CI, 1.07 to 22.71), celecoxib (RR = 3.30; 95% CI, 1.40 to 7.80), and fluoxetine (RR = 1.71; 95% CI, 1.14 to 2.59) were more effective than placebo (Online Appendix 7).
Reduction in depression severity. T3 (SMD = −4.18; 95% CI, −7.01 to −1.35), intravenous racemic ketamine (SMD = −2.23; 95% CI, −3.88 to −0.59), and fluoxetine (SMD = −2.15; 95% CI, −3.14 to −1.16) were more effective than placebo.
All-cause treatment discontinuation. Fluoxetine (RR = 0.76; 95% CI, 0.64 to 0.92) was more tolerable than placebo.
Discontinuation due to adverse events. Combination nutraceutical (RR = 2.32; 95% CI, 1.01 to 5.35), moclobemide (RR = 3.42; 95% CI, 1.10 to 10.59), imipramine (RR = 3.28; 95% CI, 1.65 to 6.52), and quetiapine extended release (RR = 8.00; 95% CI, 1.04 to 61.62) led to greater discontinuations from adverse events relative to placebo.
Treatment-emergent mood switches. None of the investigated adjuvants were more likely to induce a treatment-emergent manic or hypomanic episode relative to placebo. Overall, the mean rate of treatment-emergent switching was 2.68 episodes per study (SD = 3.51; range: 0 to 20); for placebo and active arms, the mean rates were 2.73 and 2.65 episodes per study (P = 0.78).
Exploration for Intransitivity and Inconsistency
We quantified transitivity and inconsistency for each outcome measure (Online Appendix 8). Heterogeneity estimates were only significant for response, remission, and depression severity reduction but not for the completion of treatment, all-cause discontinuation, discontinuation due to adverse events, or treatment-emergent mania.
Risk of Bias across Studies
While there was no overall evidence of network publication bias for any of the outcomes, we could not explore publication bias for individual agents (Online Appendix 9).
Additional Analyses
BD-I subgroup analysis. Coenzyme Q10 and fluoxetine were significantly better than placebo in terms of response. Fluoxetine was also substantially better than placebo for completion of treatment, remission, and reduction in depression severity. Imipramine was significantly worse than placebo in terms of all-cause treatment discontinuation.
BD-II subgroup analysis. Pramipexole demonstrated a superior reduction in depression severity relative to placebo; no other agents separated from the placebo.
Multisite RCT subgroup analysis. After restricting the analyses to multisite RCTs, fluoxetine and lamotrigine appeared to show superiority over placebo for both response and remission. Fluoxetine also appeared to demonstrate superiority for the completion of treatment and reduction in the severity of depression. Risperidone and N-acetylcysteine were worse than the placebo in response to treatment and completion of treatment, respectively.
Treatment resistance subgroup analysis. We restricted our analyses to consider trials involving participants who did not have treatment-resistant bipolar depression. Fluoxetine demonstrated superiority over placebo for the response to treatment, the completion of treatment, and reduced depression severity. Pindolol also demonstrated superiority for both response and remission, while coenzyme Q10 appeared to outperform placebo for response to treatment. Several agents seemed to cause significantly more dropouts due to adverse events including moclobemide, imipramine, and quetiapine extended release.
Mood stabilizer subgroup analysis. When lithium was the primary mood stabilizer, fluoxetine demonstrated superiority for all efficacy outcomes, while imipramine reduced depression severity. Fluoxetine and lamotrigine were also effective in combination with second-generation antipsychotics for remission from depression.
Discussion
Summary of Evidence
To the best of our knowledge, this review provides the most comprehensive appraisal of the comparative performance of adjunctive pharmacotherapies for acute bipolar depression. In terms of efficacy, adjunctive racemic ketamine, coenzyme Q10, pramipexole, fluoxetine, and lamotrigine appeared to outperform placebo, while fluoxetine was the best-tolerated agent.
While only a handful of agents appeared to demonstrate superiority over placebo, our findings are consistent with the 2018 Canadian Network for Mood and Anxiety Disorder Treatments and International Society of Bipolar Disorders guidelines, which provide an excellent summary of the extant literature. 41 With this in mind, our review’s finding that fluoxetine demonstrated consistent evidence as an augmentation strategy for bipolar depression across outcomes is congruent with these guidelines. Our review also found that risperidone was less tolerable than a placebo, which agrees with prior reviews for BD. While our study did not measure specific side effects apart from treatment-emergent affective switching, the most commonly reported adverse events of risperidone, such as extrapyramidal symptoms, metabolic syndrome, and sedation, may have contributed to this finding.
For other agents, however, our findings are somewhat contradictory and may be surprising to clinicians. We must first emphasize that our results pertain to augmentation strategies in acute bipolar depression rather than to monotherapies, which we examined in our previous review. 19 Our conclusions may be valid; however, they may also stem from our decision to pool RCTs that may have been dissimilar in unmeasured ways. For example, in the 2 ketamine trials, 42,43 which were double-blind, crossover RCTs for treatment-resistant bipolar depression, study participants received a single intravenous infusion of either ketamine hydrochloride (0.5 mg/kg) or placebo on 2 test days 2 weeks apart. The decision to include these 2 trials alongside parallel RCTs involving oral medications may have contributed to some instability in our overall network estimates.
To address this issue, we conducted additional analyses to delineate the impact of known effect modifiers such as the BD subtype. For BD-I, coenzyme Q10 and fluoxetine outperformed placebo, while in BD-II, pramipexole was more effective at reducing depression severity. While these differences may be due to a relative shortage of trials focusing exclusively on BD-II, they may also reflect biological differences. For example, BD-I patients appear to demonstrate greater trait impulsivity and lifetime aggression, while BD-II patients appear to score higher on measures of hostility. 44 C-reactive protein appears to show some promise as a differential biomarker of BD-II depression over BD-I. 45 In large population-based samples of BD-I and BD-II, antidepressant use seems to be higher among people with BD-II. 46 There is also some evidence to suggest that BD-II patients show a slower response to treatments than patients with BD-I. 47 This latter point is of particular relevance given the relative shortage of effective agents identified by our review, which focused on the acute treatment phase.
As fluoxetine demonstrated consistent superiority relative to placebo, we explored for a potential class effect for antidepressants and selective serotonin reuptake inhibitors (SSRIs). However, other SSRIs and other antidepressants (bupropion, the tricyclic antidepressants, the monoamine oxidase inhibitors, venlafaxine, and agomelatine) failed to demonstrate superiority over placebo. While it remains unclear why compounds with similar biological effects did not perform similarly across RCTs in the present NMA, a relative shortage of data might be a reasonable explanation for the absence of superiority over placebo for individual agents. For example, fluoxetine had the largest pooled sample size across antidepressants (n = 529). Although we identified several individual RCTs involving paroxetine (k = 7) and imipramine (k = 5), the overall pooled sample sizes for these agents were still only a fraction of that for fluoxetine. Thus, our analysis may have had more power to detect significance for fluoxetine.
Conversely, for agents with more trials and larger overall sample sizes, a finding of no effect might be a more reasonable explanation. Still, our conclusions are consistent with most of the individual RCTs. For example, the Yatham et al. 41 RCT involving adjunctive agomelatine, which had a total sample size of 344, did not demonstrate superiority over placebo. While there may be a need for more data involving nonfluoxetine antidepressants, it is also possible that there are idiosyncrasies for individual agents.
Another methodological problem worth noting is that some studies lumped together 2 or more antidepressants within a class without separating their effects. For example, the Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD) trial from 2007 did not differentiate outcomes with paroxetine versus bupropion. 48 A related problem is the notorious spate of failed versus negative RCTs in bipolar depression, such as the 2 failed aripiprazole trials, 49 the 2 failed ziprasidone trials, 50 and the 5 lamotrigine trials that all suffered more from high placebo response rates than from a failure to demonstrate the intrinsic value of a particular treatment. 51 This methodological dilemma is important, given that the absence of evidence could become conflated with evidence of absence. 52
Treatment-emergent Mood Switches
As our review investigated agents that were adjunctive to a primary mood stabilizer, it was not surprising that none of the medications increased short-term rates of affective switching. However, the long-term risk for cycle acceleration is an entirely separate issue, which our study did not address. A previous study explored long-term affective switches in depressed individuals treated with lithium plus imipramine, lithium alone, or imipramine alone. 53 Therein, combination treatment provided no advantage over imipramine alone, with the lithium carbonate–treated group having fewer manic episodes than the other groups. 53 Given the controversy over using antidepressants for bipolar depression, the present NMA’s results alone leave unanswered questions about the safety and wisdom of continuing an intervention beyond the acute phase. 54–59
Strengths and Limitations
To our knowledge, the present review is the most comprehensive review of adjunctive treatments for bipolar depression. We identified a substantial evidence base through an exhaustive search strategy, incorporated an array of outcome measures, and conducted several analyses. The inclusion of active comparator conditions and a wide variety of treatments approximates real-world clinical conditions. However, our study has several limitations.
While our study considered both relative (e.g., response) and absolute measures (e.g., remission, depression severity) of efficacy, this also led to some inconsistencies in our findings. For example, racemic intravenous ketamine appeared to have a larger response rate than fluoxetine; however, the accompanying absolute measure provided a more modest effect size. Conversely, some agents, such as coenzyme Q10 and pramipexole (for response) and T3 (for depression severity), demonstrated superiority for select outcomes only. We observed more consistency in our findings for acceptability measures, with fluoxetine and risperidone emerging as the most and least acceptable agents, respectively. Unmeasured and measured effect modifiers may have influenced these discrepancies. While meta-regression can adjust for measured effect modifiers, we could not run these analyses because there were too few trials per comparison to render them meaningful at the aggregate level. 60,61 Combining trials across several decades, including older trials such as the Banki 1977 RCT, may have increased unmeasurable heterogeneity that could not be accounted for by statistical techniques. Similarly, extensive subgroup analyses were limited by an insufficient number of shared comparator agents across trials. Although it is unclear whether the trial duration is a treatment effect modifier, 62 these differences may have biased our results and subsequent interpretation of the findings.
While NMA can pool trials to enhance the overall sample, several of the “best agents” identified by our review, except fluoxetine, were based on the results of small proof-of-concept studies. Combining phase II and III trials favors phase II studies because effect size consistently decreases with rising sample size. For example, in the case of T3, which appeared to be one of the best augmentation strategies, the evidence was based on a single trial from 1977. Only 11 participants received T3. Similarly, the results for coenzyme Q10 came from a single proof-of-concept study. Likewise, intravenous racemic ketamine results came from 2 very-short-term (2-week) proof-of-concept studies with a total of only 33 participants receiving ketamine. Ketamine also has a known analytic problem of very high functional unblinding that increases the response rates of intravenous ketamine acutely. Thus, combining results from small, single-site, proof-of-concept studies with larger, multisite RCTs reduced the basis for gauging the rigor and robustness of findings reported from one study to another. Furthermore, there may have been publication bias for coenzyme Q10, intravenous racemic ketamine, and T3, as only small studies with large effect sizes were reported.
The degree to which NMA can reasonably be expected to alleviate heterogeneity is limited. However, we conducted several subgroup analyses, including 1 for multisite trials (Online Appendix 10), which demonstrated the superiority of fluoxetine and lamotrigine (for both response and remission). Findings that were consistent across subgroup analyses appeared to have a more robust evidence base. Differences between analyses arise to the degree that outcome moderators are unbalanced across trials; such moderators include BD subtype, 63 subthreshold mixed features, 64 rapid cycling, variable episode number, medication dosing, and the use of concomitant pharmacotherapies. While we could not account for all of these, our NMA did explore the impact of BD subtype; however, some other examples are worth noting. In the Nemeroff et al. study, 65 the effect of adjunctive paroxetine was moderated (post hoc) by lithium levels. In the Cohn et al. study, 66 fluoxetine was more efficacious when combined with olanzapine 67 but not lithium. Yet, we found that lithium plus fluoxetine was significantly better than lithium alone when we pooled outcomes.
The population of interest in this NMA was adults with acute bipolar depression, which still represents a heterogeneous population. While we did not identify or include any trials involving only pediatric (<18 years) or geriatric (>65 years) samples, 2 studies included mixed age groups; we did not exclude these trials as they contained adult participants. The Detke et al. study had a median age of 15 years, with some participants up to 19 years of age; the Zeinoddini et al. (2018) study had a mean age of 68.2 years. Sensitivity analyses that excluded either study or both from the network did not cause significant changes in our outcome estimates. Relatedly, we included trials with treatment-resistant and rapid-cycling populations. While only a handful of trials included such patients, fluoxetine, coenzyme Q10, and pindolol demonstrated superiority over placebo for the non-treatment-resistant population.
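The logic of a sensitivity analysis that excludes one or both studies can be sketched as follows. This is a simplified pairwise illustration using DerSimonian-Laird random-effects pooling of log risk ratios, not the frequentist network model used in this review. The trial names, effect sizes, and standard errors are hypothetical, as are the helper names `pool_random_effects` and `pooled_rr_excluding`.

```python
import math

def pool_random_effects(log_rrs, ses):
    """DerSimonian-Laird random-effects pooled log RR and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_rrs)) / total
    # Cochran's Q and the method-of-moments between-study variance (tau^2).
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, log_rrs)) / re_total
    return pooled, math.sqrt(1.0 / re_total)

# Hypothetical trials (assumptions): name -> (log risk ratio, standard error).
trials = {
    "Trial A": (math.log(1.6), 0.20),
    "Trial B": (math.log(1.4), 0.25),
    "Trial C": (math.log(2.5), 0.60),
    "Trial D": (math.log(1.2), 0.30),
}

def pooled_rr_excluding(trials, excluded=()):
    """Pooled RR and 95% CI after dropping the named trials."""
    kept = [v for name, v in trials.items() if name not in excluded]
    pooled, se = pool_random_effects([y for y, _ in kept], [s for _, s in kept])
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
    )

full = pooled_rr_excluding(trials)
# Exclude either of two trials, then both, mirroring the analysis described.
sensitivity = {
    excluded: pooled_rr_excluding(trials, excluded)
    for excluded in [("Trial C",), ("Trial D",), ("Trial C", "Trial D")]
}

print(f"all trials: RR = {full[0]:.2f} ({full[1]:.2f} to {full[2]:.2f})")
for excluded, (rr, lower, upper) in sensitivity.items():
    print(f"without {', '.join(excluded)}: RR = {rr:.2f} ({lower:.2f} to {upper:.2f})")
```

If the pooled estimates and intervals change little across these exclusions, the result is not driven by the excluded trials.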
Future Research
Larger, longer phase III studies are needed to replicate findings from small, single-site, proof-of-concept trials beyond the acute treatment window. Standardizing future RCT outcome measures and trial durations would support comparisons of findings across studies. There is also a need for studies that will refine our understanding of the pathophysiology of bipolar depression and its treatment, such as differential activation and involvement of monoaminergic receptors that may account for individual differences in treatment response. 68 The use of functional (rather than symptom-based) depression outcomes and biological measures of depression, such as brain-derived neurotrophic factor plasma levels, may correlate with more meaningful improvements in depression and complement clinical data. Finally, there is a need for more research among special populations such as children, adolescents, older adults, and people with concurrent disorders.
Conclusions
While several agents, such as racemic intravenous ketamine and fluoxetine, appear potentially effective and well-tolerated as acute augmentation strategies for bipolar depression, only the conclusions for fluoxetine seem reasonable given the replication in phase III trials that appropriately compare to other included studies. Based on this review, the most consistent evidence for augmentation strategies in bipolar depression is limited to fluoxetine. Given the extent of these findings, there remains a need for additional research into effective augmentation strategies for bipolar depression.
Supplemental Material
Supplemental Material, sj-docx-1-cpa-10.1177_0706743720970857 - Comparative Efficacy and Tolerability of Adjunctive Pharmacotherapies for Acute Bipolar Depression: A Systematic Review and Network Meta-analysis
Supplemental Material, sj-docx-1-cpa-10.1177_0706743720970857 for Comparative Efficacy and Tolerability of Adjunctive Pharmacotherapies for Acute Bipolar Depression: A Systematic Review and Network Meta-analysis by Anees Bahji, Dylan Ermacora, Callum Stephenson, Emily R. Hawken and Gustavo Vazquez in The Canadian Journal of Psychiatry
Supplemental Material, sj-pdf-1-cpa-10.1177_0706743720970857 for Comparative Efficacy and Tolerability of Adjunctive Pharmacotherapies for Acute Bipolar Depression: A Systematic Review and Network Meta-analysis by Anees Bahji, Dylan Ermacora, Callum Stephenson, Emily R. Hawken and Gustavo Vazquez in The Canadian Journal of Psychiatry
Supplemental Material, sj-pdf-2-cpa-10.1177_0706743720970857 for Comparative Efficacy and Tolerability of Adjunctive Pharmacotherapies for Acute Bipolar Depression: A Systematic Review and Network Meta-analysis by Anees Bahji, Dylan Ermacora, Callum Stephenson, Emily R. Hawken and Gustavo Vazquez in The Canadian Journal of Psychiatry
Supplemental Material, sj-pdf-3-cpa-10.1177_0706743720970857 for Comparative Efficacy and Tolerability of Adjunctive Pharmacotherapies for Acute Bipolar Depression: A Systematic Review and Network Meta-analysis by Anees Bahji, Dylan Ermacora, Callum Stephenson, Emily R. Hawken and Gustavo Vazquez in The Canadian Journal of Psychiatry
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
