Abstract
Introduction
Bipolar depression is disabling and often inadequately responsive to medication alone. The current efficacy evidence of transcranial magnetic stimulation (TMS) for bipolar depression is conflicting. Therefore, we synthesized randomized controlled trials (RCTs) that tested the efficacy, safety, and tolerability of TMS for bipolar depression.
Methods
We searched MEDLINE/EMBASE/Cochrane/PsycINFO/grey literature (through October 1, 2025) for RCTs comparing any TMS protocol with sham. Co-primary outcomes were depressive symptoms and all-cause discontinuation; secondary outcomes were response and remission. Risk of bias (RoB) was assessed with RoB-2. Random-effects models estimated standardized mean differences (SMDs) and risk ratios (RRs) with 95% confidence intervals (95%CI), alongside sensitivity, subgroup, and meta-regression analyses.
Results
Nineteen comparisons from 17 RCTs (N = 563; TMS = 293, sham = 270; mean N TMS = 15.4, sham = 15.9; mean duration = 2.40 weeks; RoB “low” = 35%, “some concerns” = 65%) were included. Among trials reporting subtypes (k = 13), 41.8% of participants had bipolar I disorder, and 58.2% had bipolar II disorder. The left dorsolateral prefrontal cortex was the most common target (k = 12). TMS reduced depressive symptoms versus sham (SMD = −0.34; 95%CI = −0.58 to −0.11), with no difference in all-cause discontinuation. TMS was favoured for response (RR = 1.41; 95%CI = 1.10 to 1.80) and remission (RR = 1.54; 95%CI = 1.06 to 2.23). However, these effects were not consistently confirmed in sensitivity or subgroup analyses by RoB, TMS type, stimulation site, or treatment resistance. Overall, 15 comparisons (88.2%) did not show superiority of TMS over sham for depressive symptoms at the individual trial level. No seizures or serious adverse events occurred; adverse events did not differ from sham. Meta-regression suggested a greater number of total pulses was associated with greater depressive symptom reduction (β = −0.018; p = .00017).
Conclusions
TMS shows a small meta-analytic antidepressant effect and acceptable tolerability in bipolar depression despite most individual trials being negative. However, subgroup and sensitivity findings did not support TMS as an efficacious treatment at current doses. Further testing via larger RCTs with higher-dose protocols is warranted.
Plain Language Summary
Bipolar depression can be difficult to treat, and many people continue to have depressive symptoms despite medication. Transcranial magnetic stimulation, or TMS, is a non-invasive treatment that uses magnetic pulses to stimulate specific areas of the brain involved in mood regulation. Although TMS is an established treatment for major depressive disorder, its benefit in bipolar depression remains uncertain. In this systematic review and meta-analysis, we examined randomized controlled trials that compared TMS with sham treatment, meaning a placebo-like version of TMS. We included 17 trials with 19 comparisons and 563 participants. Most studies were small, lasting about 2 to 3 weeks on average, and most had at least some concerns about risk of bias. Overall, TMS was associated with a small improvement in depressive symptoms compared with sham treatment. Participants receiving TMS were also more likely to have a treatment response or remission. TMS appeared well tolerated: people receiving TMS were not more likely to stop treatment, and adverse events were similar between TMS and sham groups. No seizures or serious adverse events were reported. However, the findings should be interpreted cautiously. Most individual trials did not show that TMS was better than sham treatment on their own. The results were also not consistently supported in sensitivity or subgroup analyses, including analyses based on study quality, type of TMS, brain stimulation target, or treatment resistance. Higher total numbers of magnetic pulses were linked with greater improvement in depressive symptoms, suggesting that dose may be important. In summary, TMS may have a small antidepressant effect in bipolar depression and appears acceptable and safe in the available trials. However, current evidence is not strong enough to confidently support TMS as an effective treatment at the doses tested so far. Larger and better-designed trials, especially using higher-dose protocols, are needed.
Introduction
Bipolar disorder (BD) is a chronic psychiatric condition characterized by episodes of mania/hypomania and depression. Depressive episodes in BD are particularly debilitating, contributing significantly to the overall disease burden, including substantial functional impairment and an elevated risk of mortality, notably through suicide. 1 Although quetiapine, olanzapine-fluoxetine combination, lurasidone, cariprazine, and lumateperone are approved by the US Food and Drug Administration (FDA) for treating bipolar depression, and the Canadian Network for Mood and Anxiety Treatments (CANMAT) and International Society for Bipolar Disorders (ISBD) guidelines recommend multiple pharmacologic options as first-line treatments for bipolar I and bipolar II depression, 2 many patients do not achieve an adequate response or are unable to tolerate these treatments. This highlights the need for novel and effective therapeutic interventions. 3
Transcranial magnetic stimulation (TMS), a non-invasive neuromodulation technique that modulates cortical excitability, has gained attention as a treatment for depressive symptoms. It received Health Canada approval for treatment of major depressive disorder (MDD) in 2002, 4 followed by FDA approval in 2008 for medication-resistant MDD following the pivotal trial by O'Reardon et al.5,6 Since then, high-frequency repetitive TMS (rTMS) targeting the left dorsolateral prefrontal cortex (LDLPFC) has been extensively studied for treating depressive episodes in MDD, demonstrating both efficacy and a favourable safety profile.7–9 Bilateral DLPFC stimulation, typically combining high-frequency left and low-frequency right rTMS, has also been reported to improve outcomes in treatment-resistant (TR) depression. 10 More recently, theta-burst stimulation (TBS), including both intermittent and continuous protocols, has emerged as a time-efficient alternative for treating depressive episodes in MDD. 11
Earlier studies suggest that TMS may also hold promise for treating depressive episodes in BD, particularly in TR cases.12,13 Consistent with the still-emerging evidence base in bipolar depression, CANMAT and ISBD guidelines currently list rTMS as a third-line adjunctive option for bipolar I depression. 2 Methodological heterogeneity, such as differences in stimulation parameters, study populations, and sample sizes, has impeded definitive conclusions. 14 While previous meta-analyses have supported the potential efficacy of TMS for bipolar depression,15–17 especially with LDLPFC targeting, 18 a recent synthesis by Hyde et al. in 2022, 14 which examined neurostimulation across multiple mental disorders, found no significant effect of TMS for bipolar depression; however, this analysis included only four randomized controlled trials (RCTs) in bipolar depression.
Several RCTs examining the efficacy of TMS for depressive episodes in BD have recently emerged.19,20 To address this rapidly evolving evidence base, we conducted an updated systematic review and meta-analysis to evaluate the efficacy, safety, and tolerability of TMS for bipolar depression, also exploring the role of moderators. By synthesizing data from RCTs, we aimed to address existing knowledge gaps and inform clinical applications of TMS for bipolar depression.
Methods
This systematic review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guideline. 21 The protocol is available on the Open Science Framework at https://osf.io/cnvy4/.
Search Strategy and Inclusion Criteria
We searched MEDLINE, EMBASE, Cochrane, and PsycINFO from database inception until October 1, 2025 (inclusive) for RCTs investigating the efficacy of TMS in people with bipolar depression, without restrictions on language. The full search terms are reported in Supplemental eTable 2. To ensure comprehensive coverage, we also entered the search terms into Google Scholar, performed a hand search, and screened reference lists of included studies to capture any additional articles. Studies were included if they met the following criteria: (1) sham-controlled RCT design using any TMS protocol; (2) included people diagnosed with BD (current depressive episode) based on the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases (ICD) criteria; and (3) used standardized scales assessing depressive symptom severity. Studies with mixed samples (i.e., MDD and BD) that did not report results separately by depression type were included only if bipolar depression represented >75% of the sample.
Study Screening
Articles were imported into Covidence 22 for title/abstract and full-text screening. Both screening stages were conducted independently in duplicate by two reviewers (CZ, NF, SW), with disagreements resolved through consensus.
Data Extraction
Data extraction was conducted independently in duplicate (CZ, NF, SW) using a pre-specified Microsoft Excel spreadsheet. The co-primary outcomes were the change in depressive symptoms and all-cause discontinuation; key secondary outcomes were response and remission as defined by the study's authors, with additional outcomes extracted when reported. For each included study, we collected bibliographic details, sample size, key demographic and clinical characteristics, diagnosis (e.g., bipolar I disorder [BD1], bipolar II disorder [BD2], other specified bipolar and related disorder), TMS protocol parameters (including modality, stimulation site, and dose), depression scales at baseline and endpoint, other symptom measures, response and remission rates, adverse events (and adverse event-related discontinuation), and number of completers. If the same outcome was reported in both the grey literature and a peer-reviewed publication, the peer-reviewed report was prioritized. Intention-to-treat (ITT) analyses were preferred.
Risk of Bias (RoB)
The RoB was assessed independently in duplicate by two reviewers (CZ, NF, SW) using the Cochrane Risk of Bias Tool for Randomized Controlled Trials (RoB-2). 23 Studies were rated as “high,” “some concerns,” or “low” RoB. All discrepancies were resolved through consensus.
Statistical Analysis
R v4.4.0 was used for all analyses. A random-effects meta-analysis was performed to calculate a risk ratio (RR) for dichotomous variables or a standardized mean difference (SMD) for continuous variables with 95% confidence intervals (95%CIs), using a restricted maximum likelihood (REML) variance estimator. The Hartung, Knapp, Sidik, and Jonkman (HKSJ) method was applied when k = 3–10 and τ² was greater than zero.24,25 Between-study variance was measured by τ², and variability in effect size not caused by sampling error was assessed using the I² statistic, with I² > 50% considered high variability. 26 For outcomes with zero events in one arm, an increment of 0.5 was added to each arm, and outcomes with zero events in both arms were excluded from the meta-analysis. The number needed to treat (NNT) was calculated with 95%CIs for dichotomous outcomes with p < .05 based on odds ratios (ORs) as described by Cates. 27 Publication bias was assessed visually via funnel plots and quantitatively with Egger's test when k ≥ 10. 28
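To make the pooling steps above concrete, the following is a minimal Python sketch and not the R code used for the review. It computes a log risk ratio with a 0.5 increment for single-zero trials, drops double-zero trials, and pools the effects under a random-effects model. Three simplifications are assumptions of this sketch: it uses the DerSimonian-Laird moment estimator of between-study variance instead of REML, it uses a normal-based 95% interval instead of the HKSJ adjustment, and it adds 0.5 to every cell of the 2×2 table (one common convention). The function names and the trial counts are hypothetical.

```python
import math

def log_rr(events_t, n_t, events_c, n_c):
    """Log risk ratio and its variance; returns None for double-zero trials."""
    if events_t == 0 and events_c == 0:
        return None  # zero events in both arms: excluded from pooling
    if events_t == 0 or events_c == 0:
        # Assumed convention: add 0.5 to every cell, so each arm total grows by 1.
        events_t, events_c = events_t + 0.5, events_c + 0.5
        n_t, n_c = n_t + 1, n_c + 1
    estimate = math.log((events_t / n_t) / (events_c / n_c))
    variance = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return estimate, variance

def pool_random_effects(effects, variances):
    """Inverse-variance random-effects pooling (DerSimonian-Laird, not REML)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion of the effects around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add the between-study variance to each trial.
    w_star = [1 / (v + tau2) for v in variances]
    est = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return est, (est - 1.96 * se, est + 1.96 * se), tau2, i2

# Hypothetical (events, n) counts for TMS and sham arms; not data from the review.
trials = [(6, 15, 3, 15), (0, 12, 2, 12), (0, 10, 0, 10), (8, 20, 5, 19)]
usable = [r for r in (log_rr(*t) for t in trials) if r is not None]
est, ci, tau2, i2 = pool_random_effects([y for y, _ in usable],
                                        [v for _, v in usable])
print(f"pooled RR = {math.exp(est):.2f}, "
      f"95%CI = {math.exp(ci[0]):.2f} to {math.exp(ci[1]):.2f}")
```

Pooling is done on the log scale and exponentiated only for reporting, which keeps the confidence interval symmetric around the log estimate.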
If at least 10 trials reported on both the moderators of interest and the outcome, meta-regression analyses were conducted to explore heterogeneity in effect estimates based on stimulation dose (total pulses scaled to per 1000 pulses), publication year, age, sex/gender, sample size, stimulation frequency, treatment duration (weeks), and BD2 status (post hoc deviation from the protocol given emerging evidence that BD2 depression may differ in course and treatment response from BD1). 29 Subgroup analyses were conducted by (1) RoB, (2) TMS type, (3) stimulation site, and (4) TR status (≥2 treatment failures for bipolar depression). We conducted leave-one-out sensitivity analyses for depressive symptoms to assess the consistency of effect size.
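The moderator and sensitivity analyses described above can be sketched in a few lines of Python. This is an illustration only, not the review's R code: `meta_regression` fits a weighted least-squares line of effect size on a single moderator (a mixed-effects model would pass a non-zero `tau2`), and `leave_one_out` re-pools the effects with each trial removed using plain inverse-variance weights. All names and numbers are hypothetical.

```python
import math

def meta_regression(x, y, variances, tau2=0.0):
    """Weighted least-squares intercept, slope, and slope SE of y on x."""
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With inverse-variance weights, the slope variance is 1 / Sxx.
    return intercept, slope, math.sqrt(1.0 / sxx)

def leave_one_out(effects, variances):
    """Inverse-variance pooled estimate with each trial omitted in turn."""
    pooled = []
    for i in range(len(effects)):
        ys = effects[:i] + effects[i + 1:]
        ws = [1.0 / v for v in variances[:i] + variances[i + 1:]]
        pooled.append(sum(wi * yi for wi, yi in zip(ws, ys)) / sum(ws))
    return pooled

# Hypothetical trials: dose in thousands of pulses, SMD, and SMD variance.
dose = [5.0, 18.0, 30.0, 36.0, 60.0]
smd = [0.05, -0.20, -0.35, -0.30, -0.90]
var = [0.12, 0.10, 0.15, 0.09, 0.20]

intercept, slope, se = meta_regression(dose, smd, var)
print(f"slope per 1000 pulses = {slope:.3f} (SE {se:.3f})")
print([round(p, 2) for p in leave_one_out(smd, var)])
```

A negative slope in this setup means that larger doses go with larger symptom reductions, matching the sign convention used for SMDs in this review.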
Results
Search Results, Baseline, and Design of Included Trials
From the initial search, k = 876 records were identified (databases = 875; Google Scholar = 1). After removing 291 duplicates, 585 unique records remained for title/abstract screening, of which 525 were excluded (Figure 1). We assessed 60 full texts; 43 were excluded (Supplemental eTable 3), leaving 17 RCTs12,30–45 for inclusion. Two three-arm RCTs (Novak et al. 41 and Hu et al. 34 ) contributed two randomized comparison arms each, yielding 19 effect sizes overall. The RCTs were conducted across 12 countries (most commonly France and the USA; k = 3 each). In total, N = 563 participants with bipolar depression were included (intervention: n = 293; sham: n = 270); mean age was 41.50 ± 14.88 years, and 38.6% were male. BD subtype was reported in k = 13 RCTs (N = 467): BD1 n = 195 (41.8%), BD2 n = 272 (58.2%), with the remainder unreported. There were no participants with MDD. Race/ethnicity reporting was limited and variable (White/Hispanic/Latino: k = 5, unweighted mean 76.8%; Black: k = 3, 10.1%; Asian: k = 2, 9.0%).
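The counts in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated above; the variable names are illustrative.

```python
# Study selection flow, using the counts reported in the text.
identified = 875 + 1                 # databases + Google Scholar
after_dedup = identified - 291       # duplicates removed
full_text = after_dedup - 525        # excluded at title/abstract stage
included = full_text - 43            # excluded at full-text stage

# Participants and bipolar subtype split among trials reporting subtype.
total_n = 293 + 270                  # TMS + sham
bd1, bd2 = 195, 272
pct_bd1 = round(100 * bd1 / (bd1 + bd2), 1)
pct_bd2 = round(100 * bd2 / (bd1 + bd2), 1)

print(identified, after_dedup, full_text, included)
print(total_n, bd1 + bd2, pct_bd1, pct_bd2)
```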

Figure 1. Study selection flow diagram.
Across RCTs, patients were randomized 1:1 to TMS or sham, with total sessions ranging from 10 to 50. The most common TMS types were TBS (k = 9; 52.9%) and rTMS (k = 7; 41.2%). The most common stimulation sites were LDLPFC (k = 12; 70.6%) and RDLPFC (k = 4; 23.5%). All RCTs used TMS as augmentation to current treatment, except one with unreported augmentation/monotherapy status. 30 Most RCTs (k = 16; 94.1%) reported concomitant medications but did not quantify class- or drug-specific exposure (e.g., % or n). Six RCTs enrolled exclusively TR samples (≥2 treatment failures for bipolar depression), five enrolled no TR samples, and six did not report TR status. See Table 1 for study characteristics.
Table 1. Characteristics of the Included Randomized Controlled Trials.
Legend: TR = treatment-resistant; BD = bipolar disorder; BD1 = bipolar I disorder; BD2 = bipolar II disorder; BD3 = bipolar III disorder; iTBS = intermittent theta-burst stimulation; cTBS = continuous theta-burst stimulation; rTMS = repetitive transcranial magnetic stimulation; dTMS = deep transcranial magnetic stimulation; ITG = inferior temporal gyrus; PPC = posterior parietal cortex; LDLPFC = left dorsolateral prefrontal cortex; RDLPFC = right dorsolateral prefrontal cortex; RVLPFC = right ventrolateral prefrontal cortex.
Efficacy of TMS in the Treatment of Bipolar Depression
Pooled efficacy estimates are reported in Figures 2 and 3. For continuous outcomes, TMS reduced depressive symptoms versus sham (SMD = −0.34; 95%CI = −0.58 to −0.11) (Figure 2). For dichotomous outcomes, TMS outperformed sham for response (RR = 1.41; 95%CI = 1.10 to 1.80; NNT = 10; 95%CI = 5 to 41) and remission (RR = 1.54; 95%CI = 1.06 to 2.23; NNT = 13; 95%CI = 6 to 108) (Figures 3(a) and (b)). Response was defined as a ≥50% reduction in Montgomery-Åsberg Depression Rating Scale (MADRS) or Hamilton Depression Rating Scale (HDRS) score; remission thresholds ranged from MADRS ≤ 7 to MADRS ≤ 12, or were set at HDRS ≤ 7.
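For readers unfamiliar with how an NNT is derived from an odds ratio, the sketch below shows the standard conversion from an OR and an assumed control event rate (CER), which is assumed here to correspond to the Cates approach cited in the Methods. The function name and the input values are hypothetical; the review's pooled ORs and control event rates are not reported in this section, so the example does not reproduce the NNTs above.

```python
def nnt_from_or(odds_ratio, control_event_rate):
    """Number needed to treat implied by an odds ratio at a given control rate."""
    if not 0 < control_event_rate < 1:
        raise ValueError("control_event_rate must be strictly between 0 and 1")
    control_odds = control_event_rate / (1 - control_event_rate)
    treated_odds = odds_ratio * control_odds
    # Convert the treated-arm odds back to an event rate.
    treated_event_rate = treated_odds / (1 + treated_odds)
    risk_difference = treated_event_rate - control_event_rate
    if risk_difference == 0:
        raise ValueError("an odds ratio of 1 implies no finite NNT")
    return 1 / abs(risk_difference)

# Hypothetical inputs: OR = 1.5 with half of sham participants responding.
print(round(nnt_from_or(1.5, 0.5), 1))
```

Because the conversion depends on the control event rate, the same OR gives different NNTs in populations with different sham response rates. NNTs are conventionally rounded up to the next whole number when reported.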

Figure 2. Forest plot of pooled efficacy outcomes of TMS in the treatment of bipolar depressive symptoms. Studies that did not report depressive symptom reduction were not included in the figure. Legend: MADRS = Montgomery-Åsberg Depression Rating Scale; HAMD, HDRS, and HDRS17 = Hamilton Depression Rating Scale; LDLPFC = left dorsolateral prefrontal cortex; LDL = left dorsolateral (prefrontal cortex); RVL = right ventrolateral (prefrontal cortex); CI = confidence interval.

Figure 3. Forest plots of pooled efficacy outcomes of TMS for bipolar depression: response (a) and remission (b). Studies that did not report response or remission were not included in (a) and (b), respectively. Legend: LDL = left dorsolateral (prefrontal cortex); RVL = right ventrolateral (prefrontal cortex); RR = risk ratio; CI = confidence interval.
All-Cause and Intolerance-Related Discontinuation
At the primary endpoint, discontinuation did not differ between TMS and sham for all-cause (p = .66) or intolerability-related discontinuation (p = .82) (Supplemental eFigures 3 and 4).
Safety Outcomes
Pooled safety outcomes are reported in Figure 4. Among continuous outcomes, general mental health (SMD = −0.97; k = 1; 95%CI = −1.70 to −0.25) and psychiatric symptoms (SMD = −1.27; k = 1; 95%CI = −2.24 to −0.31) improved with TMS compared to sham; other continuous safety domains did not differ between groups. Among dichotomous outcomes, adverse events did not differ from sham (p = .23). No seizures, serious adverse events, or vasovagal syncope were reported, and there were no significant differences in manic/hypomanic switch or other commonly reported adverse effects.

Figure 4. Forest plots of pooled continuous safety outcomes of TMS in the treatment of bipolar depression (a) and pooled dichotomous safety outcomes (b). Legend: SMD = standardized mean difference; RR = risk ratio; CI = confidence interval.
Publication Bias and RoB
Publication bias was assessed for depressive symptoms, remission, and response. For depressive symptoms, the funnel plot was broadly symmetric with one small-study outlier, and Egger's intercept suggested no small-study effects (p ≥ .10) (Supplemental eFigure 5). For remission and response, funnel plots showed evidence of asymmetry, and Egger's intercept suggested small-study effects (p = .015 and p = .0082, respectively) (Supplemental eFigures 6 and 7). Publication bias was not assessed for all-cause discontinuation because fewer than 10 studies reported it.
Of 17 publications, six (35%) were “low” RoB and 11 (65%) had “some concerns” (Supplemental eFigure 8), most commonly due to compromised blinding/deviations from intended interventions and selective reporting.
Meta-Regression and Dose–Response Meta-Analysis
Total stimulation dose (per 1000 pulses) was associated with effect size (β = −0.018; 95%CI = −0.027 to −0.009; k = 14), indicating that higher cumulative pulse delivery was linked to greater improvement in depressive symptoms. After excluding the lowest-dose trial (Mak et al., 38 with 4500 total pulses), the association persisted (β = −0.0164; 95%CI = −0.0265 to −0.0064; k = 13). Treatment duration, sample size, age, percentage female, stimulation frequency, percentage BD2, and publication year were not significant moderators (Supplemental eTable 4; Supplemental eFigure 9).
Subgroup Analyses
Several subgroup-specific estimates achieved nominal significance compared to sham, namely depressive symptoms in “low” RoB studies (SMD = −0.31; 95%CI = −0.57 to −0.05; k = 7) (Supplemental eFigure 1A), response in “low” RoB studies (RR = 2.05; 95%CI = 1.22 to 3.42; k = 6) (Supplemental eFigure 2A), depressive symptoms with deep TMS (dTMS) (SMD = −0.62; 95%CI = −1.19 to −0.06; k = 1) (Supplemental eFigure 1B), remission with rTMS (RR = 2.14; 95%CI = 1.02 to 4.48; k = 6), depressive symptoms in LDLPFC trials (SMD = −0.44; 95%CI = −0.83 to −0.05; k = 10) (Supplemental eFigure 1C), response in LDLPFC trials (RR = 1.55; 95%CI = 1.14 to 2.10; k = 10), remission in LDLPFC trials (RR = 1.85; 95%CI = 1.10 to 3.12; k = 6), response in non-TR (RR = 1.59; 95%CI = 1.01 to 2.51; k = 5) and TR trials (RR = 2.45; 95%CI = 1.37 to 4.38; k = 6), and remission in TR trials (RR = 2.59; 95%CI = 1.00 to 6.70; p = .049; k = 4). Other subgroup-specific estimates for depressive symptoms, response, remission, all-cause discontinuation, and intolerability-related discontinuation were not statistically significant versus sham (Supplemental eTable 5). However, no formal tests for between-subgroup differences were significant across RoB, TMS type, stimulation site, or TR status (all p_diff ≥ .05), indicating no clear evidence that the effect differed between subgroups.
Sensitivity Analyses
Leave-one-out analysis for depressive symptoms showed that the magnitude and direction of the pooled SMD varied only minimally, and the overall significant benefit of TMS over sham remained.
Because Luo et al. 42 was the only trial conducted in non-adults (mean age 15.7 years), we repeated subgroup analyses after excluding this study. Removal of Luo et al. yielded SMD = −0.38 (95%CI = −0.63 to −0.14), which was larger than the overall pooled effect on depressive symptoms. When stratified by RoB without Luo et al., the effect size for “some concerns” studies became significant (SMD = −0.48; 95%CI = −0.93 to −0.04). When stratified by TMS type without Luo et al., the effect size for TBS remained non-significant (SMD = −0.46; 95%CI = −0.97 to 0.05). When stratified by stimulation site without Luo et al., the effect size for LDLPFC stimulation site remained significant (SMD = −0.52; 95%CI = −0.93 to −0.11). TR status was not reported, so stratification by TR status was not conducted.
Because Mallik et al. 40 was the only continuous TBS (cTBS) study among the TBS studies (with the rest being intermittent TBS [iTBS]), we also repeated subgroup analyses excluding this trial. Removal of Mallik et al. yielded SMD = −0.37 (95%CI = −0.61 to −0.12), which was larger than the overall pooled effect on depressive symptoms. When stratified by RoB without Mallik et al., the effect size for “low” RoB studies remained significant (SMD = −0.34; 95%CI = −0.61 to −0.08). When stratified by TMS type without Mallik et al., the effect size for TBS remained non-significant (SMD = −0.43; 95%CI = −0.94 to 0.07). When stratified by stimulation site without Mallik et al., the effect size for sites other than LDLPFC remained non-significant (SMD = −0.16; 95%CI = −0.45 to 0.12). TR status was not reported, so stratification by TR status was not conducted.
Because Mak et al. 38 was the only low-frequency (≤1 Hz) rTMS study among the rTMS studies (with the rest being high-frequency [≥5 Hz] rTMS), we also repeated subgroup analyses excluding this trial. Removal of Mak et al. yielded SMD = −0.39 (95%CI = −0.62 to −0.16), which was larger than the overall pooled effect on depressive symptoms. When stratified by RoB without Mak et al., the effect size for “some concerns” studies became significant (SMD = −0.50; 95%CI = −0.93 to −0.07). When stratified by TMS type without Mak et al., the effect size for rTMS became significant (SMD = −0.39; 95%CI = −0.67 to −0.10). When stratified by stimulation site without Mak et al., the effect size for sites other than LDLPFC remained non-significant (SMD = −0.26; 95%CI = −0.56 to 0.05). When stratified by TR status without Mak et al., the effect size for TR studies remained non-significant (SMD = −0.81; 95%CI = −1.48 to −0.13).
Discussion
In this systematic review and meta-analysis of 17 RCTs (19 randomized comparisons, N = 563), TMS showed a small, statistically significant meta-analytic advantage over sham for reducing bipolar depressive symptoms (SMD = −0.34), with corresponding benefits on response (RR = 1.41; NNT = 10) and remission (RR = 1.54; NNT = 13). Discontinuation and adverse-event rates, including manic/hypomanic switches, did not differ from sham, and no seizures or serious adverse events were reported, supporting a generally favourable tolerability and safety profile. However, the credibility of this result is undermined by several factors: most individual comparisons (88.2%; k = 15) did not show statistically significant superiority of TMS over sham (possibly due to insufficient power), positive findings were not consistently reproduced across sensitivity and subgroup analyses, and there was no robust evidence that any specific TMS type or stimulation site clearly outperformed others. These findings should also be interpreted cautiously because most studies targeted LDLPFC, while the few studies (k = 7) targeting other sites did so heterogeneously (e.g., RDLPFC, RVLPFC, cerebellar vermis, bilateral). Overall, TMS may confer a modest average benefit in bipolar depression, but the sham-controlled RCT evidence remains tenuous and does not yet support its routine clinical use for this indication. These findings are broadly aligned with current CANMAT and ISBD guidance, which places adjunctive rTMS as a third-line option for bipolar I depression rather than alongside first-line pharmacotherapies. 2
Interpretation of Efficacy, Subgroup, and Sensitivity Findings
The pooled effect on depressive symptoms was small and appears lower than typical effects reported for TMS in MDD, where effect sizes are commonly small to moderate.46,47 A plausible contributor to this discrepancy is pervasive underpowering in bipolar depression trials compared to MDD trials: in our dataset, the largest active arm enrolled only 25 participants, whereas MDD trials often include substantially larger samples.46,47 Small samples increase imprecision, make true effects harder to detect at the individual-trial level, and may help explain both the small pooled SMD and the high proportion of null primary outcomes. In clinical terms, a small SMD may translate into subtle improvements at the group level that are difficult to detect in individual participants, especially against a background of active pharmacotherapy and placebo response. Although meta-analysis improves precision relative to any single trial, only 563 participants contributed to the outcomes across 17 small RCTs. Funnel plot inspection and Egger's test did not suggest small-study effects for depressive symptoms; therefore, the pooled SMD may be viewed as a more precise summary of limited data, though not definitive confirmation of efficacy.
Dichotomous outcomes supported a similarly cautious interpretation. Response and remission favoured TMS overall, but the confidence intervals for both were wide (NNT 95%CI = 5 to 41 and 6 to 108, respectively), reflecting imprecision and the influence of a few small studies. When trials were stratified by RoB, response remained significant in “low” RoB studies, but remission did not. Similarly, subgroup-specific estimates occasionally reached nominal significance (e.g., depressive symptoms in “low” RoB trials, the single dTMS trial, and LDLPFC trials, as well as response and remission in LDLPFC, TR, and non-TR trials). However, formal tests for between-subgroup differences were uniformly non-significant. Funnel plot inspection and Egger's test suggested small-study effects for response and remission, further supporting cautious interpretation of these categorical outcomes.
Isolated significant subgroups in the absence of significant between-subgroup tests mean we cannot claim robust effect modification by RoB, TMS type, stimulation site, or TR status. LDLPFC trials nevertheless tended to show numerically larger and more consistently favourable effects than non-LDLPFC targets, consistent with the broader MDD TMS literature. The lack of statistically significant subgroup differences likely reflects the small number of trials within each subgroup and the limited power of between-subgroup tests rather than proving that all stimulation targets are equally effective. LDLPFC can therefore be viewed as the most promising candidate target on current evidence, but this remains hypothesis-generating rather than definitive.
Sensitivity analyses further emphasized the fragility of the findings. Leave-one-out analyses showed that the direction of effect on depressive symptoms was generally stable, but the magnitude was modest and sensitive to the inclusion of individual trials. Excluding the only adolescent study 42 or the only cTBS study 40 slightly increased the pooled SMD for depressive symptoms, yet did not turn most non-significant subgroups into clearly positive findings. Additional analyses excluding the only low-frequency (≤1 Hz) rTMS study, 38 which also delivered the lowest total stimulation dose and targeted RDLPFC, further increased the pooled SMD for depressive symptoms (to −0.39) and rendered some previously non-significant strata statistically significant (i.e., rTMS and “some concerns” RoB studies), while estimates for non-LDLPFC sites and TR studies remained non-significant. This suggests that very low-dose or atypical inhibitory protocols (e.g., low-frequency rTMS) may reduce efficacy; however, the effect size remains small, confidence intervals are wide, and the inferences continue to rest on a small number of heterogeneous trials.
Overall, our subgroup and sensitivity analyses suggest that the small average benefit of TMS is not consistently anchored to a specific protocol (e.g., LDLPFC rTMS, iTBS), population (e.g., adult vs adolescent, TR vs non-TR), or methodological quality. Excluding a few atypical studies (adolescent, cTBS, low-frequency RDLPFC rTMS) tends to nudge the pooled effect toward slightly larger and sometimes statistically significant estimates, but these signals remain modest and imprecise. This lack of a clear “best” parameter set contrasts with the more established evidence base in MDD, where LDLPFC high-frequency rTMS and certain accelerated iTBS protocols have more reproducible efficacy profiles. 46 What our results most clearly highlight is that the bipolar depression TMS literature is dominated by very small RCTs, and the field now urgently requires adequately powered, high-dose trials before efficacy can be considered established.
Dose–Response Meta-Regression
A notable and more internally coherent finding was the dose–response association between total stimulation dose (pulses) and antidepressant effect. Meta-regression showed that higher cumulative pulses were significantly associated with greater improvement in depressive symptoms (β = −0.018 per 1000 pulses), and this association remained significant after excluding the lowest-dose trial (Mak et al. 38 ). This suggests that, at least within the parameter ranges used in bipolar depression RCTs, underdosing may partly explain why many individual trials were negative. Conceptually, an increase of 10000 pulses across an acute course would correspond to an approximate SMD improvement of 0.18, which, while still modest, could meaningfully influence trial-level outcomes.
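The conversion in the last sentence can be written out explicitly. The worked equation below assumes the fitted relationship is linear over the dose range considered, and applies the same scaling to the 95%CI bounds of the slope reported in the Results (−0.027 to −0.009 per 1000 pulses); it is an illustration of the arithmetic, not an additional analysis.

```latex
\Delta\mathrm{SMD} \approx \beta \times \frac{\Delta\,\text{pulses}}{1000}
  = -0.018 \times \frac{10000}{1000} = -0.18,
\qquad
95\%\,\mathrm{CI} \approx (-0.027 \times 10,\; -0.009 \times 10) = (-0.27,\; -0.09)
```

The negative sign denotes a larger reduction in depressive symptoms relative to sham.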
These findings echo dose–response patterns observed in broader TMS and neurostimulation literature, where higher cumulative pulses or more intensive schedules are often linked to greater efficacy. 9 For example, the recent randomized trial of accelerated iTBS for treatment-refractory bipolar depression and the two-site open-label feasibility study of the Stanford Accelerated Intelligent Neuromodulation Therapy (SAINT) protocol in BD1 both delivered markedly higher total pulse doses over condensed time frames and reported rapid, substantial symptom improvements.12,13
However, dose–response relationships derived from aggregate meta-regression are observational and susceptible to ecological confounding; higher doses may co-occur with other design features (e.g., more experienced centres, more intensive clinical monitoring, higher baseline symptom severity) that could partly account for observed differences. Thus, our findings should be regarded as hypothesis-generating and supportive of, but not definitive for, a causal role of dose in mediating TMS efficacy for bipolar depression.
Comparison With Previous Meta-Analyses
Our findings both converge with and diverge from prior syntheses of TMS for bipolar depression. Earlier meta-analyses focusing primarily on rTMS generally reported small-to-moderate benefits on depressive symptoms and response, suggesting that TMS may be efficacious for bipolar depression.15,16,18 These studies, however, were based on fewer trials (often older, smaller RCTs with predominantly LDLPFC rTMS protocols, and some included mixed samples with a substantial proportion of participants with MDD rather than bipolar depression alone) and predated several recent negative or equivocal trials, including those examining TBS and newer accelerated protocols.12,33,37,39,41,42
More recent work by Hyde et al. found no significant effect of TMS for bipolar depression when analysing a cross-diagnostic dataset of neurostimulation RCTs, although only four BD RCTs contributed to that estimate.14 Subsequent network meta-analyses incorporating TMS within a broader set of non-invasive brain stimulation interventions have also suggested ongoing uncertainty regarding its comparative advantage in bipolar depression when contrasted with other modalities and sham.19,20 Ventura et al.17 recently synthesized RCTs and uncontrolled studies and conducted sham-controlled efficacy analyses. In those randomized sham-controlled trials, TMS was associated with a small-to-moderate antidepressant effect relative to sham (Cohen's d = 0.40), whereas their broader effectiveness analyses pooling active arms from RCTs together with uncontrolled studies yielded a much larger pre-post effect size (Cohen's d = 1.4). Our results are therefore directionally consistent with Ventura et al. in that both reviews detect a modest sham-controlled antidepressant signal. At the same time, the two reviews highlight the marked discrepancy in effect size estimates derived from sham-controlled versus open-label or uncontrolled designs. Specifically, our review was restricted to sham-controlled RCTs comprising 100% BD participants and found that the pooled benefit was small (SMD = −0.34), most individual comparisons were negative, subgroup and sensitivity analyses did not yield a robustly reproducible efficacy signal, and formal tests for between-subgroup differences did not support clear effect modification by RoB, TMS type, stimulation site, or TR status. Ventura et al. also identified a greater number of sessions as a predictor of larger antidepressant effect, which is broadly concordant with our dose–response meta-regression linking higher cumulative pulse dose to greater symptom improvement.
Taken together, the two reviews suggest that while TMS may have antidepressant potential in bipolar depression, the substantially larger effects observed in open-label and uncontrolled studies likely overestimate efficacy relative to the more modest signal seen in sham-controlled trials, and the current sham-controlled RCT evidence remains insufficient to establish TMS as a clearly efficacious treatment at current parameter settings.
Notably, the clinical TMS literature in bipolar depression has increasingly included open-label RCTs and real-world retrospective analyses, where larger samples and more naturalistic settings often yield encouraging signals.13,48,49 These study designs are useful for assessing feasibility, tolerability, safety, and effectiveness in routine practice and may enhance external validity. However, because they are more vulnerable to expectancy effects, regression to the mean, concomitant treatment changes, selection bias, and confounding by indication, they cannot establish antidepressant efficacy with the same internal validity as blinded sham-controlled RCTs. In contrast, our findings show that the sham-controlled RCT base still consists of small studies with limited power, and it is this underpowered RCT backbone that constrains our ability to draw firm efficacy conclusions. Thus, while broader, less controlled data support the clinical promise of TMS, the current RCT evidence base remains too limited and inconsistent to regard TMS as an established treatment for bipolar depression.
Several methodological differences may explain why we identify a statistically significant but small overall effect, while simultaneously concluding that TMS does not yet represent a credible treatment option. First, our analysis emphasizes the internal consistency of the signal across subgroups, sensitivity analyses, and individual trials, rather than relying solely on pooled p-values. The fact that over 80% of individual RCT comparisons are negative, combined with inconsistent replication of subgroup-specific benefits, suggests that the “average” effect may be driven by a subset of small or context-specific positive trials. Second, we used RoB-2 to classify studies and examined RoB as a moderator, revealing that benefit signals were largely confined to “low” RoB trials for response but not remission; however, formal tests for between-subgroup differences did not confirm statistically significant differences. Third, our inclusion of heterogeneous TMS modalities and targets (reflecting real-world experimentation in bipolar depression) likely increased between-study variability and diluted any protocol-specific effect that might emerge in more homogeneous samples (e.g., adult LDLPFC rTMS only). Finally, pharmacological differences between bipolar depression and MDD samples may contribute to weaker or less consistent TMS effects: people with BD are more likely to receive anticonvulsants/mood stabilizers, dopamine/serotonin receptor antagonists, or dopamine receptor partial agonists, which can reduce or increase cortical excitability and partly dampen or alter the clinical benefit of TMS.50
Safety and Tolerability
Our findings that all-cause and intolerability-related discontinuation did not differ between TMS and sham, and that commonly reported adverse events (e.g., headache, stimulation-site discomfort, pain/sensory discomfort) were not increased with active stimulation, are consistent with the broader TMS literature.6–9 Importantly, manic or hypomanic switches were not significantly more frequent with TMS; while reassuring, this should be interpreted with caution given modest sample sizes, short follow-up, and heterogeneity in mood stabilizer co-prescription. The absence of reported seizures or serious adverse events likely reflects both the inherent safety of TMS when delivered within recommended parameters and the exclusion of higher-risk patients in RCTs.
From a clinical standpoint, these safety data suggest that TMS is unlikely to be more hazardous than sham or usual care for carefully selected patients with bipolar depression. However, given the small and fragile efficacy signal, the current balance of evidence does not justify widespread clinical deployment of TMS for bipolar depression outside research settings or highly individualized, last-line use after established interventions have been exhausted.
Strengths and Limitations
This study has several strengths. It was conducted according to a pre-registered protocol, followed PRISMA 2020 guidelines, and employed a comprehensive search strategy across multiple databases and grey literature without language restrictions. RoB was evaluated using the RoB-2 tool, and all screening and extraction procedures were conducted in duplicate to enhance reliability. We synthesized both acute efficacy and safety/tolerability outcomes, examined multiple clinically relevant moderators, and performed dose–response meta-regression and extensive subgroup and sensitivity analyses (a strength compared to previous meta-analyses).
Nonetheless, several limitations should be noted. First, the combined sample for depressive outcomes was relatively small (563 participants across 17 RCTs), and many included studies were small, single-centre pilot trials, with the largest active TMS arm enrolling only 25 participants. This pervasive underpowering limits precision and increases vulnerability to chance findings and selective reporting. Our funnel plots and Egger's regression intercepts suggested small-study effects for remission and response but not for depressive symptoms; however, these methods have limited power when only slightly more than 10 studies are available, so more subtle publication bias cannot be ruled out. Meta-analysis can partially offset imprecision by pooling these trials, but it cannot fully overcome the limitations of very small trials or restore adequate power for clinically important subgroups. Second, heterogeneity in TMS parameters (e.g., modality, stimulation site, frequency, intensity, number of pulses, coil type, schedule), while realistic, may limit the interpretability and generalizability of our pooled findings and obscure genuine effects that might exist within specific parameter ranges. Third, nearly all trials evaluated TMS as an adjunct to ongoing pharmacotherapy, with incomplete reporting of concomitant medications; thus, our findings may not generalize to monotherapy or specific pharmacological combinations, and confounding by medication changes cannot be fully excluded. Fourth, reporting of bipolar subtype, polarity history, TR status, and illness course was incomplete, and no trials were designed or powered to test differential efficacy by BD1 versus BD2, rapid cycling, or psychotic features. Fifth, outcome assessment was limited to trial endpoints; the durability of TMS effects, relapse prevention, and long-term safety in BD populations remain largely unknown.
Sixth, we relied on aggregate study-level data, which restricts our ability to explore patient-level moderators (e.g., baseline severity, comorbidities, medication doses), and may contribute to ecological bias in meta-regression analyses. Finally, several subgroup analyses contained only a small number of comparisons (e.g., dTMS and non-DLPFC targets), increasing the risk of spurious statistically significant findings.
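To make the underpowering noted in the first limitation concrete, the following is a minimal sketch, not part of the original analysis, of an approximate power calculation for a two-arm trial. It assumes a two-sided z-test at α = 0.05, equal arm sizes, and a true effect equal to the pooled SMD of 0.34; the function names are hypothetical, and a t-test-based calculation would give slightly larger sample sizes.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def power_two_arm(smd: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two equal-sized arms.

    Uses the normal approximation and ignores the negligible probability of
    rejecting in the wrong direction.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = abs(smd) * sqrt(n_per_arm / 2)
    return _STD_NORMAL.cdf(noncentrality - z_crit)


def n_per_arm_for_power(smd: float, power: float = 0.80,
                        alpha: float = 0.05) -> int:
    """Approximate participants needed per arm to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / abs(smd)) ** 2)


if __name__ == "__main__":
    pooled_smd = 0.34   # absolute pooled effect reported in this review
    largest_arm = 25    # largest active TMS arm among included trials
    print(f"Power with {largest_arm} per arm: "
          f"{power_two_arm(pooled_smd, largest_arm):.2f}")
    print(f"N per arm for 80% power: {n_per_arm_for_power(pooled_smd)}")
```

Under these assumptions, a trial with 25 participants per arm would have roughly 22% power to detect an effect of this size, and about 136 participants per arm would be needed for 80% power, which is consistent with most individual comparisons being negative despite a significant pooled estimate.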
Future Directions
Future research should prioritize large, multi-centre, adequately powered RCTs that: (1) test higher cumulative pulse doses and intensive/accelerated protocols, informed by dose–response findings but within safety limits; (2) focus on well-characterized bipolar subtypes (e.g., BD1 vs BD2, TR vs less refractory, rapid cycling, mixed features) to identify populations most likely to benefit; (3) standardize TMS parameters, particularly LDLPFC high-frequency rTMS and iTBS protocols that have established efficacy in MDD; (4) include longer-term follow-up to assess durability and relapse prevention; (5) rigorously monitor and report manic/hypomanic switches and other mood-destabilizing events; (6) combine with emerging therapeutics for BD (e.g., xanomeline-trospium chloride51,52); and (7) combine with lifestyle interventions such as exercise, which may enhance outcomes when used with TMS in MDD.53–55 Trials should explicitly report DLPFC targeting methods (e.g., MRI-guided neuronavigation, Beam F3, standard F3), and future meta-analyses should compare them, as demonstrating similar efficacy for lower-cost approaches like Beam F3 would have important implications for implementation. In this meta-analysis, we did not stratify by targeting method because TMS for bipolar depression remains exploratory, and available data were insufficient for robust comparison. Individual participant data meta-analyses will be essential to refine dose–response relationships, protocol selection, and patient-level predictors of response.
In parallel, mechanistic and translational work linking TMS to circuit-level changes, neurocognitive markers, and behavioural phenotypes specific to bipolar depression may help clarify whether and how TMS can be tailored to this disorder rather than simply imported from MDD protocols. Until such evidence is available, TMS for bipolar depression should be regarded as an experimental intervention with encouraging but not yet compelling support, and its use remains best suited to carefully monitored research contexts.
Conclusion
TMS may have a small antidepressant effect in bipolar depression that is not sufficiently robust and consistent to warrant strong clinical recommendations at present. Clinicians and guideline panels should remain cautious about extrapolating from the more established efficacy of TMS in MDD to bipolar depression, particularly given the high burden of treatment resistance and the potential for mood destabilization in this population, as well as the high number of TMS sessions/doses required for a robust effect. When TMS is considered for individual patients with bipolar depression, it should ideally be delivered within clinical trial frameworks, with careful mood stabilizer optimization, close monitoring for adverse events, and explicit discussion of the limited and uncertain evidence base. Future larger, higher-dose, and/or monotherapy TMS trials are needed to clarify the role of TMS in bipolar depression.
Supplemental Material
Supplemental material, sj-docx-1-cpa-10.1177_07067437261457224 for Transcranial Magnetic Stimulation for Bipolar Depression: A Systematic Review and Meta-Analysis of Randomized Controlled Trials: Stimulation magnétique transcrânienne dans les cas de dépression bipolaire : une revue systématique et une méta-analyse d’essais contrôlés à répartition aléatoire by Carl Zhou, Nicholas Fabiano, Stanley Wong, Mikkel Højlund, Risa Shorr, Michel Sabé, Mattia Campana, Joshua Hyde, Valerie Brandt, Samuele Cortese, Sara Tremblay, Ram Brender, Gayatri Saraf, Lakshmi N. Yatham and Marco Solmi in The Canadian Journal of Psychiatry
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
Marco Solmi received honoraria/has been a consultant for Angelini, AbbVie, Bausch Health, Boehringer Ingelheim, Lundbeck, Otsuka, and Teva. Mikkel Højlund received honoraria/has been a consultant for Lundbeck and Otsuka. Samuele Cortese, NIHR Research Professor (NIHR303122), is funded by the NIHR for this research project. The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR, NHS, or the UK Department of Health and Social Care. Samuele Cortese is also supported by NIHR grants NIHR203684, NIHR203035, NIHR130077, NIHR128472, RP-PG-0618-20003 and by grant 101095568-HORIZON-HLTH-2022-DISEASE-07-03 from the European Research Executive Agency. Samuele Cortese has declared reimbursement for travel and accommodation expenses from the Association for Child and Adolescent Mental Health (ACAMH) in relation to lectures delivered for ACAMH, the Canadian AADHD Alliance Resource, the British Association of Psychopharmacology, and Healthcare Convention and CCM Group team for educational activity on ADHD and has received honoraria from Medice.
Supplemental Material
Supplemental material for this article is available online.
References
