Abstract
We conducted primary and exploratory meta-analyses to evaluate the effectiveness of dialectical behavior therapy (DBT) and its forensic (DBT-F) and correctional (DBT-CM) adaptations in forensic psychiatric and correctional settings. Following PRISMA 2020 guidelines, a systematic search across multiple databases identified studies assessing the effects of DBT on aggression, impulsivity, emotion regulation, depression, and borderline symptoms. The primary meta-analysis included eight pretest-posttest control group studies (N = 394) and found a significant overall effect (b = −.51, p = .04) with notable heterogeneity. In a moderator analysis, a significant improvement was observed for borderline symptoms (b = −1.43, p < .001), but not for other domains. The exploratory meta-analysis of 21 studies (including one-group designs) found significant moderator effects for impulsivity, emotion regulation, depression, aggression and borderline symptoms. DBT appears promising for reducing borderline pathology in forensic and correctional settings, though more high-quality studies using well-adapted protocols are needed.
Dialectical behavior therapy (DBT) is a comprehensive, evidence-based treatment originally developed by Marsha M. Linehan in the late 1980s for individuals diagnosed with borderline personality disorder. Although grounded in cognitive-behavioral principles, DBT is distinguished by its integration of mindfulness and its dialectical philosophical foundation. The dialectical framework emphasizes the synthesis of opposites, most prominently the balance of acceptance and change. Clients are encouraged to accept themselves and their current emotional experiences while simultaneously working toward behavioral change (Linehan, 1993).
The standard DBT program (Linehan et al., 2015) comprises four core treatment modalities: (a) individual therapy; (b) skills training groups, which are conducted in a classroom-like format; (c) telephone coaching, which allows clients to contact therapists between sessions for support in applying DBT skills in real-life situations; and (d) therapist consultation teams, weekly meetings in which DBT therapists receive support to maintain treatment fidelity and manage the emotional demands of working with high-risk clients. In the skills training groups, clients learn and practice behavioral skills across four modules: mindfulness (enhancing awareness and presence), distress tolerance (managing crises without making them worse), emotion regulation (understanding and modulating emotional responses), and interpersonal effectiveness (assertiveness and maintaining relationships). These skills are intended to strengthen emotional and cognitive regulation by teaching clients to recognize the triggers that lead to reactive states and to judge which coping skills to apply within the sequence of events, thoughts, feelings, and behaviors that leads to problematic behavior. DBT is highly structured and behavioral in orientation and emphasizes commitment, diary cards, behavioral chain analysis, and validation strategies (Linehan et al., 2015).
DBT is particularly suitable for individuals who experience emotions intensely and have difficulty regulating them. While it was originally developed for people diagnosed with borderline personality disorder, research has shown that DBT can also be effective for a wide range of other mental health conditions, including substance use disorders, posttraumatic stress disorder, depression, and eating disorders (Lynch et al., 2007). Its adaptability makes DBT a viable treatment option across various clinical settings, including outpatient clinics and inpatient units.
Impact of DBT on Behavior
Empirical studies have consistently shown that DBT interventions positively influence several key behavioral and emotional domains. Research demonstrates that DBT can effectively reduce borderline symptoms, including emotional instability, identity disturbance, and interpersonal dysfunction (Ineme & Osinowo, 2016; Storebø et al., 2020; Wahl, 2011). In addition, DBT’s strong focus on emotion regulation has been linked to improved emotional well-being and more stable interpersonal relationships. Patients who participate in DBT programs often report enhanced abilities to identify, understand, and manage emotions in socially appropriate and adaptive ways (Bianchini et al., 2019; Gratz & Gunderson, 2006; Stadler et al., 2024).
One of the core strengths of DBT is its focus on reducing impulsivity and aggression. In addition to enhancing emotion regulation skills, DBT helps participants increase their distress tolerance and teaches them to pause and reflect before acting on impulsive urges, thereby reducing the likelihood of aggressive outbursts (Bianchini et al., 2019; Shelton et al., 2011; Wahl, 2011). Finally, DBT has been shown to alleviate symptoms of depression, in part by increasing behavioral activation (Asmand et al., 2015; Bradley & Follingstad, 2003).
Suitability of DBT for Forensic and Prison Populations
The use of DBT in forensic psychiatry and correctional settings is a relatively recent but rapidly growing area of clinical practice and research. Because DBT specifically targets deficits in emotion regulation, it is considered particularly suitable for forensic psychiatric patients and incarcerated individuals. These populations often present with complex mental health needs and show high rates of personality and mood disorders (Mills et al., 2019; Tomlinson, 2018). For example, the prevalence of borderline personality disorder among women and men in prison is estimated to be 27.4% and 18.8%, respectively (Dahlenburg et al., 2024). Empirical studies show that justice-involved individuals with a history of violent behavior experience greater difficulties in emotion regulation and report more intense negative emotions, particularly anger and hostility, than men who have not committed offenses (Garofalo et al., 2017; Strickland et al., 2017; Velotti et al., 2017). Furthermore, a literature review by Leshem et al. (2019) concludes that deficits in emotion regulation constitute a significant risk factor for delinquency and recidivism. Given the distinct characteristics and needs of these populations, standard DBT has been adapted to better fit forensic and correctional settings. Two of the most notable adaptations are DBT for forensic populations (DBT-F) and DBT-corrections modified (DBT-CM).
The forensic adaptation of DBT, DBT-F, was developed in close collaboration with M. Linehan (McCann et al., 2000, 2007). It is tailored to individuals in forensic psychiatric settings, such as high-security hospitals or long-term psychiatric units. Among forensic inpatients, the developers focused in particular on the subgroup of justice-involved individuals with dissocial and impulsive behaviors, who are characterized by a constant search for stimulation, a tendency toward boredom, a parasitic lifestyle, a lack of realistic long-term goals, impulsivity, and irresponsibility. This subgroup does not use violence to achieve specific goals (i.e., instrumental violence) but instead tends to exhibit violence as a reaction to frustration, an expression of irritability, emotional instability, and impulsivity (i.e., reactive violence). DBT-F retains the core structure of standard DBT, including group-based interventions, but incorporates several specific modifications. These include a stronger emphasis on risk assessment and management; a focus on criminogenic thinking patterns such as entitlement, minimization, and externalization of blame; expanded interpersonal effectiveness modules addressing issues of power, control, and institutional authority; integration of violence-prevention strategies; and more extensive work on impulse control (Oermann, 2013).
In contrast, DBT-CM was developed for implementation in correctional institutions, such as prisons (Sampl et al., 2010; Shelton et al., 2011). Given the logistical constraints of these settings (e.g., limited session time, security procedures, short lengths of stay), DBT-CM incorporates shorter and more focused skills modules, places greater emphasis on group-based interventions and peer support, uses a modular format that allows participants to enter and exit the program flexibly, and involves correctional staff as co-facilitators or as reinforcers of DBT principles within the institutional milieu (Shelton et al., 2009). Both DBT-F and DBT-CM aim to address the high prevalence of aggressive behavior, violence, impulsivity, and emotional dysregulation within forensic and correctional settings, and their use has been associated with reduced institutional infractions, fewer episodes of self-harm, and improved emotion regulation (Evershed et al., 2003; Rosenfeld et al., 2019).
To date, one qualitative systematic literature review and one meta-analysis have examined the effectiveness of DBT in forensic psychiatry and correctional settings. In their review, Tomlinson (2018) found that DBT has the potential to reduce the risk of recidivism within criminal justice systems. Mills et al. (2019), in the only existing meta-analysis, reported that DBT significantly reduces risk behavior in forensic service users, with large effect sizes. Their analysis, however, was based on only five studies, none of which included a control group. Since then, many more studies have been published, making it reasonable to conduct a new meta-analysis. With a substantially larger and higher-quality dataset (including studies with control groups), we will also be able to carry out moderator analyses. This is crucial because DBT evaluation studies assess a wide range of dependent variables (e.g., aggressive behavior, impulsivity, and emotion regulation), which Mills et al.’s meta-analysis could not examine separately. Instead, they aggregated diverse outcome variables such as impulsivity, self-injury, and aggression.
Aim of the Meta-Analyses
The present meta-analyses aimed to synthesize and evaluate the available empirical evidence on the effectiveness of DBT, DBT-F, and DBT-CM in forensic psychiatric and correctional populations. In addition, moderator analyses were conducted to examine potential sources of variability in treatment effects. The following moderators were included:
Outcome categories (Aggression, Borderline, Depression, Emotion Regulation, and Impulsivity): Different outcome domains capture distinct aspects of DBT’s effectiveness. Examining each category separately allows us to identify whether DBT has stronger effects on specific domains, such as emotion regulation or impulsivity, rather than assuming a uniform effect across all behavioral and psychological outcomes.
Percentage of men: Sex may influence treatment outcomes, as men and women differ in aggression and emotional expression (Knight et al., 2002), though it is unclear whether this affects DBT effectiveness (Penta et al., 2022).
Intervention hours: The total number of hours or intensity of the DBT program may affect treatment efficacy. By including intervention hours as a moderator, we can examine whether longer or more intensive programs yield larger effects.
Prison versus forensic context: DBT is implemented differently in correctional institutions and forensic psychiatric hospitals due to environmental and structural factors. This moderator assesses whether the setting influences outcomes, as treatment may be more or less effective depending on contextual constraints.
DBT skills training versus full DBT treatment program: Some studies implement only the skills training component of DBT, while others use the full treatment package, including individual therapy and consultation teams. This moderator allows us to determine whether partial versus comprehensive interventions produce different effects.
Year of publication: Research practices and quality of reporting have evolved. Including publication year as a moderator enables us to assess whether effect sizes vary in relation to the period in which the study was conducted.
By systematically reviewing and integrating the current state of research, this work aimed to provide clinicians and policymakers with a clearer understanding of the usefulness of DBT in improving psychological and behavioral outcomes in some of the most challenging environments.
Method
We conducted and reported the primary and exploratory meta-analyses in accordance with the updated Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (Page et al., 2021). We preregistered the study on the Open Science Framework (OSF) prior to starting the first round of the literature search (https://osf.io/bxcqh). In this protocol, we stated that we would examine DBT-F and DBT-CM in forensic or prison contexts. During the course of the literature search, we found that only 9 of the 21 therapies conducted in these settings followed DBT-F (n = 8) or DBT-CM (n = 1); in the other cases, the standard DBT treatment program was evaluated. We therefore deviated from the preregistered protocol: contrary to our initial plan, we did not restrict the analyses to DBT-F and DBT-CM but examined DBT in any form in forensic or prison contexts.
Search Strategy
We performed a systematic literature search in the databases Web of Science (including Web of Science Core Collection, Preprint Citation Index, and ProQuest™ Dissertation & Theses Citation Index), APA PsycInfo, APA PsycArticles, PubMed, and Ovid Medline. To ensure that we also captured studies not indexed in these databases, such as research published in Persian, we conducted the same search in Google Scholar. Search terms were used in English, German, and Persian (languages spoken at a native level within the research team). The English search terms were: “Dialectical Behavior Therapy” AND (“Forensic*” OR “Forensic* Psychiatry” OR “correction* modified” OR “correctional setting*” OR “prison*”). The German search terms were: “Dialektisch-behaviorale Therapie” AND (“Forensisch*” OR “Forensisch* Psychiatrie” OR “Maßregelvollzug*” OR “Gefängnis*” OR “Justizvollzugsanstalt*”). The Persian search terms were: “رفتار درمانی دیالکتیک” AND (“روانپزشکی قانونی” OR “زندان” OR “کانون اصلاح و تربیت”). We additionally screened the reference lists of relevant studies to identify further eligible articles. The literature search was carried out independently by two of the authors (MB and CM) in October 2024.
Inclusion and Exclusion Criteria
To be included in the primary meta-analysis, studies had to meet the following criteria: (a) they examined the effectiveness of DBT in any form (e.g., standard DBT, DBT-F, or DBT-CM), either as the full treatment package or as at least one DBT module; (b) they recruited justice-involved individuals in a forensic psychiatric hospital or a correctional institution (e.g., prison); (c) they used a quasi-experimental design that included both an experimental group receiving DBT and a control group not receiving DBT, where the control condition could consist of an active control group (patients attending other forms of therapy), an inactive control group (no therapeutic treatment), or both; (d) they reported quantitative data, so qualitative studies, interviews, and other non-quantitative formats were excluded; (e) they were peer-reviewed or non–peer-reviewed publications (e.g., dissertations); and (f) they were published in English, German, or Persian. When duplicate or multiple versions of a study were identified (e.g., a dissertation and a journal article), we included the earlier publication in the case of multiple articles and the peer-reviewed published version in the case of dissertations.
Studies included in the exploratory meta-analysis had to meet all of the above criteria except criterion (c): To obtain a comprehensive overview of research conducted in secure settings, the exploratory meta-analysis also included non-controlled (i.e., one-group) intervention studies that reported relevant pre- and post-intervention outcome measures.
Study Selection
After completing the database searches, 1,977 records were imported into Zotero (Digital Scholar, version 7.0.9, zotero.org). After removing duplicates (n = 183), author MB screened the titles and abstracts of the remaining records to identify those eligible for full text review (n = 52). In the final full-text reading stage, the studies were assessed based on the predefined inclusion criteria (see Figure 1). This entire process was independently repeated by a second author (CM), and any discrepancies between the reviewers were discussed and resolved.

Figure 1. Flowchart of Study Selection
Data Extraction
Some of the selected studies did not provide all the data required for the primary meta-analysis. In such cases, we contacted the corresponding authors to request the missing outcome data. Among the included studies, one (Asmand et al., 2015) featured both an active and an inactive control condition. Participants in the active control group attended another (non-DBT-F) group therapy, whereas those in the inactive control group did not receive any therapy. For this study, we used the data from the inactive control condition. Another study (Nee & Farman, 2005) included both adolescent and adult patients. Therefore, we treated these two age groups as separate samples and analyzed their data as two independent studies reported within a single paper.
During the data extraction phase, we observed that the studies used a wide range of questionnaires to assess the effectiveness of DBT in forensic and correctional settings. Therefore, one author (MB) compiled a list of all the questionnaires used in the included studies and gathered information about the specific symptoms or characteristics each instrument measured. The questionnaires were then examined to determine whether they assessed one of the constructs of interest, i.e., aggression, impulsivity, emotion regulation, depression, or borderline symptoms, and were assigned to the appropriate category.
Risk of Bias Assessment
To assess the risk of bias, we used the Cochrane Risk of Bias in Non-randomized Studies of Interventions tool (ROBINS-I; Sterne et al., 2016). The ROBINS-I framework relies on the comparability of groups, typically an intervention group versus a control group, or at least two groups receiving different interventions. When a study includes only one group (e.g., a pre-post design without a control group), ROBINS-I cannot be applied. Therefore, we evaluated only the eight studies that used a pretest–posttest control group design across the seven ROBINS-I bias domains. None of these studies reached a critical risk of bias, and thus all were included in the meta-analysis.
Meta-Analyses
The meta-analyses were conducted in R (R Core Team, 2020). On the basis of the outcome measures (means, standard deviations, and sample size) used in each study, we calculated one or multiple effect sizes for each study. To do so, we first calculated the pooled standard deviation for each outcome measure. Then, we used the function escalc from the package metafor (Viechtbauer, 2010) to calculate the effect size for each group separately (using the standardized mean difference) and then the overall effect size for each outcome variable.
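The effect-size step can be illustrated with a minimal sketch. The authors worked in R with escalc from metafor; the Python function below is only an illustration of a bias-corrected standardized mean difference (Hedges' g) built on a pooled standard deviation. The function name, the approximate small-sample correction factor, and the input numbers are assumptions for illustration, not values or code from the included studies, and the sketch does not reproduce how the pre-post effects of the two groups were combined.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return Hedges' g for group 1 vs. group 2 and its approximate sampling variance."""
    df = n1 + n2 - 2
    # Pooled standard deviation of the two groups
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    # Common approximation of the small-sample correction factor
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample approximation of the sampling variance
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g


# Hypothetical posttest scores: DBT group (mean 10, SD 2, n 20) vs. control (mean 12, SD 2, n 20)
g, var_g = hedges_g(10, 2, 20, 12, 2, 20)
print(f"g = {g:.3f}, variance = {var_g:.3f}")
```

With lower scores indicating improvement, a negative g in this sketch favors the DBT group, which matches the sign convention used in the Results.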
Having multiple effect sizes in one study may lead to dependency among the effect sizes within that study. To deal with this dependency, we used a mixed-effects meta-analysis approach (Assink & Wibbelink, 2016), fitted with the function rma.mv in the package metafor (Viechtbauer, 2010). At the first level of analysis, we entered the sampling variance of the observed effect sizes; this level included the five outcome categories and considered the influence of potential moderators, such as the percentage of male participants or the amount of intervention time (in hours). To account for the hierarchical structure of the data, we also modeled random intercepts for both study number and outcome category.
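The nesting described here can be written compactly as a three-level random-effects model. The notation below is an assumed sketch: the indices, symbols, and variance labels are ours and do not come from the source, and the exact random-effects structure passed to rma.mv is not stated in the text.

```latex
g_{ijk} = \mu + u_{k} + w_{jk} + e_{ijk}, \qquad
u_{k} \sim \mathcal{N}\!\left(0, \sigma^{2}_{\text{study}}\right), \quad
w_{jk} \sim \mathcal{N}\!\left(0, \sigma^{2}_{\text{category}}\right), \quad
e_{ijk} \sim \mathcal{N}\!\left(0, v_{ijk}\right)
```

Here g_{ijk} is the i-th effect size in outcome category j of study k, v_{ijk} is its known sampling variance, u_k is the random intercept for the study, and w_{jk} is the random intercept for the outcome category within the study. In a moderator analysis, the overall mean μ is replaced by a linear predictor of the moderators.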
Some studies were pilot studies that aimed to evaluate the feasibility of DBT in secure settings; these studies used pretest-posttest one-group designs, and the rules of risk-of-bias testing were not applicable to them. To obtain a better understanding of the studies performed in forensic and correctional settings, as an exploratory meta-analysis, we calculated the effect sizes of all studies, i.e., including the pilot studies with a pretest-posttest, one-group design.
Results
Primary Meta-Analysis of Studies with a Pretest-Posttest Control Group Design
Eight studies had a pretest-posttest control group design and were included in the primary meta-analysis (see Table 1 for information on the included studies). Two of the eight studies were non–peer-reviewed theses. The eight studies included a total of 394 participants. Participant age ranged from 12 to 61 years, although two studies did not report the age range. Most participants were men, and three studies did not include any women. In the treatment groups, four studies administered only DBT and the other four administered DBT in addition to treatment as usual.
Table 1. Characteristics of Studies with a Pretest-Posttest Control Group Design
Note. DBT-F, dialectical behavior therapy for forensic populations; ROBINS-I, Risk of Bias in Non-randomized Studies of Interventions; TAU, treatment as usual; BDI, Beck Depression Inventory; BIS-11, Barratt Impulsiveness Scale; DERS, Difficulties in Emotion Regulation Scale; BDHI-D, Buss-Durkee Hostility Inventory—Dutch Version; STAXI(-2), The State-Trait Anger Expression Inventory; NAS, Novaco Anger Scale; ISHUS, Inmates’ Self-harm Urges Scale; OAS, Overt Aggression Scale; PAI-BOR, Personality Assessment Inventory-Borderline Personality Features Scale.
Indicates non–peer-reviewed studies/thesis.
Overall Effect Model
First, a multivariate random-effects meta-analysis was estimated. The analysis included random effects on three levels: effect sizes within categories, categories within studies, and studies themselves. Across all 11 effect sizes (from eight articles), the overall effect was significant and negative, b = −.51, SE = 0.25, z = −2.03, p = .04, 95% CI [−1.00, −0.02] (see Figure 2). For all outcomes (Aggression, Impulsivity, Emotion Regulation, Depression, and Borderline Symptoms), lower scores indicate improvement. An effect size of Hedges’ g = −.51 therefore means that the reduction in aggression, impulsivity, emotion dysregulation, depression, or borderline symptoms in the experimental group was 0.51 standard deviations greater than in the control group. The control group improved less and thus remained more aggressive, more impulsive, less emotionally regulated, more depressed, or more affected by borderline symptomatology. This indicates that the intervention was effective, with an effect that is medium in magnitude and practically meaningful. The results showed significant heterogeneity across studies, QE(10) = 25.49, p = .005, suggesting that the observed effect sizes varied more than would be expected by sampling error alone. The estimated between-study variance was moderate (τ² = 0.27), and the correlation of effect sizes within studies was high (ρ = 0.64). Given the presence of significant heterogeneity, moderator analyses were conducted.
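As a plausibility check on the reported statistics, the z value, p value, and confidence interval can be recomputed from the pooled estimate and its standard error. The sketch below assumes a normal (Wald-type) reference distribution, which the reported z statistic suggests; the function name is hypothetical, and the inputs are the rounded values b = −.51 and SE = 0.25 given in the text.

```python
from statistics import NormalDist


def wald_summary(b, se, level=0.95):
    """Return the Wald z statistic, two-sided p value, and confidence interval."""
    nd = NormalDist()
    z = b / se
    p = 2 * (1 - nd.cdf(abs(z)))
    crit = nd.inv_cdf(1 - (1 - level) / 2)
    return z, p, (b - crit * se, b + crit * se)


z, p, ci = wald_summary(-0.51, 0.25)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Because the inputs are rounded to two decimals, the recomputed z can differ from the reported value in the second decimal place, while the interval reproduces the reported bounds after rounding.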

Figure 2. Forest Plot of Studies with a Pretest-Posttest Control Group Design
Moderator Analyses
Since a significant overall effect was observed across different domains, a moderator analysis was conducted to examine whether the intervention produces consistent effects across all areas (aggression, borderline symptoms, depression, emotion regulation, and impulsivity). In a mixed-effects meta-regression model, these five categories were included as a moderator. The moderator was significant, QM(5) = 35.50, p < .001, indicating that the category explains a substantial portion of the variation in effect sizes. In the moderator analysis (no intercept), the Borderline category exhibited a statistically significant negative association; the pooled effect size for Borderline outcomes was Hedges’ g = −1.46 (see Table 2). Because lower scores indicate improvement for the outcomes considered, this corresponds to a very large improvement in the experimental group: Studies showed, on average, a 1.46 standard-deviation greater reduction in borderline symptoms under the intervention than the control condition. Residual heterogeneity was no longer significant, QE(6) = 6.17, p = 0.40, and the estimated variance component (τ² = .02) was very small, indicating that the category variable accounted for nearly all between-study variance.
Table 2. Moderator Analyses of Studies with a Pretest-Posttest Control Group Design
In addition, we examined the influence of five further moderators (i.e., percentage of men, intervention hours, prison versus forensic context, DBT skills training versus full DBT treatment program and year of publication) on effect size. The results showed no significant effects of any of the moderators on the effect size.
Publication Bias
To test for publication bias, we performed Egger’s test. The test did not indicate funnel plot asymmetry, providing no evidence of publication bias (b = −1.01, z = 0.83, p = 0.41; see Figure 3).
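For readers unfamiliar with the procedure, the following is a minimal sketch of the classical Egger regression, in which each standardized effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error) and the intercept is tested against zero. This is an illustration only: the text does not state which variant was run in R, the function name is hypothetical, the p value uses a normal approximation, and the effect sizes and standard errors below are invented for demonstration.

```python
import math
from statistics import NormalDist


def egger_test(effects, ses):
    """Classical Egger regression; returns the intercept, its z statistic, and a two-sided p value."""
    n = len(effects)
    x = [1 / s for s in ses]                    # precision
    y = [e / s for e, s in zip(effects, ses)]   # standardized effect
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance and standard error of the intercept (ordinary least squares)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    z = intercept / se_int
    # Normal approximation; a t reference with n - 2 degrees of freedom is more conservative
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return intercept, z, p


# Hypothetical data: less precise studies report more negative effects
effects = [-0.9, -0.6, -0.4, -0.3, -0.2]
ses = [0.5, 0.4, 0.3, 0.2, 0.1]
intercept, z, p = egger_test(effects, ses)
print(f"intercept = {intercept:.2f}, z = {z:.2f}, p = {p:.3f}")
```

In this formulation the test statistic is the intercept divided by its standard error, so it always carries the same sign as the intercept.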

Figure 3. Funnel Plot of Studies with a Pretest-Posttest Control Group Design
Exploratory Meta-Analysis of Studies With a Pretest-Posttest Control Group or One-Group Design
As stated earlier, most studies used a pretest-posttest one-group design without a control group; that is, they assessed changes in an intervention group over time. The characteristics of the studies included in the exploratory meta-analysis are presented in Table 3. We analyzed these studies together with those employing a pretest-posttest control group design in a second exploratory meta-analysis.
Table 3. Characteristics of Studies with a Pretest-Posttest Control Group or One-Group Design
Note. DBT-F, dialectical behavior therapy for forensic populations; TAU, Treatment as Usual; BDI(-II), Beck Depression Inventory; BIS-11, Barratt Impulsiveness Scale; DERS(-SF), Difficulties in Emotion Regulation Scale (Short Form); SHI, Self-Harm Inventory; BSCS, Brief Self-Control Scale; BDHI-D, Buss-Durkee Hostility Inventory—Dutch Version; STAXI(-2), The State-Trait Anger Expression Inventory; NAS, Novaco Anger Scale; AARS, Adolescent Anger Rating Scale; BYI-2, Beck Youth Inventories; ZAN-BPD, Zanarini Rating Scale for Borderline Personality Disorder; OAS, Overt Aggression Scale; DSH, Deliberate Self-Harm; ISHUS, Inmates’ Self-harm Urges Scale; DWFQ, Dealing with Feelings Questionnaire; BPRSE, Brief Psychiatric Rating Scale Expanded; IDAS, Irritability, Depression and Anxiety Scale; EIS, Eysenck Impulsiveness Scale; BSL-23, The Borderline Symptom List; PHQ-9, The Patient Health Questionnaire; PAI-BOR, Personality Assessment Inventory-Borderline Personality Features Scale; Bar-On EQ-I, The BarOn Emotional Quotient Inventory; AQ, Aggression Questionnaire; ADS, The Anger Disorders Scale; ECQ, Emotion Control Questionnaire; LPI, The Life Problems Inventory; MERLC, The Multidimensional Emotion Regulation and Locus of Control; SCS, Self-Control Scale; DBT-CM, dialectical behavior therapy-corrections modified.
Indicates non-peer-reviewed studies/thesis.
Overall Effect Model
First, a multivariate random-effects meta-analysis was estimated. The analysis included random effects on three levels: effect sizes within categories, categories within studies, and studies themselves. Across all 38 effect sizes (from 21 articles), the overall mean effect was significantly negative, b = −.44, SE = .08, z = −5.61, p < .001, 95% CI [−.59, −.28] (see Figure 4). For all outcomes (Aggression, Impulsivity, Emotion Regulation, Depression, and Borderline Symptoms), lower scores indicate improvement. An effect size of Hedges’ g = −.44 therefore means that the reduction in aggression, impulsivity, emotion dysregulation, depression, or borderline symptoms in the experimental group was 0.44 standard deviations greater than in the control group. The control group improved less and thus remained more aggressive, more impulsive, less emotionally regulated, more depressed, or more affected by borderline symptomatology. This indicates that the intervention was effective, with an effect that is small to medium in magnitude and practically meaningful.

Figure 4. Forest Plot of Studies with a Pretest-Posttest Control Group or One-Group Design
The test for heterogeneity was significant, QE(37) = 60.67, p = .008, suggesting that the observed effect sizes varied more than would be expected by sampling error alone. The estimated between-study variance was moderate (τ² = .08), indicating meaningful variability across studies. The analysis demonstrates a significant overall effect, while also revealing residual heterogeneity, which points to the influence of additional moderators or study characteristics not yet accounted for. Given the presence of significant heterogeneity, a moderator analysis was conducted.
Moderator Analyses
In this multivariate meta-regression including category (i.e., Aggression, Impulsivity, Emotion Regulation, Depression and Borderline) as a moderator, the overall moderator test was highly significant, QM(5) = 140.32, p < .001, indicating that the categories explain a substantial portion of the variation in effect sizes. In the moderator analysis (no intercept), all five categories revealed a statistically significant negative association (see Table 4). Because lower scores indicate improvement for the outcomes considered (i.e., Aggression, Impulsivity, Emotion Regulation, Depression and Borderline), this corresponds to a large improvement in the experimental group.
Table 4. Moderator Analyses of Studies with a Pretest-Posttest Control Group or One-Group Design
Residual heterogeneity remained significant, QE(33) = 55.54, p = .008, and the estimated variance component (τ² = .07) was moderate, suggesting that although the category variable accounts for a large share of between-study variance, additional unexplained heterogeneity persists.
Therefore, we examined the influence of six additional moderators (i.e., percentage of men, intervention hours, prison versus forensic context, DBT skills training versus full DBT treatment program, year of publication, and control-group versus one-group study design) on effect size. Year of publication was positively associated with effect sizes. Because the effects are negative, a positive association means that they move closer to zero over time; that is, the overall effect is less pronounced in more recent studies than in older ones. None of the other moderators had a significant effect.
Discussion
The present meta-analyses examined the effectiveness of Dialectical Behavior Therapy (DBT) in forensic and correctional settings based on studies with various designs.
Across controlled pretest-posttest studies, DBT demonstrated a significant overall effect on key outcomes (such as aggression, impulsivity, emotion regulation, depression, and borderline symptoms). This finding suggests that DBT can meaningfully reduce problematic behaviors and emotional dysregulation in this high-risk population. However, considerable heterogeneity was observed between studies, indicating differences in study designs, participant characteristics, and intervention implementation. Subsequent moderator analyses showed that the outcome category was a robust predictor of effect size. In particular, studies assessing borderline symptoms demonstrated very large improvements, confirming that DBT is especially effective for core borderline features. Other potential moderators, including gender, intervention duration, and setting (forensic vs. correctional facility), did not have a significant impact on effect sizes.
The exploratory meta-analysis yielded similar results. The overall effect remained negative and significant. Moderator analyses confirmed that the outcome category consistently explained a substantial portion of the variance in effect sizes, whereas additional variables such as study design, gender, intervention hours, and setting had only a minimal impact. Notably, the year of publication was significantly associated with the effect sizes, with the overall effect of DBT being somewhat less pronounced in more recent studies than in older ones. This may be due to methodological improvements, differences in sample characteristics, or stricter control conditions in the more recent investigations.
The first analysis included only pretest-posttest studies with both experimental and control groups. This design allows for the calculation of controlled effect sizes, providing a rigorous estimate of DBT’s efficacy relative to standard care. By comparing outcomes between intervention and control groups, we can more confidently attribute observed changes to the effects of DBT itself rather than to external factors or natural recovery. This type of analysis is considered methodologically stronger because it controls for confounding variables and reduces the risk of bias. Results of this first, more stringent analysis indicated that only borderline symptom severity showed a statistically significant improvement. Specifically, DBT demonstrated a significant advantage over treatment-as-usual conditions for borderline symptomatology, while no significant effects were observed for other outcome domains. This suggests that, under controlled conditions, DBT may primarily exert its specific therapeutic effects on core borderline-related symptoms in forensic and prison populations.
The second analysis included all pretest-posttest studies, including those without a control group. While this approach increases the number of studies and broadens the evidence base, it also introduces potential methodological limitations. Without a control group, it is not possible to definitively determine whether observed improvements are due to the intervention, spontaneous remission, regression to the mean, or other external influences. Effect sizes derived from uncontrolled studies may therefore overestimate or misrepresent the true impact of DBT. In contrast to the first analysis, the second analysis revealed statistically significant effects not only for borderline symptoms but also for aggression, emotion regulation, impulsivity, and depressive symptoms. However, given the lack of control conditions, these improvements cannot be unequivocally attributed to DBT itself. It is possible that these effects reflect nonspecific treatment factors, such as the structured environment, increased therapeutic attention, or the general impact of placement within a forensic psychiatric or correctional setting, rather than DBT-specific mechanisms.
Overall, the results suggest that DBT is an effective intervention in forensic and correctional settings, particularly for reducing borderline symptoms. The findings are consistent with the meta-analysis by Mills et al. (2019) as well as earlier research in clinical and community-based populations, which has repeatedly shown that DBT effectively reduces self-harm, emotional dysregulation, and impulsive behavior (Kliem et al., 2010; Linehan, 1993). The moderator analyses highlight the outcome category as a key factor influencing effect size, with other potential moderators (gender, intervention hours, setting) showing only minimal impact. This is consistent with the broader literature, which often shows that treatment effects in DBT are robust across demographic and setting variables but may vary depending on the specific symptom domain (Kliem et al., 2010; McMain et al., 2012). Despite these encouraging results, the remaining high heterogeneity suggests that additional unmeasured factors may be influencing the outcomes. Other studies, for example, have found that therapist expertise, adherence to the DBT model, or baseline symptom severity can moderate treatment success (Linehan et al., 2006).
Limitations
Several limitations must be considered. First, only eight studies used a pretest-posttest control group design, which substantially limits the robustness and generalizability of the findings. In a meta-analysis with only eight studies, a single outlier can disproportionately shift the pooled effect, leading to over- or underestimation of the true effect. Outliers can also distort measures of heterogeneity, making it appear that there is more variability between studies than actually exists. In addition, a lack of statistical power may explain the absence of significant effects in some outcome categories. Therefore, although the results are promising, they should be interpreted with caution. They highlight the need for better-designed controlled studies in forensic and correctional contexts.
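The sensitivity of a small meta-analysis to a single outlier can be illustrated with a minimal leave-one-out sketch. The eight effect sizes and variances below are hypothetical and do not correspond to the included studies. For simplicity, the sketch uses a fixed-effect inverse-variance mean and Cochran's Q, which is an assumption made for illustration and not necessarily the model used in the analyses reported here.

```python
# Hypothetical data only: these are not the effect sizes of the included studies.
# Fixed-effect inverse-variance pooling is assumed purely for illustration.

def pooled_effect(effects, variances):
    """Inverse-variance weighted mean of the effect sizes."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the pooled effect."""
    pooled = pooled_effect(effects, variances)
    return sum((e - pooled) ** 2 / v for e, v in zip(effects, variances))

def leave_one_out(effects, variances):
    """Pooled effect recomputed with each study omitted in turn."""
    return [
        pooled_effect(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])
        for i in range(len(effects))
    ]

# Seven similar studies plus one extreme value (the last entry).
effects = [-0.20, -0.30, -0.10, -0.40, -0.25, -0.35, -0.15, -2.00]
variances = [0.05] * 8

full = pooled_effect(effects, variances)
loo = leave_one_out(effects, variances)
q_full = cochran_q(effects, variances)
q_without_outlier = cochran_q(effects[:-1], variances[:-1])

print(f"pooled effect, all eight studies: {full:.3f}")
print(f"pooled effect, outlier omitted:   {loo[-1]:.3f}")
print(f"Cochran's Q with / without outlier: {q_full:.1f} / {q_without_outlier:.1f}")
```

In this constructed example, omitting the single extreme study roughly halves the pooled effect and sharply lowers Q, which shows how one study can drive both the pooled estimate and the apparent heterogeneity when only eight studies are available.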
While most of the studies included in the meta-analyses focused on adults, one study in Table 1 and four studies in Table 3 examined adolescent participants. It remains unclear whether the effects observed in adults can be generalized to younger populations. Furthermore, some adolescent studies applied the adult version of DBT, even though a DBT-A protocol for adolescents exists. This may limit the appropriateness and comparability of results across age groups. Similarly, despite the availability of an adapted DBT protocol for correctional settings, only one out of ten studies conducted in correctional contexts in the present meta-analyses employed the DBT-CM protocol (McCann et al., 2000), while all others applied the original DBT protocol.
Another limitation concerns the potential influence of the country in which the studies were conducted. Forensic and correctional systems differ considerably across countries in terms of structure, available resources, staff training, and therapeutic culture. All of these factors can affect the implementation of the treatment and its outcomes.
Conclusions and Further Research
In summary, these meta-analyses strengthen the growing evidence that DBT is a feasible and effective intervention in forensic and correctional contexts. These findings align well with the broader DBT literature and suggest that the therapy’s mechanisms of change are transferable to high-risk and institutionalized populations. However, the limited availability of high-quality pretest-posttest control group studies points to important gaps in the current evidence base. Specifically, controlled studies indicate that DBT primarily improves borderline symptomatology, while improvements in other domains such as aggression, emotion regulation, impulsivity, and depressive symptoms are observed mainly in uncontrolled studies and may reflect nonspecific effects of institutionalization or general therapeutic attention. Future research should therefore focus on conducting more rigorous controlled trials in forensic and correctional settings to disentangle DBT-specific effects from these nonspecific influences. Studies with larger sample sizes and consistent outcome measures across multiple symptom domains are needed to clarify the full scope of DBT’s efficacy. Moreover, future evaluation studies should examine adaptations of DBT, such as DBT-F (forensic) or DBT-CM (correctional management), to determine whether these tailored approaches enhance effectiveness in forensic and correctional contexts. Once a sufficient number of studies are available, the comparative effects of DBT-F and DBT-CM could be evaluated in a meta-analytic framework. Finally, cross-national comparisons may help to identify contextual factors that influence treatment outcomes, contributing to a more comprehensive understanding of DBT’s utility across diverse institutional and cultural settings.
Footnotes
Authors’ note:
The authors thank Jacquie Klesing, Board-certified Editor in the Life Sciences (ELS), for editing assistance with the manuscript. Conceptualization, M.D., J.S., and S.P.; methodology, M.B. and J.S.; formal analysis, C.M., M.W., and M.B.; investigation, C.M., M.W., and M.B.; resources, M.D.; data curation, J.S. and M.B.; writing—original draft preparation, M.B. and J.S.; writing—review and editing, M.B., S.P. and J.S. All authors have read and agreed to the published version of the manuscript. The authors declare no conflicts of interest.
