Abstract
Previous reviews of the effectiveness of intimate partner violence (IPV) programmes have shown mixed results. A systematic review with narrative synthesis was conducted to understand the methodological challenges in determining the effectiveness of IPV perpetrator programmes. A two-stage search strategy was employed. Firstly, a systematic search was conducted across six electronic databases: CINAHL, MEDLINE, PsycINFO, Global Health, ASSIA, and Web of Science to identify systematic reviews exploring the effectiveness of interventions for IPV perpetrators. Secondly, primary studies from selected reviews were screened for inclusion. In total, 508 full-text manuscripts from 38 reviews were assessed against eligibility criteria. Twenty-six randomised controlled trials (RCTs) were included in this review. Methodological issues included the following: (1) short-term follow-up; (2) an over-reliance on perpetrator self-reported (physical) IPV outcomes with few studies collecting collateral victim IPV reports; (3) a lack of data presented on attendance and completion rates; and (4) changes in partners or having more than one partner were not considered in the methods or analysis. Perpetrators may under-report their abusive behaviour. No RCTs accounted for changes in relationship status nor reported on outcomes for new or multiple partners, potentially underestimating IPV recidivism as a result. Future evaluations of IPV perpetrator programmes should collect victim and perpetrator self-reported outcomes of different types of IPV victimisation and perpetration, respectively, alongside official re-offence data over the longer term. Evaluations should present data on treatment completion and dose response to enhance our understanding of what works for whom and to examine outcomes for new or multiple partners.
Keywords
Introduction
Intimate partner violence (IPV) is widely recognised to be a significant global issue (World Health Organisation, 2021). Findings from studies in high-income countries suggest that it is not uncommon for men or women to disclose having ever perpetrated IPV. For example, a review of studies in industrialised, English-speaking nations found an overall pooled prevalence estimate for self-reported physical IPV perpetration of 25% (Desmarais et al., 2012). Higher estimates have been found in younger samples and for other types of IPV perpetration. Longitudinal studies of young adults in the Netherlands and the United States found that nearly half reported perpetrating physical IPV and more than three-quarters reported perpetrating psychological IPV (Desmarais et al., 2012; Verbruggen et al., 2020, 2021). Given the scale of the problem and its harmful consequences, it is vital that those who perpetrate IPV receive effective behaviour change programmes. It is in the context of designing and undertaking a trial to test the effectiveness of a programme to reduce IPV re-offending by men with substance misuse problems that the research team sought to critically reflect on the methodological challenges identified from similar studies.
Reviews of the effectiveness of IPV perpetrator programmes have shown mixed results (Babcock et al., 2024; Cheng et al., 2021; Karakurt et al., 2019; Tarzia et al., 2020; Travers et al., 2021; Wilson et al., 2021). Of six recent reviews, three concluded that evidence for the effectiveness of IPV programmes was inconclusive (Cheng et al., 2021; Tarzia et al., 2020; Wilson et al., 2021), while the other three reported a significant positive impact of such interventions on recidivism (Babcock et al., 2024; Karakurt et al., 2019; Travers et al., 2021). However, within these reviews, more rigorously designed studies reported a reduced impact of programmes on recidivism (Babcock et al., 2024; Cheng et al., 2021; Karakurt et al., 2019; Travers et al., 2021), suggesting the positive outcomes reported in less rigorous studies may be inflated by methodological bias. Of particular interest, given our own trial, a recent review of interventions for IPV perpetrators with mental health and substance misuse problems was hindered from drawing firm conclusions due to the poor quality of most studies and the high risk of bias in the conclusions presented (Sousa et al., 2024).
Methodological issues in assessing the effectiveness of IPV perpetrator programmes, such as the heterogeneity of programmes and populations, limited outcome measures assessed, and short duration of follow-up, were highlighted in two recent reviews (Turner et al., 2023; Vall et al., 2024). Neither of these reviews applied date restrictions for study selection, resulting in long timeframes (1992–2020 and 1988–2021, respectively), leading Vall et al. (2024) to conclude by recommending a more specific, recent timeframe ‘given that the quality of the latest studies still requires several improvements’ (p. 1994). Both reviews also only accepted studies of heterosexual males perpetrating IPV towards a female current or ex-partner. Vall et al. (2024) included any empirical studies, whereas Turner et al. (2023) included only RCTs. The present systematic review builds upon these reviews by including RCTs published during a more recent timeframe of the last 10 years, while also having no limitations on the gender or sexuality of the perpetrator or victim. This resulted in one-quarter of studies in our review with samples containing some proportion of female perpetrators, a previously excluded group. Furthermore, there are only small overlaps between our review and those recent reviews in the included studies (1 of 46 matched with Vall et al., 2024, and 5 of 15 matched with Turner et al., 2023), providing a different basis for assessing the extent and impact of these issues.
The current review also sought to interrogate the methodological decisions about which victims to include in the study (e.g. index victims, current and/or ex-partners) and to examine whether any studies accounted for changes in relationship status between perpetrators and victims throughout the duration of the trial. This thorny issue was not considered in recent methodological reviews (Turner et al., 2023; Vall et al., 2024), although it undoubtedly impacts results and interpretation of findings. For example, measuring the primary outcome as IPV towards the index victim would not accurately capture current IPV if that relationship ended during the study and there was no longer contact. Similarly, measuring the primary outcome as IPV towards the current partner would miss IPV perpetrated against ex-partners with whom the perpetrator remains in contact (e.g. through child contact arrangements). Relationship stability or change has been shown to influence the level and type of IPV perpetrated over time (Shortt et al., 2012), with relationship-specific variables demonstrated to be key determinants of IPV persistence (Whitaker et al., 2010). Yet, the extent to which research evaluating the effectiveness of perpetrator programmes has accounted for relationship status, let alone measured IPV by perpetrators against multiple people they may have an opportunity to harm during the course of a study, remains unclear. Recent reviews (Turner et al., 2023; Vall et al., 2024) focussed on the challenge associated with low victim response rates rather than the challenge of accounting for the changing nature of relationship status itself. This is an important issue to address, and particularly salient for research involving justice-involved participants with substance use and/or mental health problems, where stable relationships are often lacking.
Methods
Aims
A two-stage systematic review with narrative synthesis was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Page et al., 2021) to identify methodological challenges and areas for improvement in future studies evaluating the effectiveness of IPV perpetrator programmes.
Search Strategy
Included RCTs were identified from published reviews or meta-analyses. This approach facilitated identification of the most up-to-date and methodologically rigorous studies assessing the effectiveness of IPV perpetrator programmes. Studies reporting measurable outcomes for IPV recidivism were synthesised. A systematic search was conducted across six electronic databases (CINAHL, MEDLINE, PsycINFO, Global Health, ASSIA, and Web of Science) on July 17, 2025. The search strategy combined terms for IPV, RCTs, and reviews; no language restrictions were applied (see Appendix one).
Eligibility Criteria
The first stage involved selecting reviews if they (1) were a systematic review or meta-analysis; (2) evaluated interventions targeting IPV perpetrator behaviour change; (3) were published from 2015 onwards; and (4) included adult populations (18 years+). In the second stage, all primary studies from the selected reviews were screened for eligibility and were included if they (1) employed a randomised controlled trial (RCT) design; (2) evaluated the effectiveness of interventions aimed at changing IPV perpetrator behaviour; (3) were published within the last 10 years (from 2015 onwards); and (4) reported IPV perpetration recidivism as an outcome. Studies were excluded if they focussed on at-risk populations rather than IPV perpetrators or reported only outcomes for general violence rather than IPV-specific recidivism (Hasisi et al., 2016).
Study Selection
Reviews were assessed for inclusion by two authors independently (LD, IJ, SK, and ES), supported by Covidence review software. Following the removal of duplicates, 3,041 abstracts and titles were screened, with 79 meeting the inclusion criteria. These 79 reviews were then screened for full text, with 38 meeting inclusion criteria. Of the 41 ineligible reviews, the most common reason for exclusion was that they did not aim to evaluate IPV perpetrator programmes (n = 17). Thereafter, primary studies were extracted from eligible reviews, de-duplicated, and screened at full text by two authors (LD, IJ, SK, and ES) using Microsoft Excel. Of the 508 primary studies extracted from reviews, 26 RCTs met the inclusion criteria (see Figure 1). Primary studies were most commonly excluded because they were published before 2015 (n = 240). Discrepancies between reviewers were resolved through consensus or by a third reviewer. A total of 82 (3%) reviews were screened by a third reviewer at title and abstract, and 6 (8%) at full text.

Figure 1. PRISMA diagram.
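The screening counts reported under Study Selection can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are assumptions introduced here, and the numbers are taken directly from the text above.

```python
# Counts as reported in the Study Selection text (names are illustrative).
abstracts_screened = 3041          # titles/abstracts screened after de-duplication
reviews_full_text = 79             # reviews screened at full text
reviews_included = 38              # reviews meeting inclusion criteria
primary_studies_extracted = 508    # primary studies taken from included reviews
rcts_included = 26                 # RCTs meeting inclusion criteria


def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Reviews excluded at full text.
reviews_excluded_full_text = reviews_full_text - reviews_included

# Share of records referred to a third reviewer at each stage.
third_reviewer_title_abstract = pct(82, abstracts_screened)
third_reviewer_full_text = pct(6, reviews_full_text)

# Primary studies excluded at full text.
primary_studies_excluded = primary_studies_extracted - rcts_included
```

Under these inputs, the derived values agree with the 41 ineligible reviews and the 3% and 8% third-reviewer figures reported in the text.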
Data Extraction and Synthesis
Data were extracted from the 26 RCTs by authors LD, IJ, SK, and ES (see Table 1). Studies were grouped for analysis according to the type of data used to capture recidivism: perpetrator and/or victim reports (n = 20), secondary data on official recidivism (n = 1), and studies that included secondary data alongside perpetrator and/or victim reports (n = 5). A custom data extraction template was used to extract study and intervention characteristics, including (1) study characteristics (sample size, country, setting, etc.); (2) intervention completion and retention (definitions, rates, and whether factors associated with completion and retention were considered); (3) recidivism (how recidivism was determined, self-report measures of IPV, and length of follow-up); and (4) victim factors (whether changes in relationship status were considered when assessing recidivism).
Table 1. Details of Included Randomised Controlled Trials.
Note. SRP = Self-reported perpetration; SRV = Self-reported victimisation; CBT = cognitive‑behavioural therapy.
Both partners within each dyad could provide self-reported measures of perpetration and victimisation.
Assessment of Quality
The Cochrane Risk of Bias tool (Higgins et al., 2011) was used to assess the quality of included RCTs. Included studies were assessed for the presence of selection, performance, detection, attrition, and reporting bias. Authors SK and ES independently assessed studies’ methodological quality and then met to reach consensus. No studies were excluded on the basis of methodological quality (see Supplemental Figure 1). Five studies met all criteria (Easton et al., 2018; Hesser et al., 2017; Murray et al., 2020; Nesset et al., 2020; Stuart et al., 2016). The most common limitations were a lack of adequate information to assess the blinding of outcome assessors (n = 15 or 58%), the blinding of participants and/or personnel (n = 14 or 54%), and approaches to concealing allocation (n = 11 or 42%). Additionally, 23% of studies (n = 6) were at risk of attrition bias due to uneven distributions of missing data, and 4 studies (15%) were deemed to have a high risk of selective reporting.
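As a quick consistency check, the percentages quoted for each risk-of-bias limitation can be recomputed from the study counts and the total of 26 included RCTs. This is a minimal sketch; the dictionary keys are shorthand labels chosen here, not terms from the Cochrane tool itself.

```python
TOTAL_RCTS = 26

# Number of RCTs affected by each limitation, as reported in the text.
limitation_counts = {
    "blinding of outcome assessors unclear": 15,
    "blinding of participants/personnel unclear": 14,
    "allocation concealment unclear": 11,
    "attrition bias": 6,
    "selective reporting": 4,
}

# Percentage of all included RCTs, rounded to the nearest integer.
limitation_percentages = {
    label: round(100 * count / TOTAL_RCTS)
    for label, count in limitation_counts.items()
}

for label, percentage in limitation_percentages.items():
    print(f"{label}: n = {limitation_counts[label]} ({percentage}%)")
```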
Results
Study Characteristics
A summary of all 26 trials (23 RCTs and 3 pilot RCTs) evaluating 22 distinct perpetrator programmes is presented in Table 1. Sample sizes ranged from 18 (Stover, 2015) to 2,162 participants (Raj et al., 2016). Out of the 26 trials, 3 were conducted within criminal justice settings (Brem et al., 2023; Stover, 2015; Zarling & Russell, 2022), 6 in substance use services (Chermack et al., 2017, 2019; Easton et al., 2018; Fernández-Montalvo et al., 2019; Gilchrist et al., 2021; Stover et al., 2019), 7 in healthcare settings (Nesset et al., 2020; Raj et al., 2016; Satyanarayana et al., 2016; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022; Zarling et al., 2015), and 10 in other community settings (Hartmann et al., 2021; Hesser et al., 2017; Kane et al., 2021; Lila et al., 2018, 2020; Murphy et al., 2018, 2020; Murray et al., 2020; Stuart et al., 2016; Tollefson & Phillips, 2015).
Of the 23 RCTs delivered in healthcare services, substance use or other community settings, 12 were conducted with participants who voluntarily attended the perpetrator programme or were clinician-referred (n = 12, Chermack et al., 2017; Fernández-Montalvo et al., 2019; Gilchrist et al., 2021; Hartmann et al., 2021; Hesser et al., 2017; Kane et al., 2021; Murray et al., 2020; Nesset et al., 2020; Raj et al., 2016; Satyanarayana et al., 2016; Taft et al., 2022; Zarling et al., 2015). Six included a mixed sample of participants who were court-mandated as well as those who were self- or clinician-referred (Murphy et al., 2018, 2020; Stover et al., 2019; Stuart et al., 2016; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016). The remaining five RCTs recruited participants who were court-mandated to attend a perpetrator programme (Chermack et al., 2019; Easton et al., 2018; Lila et al., 2018, 2020; Tollefson & Phillips, 2015).
Nineteen of the RCTs delivered interventions grounded in cognitive‑behavioural therapy (CBT): CBT‑informed programmes (n = 9, Easton et al., 2018; Fernández-Montalvo et al., 2019; Hesser et al., 2017; Kane et al., 2021; Murray et al., 2020; Nesset et al., 2020; Satyanarayana et al., 2016; Stover, 2015; Stover et al., 2019) or a combination of CBT alongside Motivational Interviewing (n = 5, Chermack et al., 2017, 2019; Lila et al., 2018, 2020; Murphy et al., 2020), Social Information Processing Theory (n = 3, Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022), an incentives‑based approach (n = 1, Hartmann et al., 2021), or Dialectical Behaviour and Motivational Enhancement Therapy (MET; n = 1, Gilchrist et al., 2021). The remaining seven RCTs delivered Acceptance and Commitment Therapy (n = 2, Zarling & Russell, 2022; Zarling et al., 2015), MET (n = 1, Murphy et al., 2018), a mindfulness‑based programme (n = 1, Tollefson & Phillips, 2015), a Standard Batterer Programme alongside a brief alcohol intervention (n = 2, Brem et al., 2023; Stuart et al., 2016), or an intervention informed by Social Cognitive Theory and the Theory of Gender and Power (n = 1, Raj et al., 2016).
The number of intervention sessions across trials ranged from 3 (n = 2, Murray et al., 2020; Raj et al., 2016) to 43 (n = 2, Lila et al., 2018, 2020). Two trials quantified the length of intervention in hours (Brem et al., 2023; Stuart et al., 2016).
Most trials (n = 15) were conducted in the United States (Brem et al., 2023; Chermack et al., 2017, 2019; Easton et al., 2018; Murphy et al., 2018, 2020; Stover, 2015; Stover et al., 2019; Stuart et al., 2016; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022; Tollefson & Phillips, 2015; Zarling & Russell, 2022; Zarling et al., 2015). The remaining RCTs (n = 11) were conducted in Spain (n = 3, Fernández-Montalvo et al., 2019; Lila et al., 2018, 2020), India (n = 3, Hartmann et al., 2021; Raj et al., 2016; Satyanarayana et al., 2016), Zambia (n = 2, Kane et al., 2021; Murray et al., 2020), Sweden (n = 1, Hesser et al., 2017), Norway (n = 1, Nesset et al., 2020), and the United Kingdom (n = 1, Gilchrist et al., 2021).
With regard to perpetrators’ gender, in the majority of included RCTs, the perpetrator was male (n = 20, Easton et al., 2018; Gilchrist et al., 2021; Hartmann et al., 2021; Kane et al., 2021; Lila et al., 2018, 2020; Murphy et al., 2018, 2020; Murray et al., 2020; Nesset et al., 2020; Raj et al., 2016; Satyanarayana et al., 2016; Stover, 2015; Stover et al., 2019; Stuart et al., 2016; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022; Tollefson & Phillips, 2015; Zarling & Russell, 2022). However, it should be noted that several of these studies also measured IPV perpetration by female partners as an outcome (Gilchrist et al., 2021; Murphy et al., 2020; Nesset et al., 2020; Taft, Creech, et al., 2016; Taft et al., 2022). The remaining six RCTs included female and male perpetrators (n = 5, Chermack et al., 2017, 2019; Fernández-Montalvo et al., 2019; Hesser et al., 2017; Zarling et al., 2015) or only female perpetrators (n = 1, Brem et al., 2023).
Perpetrator Programme Completion and Retention
Definition of Programme Completion and Retention
Of the 18 RCTs reporting IPV perpetrator programme completion rates, 7 defined completion as attendance at all sessions (Hartmann et al., 2021; Hesser et al., 2017; Raj et al., 2016; Taft et al., 2022; Tollefson & Phillips, 2015; Zarling & Russell, 2022; Zarling et al., 2015). Seven considered participants who attended a specified ‘dose’ (i.e. amount of the intervention) to be completers, with the required dose ranging from 20% to 81% of sessions (M = 60%; Easton et al., 2018; Lila et al., 2018; Murphy et al., 2020; Stover, 2015; Stover et al., 2019; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016). Four provided no definition (Lila et al., 2020; Murphy et al., 2018; Murray et al., 2020; Nesset et al., 2020). See Supplemental Table S1 for details.
Eighteen RCTs analysed the intent-to-treat (ITT) population regardless of intervention ‘dose’ (Brem et al., 2023; Chermack et al., 2017, 2019; Easton et al., 2018; Fernández-Montalvo et al., 2019; Gilchrist et al., 2021; Hartmann et al., 2021; Hesser et al., 2017; Kane et al., 2021; Murphy et al., 2018, 2020; Murray et al., 2020; Nesset et al., 2020; Stuart et al., 2016; Taft, Creech, et al., 2016; Taft et al., 2022; Zarling & Russell, 2022; Zarling et al., 2015). Four trials compared outcomes between programme completers, non-completers, and controls (Raj et al., 2016; Stover et al., 2019; Taft, Macdonald, et al., 2016; Tollefson & Phillips, 2015). In Lila et al. (2018, 2020), official reoffence data were analysed for the ITT population, but non-completers were excluded from the analysis of self-report data. Two trials did not report whether or how completion was accounted for in the analysis (Satyanarayana et al., 2016; Stover, 2015).
Completion and Attendance Rates
Eighteen of the RCTs reported IPV perpetrator programme completion rates alone (n = 10, Hartmann et al., 2021; Hesser et al., 2017; Murray et al., 2020; Nesset et al., 2020; Raj et al., 2016; Stover, 2015; Stover et al., 2019; Taft, Creech, et al., 2016; Taft et al., 2022; Tollefson & Phillips, 2015) or alongside data on the average number of sessions attended (n = 8, Easton et al., 2018; Lila et al., 2018, 2020; Murphy et al., 2018, 2020; Taft, Macdonald, et al., 2016; Zarling & Russell, 2022; Zarling et al., 2015). An additional three trials reported only the average number of sessions attended (Chermack et al., 2017, 2019; Gilchrist et al., 2021). The remaining five trials did not report completion or attendance rates (Brem et al., 2023; Fernández-Montalvo et al., 2019; Kane et al., 2021; Satyanarayana et al., 2016; Stuart et al., 2016).
Factors Associated With Completion, Attendance, or Retention
Nine RCTs explored factors associated with study retention (n = 5, Chermack et al., 2017, 2019; Fernández-Montalvo et al., 2019; Hesser et al., 2017; Murphy et al., 2020) or intervention completion or attendance (n = 4, Gilchrist et al., 2021; Lila et al., 2018; Tollefson & Phillips, 2015; Zarling et al., 2015). Participants who were lost to follow-up or did not complete the intervention were significantly younger (Chermack et al., 2019), were more likely to be homeless or to have mental health problems (Gilchrist et al., 2021), had lower incomes and higher recidivism risk (Lila et al., 2018), and reported higher levels of emotional abuse perpetration pre-treatment (Murphy et al., 2020). The remaining trials found no statistically significant differences between completers and non-completers or between dropouts and those followed up (n = 5, Chermack et al., 2017; Fernández-Montalvo et al., 2019; Hesser et al., 2017; Tollefson & Phillips, 2015; Zarling et al., 2015).
One additional trial discussed factors with the potential to increase intervention attendance, although these were not explored in the data (Murray et al., 2020). The remaining 16 trials did not report factors associated with intervention completion or retention (Brem et al., 2023; Easton et al., 2018; Fernández-Montalvo et al., 2019; Hartmann et al., 2021; Kane et al., 2021; Lila et al., 2020; Murphy et al., 2018; Nesset et al., 2020; Satyanarayana et al., 2016; Stover, 2015; Stover et al., 2019; Stuart et al., 2016; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022; Zarling & Russell, 2022).
Recidivism
How Recidivism Was Measured
All RCTs measured IPV recidivism as a primary or secondary outcome, but how this was measured varied. Six studies reported official reoffence data, in isolation (n = 1, Tollefson & Phillips, 2015), in conjunction with self-reported perpetration (n = 3, Brem et al., 2023; Lila et al., 2018, 2020) or victimisation (n = 1, Zarling & Russell, 2022), or alongside self-reports from both perpetrators and victims (n = 1, Murphy et al., 2020). Of the six trials reporting official reoffence data, most only reported IPV-related recidivism (n = 5, Brem et al., 2023; Lila et al., 2018, 2020; Murphy et al., 2020; Tollefson & Phillips, 2015). The remaining trial differentiated between IPV-related recidivism, violent and non-violent offences (Zarling & Russell, 2022). See Supplemental Table S1 for details.
The most common approach to measure IPV recidivism was through self-report in isolation (n = 20). Of these, 10 collected data on IPV perpetration only (Chermack et al., 2017, 2019; Easton et al., 2018; Fernández-Montalvo et al., 2019; Hesser et al., 2017; Murphy et al., 2018; Stover, 2015; Stover et al., 2019; Stuart et al., 2016; Taft, Creech, et al., 2016; Zarling & Russell, 2022). Three trials collected only self-report victimisation data (Hartmann et al., 2021; Raj et al., 2016; Satyanarayana et al., 2016). The remaining seven trials included self-reported victimisation from collateral victims alongside perpetrators’ self-report (Gilchrist et al., 2021; Kane et al., 2021; Murray et al., 2020; Nesset et al., 2020; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022). While two additional trials collected self-reported victimisation data, these outcomes were not analysed due to the small number of victim participants recruited (Easton et al., 2018; Stover, 2015). See Supplemental Table S2 for details of the different self-report scales used.
Self-Report Measures of IPV
Of the 25 RCTs employing validated scales to assess IPV, 19 used the Conflict Tactics Scale (CTS; Straus, 1979) or a version of it (CTS-2; Straus et al., 1996). CTS measures were researcher-administered in 17 trials (Brem et al., 2023; Chermack et al., 2017, 2019; Easton et al., 2018; Fernández-Montalvo et al., 2019; Hesser et al., 2017; Lila et al., 2018, 2020; Murphy et al., 2018, 2020; Stover, 2015; Stover et al., 2019; Stuart et al., 2016; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022; Zarling et al., 2015) and self-administered in two (Nesset et al., 2020; Zarling & Russell, 2022). The Timeline Follow-Back (TLFB; Sobell & Sobell, 1992) method was the next most common approach, used in five trials (Chermack et al., 2017; Easton et al., 2018; Murphy et al., 2018; Stover, 2015; Stover et al., 2019), alongside the Multidimensional Measure of Emotional Abuse (Murphy & Hoover, 1999), which was also used in five trials (Hesser et al., 2017; Murphy et al., 2020; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Zarling et al., 2015). See Supplemental Table S2 for details.
Among the 25 RCTs that employed validated scales, 6 captured only one form of IPV. Five assessed physical IPV only (Chermack et al., 2017, 2019; Murphy et al., 2018; Stuart et al., 2016; Stover, 2015) and one assessed sexual IPV only (Taft et al., 2022). The largest group (12 out of 25 trials) captured two forms of IPV, with nine measuring both physical and psychological/emotional IPV (Hesser et al., 2017; Lila et al., 2018, 2020; Murphy et al., 2020; Satyanarayana et al., 2016; Stover et al., 2019; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Zarling et al., 2015) and three measuring both physical and sexual IPV (Kane et al., 2021; Murray et al., 2020; Raj et al., 2016). Only 7 of the 25 trials using validated scales captured three or more forms of IPV (e.g. physical, sexual, psychological/emotional, and/or coercive control; Brem et al., 2023; Easton et al., 2018; Fernández-Montalvo et al., 2019; Gilchrist et al., 2021; Hartmann et al., 2021; Nesset et al., 2020; Zarling & Russell, 2022).
Few trials (3 out of 25) included measures that assessed coercive-controlling behaviours, with two using the Revised Controlling Behaviours Scale (CBS-R; Graham-Kevan & Archer, 2005), which includes a sub-scale capturing economic control (Gilchrist et al., 2021; Zarling & Russell, 2022), and one using the Indian Family Violence and Control Scale (Hartmann et al., 2021; Kalokhe et al., 2016). One study also included a measure of stalking behaviour, the Stalking Behaviour Checklist (Coleman, 1997), specifically its harassing behaviour subscale (Zarling & Russell, 2022). Only Gilchrist et al. (2021) measured technology-facilitated abuse. Several trials also incorporated complementary or proxy measures of IPV-related risk or abusive attitudes, including the Spousal Assault Risk Assessment (Kropp et al., 1995; Lila et al., 2018, 2020), Propensity for Abusiveness Scale (Dutton, 1998; Gilchrist et al., 2021), Inventory of Distorted Thoughts About the Use of Violence (IDT-V; Echeburúa & Fernández-Montalvo, 1998; Fernández-Montalvo et al., 2019), and an IPV attitudinal assessment derived from India’s National Family Health Survey (NFHS-3; International Institute for Population Sciences & Macro International, 2007; Raj et al., 2016). These instruments do not directly quantify IPV acts but provide relevant cognitive, attitudinal, and behavioural correlates of perpetration risk.
Length of Follow-Up
Descriptions of the follow-up periods are provided in Supplemental Table S1. Twenty-five RCTs collected self-report data either in isolation (n = 20) or alongside official reoffence data (n = 5). The total follow-up period for self-report data ranged from 4 to 24 months post-baseline/randomisation (n = 13; Chermack et al., 2017, 2019; Gilchrist et al., 2021; Hesser et al., 2017; Kane et al., 2021; Lila et al., 2018, 2020; Murphy et al., 2020; Murray et al., 2020; Nesset et al., 2020; Raj et al., 2016; Stover et al., 2019; Stuart et al., 2016) or 3 to 12 months post-intervention (n = 12: Brem et al., 2023; Easton et al., 2018; Fernández-Montalvo et al., 2019; Hartmann et al., 2021; Murphy et al., 2018; Satyanarayana et al., 2016; Stover, 2015; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022; Zarling & Russell, 2022; Zarling et al., 2015).
Total follow-up periods for trials reporting official reoffence data (n = 6), either in isolation (n = 1) or alongside self-report data (n = 5), ranged from 6 to 12 months post-intervention (n = 4; Brem et al., 2023; Lila et al., 2018, 2020; Zarling & Russell, 2022) or extended to 12 months after the start of treatment (n = 1; Murphy et al., 2020). In one trial, official reoffence data were collected at 428 days post-intervention or dropout on average (Tollefson & Phillips, 2015).
The total length of participation was calculated by summing the duration of the intervention and the length of follow-up post-intervention. Total length of participation ranged from 4 months (Gilchrist et al., 2021; Hartmann et al., 2021) to 2 years (Kane et al., 2021; Murray et al., 2020). The overall length of participation could not be calculated for two trials as total intervention durations were not reported (Brem et al., 2023; Satyanarayana et al., 2016).
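The calculation described above can be sketched as a small helper. The function name and the example durations are hypothetical and introduced purely for illustration; they do not correspond to any specific included trial.

```python
from typing import Optional


def total_participation_months(
    intervention_months: Optional[float],
    follow_up_months: float,
) -> Optional[float]:
    """Sum intervention duration and post-intervention follow-up.

    Returns None when the intervention duration was not reported,
    mirroring the trials for which the total could not be calculated.
    """
    if intervention_months is None:
        return None
    return intervention_months + follow_up_months


# Hypothetical trial: 3-month intervention, 12-month post-intervention follow-up.
example_total = total_participation_months(3, 12)

# Hypothetical trial with unreported intervention duration.
not_calculable = total_participation_months(None, 6)
```

Returning `None` rather than treating a missing duration as zero keeps trials with unreported durations out of any range or average computed downstream.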
Victim Factors
Of the 12 RCTs that included victim data within the analysis, 11 collected self-reported victimisation data from a current partner (Gilchrist et al., 2021; Hartmann et al., 2021; Kane et al., 2021; Murphy et al., 2020; Murray et al., 2020; Nesset et al., 2020; Raj et al., 2016; Satyanarayana et al., 2016; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022). For Gilchrist et al. (2021), victimisation data were collected from either men’s current or former (previous 12 months) partners. One RCT (Zarling & Russell, 2022) collected information only from the index victim. Victimisation self-reports were compared to perpetrator self-reports in four RCTs (Nesset et al., 2020; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022). For RCTs which collected perpetrator self-report data (n = 21), the majority asked the perpetrator about abuse towards a current partner (n = 11; Brem et al., 2023; Gilchrist et al., 2021; Hesser et al., 2017; Kane et al., 2021; Murphy et al., 2020; Murray et al., 2020; Nesset et al., 2020; Stuart et al., 2016; Taft, Creech, et al., 2016; Taft, Macdonald, et al., 2016; Taft et al., 2022). Similar to how victimisation reports were collected, Gilchrist et al. (2021) collected data on a current or ex-partner. Two RCTs collected data related to the mothers of the perpetrator’s children (Stover, 2015; Stover et al., 2019). Lila et al. (2018, 2020) were the only RCTs that reported data on IPV perpetration towards the victim of the index offence. Chermack et al. (2017) asked about any report of physical IPV perpetration, while Chermack et al. (2019) and Zarling et al. (2015) asked about the perpetrator’s abuse towards one other person that they considered important (e.g. ex or current partner, friend, or family member). Data on which partner was asked about at baseline were not reported in three RCTs (Easton et al., 2018; Fernández-Montalvo et al., 2019; Murphy et al., 2018).
For Easton et al. (2018), no information was reported on who the baseline data were collected for, although, crucially, they did ask for the average number of hours or days with ‘an intimate partner’ and kept track of how many and for how long they were in the relationship. Average hours of contact were then used as a covariate in their data analysis. Gilchrist et al. (2021) similarly measured the extent of contact women had with their current or former partner. Satyanarayana et al. (2016) highlight the importance of this and describe the lack of information on the number of contact hours between partners as a limitation. No RCTs reported whether the perpetrator had changed their relationship status with the victim since baseline or reported whether the perpetrator acquired a new or any additional partners during the course of the research. Murphy et al. (2018) collected data on any IPV that occurred in the year prior to intervention initiation, which may have also included multiple partners. At least five additional RCTs may have collected information on multiple partners due to the use of official re-offence data; however, this was not explicitly stated within the text (Lila et al., 2018, 2020; Murphy et al., 2020; Tollefson & Phillips, 2015; Zarling & Russell, 2022).
Discussion
Mixed results across a diverse field and concerns about methodological rigour are commonly cited issues for RCTs evaluating the effectiveness of IPV perpetrator programmes. The current systematic review with narrative synthesis builds on prior reviews that have focussed on identifying methodological challenges and directions for improvement (Turner et al., 2023; Vall et al., 2024). By reviewing RCTs conducted within the past 10 years and including those with both male and female perpetrators, we have provided further evidence for existing findings while highlighting new ones. We discuss the key findings from our review below.
Limitations associated with short follow-up periods are compounded by an over-reliance on a single data source to measure outcomes. Most included trials (n = 21) relied on self-report data or official offence data, in isolation, to assess outcomes, and these had a maximum follow-up period of 2 years (the average length of follow-up was 12 months). By contrast, nearly all of the RCTs included in the Turner et al. (2023) review relied upon multiple data sources to measure re-offending (typically victim and/or perpetrator self-reports with official data); only 2 of 15 trials solely relied upon perpetrator self-reports. This difference can be explained by the study setting (mostly court-mandated participants in the RCTs in Turner et al. [2023] compared to mostly voluntary attendees in treatment or community settings for the RCTs in our review). For interventions delivered independently of the criminal justice system, official offence data appear to be considered less relevant for participants, and/or researchers are concerned that requesting such data will inhibit participants’ willingness to take part in the study. The reasons underpinning these methodological choices, however, are rarely articulated or justified.
It is important to measure re-offending using multiple data sources to counteract the limitations inherent within each. A significant, inherent limitation with official IPV reoffence data is that it misses abuse that is not reported to the police. Studies have repeatedly documented that only a minority of victims will choose to report their experience to the police (42% according to the most recent figures released for England and Wales; Office for National Statistics, 2024). The under-reporting of IPV victimisation in the female population has been explained by factors such as fear, excusing, normalisation of abuse as an expression of love, dependence on males, and self-blaming (Chan, 2011). Furthermore, police are more likely to identify and make arrests for physical rather than non-physical IPV, such as coercive and controlling behaviour (McPhee et al., 2021). Thus, although official data may be more complete (i.e. less missing data), this data source can only ever represent a partial picture of IPV offending. In addition to underreporting bias, there is attrition bias due to the documented difficulty of achieving convictions in IPV cases. It is therefore preferable to use earlier measures that reflect less progression through the criminal justice system (e.g. calls to the police over arrests, arrests over convictions, convictions over imprisonment) to reduce the likelihood that recidivism is underestimated.
Yet, most RCTs in this review measured reoffending with rearrests and/or reconvictions for IPV, and they did so in the context of a research design that did not mitigate this limitation through the collection of self-report data. This raises a question over the accuracy of reported outcomes for recidivism, given the issues relating to employing any of these approaches in isolation. Furthermore, including different data sources acknowledges the value placed on each by different audiences, such as people with lived experience of IPV, service commissioners, and practitioners (Turner et al., 2023). Only one RCT in our review collected official data alongside self-report data from both perpetrators and victims (Murphy et al., 2020), although it had weaknesses such as a follow-up period of 1 year and a small sample size of 42 participants.
Although the importance of collecting self-report data to complement official data is clear, this data source has its own limitations. First, self-report data are affected by bias as they rely on the accuracy of perpetrator reports. Minimisation, socially desirable responding, and/or denial of abuse among perpetrators participating in programmes is well documented (Morrison et al., 2021). Perpetrators may under-report their abusive behaviour due to shame, embarrassment, or concerns that disclosures may result in new criminal investigations (Foddy, 1993). Collecting collateral reports from victims can be beneficial in assessing the accuracy of perpetrator self-reports, but it brings with it ethical and safety challenges alongside traditional recruitment and retention issues. As in Vall et al. (2024), relatively few of the studies included in this review (n = 8) explored recidivism from the perspective of the victim. The few studies that sought to collect data from victims highlighted the challenges with their recruitment and retention (Easton et al., 2018; Gilchrist et al., 2021; Stover, 2015), a significant methodological challenge also highlighted by Turner et al. (2023). Although victims are considered the best source of information on the perpetrator’s abusive behaviour, the low proportion of victims who are willing to participate in IPV research, combined with high levels of missing data from victims being lost at follow-up, raises concerns about differential non-response bias (Wilson et al., 2021). Furthermore, meta-analyses have found that effect sizes for IPV perpetrator programmes are smaller and/or non-significant when assessed by victims compared to more favourable results from criminal justice outcomes (Babcock et al., 2024; Cheng et al., 2021; Wilson et al., 2021). This suggests either that IPV perpetrator programmes are truly ineffective or that treatment effects have been masked by insufficient victim response data. 
Given the need to make confident conclusions about treatment effectiveness to commissioners, practitioners, and those affected by IPV, which are based on robust findings triangulated across data sources, this remains a significant problem in the field.
Compounding the challenge of recruiting and maintaining involvement from victims is the difficulty of selecting the ‘right’ person for the study, as well as accurately capturing their relationship status with the perpetrator throughout the research process. Given the length of follow-up within the included studies, changes in relationship status are likely to occur. This raises a second limitation related to a lack of clarity over whether self-report data should be collected from the victim of the index offence, current and/or ex-partner (note that all of these labels might apply to one individual during the course of a study). No RCTs reported whether the perpetrator had changed their relationship status with the victim since baseline, initiated a new relationship during the course of the research, or collected data from any specified new or additional partners. The most common approach was to collect data on any abuse towards a current partner (n = 14), without specifying whether this was the same or a new person since the previous data capture timepoint (e.g. date of index offence, recruitment, baseline, intervention completion, first follow-up). An explicit focus on current partners also omits abuse directed towards ex-partners with whom the perpetrator may have regular contact (e.g. through care of their shared children). In an attempt to mitigate this issue, two included trials (Easton et al., 2018; Gilchrist et al., 2021) measured the amount of contact between perpetrators and partners. Not accounting for the amount of contact between the dyad in the study could lead to an apparent reduction or curtailment of abuse within outcome measures, when in fact it may only be opportunities for abuse that have reduced. It also makes any assessment of bi-directional abuse within the dyad problematic, which is an important consideration within IPV treatment, given the need to understand the relationship context in which violence occurs (Machado et al., 2023). 
Overall, our review demonstrates that even the more recent RCTs in the field are compromised by not being able to specify relationship status at each data collection timepoint. Fully capturing the complex web of interpersonal relationships in which those enrolled in an IPV perpetrator programme may have opportunities to re-abuse is necessary for the accurate measurement of IPV re-offending; a failure to do so calls into question even the most rigorous trial results.
A third limitation is that trials typically utilised self-report measures that only captured one or two forms of IPV. Physical IPV was the most commonly captured form of abuse, followed by psychological/emotional abuse, sexual abuse, and then psychological/verbal abuse. Only three RCTs captured coercive and controlling behaviours, two captured economic abuse, and one measured technology-facilitated abuse (Gilchrist et al., 2021). Consequently, existing research generally fails to encompass the wide and varying forms of abuse that may take place within an intimate relationship, including coercive and controlling behaviours, technology-facilitated abuse, and financial abuse. Omitting these types of abuse from self-report scales increases the likelihood that IPV re-offending will be underestimated.
Fourth, the CTS and CTS-2 were the most commonly used self-report measures within the included studies; however, the CTS has been criticised for its psychometric properties, including factor structure, validity, and reliability (Calvete et al., 2007; Lucente et al., 2001; Newton et al., 2001; Vega & O’Leary, 2007; Yun, 2010). The CTS received further criticism regarding the lack of temporal data on when violent acts occur, requiring respondents to aggregate their recollection of violent acts over a referent time period. The TLFB method has been used to capture this temporal data by mapping violent acts according to each day within a specified period. However, the TLFB method was only used in four RCTs included in this review. This reflects broader challenges in measuring behavioural change among violent men, as standard outcome measures (such as those relying on self-report) may not fully capture the complexity of rehabilitation (Bowen, 2011).
A final methodological challenge highlighted by our review is the varying definitions of intervention ‘completion’ and a general lack of detailed description and analysis reported by the included RCTs. A recent review also noted that the use of ‘completion’ as a proxy for programme effectiveness was highly problematic, given the inconsistent and often arbitrary definitions and measurements of this concept across studies (Ralph et al., 2025). Of the 18 trials that provided completion rates, the most common approach (n = 7) was to define completion as attendance at all sessions. Other trials (n = 7) operated with a less restrictive definition premised on a specified ‘dose’, ranging from 20% to 81%, while some provided no definition (n = 4). Trials did not provide descriptions of non-completers and were not able to include any analysis of a possible dose-response relationship. Most trials (n = 15) did not report factors associated with intervention completion or dropout. Those that did reported on different factors or yielded mixed results, making the identification of noteworthy trends across studies difficult. For example, there is some evidence that perpetrators with greater ‘stakes in conformity’ (e.g. older, employed, married, better educated, on higher incomes) have better completion rates. Yet, any significant results are from very few trials (n < 5) and/or were not consistent (e.g. older age was positively related to completion in three trials but not statistically significant in two trials). Along similar lines, there is some evidence that perpetrators with complex needs or additional challenges are less likely to complete the intervention. For example, Gilchrist et al. (2021) found that men who did not attend any intervention sessions were more likely to be homeless or living in temporary accommodation, or to suffer from moderate to severe depression, anxiety, or post-traumatic stress disorder. 
However, other RCTs found that psychiatric diagnoses (Hesser et al., 2017) or level of violence and substance use (Chermack et al., 2017) did not differentiate those who completed or were lost to follow-up.
Limitations
A limitation of this review was the two-stage search strategy. As included RCTs were only identified from published systematic reviews, more recent primary studies were excluded (Cramer et al., 2024; Expósito-Álvarez et al., 2024). Although grey literature and works without peer review were not included to ensure the quality of the RCTs, including unpublished trials might have revealed other methodological challenges, and reduced the effects of publication bias. Although no gender exclusions were applied within this review, most included trials assessed programmes for IPV within heterosexual relationships where the male partner was the primary aggressor, suggesting a lack of available evidence on the effectiveness of IPV programmes for female perpetrators and in the context of LGBTQI+ relationships. Furthermore, similar to other recent methodological reviews (Turner et al., 2023; Vall et al., 2024), most RCTs in our review were conducted in the United States. There was also a preponderance of CBT-based intervention models tested in the trials. More evidence is needed to ascertain the efficacy of a wider range of IPV interventions (including novel approaches such as ACT) in a wider range of jurisdictions to understand how sociocultural contexts affect how different interventions are developed and implemented, especially in non-Western resource-constrained contexts (Sánchez de Ribera et al., 2025).
Conclusion
IPV is a prevalent problem in all societies, yet the evidence as to which programmes are effective at helping which perpetrators change their abusive behaviour is inconclusive. Programme variety, mixed results, and a concern over methodological rigour have characterised this field of study for decades. The current review identified 26 RCTs of 22 distinct IPV perpetrator programmes from across the globe, highlighting the methodological challenges they share. Based on our analysis, we offer the following recommendations for strengthening the evidence base going forward.
First, trials reporting IPV recidivism as an outcome should collect official reoffence data, ideally including measures of police contact that do not necessarily result in an arrest, alongside self-reported perpetration and collateral reports from victims. Comparisons between these self-reports may also provide an indication of the accuracy of self-reported abuse, allowing a better understanding of the degree of bias inherent in this data source.
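As an illustration of how such a comparison might be made, the following minimal sketch computes Cohen's kappa (chance-corrected agreement) between perpetrator and victim reports of any IPV at follow-up, together with the number of dyads in which the victim reports abuse that the perpetrator does not. It is not drawn from any of the included trials: the function names and the binary (0/1) data are hypothetical and used for illustration only.

```python
import math


def cohens_kappa(reports_a, reports_b):
    """Cohen's kappa for two equal-length lists of binary (0/1) reports."""
    if len(reports_a) != len(reports_b) or not reports_a:
        raise ValueError("reports must be non-empty and of equal length")
    n = len(reports_a)
    # Proportion of dyads in which both reports agree
    observed = sum(a == b for a, b in zip(reports_a, reports_b)) / n
    # Agreement expected by chance, from each source's marginal rate
    p_a = sum(reports_a) / n
    p_b = sum(reports_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return math.nan  # kappa is undefined when there is no variation
    return (observed - expected) / (1 - expected)


def count_underreports(perpetrator, victim):
    """Dyads where the victim reports IPV but the perpetrator does not."""
    return sum(1 for p, v in zip(perpetrator, victim) if v == 1 and p == 0)


# Hypothetical data: 1 = any IPV reported at follow-up, 0 = none reported
perpetrator_reports = [0, 0, 1, 0, 1, 0, 0, 0]
victim_reports = [1, 0, 1, 0, 1, 1, 0, 0]

kappa = cohens_kappa(perpetrator_reports, victim_reports)
underreports = count_underreports(perpetrator_reports, victim_reports)
print(f"kappa = {kappa:.2f}; possible under-reports = {underreports}")
```

In this hypothetical example, the reports agree in six of eight dyads, yet both disagreements are cases where only the victim reports abuse. A pattern of this kind would be consistent with perpetrator under-reporting, although agreement statistics alone cannot establish which report is accurate.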
Second, including routinely collected administrative data from agencies outside of the criminal justice system (e.g. victim support services, helplines, substance abuse treatment, sexual health services) is necessary for widening the scope of information about victims’ help-seeking beyond the police, as it is well established that most do not report their abuse to the police. Closer involvement with these agencies in the design of trial protocols could also yield novel approaches to overcoming the challenge of maintaining victim engagement throughout the data collection process, helping to minimise non-response bias. More research is needed to identify the barriers and enablers for victims’ participation within IPV research. Feasibility testing the methods used in other fields of study, which have been shown to be successful at reducing levels of missing data amongst participants in hard-to-reach groups (Rabbi et al., 2018), could be explored for this study population. Evidence-based recommendations to increase the recruitment and retention of victims are vital to enhancing their well-being and safety as well as improving the quality of IPV research. Although the principles of victim-centred, trauma-informed community-based participatory research methods (Jumarali et al., 2021) are apparent in much IPV research, particularly qualitative research, the extent to which trial protocols are consistently aligned to these principles is less clear. The ‘voice’ of IPV victims must be central within RCTs aiming to establish whether, how, and for whom interventions are effective at reducing IPV.
Third, outcomes should reflect the multi-faceted nature of IPV rather than solely measuring physical abuse. For example, including measures of coercive and controlling behaviours, stalking, technology-facilitated abuse, and financial abuse would provide a more holistic picture of the extent to which interventions may impact on different types of IPV perpetration.
Fourth, given the different number of intervention sessions and duration of follow-up in the studies, further research is needed to determine the optimal intervention ‘dose’ and follow-up duration, respectively, to test the efficacy of IPV treatment programmes. However, it is fair to say that the field needs to be re-oriented towards focussing more on medium and longer-term outcomes as we found these to be largely neglected within the included trials.
Fifth, trials focussing on measuring self-reported abuse within one dyad over time do not reflect the reality that relationships may change during the course of the study. Furthermore, ‘relationship’ and ‘partner’ are terms that should be clearly defined and operationalised, as these may mean different things to different people in different parts of the world and at different points in time. Changes in relationship status should be captured in these studies, and multiple partners should be differentiated. Capturing data from people in dynamic relationships, as well as those with multiple (ex)-partners, is a complex methodological issue, which researchers chose to deal with in multiple ways. Often, this information was not reported. Overcoming this methodological challenge will first require a commitment from researchers in the field to report transparently how they dealt with this issue. Then, perhaps, examples of improved methods for capturing this information will become available and eventually proposals for minimum standards can be agreed amongst the research community. Establishing a set of agreed-upon standards could be instrumental for justifying and securing funding adequate for the complexity of the research required.
Diversity Statement
A total of 6,291 participants were included across the 26 RCTs reviewed. Notably, all focussed exclusively on heterosexual populations, highlighting a significant gap in research on IPV perpetration among individuals in non-heterosexual relationships. Several of the reviewed trials acknowledged the underrepresentation of female IPV perpetrators and non-heterosexual couples as a limitation within the field. However, 12 of the 26 trials did involve underrepresented populations such as people with substance misuse problems and military veterans. Participants’ age was reported as a mean value in 25 out of the 26 trials, with an average age of 35.16 years, suggesting a tendency for research in this area to focus on younger populations.
Ethnicity was reported in 17 trials, with the majority of participants identifying as ‘white’ (15/17 trials). This is unsurprising given that 18 of the 26 trials were conducted in the United States (n = 15), United Kingdom (n = 1), Norway (n = 1), or Sweden (n = 1). The remaining trials were conducted in India (n = 3), Spain (n = 3), and Zambia (n = 2). This geographic distribution indicates that research on IPV perpetrator interventions is disproportionately concentrated in Western, high-income, and predominantly white-majority nations, with limited representation from the Global South and lower-income regions. This review also found a lack of consistent reporting of key sociodemographic factors. Specifically, 9 out of 26 trials did not report participants’ level of education, 10 did not report employment status, 16 did not report the number of children, 17 did not report income level, 25 did not report housing status, and only 3 reported participants’ disability status. Existing research suggests that sociodemographic factors such as race, ethnicity, immigration, socioeconomic status, and mental health play a significant role in IPV perpetration (Capaldi et al., 2012; Maldonado et al., 2022), highlighting an important gap in the current literature. Furthermore, none of the RCTs distinguished between sex assigned at birth and gender identity, reflecting an ongoing oversight in capturing gender diversity within IPV research.
An examination of the ‘Discussion’ sections of the included RCTs revealed that diversity and inclusion were seldom explicitly addressed. The most frequently mentioned consideration, noted in 13 out of 26 trials, was the need to replicate findings in underrepresented subpopulations, particularly among female perpetrators and individuals in non-heterosexual relationships, as well as those with specific IPV-risk factors, for example, substance use and/or psychiatric illness (Schick et al., 2025). Importantly, our review found that 23% of studies included female perpetrators, suggesting that research on this population is growing. Although a separate analysis was beyond the scope of this review, future research should seek to replicate it with studies testing interventions for female perpetrators of IPV.
Overall, this review highlights significant gaps in diversity within IPV perpetrator intervention research, particularly regarding gender, sexual orientation, ethnicity, geographic representation, and sociodemographic variables. Addressing these limitations through more inclusive and representative research will be essential for developing interventions that are culturally responsive and effective across diverse populations.
Supplemental Material
Supplemental material, sj-docx-1-tva-10.1177_15248380261437078 for Methodological Challenges in Determining the Effectiveness of Intimate Partner Violence Perpetrator Programmes: A Systematic Review by Isobel Johnston, Amanda L. Robinson, Lucia Dahlby, Emma Smith, Sharmila Mahesh Kumar, Elizabeth Gilchrist, Steven Parkes and Gail Gilchrist in Trauma, Violence, & Abuse
Footnotes
Acknowledgements
We would like to acknowledge our colleagues Professor David Gadd and Dr Polly Radcliffe for their continued contributions to the wider ADVANCE-D Trial.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The ADVANCE-D cluster randomised controlled trial is funded by the NIHR Public Health Programme [NIHR154546].
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
No new data were generated or analysed within this review study. The source of all data discussed is referenced throughout.