Abstract
Adolescent family violence (AFV) has become a topic of increasing attention, yet our understanding of how to assess the risk of future family violence among this cohort is limited. This systematic review aimed to determine what risk assessment tools have been validated for use with AFV and investigate their predictive validity. It also sought to determine whether the literature adhered to the Risk Assessment Guidelines for the Evaluation of Efficacy statement (i.e., RAGEE guidelines). Out of 11,663 studies identified, seven met inclusion criteria and validated six instruments, including the Structured Assessment of Violence Risk in Youth (SAVRY), Youth Level of Service/Case Management Inventory (YLS/CMI), Victoria Police Screening Assessment for Family Violence Risk (VP-SAFvR), Static Assessment of Family Violence Recidivism (SAFVR), Dynamic Risk Assessment for Family Violence (DYRA), and the Integrated Safety Response (ISR). The discriminant ability of the Psychopathy Checklist: Youth Version (PCL-YV) was also considered in one study. Three key findings arose from this review. First, there are very few validated risk assessment tools for AFV behaviours, and variability in predictive and discriminant validity amongst those few that have been validated (with Area Under the Curve values ranging from .54 to .67). Second, there is a reasonably positive adherence to RAGEE guidelines. Third, there appears to be a high risk of bias among studies which validated risk assessment tools for use with AFV. The findings underscore the need for validated risk assessment tools tailored specifically for use with AFV, particularly for clinicians and professionals working in the family violence field.
Introduction
In recent decades, adolescent family violence (AFV) has become a topic of increasing societal, clinical, and academic attention internationally, with significant psychosocial, economic, and legal consequences documented for both adolescent family violence-users and victim/survivors (Fitz-Gibbon et al., 2022; Holt, 2022). Adolescent family violence refers to the use of abusive or aggressive behaviour (e.g., physical, verbal, psychological, emotional or financial abuse) by those aged under 18 years towards family members (e.g., parents, siblings, intimate partners, grandparents; Boxall & Sabol, 2021).
The international prevalence estimates of AFV vary due to methodological and definitional differences in the research. When official administrative data sources (e.g., police call-outs) are used, prevalence figures have shown that 8% to 10% of all police family-violence incidents relate to an adolescent engaging in abusive behaviour towards a family member (Phillips & McGuinness, 2020; Snyder & McCurley, 2008). Administrative data has also shown recidivism rates among this cohort to range from 9% to 35% (Boxall & Morgan, 2020; Sheed, McEwan, et al., 2023), indicating a significant minority of young people who use family violence may continue to do so. These rates are broadly similar to adult rates of family violence recidivism (8%–44.5%; Fitzgerald & Graham, 2016; Jolliffe Simpson et al., 2021; Petersson & Strand, 2017; Spivak et al., 2024). Despite the similar rates of young people and adults who engage in family violence recidivism, there remains a distinct lack of research focusing on how to best assess the risk of future AFV, particularly when compared to adult family violence.
To date, much of the research – including recent systematic reviews (Booth et al., 2024; Burgos-Benavides et al., 2023) – examining AFV assessment tools, has focused on instruments which describe and quantify AFV behaviours (e.g., Boxall & Sabol, 2021; Burgos-Benavides et al., 2023; Calvete et al., 2013; Simmons et al., 2019; Tarriño-Concejero et al., 2023). Examples of these tools include the Child-to-Parent Violence Questionnaire (CPV; Contreras et al., 2019) and the Abusive Behaviour by Children Indices (ABC-I; Simmons et al., 2019). While such instruments are valuable for the purpose of research and assisting to better understand the manifestation of AFV, they are not developed or validated for use when assessing the likelihood or risk of a young person engaging in future family violence. As such, there is a need to expand upon this approach and begin the task of understanding how to assess future risk of AFV among those with a history of using AFV.
Risk assessment instruments may be used across various contexts, including by clinicians, family violence practitioners, correctional services, police forces, and research personnel. They help to identify the likelihood of further family violence behaviours (i.e., recidivism), guide decisions regarding the appropriateness and intensity of interventions, and identify key factors which may increase future offending and/or promote desistance (i.e., risk and protective factors) and thus may serve as targets for intervention. There have been major shifts in the way risk assessments have been developed and used over recent decades. Historically, risk assessment was typically unstructured or actuarial (i.e., typically focused on the algorithmic prediction of risk; Ogloff & Davis, 2020) in nature. Increasingly, however, there has been a growing recognition of the dynamic (i.e., changeable) nature of risk (Douglas & Otto, 2021), leading to an increase in the use of structured professional judgement (SPJ) approaches to risk assessment, which require the professional to consider potential risk factors to inform or derive their overall risk judgement (Ogloff & Davis, 2020), particularly in clinical settings.
Validated risk assessment tools used by police to assess family violence are most commonly actuarial in nature and employed as a means of assisting with resource allocation by triaging likely risk of future family violence and identifying the most appropriate level/type of intervention (Jolliffe Simpson et al., 2021; McEwan et al., 2019; Spivak et al., 2021). In contrast, SPJ approaches to assessing risk are more frequently adopted by clinicians (e.g., psychologists, social workers), given they typically include an expanded list of potential risk factors (including greater inclusion of dynamic/changeable factors), and consideration of the nexus between risk assessment and risk management (Bonta & Andrews, 2016). To date, however, there is a lack of understanding as to how such tools are used in cases of AFV and in what contexts (e.g., policing, clinical) these have been validated.
Clinicians, police, and other professionals who work with young family violence-users are frequently required to judge the likelihood that violent/abusive behaviour will recur in future. Given the lack of robust evidence in this space, AFV-users are often assessed using general offending or violence risk assessment tools (e.g., Structured Assessment of Violence Risk in Youth (SAVRY; Borum et al., 2006), Youth Level of Service/Case Management Inventory (YLS/CMI; Hoge & Andrews, 2011); Shaffer et al., 2022), or by police using their standard (i.e., not age-specific) family violence tools (e.g., VP-SAFvR, SAFVR, DYRA; Jolliffe Simpson et al., 2021; McEwan et al., 2019). Yet, there is a lack of clarity as to whether the approaches being used are valid for an AFV population. Similarly, among the few studies that have examined risk assessment approaches for use with AFV, the concept of ‘risk’ has at times been poorly defined (if at all) and there is variation in follow-up time, base rates, and definitions of AFV.
To address this, we conducted a systematic review of studies investigating the predictive validity of risk assessment tools for use with AFV. Our primary objectives were to (a) identify risk assessment tools which had been evaluated for use with adolescents who have engaged in family violence (including intimate partner violence and violence to family members more broadly) and (b) investigate their predictive and discriminant validity. In addition, we sought to determine whether studies of risk assessment tools validated for use with AFV adhere to the Risk Assessment Guidelines for the Evaluation of Efficacy (RAGEE; Singh et al., 2015) statement. The RAGEE guidelines were developed by an international panel of risk assessment experts using a 4-wave Delphi process. They comprise a 50-item reporting checklist identifying key study features which should be reported when publishing risk assessment research (Singh et al., 2015).
Method
This systematic review follows the Preferred Reporting Items for Systematic reviews and Meta-Analyses guidelines (PRISMA; Moher et al., 2009). It was pre-registered with PROSPERO (CRD42022346416), and the protocol can be accessed via https://www.crd.york.ac.uk/PROSPERO/view/CRD42022346416.
Eligibility Criteria
Published or unpublished empirical studies reporting original quantitative research were eligible for inclusion. The scope of this review included articles published in any language and was inclusive of both academic and grey literature. The full inclusion and exclusion criteria are detailed below.
Population
Male or female adolescents (aged under 18 years) who have engaged in family violence (i.e., AFV users). Family violence was considered to be any abusive or aggressive behaviour (e.g., physical, verbal, psychological, emotional, or financial abuse) irrespective of the nature of the family relationship (e.g., intimate partner abuse, child-to-parent abuse, sibling abuse, grandparent abuse). Studies that solely examined familial sexual abuse or partner sexual violence were ineligible due to the different theoretical frameworks used to understand sexually harmful/abusive behaviour and the largely different legal and therapeutic responses to such behaviour for young people. However, those that included sexual abuse as part of a broader definition of family violence were eligible.
Intervention
Eligible investigations included risk assessment instruments that evaluated the risk of future family violence among AFV users aged under 18 years, irrespective of whether the instruments were developed explicitly for adolescents.
Outcomes
Studies needed to include a measure of future family violence or abusive behaviour by participants towards a member of their family, with current or former intimate partners considered in this definition. Administrative data, self-report, and informant (e.g., police) report outcome measures were eligible for inclusion.
Search Strategy
A three-phased search strategy was used to identify eligible records in the extant literature. During the first phase, keywords were identified from the research literature. Subsequently, relevant databases were identified with the assistance of a librarian from Swinburne University of Technology and three pilot searches were performed. Additionally, several key references (Loinaz & de Sousa, 2020; Shaffer et al., 2022; Tapp & Moore, 2016; van der Put et al., 2019) were identified for systematic hand-searching.
The second phase comprised the database and hand-search. A total of six databases (APA PsycNet, EBSCOhost, Medline, ProQuest Central, Scopus, Web of Science) were searched from inception to September 29, 2022 using syntax that modified keywords with wildcard and near-field search operators and grouped them conceptually with Boolean operators (e.g., ‘Family Violen*’ OR ‘Family Abus*’; ‘Adolescen*’ OR ‘Teen*’; ‘Risk* tool*’ OR ‘Risk* Assess*’ OR ‘Risk* Measure*’). Retrievable records were exported into EndNote software. The databases searched along with the syntax used and records retrieved are available in Supplemental Table S1. Additionally, all records cited in the reference list of van der Put et al. (2019; a meta-analysis of domestic violence tools used with adult populations) were manually extracted and pooled with those extracted during the database search.
The final phase comprised top-up searches performed after the title and abstract screening stage to identify any pertinent studies published since the execution of Phase 2 searches. This process first involved running multiple keyword searches of Google Scholar on July 24, 2024 for articles dated between 2022 and 2024. The titles of records on the first 5 pages of results from each search were screened for eligibility and the abstracts of relevant studies were assessed. The search terms of each separate list are presented in Supplemental Table S2. This process of searching Google Scholar was then repeated on September 6, 2025 to cover the period 2024 to 2025 inclusive, and the Phase 2 database search was rerun in its entirety on September 7, 2025 using the same syntax and parameters to identify any relevant research disseminated after September 29, 2022.
Study Selection
Study selection of the records extracted during the second phase was performed by A.S. and M.T. using a screening protocol for titles and abstracts developed based on the inclusion and exclusion criteria. The protocol was tested and refined on 1,500 randomly sampled records across seven waves of 100 to 300 records. Percentage agreement and Cohen’s Kappa were calculated on the decision to include or exclude the record for full-text assessment, and the screening protocol was amended as necessary. The record numbers for 299 records were mistakenly sampled two or more times across the inter-rater reliability waves, so they were removed from analysis. By the seventh wave of inter-rater reliability, agreement was consistently above 80% and Kappa was consistently above .60. The percentage agreement and Kappa score of each wave, both including and excluding records that were rated more than once, are presented in Supplemental Table S3. The remaining 10,164 records extracted during Phase 2 were equally divided between authors A.S. and M.T. and screened using the updated protocol.
Full-text screening was performed by authors A.S., M.T. and C.B. In an effort to expedite full-text screening, the intervention and outcome criteria were combined into one category, leaving three reasons for full-text exclusion: ineligible population (i.e., all or most participants were over 18), record type (i.e., non-quantitative methods used), or ineligible intervention/outcome (i.e., did not use a risk-assessment instrument, did not attempt to discriminate between those who would or would not engage in an eligible outcome, solely examined risk factors, or did not include a family violence outcome). Three percent (n = 78) of the 2,588 records eligible for full-text screening were randomly sampled to test this protocol. Inter-rater reliability was established (n = 78, agreement = 92.31%, Kappa = .62, p = .639). The remaining 2,510 full-texts and 56 additional records identified during the third search phase were screened by C.B. (82%), A.S. (13%), and M.T. (5%).
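The two agreement statistics used during screening can be illustrated with a minimal Python sketch. The function names and the ten example decisions below are hypothetical and are not drawn from the review's screening data; the sketch only shows the standard definitions of percentage agreement and Cohen's Kappa for two raters.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of records on which both raters made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over every decision category either rater used.
    p_expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical include/exclude decisions for 10 records (illustration only).
a = ["include", "exclude", "exclude", "exclude", "include",
     "exclude", "exclude", "include", "exclude", "exclude"]
b = ["include", "exclude", "exclude", "include", "include",
     "exclude", "exclude", "exclude", "exclude", "exclude"]

print(round(percent_agreement(a, b), 2), round(cohens_kappa(a, b), 2))
```

In this made-up example the raters agree on 80% of records, yet Kappa is only about .52, because most agreement on a heavily "exclude" sample is expected by chance. The same mechanism explains how high percentage agreement can sit alongside a more modest Kappa.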
Data Collection Process
Data extraction was undertaken by authors C.B. and M.R. using an extraction template informed by other systematic reviews of risk assessment tools (e.g., van der Put et al., 2019) and that was piloted by authors M.T., C.B., and M.R. Both authors extracted the information for the included studies and established agreement, with author M.T. resolving any disagreement as necessary. The extraction template sought information relating to: the publication (e.g., authors, year of publication), methods (e.g., instruments investigated, whether they were structured professional judgement or actuarial, development versus validation study, type of rater, follow-up months), participants (e.g., total sample size, analysis sample size, age (M, SD, range), ethnicity), and analysis (e.g., outcome definition, effect size metric, effect size and confidence interval, indices of predictive and discriminant ability).
Reporting Quality and Bias Assessment
Each included study was assessed against the Risk Assessment Guidelines for the Evaluation of Efficacy (RAGEE; Singh et al., 2015). These guidelines were developed to provide reporting guidance for studies examining the predictive accuracy of violence risk assessments and comprise a maximum of 50 items across six domains (Abstract, Introduction, Methods, Results, Discussion, Disclosures). Each item is rated as either present or not reported. Additionally, an item can be rated as not applicable if it is irrelevant to a particular study or the methods used.
Risk of bias in individual studies was evaluated using the Prediction Model Risk of Bias Assessment Tool (PROBAST; Moons et al., 2019). The PROBAST contains 20 items across 4 domains (Participants, Predictors, Outcome, Analysis): all 20 items are relevant to development studies while 17 items are applicable to validation studies. Each domain is also given a risk of bias rating which is contingent on the responses to its constituent items. Possible bias on any of the items in the Participants or Predictors domains equates to a rating of ‘high risk of bias’ for that domain. The Outcome and Analysis domains can receive one or more ratings of potential risk, and the domain can still be considered low risk if reasons are reported for why the domain should still be considered a low risk of bias. A domain can also be considered an unclear risk of bias if relevant information is missing for some items and there are no items considered as being a high risk of bias. The domains of Participants, Predictors, and Outcome are also assessed for their applicability to the review. Overall ratings of low concern for applicability or risk of bias are given when there are no domains with high concern for applicability or risk of bias. Each included study was independently evaluated on the RAGEE and PROBAST by C.B. and M.R., or A.S. and M.T., with any disagreements within these pairings resolved with the assistance of a third author as needed.
Analysis
Due to the small sample of eligible studies, all analyses undertaken were narrative. The main measures assessed were indices of predictive ability. Where included studies tested risk classification (e.g., low vs high risk), discriminant ability was measured with instrument sensitivity (the number of individuals who committed family violence during the follow-up who were categorised as high risk divided by the total number of individuals who committed family violence) and specificity (the number of individuals who did not commit family violence who were considered low risk divided by the total number of people who did not commit family violence). The area under the receiver operating characteristic curve (AUC), which indicates the probability that a randomly selected recidivist will possess a higher risk rating than a randomly selected non-recidivist (Singh, 2013), was also used to measure risk classifications as well as risk scores. Indices of instrument calibration, namely the positive predictive value (PPV: the number of individuals categorised as high risk who committed family violence during the follow-up divided by the total number of individuals categorised as high risk) and the negative predictive value (NPV: the number of individuals categorised as low risk who did not commit family violence during the follow-up divided by the total number of individuals categorised as low risk), were also evaluated among studies that reported risk classifications.
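The indices defined above can be made concrete with a short Python sketch. The function names, the risk scores, the threshold of 4, and the outcomes below are all hypothetical and serve only to illustrate the definitions; they are not taken from any included study. The AUC is computed directly from its probabilistic definition, with tied scores counted as half.

```python
def classification_indices(high_risk, recidivated):
    """Sensitivity, specificity, PPV and NPV from paired boolean lists.

    Assumes each of the four margins is non-zero (no zero-division handling).
    """
    pairs = list(zip(high_risk, recidivated))
    tp = sum(h and r for h, r in pairs)          # high risk, recidivated
    fp = sum(h and not r for h, r in pairs)      # high risk, did not recidivate
    fn = sum(not h and r for h, r in pairs)      # low risk, recidivated
    tn = sum(not h and not r for h, r in pairs)  # low risk, did not recidivate
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(scores, recidivated):
    """Probability that a random recidivist outscores a random non-recidivist.

    Every recidivist/non-recidivist pair is compared; ties count as half.
    """
    pos = [s for s, r in zip(scores, recidivated) if r]
    neg = [s for s, r in zip(scores, recidivated) if not r]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and follow-up outcomes for eight adolescents.
scores = [1, 2, 3, 3, 4, 5, 5, 6]
recidivated = [False, False, False, True, False, True, False, True]
high_risk = [s >= 4 for s in scores]  # hypothetical threshold score of 4

indices = classification_indices(high_risk, recidivated)
print({k: round(v, 2) for k, v in indices.items()}, round(auc(scores, recidivated), 2))
```

Sensitivity and specificity are conditioned on the observed outcome, whereas PPV and NPV are conditioned on the risk category, which is why the latter pair shifts with the sample base rate.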
Results
Study Selection
Figure 1 presents the flow of study selection. The database search yielded 11,663 unique records, of which 9,073 were excluded based on title or abstract. A total of 2,591 records were sought for full-text assessment, but three could not be retrieved. Of the 2,588 records assessed in full text, 1,259 were excluded because they used an ineligible population, 1,015 because they did not use a risk assessment instrument to assess the risk of future youth family violence, and 310 because they did not utilize quantitative methods. In total, four eligible records were identified from the database search. By the end of the screening process, the research team had become aware of four additional potentially eligible records (denoted as other sources in Figure 1). Further, the top-up searches identified 52 additional records. All 56 of the records identified through other sources and top-up searches were retrieved and screened: 53 were ineligible (one was a duplicate, three used an ineligible design, 18 did not feature a risk assessment instrument or did not assess the risk of AFV as an outcome, and 31 did not include adolescents). In all, seven studies (four identified through the original database search and three via top-up searches and other sources) were eligible and included in the review.

Figure 1. PRISMA diagram (Page et al., 2021) depicting the flow of study selection.
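The counts from the full-text stage onward can be cross-checked with simple arithmetic. The Python sketch below uses only the figures stated in the Study Selection paragraph; the variable names and dictionary labels are illustrative shorthand rather than the categories used in the flow diagram.

```python
# Counts reported in the Study Selection paragraph (full-text stage onward).
sought_full_text = 2_591
not_retrieved = 3
assessed_full_text = sought_full_text - not_retrieved

excluded_full_text = {
    "ineligible population": 1_259,
    "no eligible risk assessment instrument/outcome": 1_015,
    "non-quantitative methods": 310,
}
eligible_from_database = assessed_full_text - sum(excluded_full_text.values())

# Records from other sources and top-up searches, and their exclusions.
other_sources, top_up = 4, 52
excluded_additional = {
    "duplicate": 1,
    "ineligible design": 3,
    "no instrument or AFV outcome": 18,
    "no adolescents": 31,
}
eligible_additional = other_sources + top_up - sum(excluded_additional.values())

included = eligible_from_database + eligible_additional
print(assessed_full_text, eligible_from_database, eligible_additional, included)
```

The totals reproduce the 2,588 full-text assessments, the four eligible database records, the three eligible records from other sources and top-up searches, and the seven included studies.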
Study Characteristics
Table 1 reports the characteristics of the seven eligible studies that were reviewed (Jolliffe Simpson et al., 2021, 2023; McEwan et al., 2019; Shaffer et al., 2022; Sheed, McEwan, et al., 2023; Spivak et al., 2021, 2024). Two pairs of studies (Sheed, McEwan, et al., 2023, and Spivak et al., 2024; Jolliffe Simpson et al., 2021, 2023) were each based on the same dataset, so they are not independent. All seven records were journal articles published between 2019 and 2024. Most of the included studies were conducted in Australia (n = 4), with two undertaken in New Zealand and one in Canada. Similarly, most (n = 6) of the included studies validated or revalidated a risk assessment instrument. The one exception is McEwan et al. (2019), who reported on both the development and initial validation of the VP-SAFvR.
Table 1. Key Descriptive and Outcome Data of Included Studies.
Notes. AUC = Area Under the Curve, CI = confidence interval, DYRA = Dynamic Risk Assessment for Family Violence, ISR = Integrated Safety Response, max = maximum, min = minimum, NA = not applicable, NPV = negative predictive value, NR = not reported, PPV = positive predictive value, SAFVR = Static Assessment of Family Violence Recidivism, SAVRY = Structured Assessment of Violence Risk in Youth, SD = standard deviation, SN = sensitivity, SP = specificity, SPJ = structured professional judgement, VP-SAFvR = Victoria Police Screening Assessment for Family Violence Risk, YLS/CMI = Youth Level of Service/Case Management Inventory.
Development and validation study.
Validation/revalidation study.
53.49% aged 15 to 17, 46.51% aged 18 to 19.
In total, six separate risk assessment instruments were used to assess the risk of AFV across the seven included studies. Of these, four instruments (VP-SAFvR, SAFVR, DYRA, YLS/CMI) were actuarial and two (SAVRY and the Integrated Safety Response (ISR; see Mossman et al., 2017)) were SPJ.
Four studies investigated the VP-SAFvR (McEwan et al., 2019; Sheed, McEwan, et al., 2023; Spivak et al., 2021, 2024), which was developed to assess the risk of repeat police-reported family violence incidents among victim and identified perpetrator dyads of any age (McEwan et al., 2019). Jolliffe Simpson et al. (2021) examined both the SAFVR (Bissielo & Knight, 2016, as cited by Jolliffe Simpson et al., 2021) and the DYRA (Bissielo & Knight, 2016), which were created to assess, respectively, (a) the risk of future family violence-related charges within 2 years and (b) the short-term (i.e., 3 days) likelihood of an individual harming a family member. The other eligible study by Jolliffe Simpson et al. (2023) investigated the ISR, which is a triaged multi-agency response to family violence used in New Zealand. Relevant to this review, Jolliffe Simpson et al. (2023) examined the collaborative structured professional judgements (namely low, medium, or high assessments of future family violence risk) made as part of the ISR risk assessment and management process.
Shaffer et al. (2022) examined the total scores and structured professional judgement assessments of both the SAVRY (Borum et al., 2006) and YLS/CMI (Hoge & Andrews, 2002) for adolescent dating violence (n = 163 adolescents aged 12–18 years). Both the SAVRY and YLS/CMI were developed for use with youth, but the former evaluates the risk of any future violence while the latter assesses the risk of subsequent general offending. Shaffer et al. (2022) additionally investigated the discriminant ability of the Psychopathy Checklist: Youth Version (PCL-YV), which is a psychometric scale and not a risk assessment instrument, so it was not eligible for inclusion in this review. However, the suite of adult and youth PCL scales is commonly used by clinicians to support assessments of risk in forensic settings (see Blais & Forth, 2014; Neal & Grisso, 2014), and the measures of the PCL-YV’s discriminant ability from Shaffer et al.’s (2022) study are presented in Supplemental Table S4. Notably, none of the instruments identified in the studies were developed specifically for AFV.
Regarding the methodology, there was both notable consistency and variance among the included studies. The size of the investigated samples ranged from 62 to 3,262 (M = 1,294, SD = 1,449) and all (n = 7) featured samples comprising adolescents under the age of 18. However, Sheed, McEwan, et al.’s (2023) study also included subgroups of 10 to 14 and 15 to 19 year-olds, the latter of which was eligible for review given that 53.49% were under 18. The length of the included studies’ follow-up periods varied from 6 months or less (n = 3) to 24 months (n = 1), with the remainder using a 12-month period (n = 3).
Most (n = 5) of the included studies utilized assessments made by police officers of community samples and considered family violence recidivism as a family violence police contact that occurred during follow-up. Conversely, risk assessments in Jolliffe Simpson et al.’s (2023) study were administered collaboratively by several agencies (e.g., police, health, and family violence services) to assess the risk of family violence police contact in the community, while researchers completed the assessments in Shaffer et al.’s (2022) investigation, with their outcome defined as any intimate partner violence charge. Lastly, the base rate of recidivism among the sample of studies ranged from .04 to .43 (M = .26, SD = .11), suggesting family violence recidivism among adolescents ranged between 4% and 43%. The lowest base rates were observed among the sample with the shortest follow-up period (3 days, base rate = .04: Jolliffe Simpson et al., 2021) or most narrow outcome (intimate partner violence charge, base rate = .12: Shaffer et al., 2022), while the highest base rates were found in samples with longer follow-up periods and broadly defined family violence recidivism (base rate = .43: Jolliffe Simpson et al., 2023; base rate = .37: Spivak et al., 2024).
Predictive and Discriminant Abilities
The predictive and discriminant ability indices of eligible outcomes that were reported in the included studies are presented in Table 1. There was a total of k = 24 eligible outcomes across the seven included studies. All four of the studies that examined the VP-SAFvR tested the discriminant and predictive properties of the tool at the threshold scores of 3 and 4 on the instrument (McEwan et al., 2019; Sheed, McEwan, et al., 2023; Spivak et al., 2021, 2024). Sheed, McEwan, et al.’s (2023) study additionally assessed each threshold among three eligible age-groups (10–14, 10–17, and 15–19). Taken together, at threshold scores of 3 and 4, the VP-SAFvR would accurately identify adolescents as being at high risk of family violence recidivism 27% to 46% and 30% to 50% of the time, respectively. It accurately identified adolescents as being at low risk of family violence recidivism, at threshold scores of 3 and 4, 77% to 89% and 73% to 86% of the time, respectively. Jolliffe Simpson et al.’s (2021) study featured six eligible outcomes by assessing the predictive and discriminant abilities of the DYRA and SAFVR after three different follow-up periods. Shaffer et al.’s (2022) research featured four eligible outcomes by testing both the total score and structured professional judgement summary ratings of the SAVRY and the YLS/CMI. Jolliffe Simpson et al. (2023) was the only study to assess multiple outcomes of differing severity, investigating the discriminant ability of the ISR for both any family violence recurrence and family violence recurrence involving physical harm.
An AUC was reported for each of the k = 24 eligible outcomes. In all, k = 23 of the AUC point estimates for the eligible outcomes were greater than .50 and ranged from small (min = .54: YLS/CMI summary rating; Shaffer et al., 2022) to moderate (max = .67: VP-SAFvR used among 10–14 and 15–19 year-olds; Sheed, McEwan, et al., 2023) in size against conventional benchmarks (Rice & Harris, 2005). However, of the AUCs that exceeded .50, k = 3 reported by Shaffer et al. (2022), k = 4 reported by Jolliffe Simpson et al. (2021), and k = 1 reported by Jolliffe Simpson et al. (2023) possessed confidence intervals that either crossed .50 or had a lower bound equal to .50. This suggests that, for these outcomes, the instruments may have performed no better than chance at discriminating between individuals who did and did not subsequently engage in family violence. Similarly, the only AUC value below .50 was observed by Jolliffe Simpson et al. (2021) using the SAFVR for a follow-up of 3 days (AUC = .46 [.17–.75]). Taken at face value, this indicates that a randomly selected recidivist had a slightly less than even chance of receiving a higher risk score than a randomly selected non-recidivist, although the wide confidence interval also spans .50.
The sensitivity, specificity, PPV, and NPV were reported for k = 18 outcomes. Sensitivity values were greater than .70 among half (k = 9) of the eligible outcomes, reflecting the correct classification of more than 70% of the family violence recidivists as high risk. Conversely, half of the k = 18 outcomes possessed a specificity value of less than .50, indicating that more than 50% of the individuals who did not recidivate during the follow-up period in these samples had been categorised as high risk. Across all but one of the samples in these studies, the PPV values were higher than the sample base rate, indicating that a rating of high risk would exceed a chance guess of who would subsequently recidivate. For example, the proportion of adolescents considered high risk on the SAFVR with a new police-reported family violence incident after 5.52 months (PPV = .60) in Jolliffe Simpson et al.’s (2021) study was roughly 1.6 times the base rate for the same sample and period (.37). Each eligible NPV value reported in the included studies was .65 or greater, with most (k = 11) exceeding .80. This indicates that more than 80% of the individuals categorised as low risk in most (k = 11) of the eligible outcomes did not engage in family violence recidivism during their follow-up period.
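Because PPV and NPV are compared against sample base rates here, it can help to see how they depend on the base rate for a fixed sensitivity and specificity. The Python sketch below applies Bayes' rule; the function name and the sensitivity (.70) and specificity (.50) values are hypothetical and chosen only for illustration, while the three base rates are the minimum, mean, and maximum reported in this review.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) implied by sensitivity, specificity and base rate."""
    tp = sensitivity * base_rate              # recidivists rated high risk
    fp = (1 - specificity) * (1 - base_rate)  # non-recidivists rated high risk
    tn = specificity * (1 - base_rate)        # non-recidivists rated low risk
    fn = (1 - sensitivity) * base_rate        # recidivists rated low risk
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical instrument: sensitivity = .70, specificity = .50 (illustration only).
for base_rate in (0.04, 0.26, 0.43):
    ppv, npv = predictive_values(0.70, 0.50, base_rate)
    print(f"base rate {base_rate:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumed values the PPV stays above the base rate at every level, but it remains low in absolute terms when recidivism is rare, while the NPV falls as the base rate rises. This is one reason PPV and NPV from samples with different follow-up periods and outcome definitions are difficult to compare directly.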
Reporting Quality
Table 2 displays a summary of each included study’s adherence to the RAGEE guidelines (Singh et al., 2015). The number of applicable items varied across the included studies (e.g., n = 5 studies did not conduct post hoc analyses, so questions related to these analyses were irrelevant), ranging from 43 (McEwan et al., 2019; Shaffer et al., 2022) to 48 items (Spivak et al., 2021). Overall, the information reported in the included studies was largely consistent with the RAGEE guidelines, with all seven of the included studies evidencing 77% or more of the guideline items applicable to their study. For instance, all or nearly all the studies reviewed reported information adhering to 75% or more of the items deemed applicable to the Abstract, Participants, Instrument design, Instrument administration, Predicted outcome, and Discussion sections of the guidelines. The adherence of the included studies was notably lower in the Study design (min = 60%, max = 80%), Statistical analysis (min = 50%, max = 100%), Predictive validity (min = 0%, max = 100%), and Disclosures (min = 0%, max = 75%) sections.
Table 2. Summary of RAGEE Guidelines (Singh et al., 2015) Items for Included Studies.
Note. k = number of items.
Applicability and Risk of Bias
Each included study was reviewed using the PROBAST assessment for applicability and risk of bias in prediction studies (Wolff et al., 2019) and the results are presented in Figure 2. The PROBAST contains some items that are solely appropriate to development studies. Hence, McEwan et al.’s (2019) development and validation arms were assessed on the PROBAST separately, taking the total number of PROBAST outcomes to 8. Concern for applicability on the PROBAST is assessed for the Participants, Predictors, and Outcome domains. All three domains in all eight outcomes were considered applicable to the current review. Hence, each included outcome’s overall concern for applicability was rated as low.

Figure 2. PROBAST (Wolff et al., 2019) concern for applicability and risk of bias ratings for included studies.
Conversely, six of the eight outcomes were considered as being at an overall probable high risk of bias (McEwan et al., 2019*; 2019; Jolliffe Simpson et al., 2021, 2023; Shaffer et al., 2022; Sheed, McEwan, et al., 2023). In five of these cases, this was due to a rating of a high risk of bias in the Analysis domain and the consistent pattern across these outcomes was the presence or mishandling of missing data. In addition to the Analysis domain, Jolliffe Simpson et al.’s (2023) study was rated as high risk of bias in the Predictors domain. This reflects how the triage teams under investigation would typically administer the assessment: they recorded the relevant risk factors only after they had assigned a risk category.
In addition, none of the eight outcomes controlled for participants’ time-at-risk, which could be an unmeasured source of bias if significant numbers of participants were in custody during the follow-up period. However, this was not considered grounds for a rating of high risk on the Analysis domain for the Spivak et al. (2021, 2024) studies given the size of their samples (n = 207; n = 2722, respectively), the recruitment of participants after an unsubstantiated incident of family violence, and the fact that joint ratings on other items did not indicate an elevated risk of bias.
All seven studies included instruments that featured a predictor (violence) in the outcome (family violence). However, this was not considered sufficient to rate the Outcome domain as being a high risk of bias in and of itself because each outcome’s possible predictors and dependent variables were temporally distinct episodes of violence. Indeed, the eight included outcomes were all considered as being at a low risk of bias in the Participants and Outcome domains, with the exception of Shaffer et al. (2022), which did not report sufficient information for a rating on the Outcome domain.
Post-hoc Analysis
The full-text screening process uncovered numerous studies of instruments that purported to assess for AFV; however, the instruments evaluated were screening tools as opposed to risk assessment tools. For this reason, a member of the research team performed a search of the abstracts and titles of all records from the database and other sources that were excluded at the full-text level due to their outcome. Each record’s title and abstract were reviewed to identify studies that developed or validated AFV screening tools. This process revealed n = 33 studies concerned with the development or validation of screening tools for AFV behaviours, as opposed to risk assessment tools for AFV.
Discussion
Overview of Findings
This systematic review evaluated the current literature regarding risk assessment tools evaluated for use with AFV. Of the records screened during the three-phased search strategy, seven were eligible for inclusion in this review. These studies validated a mix of actuarial and structured professional judgment instruments, with the VP-SAFvR being the most frequently assessed tool (McEwan et al., 2019; Sheed, McEwan, et al., 2023; Spivak et al., 2021, 2024). The reviewed instruments varied in their predictive validity and discriminative power, with AUC values ranging from .54 to .67 (i.e., poor to moderate discriminative ability). Sensitivity was generally high, reflecting good identification of recidivists, while specificity was often lower, indicating challenges in accurately identifying non-recidivists.
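For readers less familiar with the AUC, one common reading is the probability that a randomly selected recidivist received a higher risk score than a randomly selected non-recidivist, with .50 representing chance. The sketch below computes the statistic in this pairwise way. It is an illustration only: the scores are hypothetical, the function name `auc_from_scores` is an assumption, and the included studies may have estimated the AUC using other software or procedures.

```python
from itertools import product


def auc_from_scores(recidivist_scores, non_recidivist_scores):
    """Probability that a random recidivist outscores a random non-recidivist.

    Tied pairs count as half, which matches the usual rank-based AUC.
    """
    pairs = list(product(recidivist_scores, non_recidivist_scores))
    wins = sum(1.0 for r, n in pairs if r > n)
    ties = sum(0.5 for r, n in pairs if r == n)
    return (wins + ties) / len(pairs)


# Hypothetical risk scores, for illustration only.
recidivists = [5, 3, 6, 2, 4]
non_recidivists = [1, 4, 2, 5, 3]

auc = auc_from_scores(recidivists, non_recidivists)
print(f"AUC = {auc:.2f}")
```

An AUC near .50 on this reading means that the instrument orders recidivists above non-recidivists only about as often as a coin toss would.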
There are three key findings for this review: (a) there are relatively few validated risk assessment tools for AFV behaviours and, among those few tools which have been evaluated for use with this population, there is variability in the predictive and discriminant validity, (b) there is reasonably consistent adherence to the RAGEE guidelines for reporting validation studies of violence risk assessments, and (c) there is seemingly high risk of bias present in studies that have validated risk assessment tools for AFV behaviours. Practice, policy and research implications are discussed and summarised in Table 3.
Table 3. Implications for Practice, Policy, and Research.
Validated Risk Assessment Instruments
Our findings indicate that there are few risk assessment tools that have been validated for use with adolescents engaging in family violence behaviours, and of those that have, the predictive and discriminant validity across tools is inconsistent. We note that the post-hoc analysis identified 33 studies assessing screening tools (e.g., assessing for the presence/absence, type, frequency of AFV behaviours) designed to identify AFV. Although useful for the identification of AFV behaviours and need for further assessment, screening tools have no utility in assessing the risk of AFV as they are simply a measure of the presence of a behaviour/s, and it is unreasonable to claim otherwise.
The VP-SAFvR appears to be the most effective at distinguishing between adolescents engaging in family violence behaviours who will and will not recidivate. While the other tools (DYRA, SAFVR, SAVRY, YLS/CMI, and ISR) exhibited poorer discriminant abilities, these results may be due to methodological and definitional differences. For example, Jolliffe Simpson and colleagues’ (2021) evaluation of the DYRA and SAFVR used three follow-up periods, all of which were less than 6 months (i.e., 3 days, 12 weeks, and 24 weeks), whereas all VP-SAFvR studies used a follow-up period of at least six months. A shorter follow-up period means less opportunity to recidivate and, given the likely underreporting of incidents of AFV to police (Fitz-Gibbon et al., 2022), the use of policing data to determine recidivism with a short follow-up period may significantly limit the number of young people identified as re-engaging in family violence behaviours.
Although Shaffer and colleagues (2022) used a 2-year follow-up period to evaluate the SAVRY and YLS/CMI, their definition of family violence recidivism was narrow (i.e., intimate partner violence charge) compared to the other studies which used any family violence type/police attendance at family violence incidents as the indicator of recidivism. While it may be beneficial in some instances to utilise more narrow definitions of family violence (e.g., identifying unique risk factors), research has shown a high level of co-occurrence of family violence across different relationships (e.g., intimate partner abuse, child maltreatment), as well as significant overlap in risk and protective factors (Chan et al., 2021). In general, the use of a broader definition (i.e., any family violence) as opposed to a narrower definition (i.e., only intimate partner violence) may be best suited to an adolescent population, as recent literature has highlighted that young people engaging in family violence behaviours tend to do so across different relationships (Nowakowski-Sims, 2019). This was evidenced in one of the studies included in this review (Sheed, McEwan, et al., 2023) where rates of any-dyad recidivism (35.31%) were higher than same-dyad recidivism (24.24%).
Consideration of the sensitivity and specificity of risk assessment tools is also required when assessing their practical implementation. Some studies (e.g., Jolliffe Simpson et al., 2023; Shaffer et al., 2022) did not report these statistics. However, those that did revealed that, while sensitivity was robust (particularly for the VP-SAFvR when a threshold of 4 was applied), indicating that a significant proportion of recidivists were accurately identified, specificity was often lacking. This suggests that a substantial number of individuals classified as high risk did not engage in subsequent family violence, which could lead to overestimation of risk (Singh et al., 2015). Balancing sensitivity and specificity remains a key challenge, as high sensitivity may come at the cost of lower specificity, leading to potential issues with false positives. This does not necessarily render the VP-SAFvR inappropriate; rather, it highlights the importance of risk assessment tool-users being aware of a tool’s limitations and subsequent practical implications.
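The trade-off described above can be made concrete with a short sketch that applies different cut-off scores to the same set of ratings. All scores and outcomes below are hypothetical and are not VP-SAFvR data; the cut-offs (2, 4, and 6) and the function name `sens_spec_at_threshold` are assumptions chosen for illustration.

```python
def sens_spec_at_threshold(scores, recidivated, threshold):
    """Rate a case high risk when score >= threshold.

    Returns a (sensitivity, specificity) tuple for that cut-off.
    """
    pairs = list(zip(scores, recidivated))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    tn = sum(1 for s, y in pairs if s < threshold and not y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical total scores and observed outcomes, for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 2, 5]
recidivated = [False, False, False, True, False,
               True, True, True, True, False]

results = {t: sens_spec_at_threshold(scores, recidivated, t) for t in (2, 4, 6)}
for threshold, (sens, spec) in results.items():
    print(f"threshold {threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this hypothetical sample, lowering the cut-off captures more recidivists at the cost of more false positives, and raising it does the reverse, which is why the choice of threshold should reflect the practical consequences of each type of error.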
Risk Assessment Study Quality
Adherence to the RAGEE guidelines was generally high, but notable shortcomings were observed in the study design and statistical analysis domains (Singh et al., 2015). Specifically, failure to report time at risk (i.e., the time during the follow-up period that family violence recidivism could have occurred) and inter-rater reliability, and a lack of transparency/clarity in statistical analysis plans, were common. This does not necessarily indicate that the authors did not address these aspects; rather, they were not reported in the reviewed article and therefore cannot be confirmed. This is undesirable for the area of risk assessment research as inconsistent methodological reporting limits reproducibility, attempts at meta-analytic work, and clinical utility of risk assessment instruments (Singh et al., 2015). However, the lack of information pertaining to inter-rater reliability may have been due to many of the studies relying on field data, rather than the tools being completed by researchers. With the above considered, the generally high adherence to the RAGEE guidelines indicates a relatively high level of rigour in the quality of the articles reviewed, with room for improvement. To support future methodological rigour in this field, it may be appropriate for journals to consider asking scholars to submit a RAGEE adherence checklist along with their publication, and for this to be included in the supplementary material.
Risk of Bias
The PROBAST assessment indicated a probable high risk of bias in five out of the seven studies, primarily due to issues with the management of missing data and consideration of time-at-risk (Wolff et al., 2019). Time at risk is an important consideration in the validation of risk assessment tools as these instruments often aim to predict events or outcomes within a specific time frame. In the instance of AFV, if an adolescent’s removal from risk factors (e.g., placed in custody, removal from the family home) is not considered and managed during validation, the tool’s ability to accurately forecast these events may not be adequately captured.
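To illustrate why time at risk matters, the sketch below contrasts a naive recidivism proportion with a rate expressed per person-year actually spent at risk. This is a deliberately simplified example under stated assumptions: the records are hypothetical, the names `FollowUp`, `days_at_risk`, and `rate_per_person_year` are introduced here for illustration, and validation studies would more typically handle time at risk through survival analysis.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    follow_up_days: int     # length of the nominal follow-up window
    days_not_at_risk: int   # e.g., days in custody or out of the family home
    recidivated: bool       # any family violence recidivism in the window


def days_at_risk(record: FollowUp) -> int:
    """Days in the follow-up window during which recidivism was possible."""
    return max(record.follow_up_days - record.days_not_at_risk, 0)


def rate_per_person_year(records) -> float:
    """Recidivists per person-year at risk (simplified; ignores event timing)."""
    total_days = sum(days_at_risk(r) for r in records)
    events = sum(1 for r in records if r.recidivated)
    return events / (total_days / 365.0)


# Hypothetical cohort with a 180-day follow-up, for illustration only.
cohort = [
    FollowUp(180, 0, True),
    FollowUp(180, 90, False),    # half the window spent in custody
    FollowUp(180, 180, False),   # never at risk during the window
    FollowUp(180, 0, False),
]

naive_proportion = sum(r.recidivated for r in cohort) / len(cohort)
adjusted_rate = rate_per_person_year(cohort)
print(f"naive proportion: {naive_proportion:.2f}")
print(f"rate per person-year at risk: {adjusted_rate:.2f}")
```

In this hypothetical cohort, two of the three apparent non-recidivists had reduced or no opportunity to recidivate, so treating them as fully observed would understate the underlying rate.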
It is acknowledged that determining time at risk in an adolescent family violence population is challenging, given that few police contacts result in charges, few charges result in custodial sentences, and custodial sentences for young people engaging in crime are (on average) considerably shorter than those of their adult counterparts (Australian Institute of Health and Welfare, 2016; Sheed et al., 2024). Similarly, it is often challenging for scholars to capture whether an adolescent has been excluded from the family home; however, capturing this as a component of time at risk is important as the relinquishment of care or removal of the child from the home means there are fewer opportunities for the use of family violence. Regardless, it remains imperative that time at risk be considered in studies evaluating the predictive validity of risk assessment tools. Positively, the overall concern for applicability of the included studies was rated as low.
Lastly, while Jolliffe Simpson et al.’s (2023) examination of the ISR reflected a high risk of bias in the Predictors domain, it is important to note that this is not reflective of the authors’ approach. Instead, it is an artefact of the multi-agency ISR assessments examined which, as highlighted as a potential limitation by the authors, were made in a seemingly unstructured way and prior to the identification of a set of specific risk factors (Jolliffe Simpson et al., 2023).
Strengths and Limitations of the Review
To our knowledge, this review represents the first attempt to systematically identify and synthesise research regarding risk assessment tools used to predict family violence recidivism among adolescents with a history of engaging in family violence. It provides a contemporary overview of available risk tools for use by researchers, clinicians/practitioners, and police wanting to assess recidivism among adolescents who use family violence. While the final studies included in this review were all published, the inclusion of grey literature and studies published outside the English language in our search minimised the potential that the results would be negatively impacted by publication or location bias. Thorough inter-rater reliability processes were adhered to, and assessments of quality and risk of bias were completed. The evaluation of research according to the RAGEE guidelines also represents a significant contribution to the literature.
This review is limited in two key respects. First, the lack of cultural or ethnic diversity considered within the samples used within the included studies (except for the Shaffer et al. (2022) study reporting ethnicity of their adolescent sample) means we were unable to comment on the applicability of the examined instruments for specific cultural or ethnic groups. Second, all included studies were drawn from samples of adolescents from Western countries (i.e., Australia, Canada, New Zealand), likely impacting generalisability of findings to non-Westernised nations. The reason for the lack of published research in non-Westernised countries is unclear; however, we note that the scope of this review permitted inclusion of studies published in languages other than English, though no such studies met inclusion criteria. It is therefore possible that researchers in non-Westernised countries are yet to examine the predictive validity of tools for use with adolescents who use family violence.
Future Directions for Research
The results of this review have highlighted several directions for future research. First, the distinct lack of research in this area indicates a need for further research examining the predictive validity of risk assessment tools for young people who use family violence. There is a need for examination of whether existing youth violence/offending risk assessment tools (e.g., SAVRY, YLS/CMI) are valid for use with adolescents who engage in various types of family violence (e.g., child-to-parent abuse, sibling abuse, intimate partner abuse). It also appears prudent to explore the possibility of developing an AFV-specific risk assessment tool with utility across multiple settings. Based on the AFV literature and family violence literature more broadly, it is possible that the predictive validity of current tools is being limited by their minimal consideration of dyadic, victim vulnerability, and situational factors specific to adolescents who use family violence (Boxall et al., 2021; Sheed, Maharaj, et al., 2023; Simmons et al., 2018). It is therefore feasible that a tool developed specifically for use with adolescents who use family violence may provide an enhanced predictive capacity.
Second, there is a need for greater consideration, as well as consistency, across studies regarding how family violence recidivism is measured. All studies which met inclusion criteria utilised police data as their outcome data source, though different measures of family violence recidivism were used. Shaffer et al. (2022) used subsequent police charges, while Sheed, McEwan, et al. (2023), McEwan et al. (2019), and Spivak et al. (2021, 2024) used the more sensitive measure of any subsequent police contact for using family violence (i.e., police call outs that may not result in charges). It is possible that the low recidivism base rate issue encountered by Shaffer et al. (2022) could have been remedied, and a more accurate picture of AFV recidivism captured, had a more sensitive primary outcome measure (e.g., police contacts) been used. Therefore, it is important not only that researchers aim for consistency in how they measure recidivism (to ensure ease of comparing the validity of risk tools), but also that they consider the sensitivity of the measure.
Third, there is a great need for research to examine the validity of risk assessment tools across diverse demographic groups. Consideration of culture, ethnicity, gender diversity and disability status is largely absent from the existing literature. This represents a significant gap which needs to be addressed to ensure equitable and accurate prediction of recidivism across the entire adolescent population.
Fourth and related to the above, consideration needs to be given to racial fairness in developing or using any risk assessment tool for AFV. Shepherd and Willis-Esqueda (2018) considered the addition of a race addendum for risk instruments to improve fairness among risk classifications, which may include suggestions to clinicians about working effectively interracially and/or include contextual information for each risk item. However, it has been highlighted that the addition of any materials to existing instruments would require that their predictive and discriminant validity be retested (Shepherd & Spivak, 2021).
Clinical and Policy Implications
The findings have important implications for both clinical practice and policy. With regard to practice, practitioners should be aware of the limitations of commonly used risk assessment tools when assessing the risk of AFV. For example, the results of Shaffer et al. (2022) indicate that common SPJ tools, the SAVRY and YLS/CMI, may perform no better than chance at predicting adolescent intimate partner violence and are therefore not recommended for use with this population for this particular purpose.
Even if using a tool with adequate predictive and discriminant validity, practitioners need to be cautious when interpreting risk assessment scores, remaining mindful of the challenges in balancing sensitivity and specificity (Bissielo & Knight, 2016) and of the population on which the tool was validated. Although the VP-SAFvR exhibited the most promising predictive and discriminant abilities in the sample of tools assessed, this instrument is actuarial in nature and designed for use by police, and therefore arguably has limited utility for practitioners and expert witnesses making risk judgements or recommendations, or providing intervention, for young people engaging in family violence behaviours. Thus, there appears to be a somewhat urgent need for an AFV-specific SPJ risk assessment tool to be utilised by clinicians and experts in various settings, including child and family community services, mental health services, corrections, and the courts.
Policymakers are encouraged to support such research as it is imperative that the tools being used to assess the risk of AFV are evidence-based and have been found to have adequate predictive validity. As indicated by Meyer et al. (2023) in their review of domestic and family violence perpetrator screening and risk assessment within Australia, the predictive validity of the tools being used across Australia in the family violence sector remains limited or, in some cases, undetermined. For example, within the AFV space in Victoria, Australia, the risk assessment frameworks currently used in the sector are yet to be validated. The use of unvalidated risk assessment in high-stakes scenarios, such as family violence, raises both ethical and safety concerns for individuals, families, and the community.
Conclusion
This systematic review highlights the critical need for validated risk assessment tools tailored specifically for AFV, particularly for clinicians (e.g., psychologists, social workers) working in this space. The findings reveal significant gaps in the availability and efficacy of such tools, with the VP-SAFvR emerging as the most promising, albeit with limitations related to predictive and discriminative validity. The high sensitivity but low specificity of these tools suggests a propensity for overestimating risk, which could have profound implications in practice. Furthermore, the review highlights methodological weaknesses in existing studies, such as the reliance on police data and inadequate consideration of time-at-risk, which undermine the accuracy and generalisability of findings. The lack of diversity within research samples also raises concerns about the applicability of these tools across different demographic groups.
Given these limitations, there is an urgent need for the development of AFV-specific risk assessment instruments that are validated across diverse populations and settings. Policymakers and practitioners must prioritise evidence-based approaches to risk assessment in AFV to ensure accurate and equitable outcomes. Future research should focus on refining methodological approaches, expanding data sources, and exploring the unique risk factors associated with different forms of AFV. By addressing these challenges, the field can move towards more reliable and effective tools that better serve the needs of adolescents, families, and communities impacted by family violence, and the services responsible for their care and management.
Supplemental Material
Supplemental material (sj-docx-1-tva-10.1177_15248380251412518, sj-docx-2-tva-10.1177_15248380251412518, sj-docx-3-tva-10.1177_15248380251412518, and sj-docx-4-tva-10.1177_15248380251412518) for A Systematic Review of Risk Assessment Measures for Adolescent Family Violence by Abigail Sheed, Maddison Riachi, Catie Bridgeman, Nina Papalia, Melanie Simmons, James R. P. Ogloff and Michael D. Trood in Trauma, Violence, & Abuse
Footnotes
Authors’ Note
A summary of findings was presented orally at the Australian New Zealand Psychiatry Psychology and Law (ANZAPPL) conference in November 2024. With the exception of this presentation, the specific ideas and data analyses presented in this work have not been otherwise published or presented.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Authors Abigail Sheed, Melanie Simmons, Nina Papalia, James Ogloff, and Michael D. Trood co-authored one or more articles that are the subject of this review, and James Ogloff is a co-author of the VP-SAFvR but receives no financial benefit from its use. The authors have no other conflicts of interest to declare.
Supplemental Material
Supplemental material for this article is available online.
