Abstract
Using a hybrid prospective-retrospective study design, we examined the predictive validity of the Ontario Domestic Assault Risk Assessment (ODARA), Domestic Violence Risk Appraisal Guide (DVRAG), and the Psychopathy Checklist–Revised (PCL-R) in an incarcerated sample of 94 Canadian adult men with a history of intimate partner violence (IPV) referred for treatment. The sample was followed for an average of 3.6 years following release, yielding base rates of 23.4% and 12.8% for violent and IPV recidivism, respectively. Analyses revealed that the ODARA and DVRAG displayed high inter-rater reliability and that the two measures along with Factor 1 of the PCL-R generated the largest area under the curve (AUC) for IPV recidivism (AUC = .71, .71, and .69, respectively). Predictive validity of the three measures was maintained even after accounting for treatment exposure. Although promising, clinical implications of administering the ODARA and DVRAG to incarcerated men with a history of IPV are discussed.
Intimate partner violence (IPV) is a widespread public safety concern that affects women and young girls worldwide (Sardinha et al., 2022). Although estimates of IPV vary by country, with exact figures remaining elusive, the behavior is pervasive nonetheless. Within the United States, estimates indicate that approximately 32.9% of women and 28.1% of men have experienced physical violence perpetrated by an intimate partner in their lifetime (Breiding et al., 2014). A statistical survey of police-reported family violence between 2009 and 2021 conducted in Canada revealed that the rate of IPV committed against a female victim was the most frequently reported form of family violence, with continual increases in the rate of IPV being observed over the past 7 years (Statistics Canada, 2023). Moreover, a total of 114,132 victims of IPV were reported by police with the rate of victimization being four times higher among females than males. These findings are a clear indication that IPV remains a prevalent and ongoing societal concern and underscore the need for appropriate identification and management of perpetrators of IPV.
Assessing Risk for IPV
Historically, IPV risk assessment emerged in the literature following the establishment of other areas such as general and sexual violence risk assessment (e.g., Bowen, 2011; Kropp, 2004). This apparent lag in the earlier development and validation of IPV-specific risk assessment measures may be explained, in part, by the questioned necessity of such measures due to the high degree of overlap between risk factors for violence and IPV (Hanson et al., 2007; Hilton & Harris, 2005). Multicollinearity or redundancy among risk factors has been previously observed in relation to measures designed to predict general and violent recidivism (Kroner et al., 2005) and across various correctional populations (e.g., Hilton et al., 2001). A clear example of this issue was demonstrated with the Violence Risk Appraisal Guide (VRAG; Harris et al., 2015), a measure designed to predict violent recidivism which was found to be moderately predictive of IPV (weighted d = 0.65, 95% CI [0.49, 0.80] compared to weighted d = 0.40, 95% CI [0.32, 0.48] for spousal assault scales; Hanson et al., 2007).
Nevertheless, the need to predict IPV recidivism specifically has resulted in a growing number of IPV risk assessment measures being developed over the past 20 years. Within their recent systematic review, Graham et al. (2021) identified 18 IPV risk assessment measures among 43 studies. Among these measures was the Ontario Domestic Assault Risk Assessment (ODARA; Hilton et al., 2004). At the time of its development, there were no pre-existing actuarial measures designed to assess IPV risk. Hilton et al. (2004) developed the ODARA for use by frontline law enforcement personnel based upon readily available information from police reports of IPV. They identified a sample of 689 participants (589 for the construction phase and an additional 100 for cross-validation) in which there was the presence of “a victim report or police evidence of forceful physical contact by a man against his current or former wife or common-law wife” (p. 269). IPV recidivism was then coded, regardless of whether a formal charge was entered, over a mean follow-up period of 4.79 years. Thirty percent (n = 175) of the construction sample reoffended, with 95% of the recidivists reoffending against the victim of their index offense. Using similar statistical procedures employed during the development of the VRAG, Hilton et al. (2004) identified a total of 13 items based on their unique statistical contribution to the prediction of IPV recidivism (e.g., pre-index domestic assault, victim concern about future assaults, substance abuse). As reported by Hilton et al. (2004), validity of the ODARA in predicting IPV recidivism was strong (area under the curve [AUC] = .77 and .72, for the development and cross-validation samples, respectively), with Kropp (2008) considering it to be among four IPV-specific risk assessment measures to “hold the most promise” in his review of the literature.
In reflecting on the ODARA’s design for use by frontline police personnel, Hilton et al. (2004) argued that not accounting for more in-depth clinical information available to clinicians within forensic/correctional settings “could lead to suboptimal prediction” (p. 151). This led to the development of the Domestic Violence Risk Appraisal Guide (DVRAG; Hilton et al., 2008). Using 303 cases with extensive file information from the ODARA construction sample with an additional sample of 346 cases meeting inclusion criteria, Hilton et al. examined whether the addition of pre-existing risk assessment measures (e.g., VRAG) to the ODARA could improve upon its performance in predicting various IPV recidivism outcomes (i.e., number of recidivistic incidents, number of incidents with severe violence, and for total recidivism injury). Among the five risk assessment measures examined, the Psychopathy Checklist–Revised (PCL-R; Hare, 2003) was the best predictor in comparison to the other risk measures and was the most consistent in improving upon the predictive validity of the ODARA. Using the simple summation technique (see Nuffield, 1982), statistical weights were developed for each item based on the deviation from the recidivism base rate. Thus, the resulting DVRAG differed from the ODARA through its inclusion of a weighted scoring system and addition of the PCL-R score.
Although designed to assess psychopathic personality (Hare, 2016), the PCL-R has been incorporated into the violence risk assessment process due to its associations with recidivism. For instance, measures such as the VRAG have directly incorporated the PCL-R total score as an individual item, and use of the PCL-R by mental health professionals as a stand-alone risk assessment measure has been well documented within the literature (DeMatteo & Olver, 2022). Factor structure of the PCL-R has been extensively studied, with results suggesting a hierarchical two-factor, four-facet model of psychopathy (see Hare, 2003, 2016). Factor 1 encompasses the Interpersonal and Affective facets and consists of interpersonal and affective traits reflective of “selfish, callous, and remorseless use of others,” whereas Factor 2 encompasses the Lifestyle and Antisocial facets and consists of items tapping an unstable and socially deviant lifestyle and prior criminal behavior which are reflective of a “chronically unstable and antisocial lifestyle” (Hare, 2003, p. 79). Meta-analytic investigations have revealed that the validity of the psychopathy checklists (i.e., the PCL-R and its variants) in predicting violence is largely attributed to Factor 2, whereas the personality features encompassed on Factor 1 have typically underperformed in predicting violent outcomes (e.g., DeMatteo & Olver, 2022; Olver et al., 2020).
Although significantly predictive of IPV recidivism within the DVRAG development sample (Hilton et al., 2008), some researchers had previously argued that psychopathy was unrelated to IPV as the absence of any meaningful interpersonal attachment with others has long been considered a “fundamental” aspect of psychopathic personality (Dutton & Kropp, 2000). Earlier research had found psychopathy to be significantly lower among incarcerated men with a history of IPV compared to those without (Hilton et al., 2001), and a comparison of recidivistic and non-recidivistic perpetrators of IPV by Kropp and Hart (2000) revealed non-significant differences for the total and factor scores of the Psychopathy Checklist: Screening Version (PCL:SV; Hart et al., 1995). More recent meta-analytic evidence suggests, however, that a small, yet significant association exists between higher psychopathy scores and IPV perpetration (weighted r = .20, k = 14, N = 4,600; Robertson et al., 2020). This association between higher psychopathy scores and IPV perpetration has also been observed among adolescent samples (e.g., Brazil & Forth, 2024). However, results have been slightly mixed with respect to IPV recidivism. For instance, Rettenberger and Eher (2013) found that the PCL-R neither added any significant incremental validity to the ODARA nor significantly predicted violent or IPV recidivism. In contrast, those who committed IPV recidivism following treatment were approximately six times more likely to have a score above 12 on the PCL:SV (Dutton & Kropp, 2000) and meta-analytic results based on a small range of effect sizes (k = 2-7) have revealed that the PCL-R/PCL:SV is at least moderately predictive of IPV recidivism (Hanson et al., 2007; van der Put et al., 2019).
The Present Study
Although meta-analytic evidence indicates slightly larger effects for the DVRAG than for the ODARA in predicting IPV recidivism (mean AUC = .72 and .69, respectively; van der Put et al., 2019), research on the DVRAG has accumulated more slowly, and these results are based on fewer studies than those for the ODARA (3 studies with 6 effect sizes vs. 10 studies with 22 effect sizes, respectively). Despite research on the ODARA continuing to accumulate (e.g., Jung & Himmen, 2022), there remain few published studies examining its more in-depth counterpart, the DVRAG (Hilton, 2021). In addition, we know of only two published studies that have assessed the validity and reliability of the ODARA among incarcerated samples (Hilton et al., 2010; Rettenberger & Eher, 2013), the latter of which also examined the DVRAG within a sample of incarcerated men with a history of sexual offending. As such, the purpose of the current study was to investigate the reliability and validity of the ODARA and DVRAG within a sample of incarcerated men with a history of IPV.
First, our study will extend the existing body of research on the validity of the ODARA and DVRAG and will be the first to assess federally incarcerated Canadian men serving 2 years or more, all of whom had a history of IPV. Second, our study design permitted evaluation of the validity of the PCL-R in predicting IPV recidivism, adding to a small literature examining this relationship. Third, given the incorporation of the PCL-R total score with the ODARA in the DVRAG, we examined whether the PCL-R total and factor scores add incremental validity to the ODARA in predicting recidivism. We hypothesized that the ODARA and DVRAG would demonstrate moderate to high inter-rater reliability and validity in predicting IPV recidivism. Furthermore, though research on the association between psychopathy and IPV has been limited and generally mixed, we hypothesized that Factor 2 of the PCL-R would contribute to the improved predictive accuracy of the ODARA relative to Factor 1 given the empirical findings with violent recidivism. As the current sample was drawn from a larger group of incarcerated men referred for IPV-specific treatment, we also conducted an exploratory analysis to determine whether treatment exposure impacted the predictive validity of the risk assessment measures, an often understudied aspect within violence risk assessment research.
Method
Reporting of the methodology and results for the current article was in accordance with the Risk Assessment Guidelines for the Evaluation of Efficacy Statement (Singh et al., 2015), a 50-item checklist designed to increase consistency in reporting across violence risk assessment studies examining predictive validity. Research approval for the current study was granted by the Psychology Research Ethics Board of Carleton University and the Research Branch of the Correctional Service of Canada. This study was not preregistered.
Participants
The current sample consisted of 94 incarcerated adult males referred to the Family Violence Prevention Program (FVPP). Age of the sample ranged from 20 to 57 years (M = 35.58, SD = 9.13), with 84.0% being European Canadian/White, 6.4% African Canadian/Black, 1.1% Indigenous, and 8.5% listed as other. Eighty-eight participants were serving a determinate sentence ranging from 10 months to 10 years (M = 40.68 months, SD = 24.49), four were serving a determinate sentence with a long-term supervision order (M = 51.25 months, SD = 34.48, range = 0-73 months), and two were serving an indeterminate (life) sentence. 1 For 40.4% of the sample, the index offense which led to their current incarceration was IPV related. With respect to relationship status with their current/former intimate partner, 56.4% were common-law/cohabiting, 27.7% were dating, and 16.0% were married. Sexual orientation of the sample was predominantly heterosexual, with a single participant identifying as bisexual and having a history of violence against male and female intimate partners.
Measures and Recidivism Variables
The ODARA
The ODARA is a brief 13-item actuarial measure designed to assess risk for IPV recidivism (Hilton et al., 2004). Items are scored in a dichotomous yes/no format and can be completed by correctional staff in addition to frontline law personnel following a review of the individual’s criminal history and information pertaining to the index offense. Scores range from 0 to 13, with higher scores indicating elevated levels of risk, and fall within one of seven risk categories which provide an estimate for the probability of IPV recidivism. High inter-rater reliability was found for ODARA scores among 24 randomly selected cases (intraclass correlation coefficient [ICC] = .90 [absolute agreement]) and the calculated standard error of measurement (SEM) was 0.94 (Hilton et al., 2004).
The DVRAG
The DVRAG is a 14-item actuarial risk assessment measure designed to predict domestic violence recidivism that incorporates all 13 items from the ODARA in addition to the PCL-R total score (Hilton et al., 2008). Ten of the items are dichotomous, with the remaining four items being trichotomous. Items are weighted based on their ability during the development phase to distinguish IPV recidivists from nonrecidivists. For example, Item 1 (number of pre-index assaults in a police or criminal record) may be scored as −1, 0, or +5 depending on the number of assaults against a former/current intimate partner (or her children) contained within the individual’s official records. For the PCL-R total score (Item 14), the weighted score ranges from −1 to +6. DVRAG scores range from −10 to +46, with higher scores indicating elevated risk, and fall within one of seven risk categories that provide an estimate for the probability of IPV recidivism. High inter-rater reliability was achieved for 10 randomly selected cases within the development sample (ICC = .90 [absolute agreement]), while the SEM was found to be 2.20 (Hilton et al., 2008).
The PCL-R
The PCL-R is a 20-item rating scale designed to assess psychopathic personality traits (Hare, 2003). Scoring of the PCL-R is based on a semi-structured interview and thorough file review, with scoring details provided within the user manual (Hare, 2003). Items are scored on a 3-point scale: 0 = does not apply, 1 = maybe/in some respect, and 2 = does apply. Summing the 20 items produces a total score that can range from 0 to 40. Hare (2003) has designated a score of 30 and above as the threshold for diagnosing psychopathy. Investigations into the field reliability of the PCL-R within applied settings have yielded ICCs ranging from .33 to .90 for the PCL-R total score (mean weighted ICC = .68; Olver et al., 2020).
Official Recidivism Outcomes
Follow-up was conducted for all participants following their release from federal custody. Correctional files (available on the Offender Management System [OMS]) and official police records (Fingerprint Service records of the Royal Canadian Mounted Police) were examined for the presence of recidivism. Violent recidivism was coded as a dichotomous variable (yes/no) based on a participant receiving either a charge or a conviction. Only charges that were dismissed, withdrawn, or received a stay of proceedings or peace bond were coded. Charges for which a participant was acquitted or found not guilty were not coded as recidivism. Included within the definition of violence were convictions or charges for robbery, assault, murder (including manslaughter and attempted murder), contact sexual offenses, kidnapping, arson with disregard for human life, assaulting a police officer, and uttering threats/criminal harassment. Violent offenses committed within the context of an intimate relationship were dichotomously coded as IPV recidivism.
Offenses detailed on OMS which did not appear on the participants’ criminal record but had sufficient information with trial and/or sentencing outcome were also coded. Given that IPV may also occur while a participant is incarcerated (via conjugal visits; Dutton & Hart, 1992), any offenses committed while in custody, whether during federal or provincial incarceration, that garnered a criminal charge or conviction were included. Time-at-risk was the calculated number of days between the initial date of release from a federal institution and one of three subsequent dates: date of offense, charge, or conviction; date of death; or date of follow-up. For participants who committed a violent or IPV offense prior to release (i.e., while incarcerated), their time-at-risk was the calculated number of days between the date of the family violence assessment (i.e., from May 5, 2002 to January 1, 2009) and the date of the offense.
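The time-at-risk rule described above can be illustrated with a minimal sketch. The function name, argument names, and dates below are hypothetical and are not drawn from the study's coding manual; the sketch assumes only that the end of the at-risk period is the earliest of the available end dates, and that the start date is the release date (or the family violence assessment date for offenses committed in custody).

```python
from datetime import date
from typing import Optional


def time_at_risk_days(start: date,
                      follow_up: date,
                      offense: Optional[date] = None,
                      death: Optional[date] = None) -> int:
    """Days from the start date to the earliest available end date.

    `start` is the release date, or the assessment date for an offense
    committed while incarcerated (an assumption mirroring the text).
    """
    # Keep only the end dates that actually exist for this participant.
    end_candidates = [d for d in (offense, death, follow_up) if d is not None]
    return (min(end_candidates) - start).days


# Hypothetical participant who reoffends one year after release.
print(time_at_risk_days(date(2005, 1, 1), date(2008, 1, 1),
                        offense=date(2006, 1, 1)))
# Hypothetical participant with no new offense: at risk until follow-up.
print(time_at_risk_days(date(2005, 1, 1), date(2008, 1, 1)))
```

Taking the minimum over the available dates keeps the rule in one place, so a participant with both an offense date and a later date of death is still censored at the offense.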
Treatment Exposure
Because of missing or incomplete data, the calculated number of days between the start date and program completion date or date of program suspension served as a proxy measure of the number of group sessions attended. The number of group sessions attended was available for 37 participants and was highly correlated with the calculated number of days in program (r_s = .90, p < .001).
Procedure
This study employed a hybrid prospective-retrospective study design with a subsample of participants originating from a previously identified sample of 586 federally incarcerated men referred to the FVPP (see Connors et al., 2012, 2013). Prior to sample selection, we determined that a minimum sample size of 90 participants was required to ensure statistical significance of an AUC value of .71 for a one-sided 5% test of significance with power set to 80% based on statistical procedures described by Hanley and McNeil (1982). Initially, 224 cases met criteria for inclusion within the current study. Cases were subsequently screened out if no psychological risk assessment was conducted at intake, prior to transfer to lower security, or prior to community release. Among the 224 cases, 146 met criteria for a psychological risk assessment and had a PCL-R administration on file. Of these, 94 cases met inclusion criteria for the study and had sufficient information to score the ODARA and DVRAG.
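For readers unfamiliar with this type of power calculation, the sketch below shows one way it can be approximated. It uses the standard error formula for the AUC given by Hanley and McNeil (1982) together with a normal approximation for a one-sided test against AUC = .50. The function names and the case/control splits in the usage lines are hypothetical; the split assumed in the original calculation is not reported here, so the sketch is not expected to reproduce the figure of 90 participants.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_se(auc: float, n_cases: int, n_controls: int) -> float:
    """Standard error of an AUC (Hanley & McNeil, 1982 approximation)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_cases - 1) * (q1 - auc ** 2)
                + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)
    return sqrt(variance)


def approx_power(auc: float, n_cases: int, n_controls: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a one-sided test of H0: AUC = .50.

    Normal approximation; an illustrative assumption, not necessarily
    the exact procedure used in the study.
    """
    z = NormalDist()
    se_null = hanley_mcneil_se(0.5, n_cases, n_controls)
    se_alt = hanley_mcneil_se(auc, n_cases, n_controls)
    z_crit = z.inv_cdf(1 - alpha)
    return z.cdf((auc - 0.5 - z_crit * se_null) / se_alt)


# Hypothetical splits of 90 participants into recidivists and non-recidivists.
print(approx_power(0.71, n_cases=45, n_controls=45))
print(approx_power(0.71, n_cases=20, n_controls=70))
```

Because the standard error depends on the number of cases and controls separately, the same total sample size yields less power as the base rate moves away from 50%.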
To be included within the current study, the following three criteria were required: (1) the participant met the referral criteria for the FVPP, (2) there is documented proof as evidenced by charges, convictions, and/or police reports that prior to the current federal incarceration the participant committed IPV against a current or former marital, cohabiting, or dating partner, 2 and (3) the participant has since completed his federal sentence (i.e., reached warrant expiry) or has been released from custody into the community on some form of structured release (i.e., unescorted temporary absence, day parole, full parole, or statutory release). All participants within the current study provided informed consent for their program information to be gathered and used for research purposes prior to their participation in the FVPP (for a description of the program, see Connors et al. [2012, 2013]).
Scoring of the ODARA and DVRAG was completed by the lead author, a master’s level graduate student at the time of data collection with relevant course work/training and clinical experience in correctional settings. To facilitate the reliable scoring of the ODARA and DVRAG, scoring certification for the lead author was obtained prior to data collection by scoring a total of 10 cases on the ODARA as provided within the ODARA 101: The Electronic Training Program (Hilton & Ham, 2015; Certificate No. 12-083). Level of inter-rater agreement between the ODARA risk categories for the 10 cases as independently scored by the lead developer (N. Zoe Hilton) and the lead author was considered high (single measures, random effects, absolute agreement; ICC = .98). Scoring of the ODARA and DVRAG occurred prior to coding of the recidivism variables and was based on a thorough review of correctional file information pertaining to psychological risk assessments, intake assessments, criminal profiles, and FVPP reports, in addition to various police and court documents located within the Police and Court Information Management Module available through OMS. No prior administrations of the ODARA or DVRAG were contained within the files.
Although 94 cases had been scored on the PCL-R, only 91 scores could be derived from file. At the time of data collection, hard copy PCL-R item, facet, and total scores were collected from the mental health files located at the Offender Records Depot within Millhaven Institution in the Ontario Region. Whenever two independently rated PCL-R assessments were available, information from both assessments was recorded for the purpose of evaluating inter-rater reliability (so long as the individual was not released nor incurred any new charges between the two assessments). For cases in which the hard copy PCL-R was unavailable (n = 58, 63.7%), PCL-R total and factor scores were estimated from percentile rankings reported within the psychological risk assessment reports.
Data Analysis
Descriptive statistics were computed for the total/factor scores of the ODARA, DVRAG, and PCL-R, with intercorrelations being calculated using Spearman’s rank-order correlation (r_s). To compare norms of the ODARA and DVRAG from the current federally incarcerated sample to those previously reported by Hilton and colleagues (2008, 2010), we calculated the magnitude of the difference using Cohen’s d. For measures with multiple administrations, inter-rater reliability was examined by calculating ICCs using a two-way mixed-effects model for a single rater, absolute agreement (McGraw & Wong, 1996).
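As an illustration of the single-rater, absolute-agreement ICC, the sketch below computes the coefficient from the two-way ANOVA mean squares: (MS_rows − MS_error) / (MS_rows + (k − 1)MS_error + (k/n)(MS_columns − MS_error)), with n subjects and k raters. The function name and the ratings are hypothetical, and this is a minimal sketch under those assumptions rather than the software routine used for the reported values.

```python
def icc_absolute_single(ratings):
    """Single-rater, absolute-agreement ICC from a subjects x raters table."""
    n = len(ratings)        # number of subjects (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error))


# Hypothetical scores: the second rater is always exactly one point higher.
offset_ratings = [[1, 2], [2, 3], [3, 4]]
print(icc_absolute_single(offset_ratings))
```

In this hypothetical example the two raters rank the cases identically, yet the coefficient is about .67 rather than 1.00, because absolute agreement penalizes a systematic difference between raters.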
Predictive validity was primarily determined by calculating the AUC of the receiver operating characteristic (ROC) curve (Hanley & McNeil, 1982). The AUC value represents the probability that a randomly selected recidivist scores higher on a risk assessment measure than a randomly selected non-recidivist. According to Rice and Harris (2005), AUC values of .56, .64, and .71 are representative of small, medium, and large effect sizes, respectively. AUC values were calculated using the R package “pROC” (Robin et al., 2011). Predictive validity over time was examined using cumulative/dynamic time-dependent AUC with nearest neighbor estimator (AUC_t^{C/D}; see Heagerty et al., 2000) using the R package “survivalROC” (Heagerty & Saha-Chaudhuri, 2013). Using elements of ROC and survival analysis, the time-dependent AUC represents the area under the time-dependent ROC curve and is defined as the probability that a risk score measured on a random case (e.g., an individual who has violently reoffended) exceeds that for a random control (e.g., an individual who has not violently reoffended) at time t, with t representing a fixed point in time. The dynamic nature of the AUC_t^{C/D} relates to the calculation of specificity such that when T_i (i.e., the survival time for subject i) is greater than t for a case, the case will serve as a control (i.e., an individual who violently reoffends during the follow-up period but has not been violent by time t is counted as non-violent at time t). Once t ≥ T_i, however, the individual is subsequently classified as a case (i.e., the individual is counted as having violently reoffended). The cumulative aspect stems from the base rate increasing over time as prior cases are retained in the calculation of sensitivity.
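The two definitions above can be made concrete with a short sketch. The first function computes the AUC directly as the proportion of recidivist/non-recidivist pairs in which the recidivist scores higher (ties counted as one half). The second applies the cumulative/dynamic case and control definitions at a fixed time t. It is deliberately naive: individuals censored before t are simply dropped, whereas the nearest neighbor estimator implemented in “survivalROC” accounts for censoring. All function names and data are hypothetical.

```python
def auc(case_scores, control_scores):
    """P(random case scores higher than random control); ties count 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def naive_cd_auc(scores, times, events, t):
    """Cumulative/dynamic AUC at time t, ignoring censoring before t.

    Cases: event observed with survival time <= t (cumulative).
    Controls: survival time > t, even if an event occurs later (dynamic).
    """
    cases = [s for s, ti, e in zip(scores, times, events) if e and ti <= t]
    controls = [s for s, ti in zip(scores, times) if ti > t]
    return auc(cases, controls)


# Hypothetical risk scores, days at risk, and recidivism indicators.
scores = [8, 5, 9, 3, 6]
times = [200, 900, 400, 1500, 1200]
events = [1, 0, 1, 0, 1]
print(naive_cd_auc(scores, times, events, t=365))   # 0.75
print(naive_cd_auc(scores, times, events, t=730))   # 1.0
```

In the hypothetical data, the individual who reoffends at day 1,200 serves as a control at both time points, which is the "dynamic" feature described above.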
To examine the incremental validity of the PCL-R with the ODARA and to examine predictive validity while controlling for treatment exposure, time-to-event analyses were also conducted using penalized Cox proportional hazards survival analysis to account for the smaller sample size and low base rates, thus reducing the potential bias in estimating the hazard ratio (Heinze & Schemper, 2001). These analyses were conducted using the R package “coxphf” (Heinze et al., 2023). Due to the legal nature of the sample and to maintain confidentiality, data for the current study are not publicly available.
Results
Program Information, Recidivism Rates, and Prevalence of Psychopathy
Over 80% of the sample had committed violence against an intimate partner prior to their IPV index offense and had evidence of prior violence against a victim who was not their current/previous intimate partner or their child (81.9% for each), suggesting that less than a quarter of the sample were exclusively violent toward their intimate partners. Thirty-five percent (n = 33) of the sample were referred to the high intensity program, with the remaining 64.9% (n = 61) being referred to the moderate program (those deemed low risk were referred to the Family Violence Awareness Program). Everyone within the current sample was referred for treatment, with 89 entering the treatment program. Treatment exposure ranged from 2 to 183 days (M = 83.10, SD = 42.63). Sixty of the participants successfully completed treatment, four attended all sessions, 10 did not complete the program, and 15 were suspended. Among the remaining five participants who did not enter into the FVPP, their absence was the result of being referred to another program (e.g., anger and emotions, sexual offender programming, or violence prevention programming), unavailability of the FVPP program due to wait-listing and time constraints, or being evaluated by the program coordinators as not meeting program criteria (i.e., denial of prior IPV offenses). The average time following release was 43.62 months (SD = 15.03, range = 1-86 months), and base rates for the sample were as follows: 23.4% for violent recidivism (n = 22) and 12.8% for IPV recidivism (n = 12). 3 For current status of the participants at follow-up, nine were under active community supervision, one had died within the community, 12 were re-incarcerated, and 72 had completed their federal sentence. Finally, in terms of the prevalence of psychopathy, only 8.5% of the sample (n = 8; three cases had missing information) had a PCL-R score ≥30.
Risk Scores/Categories and Inter-Rater Reliability
Among the 94 cases with sufficient information to score the ODARA, 36 (38.3%) had one missing item and two single cases had two and three missing items, respectively. Item 7, victim concern about future assaults, was the most problematic as 38 of the cases could not be scored. Prorating of the ODARA using risk categories was not feasible as many of the cases with missing items (n = 27, 71%) either met or exceeded the highest risk category (i.e., 7+) prior to prorating. In addition, only a single case with more than one missing item fell below the 7+ category (i.e., their score was equal to 6). As such, all analyses with the ODARA used raw scores with missing information being scored as 0. Despite the presence of missing information, 73 (77.7%) of the participants within the current sample had ODARA scores which fell within the highest risk category (i.e., 7-13). The overall mean score for the sample on the ODARA was 8.06 (SD = 2.05, range 2-12) and was higher than the mean of the incarcerated sample reported by Hilton et al. (2010; M = 5.81, SD = 2.06, N = 150), with the difference being large in magnitude (d = 1.09, 95% CI [0.82, 1.37]). Whereas prorating was not feasible with the ODARA, all completed DVRAGs with missing information were prorated. The overall mean score on the DVRAG was 25.59 (SD = 9.13, range −7 to +40) which was higher than the mean of the development sample (M = 2.88, SD = 7.76, N = 649; Hilton et al., 2008), with the difference being large in magnitude (d = 2.86, 95% CI [2.60, 3.12]). The mean DVRAG risk category was 6.53 (SD = 0.89, range 2-7) with 28.7% (n = 27) and 66.0% (n = 62) falling into the second highest (i.e., risk category 6) and highest (i.e., risk category 7) risk categories, respectively.
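The standardized mean differences reported above can be checked from the summary statistics alone. The sketch below assumes a pooled standard deviation and a common large-sample approximation for the standard error of d; the text does not state which formulas were used, so these are assumptions, and the function name is hypothetical. Under these assumptions, the output rounds to the reported values.

```python
from math import sqrt


def cohens_d_with_ci(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Cohen's d (pooled SD) with an approximate 95% confidence interval."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    # Large-sample standard error of d (an assumed approximation).
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, d - z * se, d + z * se


# ODARA: current sample vs. the incarcerated sample of Hilton et al. (2010).
print([round(v, 2) for v in cohens_d_with_ci(8.06, 2.05, 94, 5.81, 2.06, 150)])
# DVRAG: current sample vs. the development sample of Hilton et al. (2008).
print([round(v, 2) for v in cohens_d_with_ci(25.59, 9.13, 94, 2.88, 7.76, 649)])
```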
Inter-rater reliability between the first and second author was examined for the ODARA total score and ratings on the DVRAG trichotomous items for a random sample of 10 cases. To ensure unbiased scoring, the second author was blind to the ODARA and DVRAG scores (as scored by the first author) and recidivism data. ICC values fell within the excellent range for both the ODARA (ICC = .90) and DVRAG trichotomous items (ICC = .81). Independent rater scores were also available for the PCL-R among the larger sample of cases (n = 146) that had psychological risk assessments on file. High levels of inter-rater reliability were observed for the PCL-R (total score [n = 16, ICC = .91]), with similar results obtained for Factor 1 (n = 13, ICC = .69) and Factor 2 (n = 13, ICC = .97). Although the ICC for Factor 1 fell just below .70, it was within the “good” range (i.e., ICC of .60-.74; Fleiss, 1981).
Intercorrelations and Predictive Validity
As would be expected, the DVRAG displayed stronger and more significant associations with the PCL-R relative to the ODARA (see Table S1). According to predictive validity analyses, the DVRAG and PCL-R total and Factor 1 scores significantly predicted violent recidivism among the sample, with medium to large AUC values being observed (Table 1). Similar results were found for IPV recidivism; however, the ODARA also produced a statistically significant large effect (AUC = .71). Again, Factor 1 of the PCL-R and the DVRAG total score produced statistically significant moderate to large effects, respectively. 4 With respect to the cumulative/dynamic time-dependent AUC analysis, the AUC_t^{C/D} values for the various fixed time-points of 1, 2, 3, and 4 years (expressed as days; i.e., t = 365, 730, 1,095, and 1,460 days, respectively) appeared relatively consistent in magnitude across the follow-up period for both recidivism outcomes, with some exceptions. For instance, the ODARA and DVRAG displayed lower predictive accuracy within the first year following release relative to years 2, 3, and 4, which is consistent with the purpose of the measures in predicting longer term recidivism. Factor 2 of the PCL-R was a poor predictor of either recidivism outcome across all time-points, whereas Factor 1 was a consistently strong predictor of IPV recidivism and, to a lesser extent, violent recidivism across time. These results were consistent with the standard AUC analysis.
Table 1. Predictive Validity Analyses for Violent and IPV Recidivism
Note. Base rates are in parentheses. ODARA = Ontario Domestic Assault Risk Assessment; DVRAG = Domestic Violence Risk Appraisal Guide; PCL-R = Psychopathy Checklist–Revised; AUC = area under the curve of the receiver operating characteristic; CI = confidence intervals of the AUC; IPV = intimate partner violence.
Base rate for violent and IPV recidivism = 24.2% and 13.2%, respectively. bBase rate for violent and IPV recidivism = 25.0% and 13.6%, respectively. cBase rate for violent and IPV recidivism = 24.7% and 13.5%, respectively.
p < .05. **p < .01. ***p < .001 (two-tailed test).
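For readers less familiar with the AUC, it can be read as the probability that a randomly selected recidivist scored higher on the measure than a randomly selected non-recidivist, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the example scores are hypothetical and are not drawn from the study data, and the sketch does not reproduce the confidence intervals or significance tests reported in Table 1.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores:   risk scores, one per person
    outcomes: 1 if the person recidivated, 0 otherwise
    """
    recidivists = [s for s, y in zip(scores, outcomes) if y == 1]
    non_recidivists = [s for s, y in zip(scores, outcomes) if y == 0]
    if not recidivists or not non_recidivists:
        raise ValueError("need at least one recidivist and one non-recidivist")

    concordant = 0.0
    for r in recidivists:
        for n in non_recidivists:
            if r > n:
                concordant += 1.0   # recidivist ranked higher
            elif r == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(recidivists) * len(non_recidivists))


# Hypothetical scores for six people, two of whom recidivated.
example_scores = [7, 3, 5, 2, 6, 4]
example_outcomes = [1, 0, 1, 0, 0, 0]
print(auc(example_scores, example_outcomes))
```

An AUC of .50 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.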
Incremental Contributions of PCL-R Total and Factor Scores
To examine whether the PCL-R total and factor scores make an incremental contribution to the DVRAG in predicting violent and IPV recidivism, we employed a series of penalized Cox regression analyses (Table 2) using the DVRAG with the PCL-R score removed (referred to in the analysis as the weighted ODARA or ODARAw). Although a significant effect was observed for the ODARAw in predicting violent and IPV recidivism in Step 1, the addition of the PCL-R total score at Step 2 did not add incrementally for either outcome (violent recidivism Δχ² [1] = 3.10, p = .078; IPV recidivism Δχ² [1] = 1.92, p = .166). With respect to the PCL-R factor scores and violent recidivism, the presence of interpersonal and affective traits of psychopathy (Factor 1) added significant incremental validity at Step 2 (Δχ² [1] = 6.25, p = .012). However, the addition of Factor 2 at Step 3 failed to add any further incremental validity (Δχ² [1] = 1.13, p = .287). A similar trend was observed for IPV recidivism, with Factor 1 of the PCL-R adding significant incremental validity to the ODARAw at Step 2 (Δχ² [1] = 4.39, p = .036) and Factor 2 failing to provide any further incremental validity at Step 3 (Δχ² [1] = 0.30, p = .585).
Table 2. Penalized Cox Proportional Hazards Model for Violent and IPV Recidivism With ODARA and PCL-R Total and Factor Scores (n = 88)
Note. Base rates are in parentheses. ODARA = Ontario Domestic Assault Risk Assessment; PCL-R = Psychopathy Checklist–Revised; ODARAw = weighted ODARA; B = regression coefficient; SE = standard error of B; HR = hazard ratio; CIHR = confidence interval of the HR; IPV = intimate partner violence.
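The step-wise Δχ² values reported for the incremental validity analyses each have 1 degree of freedom, so their p values can be checked with a short calculation. The sketch below assumes each Δχ² is referred to a chi-square distribution with 1 df, for which the upper-tail probability equals erfc(√(x/2)). The function name is hypothetical.

```python
import math


def p_value_chi2_df1(delta_chi2):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(delta_chi2 / 2.0))


# Delta chi-square values as reported in the text (all with df = 1).
reported = {
    "PCL-R total, violent": 3.10,
    "PCL-R total, IPV": 1.92,
    "Factor 1, violent": 6.25,
    "Factor 2, violent": 1.13,
    "Factor 1, IPV": 4.39,
    "Factor 2, IPV": 0.30,
}
for label, value in reported.items():
    print(f"{label}: chi2 = {value:.2f}, p = {p_value_chi2_df1(value):.3f}")
```

Because the published Δχ² values are rounded to two decimals, recomputed p values may differ from the reported ones in the third decimal place.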
Impact of Treatment Exposure on Predictive Validity
Because a high percentage of the sample had been exposed to some degree of treatment (94.7%), we examined whether treatment exposure (i.e., the number of days in program) impacted the predictive validity of the risk assessment measures for violent and IPV recidivism (see Table 3). Similar to the AUC analyses, the ODARA, DVRAG, and Factor 1 of the PCL-R remained significant predictors of IPV recidivism, whereas only the DVRAG and Factor 1 of the PCL-R significantly predicted violent recidivism after controlling for treatment exposure.
Table 3. Penalized Cox Proportional Hazards Model for Recidivism Outcomes Controlling for Treatment Exposure
Note. ODARA = Ontario Domestic Assault Risk Assessment; DVRAG = Domestic Violence Risk Appraisal Guide; PCL-R = Psychopathy Checklist–Revised; B = regression coefficient; SE = standard error of B; HR = hazard ratio; CIHR = confidence interval of the HR; IPV = intimate partner violence.
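As a reading aid for the B, SE, and HR columns defined in the note above, the sketch below shows how a hazard ratio and an approximate 95% confidence interval follow from a Cox regression coefficient. It assumes a Wald-type interval, exp(B ± 1.96 × SE). Penalized models often report profile-likelihood intervals instead, so the tabled intervals need not match this approximation. The function name and the coefficient values are hypothetical and are not taken from Table 3.

```python
import math


def hazard_ratio_ci(b, se, z=1.959964):
    """Hazard ratio and Wald-type confidence interval from a Cox coefficient.

    b:  regression coefficient (log hazard ratio per one-unit increase)
    se: standard error of b
    z:  normal quantile (default gives an approximate 95% interval)
    Returns (hazard ratio, lower bound, upper bound).
    """
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Hypothetical coefficient for a one-point increase on a risk measure.
hr, lower, upper = hazard_ratio_ci(b=0.25, se=0.10)
print(f"HR = {hr:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes 1.0 corresponds to a coefficient that differs significantly from zero under the same Wald approximation.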
Discussion
Consistent with our hypotheses, we found the ODARA and DVRAG to be significant predictors of IPV recidivism, yielding a large overall AUC value of .71, with moderate to large time-dependent AUCs evident across the first 4 years of the follow-up period. Despite prior research comparing the ODARA with other IPV instruments among similarly high-risk samples (e.g., Pham et al., 2023; Rettenberger & Eher, 2013), the current study constitutes one of the highest risk samples to date in which the ODARA and DVRAG have been validated. A potential consequence of this concentration of high-risk scores was increased homogeneity in risk and, therefore, less potential variance to detect statistical relationships among variables. For example, in their independent cross-validation of the ODARA and DVRAG, Rettenberger and Eher (2013) examined predictive validity using the risk category scores whereas the current study relied on total scores as most of the sample fell within the highest risk categories of either measure. Although we were unable to examine calibration of the risk categories as a result, similar yet slightly divergent findings emerged within the current study with respect to the inter-rater reliability and predictive validity of the ODARA and DVRAG. Furthermore, unlike prior studies that have reported greater inter-rater reliability (Hilton, 2021), our evaluation of inter-rater agreement between total scores (due to the lack of variability among the risk categories) resulted in a more stringent analysis. Nevertheless, our results suggest that the current risk categories of the ODARA and DVRAG may not be generalizable to higher risk samples and that efforts to adapt or recalibrate them for incarcerated populations will be necessary given their intended use in communicating risk within clinical contexts (e.g., see Hilton, 2014).
As the ODARA was a non-significant predictor of violent recidivism, the large and statistically significant AUC value produced by the DVRAG appeared attributable to the incorporation of the PCL-R, which itself was a significant predictor of violent recidivism. That said, the PCL-R total score failed to predict IPV recidivism or add incrementally to the ODARA for either outcome, which supports earlier findings of Rettenberger and Eher (2013). Unlike prior research findings regarding the predictive validity of Factor 2 (e.g., DeMatteo & Olver, 2022), our results revealed that the core psychopathic features as represented by Factor 1 were a moderate predictor of violent and IPV recidivism whereas Factor 2 was unrelated to either outcome. A possible explanation is that individuals scoring high on Factor 1 are likely to exhibit traits that may assist in acquiring new intimate partners at a faster rate (e.g., superficial charm, grandiosity, and manipulation), thus granting them greater access to potential victims. Another possible explanation includes the prospective scoring of the PCL-R as part of routine psychological risk assessments, which may have resulted in greater clinical accuracy in assessing the interpersonal and affective characteristics associated with Factor 1. Furthermore, the higher risk nature of the current sample may have restricted the range of scores on Factor 2, thereby allowing the variance associated with Factor 1 to add incrementally to the prediction of recidivism.
Overall, our results are consistent with those of other researchers who found that Factor 1 affective traits added incrementally to the prediction of IPV (e.g., prior IPV convictions; Cunha et al., 2021). Future research with perpetrators of IPV could examine whether those scoring higher on Factor 1 (interpersonal and affective traits) in conjunction with two PCL-R items that do not load on either factor, promiscuous sexual behavior and many short-term marital relationships, engage in a greater number of intimate relationships and are therefore at increased risk for perpetrating IPV. Likewise, this combination of socially predatory traits with greater frequency of intimate relationships (i.e., potential victim pool) may also contribute to an increased risk for repeated perpetration of IPV (i.e., recidivism) across multiple victims. Factor 1, which is described as “selfish, callous, and remorseless use of others” (Hare, 2003, p. 79), combined with conning and manipulative behaviors, tends to align with descriptions of coercive control central to explanations of IPV (Hamberger et al., 2017). Furthermore, despite our efforts to collect them whenever available, missing hard copy protocols required us to derive total and factor scores for the PCL-R from percentile rankings for the majority of the sample, thus preventing us from examining facet scores. Although Hilton (2021) has provided modified scoring for the DVRAG, allowing for the PCL-R total score to be replaced with the Antisocial facet (Facet 4), these modifications appear to be based on the original development samples. Given our results, future research could focus on the relative contributions of the remaining facets across samples to determine whether supplementing Facet 1 or 2 optimizes predictive validity of the DVRAG.
Due to our study taking place within an institutional setting, an applied issue related to the administration and scoring of the ODARA and DVRAG became evident. Specifically, as the ODARA was developed for frontline personnel tasked with estimating risk of male-to-female IPV based upon easily accessible and coded information, its structure appears to lend itself well to this task (Jung & Himmen, 2022). In contrast, through its incorporation of the PCL-R, the DVRAG was developed as an “in-depth” risk assessment (Hilton, 2021) to be administered by forensic clinicians following the frontline intervention and disposition of a case. Whereas specific victim and index-related information may be readily available to frontline police personnel, this is not necessarily the case for mental health professionals working within an institutional setting. As victim concern could not be rated for 40.4% of the sample, the item posed a particular challenge when scored using file information despite the comprehensiveness of the correctional files which included police and court documents, with victim impact statements when available. Furthermore, only a small percentage of the current sample (i.e., 14.9% or n = 14) retained their intimate relationship with the victim of the IPV index offense at the time of admission to federal custody. For clinicians working within an institutional setting, the farther temporally one is from the incident, the more difficult it will be to determine victim concern as this information may not be readily available and the likelihood of interviewing the victim would be low. Indeed, the amount of elapsed time between the IPV index offense and admission to federal custody for over half the current sample was, on average, over 3 years. Further confounding this issue was the presence of three cases wherein the victim was killed. 
This has important clinical implications given the relevance of victim vulnerability factors when assessing risk for IPV (Kropp, 2019). Although removing victim concern had very little impact on the predictive validity of the DVRAG, we recommend prorating the total score as outlined by Hilton (2021) in the event of missing information. Nevertheless, exploring potential behavioral indicators representative of victim concern may be a worthwhile endeavor for future research.
One of the primary limitations of the study is the retrospective scoring of the ODARA and DVRAG, as remaining blind to other risk assessment measures and the presence of reoffending was not feasible for some cases (e.g., those who were incarcerated due to a new offense that occurred following their IPV index). While this may have had a biasing effect on the predictive validity of the two measures, such effects were tempered by examining recidivism that occurred following release from federal custody, the high degree of inter-rater reliability, strict adherence to scoring procedures, and the relatively static nature of the items and their focus on prior offending behavior and behavior exhibited during the IPV index offense. As noted previously, fewer than half (i.e., 40.4%) of the sample were serving their sentence due to an IPV-related index offense and a number of participants had reoffended violently prior to their current incarceration. However, as outlined within the ODARA and DVRAG coding procedures, events such as these were not coded as they did not occur prior to the IPV index-related offense (i.e., the most recent incident of IPV). As a result of the item coding timeframe being bound to the most recent incident of IPV, four cases were rated as having no pre-index nondomestic assaults despite serving their current federal sentence for a violent offense. Whereas both measures predicted IPV recidivism in spite of this limitation, it will be important for researchers to examine whether removing/replacing problematic items or modifying existing scoring protocols (e.g., accounting for nondomestic violent incidents occurring after the IPV index offense) is required when applying the DVRAG to incarcerated samples.
Although the content contained within the federal correctional files was considered extensive and, at times, exhaustive, detailed information pertaining to prior offenses or provincial sentences was not consistently available for a small number of cases and, similar to Hilton et al.’s (2010) study, victim interviews were rare. However, unlike their study, identification of prior domestic assaults was readily available through the family violence assessments which utilized information stemming from criminal records, police reports, and self-report from the perpetrator or a family member. Information obtained from self-reports was considered integral to the coding procedure for a small number of cases as the historical police reports had been expunged by the investigating police department prior to the individual entering federal custody. In addition, coding of IPV recidivism was problematic for five cases within the current sample as the relationship between the victim and perpetrator could not be ascertained. However, the large AUC values produced by the ODARA and DVRAG suggest that, contrary to results found by Hilton and Harris (2009), ambiguity did not have a detrimental impact on the accuracy of the two measures in predicting IPV recidivism. Nevertheless, use of multiple sources for outcome coding may assist in reducing ambiguity in identifying IPV recidivism.
Possibly as a result of the smaller sample size and shorter follow-up period (i.e., 3.6 years), our base rate for IPV recidivism (i.e., 12.8%) was low in comparison to previous studies (e.g., 44.7% for Hegel, Pelletier, and Olver [2022]; 37% for Pham et al. [2023]; 21.2% for Rettenberger and Eher [2013]). Although this did not appear to be a limiting factor to our study in light of the AUC values and application of penalized Cox regression analyses, caution may still be warranted in interpreting the results of the predictive validity analyses for the IPV recidivism outcomes. Furthermore, the current study did not account for time under federal supervision and it is not uncommon for individuals with a history of IPV to have special conditions imposed by the Parole Board of Canada wherein their intimate relationships must be reported and monitored accordingly. As this may have impacted the observed recidivism rates, a longer follow-up period would have increased the likelihood of the participants surpassing their warrant expiry date resulting in the subsequent expiration of their parole conditions. Future research could extend the length of follow-up and statistically control for time under supervision when greater oversight of intimate/interpersonal relationships is occurring.
Further limiting the generalizability of our findings was the lack of diversity within the current sample. For instance, the majority of our sample were heterosexual perpetrators of male-to-female IPV. This is particularly important as there is growing recognition of increased rates of IPV among lesbian, gay, and bisexual individuals (Rollè et al., 2018). Moreover, as the composition of our sample was predominantly European Canadian/White, our findings regarding the predictive validity of the measures under investigation may not be generalizable to other racial/ethnic groups. This has important clinical implications as racial and ethnic minorities are overrepresented within correctional systems and there is an increasing emphasis being placed on the examination of cultural bias and cross-cultural validity within the risk assessment field (e.g., measurement and predictive invariance; Olver et al., 2024).
Despite the above limitations, several methodological strengths are worth noting. First, the current study is unique in that it provides a prospective examination of the field validity of PCL-R ratings made by mental health professionals for risk assessment purposes, and with a small number of independent ratings available for the PCL-R, this also allowed us to test its field reliability to an extent. Second, as the majority of the sample had been exposed to the FVPP, we were able to statistically account for this and examine the impact, or lack thereof, of treatment dosage on predictive validity. Finally, our use of the cumulative/dynamic time-dependent AUC at fixed time-points represents the application of a novel statistical technique not commonly reported in violence risk assessment research. Unlike other time-dependent measures of discrimination such as Harrell’s concordance index, time-dependent AUCs are the recommended approach when examining predictive validity at fixed or t-year time-points (Blanche et al., 2019).
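To make the cumulative/dynamic definition concrete, the following is a simplified sketch rather than the estimator used in the study. At a fixed time-point t, cases are those who recidivated on or before t and controls are those still event-free and under follow-up after t. The sketch handles censoring naively by excluding anyone censored before t. Published estimators instead reweight for censoring (e.g., by inverse probability of censoring weighting), so values from this sketch are only illustrative. The function name and the example data are hypothetical.

```python
def cd_auc_naive(scores, times, events, t):
    """Naive cumulative/dynamic AUC at a fixed time-point t (in days).

    scores: risk scores, one per person
    times:  days to recidivism or to end of follow-up
    events: 1 if recidivism was observed at that time, 0 if censored
    Cases:    recidivated on or before t.
    Controls: still event-free and followed beyond t.
    People censored before t are dropped (no censoring weights are applied).
    """
    cases = [s for s, u, e in zip(scores, times, events) if e == 1 and u <= t]
    controls = [s for s, u in zip(scores, times) if u > t]
    if not cases or not controls:
        raise ValueError("no cases or no controls at this time-point")

    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0   # case scored higher than control
            elif c == k:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))


# Hypothetical data for five people (not study data).
example_scores = [3, 5, 1, 4, 2]
example_times = [200, 500, 300, 1600, 1600]
example_events = [1, 1, 0, 0, 0]
for day in (365, 730):
    print(day, cd_auc_naive(example_scores, example_times, example_events, day))
```

Because case and control membership is redefined at each time-point, the value can rise or fall across t even when the underlying scores do not change.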
In summary, despite the relatively small sample size, low base rate for IPV recidivism, and level of treatment exposure among the participants, our results support the reliability and predictive validity of the ODARA and DVRAG in an incarcerated group of high-risk IPV perpetrators. Although the PCL-R total score was not a consistent predictor of outcome, contrary to expectations, Factor 1 was found to be a significant predictor of violent and IPV recidivism. Despite our study lending some support to Hilton et al.’s (2008) rationale for incorporating the PCL-R into the DVRAG, while also adding to the relatively modest literature on the relationship between psychopathy and IPV, more research will be necessary to confirm the contribution of Factor 1 to IPV perpetration and recidivism. We recommend that such research be conducted using a fully prospective study design that incorporates a longer follow-up period, multiple measures scored by a single rater, and multiple sources for outcome coding to reduce ambiguity in identifying IPV recidivism. This would serve to increase the generalizability of the findings while also updating and improving upon the existing research base for IPV risk assessment.
Supplemental Material
Supplemental material, sj-docx-1-cjb-10.1177_00938548241280397 for Risk for Violent and IPV Recidivism Among Incarcerated Men With a History of IPV Perpetration: An Examination of the Predictive Validity of the ODARA, DVRAG, and PCL-R by Andrew L. Gray, Jeremy F. Mills and Adelle E. Forth in Criminal Justice and Behavior
Footnotes
Authors’ Note:
This research was supported by the Ontario Graduate Scholarship program (awarded to Andrew L. Gray) and is partially based on the thesis completed by Andrew (Gray, 2012) while under the supervision and co-supervision of Jeremy F. Mills and Adelle E. Forth, respectively. Portions of these findings were presented as a poster at the Third North American Correctional and Criminal Justice Psychology Conference, Ottawa, Ontario, Canada. Adelle E. Forth has conducted paid training on the administration of the Psychopathy Checklist–Revised (Hare, 2003). The authors wish to thank the Correctional Service of Canada for supporting this research and have no other known conflicts of interest to disclose. Due to the legal nature of the sample and to maintain confidentiality, data for the current study are not publicly available. ANDREW L. GRAY is also affiliated with Carleton University.