Abstract
This study examined the discrimination and calibration properties of Violence Risk Appraisal Guide–Revised (VRAG-R) scores and risk bins in a sample of 124 forensic psychiatric inpatients found Not Criminally Responsible on Account of Mental Disorder (NCRMD) or Unfit to Stand Trial (UST), admitted under Saskatchewan Review Board jurisdiction during a 38-year catchment period. The VRAG-R was scored from file for all NCR and UST individuals, and recidivism data were obtained via official criminal records. The VRAG-R demonstrated strong discrimination properties for general and violent recidivism over 3-year, 5-year, and overall follow-ups. Calibration analyses revealed that VRAG-R recidivism norms tended to estimate higher rates of recidivism than were actually observed. Overall, the results supported the use of the VRAG-R for psychological risk assessments with a population of persons found NCRMD; however, risk assessment should be a multi-measure process that involves other empirically validated tools.
When individuals with mental health diagnoses commit criminal acts, they are usually processed through the traditional channels in the legal system. Nevertheless, in Canada, persons can be declared Not Criminally Responsible on Account of Mental Disorder (NCRMD) when the courts determine that they lacked criminal intent. NCRMD verdicts are rare, representing about 0.1% of criminal court cases on an annual basis (Statistics Canada, 2014). They are uncommon because such a determination requires that the individual undergo comprehensive clinical assessment and meet strict legal criteria. In addition, the accused or counsel may decide that raising concerns of mental illness during criminal proceedings is not in their best interests (Campbell & Robertson, 2025), as individuals found NCRMD experience higher levels of restriction than those convicted in the criminal justice system. For instance, a large-scale Canadian trajectory project found that individuals declared NCRMD had higher rates of detention, lower rates of community release, and lengthier periods of supervision compared to their criminally sentenced correctional counterparts (Martin et al., 2022). Consequently, very few accused individuals raise the issue of mental illness and/or meet the legal threshold in Canada. These individuals can be found NCRMD, or they can be found Unfit to Stand Trial (UST; Latimer & Lawrence, 2006; Martin et al., 2022).
Criminal Responsibility and Fitness Legislation in Canada: A Brief Overview
If the NCRMD defense is successfully invoked, it results in a unique designation in which the individual is found neither guilty nor innocent. Instead, they are transferred to a forensic psychiatric facility post-verdict and then presented annually before a Review Board (RB), which must review the individual’s treatment progress and risk for future violence and make a decision regarding continued detention or some variation on discharge (Latimer, 2006). An RB has the challenging task of deciding upon one of three possible dispositions for individuals found NCRMD: (1) detention in a hospital; (2) conditional discharge from the hospital; or (3) absolute discharge. An essential part of this process is ensuring that the disposition will be appropriate for each individual NCRMD case (Criminal Code, 1985, s. 16; Latimer, 2006).
It is also possible that an accused is found unable, on account of mental disorder, to conduct a defense at any stage of the proceedings, resulting in a UST finding. Specifically, the accused cannot: (1) understand the nature of proceedings; (2) understand the possible consequences of proceedings; or (3) communicate with counsel. An accused found UST by a court is also diverted to an RB. The courts and RBs can order a conditional discharge or detention order; and the courts, in particular, have the authority to order a stay of proceedings if: (1) it is unlikely that the accused will ever be fit; (2) the accused does not pose a significant threat to the safety of the public; and (3) proper administration of justice supports a stay of proceedings. A UST accused is under the purview of the RB until they are deemed fit, or until the charges are stayed or withdrawn (Criminal Code, 1985, s. 2 and Part XX.1; Latimer, 2006).
The Canadian insanity defense, referred to as NCRMD in Section 16 of the Criminal Code (1985), is used as a legal defense when an accused person, at the time of committing a criminal act, was suffering from a disease of the mind that either (a) impaired their capacity to understand what they were doing at that moment, or (b) impaired their ability to recognize that the action was wrong (Criminal Code, 1985, s. 16(1)). Section 672.38 of the Criminal Code (1985) outlines the authority of an RB to review dispositions concerning any accused individual who has received an NCRMD verdict (Haag et al., 2021), in a manner that recognizes both the person’s right to freedom and the safety of the public (Latimer, 2006). Potential dispositions can be an absolute discharge (i.e., accused is no longer under RB jurisdiction and has no restrictions on their liberty), conditional discharge (i.e., accused is supervised in the community with restrictions under RB jurisdiction), or detention (i.e., accused is detained within the hospital for treatment and stabilization) (Latimer, 2006; Wirove et al., 2023).
The Processing and Trajectory of NCRMD and UST Cases in Canada
The National Trajectory Project (NTP) examined the current criminal justice provisions for people declared NCRMD by the courts, who are presented before a provincial or territorial review board and subject to their jurisdiction. The project also elucidated the criminological and clinical profiles of this population in Canada. The sample consisted of 1,800 men (n = 1,519; 84.4%) and women (n = 280; 15.6%) found NCRMD in the three most-populated Canadian provinces, British Columbia (n = 222), Ontario (n = 484), and Quebec (n = 1,094), between May 2000 and April 2005, and followed until December 2008 (Crocker et al., 2015). About one-half (49.2%) of the individuals had no prior contact with the criminal justice system, about one-third (30.9%) had been convicted for an offense against the person, and fewer than half (41.8%) for other offenses. Of all NCRMD index verdicts, offenses against the person accounted for 64.9%, property offenses for 16.9%, and other Criminal Code violations for 18.2%; up to one-half of all NCRMD verdicts were for minor assaults, property offenses, or other nonviolent violations. People found NCRMD for severe offenses (e.g., homicide) had lower rates of recidivism, while nonviolent index verdicts and offense history predicted future offenses (Crocker et al., 2015).
In 2014, Bill C-14, the Not Criminally Responsible Reform Act (NCR Reform Act; Bill C-14, 2014), was brought into effect by the Canadian government and resulted in several amendments to the Criminal Code. These included making public safety the most important consideration in the decision-making process for accused people found NCRMD; the creation of a high-risk designation for NCRMD accused; and victim involvement in the decision-making process regarding people found NCRMD (Goossens et al., 2019). Individuals who are given the High-Risk Accused (HRA) designation face new restrictions for transitioning through the forensic system, but little evidence supported the creation of an HRA designation. For instance, using the NTP sample, Goossens et al. (2019) found that an HRA designation could apply to up to one in four individuals found NCRMD. The HRA group was under the supervision of the RB for a longer amount of time than the non-HRA group, although rates of recidivism were essentially the same between groups. Moreover, HRA designation correlated better with index offense severity than it did with public safety risk. The HRA designation thus is an unreliable means of evaluating future violence risk; instead, risk appraisals would be better informed through incorporating validated measures to improve fairness, accuracy, and utility of decision making (Wirove et al., 2023).
Wilson et al. (2015) reviewed 6,743 RB hearings for 1,794 individuals from the NTP sample who were declared NCRMD and examined whether the items from two empirically supported risk assessment measures, the Historical Clinical Risk Management-20 (HCR-20) and the Violence Risk Appraisal Guide (VRAG), were considered in RB decisions. There was considerable similarity between the expert reports and the reasons for decision by the RBs (r = .96); however, fewer than half of the risk factors were included in expert reports or in RBs’ reasons for dispositions, and a complete risk assessment measure was rarely completed (17% of hearings). The information from validated measures can aid clinicians and RBs in making appropriate decisions regarding public safety and treatment planning, and can limit the tendency for clinicians to be influenced by non-risk-relevant factors (e.g., offense severity). Canadian and international NCR cases have demonstrated low rates of general and violent recidivism (Bonta et al., 1998; Charette et al., 2015; Subramaney & Marais, 2015; Wang et al., 2007), and rates lower than those of correctional samples (Fazel et al., 2016; Friendship et al., 1999; Goossens et al., 2019; Grann et al., 2008; Hayes et al., 2014; Norko et al., 2016; Richer et al., 2018; Simpson et al., 2018; Tabita et al., 2012). Given the potential for false positives, it is crucial that these instruments be continually validated for use in RB decision making (Wilson et al., 2015).
Issues, Methods, and Metrics in Forensic Risk Assessment
Risk assessment tools are commonly used by forensic mental health clinicians to assess an individual’s risk for reoffending, and to guide risk management procedures, including RB deliberations. Furthermore, the use of structured risk assessment measures in forensic mental health systems is valuable to prevent human judgment biases, increase the accuracy of decisions, and increase clinical efficacy. They aid with matching the intensity of intervention to the individual’s level of risk and informing treatment focus areas for risk management per the risk and need principles (Andrews & Bonta, 2010). The primary purpose of violence risk assessment is violence prevention, which involves identifying persons at a high risk for future violence to be prioritized for risk management intervention; however, the ability of a risk assessment tool to assist with this prevention process requires good predictive accuracy for the targeted outcome (Hilton et al., 2015; Wirove et al., 2023).
Risk communication metrics, in turn, are utilized to communicate the results of an actuarial risk assessment. Absolute risk estimates the likelihood of recidivism associated with a given score or risk category on an assessment tool (e.g., a score of 5 is associated with a 50% likelihood of recidivism within two years), whereas relative risk represents an individual’s risk in comparison with other cases (Davies et al., 2022). In turn, the major predictive properties of risk tools are discrimination and calibration. Discrimination (i.e., relative risk) refers to the extent to which recidivists can be differentiated from non-recidivists on the basis of test scores. For instance, receiver operating characteristic (ROC) and area under the curve (AUC) analyses calculate the extent to which risk scores, on a given measure, can accurately discern a recidivist from a non-recidivist, and therefore higher risk from lower risk cases. Calibration (i.e., absolute risk) measures how well a prediction of an event matches the true underlying probability of an event (Lindhiem et al., 2020). These metrics establish the rates of recidivism associated with risk scores to assist with correctional decision making, and determine to what extent the recidivism rates from the normative sample generalize to other samples or settings (Wirove et al., 2023).
The Predictive Properties of VRAG-R Scores and Risk Bins
The Violence Risk Appraisal Guide-Revised (VRAG-R) is an actuarial risk assessment tool designed for use with adult correctional and forensic psychiatric populations (Rice et al., 2013). The tool is intended to contribute structured appraisals of violence risk and to aid evidence-informed decision making, such as conditional discharge or release. The VRAG-R was developed to replace its predecessors, the Violence Risk Appraisal Guide (VRAG) and the Sex Offender Risk Appraisal Guide (SORAG), with a single measure that was easier to score and could be used to assess the risk for future violence for both general and sexual offending populations (Davies et al., 2022; Rice et al., 2013). Since Rice et al.’s (2013) construction and validation of the VRAG-R, additional psychometric research has followed, showing support for the instrument’s good to excellent predictive accuracy for violent (AUC = .74, Cortvriendt et al., 2024; AUC = .66, Glover et al., 2017; AUC = .75, Gregório Hertz et al., 2021; AUC = .74, Hogan & Olver, 2019; AUC = .71, Olver & Sewall, 2018) and general recidivism (AUC = .65, Glover et al., 2017; AUC = .78, Gregório Hertz et al., 2021; AUC = .74, Hogan & Olver, 2019).
Wirove et al.’s (2023) sample of the Alberta NCRMD population was much lower risk than Rice et al.’s (2013) construction and validation sample used to develop the VRAG and VRAG-R, and this difference was even more marked for female persons (Harris et al., 1993; Rice et al., 2013). This is consistent with previous research that has found women to have less risk for violence and significantly lower VRAG scores than men (Coid et al., 2009; Hastings et al., 2011; Wirove et al., 2023). The VRAG-R showed strong recidivism discrimination properties, and the rates of violent recidivism increased with VRAG-R scores, in the calibration analysis, for this sample. However, a comparison of violent recidivism rates from the NCR Alberta population, with the VRAG-R normative sample, demonstrated that VRAG-R scores overpredicted future violence overall. This indicates that the VRAG-R is better calibrated for higher risk samples and is possibly less generalizable to other correctional or forensic mental health populations in Canada (Wirove et al., 2023).
Given these issues with calibration, for the Alberta NCR sample, it is necessary to conduct further research with forensic mental health samples, especially NCRMD populations and their female and male subgroups. This would enhance our understanding of the predictive properties of the VRAG-R with women and the efficacy of this tool. This may require constructing local norms for risk assessment measures accessible to forensic mental health professionals to represent more credible depictions of risk. Furthermore, conducting risk assessments should not depend solely upon one single measure to assess violence risk or to determine the dispositions of people found NCRMD. Rather, the use of both dynamic assessment instruments (i.e., assessing variables that have the ability to change throughout the life course) and static measures should be included in the risk assessment process (Wirove et al., 2023).
Current Study and Rationale
The assessment, prediction, and management of risk for future violence are mainstays across courtroom, correctional, criminal justice, and forensic mental health settings, and all are intended to inform strategies and management procedures to prevent future violence (Douglas et al., 2014). The use of violence risk assessment instruments is an unavoidable component of this enterprise, and it is essential that extant tools are continually evaluated and refined (Davies et al., 2022), given the necessity to make safe and humane detention and discharge decisions with NCRMD and UST populations. Further cross-validation of the predictive properties of tools frequently employed in forensic mental health settings, such as the VRAG-R, in terms of discrimination (i.e., the extent to which VRAG-R scores differentiate violent recidivists from non-recidivists) and calibration (i.e., the violent recidivism rates associated with VRAG-R scores and their generalizability to other samples) is necessary to inform applications of the tool in decision making across forensic and correctional settings.
This study intends to examine the extent to which use of a tool, such as the VRAG-R, has the potential to increase the accuracy of violence risk appraisals and inform RB release decisions in a hybrid NCR-Unfit Saskatchewan population. The research objectives are: (1) to examine gender differences on VRAG-R ratings, (2) to determine the discrimination properties of the VRAG-R, for general and violent recidivism, in a Saskatchewan NCR-UST population, and (3) to evaluate the calibration properties of the VRAG-R in the present sample relative to the recidivism norms from Rice et al. (2013) and other comparison samples. The findings from this study will inform whether the VRAG-R normative sample is representative of other correctional or forensic mental health populations in Canada and may aid RBs in making discharge and post-release risk management decisions (Wirove et al., 2023) in Saskatchewan and beyond.
Method
Sample
The current study included 138 identified forensic patients (male: 84.8%, n = 117; female: 15.2%, n = 21) admitted to Saskatchewan Hospital North Battleford (SHNB) who were determined to be either NCRMD (64.5%, 89/138) or UST (35.5%, 49/138) and hospitalized for treatment and stabilization purposes until discharged by the Saskatchewan RB. The patients were, on average, early middle-aged (M = 32.9 years; range = 12.8–74.6 years), and the ethnoracial composition of the sample was 44.2% White (61/138); 42.0% Indigenous (58/138); 7.2% other ethnic minority (10/138); and 6.5% unspecified (9/138). The sample covered a catchment period of all consecutive NCR-Unfit cases spanning 38 years. There was a range of diagnoses present at the time of first hearing, with over two thirds of the sample diagnosed with a psychotic illness or schizophrenia spectrum disorder (71.0%, 98/138), followed by substance use disorder (SUD)/alcohol-related disorder (52.2%, 72/138), personality disorders (44.2%, 61/138), intellectual disability/cognitive impairment (27.5%, 38/138), mood disorders (8.7%, 12/138), and other diagnoses (28.3%, 39/138). There was a range of NCR verdict offenses, the most common being assault (9.4%, 13/138), second degree murder (9.4%, 13/138), uttering threats (5.8%, 8/138), aggravated assault (5.1%, 7/138), arson (5.1%, 7/138), sexual assault (4.3%, 6/138), assault with a weapon (3.6%, 5/138), attempted murder (3.6%, 5/138), first degree murder (2.9%, 4/138), and break and enter (2.9%, 4/138).
Measures
Violence Risk Appraisal Guide–Revised (VRAG-R)
The Violence Risk Appraisal Guide–Revised (VRAG-R) was developed to combine the VRAG and the Sex Offender Risk Appraisal Guide (SORAG) into a single instrument that was easier to score and could be used to assess the risk of future violence for general and sexual offending populations (Davies et al., 2022; Olver & Sewall, 2018). It is designed for use with both correctional and forensic psychiatric populations. The VRAG-R comprises 12 static items: (1) Lived with parents until age 16, weight range −2 to +2; (2) Elementary school maladjustment, weight range −3 to +4; (3) History of alcohol or drug problems, weight range −2 to +4; (4) Marital status at time of index offense, weight range −1 to +1; (5) Criminal nonviolent history score, weight range −3 to +5; (6) Failure on conditional release, weight range −2 to +4; (7) Age at index offense, weight range −7 to +2; (8) Criminal violent history score, weight range −2 to +4; (9) Prior admissions to correctional institutions, weight range −2 to +6; (10) Conduct disorder prior to age 15, weight range −2 to +5; (11) Sex offending history, weight range −2 to +3; and (12) Antisociality facet scores (e.g., poor behavior controls, early behavior problems, criminal versatility) from the Psychopathy Checklist-Revised (PCL-R; Hare, 1991, 2003), weight range −6 to +6. Total VRAG-R scores range from −34 to +46 (Rice et al., 2013) and are assigned to one of nine risk bins, which are arranged into deciles so that each proportion represents one-tenth of the sample (Rice et al., 2013).
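The stated total score range can be checked by summing the item weight ranges listed above. The following is a minimal sketch of that arithmetic only; the dictionary keys are shorthand labels invented here for illustration, and the code is not a scoring routine for the instrument.

```python
# Item weight ranges (minimum, maximum) as listed in the measure description.
# The key names are illustrative shorthand, not official item identifiers.
VRAG_R_ITEM_WEIGHTS = {
    "lived_with_parents_to_16": (-2, 2),
    "elementary_school_maladjustment": (-3, 4),
    "alcohol_or_drug_problems": (-2, 4),
    "marital_status_at_index": (-1, 1),
    "criminal_nonviolent_history": (-3, 5),
    "failure_on_conditional_release": (-2, 4),
    "age_at_index_offense": (-7, 2),
    "criminal_violent_history": (-2, 4),
    "prior_correctional_admissions": (-2, 6),
    "conduct_disorder_before_15": (-2, 5),
    "sex_offending_history": (-2, 3),
    "pclr_antisociality_facet": (-6, 6),
}


def total_score_range(weights):
    """Sum the per-item minimums and maximums to get the possible total range."""
    lowest = sum(low for low, _ in weights.values())
    highest = sum(high for _, high in weights.values())
    return lowest, highest


print(total_score_range(VRAG_R_ITEM_WEIGHTS))  # (-34, 46)
```

The summed minimums and maximums agree with the reported total score range of −34 to +46.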
The validation sample for the VRAG-R consisted of 1,261 male forensic inpatients and incarcerated persons who were included in the following previous studies: Harris et al. (1993, 2003) and Quinsey et al. (1995). Rice et al. (2013) found the VRAG-R to be an accurate predictor (AUC = .75) of the likelihood, frequency, and severity of long-term violence in a forensic population, and it predicted as well as the instruments it replaced (i.e., the VRAG and SORAG). Three different validation studies conducted by Glover et al. (2017), Gregório Hertz et al. (2021), and Olver and Sewall (2018) showed support for the tool’s interrater reliability (ICCs = .83, .97, .95). Glover et al. (2017), using a sample of male correctional individuals, reported that the VRAG-R gave moderate levels of predictive validity for general and violent recidivism. Gregório Hertz et al. (2021), with a sexual offense sample, found that the VRAG-R showed moderate to large predictive accuracy for violent, general, and sexual recidivism (AUCs = .75, .78, and .63, respectively). Olver and Sewall (2018) found that VRAG-R scores from a treated sexual offending sample demonstrated moderate to large predictive accuracy for sexual (AUC range = .60–.67) and violent (AUC range = .70–.78) recidivism. In all, the VRAG-R’s predictive accuracy parallels the original VRAG/SORAG system (Campbell et al., 2009; Hanson & Morton-Bourgon, 2009; Harris et al., 2010).
Recidivism
Recidivism was defined as any new criminal code charge or conviction post-hospital discharge. Two operationalizations of recidivism were used. Violent recidivism refers to any new criminal code charge or conviction for an offense against the person that had potential to do physical or psychological harm (e.g., assault, homicide, robbery), including sexual offenses. General recidivism consists of any new charge or conviction for any category of offense, whether violent or nonviolent. Recidivism data were coded through a binary (yes 1, no 0) system, with the date of new charge/conviction coded to compute time to recidivism. In turn, this enabled the computation of fixed 3- and 5-year follow-ups as a means of controlling for time at risk in the community (Wirove et al., 2023).
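The fixed follow-up coding described above can be sketched as follows. This is an illustrative sketch only: the function name, the argument names, the use of 365.25 days per year, and the rule that a case needs the full window of follow-up opportunity to be retained are all assumptions made here, not details reported for the study.

```python
from datetime import date


def fixed_followup_outcome(discharge, data_end, recid_date, years):
    """Code recidivism within a fixed window after discharge.

    Returns 1 (recidivated within the window), 0 (did not), or None when the
    case had less than `years` of follow-up opportunity and is therefore
    excluded from that fixed follow-up (assumed exclusion rule).
    """
    window_days = round(years * 365.25)
    if (data_end - discharge).days < window_days:
        return None  # insufficient time at risk for this window
    if recid_date is not None and (recid_date - discharge).days <= window_days:
        return 1
    return 0


# Hypothetical cases (dates invented for illustration).
end_of_data = date(2020, 1, 1)
early_recidivist = (date(2010, 1, 1), date(2012, 6, 1))
recent_discharge = (date(2016, 1, 1), None)

print(fixed_followup_outcome(early_recidivist[0], end_of_data, early_recidivist[1], 3))
print(fixed_followup_outcome(recent_discharge[0], end_of_data, recent_discharge[1], 5))
```

Under this rule, cases discharged too recently drop out of the longer window, which is consistent with the smaller n at 5 years than at 3 years reported in the Results.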
Procedure
VRAG-R ratings were completed as part of a larger research undertaking examining the risk profiles, discharge decisions, and release outcomes of every person who has come under the Saskatchewan Review Board’s jurisdiction, within the last 38 years, and subsequently declared NCRMD or UST. Ethical approval was provided by the University of Saskatchewan Behavioural Research Ethics Board (certificate Beh ID no. 2071) and operational research approval by the Saskatchewan Health Authority. Prior to the commencement of this project, all files of the persons found NCRMD/UST, in the history of this province, were reviewed, and data were examined for specific inclusion and exclusion criteria, based on the file information or recidivism data available. The cases in which there was adequate file information available (N = 135) were included in the coding process to produce scores on the VRAG-R; three cases had insufficient file information to score the VRAG-R. Recidivism data were subsequently captured through hardcopy criminal records obtained via the Canadian Police Information Centre (CPIC), Royal Canadian Mounted Police (RCMP) National Headquarters, provided to the study supervisor. To examine interrater reliability, 20 files (14.8% of the sample) were randomly selected and independently double coded. Excellent interrater reliability was obtained via ICC single measure, absolute agreement, two-way mixed effects model:
Data Analytic Plan
The data analyses were conducted using Statistical Package for the Social Sciences (SPSS) version 28. First, the descriptive statistics and frequency distributions of VRAG-R scores, across their respective bins, were computed for the total population and separately by gender. Second, the discrimination properties of the VRAG-R for general and violent recidivism were examined via the receiver operating characteristic (ROC) curve, to provide an area under the curve (AUC) value (i.e., the probability that a randomly selected recidivist will receive a higher VRAG-R score than a randomly selected non-recidivist). ROC analyses generate an AUC statistic that ranges from 0 to 1, with a value of .50 representing chance levels of predictive accuracy. The effect sizes of AUC values are as follows: .556 represents a small effect size, .639 represents a medium effect size, and .714 represents a large effect size (Rice & Harris, 2005). VRAG-R total score and bin number were each examined in the prediction of 3-year and 5-year general and violent recidivism for the sample.
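The probabilistic definition of the AUC given above can be illustrated directly by comparing every recidivist with every non-recidivist. The sketch below is not the SPSS procedure used in the study; the function names and the toy scores are hypothetical, ties are counted as half a win (a common convention assumed here), and the effect-size labels simply apply the Rice and Harris (2005) thresholds quoted above.

```python
def pairwise_auc(recidivist_scores, nonrecidivist_scores):
    """Proportion of recidivist/non-recidivist pairs in which the recidivist
    has the higher score, counting ties as half a win."""
    wins = 0.0
    for r in recidivist_scores:
        for n in nonrecidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


def effect_size_label(auc_value):
    """Label an AUC using the thresholds cited from Rice and Harris (2005)."""
    if auc_value >= 0.714:
        return "large"
    if auc_value >= 0.639:
        return "medium"
    if auc_value >= 0.556:
        return "small"
    return "below small"


# Toy scores, invented for illustration only.
recidivists = [12, 20, 5]
non_recidivists = [-10, 0, 8, 15]
auc_value = pairwise_auc(recidivists, non_recidivists)
print(auc_value, effect_size_label(auc_value))  # 0.75 large
```

In the toy data, 9 of the 12 pairs favor the recidivist, giving an AUC of .75.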
Third, calibration analyses were conducted via logistic regression to generate violent recidivism probabilities for VRAG-R scores using the logistic function:
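For reference, the standard form of the logistic function is sketched here. The notation is an assumption: $\hat{p}(x)$ denotes the estimated probability of violent recidivism at VRAG-R score $x$, and $B_0$ and $B_1$ denote the intercept and slope estimated by the logistic regression; the fitted values are not restated.

```latex
\[
  \hat{p}(x) = \frac{1}{1 + e^{-(B_0 + B_1 x)}}
\]
```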
The expected/observed (E/O) index is the ratio of the number of recidivists expected under a set of norms to the number actually observed; values above 1 indicate overprediction and values below 1 indicate underprediction. E/O indices were computed for observed rates of 5-year violent recidivism from the Saskatchewan NCRMD population, compared to the 5-year expected rates from the Rice et al. (2013) sample, to determine to what extent the instrument norms overpredict or underpredict future violence. Further E/O indices associated with VRAG-R scores, in the present SK sample, were computed using 5-year violent recidivism rates reported in Wirove et al. (2023; Alberta NCRMD sample) and Olver and Sewall (2018; Clearwater Sex Offender Program sample).
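A minimal sketch of an E/O computation follows. It assumes the commonly used Poisson-based approximation for the confidence interval, exp(ln(E/O) ± z·√(1/O)); whether the study used exactly this interval is an assumption, and the function name and the example inputs are hypothetical rather than values from the study.

```python
import math


def eo_index(expected_rate, n_cases, observed_count, z=1.96):
    """Return (E/O, lower CI, upper CI), or None when no recidivists were observed.

    expected_rate: recidivism rate taken from the comparison norms (proportion)
    n_cases: number of cases in the bin being evaluated
    observed_count: number of observed recidivists in that bin
    """
    if observed_count == 0:
        return None  # the ratio and its interval are undefined with zero events
    expected_count = expected_rate * n_cases
    eo = expected_count / observed_count
    # Assumed Poisson variance of the observed count on the log scale.
    half_width = z * math.sqrt(1.0 / observed_count)
    return eo, eo * math.exp(-half_width), eo * math.exp(half_width)


# Hypothetical bin: norms expect 25% of 40 cases to recidivate; 5 actually did.
eo, lower, upper = eo_index(0.25, 40, 5)
print(round(eo, 2))   # 2.0
print(lower < 1.0 < upper)  # True: interval spans 1, so not significant
print(eo_index(0.10, 30, 0))  # None
```

The sketch also shows why bins with zero or very few observed recidivists are uninformative: the ratio is undefined at zero, and the interval widens sharply as the observed count shrinks.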
Results
VRAG-R Descriptive Statistics and Frequencies
The descriptive statistics for the Saskatchewan population of persons declared NCRMD or UST, extending over a period of 38 years, are given in Table 1. The population’s average VRAG-R score (M = 1.44; SD = 18.2) was slightly lower (d = 0.14) than Rice et al.’s (2013) VRAG-R normative sample (M = 3.6, SD = 12.5), moderately higher (d = 0.44) than that found in Wirove et al.’s (2023) Alberta NCRMD population (M = −6.8, SD = 19.0), and very slightly higher (d = 0.03) than Harris et al.’s (1993) original VRAG construction sample (M = 0.91, SD = 12.9). In the present sample, male patients also had significantly higher VRAG-R scores (M = 3.21; SD = 17.7) than female patients (M = −8.14; SD = 18.0): t(133) = 2.689, p = .008, d = 0.64, and were significantly higher risk as classified by VRAG-R bin number,
Descriptive Statistics and Bin Frequencies for VRAG-R Scores
Note. VRAG-R = Violence Risk Appraisal Guide-Revised; SD = standard deviation.
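The standardized mean differences reported for the VRAG-R scores can be approximated from the summary statistics alone. The sketch below assumes Cohen's d with an equal-weight pooled standard deviation (the square root of the average of the two variances); the pooling method actually used is not stated, so this is an assumption, although it reproduces the rounded values given in the text.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d from summary statistics, using an equal-weight pooled SD
    (an assumed pooling choice; group sizes are not used)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Male versus female patients in the present sample.
print(round(cohens_d(3.21, 17.7, -8.14, 18.0), 2))  # 0.64

# Present sample versus the Alberta NCRMD population.
print(round(cohens_d(1.44, 18.2, -6.8, 19.0), 2))  # 0.44
```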
Discrimination Properties of the VRAG-R for Violent and General Recidivism
The recidivism rates for the Saskatchewan NCRMD-UST population with VRAG-R ratings were as follows: violent recidivism 8.65% (9/104) 3-year, 12.8% (11/86) 5-year, and 15.3% (19/124) overall; and general recidivism 13.5% (14/104) 3-year, 16.3% (14/86) 5-year, and 18.5% (23/124) overall. There was some sample attrition at fixed follow-ups owing to a successively smaller number of cases with a given time at risk in the community, with attrition increasing as time at risk increased. Owing to the small number of female recidivists (n = 2), the discrimination properties of the VRAG-R could only be completed for the aggregate sample and male subgroup. The rates of recidivism for the male subgroup were as follows: violent recidivism 18.4% (19/103) overall and general recidivism 20.4% (21/103) overall. The AUC values for the VRAG-R prediction of recidivism, for the overall Saskatchewan NCR-UST population, can be found in Table 2. VRAG-R total score and bin number significantly predicted 3-year and 5-year general and violent recidivism. The predictive accuracy of the VRAG-R was higher for 5-year violent and general recidivism than for 3-year violent and general recidivism, and for total scores versus bin number. The AUC magnitudes, for the Saskatchewan NCR-UST population, were large for violent and general recidivism across VRAG-R measures and fixed follow-up periods. When the discrimination analyses were repeated in the male subsample of patients, AUC values were similarly large across all outcomes (.71–.80), with the exception of a medium effect for VRAG-R bin in the prediction of overall (unfixed) general recidivism.
Discrimination Properties of the VRAG-R Score and Risk Bins for Violent and General Recidivism at 3-Year, 5-Year, and Overall Follow-Up: Aggregate Sample and Male Patients
Note. VRAG-R = Violence Risk Appraisal Guide-Revised; AUC = area under the curve; CI = confidence interval. Aggregate sample overall N = 124; 3-year n = 104; 5-year n = 86. Male subsample overall n = 103; 3-year n = 89; 5-year n = 74.
*p < .05, **p < .01, ***p < .001.
Calibration Properties of the VRAG-R for Violent Recidivism
The calibration properties of the VRAG-R, for violent recidivism, were examined via logistic regression and E/O index in the Saskatchewan NCRMD-UST population. First, 5-year violent recidivism rates were estimated for all possible VRAG-R scores through logistic regression. The results from the logistic regression produced the following values for 3-year (

VRAG-R Calibration: Observed Rates of Violent Recidivism for the Nine-Bin Structure and Estimated Rates of Violent Recidivism Associated with Individual Scores over Fixed 5-Year Follow-Up
E/O indices were then computed to examine the VRAG-R’s calibration efficacy, for 5-year violent recidivism, in the Saskatchewan NCRMD-UST population compared to the Rice et al. (2013) 5-year violence norms (Table 3). The E/O index results showed that the VRAG-R had satisfactory calibration properties in the current sample, with minor agreement between observed and expected recidivism rates. For 5-year violent recidivism, none of the E/O index values were statistically significant (i.e., the CIs overlapped with 1.00), except for the total category, which showed significant overestimation of 5-year violent recidivism rates (E/O = 2.37; 95% CI = [1.31, 4.08]). The tool had an overall tendency to estimate higher rates of recidivism than actually observed, given that the E/O index values exceeded 1 for each bin. That said, the 95% CIs were very wide for many of these analyses, owing to the small number of observed recidivists within a given risk band, resulting in some underpowered analyses (Hanson, 2017). Specifically, the E/O indices and 95% CIs for bins one to four could not be calculated because there were zero observed recidivists in these categories; additionally, bins six and seven had exceptionally wide CIs because the E/O index was calculated based on only a single recidivist. These caveats aside, the VRAG-R normative sample from Rice et al. (2013) overestimated 5-year violence risk among the Saskatchewan NCRMD-UST population by 2.4 times.
E/O Index: Five-Year Rates of Violent Recidivism for the Current Sample Compared with the Normative Sample (Rice et al., 2013), Alberta (AB) NCRMD Sample (Wirove et al., 2023) and Clearwater High Intensity Sex Offender (SO) Program Sample (Olver & Sewall, 2018)
Note. VRAG-R = Violence Risk Appraisal Guide-Revised; CI = confidence interval. Bolded E/O index and 95% CIs denote significance. Data are not reported due to the inability to calculate 95% CIs for a proportion that is zero.
We also compared the 5-year rates of violent recidivism from Wirove et al.’s (2023) Alberta NCRMD sample and Olver and Sewall’s (2018) treated sexual offending sample from the Clearwater High Intensity Sex Offender Program, as the expected rates, with the current sample’s 5-year observed rates of violent recidivism (Table 3). When the Alberta NCRMD sample’s 5-year violence rates were used as the expected rates, for each risk bin, all E/O index values were below 1.00, meaning that they consistently underestimated rates of 5-year violent recidivism relative to the current sample; however, this was only significant in risk bin eight and the total category. In contrast, when the Clearwater Sex Offender Program sample’s rates were used as the expected rates, there was a consistent overestimation of rates of 5-year violent recidivism, though it was not significant, as all of the CIs overlapped with 1.00. Interestingly, for each risk bin, the Rice et al. (2013) 5-year violence norms overpredicted violent recidivism to a greater extent than did the observed violent recidivism rates in the Olver and Sewall (2018) sample. This is noteworthy as, based on average VRAG-R total risk scores, the Clearwater Program sample was higher risk than the VRAG-R normative sample.
Discussion
The present study featured an examination of the predictive properties of the VRAG-R, with implications for forensic mental health practice, in a sample of persons found either NCRMD or UST by the Saskatchewan Law Courts. Of particular interest were the calibration properties of the VRAG-R relative to other correctional and forensic samples utilizing the tool, notwithstanding power limitations owing to sample size and the low violent recidivism base rate. The Saskatchewan NCR-UST population was lower risk than the VRAG-R construction sample (Rice et al., 2013), consistent with Wirove et al.’s (2023) Alberta NCRMD sample and NTP findings where recidivism rates from this multi-province NCR sample were lower than rates of recidivism from a general correctional population (Charette et al., 2015). Wirove et al.’s (2023) Alberta NCRMD population also had much lower VRAG-R scores overall, and across gender groups, compared to the current sample’s VRAG-R scores overall and across gender groups.
The current sample yielded a 5-year violent recidivism rate of 12.8% (11/86), compared with 5.4% (22/405) for the Alberta NCRMD sample. As seen in Wirove et al. (2023), female patients had a lower total score and bin number frequency distribution than their male counterparts. For instance, nearly one-third (28.6%) of all females’ total scores put them in the first risk bin, while a similar proportion (28.1%) of males were allocated among the first four risk bins. Generally, when compared to men, women account for a minority of Criminal Code offenses in Canada and commit less violent crime (Brown et al., 2020), a pattern that was supported in this study.
Discrimination and Calibration Properties of the VRAG-R in an NCR-UST Sample
Consistent with recent VRAG-R research, strong recidivism discrimination properties (large AUCs) were observed for VRAG-R bins and scores across the aggregate sample and male subsample. Moreover, calibration findings demonstrated that rates of 5-year violent recidivism increased with VRAG-R scores; however, the recidivism rates from the normative sample overpredicted future violence at all risk bands, generally by about 2–3 times (or 200–300%) in bins 5 through 9. Although only the total risk category E/O index was significant owing to power limitations, this does not negate the overprediction that was evident in the present sample and consistent with similar Canadian forensic inpatient samples. For instance, Wirove et al. (2023) found that the E/O index was significant in 7 out of 9 bin comparisons and that violent recidivism was overpredicted in each bin by two to fifteen times (200% to 1500%).
Two recent studies evaluated calibration properties of the VRAG-R. Olver and Sewall (2018) included 296 treated men from the Clearwater High Intensity Sex Offender Program, operated by the Correctional Service of Canada. Gregório Hertz et al. (2021) examined 534 men convicted of sexual offenses who were released from the Austrian prison system. The calibration properties of the VRAG-R were checked, for variability of absolute risk estimates, by comparing the re-offense rates of the samples (observed) with those of the Rice et al. (2013) normative sample (expected). Gregório Hertz et al. (2021) found no significant differences between groups across risk bins; however, Olver and Sewall (2018) found significant E/O index values for the ninth and highest risk bin (E/O = 1.48; 95% CI = [1.16, 1.88]) and for the overall total (E/O = 1.45; 95% CI = [1.23, 1.75]). Generally, the VRAG-R had strong calibration properties with good agreement between observed and expected recidivism rates for both studies, with most fluctuations being small and nonsignificant in magnitude.
The VRAG-R evidenced moderate to large effects for 5-year violent recidivism in Olver and Sewall (2018), Gregório Hertz et al. (2021), and Wirove et al. (2023). The mean VRAG-R total score was 16.8 (SD = 18.6) for Olver and Sewall’s (2018) sample and 3.42 (SD = 17.66) for Gregório Hertz et al.’s (2021) sample, both of which are higher than the means found in the current study’s sample and the Alberta NCRMD sample. In addition, Gregório Hertz et al.’s (2021) sample had the closest average VRAG-R total score to Rice et al.’s (2013) normative sample, and the VRAG-R also demonstrated the strongest calibration properties with this sample. In contrast, the VRAG-R showed poorer calibration properties for Olver and Sewall’s (2018) sample, which had the highest mean VRAG-R score, and even weaker calibration for the current sample and Wirove et al.’s (2023) sample, which had the lowest VRAG-R scores.
The normative sample may have also been higher risk than was captured by their static scores from the VRAG-R; the tool may have been better calibrated originally for a sample that presents with higher risk (Davies et al., 2021; Wirove et al., 2023). For instance, the 5-year rate from Rice et al. (2013) for the VRAG-R in bin 9 was 80%, which is markedly higher than the observed 5-year rates in the present sample (40%), Gregório Hertz et al.’s (2021) sample (55%), Olver and Sewall’s (2018) sample (55%), and Wirove et al.’s (2023) sample (18%). On this point, the Rice et al. (2013) construction sample may have unique characteristics (i.e., unmeasured dynamic risk factors such as criminal attitudes, antisocial personality pattern, and negative peer associates) that increase their risk and contribute to higher observed rates of recidivism (Davies et al., 2022). The majority of the sample used for the construction and validation of the VRAG-R was composed of acutely mentally ill forensic male clients who had committed violent offenses. Consequently, the VRAG-R normative sample appeared to be unexpectedly high risk, in ways that are not accounted for by the VRAG-R, and it is not sufficiently representative of other correctional or forensic mental health populations in Canada. This may, in part, explain why the VRAG-R produced slightly poorer calibration with the Saskatchewan NCR-UST population, compared to higher risk samples (Olver & Sewall, 2018), and even weaker calibration with an even lower risk Alberta NCRMD sample (Wirove et al., 2023).
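To make the bin 9 comparison concrete, the short sketch below divides the normative 5-year rate (80%) by each sample’s observed bin 9 rate, using only the percentages quoted above. Because expected and observed counts within a single bin share the same denominator, this ratio of rates is equivalent to that bin’s E/O index. The variable and function names are illustrative assumptions, and because the inputs are rounded percentages, the ratios are approximate.

```python
# Bin 9 five-year violent recidivism rates quoted in the text (rounded percentages).
NORM_RATE = 0.80  # Rice et al. (2013) normative sample

observed_rates = {
    "Current sample": 0.40,
    "Gregório Hertz et al. (2021)": 0.55,
    "Olver & Sewall (2018)": 0.55,
    "Wirove et al. (2023)": 0.18,
}


def overprediction_ratio(expected_rate, observed_rate):
    """Ratio of expected to observed rate; values above 1 mean overprediction."""
    return expected_rate / observed_rate


ratios = {
    name: overprediction_ratio(NORM_RATE, rate)
    for name, rate in observed_rates.items()
}

for name, ratio in ratios.items():
    print(f"{name}: norm rate is {ratio:.1f}x the observed rate")
```

On these figures, the normative rate is about twice the observed rate in the current sample and more than four times the observed rate in the Alberta sample, which mirrors the ordering of calibration described above.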
Strengths and Limitations
There are several limitations and strengths to the current study with implications for practice and the generalizability of findings. One potential limitation is the archival and retrospective nature of this study, as the quality and quantity of file information were variable and inconsistent across the files; in rare instances, the VRAG-R could not be completed. Second, owing to the limited number of NCRMD cases in Saskatchewan’s history, by necessity, these cases were combined with UST files to yield a sufficient sample size, and these two populations may have distinct criminogenic characteristics. Even still, given the striking relative infrequency with which NCRMD and UST legislation are used in this province, the obtained sample was smaller than desired, which would particularly affect VRAG-R cell frequencies at more extreme (and less common) scores, and the resulting power and generalizability of calibration findings. Accordingly, calibration findings should be interpreted with caution.
Third, the sample limitations, in turn, precluded conducting gender-stratified analyses for female patients, and thus the predictive properties of the VRAG-R cannot necessarily be generalized in the same way across male and female forensic patients. Fourth, our use of official criminal records may have resulted in an underestimate of recidivism rates, given that undetected or unreported offenses would not be captured. Fifth, NCRMD designations that resulted in an immediate absolute discharge were not included within this database; although rare, in such instances, there are no files produced for these cases since they would not have come under RB jurisdiction. Finally, over the study’s catchment period of 38 years, there have been many changes to the NCRMD legislation that could have the potential to produce cohort effects.
That said, an important study strength is its representation of the entire small population of persons found NCRMD and UST in Saskatchewan’s history who have come under the Saskatchewan RB jurisdiction. Moreover, this study contributes to further cross-validation of the discrimination and calibration properties of the VRAG-R in an NCRMD population outside of Ontario. It also provides information on potential gender differences, in terms of violence and recidivism risk, via the frequency and distribution of males and females among each VRAG-R bin number. Research on gender comparisons is lacking in general, and is particularly limited for risk assessment tools. Finally, this study had strong methodological characteristics that bolstered the integrity and consistency of the findings. Meticulous data extraction and VRAG-R coding procedures were employed, including thoroughly reviewing each file, which culminated in excellent interrater reliability and reduced measurement error across researchers.
Implications for Research and Practice
The issue of generalizability of VRAG-R recidivism rates to other jurisdictions necessitates further calibration work with this measure. Recent legislative developments require that Canadian RBs ensure public safety is the paramount consideration in their decision-making process (Bill C-14, 2014; Crocker et al., 2015); however, this must also be tempered with use of the least restrictive alternative for discharge, when the individual’s risk is manageable within the community (Winko v. British Columbia [Forensic Psychiatric Institute], 1999). Accordingly, it is crucial that tools such as the VRAG-R can contribute to accurate risk appraisals, without inflating or underestimating risk, to aid risk management-related decisions regarding release and disposition outcomes. The present findings suggest that structured risk assessment tools such as the VRAG-R should be integrated more systematically into RB processes, and they support the appropriateness of use of this tool by clinicians and review boards. Case formulations and release decisions should not depend on a single measure, however; they should follow a multisource procedure in which actuarial estimates of risk enhance clinical judgment and improve predictive accuracy. This is especially the case for a static tool such as the VRAG-R, which will not capture changes in risk. As such, in our view, using a dynamic violence risk tool in tandem with the VRAG-R (for instance, to drive services, aid case formulation, and evaluate potential changes in risk) would strengthen services further.
The VRAG-R’s absolute recidivism estimates have questionable generalizability to other populations, jurisdictions, or settings. Further validation work across unselected and broadly representative samples is needed to interrogate and understand discrepancies in recidivism rates across VRAG-R bins. The development of local or jurisdiction-specific norms would also be a consideration (e.g., Canadian Prairies), given the overprediction from the normative sample. As future empirical efforts continue to produce data on recidivism estimates, risk ratios, and percentiles, the Justice Center’s standardized five risk levels can eventually be applied to the VRAG-R. This would assist with developing clearer and more universal language of risk communication, and expand forensic and correctional practice (Davies et al., 2022; Hanson et al., 2017). Without revisiting the norms or risk communication metric, however, the development of jurisdiction-specific norms may be an ethical imperative to strengthen the fairness and accuracy of release decisions and to balance patient rights with public safety.
Footnotes
Authors’ Note:
The views, opinions, and assumptions expressed in this paper are those of the authors and do not necessarily reflect the views or official positions of the Saskatchewan Health Authority, Correctional Service of Canada, or the University of Saskatchewan. The authors thank Sydney Rine for her work and guidance in VRAG-R data collection; Brent Nixon and Saskatchewan Hospital North Battleford of the Saskatchewan Health Authority; Saskatchewan Ministry of Justice: Corrections and Policing; and the Saskatchewan Law Courts for their support of this research. As the present study features use of protected or copyrighted materials to collect highly sensitive data on a vulnerable population, the data, as well as most study measures, are not publicly available. This study was not preregistered.
