Abstract
This study examined the discrimination and calibration properties of the Violence Risk Appraisal Guide–Revised (VRAG-R) within a large subset of the population of 574 individuals who had been found Not Criminally Responsible on Account of Mental Disorder (NCRMD) in Alberta. The VRAG-R was scored on all individuals identified via
Keywords
Persons with mental health diagnoses can come into conflict with the law for several reasons, one of which is by way of the commission of criminal acts that are attributable to active mental health symptoms. Although the commission of severe violence is infrequent among persons with mental disorder (Douglas et al., 2009), the international news media is nonetheless replete with high profile cases of tragic and extremely violent acts committed by mentally ill persons (Stuart, 2006; Wahl, 1992). In some instances, these persons will come under forensic mental health jurisdiction, such as through insanity defense legislation, and a series of detention, discharge, and risk management decisions will by necessity be made by review boards to promote public safety and client wellbeing. Part of this vital task is the ability to accurately assess a person’s risk for recidivism and particularly future violence, to inform discharge decisions and postrelease risk management.
Structured risk assessment measures have increasingly become a mainstay in forensic mental health. Whether the tool is actuarial and numeric (i.e., with scores linked to recidivism estimates) or structured and non-numeric (i.e., as with structured professional judgment or SPJ), the imposition of structure helps to minimize human judgment biases, increase the fairness and accuracy of decisions, and increase clinical utility, for instance, by informing service intensity and treatment foci for risk management (Brown & Singh, 2014). This study examined the predictive properties of a structured violence risk assessment measure developed on, and frequently used with, forensic mental health samples, the Violence Risk Appraisal Guide–Revised (VRAG-R; Rice et al., 2013), in a population of men and women from the province of Alberta who received an insanity defense (i.e., Not Criminally Responsible on Account of Mental Disorder [NCRMD]) verdict over a 70-year catchment period and were followed up in the community post discharge. We turn to a brief review of the Canadian insanity defense legislation and the discharge and recidivism patterns of persons with a positive verdict, followed by a review of violence risk assessment considerations with this population and the VRAG-R measure.
The Insanity Defense in Canada: Background and Context
The Canadian insanity defense is referred to as NCRMD in Section 16 of the
Review boards have the unwieldy task of generating a disposition that most balances the person’s right to freedom and the safety of the public. The discretion with which review boards can serve this function, however, changed with the imposition of new legislation instructing review boards to place public safety as the paramount consideration in review board hearings (Bill C-14, 2014). In response to increasing public pressure to ensure public safety and mental health, in 2014, Bill C-14,
Violence Risk Assessment with Persons Found NCRMD
NCRMD cases frequently involve crimes against the person. According to Statistics Canada (2014), such crimes represented about two-thirds (63%) of NCRMD cases between 2005 and 2012, 20% of which represented major assaults. Crocker et al.’s (2015) National Trajectory Project (NTP) of discharge and recidivism patterns of NCRMD cases from British Columbia, Ontario, and Quebec found similar results, with 64.9% of the index offenses involving crimes against the person. Livingston et al. (2003) examined an NCRMD cohort within British Columbia and found that assault was the most serious offense for about half (45.5%) of the cohort, while in Alberta, Haag et al. (2016) found that 46.9% of NCRMD index offenses were for nonsexual violent offenses (excluding homicide), 18.5% for homicide, and 10.6% for attempted homicide. Furthermore, recidivism rates of NCRMD patients are lower than those typically found in correctional samples in Canada and internationally, which would be consistent with NCRMD samples generally being lower risk and also the possibility of good discharge decisions and risk management activities by review boards (Fazel et al., 2016; Friendship et al., 1999; Goossens et al., 2019; Grann et al., 2008; Hayes et al., 2014; Norko et al., 2016; Richer et al., 2018; Simpson et al., 2018; Tabita et al., 2012). For instance, Charette et al. (2015) found the 3-year recidivism rates of discharged NCRMD cases to be 17% following an absolute discharge immediately postindex offense, 22% following a conditional discharge, and 22% following absolute discharge after a period under review board jurisdiction. In all, although having a violent index offense is a consistent theme within this population, index offense severity is not to be taken as evidence that a person found NCRMD will continue to pose a danger to public safety (
The selection of one or more risk instruments to appraise violence risk involves as much the purpose and context of the assessment as it does the psychometric properties and intended function of any assessment measure used. Ultimately the purpose of violence risk assessment is violence prevention; this is achieved when individuals at high probability for future violence can be accurately identified and then given the greatest priority for risk management interventions to prevent future violence. For a risk instrument to aid such a preventive function, however, it must predict the outcome of interest with satisfactory accuracy. To this end, the primary predictive properties of risk tools are discrimination and calibration (Helmus & Babchishin, 2017). Discrimination (relative risk) considers how risky an individual case is relative to other cases on a risk measure and can be examined through risk ratios, percentile ranks, or receiver operating characteristic (ROC) area under the curve (AUC) analyses. Such metrics enable examination of the extent to which risk scores on a given measure can accurately discern would-be recidivists from nonrecidivists and hence, higher risk from lower risk cases.
Calibration (absolute risk) “is a specific component of accuracy that measures how well a probabilistic prediction of an event matches the true underlying probability of the event” (Lindhiem et al., 2020, p. 840). An example of calibration is the use of logistic regression to generate rates of recidivism associated with individual scores (and hence groups of scores) on a risk measure. Furthermore, the Hosmer–Lemeshow goodness of fit test can be conducted in logistic regression as a measure of calibration to examine the extent to which the observed event rates (i.e., recidivism) match the expected event rates, with close correspondence between the two and a nonsignificant test indicating good calibration for the predictive model (Lindhiem et al., 2020). A further illustration of calibration would be the extent to which the expected recidivism rates generated from the risk categories in a reference group correspond to those observed in a comparison group, known as the
Several risk instruments, both general and crime specific (e.g., sexual, violence, intimate partner violence), containing static and/or dynamic risk items, have been developed with evidence demonstrated for their predictive properties from meta-analysis (Campbell et al., 2009; Hanson & Morton-Bourgon, 2009; Yang et al., 2010). The VRAG (Harris et al., 1993) and Sex Offender Risk Appraisal Guide (SORAG; Quinsey et al., 1998) refer to a pair of empirical actuarial tools designed to evaluate risk for future violence among violent offending and sexual offending-specific populations. The measures were each developed and validated on a hybrid sample of largely NCRMD (or equivalent previous legislation) cases and individuals on remand awaiting trial and sentencing (Harris et al., 2015). The measures comprised static risk variables (e.g., offense history, demographic, and clinical), differentially weighted based on the magnitude of the predictor criterion associations of the best-predicting linear combination of variables.
Since its development more than two decades ago, the VRAG has been independently validated more than 60 times in correctional and forensic mental health samples in several countries (Rice et al., 2013). Results from meta-analytic reviews demonstrate moderate to large effects (AUC equivalents = .66–.73) for the VRAG’s predictive accuracy for future violence (Campbell et al., 2009; Hanson & Morton-Bourgon, 2009; Yang et al., 2010). The VRAG is used regularly in the United States and in other countries around the globe (Cox et al., 2018; Singh et al., 2014), and in Canada, it is often used in risk assessments with persons found NCRMD to assist with review board decisions (Wilson et al., 2015).
The VRAG-R
The VRAG-R (Rice et al., 2013) was developed by integrating the VRAG and SORAG into one common violence risk assessment measure and then refining the item content. It has 12 items as with the VRAG, but is easier to score, for instance, using the antisocial facet from the Psychopathy Checklist-Revised (PCL-R; Hare, 1991, 2003) in place of the full scale and thereby at least partially addressing the “scale within a scale” criticism; however, critics may note that a prominent component of the PCL-R is still included and that the “scale within a scale” issue still applies, at least to some degree. The measure has also removed counterintuitive and controversial variables (e.g., inverse weighting of index offense homicides or schizophrenia diagnoses). Items are summed to generate total scores that are organized into nine risk bins; the risk bins are arranged into deciles with relatively equal proportions of cases within.
Psychometric research following on Rice et al.’s (2013) construction and validation of the measure has supported the predictive properties of the VRAG-R for future community violence in general correctional samples (AUC = .66, Glover et al., 2017), corrections-based sexual offending samples (AUC = .75, Gregório Hertz et al., 2021; AUC = .73, Olver & Sewall, 2018), and a forensic mental health sample (AUC = .74, Hogan & Olver, 2019). Cross-validation research on the VRAG-R’s calibration properties is needed. For instance, the observed 5-year rate of violent recidivism in the top bin (bin 9) of the VRAG-R was 80%, and the 15-year rate, 91%. Although the bins are now more representative in size and grouping, these are exceptionally high violent recidivism rates, and it is unclear to what extent such rates could generalize to other jurisdictions or samples. For instance, Olver and Sewall (2018), in a broadly high risk-need sexual offending sample, had a 5-year rate of violent recidivism of 53.9% for bin 9; this disparity (
Most VRAG and VRAG-R research has been conducted with male forensic and correctional populations, and to our knowledge, only the original VRAG has been formally examined with female populations. In two prison-based samples, one from the United Kingdom (Coid et al., 2009) and one from Germany (Eisenbarth et al., 2012), VRAG scores had moderate-to-high predictive accuracy for general recidivism (AUCs = .66 and .72, respectively), and moderate accuracy for future violence (AUC = .65; Coid et al., 2009). Moreover, in a U.S. jail sample of 145 female inmates (Hastings et al., 2011), the VRAG was a poor predictor of institutional misconduct but had small-to-moderate predictive accuracy (AUCs = .61–.66) for 1-year rearrest postrelease. Two of the aforementioned studies (Coid et al., 2009; Hastings et al., 2011) also found women to have significantly lower VRAG scores than men.
Current Study
Review boards make frequent use of violence risk assessment to inform discharge and management decisions within forensic mental health systems, and historically, the VRAG has been used frequently and evaluated heavily in settings around the world. The VRAG-R has a number of important advancements beyond the VRAG (and SORAG); however, further cross-validation research on the measure’s discrimination and calibration properties is needed, particularly in forensic mental health samples, such as those with a positive insanity verdict. To date, there has yet to be a large-scale examination of these predictive properties of VRAG-R scores in an exclusive NCRMD population. Given that NCRMD cases tend to be lower risk with lower rates of recidivism, yet the VRAG measures are frequently used with this population, the issue of calibration and the generalizability of VRAG-R norms is paramount.
As such, this study examined the discrimination and calibration properties within the Alberta NCRMD population. There were four primary sets of hypotheses:
Females found NCRMD will have lower VRAG-R risk scores and bin number frequency distributions than their male counterparts.
VRAG-R bin and total scores will demonstrate moderate to high predictive accuracy (AUCs ≥ .64–.71), and hence strong discrimination properties for general and violent recidivism.
With respect to calibration, VRAG-R 5-year recidivism rates, as a function of VRAG-R bin and total score, will be higher for the Rice et al. (2013) normative sample than the Alberta NCRMD population.
The discrimination properties of the VRAG-R will extend to broad diagnostic groups (e.g., psychosis, antisocial personality disorder). In terms of calibration, higher rates of 5- and 10-year violent recidivism will be observed as a function of increasing VRAG-R score across diagnostic groups.
Method
The Alberta NCR Project
Ethics approval for this study was obtained through two university behavioral research ethics boards in neighboring provinces and operational approval was obtained from the Alberta Health Authority. Per Simmons et al. (2012), “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study” (par. 6).
Participants
As of October 2018, there were 574 identified cases of persons found NCRMD (male 83.1%,
VRAG-R
The VRAG-R (Rice et al., 2013) is an empirical actuarial violence risk assessment tool statistically developed from a violent mentally disordered offender population in Ontario, Canada; approximately half the men were found NCRMD while the remainder were on remand for violent crimes and undergoing assessment (Harris et al., 2015). A more streamlined and user-friendly revision of its predecessor, the VRAG-R comprises 12 static items: (a) Lived with parents until age 16 (weight range −2 to +2), (b) Elementary school maladjustment (weight range −3 to +4), (c) History of alcohol or drug problems (weight range −2 to +4), (d) Marital status at time of index offense (weight range −1 to +1), (e) Criminal nonviolent history score (weight range −3 to +5), (f) Failure on conditional release (weight range −2 to +4), (g) Age at index offense (weight range −7 to +2), (h) Criminal violent history score (weight range −2 to +4), (i) Prior admissions to correctional institutions (weight range −2 to +6), (j) Conduct disorder prior to age 15 (weight range −2 to +5), (k) Sex offending history (weight range −2 to +3), and (l) Antisociality—facet 4 of the PCL-R (weight range −6 to +6). Individual items are weighted based on the strength and direction of their associations with violent recidivism. Items are summed to generate a total score ranging from −34 to +46 and arranged into nine risk bins. The measure is scored by a service provider for a given correctional/forensic client and can be completed solely from sufficiently detailed archival information sources (e.g., psychological assessments, criminal records, police reports, institutional behavioral records, social history intake), using the detailed scoring rules for the instrument (Harris et al., 2015) or as provided from www.vrag-r.org (n.d.).
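The stated total score range can be checked directly from the item weight ranges listed above. The following is a minimal sketch only: the dictionary keys and function names are hypothetical labels chosen for illustration, the check covers only the minimum and maximum of each range (the specific weight values permitted within each range are defined by the scoring rules and are not reproduced here), and no bin cutoffs are assumed.

```python
# Item weight ranges (min, max) as listed in the instrument description above.
# The key names are hypothetical labels, not official item identifiers.
WEIGHT_RANGES = {
    "lived_with_parents_to_16": (-2, 2),
    "elementary_school_maladjustment": (-3, 4),
    "alcohol_or_drug_problems": (-2, 4),
    "marital_status_at_index": (-1, 1),
    "criminal_nonviolent_history": (-3, 5),
    "failure_on_conditional_release": (-2, 4),
    "age_at_index_offense": (-7, 2),
    "criminal_violent_history": (-2, 4),
    "prior_correctional_admissions": (-2, 6),
    "conduct_disorder_before_15": (-2, 5),
    "sex_offending_history": (-2, 3),
    "antisociality_pclr_facet4": (-6, 6),
}


def total_score_range(ranges):
    """Return the (lowest, highest) possible total across all items."""
    lowest = sum(low for low, _ in ranges.values())
    highest = sum(high for _, high in ranges.values())
    return lowest, highest


def total_score(item_weights, ranges=WEIGHT_RANGES):
    """Sum item weights after checking each lies within its listed range."""
    if set(item_weights) != set(ranges):
        raise ValueError("item_weights must contain exactly the 12 items")
    for name, weight in item_weights.items():
        low, high = ranges[name]
        if not low <= weight <= high:
            raise ValueError(f"{name}={weight} is outside [{low}, {high}]")
    return sum(item_weights.values())


print(total_score_range(WEIGHT_RANGES))
```

Summing the lower and upper bounds of the 12 ranges reproduces the total score range of −34 to +46 reported in the text.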
Recidivism Criteria
Recidivism was defined as any new criminal code conviction post discharge and was coded from Fingerprint Service (FPS) sheets through the Canadian Police Information Center (CPIC) as of 2015. Two operationalizations of recidivism were used. Violent recidivism consisted of any new criminal code conviction for an offense against the person with the potential for physical or psychological harm (e.g., assault, homicide, robbery), including sexual offenses. General recidivism consisted of a new conviction for any category of offense, be it violent or nonviolent. Offenses were coded in binary (yes 1, no 0) fashion. Conviction dates for first new offenses under a given category were also coded to track time to conviction and to generate fixed follow-ups for data analysis (see Planned Analyses).
Procedure
All files of the 574 persons declared NCRMD in the history of the province were examined for inclusion and exclusion based on the availability of either file information or recidivism data. Missing information was not assumed to be missing for any systematic reason and was excluded from analysis; data were missing either because of the age of the file (i.e., older files contained less information, ink on onionskin paper was illegible) or at the research assistant’s discretion (i.e., the reports on file did not comment on needed information) in consultation with the project principal investigator (PI). Demographic, clinical, criminological, and VRAG-R risk variables were coded by the first author and a team of undergraduate research assistants from files located at the hospital, a community mental health outpatient facility, and the Alberta Review Board. In the minority of instances when the VRAG-R had already been coded by hospital psychology or psychiatry staff, the item and total scores were extracted for inclusion in this study in lieu of rerating the measure from file. The Alberta NCR project principal investigator, a registered psychologist with over 20 years of experience working with forensic and correctional populations, provided training to the research assistants on study measures and oversaw all data collection. The first author also attended online VRAG-R training through the Global Institute of Forensic Research. The research assistants completed regular scoring validity checks and also had access to the principal investigator and senior researchers for questions.
To examine interrater reliability, 30 files (6.3% of the sample) were randomly selected and independently double coded. Per Cicchetti and Sparrow (1981), “excellent” interrater reliability for VRAG-R total scores was obtained via intraclass correlation coefficient, single measure, absolute agreement, one-way random effects model: ICCA,1 = .983, 95% confidence interval (CI) (.965, .992),
Planned Analyses
Data analysis proceeded in several steps. First, descriptive statistics and frequency distributions of VRAG-R scores and their respective bins were examined, both for the entire population and separately by gender. Second, to ascertain the capacity of VRAG-R bin and total scores to discriminate recidivists from nonrecidivists, the discrimination properties of the VRAG-R were examined in the prediction of fixed 5-year, fixed 10-year, and overall (unfixed) violent and general recidivism via ROC analyses. ROC analyses generate an AUC statistic ranging from 0 to 1 representing the probability that a randomly selected recidivist will score higher on a given risk tool than a randomly selected nonrecidivist. With values of .50 representing chance levels of predictive accuracy, AUC values of .556, .639, and .714 represent small, medium, and large effect sizes, respectively (Rice & Harris, 2005). These analyses were conducted in the population as a whole and the large male subsample; there were insufficient female recidivists to conduct predictive validity analyses with this subsample.
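The probabilistic reading of the AUC described above can be illustrated with a direct pairwise computation. This is a sketch for intuition rather than the analysis software used in the study: the function names and example scores are hypothetical, and counting tied pairs as one half is a common convention assumed here, not something stated in the text. The effect size thresholds are those cited above from Rice and Harris (2005).

```python
def pairwise_auc(recidivist_scores, nonrecidivist_scores):
    """Proportion of recidivist/nonrecidivist pairs in which the recidivist
    scores higher; ties count as one half (an assumed convention)."""
    if not recidivist_scores or not nonrecidivist_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for r in recidivist_scores:
        for n in nonrecidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


def effect_size_label(auc_value):
    """Label an AUC using the .556 / .639 / .714 thresholds cited in the text."""
    if auc_value >= 0.714:
        return "large"
    if auc_value >= 0.639:
        return "medium"
    if auc_value >= 0.556:
        return "small"
    return "below small"


# Hypothetical scores for illustration only; not study data.
example_recidivists = [5, 12, 20]
example_nonrecidivists = [-10, -3, 5, 8]
example_auc = pairwise_auc(example_recidivists, example_nonrecidivists)
print(example_auc, effect_size_label(example_auc))
```

Because the statistic depends only on the rank ordering of scores, adding a group with both lower scores and fewer recidivists can raise the AUC without changing how well the tool orders cases within the original group.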
Third, to examine the rates of recidivism associated with VRAG-R scores and the generalizability of the VRAG-R norms to the current sample, calibration analyses were conducted. To do this, per Olver and Sewall (2018), logistic regression was used to model 5- and 10-year estimates of violent recidivism with specific VRAG-R scores. Logistic regression generates a constant (
A further critical step toward examining calibration was through computation of the
As a fourth and final point of investigation, the discrimination and calibration (excluding the
Results
VRAG-R Descriptive Statistics and Frequencies
Table 1 provides the VRAG-R descriptive statistics for the Alberta-NCRMD population overall and by gender. The population average VRAG-R score was low (
Descriptive Statistics and Bin Frequencies for VRAG-R Scores.
Discrimination Properties of the VRAG-R for Violent and General Recidivism
Rates of recidivism for the Alberta NCRMD population were as follows: Violent recidivism 5.4% (22/405) 5-year, 7.7% (31/401) 10-year, 9.2% (44/476) overall; General recidivism 8.8% (36/405) 5-year, 13.2% (53/401) 10-year, 14.9% (71/476) overall. Owing to the small number of female recidivists (
Table 2 provides AUC values for VRAG-R prediction of these recidivism outcomes in the overall Alberta NCRMD population and in the male subsample. VRAG-R total score and bin level significantly predicted general and violent recidivism in the aggregate sample and among males. In both groups, slightly higher predictive accuracy was observed for violent than general recidivism, as well as for total scores versus bin number. In the Alberta NCRMD population, AUC magnitudes were moderate to large for violence and moderate for general recidivism across VRAG-R measures. In the male subgroup, AUCs were slightly lower, evincing moderate predictive accuracy for all outcomes across follow-up, with the exception of small AUCs for general recidivism at the unfixed follow-up.
Discrimination Properties of the VRAG-R for Violent and General Recidivism for 5-Year, 10-Year, and Overall Follow-Up: Aggregate Sample and Male Subsample.
Calibration Properties of the VRAG-R for Violent Recidivism
The calibration properties of the VRAG-R for violent recidivism were examined through logistic regression and

VRAG-R Calibration: Observed Rates of Violent Recidivism for the Nine-Bin Structure and Estimated Rates of Violent Recidivism Associated with Individual Scores over Fixed 5- and 10-Year Follow-Ups.

Logistic Regression Estimated 5- and 10-Year Rates of Violent Recidivism for all Possible VRAG-R Scores for the Overall Population (Dashed Lines) and Male Subsample (Solid Lines).

Logistic Regression Estimated Trajectories of 5- and 10-year Violent Recidivism for Observed VRAG-R Scores by Diagnostic Category.
E/O Index: Five-Year Rates of Violent Recidivism for Normative Sample (Rice et al., 2013) Compared with the Alberta NCRMD Population: Overall Sample and Male Only.
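The E/O index named in the caption above is the ratio of the number of recidivists expected under a set of reference norms to the number actually observed, so values above 1 indicate overprediction. The sketch below is illustrative only: the bin sizes, expected rates, and observed count are hypothetical placeholders rather than values from the tables, and the confidence interval uses a Poisson-variance approximation, which is one commonly used choice and is assumed here, not confirmed as the method applied in this study.

```python
import math


def expected_count(bin_sizes, expected_rates):
    """Expected recidivists: sum over bins of n_i times the normative rate."""
    if len(bin_sizes) != len(expected_rates):
        raise ValueError("bin_sizes and expected_rates must align")
    return sum(n * rate for n, rate in zip(bin_sizes, expected_rates))


def eo_index(expected, observed, z=1.96):
    """Return (E/O, lower, upper).

    The interval is exp(ln(E/O) +/- z * sqrt(1/O)), a Poisson-variance
    approximation assumed for illustration.
    """
    if observed <= 0:
        raise ValueError("observed count must be positive")
    ratio = expected / observed
    half_width = z * math.sqrt(1.0 / observed)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


# Hypothetical inputs for illustration only; not study data.
example_bin_sizes = [100, 50]          # cases per risk bin
example_expected_rates = [0.10, 0.20]  # normative recidivism rate per bin
example_observed = 10                  # recidivists actually observed

example_expected = expected_count(example_bin_sizes, example_expected_rates)
print(eo_index(example_expected, example_observed))
```

An interval that excludes 1 suggests that the expected and observed counts differ by more than chance under this approximation.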
Discrimination and Calibration Properties of the VRAG-R by Diagnostic Category
Discrimination and calibration analyses were repeated across broad diagnostic categories highly represented in this population: any psychotic disorder, any mood disorder, SUD (i.e., present or ever), and ASPD (including ASPD traits). Rates of recidivism by diagnostic subgroup were as follows: psychotic disorder, violent (5-year 5.4%, 18/334; 10-year 6.7%, 22/330; overall 8.3%, 32/387) and general (5-year 8.0%, 27/334; 10-year 11.0%, 36/330; overall 13.2%, 51/387) recidivism; mood disorder, violent (5-year 1.5%, 2/130; 10-year 3.1%, 4/129; overall 5.9%, 9/153) and general (5-year 4.6%, 6/130; 10-year 8.5%, 11/129; overall 12.4%, 19/153) recidivism; SUD, violent (5-year 5.6%, 12/216; 10-year 8.0%, 17/215; overall 8.8%, 22/251) and general (5-year 10.6%, 23/216; 10-year 15.3%, 33/215; overall 16.7%, 42/251) recidivism; and ASPD, violent (5-year 15.7%, 13/83; 10-year 21.7%, 18/83; overall 19.6%, 19/97) and general (5-year 18.1%, 15/83; 10-year 27.7%, 23/83; overall 26.8%, 26/97) recidivism.
Table 4 provides the results of ROC analyses to examine the discrimination properties of VRAG-R score and bin within each of the four diagnostic subgroups. For persons with psychotic disorders (the largest diagnostic subgroup), VRAG-R scores and bins significantly predicted both outcomes irrespective of follow-up, with moderate effects for violence and small effects for general recidivism. For the mood disorders subgroup, predominantly large effects were observed for violence and moderate effects for general recidivism; however, the AUC magnitudes had some instability and wide CIs, particularly for 5-year outcomes, due to the small number of recidivists. Finally, VRAG-R scores and bins had small and nonsignificant predictive effects for both sets of outcomes within the ASPD and SUD subgroups.
Discrimination Properties of the VRAG-R for Violent and General Recidivism as a Function of Follow-Up and Diagnostic Category.
Finally, logistic regression was conducted to estimate the rates of 5- and 10-year violent recidivism associated with specific VRAG-R scores within each of the four diagnostic subgroups; the regression model statistics are presented in Table 5, the results of which paralleled AUC findings. Hosmer–Lemeshow goodness of fit tests were all nonsignificant for each diagnostic subgroup, suggesting that the logistic distributions provided a reasonable approximation of violent recidivism rates to warrant modeling. Application of the logistic function using the VRAG-R predictor and constant values from Table 5 generated the trajectories of 5- and 10-year violent recidivism for each diagnostic subgroup (see Figure 3). Of note, only the VRAG-R scores populated within a diagnostic subgroup were used to generate the curves. The curve for 5-year violence within the mood disorders group is particularly steep, likely owing to the small number of recidivists. Otherwise, similar trajectories of 5- and 10-year violent recidivism were associated with VRAG-R scores for three out of the four diagnostic subgroups; the one exception was considerably higher rates of 5- and 10-year violence for the ASPD subgroup irrespective of VRAG-R score. That is, individuals with ASPD had higher rates of recidivism for a particular score than did members of other diagnostic subgroups, but still at rates substantially lower than the VRAG-R norms (Rice et al., 2013).
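Applying the logistic function to a constant and a predictor coefficient, as described above, can be sketched as follows. The coefficient values here are hypothetical placeholders chosen for illustration and are not the Table 5 estimates; the function name is likewise an assumption.

```python
import math


def predicted_rate(score, constant, slope):
    """Logistic function: p = 1 / (1 + exp(-(constant + slope * score)))."""
    return 1.0 / (1.0 + math.exp(-(constant + slope * score)))


# Hypothetical coefficients for illustration only; not the Table 5 values.
example_constant = -3.0
example_slope = 0.05

# Trace a curve over a few example scores; in the study, only scores actually
# observed within a diagnostic subgroup were used to generate each curve.
for example_score in (-20, 0, 20):
    rate = predicted_rate(example_score, example_constant, example_slope)
    print(example_score, round(rate, 3))
```

With a positive slope, the estimated rate rises monotonically with score, and a subgroup with a higher constant sits above the others at every score, which matches the pattern described for the ASPD subgroup.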
VRAG-R Logistic Regression Prediction Models for 5- and 10-year Violent Recidivism by Diagnostic Category.
Discussion
This study sought to examine the risk profiles and recidivism outcomes of individuals who had been found NCRMD in the province of Alberta. The study had two goals: first to investigate the population for potential gender differences, and second to examine the predictive accuracy (i.e., the discrimination and calibration properties) of the VRAG-R.
The Alberta NCRMD population was much lower risk than the construction or validation sample used to develop the VRAG (Harris et al., 1993) and VRAG-R (Rice et al., 2013
Discrimination and Calibration Properties of the VRAG-R in an NCRMD Sample
Consistent with recent VRAG-R research (Gregório Hertz et al., 2021; Hogan & Olver, 2019; Olver & Sewall, 2018; Rice et al., 2013), strong recidivism discrimination properties were observed for the VRAG-R with respect to the overall sample and male subgroup. AUCs were slightly higher for the sample overall than in the male subgroup; this is due to the AUC being a rank-ordered statistic, with females having both lower scores and lower rates of recidivism, which would result in a greater concentration of recidivists at the top end of VRAG-R scores and hence higher AUC values. Among diagnostic subgroups, VRAG-R bin and total scores had good predictive accuracy for persons with psychotic illness or mood disorder; however, accuracy was lower (small effects) for SUD and ASPD diagnoses. We believe this is attributable to SUD and ASPD diagnoses tending to be the exception rather than the rule in this forensic mental health sample (in contrast to correctional settings); not only were the diagnostic base rates lower, but ASPD and SUD are themselves inherently criminogenic and on their own accounted for higher rates of recidivism.
The results of calibration analyses demonstrated that rates of 5- and 10-year violent recidivism increased with VRAG-R scores. Formal comparison of observed violent recidivism rates within the current sample to those expected from the VRAG-R normative sample, however, demonstrated that VRAG-R scores substantially overpredicted future violence at all risk bands. The
Implications for Research and Practice
Taken together the results of this study have several clinical and correctional implications regarding policy, practice, treatment, and assessment within the Alberta NCRMD population. First, the study findings demonstrate strong discrimination properties for the VRAG-R, but significant issues with calibration in a parallel NCRMD sample. Although some caution should be exercised given that the Alberta NCRMD population was much lower risk, it is sobering that overprediction occurred even at the lowest risk VRAG-R bands and strongly indicates that baseline actuarial risk does not explain the whole picture. The VRAG-R normative sample also used a slightly more liberal recidivism criterion (i.e., violent charges/convictions or returning to hospital for a reason that would have otherwise resulted in a criminal charge for a violent offense as opposed to the higher threshold of conviction), which may partly account for some disparities. Recidivism base rates can also be impacted by other factors such as prosecutorial discretion in which charges to prosecute, law enforcement officer arrest decisions based on mental health presentation during police encounters, or access to legal representation (e.g., Legal Aid).
Second, future research should examine the predictive validity of the VRAG-R with female correctional or forensic mental health samples. Such research would not only aid knowledge about the predictive properties of the VRAG-R with women, but potentially increase the number of tools available for risk assessment and management applications with this population. Gender stratified norms, however, as seen with other tools such as the Level of Service/Case Management Inventory (LS/CMI), would be essential to avoid perpetuating the problem of overpredicting violence risk.
Third, the present findings highlight: (a) the importance of local norms for risk assessment measures and (b) the importance of conducting risk assessments as an integrated, multimeasure, multisource process that does not depend solely upon one single measure to appraise risk or to make decisions. It is clear that the VRAG-R norms would generate considerably higher projections of rates of future violence attached to scores. As such, the present findings may be taken as a set of local Alberta NCRMD norms, which likely represent more realistic portrayals of risk. Even so, the VRAG-R should not be used as a standalone measure and is likely best complemented by a dynamic measure, and in practice this often occurs. For instance, Olver and Sewall (2018) found that a measure of sexual violence risk incremented VRAG-R predictions, and that logistic regression could be used to model recidivism estimates incorporating treatment change information. The findings of this study also support the assertion of Canadian researchers elsewhere (Goossens et al., 2019; Grantham, 2014; Charette et al., 2015; Haag et al., 2021; Lacroix et al., 2017) that there is very little evidence that legislative change (i.e., Bill C-14) was needed to protect public safety, given the lower levels of risk than general correctional samples and low rates of recidivism.
Strengths, Limitations, and Conclusions
This study has important strengths and limitations. One potential limitation is that this study was archival and retrospective in nature, and invariably, the researchers were at the mercy of quality and quantity of information available on file. In some cases, the files were simply limited and insufficient, particularly with older files (e.g., circa 1940–1970), and the VRAG-R could not be coded. Relatedly, not all files could be retrieved for individuals identified NCRMD and were similarly excluded from this study. Second, NCRMD cases resulting in an immediate absolute discharge were not captured by this database given that they would not have come under Alberta Review Board jurisdiction and no file would be created; importantly such cases are very rare, yet still, this small select portion of the NCRMD population was unaccounted for in this research. Finally, during the study’s 70-year catchment period many changes have occurred in NCRMD legislation, as well as diagnostic classification, such as the various iterations of the
A unique and important study strength is its representation of the entire population of persons found NCRMD in Alberta’s history who have come under the Alberta Review Board jurisdiction. Second, this study has some important firsts; it is the first cross-validation of the discrimination and calibration properties of the VRAG-R in an NCRMD population outside of Ontario. It is also the first examination of potential gender differences within this population within Alberta, specifically on an actuarial risk measure and recidivism outcomes. Finally, there are also core methodological strengths that support the integrity and stability of findings. Specifically, high-quality VRAG-R data with strong interrater agreement were collected, with comprehensive long-term outcome data captured, methodological and data conditions that are ideal to permit rigorous examination of the predictive properties of VRAG-R scores.
As understanding about individuals found NCRMD improves, policy and legislation can improve, and assessment and intervention can be tailored to manage violence risk and improve client wellbeing. Given that review boards have been recently instructed to place public safety as their paramount consideration, the results of this study can help to refine how much risk should be attributed to persons found NCRMD and create a better balance between civil liberties and public safety.
Footnotes
Acknowledgements
The authors give special thanks to Dr. Stephen Wormith for his valuable insights and suggestions on this research program. The views and opinions expressed in this article are those of the authors and do not necessarily represent those of the Ottawa or Alberta Health Authorities or the University of Saskatchewan or the University of Alberta.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
