Abstract
In 2000, this journal published an influential case–control study identifying dynamic risk factors for sexual recidivism (Hanson & Harris, 2000). In 2017, updated recidivism information for the same sample was obtained, with an average follow-up of 20 years. The current study compared the risk factors that differentiated between sexual recidivists and nonrecidivists across the two research designs: the original case–control design and the updated prospective cohort design. Of the 82 comparisons, 50 favored the prospective design and 32 favored the case–control design; however, most of the differences were small and nonsignificant. Static and dynamic risk factors were approximately equivalent between study designs. Factors identified as sex-specific (e.g., sexual deviancy) were also equivalent between designs, whereas general risk factors (e.g., substance use) were more likely to be identified in the prospective design. Overall, case–control studies can be used for the identification of risk factors, especially for low base rate behaviors such as sexual recidivism.
Evidence-based corrections requires knowledge of recidivism risk and protective factors (i.e., characteristics empirically associated with the outcomes of interest; Bonta & Andrews, 2024; Hanson, 2009). One of the following two research designs is typically used to identify such factors: (a) prospective cohort designs (i.e., assessment of factors at Time 1 are correlated with the outcome of interest at Time 2), and (b) case–control designs (i.e., factors are compared between a group that has experienced the outcome of interest and a group that has not, without a follow-up period). Of the two, prospective cohort designs are usually preferred because they can clearly establish the temporal order of the variables (Mann, 2003; Sedgwick, 2013). Temporal order is necessary for applied prediction tasks and is a key criterion for causal theories (Kraemer et al., 1997). In addition to concerns about temporal order, case–control studies are also influenced by the variables used to match cases (Meehl, 1970). Case–control studies are, nonetheless, widely used because they can be completed relatively quickly, particularly for low base rate behaviors (Rothman et al., 2008; Song & Chung, 2010), such as sexual recidivism.
More than twenty years ago, this journal (Criminal Justice and Behavior) published a highly influential case–control study of dynamic risk factors for sexual recidivism (Hanson & Harris, 2000). The Hanson and Harris (2000) study has been widely cited, with over 1,300 citations in Google Scholar as of July 2024, including 36 citations in 2023 alone, two decades after it was originally published. The study also informed the development of the STABLE-2007 and ACUTE-2007 (Hanson et al., 2007) sexual recidivism risk tools, which are among the most used in Canada and the United States (Bourgon et al., 2018; McGrath et al., 2010). But given the inherent limitations of case–control studies, how much should we trust the Hanson and Harris results? More generally, to what extent do the recidivism risk and protective factors identified in any case–control study replicate in prospective studies? Although many factors have been examined in both case–control and prospective studies, the studies have been different (one study of variable x is case–control; another is prospective). Consequently, it is hard to tell whether any observed differences in the outcome of these two types of studies can be attributed to differences in the initial sample selection, location, or any of a myriad of other, unmeasured features varying across research studies. We are unaware of any previous study of recidivism risk factors directly comparing the results of a case–control design and a truly prospective design using the same variables, with the same sample, in the same setting. The current study fills this gap by comparing risk factors differentiating sexual recidivists and nonrecidivists in Hanson and Harris’ (2000) original case–control study to the factors predicting sexual recidivism in a 20-year prospective follow-up of the same sample.
Overview of Prospective and Case–Control Designs
Prospective designs can take many forms; however, prospective cohort studies are commonly used when studying recidivism risk factors. Prospective cohort studies start with a defined group who vary on a characteristic (or set of characteristics); after a follow-up period, some of the group members will have the outcome and some will not. Those with and without the outcome are then compared on the original factors that were measured at baseline (Mann, 2003; Sedgwick, 2013). Temporal order is established because the variables are measured prior to the outcome, limiting the risk of recall bias. Although prospective cohort studies are often considered the gold standard in prediction studies, they are resource and time intensive. Truly prospective cohort designs are particularly problematic when assessing outcomes that occur rarely (e.g., suicide) or manifest over a long period (e.g., life expectancy of infants born in 2024; Mann, 2003; Sedgwick, 2013). Furthermore, prospective cohort designs are susceptible to bias stemming from attrition between timepoints, especially if the reason for the attrition is associated with the factors assessed at baseline (Sedgwick, 2013).
Like prospective cohort designs, case–control studies compare those with and without the outcome on a predefined set of variables. The key difference is that those with the outcome are selected because they have the outcome. There are several different methods of selecting those without the outcome (i.e., the control group), usually involving some form of matching from a larger pool of potentially eligible cases (Song & Chung, 2010). For example, researchers could compare all cases of teen suicide in a metropolitan area with a random sample of teenagers from the same city during the same time period. Because researchers determine the number of cases in each group, case–control studies make it possible to study very low frequency outcomes, as well as outcomes that occur many years later (Rothman et al., 2008; Song & Chung, 2010). In general, case–control studies require fewer participants for equivalent statistical power, allow for the use of existing file information, and can be conducted more quickly and easily than prospective designs.
As with all observational studies, both prospective cohort and case–control studies are susceptible to the influence of unmeasured, third variables. Prospective cohort studies often address this problem by measuring a wide range of variables and then using them as statistical controls. In the case–control design, it is common to match cases on variables that could influence the outcome but are not the focus of the investigation (e.g., age, gender, geographic location, income; Cologne & Shibata, 1995; Meehl, 1970). Although matching is common practice in case–control studies, it is not without limitations. Any variable used as a matching variable cannot be used to compare the groups on the outcome of interest (by design, the effect would be zero). Less obviously, matching decreases the perceived effects of correlated variables. If, for example, low income is related to high criminal behavior, matching cases on age would decrease the association between income and crime because young adults typically make less money than older adults. Furthermore, as Meehl (1970) described, matching cases on any measured variable, m, would be expected to systematically mismatch the cases on some other, unknown variable, z. By controlling for variable m, the level of the mismatch on variable z could be quite high.
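The attenuation described above can be illustrated with a toy simulation. All variable names and coefficients here are hypothetical (not drawn from the study): income partly depends on age, crime propensity depends on income, and conditioning on narrow age strata (a stand-in for age-matching) shrinks the observed income–crime correlation.

```python
import math
import random

def corr(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def matching_attenuation_demo(n=20000, seed=1):
    """Compare the income-crime correlation in a full cohort with the
    average correlation inside 5-year age bands (a proxy for matching
    cases on age). Hypothetical data-generating process throughout."""
    rng = random.Random(seed)
    ages = [rng.uniform(18, 65) for _ in range(n)]
    # Income rises with age, plus individual noise (arbitrary units).
    incomes = [0.8 * a + rng.gauss(0, 5) for a in ages]
    # Lower income -> higher crime propensity, plus noise.
    crimes = [-0.5 * inc + rng.gauss(0, 5) for inc in incomes]

    overall = corr(incomes, crimes)

    # "Match" on age: correlate income with crime within each age band,
    # then average across bands.
    within = []
    for lo in range(18, 65, 5):
        idx = [i for i, a in enumerate(ages) if lo <= a < lo + 5]
        if len(idx) > 2:
            within.append(corr([incomes[i] for i in idx],
                               [crimes[i] for i in idx]))
    matched = sum(within) / len(within)
    return overall, matched
```

Because age explains part of the variance in income, removing age variation (by stratifying) leaves less income variance to covary with crime, so the within-band correlation is weaker than the cohort-wide correlation.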
Risk Factors for Sexual Recidivism
The effect matching has on the underlying constructs is difficult to anticipate (Meehl, 1970). In the context of sexual recidivism risk assessment, two major constructs have been identified: (a) sex-crime-specific factors and (b) general antisociality (Babchishin et al., 2012; Brouillette-Alarie et al., 2016; Lehmann et al., 2013). Sex-crime-specific factors describe problems with atypical sexual interests and poor sexual self-regulation, contributing to sexual crime but not necessarily to nonsexual crime. The sex-crime-specific dimension is exemplified by characteristics such as early onset of sexual offending, engaging in diverse sex crimes, atypical sexual interests (e.g., pedophilia, exhibitionism), sexual preoccupation, and emotional identification with children (a correlate of pedophilia). The general antisociality dimension describes a general propensity for rule-breaking (sexual or otherwise), and includes factors such as adjustment problems in childhood, poor self-regulation and problem-solving, antisocial attitudes and personality traits (e.g., callousness), and relationship conflicts (Hanson & Bussière, 1998; Hanson & Morton-Bourgon, 2004; Helmus & Thornton, 2015).
In their case–control study, Hanson and Harris (2000) examined both static (historical, largely unchangeable) and dynamic (potentially changeable) risk factors in a sample of slightly over 400 men with a history of sexual offending who had served part of their sentence on community supervision. Sexual recidivists (n = 208) and nonrecidivists (n = 201) were matched on certain demographic characteristics (gender, geographic location, jurisdiction), as is customary in case–control studies. Given Hanson and Harris (2000) were primarily interested in dynamic risk factors, they also attempted to match cases on sex-crime-specific static risk factors (age, victim type, sexual criminal history). As well, they approximately matched on sexual recidivism risk level using the Rapid Risk Assessment for Sex Offense Recidivism (RRASOR; Hanson, 1997). Overall, Hanson and Harris (2000) found that the static factors that best differentiated recidivists from nonrecidivists were scores on validated risk assessment measures for general violence (e.g., VRAG, Quinsey et al., 1998), indicators of sexual deviance (e.g., number of paraphilias, victim type), IQ, and history of childhood maltreatment (e.g., sexual abuse and neglect). Examples of promising stable dynamic factors were antisocial attitudes (e.g., sees self at no risk for recidivism), poor social influences, and sexual entitlement. Because the groups were matched on sex-specific static factors, the effect sizes of the general risk factors (e.g., antisocial attitudes) may have been inflated; conversely, the variables correlated with the matching variables would be expected to show reduced effects compared to what they would have shown in prospective studies. Any of the variables explicitly used in matching (e.g., RRASOR scores) could not be meaningfully tested. Consequently, it is an open question whether the same types of risk factors (i.e., sex-crime-specific vs. general) identified through this case–control design would replicate in a prospective cohort design.
Current Study
In 2017, the data set created by Hanson and Harris (2000) was updated to include prospective recidivism information for the same individuals from the original study. The current study compared the predictive accuracy of static and stable dynamic risk factors identified through a case–control design versus a prospective cohort design. Overall, we expected more differentiation between recidivists and nonrecidivists in the prospective cohort design than in the case–control design because the prospective design would have less matching bias. Given that the original study had matched on sex-crime-specific factors, we expected relatively larger effects from sex-specific risk factors in the prospective study and relatively larger effects from general antisociality factors in the case–control study.
Methods
Participants
The original sample comprised 409 men who were supervised in the community for a sexually motivated offense between 1992 and 1997 (Hanson & Harris, 2000). Cases were selected from all Canadian provincial correctional systems (except Prince Edward Island) and all regions of the Correctional Service of Canada. All the men had been convicted of sexual assault involving physical contact against a person who was not in their immediate family (i.e., cases of offending against their own children/stepchildren were excluded, as were those with purely noncontact sexual offense histories). In the original study, 208 were classified as having recently sexually recidivated while on community supervision, whereas 201 were currently on supervision without any known reoffense as of the 1997 data collection. Recidivism was indexed by charges for sexual offenses (68%), violations of community supervision for sexually motivated behavior (26%), and nonsexual charges with sexual motivations (e.g., stealing underwear; 6%). Most of the sexually motivated violations were for sex crime behaviors (e.g., exhibitionism, sexual contact with a minor) that eventually resulted in charges and convictions; however, some of the community supervision violations were for behavior that indicated that a new sexual offense was imminent, but not actualized (e.g., approaching victims, possession of rape kits). None of the original sexual recidivism events included purely technical violations, such as nondisclosure of consenting intimate relationships or possession of legal pornographic materials.
Once a recidivism case was identified, nonrecidivist cases were selected from the same location (e.g., Metro X Probation Office) and jurisdiction (provincial/federal). Among the available cases at that office, the nonrecidivist case was selected that minimized static differences between the recidivist and nonrecidivist groups. As well, an effort was made to match on other salient characteristics, such as major mental illness (e.g., schizophrenia) and ethnicity (e.g., Indigenous status). Finally, the researchers aimed to match on the risk for sexual recidivism using RRASOR (Hanson, 1997) total scores.
Updated criminal history information obtained in 2017 was used to identify subsequent recidivism events for the nonrecidivist group and to correct errors in the original classification. The most common error was pseudo-recidivism (i.e., individuals had new charges for behavior that predated the beginning of their supervision period), resulting in 25 cases being reclassified from recidivists to nonrecidivists. As well, two cases were removed because they did not fit the sampling frame of the original study (one case with a very old recidivism event; one case in which nonsexual offending was mistaken for a sexual offense). Consequently, the original 409 cases were reclassified as 180 recidivists and 227 nonrecidivists (n = 407) for the purpose of the case–control analyses. As a result, some cases that were (supposedly) matched in the original study were not matched in the current sample (e.g., a particular probation office may have contributed two nonrecidivists rather than a matched set of one recidivist and one nonrecidivist).
The prospective analyses considered only the 227 men who were classified as nonrecidivists in 1997. Although some of these men had prior sexual and nonsexual offenses, all had successfully completed at least six months of community supervision (average of 24 months) at the time of data collection. The median at-risk date for the nonrecidivists was 1996 (range from 1981 to 2002) in the updated data set. Subsequent criminal history information identified 57 of these 227 men as having committed a new sexual offense. Recidivism information was not available for 14 cases. Consequently, the prospective study compared 57 men who were known to have sexually reoffended after their assessment date in 1997 to 156 men who had no new recidivism events as of the follow-up end date in 2017.
At time of release, the average age of the complete sample (n = 407) was 39.4 years (SD = 11.4, range from 18 to 74). Most of the men were born in Canada (87.3%) and White (85.0%; 8.8% Indigenous; 2.9% Black; 3.3% unknown/other). Median education level was Grade 10. Based on the Static-99R (Phenix et al., 2017), the sexual recidivism risk of this group was above average: Very low risk (1%); Below average risk (4.9%); Average risk (28.7%); Above average risk (30.7%); Well above average risk (34.6%).
Original Data Collection
The selection of variables was guided by Andrews’ (1994) work on criminogenic factors, Hanson and Bussière’s (1996, 1998) meta-analyses on sex-crime-specific factors, and consultations with researchers, correctional managers, and community supervision officers. The final data collection instrument benefited from focus groups with experienced probation/parole officers and extensive pilot testing. The original project obtained ethics clearance from 14 institutional review boards. Selection and training of the field team members who would be completing interviews followed Canada-wide position advertising, resumé review, and in-person interviews including role-plays and mock telephone cold calls. Ecological validity was increased by including an active-duty probation officer on the interview team (seconded to the data collection team). All coders received a week-long in-person training that involved mock interviews, practice scoring on actual de-identified cases, and team-building exercises. Once data collection started, the team was actively supported by frequent telephone communication, periodic onsite visits, and an ever-evolving e-mail “Decision Log” that addressed ongoing coding issues and other guidance. To increase the likelihood that the information concerning the predictor variables described the case before the recidivism event, coders first established two critical time points in the interviewees’ lives, one in the month preceding the recidivism event and another six months earlier (e.g., changed jobs; returned from maternity leave; experienced a severe weather event). The interviewees were asked to link their responses about the recidivist/nonrecidivist cases to these two critical time periods in their own lives. Coding of contemporaneous contact notes was completed using the same interview coding form as the officer interview, utilizing the same two critical time periods. Data were then couriered to Ottawa for double-entry data punching.
Further procedural details can be found in Hanson and Harris (1998, 2000).
Updated Recidivism
Two graduate-level research assistants working for Public Safety Canada linked recidivism information obtained in 2017 to the previously collected case information. Recidivism events were primarily identified by Canadian Police Information Center (CPIC) criminal history records held by the Royal Canadian Mounted Police. These records were supplemented by news articles searched via Google, identifying seven additional recidivism events (five contact, one noncontact, one sexually motivated violation). Sexual recidivism was based on new charges (45 cases) or sex-related community supervision violations (12 cases). Fifteen of the 57 recidivism cases involved only noncontact sexual offending (e.g., voyeurism, exhibitionism, child sexual exploitation materials). Follow-up time began at the date of assessment and ended with death (n = 18 of the 213; median death date of 2010, range from 1998 to 2016), deportation (n = 1), the date of the most recent CPIC record (2017), or the start of a period of incarceration that extended into one of these dates. The average follow-up time was 20.6 years (median = 20.7, SD = 4.1). At-risk time excluded periods of incarceration for nonsexual offenses.
Classification of Risk Factors
All three authors independently categorized the list of risk factors from Hanson and Harris (1998, 2000) as either sex-crime-specific, general, or neither (see S1 in the Supplemental Materials). Using Fleiss’ (1971) kappa, interrater agreement was good for the sex-specific (kappa = 0.671) and general (kappa = 0.605) categories, but poor for the other/neither category (kappa = 0.364). Given that kappa is heavily influenced by post hoc endorsement rates (penalizing infrequently used categories), interrater agreement was also indexed using the Brennan-Prediger coefficient (Moss, 2023), which indicated good agreement for all three categories (0.700 for sex-specific; 0.606 for general; and 0.718 for other/neither). Consensus was reached on any variable without perfect agreement. Reliability coefficients were calculated using the irrCAC package in R (Gwet, 2019).
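The contrast between kappa and the Brennan-Prediger coefficient can be made concrete with a small sketch. This is one common multi-rater formulation (average pairwise agreement against a fixed 1/q chance level), not the authors' code (they used the irrCAC package in R), and the demo ratings are hypothetical.

```python
def brennan_prediger(ratings, n_categories):
    """Multi-rater Brennan-Prediger agreement coefficient.

    `ratings` is a list of items, each a list of the category labels
    assigned by every rater. Observed agreement is the average
    proportion of agreeing rater pairs per item; chance agreement is
    fixed at 1/q for q categories, so (unlike kappa) rarely endorsed
    categories are not penalized.
    """
    total = 0.0
    for item in ratings:
        r = len(item)
        pairs = r * (r - 1) / 2
        agree = sum(1 for i in range(r) for j in range(i + 1, r)
                    if item[i] == item[j])
        total += agree / pairs
    p_obs = total / len(ratings)
    p_chance = 1 / n_categories
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical ratings by three coders into three categories
# (sex-specific, general, neither) -- illustrative only.
demo = [
    ["sex", "sex", "sex"],
    ["sex", "sex", "general"],
    ["general", "general", "general"],
    ["sex", "general", "neither"],
]
```

Because the chance term is a constant 1/q rather than a function of the observed marginal rates, a category used on only a handful of variables (like other/neither here) can still show good agreement.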
Data Use
The current study was conducted under a data sharing agreement with Public Safety Canada. This study received ethics approval from the Dalhousie University Research Ethics Board (#2021-5662). The recidivism data set used in this study has been previously used by Aelick et al. (2020) in a study of mental health variables, Lee et al. (2020) in a meta-analysis on the predictive validity of sexual recidivism risk tools for individuals of Indigenous heritage in Canada, and Hanson et al. (2024) in a study of sexual recidivism rates.
Analysis Plan
Differences between the recidivists and nonrecidivists were computed for the case–control and prospective cohort designs. The question of primary interest was the difference between these differences (i.e., differences in effect sizes). The effect size for ordinal (e.g., no = 0, possibly = 1, yes = 2), count (e.g., number of prior offenses), and interval-like variables (risk scale scores, age) was the area under the curve (AUC), which is insensitive to base rates and extreme values (Ruscio, 2008). For dichotomous variables, the effect size was the logged odds ratio, with 0.5 added to each cell as a variance stabilization adjustment (Fleiss et al., 2003; Hanson, 2022). These were calculated in logit metric and reported in odds ratio metric for ease of interpretation. When interpreting the effect sizes, the following benchmarks can be used: small (.56 to .63), moderate (.64 to .70), and large (> .71) for AUCs (Rice & Harris, 2005); small (.71 to .43; 1.4 to 2.3), moderate (.44 to .27; 2.4 to 3.7), and large (< .27; > 3.7) for odds ratios (see Hanson, 2022).
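The two effect size metrics can be sketched as follows. This is a minimal illustration, not the authors' code: a rank-based AUC (probability that a randomly chosen recidivist scores above a randomly chosen nonrecidivist, ties counted as half) and the logged odds ratio with 0.5 added to each cell.

```python
import math

def auc_rank(recid_scores, nonrecid_scores):
    """AUC as the probability that a randomly chosen recidivist scores
    higher than a randomly chosen nonrecidivist (ties count as 0.5)."""
    wins = 0.0
    for r in recid_scores:
        for n in nonrecid_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recid_scores) * len(nonrecid_scores))

def log_odds_ratio(a, b, c, d):
    """Logged odds ratio for a 2x2 table with 0.5 added to each cell
    as a variance stabilization adjustment (a = recidivists with the
    factor, b = recidivists without, c = nonrecidivists with,
    d = nonrecidivists without). Returns the log OR and its SE."""
    a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se
```

`math.exp(lor)` converts back to the odds-ratio metric reported in the tables.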
The differences between the effect sizes were reported in several ways. First, we simply counted the number of times that the effect size was larger (in the expected direction) in the case–control analyses compared to the prospective cohort analyses. When the effect sizes were larger in one design than the other, we considered that the results favored that design. Next, we counted the number of times the magnitude of these differences was larger than expected by chance based on the standard error of the differences.
The generic form for the standard error of a difference score is as follows (Ley, 1972): σ(a−b) = √(σa² + σb² − 2rσaσb), where σa and σb are the standard errors of the two effect size estimates and r is the correlation between them.
In our study, r (the correlation between the variables) was computed separately for the AUC values (r = 0.7026, k = 51) and the logged odds ratios (r = 0.5497, k = 33). The variances for the AUC values were calculated using IBM/SPSS Version 29.0.2.0; the variances of the logged odds ratios were calculated in Microsoft Excel using the standard formula with 0.5 added to each cell (Hanson, 2022). Ninety-five percent confidence intervals for the differences were calculated as ± 1.96σa-b. When the confidence intervals do not include zero, they are statistically significant at the .05 alpha level. These confidence intervals should be considered approximate because the application of the normal probability distribution to Ley’s (1972) formula assumes equal variances (at the population level), which is unlikely to be the case for effect sizes based on different sample sizes. Nevertheless, the relative consistency of sample sizes within each design made the standard error of the differences a useful metric for identifying plausible differences across designs. No correction was made for multiple comparisons because keeping the p value at .05 provided a more sensitive test of the differences of effects between the two designs. All numbers in the case–control and prospective cohort analyses were verified independently by the first and second authors.
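The confidence-interval computation described above can be sketched as follows. The inputs in the usage example are illustrative values, not the study's per-variable estimates.

```python
import math

def diff_ci(es_a, var_a, es_b, var_b, r, z=1.96):
    """95% CI for the difference between two correlated effect size
    estimates, using the standard error of a difference score:
    sigma(a-b) = sqrt(var_a + var_b - 2 * r * sd_a * sd_b)."""
    se = math.sqrt(var_a + var_b
                   - 2 * r * math.sqrt(var_a) * math.sqrt(var_b))
    diff = es_a - es_b
    return diff - z * se, diff + z * se
```

For example, with hypothetical AUCs of .55 and .60, equal variances of .01, and r = .7026, the resulting interval spans zero, so that difference would not be significant at the .05 level.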
Results
Overall, the relationship of the variables to sexual recidivism was similar in the case–control and prospective cohort studies, with the expected trend toward larger effects in the prospective design. Of the 82 comparisons, 50 favored the prospective design (i.e., were larger) and 32 favored the case–control design. However, most of these differences were small. Only 27 of the 82 comparisons were statistically significant (17 favoring prospective; 10 favoring case–control; using p < .05 without correction for multiple comparisons). The same patterns were found for the static, historical variables (Table 1: 30 favored prospective; 20 favored case–control) and the stable dynamic variables (Table 2: 21 favored prospective; 11 favored case–control). Given that the criminal history variables were used to match cases, it is not surprising that all eight of the criminal history variables in Table 1 showed larger effects in the prospective study than the case–control study; five of these differences were statistically significant. The RRASOR, which also guided case matching, showed an AUC of .55 in the case–control study and .60 in the prospective study.
Effect Sizes for Sexual Recidivism of Static, Historical Variables in a Case–Control Design Compared to a Prospective Design
Note. AUC used for ordinal and interval variables; odds ratios used for dichotomous variables. All differences were computed such that positive values (for AUC values) and ratios greater than one (for odds ratios) indicate the effects were larger (in the expected direction) for the case–control design than for the prospective design. For the standard error of the differences, the correlation was estimated at 0.7026 (n = 51) for the AUC values and 0.5497 (n = 33) for the logged odds ratios. Differences for AUC values are not significant when the confidence interval includes zero. Differences in the odds ratios are not significant when the confidence interval for the ratio includes 1. Bolded values indicate that the comparison was statistically significant (p < .05). AUC = area under the curve; CI = confidence interval; VRAG = Violence Risk Appraisal Guide; RRASOR = Rapid Risk Assessment for Sex Offense Recidivism.
Difference not calculated because there was no expectation as to the direction of the effect.
Effect Sizes for Sexual Recidivism of Stable, Dynamic Variables in a Case–Control Design Compared to a Prospective Design
Note. AUC used for ordinal and interval variables; odds ratios used for dichotomous variables. All differences were scaled such that positive values (for AUC values) and ratios greater than one (for odds ratios) indicate the effects were larger (in the expected direction) for the case–control design than for the prospective design. For the standard error of the differences, the correlation was estimated at 0.7026 (n = 51) for the AUC values and 0.5497 (n = 33) for the logged odds ratios. Differences for AUC values are not significant when the confidence interval includes zero. Differences in the odds ratios are not significant when the confidence interval for the ratio includes 1. Bolded values indicate that the comparison was statistically significant (p < .05). AUC = area under the curve; CI = confidence interval.
Difference not calculated because there was no expectation as to the direction of the effect.
Of the 82 variables, 62 were classified prior to analyzing the data as either sex-crime-specific risk factors (28 variables) or general risk factors (34 variables; see S1 in Supplemental Materials). Of the 28 sex-crime-specific variables, equal numbers showed larger effects in the case–control design (14 variables) and the prospective design (14 variables); only seven of these comparisons were statistically significant (four favoring case–control; three favoring prospective). For the 34 general crime variables, the effects tended to be larger in the prospective design (21 variables) than in the case–control design (13 variables); again, only a small number of these comparisons were statistically significant (seven favoring prospective; four favoring case–control). This pattern of results was opposite to the pattern we predicted (we had expected the sex-crime-specific variables to show larger effects in the prospective design than in the case–control design). Tables summarizing the findings can be found in S2 and S3 of the Supplemental Materials.
Discussion
The choice of a specific research method is guided by the strengths and limitations of the different approaches, their costs, and the available resources. When identifying risk factors for sexual recidivism, case–control studies are often used because they are relatively easy to implement and require fewer resources (e.g., time, money) than prospective studies. Despite these advantages, prospective studies are still considered the gold standard because the temporal order between risk factors and the outcome of interest is easily established. But to what extent are the findings between the two research designs equivalent? To our knowledge, no other study of sexual recidivism risk factors has compared the results obtained from a case–control study to those of a prospective follow-up of the same sample. Our results support the following conclusions: (a) case–control and prospective cohort designs can provide similar information on risk factors for sexual recidivism; (b) matching on specific factors in case–control studies can have unexpected effects on the observed relationships; and (c) the factors identified through both designs are largely consistent with the broader literature on sexual recidivism risk.
Case–Control vs. Prospective Cohort Designs
Given the strengths associated with prospective cohort designs (e.g., established temporal order), we expected that more risk factors would differentiate sexual recidivists from nonrecidivists in the prospective analyses compared to the original case–control study. Although there were more comparisons that favored the prospective cohort rather than case–control design, the differences were small and mostly nonsignificant. There also did not appear to be any consistent pattern in whether static or stable dynamic risk factors emerged in either research design. Slightly more of the comparisons of stable dynamic risk factors favored the prospective rather than the case–control design; however, once again, these differences were small and mostly nonsignificant. These results provide support for the ability of case–control designs to identify risk factors for sexual recidivism. Not all case–control studies, however, are created equal: it is necessary to consider the quality of the study. In this case, Hanson and Harris (2000) relied on information from multiple sources (e.g., file information, interviews with parole officers) and made a concerted effort to establish the temporal order of factors, which may have contributed to the replicability of the original findings using a prospective design.
Given that Hanson and Harris (2000) matched on static sex-specific factors, we expected that this matching would amplify general risk factors in the case–control design whereas sex-specific risk factors would be more prominent in the prospective cohort design. Unsurprisingly, the factors that were used in matching for the case–control design (e.g., victim type—offended against boy; sexual offending history) showed stronger effects in the prospective design than in the case–control design. Contrary to expectations, however, sex-specific risk factors were otherwise equally distributed between the case–control and prospective designs; instead, general risk factors were more likely to be identified in the prospective cohort design. Our assumptions concerning the effect of matching on sex-specific factors in the case–control study were, in this case, completely wrong!
These findings highlight a key criticism of case–control studies: the effect that matching on specific risk factors will have on other, correlated factors is difficult, if not impossible, to discern a priori (Meehl, 1970). The goal of matching on static, sex-specific risk factors in the original Hanson and Harris (2000) study was to amplify the effects of dynamic risk factors, given that identifying dynamic factors was the main goal of the study. However, the two main constructs predicting sexual recidivism (sex-crime-specific and general antisociality) are likely correlated and share common variance (Babchishin et al., 2012). The unanticipated consequence of this matching may have been to suppress both sex-crime-specific and general antisociality factors in the case–control study. Despite this limitation, the pattern of results for individual risk factors was surprisingly consistent across both designs. Although the strength of the effect size sometimes changed, the direction of the relationship between the risk factor and sexual recidivism was generally consistent across both research designs, which provides further support for the use of case–control studies in the identification of risk factors for sexual recidivism.
Risk Factors Overall
The risk factors that differentiated between sexual recidivists and nonrecidivists across both study designs were largely consistent with the broader literature on risk and protective factors for sexual recidivism (Hanson & Bussière, 1998; Hanson & Morton-Bourgon, 2004; Helmus & Thornton, 2015). When examining static, sex-specific risk factors, for example, indicators of sexual deviancy, victim type (e.g., boys), and sexual offending history were all related to sexual recidivism. When examining static, general risk factors, indicators of antisocial personality (e.g., psychopathy scores, diagnosis of antisocial personality disorder) were good predictors of sexual recidivism in both designs. Somewhat surprisingly, the relationship between the VRAG and sexual recidivism, which was among the largest in the case–control design, became even larger in the prospective design, despite the scale not having been designed to predict sexual recidivism. The VRAG and its updated version (VRAG-R; Rice et al., 2013) have shown significant discrimination between sexual recidivists and nonrecidivists in past studies (.62 and .63, respectively; Olver & Sewall, 2018); however, the effect size in the prospective cohort design was larger (.76 compared with .62 in Olver and Sewall's [2018] study). Although the VRAG may be related to sexual recidivism, the recommendation, based on the results of larger meta-analyses, is to use tools that were designed to predict sexual recidivism (e.g., Hanson & Morton-Bourgon, 2009).
Sex-specific dynamic factors that emerged in both designs included attitudes supportive of sexual offending (i.e., rape, child molestation, and sexual entitlement), victim access, and being on antiandrogen medication. Dynamic general risk factors that significantly predicted sexual recidivism were consistent with the Central Eight risk/need factors (Bonta & Andrews, 2024) and included substance use, negative social influences, and antisocial lifestyle.
Somewhat inconsistent with the broader literature on sexual recidivism risk was the positive and significant relationship between childhood sexual abuse and sexual recidivism across both research designs. Although the rates of childhood sexual abuse are relatively high among men who have committed a sexual offense compared to the general population (Whitaker et al., 2008; evidence from case–control studies), previous prospective studies have not found a consistent association between a history of sexual abuse and subsequent recidivism. Meta-analyses of adult samples have found no association between childhood sexual abuse and sexual recidivism (e.g., Hanson & Bussière, 1998), whereas at least one meta-analysis of youth samples has found a significant association between childhood sexual abuse and sexual recidivism (Mallie et al., 2010). Further examination of this potential relationship is required and may have implications for the type of interventions adopted for individuals with abuse histories (e.g., trauma-informed care; Dalsklev et al., 2021).
Other findings may have implications for the scoring of specific tools designed to predict sexual recidivism. Consistent with other studies (Hanson & Thornton, 2003), the number of sexual index offenses was not significantly related to sexual recidivism, whereas a history of past sexual offenses was. This pattern of results supports the decision by scale developers to exclude considerations of index offenses and, instead, focus on the density of past sexual offending (e.g., Static-99/R [Hanson & Thornton, 2000; Helmus et al., 2012] and Static-2002R [Hanson & Thornton, 2003; Helmus et al., 2012]). Another finding with potential scoring implications pertained to the interaction between the sex and age of the victim. Although having male victims has long been accepted as an established risk factor for sexual recidivism, the effect was only evident for boy victims; adult male victims showed the opposite effect, indicating lower risk. If this pattern holds, it will have consequences for the scoring of "male victim" items in sexual recidivism risk tools, such as Static-99R, Static-2002R, Risk Matrix-2000/S (Thornton et al., 2003), and the Minnesota Sex Offender Screening Tool-4 (MnSOST-4; Duwe, 2019), all of which have an item for male victims that does not differentiate between boys and men.
Implications
Meehl (1970) doubted the utility of case–control designs: "To put it most extremely, the so-called ex post facto 'experiment' is fundamentally defective for many, perhaps most, of the theoretically significant purposes to which it has been put" (p. 374). Although case–control studies introduce a certain amount of uncertainty, especially concerning the effects of matching, the results of the current study do not support Meehl's dire claim. On the contrary, risk factors identified in the case–control design tended to also emerge in the prospective design; for no variable was there a reversal of a risk factor between the two designs (i.e., statistically significant in different directions). Differences between the designs were usually small and mostly nonsignificant, indicating that case–control studies can and should be used to identify risk factors, especially for outcomes where prospective designs may be impractical, time-consuming, and expensive.
Of course, not all case–control studies are created equal; the quality of the study design will matter for the validity of the results. A primary concern with case–control studies is that the ease with which they can be implemented may lead to less rigor than the design requires (Rothman, 1986; Schulz & Grimes, 2002). Several recommendations can be made to strengthen the case–control study design. The most important consideration is ensuring equivalency between the two groups (Schulz & Grimes, 2002; Song & Chung, 2010). To control for systematic differences, both groups should come from the same population, location, and setting. In other words, the control group should be representative of the population from which the individuals who experienced the outcome of interest were drawn (Schulz & Grimes, 2002).
Once cases are preselected, additional individual-level matching variables may be necessary to control for factors that could influence the results but that have no substantive connection to the research questions (e.g., age, sex, any factor that is considered a confound; Song & Chung, 2010). Caution should be taken and a strong rationale provided for matching on these factors given that the overall effect that matching will have on the comparisons between the groups cannot be fully known. It should also be noted that the limitations associated with matching do not simply disappear in prospective cohort designs. Preselection still occurs, and the effects of this preselection on the results are also uncertain (Mann, 2003).
Another consideration in case–control designs is the quality and the sources of information used to identify and measure the risk factors and outcome of interest. Whenever possible, multiple sources of information should be used, and every attempt should be made to establish that risk factors occurred before the outcome of interest. In the case of Hanson and Harris (2000), information was gathered from extensive file review and through interviews with probation/parole officers. Furthermore, additional steps were taken to establish temporal ordering of the factors and the sexual recidivism outcome. For example, during interviews, parole officers were specifically asked to think about their clients 6 months prior to the current interview and to compare any changes in the factors. Case–control studies that are carefully planned are those most likely to provide reliable and valid results (Schulz & Grimes, 2002).
In the current study, some of the observed differences between the effect sizes for the case–control and prospective studies would have been due to differences in study design and some would have been due to random error. Although prospective designs generally have more credibility than case–control designs, we do not recommend that evidence-based practitioners privilege findings from one or another of the research designs in the current study because most of the apparent differences are likely attributable to random error. Instead, we recommend that practitioners look to meta-analytic reviews for the best evidence of risk and protective factors. The current findings do suggest, however, that case–control studies should be included in systematic and meta-analytic reviews, and that reviewers should examine study design as a moderator variable.
The current findings contrast with those of Weisburd et al. (2001), who found meaningfully different results between stronger and weaker study designs. Specifically, they found that evaluations of criminal justice programs that used randomized assignment found smaller effects, and were more likely to identify adverse effects, than observational studies. Although there is no single "scientific method," some research designs assume canonical importance in certain research communities (one meaning of Kuhn's [1970] paradigms). For example, Kraemer and colleagues (1997) restrict the use of the term "risk factor" to variables for which temporal precedence has been demonstrated, and explicitly rule out case–control studies as the basis of valid inferences concerning potentially causal factors. We think they have gone too far. In particular, evidence-based practitioners who inferred psychologically meaningful risk factors from the Hanson and Harris (2000) case–control study would have stood on solid ground. The difficulty, of course, is that all research is fallible. Scientific inquiry is a problem-solving task that cannot be reduced to any particular set of research designs (Hanson, 1958; Wolman, 1971). The current findings raise the confidence that should be accorded to case–control designs but do not diminish the importance of truly prospective studies, as well as other modes of inquiry. Scientific conclusions are strengthened by replication within a progressing, orderly program of research (Lakatos, 1970).
Limitations
Although novelty is a virtue, it is also a limitation. As far as we are aware, this is the only study of recidivism risk factors that has directly compared the results of case–control and prospective designs. Consequently, any conclusions must be tempered because they are based on only this one sample. Whether the findings would generalize to other comparisons between case–control studies and prospective cohort designs needs to be examined. One of our main recommendations is that researchers with access to case–control samples should, when possible, plan for a prospective follow-up of these same individuals. Another limitation to consider is the sample size for the prospective cohort design, which was relatively small because only the nonrecidivists from the original case–control study were included. Power to detect differences in effect sizes between the two designs was likely attenuated. Despite these limitations, the quality of the information from the original case–control study and the exceptionally long follow-up time of the prospective cohort study provide confidence in the validity of the risk factors that were identified across both designs.
Conclusion
Overall, case–control studies can and should be used when determining risk factors for low-frequency outcomes, such as sexual recidivism. The quality of the case–control study matters, and every effort should be made to collect high-quality information from multiple sources and to establish the temporal order of the factors and the outcome of interest. Although matching may be necessary in case–control designs, researchers should recognize that it is difficult to anticipate the effect that matching will have on the results. Replication studies across diverse samples are therefore needed and, when possible, samples in case–control studies should be followed up prospectively to compare results across the different methodological designs.
Supplemental Material
sj-docx-1-cjb-10.1177_00938548241291155 – Supplemental material for Where Should We Intervene, 20 Years Later? Case–Control and Prospective Cohort Designs Provide Similar Answers by Julie Blais, R. Karl Hanson and Andrew J. R. Harris in Criminal Justice and Behavior
Data Availability Statement
The current study was conducted under a data sharing agreement with Public Safety Canada.
