Sage Journals

Abstract

Multidimensional item response theory (MIRT) was used to study the construct validity of the Static-2002R, an actuarial scale for the assessment of reoffending among adult men who sexually offended. Using a sample of 2,569 individuals with a history of sexual crime, exploratory factor analysis (EFA) extracted three factors: Persistence/Paraphilia, General Criminality, and Youthful Stranger Aggression. MIRT confirmed the factor structure identified in the EFA model and provided item-level data on discrimination and difficulty. All Static-2002R items showed moderate to very high discrimination and covered a wide range of risk levels (i.e., difficulty). MIRT analyses attested to the construct validity of the scale, as no items were identified as problematic and the resulting factor structure was consistent with that of earlier studies. Considering the stability of results pertaining to the factor structure of the Static-2002R and the advantages of dimensional scoring, we recommend the integration of dimensional scores in the scale.

Keywords

item response theory (IRT), multidimensional, Static-2002R, risk assessment, sexual offending, factor analysis

Actuarial scales for the assessment of risk for criminal reoffending have typically prioritized predictive accuracy over construct validity. Following the documentation of the pitfalls of unstructured clinical judgment in the 1950s and after (Dawes et al., 1989; Grove et al., 2000; Hanson & Morton-Bourgon, 2009; Meehl, 1954), the fields of correctional psychology and criminology turned to actuarial assessment to reliably measure the level of risk of offending populations (Harris & Hanson, 2010). Actuarial scales rely on a mechanical combination of predictors and are considered “atheoretical” because they often contain a list of risk-relevant items not organized by dimensions or, alternatively, comprise conceptual dimensions that were not empirically validated by psychometric analyses (Andrews & Bonta, 2010; Bonta, 1996; Brouillette-Alarie & Lussier, 2018).

Among risk tools developed for individuals adjudicated for a sexual offense, the Static-99R (Hanson & Thornton, 2000) and Static-2002R (Hanson & Thornton, 2003; Helmus et al., 2012) are the most used internationally and inform a number of legal decisions made with this population (Archer et al., 2006; Bourgon et al., 2018; Jackson & Hess, 2007; Kelley et al., 2020; Neal & Grisso, 2014). The Static-99R comprises 10 commonly available items related to demographic characteristics, criminal history, and victim choice. The Static-2002R improves coding consistency and conceptual clarity compared with the Static-99R by organizing its 14 items into five subscales: Age at Release, Persistence of Sexual Offending, Sexual Deviance, Relationship to Victims, and General Criminality. The General Criminality subscale and age item of the Static-2002R can be combined to score the Brief Assessment of Recidivism Risk–2002R (BARR-2002R), a scale that predicts violent and general recidivism better than the Static-99R and Static-2002R and screens for antisocial tendencies among adult men who sexually offended (Babchishin et al., 2016). Both the Static-99R and Static-2002R, as their names imply, are static risk tools, which means that they comprise only static risk factors. Static risk factors are considered easier to score than dynamic factors because the data on which they rely are easily accessible in correctional files and are seen as devoid of subjectivity (Bonta, 2002; Gendreau et al., 1996). They have, however, a limited ability to inform treatment, determine parole conditions, and assess positive or negative changes in offenders’ lives (Andrews & Bonta, 2010; Bonta, 1996, 2002; Douglas & Skeem, 2005; Gendreau et al., 1996).

In the last 20 years, a number of studies have sought to identify latent constructs in static risk scales for individuals with a history of sexual crime by using latent variable models, such as factor analysis (e.g., Roberts et al., 2002; for a review, see Brouillette-Alarie et al., 2016). Clarifying the construct validity of static risk tools has many potential advantages for the field. First, it offers insight into why certain scales predict certain outcomes better than others, as this is dependent on the constructs they assess and how each construct is weighted in these scales. This, in turn, can help evaluators integrate the potentially conflicting results of similar risk scales when multiple tools are available for the same population (Barbaree et al., 2006). Second, understanding the constructs implicit in risk tools can improve predictive accuracy. Specifically, when the constructs are known, it is possible to improve the reliability and validity of their assessment using standard psychometric methods and, therefore, improve the predictive accuracy of the scale (Brouillette-Alarie et al., 2016). Finally, construct-level approaches maximize the clinical relevance of existing scales by focusing on psychological dimensions, facilitating the identification of the “source” of the risk. This enables static risk tools to give insight into the assessment of potentially changeable constructs; although the items found in these scales are not modifiable through intervention, the latent psychological constructs they represent are amenable to change (Babchishin & Hanson, 2020; Beech & Ward, 2004; Brouillette-Alarie et al., 2018). Evaluations that address psychological features are generally better received by clinicians, practitioners, and decision makers than those that only delineate the level of risk (Mann et al., 2010).

Multiple studies investigated the latent constructs of the Static-2002(R). Boughner (2010), Ennis et al. (2011), and Langton et al. (2007) studied the factor structure of the original Static-2002; Jung et al. (2017) studied the Static-2002R; and Brouillette-Alarie et al. (2016) and Rohrer (2019) studied nonredundant items from both the Static-99R and Static-2002R. These studies identified at least two constructs consistent with theoretical models of sexual recidivism (e.g., Doren, 2004): sexual criminality and general criminality. Sexual criminality was typically defined by items related to the number of sexual offenses and indicators of paraphilic sexuality (e.g., child victims, noncontact sexual offenses), while general criminality was defined by items that reflect the magnitude, violence, and diversity of criminal careers (e.g., number of prior sentencing occasions, prior nonsexual violence). In addition to these two constructs, others were found in Static-99R and Static-2002R items. Although there is no clear consensus on the number and nature of additional constructs, they seem to be mostly related to age, relationship history, violence in the index offense, and the sexual abuse of unrelated and stranger victims.

Convergent validity analyses have also been carried out to clarify the psychological meaning of these constructs (Brouillette-Alarie et al., 2018; Brouillette-Alarie & Hanson, 2015). The sexual criminality construct was correlated with atypical sexual interests/paraphilias (especially pedophilia), emotional identification with children, and grooming offending strategies. The General Criminality construct was associated with numerous features of antisocial personality disorder (American Psychiatric Association, 2013) and psychopathy (Hare, 2003). The third construct, “Youthful Stranger Aggression,” was correlated with offense seriousness and sexual sadism and was therefore interpreted as a motivation to harm victims. Sexual criminality was more prevalent among individuals who had sexually abused children, whereas the other two constructs, General Criminality and Youthful Stranger Aggression, were more common among individuals who had sexually assaulted women. All three constructs predicted sexual recidivism, but only General Criminality and Youthful Stranger Aggression predicted nonsexual types of recidivism (Brouillette-Alarie et al., 2016, 2018).

Item Response Theory (IRT) as a Tool to Study Construct Validity

Although the aforementioned studies offered valuable insight into the latent constructs of assessment tools used to predict reoffending among men with sexual offenses, they were anchored in classical test theory (CTT) models, that is, exploratory and confirmatory factor analysis (EFA/CFA). Scholars from the field of education have long advocated for the use of IRT models, as they are less sensitive than CTT models to circular dependency, that is, dependence on the overall performance of the validation sample (de Ayala, 2009). IRT was introduced in the 1950s and 1960s by Lord (1953) and Rasch (1960) to better assess item discrimination and difficulty and to create sample-free measures (Osteen, 2010). IRT models assume that the latent trait under examination, denoted theta (θ), is unidimensional; θ is analogous to the “true score” in CTT (de Ayala, 2009). In IRT, the response to an item is modeled as a function of both the participant’s standing on θ and the properties of the item. The graph of the relation between a person’s trait level and the probability that this person will endorse an item is called the item characteristic curve. Unlike CTT, IRT models do not assume a linear relationship between the two, so the item characteristic curve takes the form of an S-shaped curve (Reid et al., 2007). This feature is relevant for criminological actuarial scales because most of their items are dichotomous and, thus, better fitted by an S-shaped curve than by a linear relationship (akin to how logistic regression fits dichotomous outcomes better than linear regression does).
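To make the S-shaped item characteristic curve concrete, the Python sketch below evaluates a unidimensional two-parameter logistic (2PL) curve, one common IRT model. The discrimination a = 1.2 and difficulty b = 0.5 are hypothetical values chosen for illustration, not Static-2002R estimates.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Unidimensional 2PL item characteristic curve.

    Returns the probability of endorsing a dichotomous item for a person at
    latent trait level `theta`, given discrimination `a` and difficulty `b`.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: a = 1.2 (slope), b = 0.5 (location of the inflection point).
for theta in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
    print(f"theta = {theta:+.1f}  P = {icc_2pl(theta, a=1.2, b=0.5):.3f}")
```

The probability stays between 0 and 1 and equals .5 exactly at θ = b, whereas a linear model would eventually predict probabilities outside that range.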

Examining the structure of a scale via IRT requires that the scrutinized construct be unidimensional (de Ayala, 2009). That is, IRT assumes that item covariations arise from a single underlying dimension; violating this assumption leads to unstable IRT parameter estimates and poor model fit (de Ayala, 2009; Reise et al., 2005, 2011, 2015). This is problematic because risk scales for individuals with a history of sexual crime are multidimensional. To circumvent the unidimensionality assumption, multidimensional item response theory (MIRT) was introduced in the 1970s to analyze multidimensional scales (Reckase, 1972). MIRT is a probabilistic model of an individual’s likelihood of responding to a specific item as a function of item parameters and multiple latent traits (Reckase, 2009). This technique allows multiple latent traits to be modeled simultaneously without increasing measurement error or model instability (Reckase, 2009).

Compared with factor analysis, MIRT has numerous advantages: It overcomes the item–person confound of CTT models, allows more accurate treatment of the standard error of measurement, enables the use of dichotomous and ordinal data, and provides indices on how well each item performs (Osteen, 2010). MIRT models are tools that can be used in conjunction with factor analysis models to better triangulate evidence via the comparison of results of nonredundant analytical procedures. Note that unlike traditional EFA and CFA, item factor analysis (see Wirth & Edwards, 2007) should not be considered as CTT and shares many advantages inherent to the use of MIRT models.

To our knowledge, only one study used MIRT to explore the latent constructs of risk scales for individuals with a history of sexual crime. Allen and Pflugradt (2014) explored the factor structure of the Static-99 using Normal Ogive Harmonic Analysis Robust Method (Fraser & McDonald, 1988), a nonlinear, MIRT-based factor analysis model. They found three constructs similar to those found by Brouillette-Alarie et al. (2016). Allen and Pflugradt’s (2014) sample size (N = 451), however, was low. Sample sizes of 1,000 (Jiang et al., 2016) or 2,000 (Forero et al., 2009) are recommended for MIRT, as small sample sizes can lead to unstable model fits and unreliable item parameters. In addition, their analysis was conducted on the Static-99 rather than the Static-2002R, which is the most up-to-date version of the scale.

Current Study

We applied an MIRT model to investigate the latent structure of the Static-2002R, a risk tool for adult men who sexually offended, and compared results to what is found in traditional factor analytic models. Because sample peculiarities (incarceration vs. community, high vs. low risk, etc.) are of foremost importance in risk tool validation and use, we employed a sample of 2,569 participants that maximized the chances of being representative of multiple correctional settings while satisfying the sample size requirements of MIRT. In turn, MIRT models maximized the chances that our results were sample-agnostic and, thus, generalizable to diverse correctional settings. In the following section, we report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study.

Method

Sample

This study used datasets from a project involving the re-norming of the Static-99 and Static-2002 (for additional information on the samples and data preparation, see Helmus, 2009). These datasets constitute a nonexhaustive collection of contemporary validation studies of the Static-99, with some having Static-2002 data. These samples should be broadly representative of typical settings in which the Static-99R and Static-2002R are used. To be included in the current study, samples from the re-norming project were required to have information on all Static-2002R items. In total, seven samples were available (N = 2,569).

Descriptive information on the included datasets can be found in Table 1. Four samples were from Canada; the remaining samples were from Denmark (k = 1), the United Kingdom (k = 1), and the United States (k = 1). Three samples consisted of routine correctional samples (i.e., relatively unselected, expected to be fairly representative of the population of individuals convicted for a sexual offense), either from prisons (Bigras, 2007; Boer, 2003) or community supervision (Hanson et al., 2007). One sample was a treatment sample (Harkins & Beech, 2007), whereas the rest were more specialized: one sample evaluated for civil commitment (Knight & Thornton, 2007), one receiving specialized forensic psychiatric evaluations (Bengtson, 2008), and one detained in federal prison until the expiration of their sentence (Haag, 2005). In the full sample, the average Static-99R score was 3.1 (SD = 2.6) and the average Static-2002R score was 4.3 (SD = 2.7). For reference, a score of 3 on the Static-99R corresponds to the 66th percentile, while a score of 4 on the Static-2002R corresponds to the 64th percentile (Hanson et al., 2012). The average age at release was 39 years (SD = 12).

Table 1

Descriptive Statistics of Samples.

Variables/studies	Bengtson (2008)	Bigras (2007)	Boer (2003)	Haag (2005)	Hanson et al. (2007)	Harkins & Beech (2007)	Knight & Thornton (2007)	Total sample
n	308	425	296	190	702	182	466	2,569
Type of sample	Forensic psychiatric evaluations	Routine CSC	Routine CSC	Detained until end of sentence	Routine community supervision	Prison and community treatment	Civil commitment	NA
Country	Denmark	Canada	Canada	Canada	Canada	The United Kingdom	The United States	NA
Release period	1978–1995	1995–2004	1976–1994	1995	2001–2005	1994–1998	1957–1986	1957–2005
Median year of release	1986	1999	1990	1995	2002	1995	1970	1995
Recidivism criteria	Charges	Charges	Convictions	Convictions	Charges	Convictions	Charges	NA
Follow-up in years, M (SD)	16.2 (4.3)	4.6 (1.9)	13.3 (2.1)	7.0 (0.0)	3.5 (1.0)	10.4 (1.1)	8.6 (2.6)	8.0 (4.9)
Sexual recidivism rate	34.1%	5.9%	8.8%	24.7%	8.1%	13.7%	28.1%	16.2%
Nonsexual violent recidivism rate	23.4%	9.9%	17.6%	NA	8.8%	10.4%	24.9%	15.3%
Any recidivism rate	64.6%	23.3%	51.7%	NA	27.9%	35.2%	54.9%	40.2%
Age at release in years, M (SD)	32.5 (10.4)	42.7 (11.9)	41.2 (12.5)	36.7 (9.7)	41.6 (13.2)	43.5 (12.7)	36.1 (11.4)	39.4 (12.5)
Static-99R, M (SD)	3.8 (2.4)	2.1 (2.3)	2.8 (2.8)	4.1 (2.2)	2.4 (2.4)	2.1 (2.6)	4.6 (2.4)	3.1 (2.6)
Static-2002R, M (SD)	4.6 (2.4)	3.4 (2.5)	3.9 (2.7)	5.7 (2.3)	3.5 (2.5)	3.6 (2.8)	6.1 (2.5)	4.3 (2.7)

Note. CSC = Correctional Service of Canada.
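As a quick consistency check on Table 1, the Python sketch below recomputes three entries of the Total sample column, assuming (on our part) that the column reports sample-size-weighted averages of the per-sample values. Only means are recomputed; pooled SDs would additionally require the between-sample spread.

```python
# Per-sample values transcribed from Table 1:
# (n, Static-2002R mean, Static-99R mean, sexual recidivism rate in %)
TABLE_1 = {
    "Bengtson (2008)": (308, 4.6, 3.8, 34.1),
    "Bigras (2007)": (425, 3.4, 2.1, 5.9),
    "Boer (2003)": (296, 3.9, 2.8, 8.8),
    "Haag (2005)": (190, 5.7, 4.1, 24.7),
    "Hanson et al. (2007)": (702, 3.5, 2.4, 8.1),
    "Harkins & Beech (2007)": (182, 3.6, 2.1, 13.7),
    "Knight & Thornton (2007)": (466, 6.1, 4.6, 28.1),
}


def weighted_mean(column: int) -> float:
    """Sample-size-weighted mean of one column of TABLE_1 (column 1, 2, or 3)."""
    total = sum(row[0] for row in TABLE_1.values())
    return sum(row[0] * row[column] for row in TABLE_1.values()) / total


total_n = sum(row[0] for row in TABLE_1.values())
static_2002r = weighted_mean(1)
static_99r = weighted_mean(2)
sexual_recidivism = weighted_mean(3)
print(total_n, round(static_2002r, 1), round(static_99r, 1), round(sexual_recidivism, 1))
# prints: 2569 4.3 3.1 16.2
```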

Measures

Static-2002R

Similar to the Static-99R, the Static-2002R (Hanson & Thornton, 2003; Helmus et al., 2012) is an empirical actuarial risk assessment tool for adult males who sexually offended (see also https://saarna.org/). It has 14 items grouped into five main subscales: Age at Release, Persistence of Sexual Offending, Sexual Deviance, Relationship to Victims, and General Criminality. The total score (ranging from −2 to 13) can be used to place offenders in one of five risk categories: Level I—very low risk (scores of −2 to −1), Level II—below average risk (scores of 0 to 1), Level III—average risk (scores of 2 to 4), Level IVa—above average risk (scores of 5 to 6), and Level IVb—well above average risk (scores of 7 or higher) (Hanson et al., 2017). Static-2002R items are identical to Static-2002 items, with the exception of updated age weights.
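The cut-offs above translate directly into a lookup. The Python function below is a hypothetical helper written for illustration (it is not part of any official Static-2002R scoring software) and assumes an already-computed integer total score.

```python
def static2002r_risk_level(total: int) -> str:
    """Map a Static-2002R total score to the five risk levels listed above."""
    if not -2 <= total <= 13:
        raise ValueError(f"Static-2002R totals range from -2 to 13, got {total}")
    if total <= -1:
        return "Level I (very low risk)"
    if total <= 1:
        return "Level II (below average risk)"
    if total <= 4:
        return "Level III (average risk)"
    if total <= 6:
        return "Level IVa (above average risk)"
    return "Level IVb (well above average risk)"


for score in (-2, 0, 4, 5, 7, 13):
    print(score, static2002r_risk_level(score))
```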

The Static-2002 was originally developed to improve coding consistency, conceptual clarity, and predictive accuracy compared with the Static-99. Although the Static-2002 was more accurate than the Static-99 (Hanson et al., 2010), revising Static-99 age weights increased its predictive accuracy such that there was no longer a meaningful difference between the Static-99R and the Static-2002R (Babchishin et al., 2012). Both scales, however, contribute incrementally to the prediction of sexual recidivism (Babchishin et al., 2012; Lehmann et al., 2013).

Item Preparation

Initial item preparation was required before entering Static-2002R items into the statistical analyses. First, we combined items that were part of the same continuum to avoid potential collinearity issues. Specifically, the “any unrelated victim” and “any stranger victim” items were combined into one item on a 3-point scale: 0 = no unrelated or stranger victim, 1 = at least one unrelated victim and no strangers, and 2 = at least one stranger victim. In addition, the “any prior involvement with the criminal justice system” and “prior sentencing occasions for anything” items were combined into one item on a 4-point scale: 0 = no prior involvement with the criminal justice system; 1 = at least one prior charge, but fewer than three prior sentencing occasions; 2 = three to 13 prior sentencing occasions; and 3 = 14+ prior sentencing occasions. Next, we screened for potential collinearity (r > .80) between items with tetrachoric/polychoric correlations (appropriate for dichotomous and ordinal items; Flora et al., 2012) and found a high correlation between the “high rate of sexual offending” and “prior sentencing occasions for sexual offenses” items (r = .95). Thus, the “high rate of sexual offending” item was dropped from this study. In sum, 11 items were entered in the factor analysis and MIRT model (see Appendix A for the full list of items).
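The two item combinations described above amount to simple recodes. The Python sketch below shows one way to express them; the function and argument names are hypothetical, and it assumes the source items are already available as booleans and a count of prior sentencing occasions.

```python
def unrelated_stranger_item(any_unrelated_victim: bool, any_stranger_victim: bool) -> int:
    """Combined 3-point item.

    0 = no unrelated or stranger victim
    1 = at least one unrelated victim and no strangers
    2 = at least one stranger victim
    """
    if any_stranger_victim:
        return 2
    return 1 if any_unrelated_victim else 0


def criminal_history_item(any_prior_involvement: bool, prior_sentencing_occasions: int) -> int:
    """Combined 4-point item.

    0 = no prior involvement with the criminal justice system
    1 = at least one prior charge, but fewer than three prior sentencing occasions
    2 = three to 13 prior sentencing occasions
    3 = 14 or more prior sentencing occasions
    """
    if prior_sentencing_occasions < 0:
        raise ValueError("prior_sentencing_occasions cannot be negative")
    # A prior sentencing occasion implies prior involvement, so score 0 only
    # when both sources agree that there is no criminal history.
    if not any_prior_involvement and prior_sentencing_occasions == 0:
        return 0
    if prior_sentencing_occasions < 3:
        return 1
    if prior_sentencing_occasions <= 13:
        return 2
    return 3
```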

Analytical Strategy

Factor Analysis and Dimensionality

To determine whether the Static-2002R was unidimensional or multidimensional in our sample, an EFA of Static-2002R items was performed using the R statistical program and the “mirt” package (Chalmers, 2012). EFA guidelines suggested by Brouillette-Alarie et al. (2016) were followed after adapting them to the mirt package, which meant (a) using quasi-polychoric correlation matrices; (b) extracting factors using the Metropolis–Hastings Robbins–Monro algorithm (Cai, 2010); (c) rotating factors using an oblique rotation (geomin); (d) assessing factor structure fit with the root mean square error of approximation (RMSEA, should be <.06), comparative fit index (CFI, should be >.95), and Tucker–Lewis index (TLI, should be >.95) (Hu & Bentler, 1999); and (e) assigning items to factors on which they had loadings of at least .40 (Stevens, 1992). In addition to the three fit indices outlined above, the Akaike information criterion (AIC; Akaike, 1973) and the Bayesian information criterion (BIC; Schwarz, 1978) were used. For both criteria, lower values indicate better-fitting models. Quasi-polychoric correlations between Static-2002R items were computed using the “psych” package (Revelle, 2021).
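The fit criteria listed in (d) reduce to a simple rule. The Python sketch below applies them to the RMSEA, CFI, and TLI values reported for the one- to three-dimensional solutions in Table 2; it illustrates the decision rule only and does not reproduce the R/mirt estimation itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    rmsea: float
    cfi: float
    tli: float


def meets_cutoffs(fit: FitIndices) -> bool:
    """Cut-offs used in the text (Hu & Bentler, 1999): RMSEA < .06, CFI > .95, TLI > .95."""
    return fit.rmsea < 0.06 and fit.cfi > 0.95 and fit.tli > 0.95


# RMSEA, CFI, and TLI by number of dimensions, as reported in Table 2.
TABLE_2_FIT = {
    1: FitIndices(rmsea=0.141, cfi=0.748, tli=0.675),
    2: FitIndices(rmsea=0.090, cfi=0.925, tli=0.868),
    3: FitIndices(rmsea=0.046, cfi=0.987, tli=0.965),
}

for dims, fit in TABLE_2_FIT.items():
    print(f"{dims} dimension(s): meets cut-offs = {meets_cutoffs(fit)}")
```

Only the three-dimensional solution satisfies all three cut-offs.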

To verify whether a single “risk” construct encompassed the three Static-2002R dimensions, a bifactor model (Reise, 2012) was run in the R mirt package. The AIC and BIC of regular factor solutions were compared with those of the bifactor model to see which structure better represented the data. Only these two fit indices were used because, when a traditional model is compared with a bifactor model, the CFI and RMSEA will always favor the bifactor model, as it is less parsimonious. In contrast, the AIC and BIC account for the number of parameters in the model (i.e., each imposes a penalty for every additional parameter), which allows models with different numbers of parameters, such as those examined in the current study, to be compared.
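Under the usual definitions, AIC = −2 log L + 2p and BIC = −2 log L + p ln N, where p is the number of free parameters. Subtracting the two gives BIC − AIC = p(ln N − 2), so the difference between the penalties alone reveals p. The Python sketch below applies this identity to the AIC and BIC values in Table 2 (N = 2,569); the parameter counts it prints are our back-calculation, not figures reported by the authors.

```python
import math

N = 2569  # sample size

# (AIC, BIC) by number of dimensions, as reported in Table 2.
TABLE_2_IC = {
    1: (36598.70, 36762.53),
    2: (35524.97, 35747.32),
    3: (34910.27, 35185.28),
    4: (34987.01, 35308.83),
}


def implied_free_parameters(aic: float, bic: float, n_obs: int) -> float:
    """Solve BIC - AIC = p * (ln N - 2) for p."""
    return (bic - aic) / (math.log(n_obs) - 2.0)


implied = {dims: implied_free_parameters(a, b, N) for dims, (a, b) in TABLE_2_IC.items()}
for dims, p in implied.items():
    print(f"{dims} dimension(s): about {p:.2f} free parameters")
```

The implied counts come out at almost exactly 28, 38, 47, and 55. These values are consistent with 17 item intercepts (the number of difficulty steps in Table 4) plus 11 slopes per dimension, minus m(m − 1)/2 rotational constraints for m dimensions, although that parameterization is our inference rather than something the text states.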

MIRT

MIRT was applied to Static-2002R items following the factor structure identified in the EFA. Two-parameter MIRT was performed, which allowed items to vary not only in their difficulty on the latent trait but also in their capacity to discriminate between persons located at different points on the continuum (Reid et al., 2007). Our sample size (N = 2,569) was considered sufficient to provide accurate parameter estimates (Forero et al., 2009; Jiang et al., 2016). Reckase’s (2009) equations (i.e., the two-parameter logistic model) were followed to estimate the difficulty and discrimination parameters in MIRT. The logistic model was used:

P(x_{ij} \geq l \mid θ_{i}, α_{j}, d_{jl}) = \frac{\exp\left(\sum_{k = 1}^{m} α_{jk} θ_{ik} + d_{jl}\right)}{1 + \exp\left(\sum_{k = 1}^{m} α_{jk} θ_{ik} + d_{jl}\right)} .

(1)

where $i = 1, \dots, N$ indexes participants; $j = 1, \dots, n$, the test items; and $k = 1, \dots, m$, the m latent factors. Participant i has latent trait scores $θ_{i} = (θ_{i1}, \dots, θ_{im})$, and item j has associated slopes $α_{j} = (α_{j1}, \dots, α_{jm})$. Item j has $C_j$ response categories and therefore $C_j - 1$ step intercepts $d_{j} = (d_{j1}, \dots, d_{j(C_{j} - 1)})$, indexed by $l = 1, \dots, C_{j} - 1$ (the steps of the graded response item). For a dichotomous item, $C_j = 2$, and Equation 1 gives the probability of endorsing the item.

As seen in Equation 1, the slope parameters (α; discrimination) determine the steepness of the response surface, that is, the rate at which the probability of a correct response changes from point to point in the θ space. The d parameter (intercept) is not a difficulty parameter as in unidimensional IRT; instead, it sets the probability (on the logit scale) of a correct response when all latent traits are at zero. The negative of the intercept (−d) divided by the multidimensional discrimination parameter (A) gives the item’s multidimensional difficulty, that is, its location along its direction of greatest discrimination (see Equation 3).
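The compensatory structure of the model is easier to see numerically. The Python sketch below evaluates the multidimensional 2PL, P = exp(z)/(1 + exp(z)) with z = Σ_k α_k θ_k + d; the slopes and intercept are hypothetical values, not Static-2002R estimates.

```python
import math
from typing import Sequence


def m2pl_probability(theta: Sequence[float], alpha: Sequence[float], d: float) -> float:
    """Compensatory multidimensional 2PL: P = exp(z) / (1 + exp(z)).

    z = sum_k alpha_k * theta_k + d. For a graded item, pass the step intercept
    d_jl to get the probability of responding at step l or above.
    """
    if len(theta) != len(alpha):
        raise ValueError("theta and alpha must have one entry per latent dimension")
    z = sum(a * t for a, t in zip(alpha, theta)) + d
    # 1 / (1 + exp(-z)) is algebraically equal to exp(z) / (1 + exp(z)).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical three-dimensional item: slopes and intercept are illustrative only.
alpha = (0.5, 1.0, 2.0)
d = -1.0
print(m2pl_probability((0.0, 0.0, 0.0), alpha, d))   # all traits at zero: logistic(d)
print(m2pl_probability((0.0, 0.0, 1.0), alpha, d))   # higher on the steepest dimension
print(m2pl_probability((2.0, 0.0, -0.5), alpha, d))  # same z as the first call
```

With all traits at zero, the probability reduces to the logistic transform of d (about .27 for d = −1), which is why d reads as a baseline rather than a difficulty. The third call illustrates compensation: a high score on one dimension offsets a low score on another, because only the weighted sum z enters the model.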

The multidimensional discrimination parameter, denoted A (see Equation 2), is the degree to which the item can discriminate between individuals who do and do not have the corresponding θ level of the latent traits (Reid et al., 2007). According to Baker (2001), discrimination values of 0.01 to 0.34 are very low, 0.35 to 0.64 are low, 0.65 to 1.34 are moderate, 1.35 to 1.69 are high, and greater than 1.7 are very high.

A_{j} = \sqrt{Σ_{k = 1}^{m} α_{j k}^{2}} .

(2)
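Equation 2 and Baker’s (2001) labels can be applied directly to the slopes in Table 3. The Python sketch below does so; entering the “<0.001” slopes as 0 is our simplification, and values below .01 (which the labels above do not cover) fall into “very low” here.

```python
import math

# Slopes on Persistence/Paraphilia, Youthful Stranger Aggression, and General
# Criminality, transcribed from Table 3 ("<0.001" entered as 0.0).
SLOPES = {
    "Age at release": (-3.241, 2.279, 0.605),
    "Prior sexual offenses": (2.648, 2.598, 1.718),
    "Juvenile and adult sexual offenses": (0.415, 2.260, 0.859),
    "Noncontact sexual offenses": (0.816, 0.782, -0.078),
    "Male victim": (0.966, 0.641, -0.425),
    "Young, unrelated victims": (1.320, 1.314, -0.531),
    "Unrelated and stranger victim": (-0.024, 1.276, 0.296),
    "Prior arrest/sentencing occasions": (1.228, 1.001, 4.904),
    "Community supervision violation": (0.083, 1.006, 3.026),
    "Years free prior to index offense": (0.0, 1.734, 2.629),
    "Prior nonsexual violence": (0.0, 0.0, 2.421),
}


def multidimensional_discrimination(slopes) -> float:
    """Equation 2: A_j = sqrt(sum_k alpha_jk ** 2)."""
    return math.sqrt(sum(a * a for a in slopes))


def baker_label(a_value: float) -> str:
    """Verbal labels from Baker (2001), as listed in the text."""
    if a_value < 0.35:
        return "very low"
    if a_value < 0.65:
        return "low"
    if a_value < 1.35:
        return "moderate"
    if a_value < 1.70:
        return "high"
    return "very high"


A = {item: multidimensional_discrimination(s) for item, s in SLOPES.items()}
for item, a_value in A.items():
    print(f"{item:<36} A = {a_value:.2f} ({baker_label(a_value)})")
```

The recomputed A values match Table 4 to within rounding (age at release comes out at 4.01 versus the reported 4.00, presumably because the published slopes are rounded), and only noncontact sexual offenses, male victim, and unrelated and stranger victim fall in the moderate band.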

The multidimensional difficulty parameter, denoted B (see Equation 3), is the location of the inflection point on the item characteristic curve and typically ranges from −3 to 3. The value of B has the same interpretation as the b parameter in the unidimensional IRT model: items located above 0 are considered difficult, whereas items below 0 are considered easier (de Ayala, 2009):

B_{jl} = \frac{-d_{jl}}{\sqrt{\sum_{k = 1}^{m} α_{jk}^{2}}} = \frac{-d_{jl}}{A_{j}} ,

(3)

where $B_{jl}$ is the step difficulty and $d_{jl}$ is the step intercept for the $l$th step of the graded response item.
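Equation 3 is a one-line computation once the slopes and step intercepts are known. The Python sketch below uses a hypothetical three-category item (the numbers are illustrative, not taken from the study) whose slopes give A_j = 3, so the arithmetic is easy to follow.

```python
import math


def step_difficulties(step_intercepts, slopes):
    """Equation 3: B_jl = -d_jl / A_j, with A_j = sqrt(sum_k alpha_jk ** 2)."""
    a_j = math.sqrt(sum(a * a for a in slopes))
    if a_j == 0.0:
        raise ValueError("an item with all-zero slopes has undefined difficulty")
    return [-d / a_j for d in step_intercepts]


# Hypothetical three-category item: slopes chosen so that A_j = 3 exactly.
slopes = (1.0, 2.0, 2.0)
intercepts = (2.4, -0.6)  # one intercept per step, decreasing across steps
print(step_difficulties(intercepts, slopes))  # approximately [-0.8, 0.2]
```

A positive intercept yields a negative (easier) step difficulty and vice versa; because the step intercepts of a graded item decrease from step to step, its step difficulties increase, as in the B₁ to B₃ columns of Table 4.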

Results

EFA

Inter-item correlations ranged from −.27 to .80 (Mdn absolute r = .25; see Appendix B). The three-dimensional model was the best-fitting one, as its AIC and BIC values were the lowest among all models (see Table 2). In addition, the goodness-of-fit of the three-dimensional model was good, with an RMSEA of .046, a CFI of .987, and a TLI of .965. All the items loaded on at least one factor (factor loadings >.40) and only one item (juvenile and adult sexual offenses) loaded on multiple factors. Furthermore, the ratio between the eigenvalues of the first and second factors was not sufficiently high to conclude that the scale was “unidimensional enough” (Bertrand & Blais, 2004).

Table 2

Summary of Factor Structure Fit and Three-Dimensional Factor Loadings (N = 2,569).

Fit indices	One dimension	Two dimensions	Three dimensions	Four dimensions
AIC	36,598.70	35,524.97	34,910.27	34,987.01
BIC	36,762.53	35,747.32	35,185.28	35,308.83
RMSEA	.141	.090	.046	—
CFI	.748	.925	.987	—
TLI	.675	.868	.965	—
Item no.	Static-2002R items	Factor 1: Persistence/Paraphilia	Factor 2: Youthful Stranger Aggression	Factor 3: General Criminality
1	Age at release	−.231	.920	−.053
2	Prior sexual offenses	.728	−.009	.399
3	Juvenile and adult sexual offenses	.535	.451	.197
4	Noncontact sexual offenses	.565	−.003	−.044
5	Male victim	.604	−.099	−.194
6	Young, unrelated victims	.783	.015	−.225
7	Unrelated and stranger victim	.357	.443	.044
8	Prior arrest/sentencing occasions	.032	−.054	.952
9	Community supervision violation	−.035	.016	.836
10	Years free prior to index offense	.119	.336	.690
11	Prior nonsexual violence	−.235	−.025	.859

Note. AIC = Akaike information criterion; BIC = Bayesian information criterion; RMSEA = root mean square error of approximation; CFI = comparative fit index; TLI = Tucker–Lewis index. Factor loadings equal to or greater than .40 are in bold.
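The .40 inclusion rule can be applied mechanically to the three-dimensional loadings in Table 2. The Python sketch below does this; using the absolute value of each loading is our assumption, and it makes no difference here because no negative loading reaches −.40.

```python
FACTORS = ("Persistence/Paraphilia", "Youthful Stranger Aggression", "General Criminality")

# Three-dimensional EFA loadings transcribed from Table 2.
LOADINGS = {
    "Age at release": (-0.231, 0.920, -0.053),
    "Prior sexual offenses": (0.728, -0.009, 0.399),
    "Juvenile and adult sexual offenses": (0.535, 0.451, 0.197),
    "Noncontact sexual offenses": (0.565, -0.003, -0.044),
    "Male victim": (0.604, -0.099, -0.194),
    "Young, unrelated victims": (0.783, 0.015, -0.225),
    "Unrelated and stranger victim": (0.357, 0.443, 0.044),
    "Prior arrest/sentencing occasions": (0.032, -0.054, 0.952),
    "Community supervision violation": (-0.035, 0.016, 0.836),
    "Years free prior to index offense": (0.119, 0.336, 0.690),
    "Prior nonsexual violence": (-0.235, -0.025, 0.859),
}


def salient_factors(loadings, cutoff=0.40):
    """Factors on which an item's absolute loading reaches the cut-off."""
    return [name for name, value in zip(FACTORS, loadings) if abs(value) >= cutoff]


assignment = {item: salient_factors(loads) for item, loads in LOADINGS.items()}
for item, factors in assignment.items():
    print(f"{item:<36} {', '.join(factors)}")
```

Every item reaches the cut-off on at least one factor, only juvenile and adult sexual offenses reaches it on two, and prior sexual offenses (.399 on General Criminality) falls just short of a second assignment.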

Similar to Brouillette-Alarie et al. (2016), three factors were found. Factor 1, Persistence/Paraphilia, comprised, in decreasing order of loading: (a) young, unrelated victims; (b) prior sexual offenses; (c) male victim; (d) noncontact sexual offenses; and (e) juvenile and adult sexual offenses. Factor 2, Youthful Stranger Aggression, comprised (a) age at release, (b) juvenile and adult sexual offenses, and (c) unrelated and stranger victim. Finally, Factor 3, General Criminality, comprised (a) prior arrest/sentencing occasions, (b) prior nonsexual violence, (c) community supervision violation, and (d) years free prior to index offense. Persistence/Paraphilia had a correlation of .09 with Youthful Stranger Aggression and of .29 with General Criminality. Youthful Stranger Aggression had a correlation of .25 with General Criminality.

The bifactor model indicated that static recidivism risk in individuals with a history of sexual crime consisted of three distinct factors rather than a shared general factor. The fit indices of the three-dimensional bifactor model (AIC = 35,359.95, BIC = 35,588.15) were worse than those of the regular three-factor model (AIC = 34,910.27, BIC = 35,185.28). In addition, the discrimination values of the bifactor model differed substantially from those of a unidimensional model comprising the same items, indicating that multidimensionality needed to be taken into account. Therefore, the Static-2002R appeared to be fully multidimensional in our sample and was not suitable for unidimensional IRT.

MIRT

Two-parameter MIRT was applied to Static-2002R items following the factor structure identified in the EFA. First, MIRT slopes (α; discrimination) indicated that all Static-2002R items, except for juvenile and adult sexual offenses, performed more effectively (i.e., changed more quickly in probability) on the factor to which EFA assigned them (see Table 3). For juvenile and adult sexual offenses, EFA did indicate cross-loading on Persistence/Paraphilia and Youthful Stranger Aggression. Therefore, it was not entirely surprising that the juvenile and adult sexual offenses item had a substantial slope on Youthful Stranger Aggression.

Table 3

Slopes (α; Discrimination) From MIRT.

Item no.	Static-2002R items	Factor 1: Persistence/Paraphilia	Factor 2: Youthful Stranger Aggression	Factor 3: General Criminality
1	Age at release	−3.241	2.279	0.605
2	Prior sexual offenses	2.648	2.598	1.718
3	Juvenile and adult sexual offenses	0.415	2.260	0.859
4	Noncontact sexual offenses	0.816	0.782	−0.078
5	Male victim	0.966	0.641	−0.425
6	Young, unrelated victims	1.320	1.314	−0.531
7	Unrelated and stranger victim	−0.024	1.276	0.296
8	Prior arrest/sentencing occasions	1.228	1.001	4.904
9	Community supervision violation	0.083	1.006	3.026
10	Years free prior to index offense	<0.001	1.734	2.629
11	Prior nonsexual violence	<0.001	<0.001	2.421

Note. α values of items that had factor loadings equal to or greater than .40 in Table 2 are in bold. MIRT = multidimensional item response theory.
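The comparison described in the first MIRT result can be checked against Tables 2 and 3. The Python sketch below flags items whose largest MIRT slope falls on a different factor than their highest EFA loading. It compares signed slopes, which is our reading of the text: on absolute values, age at release would also be flagged, because its slope on Persistence/Paraphilia (−3.241) is larger in magnitude than its slope on Youthful Stranger Aggression (2.279).

```python
FACTORS = ("Persistence/Paraphilia", "Youthful Stranger Aggression", "General Criminality")

# For each item: (MIRT slopes from Table 3, index of the factor with the
# highest EFA loading in Table 2). "<0.001" slopes are entered as 0.0.
ITEMS = {
    "Age at release": ((-3.241, 2.279, 0.605), 1),
    "Prior sexual offenses": ((2.648, 2.598, 1.718), 0),
    "Juvenile and adult sexual offenses": ((0.415, 2.260, 0.859), 0),
    "Noncontact sexual offenses": ((0.816, 0.782, -0.078), 0),
    "Male victim": ((0.966, 0.641, -0.425), 0),
    "Young, unrelated victims": ((1.320, 1.314, -0.531), 0),
    "Unrelated and stranger victim": ((-0.024, 1.276, 0.296), 1),
    "Prior arrest/sentencing occasions": ((1.228, 1.001, 4.904), 2),
    "Community supervision violation": ((0.083, 1.006, 3.026), 2),
    "Years free prior to index offense": ((0.0, 1.734, 2.629), 2),
    "Prior nonsexual violence": ((0.0, 0.0, 2.421), 2),
}


def steepest_factor(slopes) -> int:
    """Index of the factor with the largest signed (not absolute) slope."""
    return max(range(len(slopes)), key=lambda k: slopes[k])


mismatches = [
    item for item, (slopes, efa_primary) in ITEMS.items()
    if steepest_factor(slopes) != efa_primary
]
for item in mismatches:
    slopes, efa_primary = ITEMS[item]
    print(f"{item}: EFA -> {FACTORS[efa_primary]}, MIRT -> {FACTORS[steepest_factor(slopes)]}")
```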

Second, as seen in Table 4, all Static-2002R items showed multidimensional discrimination values (A; the degree to which the item can discriminate between individuals who do and do not have the corresponding θ level of the latent traits) in the moderate to very high range (1.13–5.15). Only noncontact sexual offenses, male victim, and unrelated and stranger victim showed moderate rather than very high discrimination; consequently, all General Criminality items were very highly discriminating.

Table 4

Multidimensional Discrimination (A) and Multidimensional Difficulty (B) Parameters From MIRT.

Item no.	Static-2002R items	Discrimination (A)	Difficulty, step 1 (B₁)	Difficulty, step 2 (B₂)	Difficulty, step 3 (B₃)
1	Age at release	4.00	−1.58	−0.20	0.26
2	Prior sexual offenses	4.09	0.49	1.28	—
3	Juvenile and adult sexual offenses	2.45	2.01	—	—
4	Noncontact sexual offenses	1.13	2.04	—	—
5	Male victim	1.23	1.43	—	—
6	Young, unrelated victims	1.94	1.32	—	—
7	Unrelated and stranger victim	1.31	−0.92	0.68	—
8	Prior arrest/sentencing occasions	5.15	−0.65	0.26	1.75
9	Community supervision violation	3.19	0.22	—	—
10	Years free prior to index offense	3.15	0.32	—	—
11	Prior nonsexual violence	2.42	0.61	—	—

Note. MIRT = multidimensional item response theory.

Third, results in Table 4 showed that all Static-2002R items except age at release, unrelated and stranger victim, and prior arrest/sentencing occasions were difficult (multidimensional difficulty [B] > 0), indicating that they reflected relatively uncommon, risk-generating characteristics. The three “less difficult” items were ordinal items whose first steps had difficulty values below 0; however, their highest steps had difficulties above 0, suggesting that they captured a wide range of risk. Therefore, the age at release, unrelated and stranger victim, and prior arrest/sentencing occasions items could be more common among offenders with a lower sexual recidivism risk, in contrast with very difficult items such as noncontact sexual offenses and juvenile and adult sexual offenses, which would be very rare among men with lower sexual recidivism risk.
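The pattern described above can be read directly off Table 4. The Python sketch below sorts items by whether their first step difficulty is above 0 and whether their steps straddle 0; it relies on step difficulties increasing within each item, which holds for every row of the table.

```python
# Step difficulties (B1, B2, B3) transcribed from Table 4; missing cells are omitted.
B_STEPS = {
    "Age at release": (-1.58, -0.20, 0.26),
    "Prior sexual offenses": (0.49, 1.28),
    "Juvenile and adult sexual offenses": (2.01,),
    "Noncontact sexual offenses": (2.04,),
    "Male victim": (1.43,),
    "Young, unrelated victims": (1.32,),
    "Unrelated and stranger victim": (-0.92, 0.68),
    "Prior arrest/sentencing occasions": (-0.65, 0.26, 1.75),
    "Community supervision violation": (0.22,),
    "Years free prior to index offense": (0.32,),
    "Prior nonsexual violence": (0.61,),
}

# Steps increase within each item, so a positive first step means all steps are positive.
difficult = [item for item, b in B_STEPS.items() if b[0] > 0]
spanning_zero = [item for item, b in B_STEPS.items() if b[0] < 0 < b[-1]]
print("All steps above 0:", difficult)
print("Steps on both sides of 0:", spanning_zero)
```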

Discussion

The objective of the current article was to apply an MIRT model to improve our understanding of tools designed to assess the reoffending risk of men adjudicated for a sexual offense, by examining the latent structure of the Static-2002R. The Static-99R and Static-2002R are considered by many practitioners as the gold standards in assessing the baseline, static risk of sexual reoffending (Archer et al., 2006; Bourgon et al., 2018; Jackson & Hess, 2007; Kelley et al., 2020; Neal & Grisso, 2014). Therefore, understanding their latent structure is not only useful for theorizing about sexual offending but also paramount for sexual offender risk assessment practice. Psychometric analyses that go beyond predictive validity offer many potential advantages, notably moving the field from a unidimensional conceptualization of risk toward a multidimensional model in which multiple psychological constructs intertwine to lead to recidivism. In other words, they enable the study of the building blocks of recidivism with finer granularity than a single risk dimension would allow.

This echoes developments in the assessment of psychopathy, where varying scores on the two factors and four facets of the Psychopathy Checklist–Revised (PCL-R; Hare, 2003; Neumann et al., 2007) produce different clinical entities (e.g., high scores on Factor 1 and low scores on Factor 2 depict the “white-collar psychopath,” who differs from persons obtaining high scores on both factors). In the case of the three-factor model of sexual violence risk, high scores on Persistence/Paraphilia and low scores on Youthful Stranger Aggression and General Criminality might produce a clinical portrait similar to that of the fixated pedophile of Groth et al. (1982), while high scores on General Criminality and low to medium scores on Persistence/Paraphilia and Youthful Stranger Aggression might echo Knight and Prentky’s (1990) opportunistic sexual aggressor of women. The mapping between combinations of scores on the three-factor model and existing typologies of individuals who sexually offended would have to be validated by future studies, but it nevertheless opens an interesting interface between risk assessment research and on-the-ground clinical work, where typologies usually play a more important role.

Finally, the present study offers advantages associated with using MIRT rather than traditional factor analysis alone. MIRT models reduce circular dependency (and thus improve generalizability), give insight into the calibration of items through the difficulty and discrimination parameters, and allow results to be compared with those of traditional EFA models.
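For reference, the item parameters discussed in this article can be written out for a compensatory multidimensional two-parameter logistic model in slope-intercept form, using Reckase’s definitions of multidimensional discrimination (A) and difficulty (B). This is a standard textbook formulation offered as a sketch, not the exact specification estimated in this study; for ordinal items, the single intercept d_j is replaced by one intercept per category boundary, which yields one B value per boundary.

```latex
P\left(X_{ij} = 1 \mid \boldsymbol{\theta}_i\right)
  = \frac{1}{1 + \exp\left[-\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j\right)\right]},
\qquad
A_j = \sqrt{\sum_{k=1}^{m} a_{jk}^{2}},
\qquad
B_j = \frac{-d_j}{A_j}
```

Here θ_i is the vector of m latent traits for person i, a_j is the slope vector of item j, and d_j is its intercept. B_j > 0 means that the point of steepest increase in endorsement lies beyond the origin (the mean of the latent traits) in the direction of greatest discrimination, which is the sense in which an item is called “difficult” in the Results.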

Factor Structure

As to the factor structure, MIRT results were very close to EFA results, as well as to those of other empirical studies of the latent structure of the Static-99R and Static-2002R (e.g., Brouillette-Alarie et al., 2016; Brouillette-Alarie & Proulx, 2013; Roberts et al., 2002; Seto, 2005). Fit indices of our EFA converged on three factors very similar to those found in Brouillette-Alarie et al. (2016), which is not surprising considering that both studies used the same dataset. However, the results of Brouillette-Alarie et al. (2016) were based on nonredundant items from the Static-99R and Static-2002R, while the current study included Static-2002R items only. The three factors extracted encompassed the two main dimensions of sexual recidivism risk: (a) sexual criminality, indicative of atypical sexual interests and/or sexual preoccupations, and (b) general criminality, indicative of psychopathic traits and/or an antisocial lifestyle (Brouillette-Alarie & Hanson, 2015; Doren, 2004).

The third dimension was centered around young age, juvenile sexual offenses, and the sexual abuse of unrelated/stranger victims. There is currently no scientific consensus about this third dimension: some attribute this construct to demographics (younger offenders are unlikely to have access to intrafamilial victims, for example, their own children; Brouillette-Alarie et al., 2016), some to intent to harm victims (Brouillette-Alarie et al., 2018; Lehmann et al., 2014; Roberts et al., 2002), and others to sexual offender types (sexual aggressors of women are, on average, younger than sexual aggressors of children and use more coercion in their sexual offenses; Knight & Thornton, 2007). Even though the factor may not represent a psychologically meaningful construct, it may, like aging, correlate with multiple psychological and physiological traits. Indeed, the link between age and crime is a classic in criminology, dating back to the early studies of Quételet (1835) and replicated ever since with offenders with or without a history of sexual crime (Hanson, 2002; Hirschi & Gottfredson, 1983; Moffitt, 1993; Sampson & Laub, 2003). As mentioned by Barbaree et al. (2009), age could interact with sexual and general criminality. In their study, sexual criminality items were more predictive of sexual recidivism for older individuals, while general criminality items were more predictive of sexual recidivism for younger individuals. This would fit with the MIRT results of this study, where the “age at release” item obtained very high discrimination slopes on both Persistence/Paraphilia and Youthful Stranger Aggression, but in opposite directions. This indicates that for Persistence/Paraphilia, older ages are more discriminatory, while for Youthful Stranger Aggression, younger ages are more discriminatory.
Even though this unusual pattern should not, on its own, justify advocating the removal of age from the Static-2002R, as the item is certainly risk relevant (Hanson, 2002), it may suggest that future factor analyses or MIRT evaluations of the scale would be better served by removing the age item from the solution a priori.
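The opposite-signed slopes of the age item can be illustrated with a compensatory multidimensional logistic model. In the hypothetical sketch below, the item is dichotomized as released before age 35 (the 2-point code in Table A1) versus not, with a negative slope on Persistence/Paraphilia and a positive slope on Youthful Stranger Aggression. The slope magnitudes, the intercept, and the dichotomization are illustrative assumptions, not the study’s estimates; the actual item has four ordered categories.

```python
import math

def p_endorse(theta, slopes, intercept):
    """Compensatory multidimensional 2PL: P(X = 1 | theta) = logistic(a'theta + d)."""
    z = sum(a * t for a, t in zip(slopes, theta)) + intercept
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters for a dichotomized age item (1 = released before age 35).
# Dimension order: (Persistence/Paraphilia, Youthful Stranger Aggression).
slopes = (-1.5, 1.5)   # opposite signs on the two factors
intercept = 0.0

profiles = {
    "low PP, low YSA": (-1.0, -1.0),
    "high PP, low YSA": (1.0, -1.0),
    "low PP, high YSA": (-1.0, 1.0),
    "high PP, high YSA": (1.0, 1.0),
}
for label, theta in profiles.items():
    p = p_endorse(theta, slopes, intercept)
    print(f"{label:<18} P(released before 35) = {p:.2f}")
```

The four profiles print 0.50, 0.05, 0.95, and 0.50. Because the model is compensatory, with these symmetric slopes a person high on both dimensions receives the same predicted probability as a person low on both, which shows how a single item can carry information about two factors while pulling them in opposite directions.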

Our factor solution was discordant with that of Jung et al. (2017), the only other factor analysis of the Static-2002R as a whole. In their study, two factors were retained because the three- and four-factor solutions contained Heywood cases (loadings greater than 1.0; Heywood, 1931). Heywood cases often indicate an insufficient sample size or too many extracted factors, which motivated Jung et al. (2017) to settle on two dimensions. Interestingly, in their two-factor solution, the age at release and juvenile and adult sexual offenses items did not load on any factor, suggesting that they may have belonged to a third, unextracted dimension. In addition, among their factor retention criteria, parallel analysis (arguably the most robust criterion; O’Connor, 2000; Schmitt, 2011) suggested three factors.
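A Heywood case can be screened for directly from a factor solution. With orthogonal factors, an item’s communality is the sum of its squared standardized loadings and its uniqueness is 1 minus that sum; a standardized loading above 1, or a communality at or above 1 (a zero or negative uniqueness), flags an improper solution. The minimal sketch below applies that check to a hypothetical loading matrix. With an oblique rotation, the communality would also require the factor correlation matrix, which this sketch does not handle.

```python
def heywood_items(loadings, tol=1e-9):
    """Return 0-based indices of items with an improper (Heywood) solution.

    loadings: one row per item, each a list of standardized loadings on
    orthogonal factors. An item is flagged if any |loading| > 1 or if its
    communality (sum of squared loadings) leaves a uniqueness <= 0.
    """
    flagged = []
    for i, row in enumerate(loadings):
        communality = sum(l * l for l in row)
        uniqueness = 1.0 - communality
        if any(abs(l) > 1.0 for l in row) or uniqueness <= tol:
            flagged.append(i)
    return flagged

# Hypothetical three-factor solution for four items.
example = [
    [0.70, 0.10, 0.05],
    [0.20, 0.65, 0.10],
    [0.15, 0.05, 1.04],   # loading above 1: Heywood case
    [0.80, 0.55, 0.20],   # communality 0.9825: high but still proper
]
print(heywood_items(example))   # [2]
```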

MIRT

MIRT analyses indicated that all Static-2002R items except juvenile and adult sexual offenses performed more effectively (i.e., their response probabilities changed more quickly) on the factor to which EFA assigned them. The juvenile and adult sexual offenses item had a significantly higher discrimination slope on Youthful Stranger Aggression than on Persistence/Paraphilia, which was not entirely surprising considering that EFA results showed that this item cross-loaded on both factors. In addition, apart from the previously discussed “age at release” item, no Static-2002R item had substantial (moderate or greater) negative discrimination on a factor. This suggests, along with the low but positive correlations between factors, that Static-2002R dimensions are distinct psychological entities that nevertheless contribute incrementally to the risk of future sexual offending. It remains to be seen whether and how these dimensions interact to lead to recidivism, which would be an interesting avenue for future research, as the current study did not test for moderation effects.

In sum, results indicate good convergence between EFA loadings and MIRT discrimination slopes (two different but complementary families of statistical analyses) and further cement the factor structure of the Static-2002R.
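One common way to put MIRT slopes on the same metric as EFA loadings is to rescale logistic slopes to the normal-ogive metric (dividing by D ≈ 1.702) and then standardize them. The sketch below assumes uncorrelated factors, so it only approximates the weakly correlated factors reported here; with correlated factors, the denominator would have to include the factor correlation matrix. The slopes in the example are hypothetical.

```python
import math

D = 1.702  # logistic-to-normal-ogive scaling constant

def slopes_to_loadings(slopes, d=D):
    """Convert compensatory MIRT slopes to standardized loadings.

    lambda_k = (a_k / d) / sqrt(1 + sum_j (a_j / d)^2), valid for
    uncorrelated factors only.
    """
    scaled = [a / d for a in slopes]
    denom = math.sqrt(1.0 + sum(s * s for s in scaled))
    return [s / denom for s in scaled]

# Hypothetical slopes for one item on (Persistence/Paraphilia,
# General Criminality, Youthful Stranger Aggression).
loadings = slopes_to_loadings([2.0, 0.3, 0.1])
print([round(l, 2) for l in loadings])
```

The conversion preserves the ordering of the slopes and always yields a communality below 1, so an item’s dominant MIRT slope and its largest EFA loading should point to the same factor when the two analyses converge.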

Multidimensional Item Discrimination and Difficulty

The MIRT model gave valuable information on the performance of Static-2002R items. Knowing item-level parameters has numerous potential advantages for actuarial scale development, notably by suggesting ways to improve item weighting according to discrimination and difficulty. Weighting items according to their psychometric properties rather than giving them equal weight (usually 1 point) could improve the predictive validity of scales and make them more face valid, as not all risk factors are likely to be equally related to recidivism. The tangible benefits of differentially weighting items are debated, with some results indicating that complex combinations rarely outperform the simple summing of dichotomous items (Ghiselli et al., 1981; Grann & Långström, 2007; Silver et al., 2000). However, differential weighting has its greatest impact when there is wide variation in weighting values, little intercorrelation between items, and only a few items (Ghiselli et al., 1981; Kline, 2005). Considering that actuarial scales usually comprise few nonredundant items, they could benefit from differential weighting.
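The conditions identified by Ghiselli et al. (1981) can be illustrated by computing the correlation between a unit-weighted and a differentially weighted composite directly from the item correlation matrix R, using corr = w1'Rw2 / sqrt((w1'Rw1)(w2'Rw2)). The sketch assumes standardized items with a common intercorrelation and uses hypothetical weights; it is not an analysis of Static-2002R data.

```python
import math

def quad(u, R, v):
    """Compute u' R v for vectors u, v and a square matrix R (lists of lists)."""
    return sum(u[i] * R[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def composite_correlation(w1, w2, R):
    """Correlation between two weighted sums of standardized items with correlations R."""
    return quad(w1, R, w2) / math.sqrt(quad(w1, R, w1) * quad(w2, R, w2))

def equicorrelated(p, rho):
    """p x p correlation matrix with a common off-diagonal correlation rho."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]

unit = [1, 1, 1, 1]
weighted = [1, 1, 2, 4]   # hypothetical differential weights with a wide spread

for rho in (0.1, 0.5, 0.8):
    r = composite_correlation(unit, weighted, equicorrelated(4, rho))
    print(f"inter-item r = {rho:.1f}: corr(unit, weighted) = {r:.3f}")
```

With these hypothetical weights, the two composites become nearly interchangeable as inter-item correlations rise (about .89 at r = .1 versus about .99 at r = .8), which is why differential weighting tends to matter most for short scales of weakly related items.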

Concerning multidimensional discrimination (A), no Static-2002R item had unsatisfactory values. Only noncontact sexual offenses, male victim, and unrelated and stranger victim (all sexual violence–related items) showed moderate rather than very high discrimination. General Criminality items were highly discriminating, suggesting that the BARR-2002R, which contains those items plus age, might display particularly good psychometric properties in IRT analyses.

All Static-2002R items except age at release, unrelated and stranger victim, and prior arrest/sentencing occasions were found to be difficult (multidimensional difficulty [B] > 0). The first ranks of these three ordinal items had difficulty values under 0 and their later ranks had values over 0, suggesting that they captured a wide range of risk. As mentioned by Giguère and Lussier (2016), less difficult items are not a problem in themselves, as actuarial scales should include both difficult and less difficult items to cover all levels of the risk continuum. Our analyses suggested that in the Static-2002R, age at release, unrelated and stranger victim, and prior arrest/sentencing occasions were the items most likely to cover the lower end of the risk continuum and, thus, to be found among lower risk individuals. Difficult items included noncontact sexual offenses, juvenile and adult sexual offenses, and, to a lesser extent, male victim, young, unrelated victims, and the later ranks of prior arrest/sentencing occasions and prior sexual offenses. Therefore, these items are more likely to be endorsed among higher risk individuals and should be rare among lower risk individuals. In the Static-2002R, difficulty mostly manifested itself in sexual violence items, which is consistent with the scarcity of sexual offenses in criminal records compared with nonsexual offenses. Indeed, men with a history of sexual crime are more likely to reoffend with a nonsexual offense than with a sexual crime (Hanson & Bussière, 1998; Hanson & Morton-Bourgon, 2005) and are more likely to be generalists than “sexual specialists” in their criminal careers (D. A. Harris et al., 2011; Lussier et al., 2005).
In future developments of the Static-2002R, it might be worthwhile to test whether reweighting the “juvenile and adult sexual offenses” and “noncontact sexual offenses” items improves predictive validity, as these items were the most difficult yet are worth only 1 point in the scale.

In sum, MIRT analyses did not single out any Static-2002R item as problematic, echoing the findings of Helmus and Thornton (2015), who tested the predictive and incremental validity of Static-99R and Static-2002R items in a meta-analysis. They found that only index nonsexual violence proved problematic, an item absent from the Static-2002R and, consequently, from our study. Therefore, the current research attests to the relevance of all Static-2002R items. Even though more explicit coding rules or a reworking of the structure of the less discriminating items (noncontact sexual offenses, male victim, and unrelated and stranger victim) might improve the scale, there is no pressing impetus or warning flag, given that all items demonstrated at least moderate discrimination.

As to implications of the MIRT model for future research, apart from suggesting ways to reweight items, it also raises the complementary question of the convergence between MIRT parameters and risk levels. If two individuals are scored positively on four 1-point items of the Static-2002R and thus obtain a score of 4, are their risk levels the same if the items endorsed differ and have varying levels of difficulty? Current CTT models (and the Static-2002R) would assume that their risk is similar, whereas their θ positions in an IRT model could differ. Future studies can examine whether these two hypothetical individuals actually differ in recidivism risk. That said, we must be cognizant that item discrimination and difficulty might not have the same meaning in risk scales as in mathematics tests from the field of education. Success on a complicated mathematical problem usually implies success on an easier problem. In the context of risk scales, this assumption might not hold, especially in multidimensional scales like the Static-2002R (hence the use of MIRT). For example, endorsing the “noncontact sexual offenses” item is likely to come with endorsement of the less difficult “prior sexual offenses” item, as committing a noncontact sexual offense automatically means having committed at least one prior sexual offense (unless the noncontact sexual offense was the index offense). However, it does not imply endorsement of the “male victim” or “prior nonsexual violence” items, especially the latter, which belongs to another dimension. This implies that even though MIRT opens numerous interesting avenues for risk scale development, these avenues need to be concretely tested for usefulness before previous advances in the field anchored in CTT models are discarded.
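The two-individuals question can be made concrete with expected a posteriori (EAP) estimation of θ. For simplicity, the sketch below uses a unidimensional two-parameter logistic model with hypothetical item parameters, a standard normal prior, and grid quadrature; the Static-2002R itself is multidimensional, and none of these values come from the study. One technical nuance worth keeping in mind: under a 2PL model, the discrimination-weighted sum of endorsed items is what determines θ, so two patterns with the same raw score diverge when the endorsed items differ in discrimination, whereas under a Rasch model (equal discriminations) they would receive identical estimates whatever the items’ difficulties.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of endorsing an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap(responses, items, lo=-4.0, hi=4.0, n=161):
    """Expected a posteriori theta under a standard normal prior, by grid quadrature."""
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for k in range(n):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for x, (a, b) in zip(responses, items):
            p = p2pl(theta, a, b)
            weight *= p if x == 1 else 1.0 - p
        num += theta * weight
        den += weight
    return num / den

# Hypothetical (a, b) for six 1-point items, ordered from least to most difficult.
items = [(1.0, -0.5), (1.2, 0.0), (1.5, 0.5), (1.0, 1.0), (2.0, 1.5), (1.8, 2.0)]

easy_four = [1, 1, 1, 1, 0, 0]   # endorses the four least difficult items
hard_four = [0, 0, 1, 1, 1, 1]   # endorses the four most difficult items

for label, pattern in [("easy four", easy_four), ("hard four", hard_four)]:
    print(f"{label}: sum = {sum(pattern)}, EAP theta = {eap(pattern, items):.2f}")
```

Both patterns have a raw score of 4, but the second receives a higher θ estimate because its endorsed items are, in this hypothetical set, more discriminating. Setting every discrimination to the same value makes the two estimates coincide.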

Limitations

The main limitation of the current study relates to the item preparation that was necessary before submitting the scale to EFA and MIRT. Collinearity concerns led us to merge items (e.g., unrelated/stranger victim) and discard others (high rate of sexual offending), which implies that the Static-2002R was not entered “as is” in latent variable analyses. This may have influenced the resulting factor structure and deprives us of data on the high rate of sexual offending item. For example, even though this item would be expected to load on Persistence/Paraphilia, it may have loaded significantly on General Criminality, a factor also characterized by criminal repetition. A necessary step before integrating dimensional scores in the Static-2002R would be to perform an EFA and/or CFA with items entered without modifications and compare the results with those obtained in the current study. Encouragingly, recent studies on sadism have shown consistent findings across samples and analytical strategies, whether or not item preparation was performed (Longpré et al., 2020; Stefanska et al., 2019; Yoon et al., 2019).
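A simple screen for the kind of collinearity described above is to list item pairs whose correlations reach a cutoff. The sketch below enters the lower triangle of Table B1 (the 11 items retained after preparation) and flags pairs at or above a given cutoff. The cutoffs are illustrative choices, not the criterion used in the study, and the “<.001” entries are entered as 0 by assumption.

```python
# Lower triangle of Table B1 (polychoric/tetrachoric correlations, N = 2,569);
# row i holds correlations of item i + 1 with items 1..i. "<.001" entered as 0.0.
TABLE_B1 = [
    [],
    [-0.11],
    [0.33, 0.62],
    [-0.08, 0.46, 0.09],
    [-0.27, 0.33, 0.19, 0.23],
    [-0.15, 0.47, 0.34, 0.32, 0.56],
    [0.25, 0.37, 0.39, 0.28, 0.04, 0.34],
    [0.0, 0.56, 0.42, 0.14, -0.01, 0.04, 0.22],
    [0.24, 0.48, 0.32, 0.09, -0.07, 0.0, 0.24, 0.80],
    [0.33, 0.53, 0.45, 0.13, -0.01, 0.13, 0.41, 0.71, 0.76],
    [0.11, 0.21, 0.17, -0.08, -0.17, -0.20, 0.16, 0.67, 0.65, 0.57],
]

def flag_pairs(lower, threshold):
    """Return (item_i, item_j, r) for pairs with |r| >= threshold, strongest first.

    Item numbers are 1-based, matching the note under Table B1.
    """
    hits = []
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            if abs(r) >= threshold:
                hits.append((i + 1, j + 1, r))
    return sorted(hits, key=lambda h: -abs(h[2]))

print(flag_pairs(TABLE_B1, 0.90))
print(flag_pairs(TABLE_B1, 0.75))
```

Under these assumptions, no retained pair reaches .90 (the dropped item correlated .95 with the “prior sentencing occasions for sexual offenses” item), and the strongest remaining association is between prior arrest/sentencing occasions and community supervision violation (r = .80).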

Implications for Practice

The implications for practice of the current study are threefold. First, as mentioned above, results confirm the relevance of using the Static-2002R to assess the risk of adult men with a history of sexual crime. Substantial literature already exists on the reliability and predictive validity of the scale (Babchishin et al., 2012; Hanson et al., 2010; Phenix & Epperson, 2016; Tully et al., 2013), and this study further attests to its psychometric properties, with items performing well under MIRT scrutiny. Other actuarial scales in the field, such as the Level of Service/Case Management Inventory (LS/CMI; Andrews et al., 2004), did not fare as well (Giguère & Lussier, 2016).

Second, this study emphasized the importance of integrating dimensional scores in the Static-2002R. Our results indicated that recidivism risk in men who sexually offend was multidimensional: The EFA identified a three-factor solution, these factors did not share a substantial amount of variance, and there was no higher order risk dimension that encompassed the three factors. In other studies, these dimensions showed quite differentiated patterns of predictive and convergent validity (Brouillette-Alarie et al., 2016, 2018). Considering that the nomological networks of these constructs seem dissimilar, they may not represent the same psychological (or physiological) constructs, which emphasizes the need to switch to a dimensional scoring of risk in actuarial instruments. Such a change could improve predictive validity depending on the outcome of interest. For example, the BARR-2002R outperforms the Static-2002R in predicting nonsexual recidivism because it discards a dimension (sexual criminality) that is irrelevant to that outcome (Babchishin et al., 2016). Dimensional scoring would also open the door to future examinations of the nomological networks of these constructs, which could in turn lead to further improvements in predictive validity.

Third, studies that evaluated the Static-2002R’s cross-cultural validity also highlighted the importance of dimensional scoring. Indeed, recent studies have shown that the Static-99R was similarly predictive of recidivism for White, Black, Hispanic, and Indigenous individuals with a history of sexual crime (Lee & Hanson, 2017; Lee, Hanson, & Blais, 2020). However, when Caucasian and Indigenous offenders were compared, the Static-2002R failed to be predictive for Indigenous participants because the Persistence/Paraphilia and Youthful Stranger Aggression constructs were not predictive (Lee, Hanson, & Blais, 2020). Thus, for Indigenous participants, the static risk dimensions comprising sexuality-related items lacked the predictive value of the General Criminality items. Another relevant result was found in a study that compared motivations for sexual offending among White and Black participants (Lee, Hanson, Calkins, & Jeglic, 2020): White men had higher paraphilic and lower antisociality scores than Black men. In both cases, had the authors not explored risk constructs in addition to total scores, important nuances would have been lost, including opportunities to improve treatment tailoring in the future.

In sum, considering that the factor structure of the Static-2002R was consistent across EFA and MIRT in the current study and that similar structures were found in other studies of the Static-99R/2002R (Brouillette-Alarie et al., 2016; Brouillette-Alarie & Proulx, 2013; Roberts et al., 2002; Seto, 2005), there is now sufficient evidence for integrating dimensional scores in the Static-2002R. For the prediction of sexual recidivism, all three constructs (thus all Static-2002R items) would likely be used. For nonsexual recidivism, it remains to be seen whether prediction would be better served by the BARR-2002R (General Criminality plus the age item) or by the sum of General Criminality and Youthful Stranger Aggression. In any case, the existing five subscales of the Static-2002R should be replaced by those found in studies of the latent structure of the scale, as empirical results deviate from the initial division proposed by the creators of the instrument.
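To make the recommendation concrete, the sketch below computes dimensional subscale scores from the item codes in Table A1, alongside an 11-item total and a General Criminality plus age score. The item-to-factor assignment is an assumption inferred from the Discussion (General Criminality as the BARR-2002R items without age; Youthful Stranger Aggression as age, juvenile and adult sexual offenses, and unrelated/stranger victim; the cross-loading juvenile and adult sexual offenses item placed once, on the factor where its MIRT slope was higher). It is not copied from the study’s loading table and should be checked against it before any use.

```python
# Allowed codes per item, from Table A1 (items numbered as in this study).
VALID_CODES = {
    1: {-1, 0, 1, 2},   # age at release
    2: {0, 1, 2, 3},    # prior sexual offenses
    3: {0, 1},          # juvenile and adult sexual offenses
    4: {0, 1},          # noncontact sexual offenses
    5: {0, 1},          # male victim
    6: {0, 1},          # 2+ young and 1+ unrelated victims
    7: {0, 1, 2},       # unrelated and stranger victim
    8: {0, 1, 2, 3},    # prior arrest/sentencing occasions
    9: {0, 1},          # community supervision violation
    10: {0, 1},         # years free prior to index offense
    11: {0, 1},         # prior nonsexual violence
}

# ASSUMED item-to-factor assignment (hypothetical, inferred from the Discussion).
FACTORS = {
    "Persistence/Paraphilia": [2, 4, 5, 6],
    "Youthful Stranger Aggression": [1, 3, 7],
    "General Criminality": [8, 9, 10, 11],
}

def dimensional_scores(items):
    """Return factor subscale scores, an 11-item total, and GC + age."""
    if set(items) != set(VALID_CODES):
        raise ValueError("expected exactly items 1-11")
    for k, allowed in VALID_CODES.items():
        if items[k] not in allowed:
            raise ValueError(f"item {k}: invalid code {items[k]!r}")
    scores = {name: sum(items[k] for k in ks) for name, ks in FACTORS.items()}
    # Excludes the dropped "high rate of sexual offending" item (original item 4),
    # so this total is not the official Static-2002R total.
    scores["Total (11 items)"] = sum(items.values())
    # General Criminality plus the age item, as the Discussion describes the BARR-2002R.
    scores["GC + age"] = scores["General Criminality"] + items[1]
    return scores

example = {1: 2, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0, 7: 2, 8: 2, 9: 1, 10: 1, 11: 1}
for name, value in dimensional_scores(example).items():
    print(f"{name}: {value}")
```

Reporting the three subscale scores next to the total keeps the total available for existing norms while exposing which dimension drives a given score.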


Appendix A

Table A1.

Static-2002R Items Used in This Study.

Item no.	Original item no.	Risk factor	Codes	Score
1	1	Age at release	18–34.9	2
			35–39.9	1
			40–59.9	0
			60 or older	−1
2	2	Prior sexual offenses	None	0
			1	1
			2, 3	2
			4 or more	3
3	3	Juvenile (<18) and adult sexual offenses	No	0
			Yes	1
4	5	Noncontact sexual offenses	No	0
			Yes	1
5	6	Male victim	No	0
			Yes	1
6	7	2+ young (<12) and 1+ unrelated victims	No	0
			Yes	1
7	8 and 9	Unrelated and stranger victim	No	0
			Yes (unrelated)	1
			Yes (stranger)	2
8	10 and 11	Prior arrest/sentencing occasions	0	0
			1–2	1
			3–13	2
			14 or more	3
9	12	Community supervision violation	No	0
			Yes	1
10	13	Years free prior to index offense	No	1
			Yes	0
11	14	Prior nonsexual violence	No	0
			Yes	1

Note. The “high rate of sexual offending” item (original item no. 4) was dropped in this study because of its high correlation with the “prior sentencing occasions for sexual offenses” item (r = .95).

Appendix B

Table B1.

Polychoric/Tetrachoric Correlations Between Static-2002R Items (N = 2,569).

Items	1	2	3	4	5	6	7	8	9	10	11
1	1.00	—	—	—	—	—	—	—	—	—	—
2	−.11	1.00	—	—	—	—	—	—	—	—	—
3	.33	.62	1.00	—	—	—	—	—	—	—	—
4	−.08	.46	.09	1.00	—	—	—	—	—	—	—
5	−.27	.33	.19	.23	1.00	—	—	—	—	—	—
6	−.15	.47	.34	.32	.56	1.00	—	—	—	—	—
7	.25	.37	.39	.28	.04	.34	1.00	—	—	—	—
8	<.001	.56	.42	.14	−.01	.04	.22	1.00	—	—	—
9	.24	.48	.32	.09	−.07	<.001	.24	.80	1.00	—	—
10	.33	.53	.45	.13	−.01	.13	.41	.71	.76	1.00	—
11	.11	.21	.17	−.08	−.17	−.20	.16	.67	.65	.57	1.00

Note. 1 = age at release; 2 = prior sexual offenses; 3 = juvenile and adult sexual offenses; 4 = noncontact sexual offenses; 5 = male victim; 6 = young, unrelated victims; 7 = unrelated and stranger victim; 8 = prior arrest/sentencing occasions; 9 = community supervision violation; 10 = years free prior to index offense; 11 = prior nonsexual violence.

Authors’ Note

The views expressed are those of the authors and not necessarily those of Public Safety Canada.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Kelly M. Babchishin is an author of the Static-2002R.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Sébastien Brouillette-Alarie

References

Akaike

(1973). Information theory and an extension of the maximum likelihood principle. In Petrov

B. N.

Csaki

B. F.

(Eds.), Second international symposium on information theory (pp. 267–281). Academiai Kiado.

Allen

B. P.

Pflugradt

D. M.

(2014). An exploration of the latent constructs of the STATIC-99. International Journal of Offender Therapy and Comparative Criminology, 58(11), 1376–1388. https://doi.org/10.1177/0306624x13496046

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.).

Andrews

D. A.

Bonta

(2010). The psychology of criminal conduct (5th ed.). LexisNexis/Matthew Bender.

Andrews

D. A.

Bonta

Wormith

S. J.

(2004). The Level of Service/Case Management Inventory (LS/CMI). Multi-Health Systems.

Archer

R. P.

Buffington-Vollum

J. K.

Stredny

R. V.

Handel

R. W.

(2006). A survey of psychological test use patterns among forensic psychologists. Journal of Personality Assessment, 87(1), 84–94. https://doi.org/10.1207/s15327752jpa8701_07

Babchishin

K. M.

Hanson

R. K.

(2020). Monitoring changes in risk of reoffending: A prospective study of 632 men on community supervision. Journal of Consulting and Clinical Psychology, 88(10), 886–898. https://doi.org/10.1037/ccp0000601

Babchishin

K. M.

Hanson

R. K.

Blais

(2016). Less is more: Using Static-2002R subscales to predict violent and general recidivism among sexual offenders. Sexual Abuse: A Journal of Research and Treatment, 28(3), 187–217. https://doi.org/10.1177/1079063215569544

Babchishin

K. M.

Hanson

R. K.

Helmus

(2012). Even highly correlated measures can add incrementally to predicting recidivism among sex offenders. Assessment, 19(4), 442–461. https://doi.org/10.1177/1073191112458312

10.

Baker

F. B.

(2001). The basics of item response theory (2nd ed.). ERIC Clearinghouse on Assessment and Evaluation.

11.

Barbaree

H. E.

Langton

C. M.

Blanchard

Cantor

(2009). Aging versus stable enduring traits as explanatory constructs in sex offender recidivism: Partitioning actuarial prediction into conceptually meaningful components. Criminal Justice and Behavior, 36(5), 443–465. https://doi.org/10.1177/0093854809332283

12.

Barbaree

H. E.

Langton

C. M.

Peacock

E. J.

(2006). Different actuarial risk measures produce different risk rankings for sexual offenders. Sexual Abuse, 18(4), 423–440. https://doi.org/10.1177/107906320601800408

13.

Beech

A. R.

Ward

(2004). The integration of etiology and risk in sexual offenders: A theoretical framework. Aggression and Violent Behavior, 10(1), 31–63. https://doi.org/10.1016/j.avb.2003.08.002

14.

Bengtson

(2008). Is newer better? A cross-validation of the Static-2002 and the Risk Matrix 2000 in a Danish sample of sexual offenders. Psychology, Crime & Law, 14(2), 85–106. https://doi.org/10.1080/10683160701483104

15.

Bertrand

Blais

J.-G.

(2004). Modèles de mesure: L’apport de la théorie des réponses aux items [Measurement models: The contribution of item response theory]. Presses de l’Université du Québec.

16.

Bigras

(2007). La prédiction de la récidive chez les délinquants sexuels [Predicting recidivism in sexual offenders] [Doctoral dissertation, Université de Sherbrooke]. Available from ProQuest Dissertations & Theses Global. (Publication No. NR30941)

17.

Boer

(2003). Evaluating the Static-99 and Static-2002 risk scales using Canadian sexual offenders [Unpublished master’s thesis]. University of Leicester.

18.

Bonta

(1996). Risk-needs assessment and treatment. In Harland

A. T.

(Ed.), Choosing correctional options that work: Defining the demand and evaluating the supply (pp. 18–32). SAGE.

19.

Bonta

(2002). Offender risk assessment: Guidelines for selection and use. Criminal Justice and Behavior, 29(4), 355–379. https://doi.org/10.1177/0093854802029004002

20.

Boughner

K. L.

(2010). Dimensions of the Static-2002: A statistical and theoretical perspective [Master’s thesis, Spalding University]. Available from ProQuest Dissertations & Theses Global. (Publication No. 3404472)

21.

Bourgon

Mugford

Hanson

R. K.

Coligado

(2018). Offender risk assessment practices vary across Canada. Canadian Journal of Criminology and Criminal Justice, 60(2), 167–205. https://doi.org/10.3138/cjccj.2016-0024

22.

Brouillette-Alarie

Babchishin

K. M.

Hanson

R. K.

Helmus

L.-M.

(2016). Latent constructs of the Static-99R and Static-2002R: A three-factor solution. Assessment, 23(1), 96–111. https://doi.org/10.1177/1073191114568114

23.

Brouillette-Alarie

Hanson

R. K.

(2015). Comparaison de deux mesures d’évaluation du risque de récidive des délinquants sexuels [Comparison of two measures of recidivism risk assessment of sexual offenders]. Canadian Journal of Behavioural Science, 47(4), 292–304. https://doi.org/10.1037/cbs0000019

24.

Brouillette-Alarie

Lussier

(2018). The risk assessment of offenders with a history of sexual crime: Past, present and new perspectives. In Lussier

Beauregard

(Eds.), Sexual offending: A criminological perspective (pp. 349–375). Routledge.

25.

Brouillette-Alarie

Proulx

(2013). Predictive validity of the Static-99R and its dimensions. Journal of Sexual Aggression, 19(3), 311–328. https://doi.org/10.1080/13552600.2012.747630

26.

Brouillette-Alarie

Proulx

Hanson

R. K.

(2018). Three central dimensions of sexual recidivism risk: Understanding the latent constructs of Static-99R and Static-2002R. Sexual Abuse, 30(6), 676–704. https://doi.org/10.1177/1079063217691965

27.

Cai

(2010). High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm. Psychometrika, 75(1), 33–57. https://doi.org/10.1007/s11336-009-9136-x

28.

Chalmers

R. P.

(2012). Mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. http://dx.doi.org/10.18637/jss.v048.i06

29.

Dawes

R. M.

Faust

Meehl

P. E.

(1989). Clinical versus actuarial judgment. Science, 243(4899), 1668–1674. https://doi.org/10.1126/science.2648573

30.

de Ayala

R. J

. (2009). Methodology in the social sciences. The theory and practice of item response theory. Guilford Press.

31.

Doren

D. M.

(2004). Toward a multidimensional model for sexual recidivism risk. Journal of Interpersonal Violence, 19(8), 835–856. https://doi.org/10.1177/0886260504266882

32.

Douglas

K. S.

Skeem

J. L.

(2005). Violence risk assessment: Getting specific about being dynamic. Psychology, Public Policy, and Law, 11(3), 347–383. https://doi.org/10.1037/1076-8971.11.3.347

33.

Ennis

Hook

Choy

(2011, November). Reliability, validity, and factor structure of the Static-2002. Association for the Treatment of Sexual Abusers.

34.

Flora

D. B.

LaBrish

Chalmers

R. P.

(2012). Old and new ideas for data screening and assumption testing for exploratory and confirmatory factor analysis. Frontiers in Psychology, 3, Article 55. https://doi.org/10.3389/fpsyg.2012.00055

35.

Forero

C. G.

Maydeu-Olivares

Gallardo-Pujol

(2009). Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation. Structural Equation Modeling: A Multidisciplinary Journal, 16(4), 625–641. https://doi.org/10.1080/10705510903203573

36.

Fraser

McDonald

R. P.

(1988). NOHARM: Least squares item factor analysis. Multivariate Behavioral Research, 23(2), 267–269. https://doi.org/10.1207/s15327906mbr2302_9

37.

Gendreau

Little

Goggin

(1996). A meta-analysis of the predictors of adult offender recidivism: What works!. Criminology, 34(4), 575–608. https://doi.org/10.1111/j.1745-9125.1996.tb01220.x

38.

Ghiselli

E. E.

Campbell

J. P.

Zedeck

(1981). Measurement theory for the behavioral sciences. W. H. Freeman.

39.

Giguère

Lussier

(2016). Debunking the psychometric properties of the LS\CMI: An application of item response theory with a risk assessment instrument. Journal of Criminal Justice, 46, 207–218. https://doi.org/10.1016/j.jcrimjus.2016.05.005

40.

Grann

Långström

(2007). Actuarial assessment of violence risk: To weigh or not to weigh? Criminal Justice and Behavior, 34(1), 22–36. https://doi.org/10.1177/0093854806290250

41.

Groth

A. N.

Hobson

W. F.

Gary

T. S.

(1982). The child molester: Clinical observations. Journal of Social Work & Human Sexuality, 1(1–2), 129–144. https://doi.org/10.1300/J291v01n01_08

42.

Grove

W. M.

Zald

D. H.

Lebow

B. S.

Snitz

B. E.

Nelson

(2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19–30. https://doi.org/10.1037/1040-3590.12.1.19

43.

Haag

A. M.

(2005). Do psychological interventions impact on actuarial measures: An analysis of the predictive validity of the Static-99 and Static-2002 on a re-conviction measure of sexual recidivism [Doctoral dissertation, University of Calgary]. Available from ProQuest Dissertations & Theses Global. (Publication No. NR05662)

44.

Hanson

R. K.

(2002). Recidivism and age: Follow-up data from 4,673 sexual offenders. Journal of Interpersonal Violence, 17(10), 1046–1062. https://doi.org/10.1177/088626002236659

45.

Hanson

R. K.

Babchishin

K. M.

Helmus

L. M.

Thornton

Phenix

(2017). Communicating the results of criterion referenced prediction measures: Risk categories for the Static-99R and Static-2002R sexual offender risk assessment tools. Psychological Assessment, 29(5), 582–597. https://doi.org/10.1037/pas0000371

46.

Hanson

R. K.

Bussière

M. T.

(1998). Predicting relapse: A meta-analysis of sexual offender recidivism studies. Journal of Consulting and Clinical Psychology, 66(2), 348–362. https://doi.org/10.1037/0022-006X.66.2.348

47.

Hanson

R. K.

Harris

A. J. R.

Scott

Helmus

(2007). Assessing the risk of sexual offenders on community supervision: The Dynamic Supervision Project (Publication No. 2007-05). Public Safety Canada. https://www.publicsafety.gc.ca/cnt/rsrcs/pblctns/ssssng-rsk-sxl-ffndrs/ssssng-rsk-sxl-ffndrs-eng.pdf

48.

Hanson

R. K.

Helmus

Thornton

(2010). Predicting recidivism amongst sexual offenders: A multi-site study of Static-2002. Law and Human Behavior, 34(3), 198–211. https://doi.org/10.1007/s10979-009-9180-1

49.

Hanson

R. K.

Lloyd

C. D.

Helmus

Thornton

(2012). Developing non-arbitrary metrics for risk communication: Percentile ranks for the Static-99/R and Static-2002/R sexual offender risk tools. International Journal of Forensic Mental Health, 11(1), 9–23. https://doi.org/10.1080/14999013.2012.667511

50.

Hanson

R. K.

Morton-Bourgon

K. E.

(2005). The characteristics of persistent sexual offenders: A meta-analysis of recidivism studies. Journal of Consulting and Clinical Psychology, 73(6), 1154–1163. https://doi.org/10.1037/0022-006X.73.6.1154

51. Hanson, R. K., & Morton-Bourgon, K. E. (2009). The accuracy of recidivism risk assessments for sexual offenders: A meta-analysis of 118 prediction studies. Psychological Assessment, 21(1), 1–21. https://doi.org/10.1037/a0014421

52. Hanson, R. K., & Thornton (2000). Improving risk assessments for sex offenders: A comparison of three actuarial scales. Law and Human Behavior, 24(1), 119–136. https://doi.org/10.1023/a:1005482921333

53. Hanson, R. K., & Thornton (2003). Notes on the development of Static-2002 (Publication No. 2003-01). Department of the Solicitor General of Canada. https://www.publicsafety.gc.ca/cnt/rsrcs/pblctns/nts-dvlpmnt-sttc/index-en.aspx

54. Hare, R. D. (2003). Manual for the Revised Psychopathy Checklist (2nd ed.). Multi-Health Systems.

55. Harkins, & Beech, A. R. (2007). Measurement of the effectiveness of sex offender treatment. Aggression and Violent Behavior, 12(1), 36–44. https://doi.org/10.1016/j.avb.2006.03.002

56. Harris, A. J. R., & Hanson, R. K. (2010). Clinical, actuarial and dynamic risk assessment of sexual offenders: Why do things keep changing? Journal of Sexual Aggression, 16(3), 296–310. https://doi.org/10.1080/13552600.2010.494772

57. Harris, D. A., Knight, R. A., Smallbone, & Dennison (2011). Postrelease specialization and versatility in sexual offenders referred for civil commitment. Sexual Abuse: A Journal of Research and Treatment, 23(2), 243–259. https://doi.org/10.1177/1079063210384276

58. Helmus, L. M. (2009). Re-norming Static-99 recidivism estimates: Exploring base rate variability across sex offender samples [Master’s thesis, Carleton University]. Available from ProQuest Dissertations & Theses Global. (Publication No. MR58443)

59. Helmus, L. M., & Thornton (2015). Stability and predictive and incremental accuracy of the individual items of Static-99R and Static-2002R in predicting sexual recidivism: A meta-analysis. Criminal Justice and Behavior, 42(9), 917–937. https://doi.org/10.1177/0093854814568891

60. Helmus, L. M., Thornton, Hanson, R. K., & Babchishin, K. M. (2012). Improving the predictive accuracy of Static-99 and Static-2002 with older sex offenders: Revised age weights. Sexual Abuse: A Journal of Research and Treatment, 24(1), 64–101. https://doi.org/10.1177/1079063211409951

61. Heywood, H. B. (1931). On finite sequences of real numbers. Proceedings of the Royal Society of London A, 134(824), 486–501. https://doi.org/10.1098/rspa.1931.0209

62. Hirschi, & Gottfredson (1983). Age and the explanation of crime. American Journal of Sociology, 89(3), 552–584. https://doi.org/10.1086/227905

63. Hu, L.-t., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

64. Jackson, R. L., & Hess, D. T. (2007). Evaluation for civil commitment of sex offenders: A survey of experts. Sexual Abuse: A Journal of Research and Treatment, 19(4), 425–448. https://doi.org/10.1177/107906320701900407

65. Jiang, Wang, & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology, 7, Article 109. https://doi.org/10.3389/fpsyg.2016.00109

66. Jung, Ennis, Hermann, C. A., Pham, A. T., Choy, A. L., Corabian, & Hook (2017). An evaluation of the reliability, construct validity, and factor structure of the Static-2002R. International Journal of Offender Therapy and Comparative Criminology, 61(4), 464–487. https://doi.org/10.1177/0306624X15595228

67. Kelley, S. M., Ambroziak, Thornton, & Barahal, R. M. (2020). How do professionals assess sexual recidivism risk? An updated survey of practices. Sexual Abuse, 32(1), 3–29. https://doi.org/10.1177/1079063218800474

68. Kline, R. B. (2005). Principles and practice of structural equation modeling. The Guilford Press.

69. Knight, R. A., & Prentky, R. A. (1990). Classifying sexual offenders: The development and corroboration of taxonomic models. In W. L. Marshall, D. R. Laws, & H. E. Barbaree (Eds.), Handbook of sexual assault: Issues, theories, and treatment of the offender (pp. 23–52). Plenum.

70. Knight, R. A., & Thornton (2007). Evaluating and improving risk assessment schemes for sexual recidivism: A long-term follow-up of convicted sexual offenders (Publication No. 217618). https://www.ncjrs.gov/pdffiles1/nij/grants/217618.pdf

71. Langton, C. M., Barbaree, H. E., Hansen, K. T., Harkins, & Peacock, E. J. (2007). Reliability and validity of the Static-2002 among adult sexual offenders with reference to treatment status. Criminal Justice and Behavior, 34(5), 616–640. https://doi.org/10.1177/0093854806296851

72. Lee, S. C., & Hanson, R. K. (2017). Similar predictive accuracy of the Static-99R risk tool for White, Black, and Hispanic sex offenders in California. Criminal Justice and Behavior, 44(9), 1125–1140. https://doi.org/10.1177/0093854817711477

73. Lee, S. C., Hanson, R. K., & Blais (2020). Predictive accuracy of the Static-99R and Static-2002R risk tools for identifying Indigenous and White individuals at high risk for sexual recidivism in Canada. Canadian Psychology/Psychologie canadienne, 61(1), 42–57. https://doi.org/10.1037/cap0000182

74. Lee, S. C., Hanson, R. K., Calkins, & Jeglic (2020). Paraphilia and antisociality: Motivations for sexual offending may differ for American Whites and Blacks. Sexual Abuse, 32(3), 335–365. https://doi.org/10.1177/1079063219828779

75. Lehmann, R. J. B., Goodwill, A. M., Hanson, R. K., & Dahle, K.-P. (2014). Crime scene behaviors indicate risk-relevant propensities of child molesters. Criminal Justice and Behavior, 41(8), 1008–1028. https://doi.org/10.1177/0093854814521807

76. Lehmann, R. J. B., Hanson, R. K., Babchishin, K. M., Gallasch-Nemitz, Biedermann, & Dahle, K.-P. (2013). Interpreting multiple risk scales for sex offenders: Evidence for averaging. Psychological Assessment, 25(3), 1019–1024. https://doi.org/10.1037/a0033098

77. Longpré, Sims-Knight, J. E., Neumann, Guay, J.-P., & Knight, R. A. (2020). Is paraphilic coercion a different construct from sadism or the lower end of an agonistic continuum? Journal of Criminal Justice, 71, 101743. https://doi.org/10.1016/j.jcrimjus.2020.101743

78. Lord, F. M. (1953). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13(4), 517–549. https://doi.org/10.1177/001316445301300401

79. Lussier, LeBlanc, & Proulx (2005). The generality of criminal behavior: A confirmatory factor analysis of the criminal activity of sex offenders in adulthood. Journal of Criminal Justice, 33(2), 177–189. https://doi.org/10.1016/j.jcrimjus.2004.12.009

80. Mann, R. E., Hanson, R. K., & Thornton (2010). Assessing risk for sexual recidivism: Some proposals on the nature of psychologically meaningful risk factors. Sexual Abuse: A Journal of Research and Treatment, 22(2), 191–217. https://doi.org/10.1177/1079063210366039

81. Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. University of Minnesota.

82. Moffitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior: A developmental taxonomy. Psychological Review, 100(4), 674–701. https://doi.org/10.1037/0033-295X.100.4.674

83. Neal, T. M. S., & Grisso (2014). Assessment practices and expert judgment methods in forensic psychology and psychiatry: An international snapshot. Criminal Justice and Behavior, 41(12), 1406–1421. https://doi.org/10.1177/0093854814548449

84. Neumann, C. S., Hare, R. D., & Newman, J. P. (2007). The super-ordinate nature of the Psychopathy Checklist–Revised. Journal of Personality Disorders, 21(2), 102–117. https://doi.org/10.1521/pedi.2007.21.2.102

85. O’Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396–402. https://doi.org/10.3758/BF03200807

86. Osteen (2010). An introduction to using multidimensional item response theory to assess latent factor structures. Journal of the Society for Social Work and Research, 1(2), 66–82. https://doi.org/10.5243/jsswr.2010.6

87. Phenix, & Epperson, D. L. (2016). Overview of the development, reliability, validity, scoring, and uses of the Static-99, Static-99R, Static-2002, and Static-2002R. In Phenix & H. M. Hoberman (Eds.), Sexual offending: Predisposing antecedents, assessments and management (pp. 437–455). Springer.

88. Quételet (1835). Sur l’homme et le développement de ses facultés, ou essai de physique sociale [On man and the development of his faculties, or essay on social physics] (Vol. 2). Bachelier.

89. Rasch (1960). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests. Nielsen & Lydiche.

90. Reckase, M. D. (1972). Development and application of a multivariate logistic latent trait model [Doctoral dissertation, Syracuse University]. Available from ProQuest Dissertations & Theses Global. (Publication No. 7307762)

91. Reckase, M. D. (2009). Multidimensional item response theory. Springer.

92. Reid, C. A., Kolakowsky-Hayner, S. A., Lewis, A. N., & Armstrong, A. J. (2007). Modern psychometric methodology: Applications of item response theory. Rehabilitation Counseling Bulletin, 50(3), 177–188. https://doi.org/10.1177/00343552070500030501

93. Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667–696. https://doi.org/10.1080/00273171.2012.715555

94. Reise, S. P., Ainsworth, & Haviland (2005). Item response theory: Fundamentals, applications, and promise in psychological research. Current Directions in Psychological Science, 14(2), 95–101. https://doi.org/10.1111/j.0963-7214.2005.00342.x

95. Reise, S. P., Cook, K. F., & Moore, T. M. (2015). Evaluating the impact of multidimensionality on unidimensional item response theory model parameters. In Reise & Revicki (Eds.), Handbook of item response theory modeling. Taylor & Francis.

96. Reise, S. P., Moore, & Maydeu-Olivares (2011). Target rotations and assessing the impact of model violations on the parameters of unidimensional item response theory models. Educational and Psychological Measurement, 71(4), 684–711. https://doi.org/10.1177/0013164410378690

97. Revelle (2021, March 27). Package “psych.” https://cran.r-project.org/web/packages/psych/psych.pdf

98. Roberts, C. F., Doren, D. M., & Thornton (2002). Dimensions associated with assessments of sex offender recidivism risk. Criminal Justice and Behavior, 29(5), 569–589. https://doi.org/10.1177/009385402236733

99. Rohrer, D. M. (2019). Replication of a three-factor solution: Exploring the underlying constructs of the Static-99R and Static-2002R [Master’s thesis]. Brandeis University. https://hdl.handle.net/10192/36741

100. Sampson, R. J., & Laub, J. H. (2003). Life-course desisters? Trajectories of crime among delinquent boys followed to age 70. Criminology, 41(3), 555–592. https://doi.org/10.1111/j.1745-9125.2003.tb00997.x

101. Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4), 304–321. https://doi.org/10.1177/0734282911406653

102. Schwarz (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136

103. Seto, M. C. (2005). Is more better? Combining actuarial risk scales to predict recidivism among adult sex offenders. Psychological Assessment, 17(2), 156–167. https://doi.org/10.1037/1040-3590.17.2.156

104. Silver, Smith, W. R., & Banks (2000). Constructing actuarial devices for predicting recidivism: A comparison of methods. Criminal Justice and Behavior, 27(6), 733–764. https://doi.org/10.1177/0093854800027006004

105. Stefanska, E. B., Nitschke, Carter, A. J., & Mokros (2019). Sadism among sexual homicide offenders: Validation of the sexual sadism scale. Psychological Assessment, 31(1), 132–137. https://doi.org/10.1037/pas0000653

106. Stevens, J. P. (1992). Applied multivariate statistics for the social sciences (2nd ed.). Lawrence Erlbaum.

107. Tully, R. J., Chou, & Browne, K. D. (2013). A systematic review on the effectiveness of sex offender risk assessment tools in predicting sexual recidivism of adult male sex offenders. Clinical Psychology Review, 33(2), 287–316. https://doi.org/10.1016/j.cpr.2012.12.002

108. Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58–79. https://doi.org/10.1037/1082-989X.12.1.58

109. Yoon, Mauzaite, & Mokros (2019). Sexual Sadism Scale. In T. D. Fisher, C. M. Davis, W. L. Yarber, & S. L. Davis (Eds.), Handbook of sexually-related measures (pp. 525–527). Routledge.

An Examination of the Latent Constructs in Risk Tools for Individuals Who Sexually Offend: Applying Multidimensional Item Response Theory to the Static-2002R
