Abstract
Accurate risk assessment of individuals convicted of sexual offenses is crucial to prevent reoffending and prolonged institutionalization. However, findings indicate that the quality of risk assessment reports is heterogeneous. Some of this variance in quality may reflect differences in the strength of the empirical evidence linking risk factors to reoffending: some factors that have historically been important treatment targets have been shown meta-analytically to be empirically unsupported. To investigate the influence of unsupported risk factors on the decision-making process, the present study examined risk assessment reports (N = 304) conducted between 1999 and 2016. Results showed a heterogeneous consideration of empirically (un)supported risk factors. Reports following a structured risk assessment approach considered significantly more empirically supported risk factors than reports based on an unstructured, clinical-intuitive assessment procedure. Taken together, our findings provide further support for the use of structured and standardized risk assessment procedures and caution expert witnesses against incorporating empirically unsupported risk factors.
Keywords
Introduction
The quality of criminal risk assessment reports about individuals convicted of sexual offenses has previously been shown to be heterogeneous (Haarig et al., 2012; Kunzl & Pfaefflin, 2011; Wertz et al., 2020). The methodological approach to risk assessment has been proposed as one central source of this heterogeneity (e.g., Rettenberger & Eher, 2016; Wertz & Rettenberger, 2021). Prognostic judgments may be based on subjective clinical (also termed unstructured, intuitive, unguided, or impressionistic), actuarial (statistical, mechanical, or algorithmic), structured professional, or clinical-idiographic prediction methods (e.g., Grove et al., 2000; Meehl, 1954; Nicholls et al., 2013). Research comparing different risk assessment approaches has consistently demonstrated the predictive superiority of structured methods (actuarial or structured professional judgment [SPJ] instruments; e.g., Ægisdóttir et al., 2006; Bengtson & Långström, 2007; Hanson & Morton-Bourgon, 2009) and the limited accuracy of unstructured predictions (e.g., Grove et al., 2000; Johansen, 2007; Turgut et al., 2006), particularly for predicting sexual and violent recidivism (e.g., Bonta et al., 1998; Heilbrun et al., 2016; Jackson et al., 2004). Crucially, however, a study evaluating German risk assessment reports revealed that approximately half did not include standardized instruments but relied solely on intuitive judgments (Wertz & Rettenberger, 2021).
Further insight into the heterogeneous quality of risk assessment reports may be gained by analyzing the type and number of risk factors considered. Moreover, several personality characteristics of examinees may affect expert witness judgments even though they are not validated risk factors (Mann et al., 2010; Rettenberger, 2018). To give an overview of factors that are frequently incorporated into expert witness judgments despite lacking predictive validity for recidivism, Mann et al. (2010) distinguished several types of variables for predicting recidivism risk among individuals convicted of sexual offenses. In addition to four variables for which no stable empirical relation to sexual reoffending could be established (i.e., denial, low self-esteem, major mental illness, loneliness), the authors described four variables for which more than five studies failed to find any predictive relationship with sexual recidivism (i.e., depression, poor social skills, poor victim empathy, lack of motivation for treatment at intake). Seto et al. (2023) updated this overview by reviewing the relevant literature published since then. Two risk factors previously deemed promising by Mann et al. (2010), hostility toward women and dysfunctional coping, are now considered empirically supported, whereas no new risk factors were identified. Furthermore, positive social support was the only empirically supported protective factor. Consequently, the consideration of the unsupported risk factors identified by Mann et al. (2010) and Seto et al. (2023) may contribute to the heterogeneity in the quality and accuracy of risk assessment reports.
In addition, the predictive relevance of psychiatric disorders remains questionable. Expert witnesses are urged to exercise caution when estimating the influence of psychiatric diagnoses on the risk of recidivism, as several studies indicate that such diagnoses have low or no predictive validity (Eher et al., 2015, 2016; Kingston et al., 2015). Recognizing the questionable role of psychiatric disorders in risk assessment, the German version of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association, 2015) includes a cautionary note regarding the use of psychiatric diagnoses in judicial contexts. This call for caution is further supported by the relatively low reliability of clinical diagnoses in the forensic context, which leads to a large proportion of unjustified diagnoses (Rettenberger, 2018), and by the high prevalence of mental disorders among individuals convicted of sexual offenses (e.g., up to 72% diagnosed with a mood disorder; Eher, Rettenberger, & Turner, 2019) despite generally low recidivism rates (Rettenberger et al., 2015).
Importantly, there are exceptions. Specific disorders such as exclusive pedophilia (Eher et al., 2015), exhibitionism (Biedermann et al., 2023), hypersexual disorder (Gregório Hertz et al., 2022), certain personality disorders (PDs), including antisocial, narcissistic, and borderline PD (F-60 diagnoses), and substance use disorders (SUDs; Kingston et al., 2015) have predicted recidivism in previous studies, indicating that the influence of psychiatric disorders must be considered differentially when assessing the risk of reoffending (Långström et al., 2004).
In sum, factors that have historically been important targets and standard components of most treatment programs continue to be considered regularly, although meta-analyses regard them as empirically unsupported (Mann et al., 2010; Seto et al., 2023). Examining the extent to which such unsupported risk factors are still considered in risk assessments of individuals convicted of sexual offenses, and how they influence prognostic judgments, is therefore an important aspect of quality assurance in forensic (risk) assessment practice.
Study Objectives
The main aim of the present study was to identify the characteristics considered in, and the empirical foundations of, prognostic judgments in a sample of German risk assessment reports about individuals convicted of sexual offenses, and to examine how these characteristics influence the direction and accuracy of the reports' prognostic judgments. More precisely, we aimed to systematically examine the extent to which empirically unsupported and supported risk factors were considered in these reports, and to investigate the relevance of the identified risk factors for the direction and accuracy of the resulting judgments.
Methods
We retrospectively analyzed N = 304 risk assessment reports with regard to different aspects of the offense(s), prior delinquency, psychiatric diagnoses according to the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10; World Health Organization, 2016), and incarceration or placement of individuals under Sections 20, 21, 63, 64, or 66 of the German Penal Code.¹ Report-related data (time, institutional context, expert profession, use of risk assessment tools, methodological approach, and direction of the prognostic judgment), as well as the consideration of empirically unsupported and supported risk factors for sexual recidivism, were systematically gathered. In addition, the accuracy of the prognostic judgments was examined using recidivism data from the Federal Central Register (retrieved in 2016).²
Sample
All risk assessment reports concerned male individuals charged with or convicted of sexual offenses. Reports were gathered from two German institutions representing common forensic practice: the penitentiary in Freiburg (n = 135) and the Department of Forensic Psychiatry of the University Hospital Munich (n = 169). Assessments were conducted between 1999 and 2016 and were ordered by diverse judicial parties in the course of different penal law and sanction execution proceedings, including local or district courts, courts for the execution of prison sanctions, higher regional courts, and public prosecutors. Importantly, research on risk factors for sexual recidivism and on approaches to risk assessment evolved during this timeframe (see, e.g., Kelley et al., 2020), leading to shifts in what was considered best practice throughout the sampling period. Nevertheless, examining risk assessment reports from this extended timeframe offers a representative cross-section of reports, which enables us to identify factors contributing to the heterogeneity in report quality, consistent with patterns observed for similar time periods in previous studies. Risk assessments were conducted by 68 different expert witnesses, each of whom contributed between 1 and 42 assessments (M = 16.2 reports; SD = 7.2). More than three quarters of the assessments were conducted by psychiatrists (80.6%, n = 245), while 16.1% (n = 49) were conducted by psychologists and 3.3% (n = 10) jointly by experts of both professions.
Empirical Data Collection and Coding Procedure
We excluded reports concerning nonsexual offenses, female individuals, and juveniles. To ensure comparability of assessment contexts, reports based only on records, without a personal examination of the individual, as well as incompletely archived reports were also excluded and replaced by reports that included a face-to-face examination by an expert witness. Unstructured clinical judgments were defined as assessments in which risk factors were evaluated solely on the basis of the assessor's clinical experience (Grove et al., 2000; Hanson & Morton-Bourgon, 2009; Nicholls et al., 2013; Skeem & Monahan, 2011). If any actuarial or SPJ tool was used, the assessment was considered structured.
Empirically unsupported risk factors were drawn from the review by Mann et al. (2010), a seminal work providing an overview of variables that are frequently incorporated into expert witness reports despite showing an equivocal or no relation to recidivism. As a contrast, empirically supported risk factors were operationalized as the items of the German version of the Violence Risk Scale–Sexual Offense version (VRS:SO; Wong et al., 2003), for which good reliability and predictive validity have been demonstrated in a German-speaking sample of individuals convicted of sexual offenses (Gaunersdorfer & Eher, 2022, 2023).³ Reports were examined for whether the expert witnesses considered each of these items. Of the 24 VRS:SO items, seven are static and 17 are dynamic risk factors that are amenable to change. Each factor was rated on an ordinal scale, independently of whether a risk assessment instrument had been used, and subsequently dichotomized for statistical analyses (0 = not considered for risk assessment; 1 = considered for risk assessment).
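The dichotomized ratings are summarized per report as the proportion of supported (SRF) and unsupported (URF) risk factors considered, which is the quantity analyzed in the Results (e.g., M = 33.8% of the 24 VRS:SO items). The following Python sketch illustrates that step. It assumes a hypothetical ordinal coding of 0, 1, 2 (the exact ordinal levels are not specified here), and all names and example ratings are hypothetical.

```python
# Hypothetical sketch: dichotomize ordinal item ratings and compute the
# proportion of risk factors a single report considered.
# ASSUMPTION: ordinal levels 0/1/2, where 0 = not mentioned and any
# higher level counts as "considered".

N_SRF_ITEMS = 24  # VRS:SO items (7 static + 17 dynamic)
N_URF_ITEMS = 8   # unsupported factors listed by Mann et al. (2010)


def dichotomize(rating):
    """Map an ordinal rating to 0 (not considered) or 1 (considered)."""
    return 1 if rating > 0 else 0


def proportion_considered(ratings, n_items):
    """Share of the n_items factors that a report considered."""
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    return sum(dichotomize(r) for r in ratings) / n_items


# A hypothetical report mentioning 8 of the 24 VRS:SO items (two of them
# in detail, rating 2) and 2 of the 8 unsupported factors.
srf_ratings = [2, 2, 1, 1, 1, 1, 1, 1] + [0] * 16
urf_ratings = [1, 1] + [0] * 6

srf_share = proportion_considered(srf_ratings, N_SRF_ITEMS)
urf_share = proportion_considered(urf_ratings, N_URF_ITEMS)
print(f"SRF: {srf_share:.1%}, URF: {urf_share:.1%}")
```

Working with proportions rather than raw counts keeps SRF (24 items) and URF (8 items) on a common 0 to 1 scale, which is what allows the two to be compared directly.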
To assess the agreement with which two raters (MW and MG) applied the coding scheme, interrater reliabilities were calculated for all risk factors in a randomly selected sample of 20 reports (6.7%; Bujang & Baharum, 2017) using Cohen's (1968) weighted kappa (κw) with linear weights. Interrater reliability was at least substantial (κw = .61–.80; Landis & Koch, 1977) for all variables. For unsupported risk factors, κw ranged from .81 (p < .001) to 1.0 (p < .001); for static VRS:SO items, from .82 (p < .001) to .93 (p < .001); and for dynamic VRS:SO items, from .73 (p < .001) to .90 (p < .001). Several psychiatric diagnoses (exclusive pedophilia, exhibitionism, hypersexual disorder, other PDs including antisocial, narcissistic, or borderline PD [F-60 diagnoses], and SUDs) were classified as relevant for risk assessment, whereas all other mental disorders were classified as irrelevant.
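Linearly weighted kappa penalizes a disagreement between the two raters in proportion to its distance on the ordinal scale, and compares the weighted observed disagreement with the disagreement expected from the raters' marginal distributions. The Python sketch below shows that computation; the function name and the example codes are hypothetical, and in practice an established implementation (for example, scikit-learn's `cohen_kappa_score` with `weights="linear"`) would normally be used instead.

```python
# Sketch of Cohen's weighted kappa with linear disagreement weights for
# two raters' ordinal codes on the same set of reports.

def linear_weighted_kappa(rater_a, rater_b, categories):
    """kappa_w = 1 - sum(w * p_observed) / sum(w * p_expected),
    with linear disagreement weights w_ij = |i - j| / (k - 1)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[pos[a]][pos[b]] += 1 / n

    # Marginal proportions define the chance-expected joint distribution.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row[i] * col[j]
    # weighted_exp is 0 only if both raters used a single category;
    # kappa is undefined in that case and this raises ZeroDivisionError.
    return 1 - weighted_obs / weighted_exp


# Hypothetical codes from two raters for one risk factor in 8 reports.
rater_mw = [0, 1, 2, 2, 1, 0, 2, 1]
rater_mg = [0, 1, 2, 1, 1, 0, 2, 2]
print(round(linear_weighted_kappa(rater_mw, rater_mg, [0, 1, 2]), 2))
```

With only two categories the linear weights reduce to 0/1, so the function returns the ordinary (unweighted) Cohen's kappa.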
To assess risk communication, the first author (MW) coded the exact wording of each report's final judgment on a five-point Likert-type scale (very low risk, low risk, moderate risk, high risk, very high risk), following a recommended five-level risk categorization (Eher, Rettenberger, Etzler, et al., 2019; Hanson et al., 2017).⁴ To examine the accuracy of prognostic judgments, recidivism data were extracted from the Federal Central Register in June 2016 and analyzed over an average follow-up period of 7.48 years. Recidivism was coded as any (new criminal conviction of any kind), nonviolent, general sexual (new conviction for a sexual offense with or without physical contact), sexual contact (new conviction for a sexual offense with physical contact), or violent reconviction.
Data Analysis
To analyze the consideration of supported and unsupported risk factors and its dependence on the use of standardized risk assessment instruments, a two-sided independent samples t-test and a multivariate analysis of variance (MANOVA) with subsequent univariate analyses of variance (ANOVAs) were conducted. To investigate the relevance of supported and unsupported risk factors for the direction and accuracy of expert witness judgments, hierarchical binary logistic regression analyses were conducted.⁵ Following the regression analyses, receiver operating characteristic (ROC) analyses were performed for each predictor and for the final regression models to assess how well they discriminated between high- and low-risk predictions, between accurate and inaccurate judgments, and between individuals who did and did not reoffend, quantified as the area under the curve (AUC; Backhaus et al., 2018).
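The AUC used in these ROC analyses has a direct probabilistic reading: it is the probability that a randomly chosen case from the positive group (e.g., an individual who reoffended) receives a higher predicted score than a randomly chosen case from the negative group, with ties counted as one half. This equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The Python sketch below computes it that way; the function name and the example scores are hypothetical, and the study's own software is not implied.

```python
# Sketch: AUC as the proportion of (positive, negative) pairs in which the
# positive case has the higher score; ties count 0.5. Equivalent to the
# Mann-Whitney U statistic divided by n_pos * n_neg.

def auc(scores_pos, scores_neg):
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities from a logistic regression model.
reoffended = [0.7, 0.6, 0.4]
did_not_reoffend = [0.5, 0.3, 0.2, 0.4]
print(auc(reoffended, did_not_reoffend))  # 0.875
```

An AUC of .50 therefore means the model ranks a random reoffender above a random nonreoffender no more often than chance. The pairwise loop is quadratic in sample size, which is unproblematic for a few hundred reports.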
Results
Almost two-thirds of the reports concerned individuals diagnosed with a mental disorder according to the ICD-10 who were placed in preventive detention or forensic psychiatry (see Supplemental Table S1, available in the online version of this article). Nearly 85% (n = 256) of the sample had been convicted of at least one offense prior to the index offense, mostly for sexual offenses or for both sexual and violent offenses. Approximately half of the reports described the individual as having a (very) high risk of reoffending (n = 52 very high; n = 103 high), while approximately 40% concluded a (very) low risk of reoffending (n = 23 very low; n = 101 low). Moderate risk judgments were made in 8.2% of the reports (n = 25). Approximately one in six reports (15.5%) included psychological testing (i.e., use of formal psychological diagnostic instruments), and 45% (n = 134) of all risk assessments followed a structured approach. Of all reports, approximately 23% applied SPJ methods only (e.g., HCR-20 [Müller-Isberner & Webster, 1998; original version: Webster et al., 1997], SVR-20 [Müller-Isberner et al., 2000; original version: Boer et al., 1997]), 3% applied actuarial methods only (e.g., Static-99 [Rettenberger & Eher, 2006; original version: Harris et al., 2003], VRAG [Rossegger et al., 2009; original version: Quinsey et al., 2006], Stable-2007 [Matthes & Rettenberger, 2008; original version: Hanson et al., 2007]), and 18.4% used both.
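For the regression analyses on the direction of judgments, the five-level coding has to be collapsed into a binary outcome. The N = 279 reported for the direction analysis equals the 304 reports minus the 25 moderate-risk judgments, which suggests that moderate judgments were set aside and the remaining judgments were coded as (very) low versus (very) high risk. The Python sketch below makes that assumption explicit; it is an inference from the reported counts, not a documented step of the study, and the names are hypothetical.

```python
# Hypothetical sketch: collapse the five-level risk judgment into the
# binary outcome used for the direction analysis.
# ASSUMPTION (inferred from N = 279 = 304 - 25): moderate judgments are
# excluded rather than assigned to either side.

RISK_LEVELS = ("very low", "low", "moderate", "high", "very high")
LOW_RISK = {"very low", "low"}
HIGH_RISK = {"high", "very high"}


def direction(level):
    """1 = (very) low-risk judgment, 0 = (very) high-risk judgment,
    None = moderate judgment (excluded)."""
    if level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {level!r}")
    if level in LOW_RISK:
        return 1
    if level in HIGH_RISK:
        return 0
    return None


# Frequencies reported for the 304 reports.
counts = {"very low": 23, "low": 101, "moderate": 25, "high": 103, "very high": 52}
coded = [direction(level) for level, n in counts.items() for _ in range(n)]
retained = [y for y in coded if y is not None]
print(len(coded), len(retained), sum(retained))
```

Coding the low-risk category as 1 matches the way the regression results are phrased, namely as the probability of receiving a (very) low-risk judgment.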
Recidivism Rates
On average, individuals were followed up for a time at risk of 7.48 years (SD = 4.04, range: 1.5–16). The average time between risk assessment and release was 2.01 years (SD = 3.24). Individuals who had not yet been released or whose criminal records were unavailable because of death or emigration were excluded, leaving n = 221 individuals whose recidivism rates were analyzed for 2-year, 5-year, and total follow-up periods. Of these, almost half were reconvicted of at least one offense of any kind during the total follow-up period. As expected, recidivism rates at the 2- and 5-year follow-ups were lower (see Supplemental Table S2, available in the online version of this article).
Hit Rates
For the total follow-up period (M = 7.48 years), the hit rate for the prediction of general recidivism was approximately at chance level (50.7%). Among individuals who reoffended with a sexual offense (n = 11), expert witnesses had correctly classified 54.5% (n = 6) as high risk, while 36.4% (n = 4) had been incorrectly assessed as low risk. For the remaining individual who reoffended with a sexual offense, the recidivism risk had been assessed as moderate.
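A hit rate counts a judgment as correct when a (very) high-risk prediction is followed by reconviction or a (very) low-risk prediction is not. How moderate judgments enter is a design choice; the 54.5% reported above for sexual recidivists (6 of 11) is reproduced when the single moderate judgment stays in the denominator and counts as a miss. The Python sketch below adopts that convention as an assumption; the function and variable names are hypothetical.

```python
# Hypothetical sketch of a hit-rate computation for categorical risk
# judgments. ASSUMPTION: moderate judgments stay in the denominator and
# count as misses (this reproduces 6/11 = 54.5% for sexual recidivists).

HIGH = {"high", "very high"}
LOW = {"low", "very low"}


def hit_rate(cases):
    """cases: iterable of (risk_level, reoffended) pairs."""
    cases = list(cases)
    if not cases:
        raise ValueError("no cases")
    hits = sum(
        1
        for level, reoffended in cases
        if (level in HIGH and reoffended) or (level in LOW and not reoffended)
    )
    return hits / len(cases)


# The 11 individuals reconvicted of a sexual offense, as reported above.
sexual_recidivists = (
    [("high", True)] * 6 + [("low", True)] * 4 + [("moderate", True)]
)
print(f"{hit_rate(sexual_recidivists):.1%}")
```

If moderate judgments were instead dropped from the denominator, the same data would yield 6 of 10, so the convention should be stated whenever hit rates are compared across studies.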
Overall, the descriptive results suggest that the present sample was comparable to other (international) samples of individuals convicted of sexual offenses (Hanson & Morton-Bourgon, 2009; Nedopil, 2013).
Consideration of Unsupported and Supported Risk Factors
The mean proportion of risk factors considered differed significantly between supported (VRS:SO) and unsupported (according to Mann et al., 2010) risk factors, with a medium effect size, t(606) = 7.60, p < .001, d = 0.62 (Cohen, 1968), and a 95% CI for the mean difference in proportions of [0.09, 0.15]. Regardless of whether risk assessment instruments were applied, approximately 12 percentage points more supported (M = 33.8%, SD = 22.1, i.e., approximately 7–8 risk factors) than unsupported (M = 22.0%, SD = 15.5, i.e., approximately 1.8 risk factors) risk factors were considered for risk assessment (see Supplemental Table S3, available in the online version of this article).
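The reported test statistics can be approximately reproduced from the summary statistics, which also shows that the interval [0.09, 0.15] refers to the mean difference in proportions (33.8% vs. 22.0%) rather than to d. The Python sketch below assumes equal group sizes (each of the 304 reports contributes one SRF and one URF proportion, treated as independent samples as in the reported t-test) and uses a normal-approximation interval; the small gap between the computed t (about 7.62) and the reported 7.60 presumably reflects rounding of the reported means and SDs.

```python
import math

# Summary statistics reported above (percent of factors considered).
m_srf, sd_srf = 33.8, 22.1   # supported: % of the 24 VRS:SO items
m_urf, sd_urf = 22.0, 15.5   # unsupported: % of the 8 Mann et al. factors
n = 304                      # reports per group (equal group sizes)

diff = m_srf - m_urf

# With equal n, the pooled SD is the root mean square of the two SDs.
sd_pooled = math.sqrt((sd_srf ** 2 + sd_urf ** 2) / 2)
d = diff / sd_pooled

# Standard error of the mean difference under the independent-samples model.
se_diff = math.sqrt(sd_srf ** 2 / n + sd_urf ** 2 / n)
t = diff / se_diff
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)  # normal approximation

print(f"d = {d:.2f}, t({2 * n - 2}) = {t:.2f}, "
      f"95% CI of difference [{ci[0] / 100:.2f}, {ci[1] / 100:.2f}]")
```

Dividing the interval bounds by 100 converts percentage points back to proportions, the scale on which the interval is reported.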
On average, expert witnesses considered one-third of all empirically supported risk factors and approximately two unsupported risk factors for their prognostic decisions. High standard deviations for both supported and unsupported variables suggest that the reports were highly heterogeneous regarding the number of risk factors considered. When risk assessment instruments were applied, a significant increase in the proportion of (un-)supported risk factors was observed, F(2, 301) = 61.79, p < .001,
Relevance of Risk Factors for Prognostic Direction
Table 1 shows the results of the hierarchical logistic regression analysis, examining how the presence of diagnosis and the consideration of empirically supported risk factors (SRF) and unsupported risk factors (URF) predict the direction of risk judgments. The regression model containing all predictors (diagnosis, SRF, URF, diagnosis x SRF) fitted the data significantly better than the null model,
Hierarchical Logistic Regression Using Number and Type of Subject-Related Characteristics to Predict the Direction of Risk Assessment (N = 279)
Note. CI = confidence interval;
Variable was mean centered and standardized for regression analyses.
A main effect of diagnosis indicated that the presence of a mental health diagnosis significantly reduced the probability of receiving a low-risk judgment. In contrast, a main effect of SRF showed that considering a larger proportion of SRF increased the probability of a low-risk judgment. A significant interaction of SRF and diagnosis suggested that the influence of SRF on the direction of risk assessment depended on whether a psychiatric diagnosis was present. Considering a larger proportion of SRF increased the probability of a (very) low-risk judgment only for individuals without a mental health diagnosis. For these individuals, the odds of being categorized as (very) low risk increased by a factor of 2.27 when the proportion of SRF considered increased by one standard deviation (i.e., by 22.1%, or 5.3 risk factors). Individuals for whom an average proportion of SRF was considered (i.e., 33.8%, Mcentered = 0) were classified as (very) low risk with a probability of 57.6%. If, however, about five more SRF were considered (i.e., 55.9%, Mcentered = 1), the probability of being categorized as (very) low risk increased by 17.9 percentage points (to 75.5%), all other variables being held constant.
A follow-up analysis confirmed that the effect of SRF was nullified if a psychiatric diagnosis was present, B = −0.07, Wald(1) = 0.19, p = .667, eB = 0.94, 95% CI [0.69, 1.27]. Individuals with a mental health diagnosis had a lower chance of being classified as (very) low risk, regardless of the number of SRF considered. Compared with individuals without a diagnosis, the presence of a diagnosis multiplied the odds of a (very) low-risk judgment by 0.51 (i.e., roughly halved them), corresponding to a decrease in probability of 16.7 percentage points (all other variables held constant; see Supplemental Figure S1, available in the online version of this article).
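Odds ratios (eB) translate into probability changes only relative to a baseline probability. The figures reported above are consistent with starting both changes from the 57.6% baseline (average SRF, no diagnosis): multiplying the baseline odds by 2.27 gives 75.5% (+17.9 points), and multiplying them by 0.51 gives 40.9% (−16.7 points). The Python sketch below performs that conversion; the function name is hypothetical.

```python
def shift_probability(p_baseline, odds_ratio):
    """Probability implied by multiplying the baseline odds by odds_ratio."""
    if not 0 < p_baseline < 1 or odds_ratio <= 0:
        raise ValueError("need 0 < p_baseline < 1 and odds_ratio > 0")
    odds = p_baseline / (1 - p_baseline) * odds_ratio
    return odds / (1 + odds)


p0 = 0.576  # predicted probability of a (very) low-risk judgment at baseline

p_more_srf = shift_probability(p0, 2.27)   # +1 SD of SRF considered
p_diagnosis = shift_probability(p0, 0.51)  # diagnosis present

print(f"+1 SD SRF: {p_more_srf:.1%} ({(p_more_srf - p0) * 100:+.1f} points)")
print(f"diagnosis: {p_diagnosis:.1%} ({(p_diagnosis - p0) * 100:+.1f} points)")
```

Because the logistic curve is nonlinear, the same odds ratio would produce a different percentage-point change from a different baseline, which is why such changes are reported with all other variables held constant.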
Relevance of Risk Factors for Predictive Accuracy
Table 2 displays the results of the hierarchical logistic regression analysis using the presence of a diagnosis (any diagnosis present: yes/no) and the consideration of SRF and URF to predict the accuracy of expert witness judgments for the 2-year follow-up period. The regression model including all predictors did not fit the data significantly better than the null model,
Hierarchical Logistic Regression Using Number and Type of Subject-Related Characteristics to Predict Hit Rates for a Time at Risk of 2 Years (N = 207)
Note. CI = confidence interval;
Variable was mean centered and standardized for regression analysis.
Simple effect analyses revealed that the proportion of SRF considered positively predicted judgment accuracy only if the assessed individual was not diagnosed with a psychiatric disorder (see Supplemental Figure S2, available in the online version of this article). For those with a diagnosis, risk assessment accuracy did not depend on the consideration of SRF, B = −0.07, Wald(1) = 0.12, p = .728, eB = 0.94, 95% CI [0.64, 1.37]. The final model predicted whether expert witness judgments were correct slightly above chance level (AUC = .65; Rice & Harris, 2005).
Finally, differences in the accuracy of expert witness judgments over time were investigated. Similar to the model predicting hit rates for the 2-year follow-up period, the model for the 5-year follow-up revealed a significant interaction of SRF and diagnosis, B = −0.73, Wald(1) = 3.88, p = .049, eB = 0.48, 95% CI [0.24, 1.00]. This interaction indicates that, when the proportion of SRF considered increased by one unit (i.e., one standard deviation), the odds of a correct prediction changed by a factor of 0.48 for individuals with a diagnosis relative to the change for individuals without a diagnosis. In contrast to the 2-year model, however, SRF did not significantly predict accuracy either for individuals without a psychiatric diagnosis, B = 0.51, Wald(1) = 3.22, p = .073, eB = 1.66, 95% CI [0.96, 2.88], or for those with a diagnosis, B = −0.22, Wald(1) = 0.86, p = .354, eB = 0.80, 95% CI [0.51, 1.28]. Thus, although the relationship between SRF and predictive accuracy differed between individuals with and without a diagnosis, the proportion of SRF considered did not significantly predict whether risk assessments were correct within either group.
For the total follow-up period, the model containing all predictors did not fit the data significantly better than the null model,
Neither the interaction term nor any of the main effects of SRF, URF, or diagnosis significantly predicted hit rates for the total follow-up period, with correct classifications just above chance level (54.3%). These results suggest that the predictive contribution of SRF decreased over time, yielding non-significant effects for an average follow-up of 7.48 years.
Relevance of Risk Factors for Recidivism
Additional regression analyses with the same variables (diagnosis, SRF, URF, diagnosis × SRF) predicting recidivism with any criminal offense revealed no significant interaction of SRF and the presence of a psychiatric disorder, B = −0.09, Wald(1) = 0.09, p = .764, eB = 0.91, 95% CI [0.51, 1.65]. Furthermore, the presence of a psychiatric disorder did not significantly predict recidivism, B = 0.09, Wald(1) = 0.10, p = .755, eB = 1.09, 95% CI [0.63, 1.90]. In contrast, a significant main effect of SRF was observed, indicating that a greater number of SRF considered in the assessment was associated with a lower probability of criminal reoffending in the future, B = −0.36, Wald(1) = 5.93, p = .015, eB = 0.70, 95% CI [0.52, 0.93] (AUC = .60; Rice & Harris, 2005).
The point-biserial correlation between the proportion of SRF considered and sexual recidivism was not significant, r(N = 221) = −.05, p = .462, indicating that the proportion of SRF considered was unrelated to the probability of sexual recidivism. Similarly, sexual recidivism was not significantly related to the presence of a psychiatric disorder, χ²(1, N = 221) = 0.27, p = .606. Thus, both the proportion of SRF considered and the presence of a psychiatric diagnosis were independent of the risk of sexual recidivism.
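For a χ² statistic with one degree of freedom, the p-value is the probability that a squared standard normal variable exceeds the statistic, so it can be computed with the complementary error function alone: p = erfc(√(χ²/2)). The Python sketch below pairs this with a Pearson χ² for a 2 × 2 table (without continuity correction). The function names are hypothetical, and the example table is invented purely for illustration because cell counts are not reported here.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_chi2_df1(chi2):
    """Upper-tail p-value for chi-square with 1 df, i.e. P(Z^2 > chi2)."""
    return math.erfc(math.sqrt(chi2 / 2))


# Invented 2x2 table (diagnosis yes/no x reoffended yes/no), illustration only.
example = [[10, 20], [20, 10]]
stat = chi2_2x2(example)
print(f"chi2(1) = {stat:.2f}, p = {p_chi2_df1(stat):.3f}")

# p-values implied by the chi-square statistics reported in this section.
for reported in (0.27, 0.13, 0.75):
    print(f"chi2(1) = {reported:.2f} -> p = {p_chi2_df1(reported):.2f}")
```

For the reported statistics this yields p ≈ .60, .72, and .39, in line with the reported .606, .718, and .388 once the statistics' rounding to two decimals is taken into account.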
The Relevance of Mental Health Diagnoses for Prognostic Direction and Hit Rates
In the next step, we repeated the hierarchical logistic regression analyses predicting the direction of prognostic judgments, hit rates for the 2-year, 5-year, and total follow-up periods, and recidivism, this time distinguishing between psychiatric diagnoses considered relevant (PDCR) and psychiatric diagnoses considered irrelevant (PDCI) for recidivism risk.
Relevance of Risk Factors for Prognostic Direction
In the model predicting the direction of expert witness judgments, a significant interaction was detected between SRF and the presence of a psychiatric diagnosis considered irrelevant for risk assessment (PDCI; based on the current state of scientific knowledge), whereas the interaction between SRF and psychiatric diagnoses considered relevant for risk assessment (PDCR) was not significant.
The model including only the significant interaction showed that the effect of SRF on the direction of prognostic judgments depended on whether a PDCI was present, B = −0.75, Wald(1) = 6.96, p = .002, eB = 0.47, 95% CI [0.27, 0.83]. When a PDCI was present, the consideration of SRF did not predict the direction of risk assessment, B = −0.11, Wald(1) = 0.29, p = .592, eB = 0.89, 95% CI [0.59, 1.35]. In contrast, when no PDCI was present, an increase in the consideration of SRF by one standard deviation (5.3 risk factors, 22.1%) significantly raised the probability of receiving a low-risk judgment by 14.6%, B = 0.63, Wald(1) = 11.20, p < .001, eB = 1.89, 95% CI [1.30, 2.73]. Furthermore, the presence of a PDCR significantly decreased the odds of being classified as low risk, B = −1.44, Wald(1) = 13.99, p < .001, eB = 0.24, 95% CI [0.11, 0.50], whereas no main effect of PDCI on the direction of risk assessment was detected, B = −0.33, Wald(1) = 1.25, p = .264, eB = 0.72, 95% CI [0.40, 1.28]. The final regression model correctly classified 62.4% of judgments (AUC = .60; Rice & Harris, 2005).
Relevance of Risk Factors for Predictive Accuracy
Logistic regression analyses predicting 2-year hit rates revealed no significant interaction of either PDCI and SRF or PDCR and SRF, indicating that the accuracy-increasing effect of SRF did not depend on whether a mental health diagnosis was considered relevant or irrelevant for recidivism risk. The model including only main effects revealed significant effects of SRF, B = 0.39, Wald(1) = 5.38, p = .020, eB = 1.48, 95% CI [1.06, 2.07], and of PDCR, B = −1.14, Wald(1) = 6.94, p = .008, eB = 0.32, 95% CI [0.14, 0.75], on the probability of correctly predicting recidivism. When the consideration of SRF increased by one standard deviation, the probability of correctly predicting recidivism increased by 9.25%. In contrast, when individuals were diagnosed with a PDCR, this probability decreased by 27.2%. No significant main effects of URF or PDCI on predictive accuracy were detected. ROC analyses for the model including only main effects indicated a medium effect size (AUC = .61; Rice & Harris, 2005).
Regression analyses for the 5-year follow-up period revealed a significant interaction of SRF and PDCR, B = −1.28, Wald(1) = 6.02, p = .014, eB = 0.28, 95% CI [0.10, 0.77]. Conditional main effect analyses showed that SRF did not predict risk assessment accuracy when a PDCR was present, B = −0.83, Wald(1) = 3.11, p = .078, eB = 0.44, 95% CI [0.17, 1.10]. However, in the absence of a PDCR, an increase in the consideration of SRF by one standard deviation significantly increased the probability of correctly predicting recidivism by 9.9%, B = 0.45, Wald(1) = 4.01, p = .045, eB = 1.57, 95% CI [1.01, 2.44]. No significant main effects of URF, B = 0.13, Wald(1) = 0.68, p = .409, eB = 1.14, 95% CI [0.84, 1.56], or PDCI, B = 0.16, Wald(1) = 0.22, p = .641, eB = 1.17, 95% CI [0.60, 2.29], on the accuracy of expert witness judgments were found (AUC = .62; Rice & Harris, 2005).
Similarly, for the total follow-up period, a significant interaction of SRF and PDCR could be detected, B = −0.95, Wald(1) = 6.54, p = .011, eB = 0.39, 95% CI [0.19, 0.80]. In the absence of a PDCR, an increase in the consideration of SRF by one standard deviation increased the probability of correctly predicting recidivism by 9.54%, B = 0.40, Wald(1) = 4.52, p = .033, eB = 1.49, 95% CI [1.03, 2.14]. Conditional main effect analyses confirmed that SRF did not predict assessment accuracy if a PDCR was present, B = −0.55, Wald(1) = 2.96, p = .085, eB = 0.57, 95% CI [0.31, 1.08]. Neither URF, B = 0.15, Wald(1) = 1.12, p = .290, eB = 1.16, 95% CI [0.88, 1.54], nor PDCI significantly predicted the accuracy of expert witness judgments, B = 0.27, Wald(1) = 0.75, p = .387, eB = 1.31, 95% CI [0.71, 2.44]. ROC analyses for this model, including only main effects and the significant interaction term, indicated a medium effect size (AUC = .63; Rice & Harris, 2005).
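With a dummy-coded moderator (0 = absent, 1 = present), the conditional (simple) effect of SRF when the diagnosis is present equals the SRF coefficient plus the interaction coefficient, and exponentiating a coefficient gives the corresponding odds ratio. The coefficients reported above are consistent with this: 0.45 + (−1.28) = −0.83 for the 5-year PDCR model, 0.40 + (−0.95) = −0.55 for the total follow-up, and 0.51 + (−0.73) = −0.22 for the earlier 5-year model with any diagnosis. The Python sketch below shows the arithmetic under that coding assumption; the function name is hypothetical.

```python
import math


def simple_slope(b_focal, b_interaction, moderator):
    """Slope of the focal predictor (SRF) at a given value of a
    0/1-coded moderator (diagnosis absent/present)."""
    return b_focal + b_interaction * moderator


# 5-year model: SRF slope without a PDCR and the SRF x PDCR interaction.
b_srf, b_int = 0.45, -1.28
for pdcr in (0, 1):
    b = simple_slope(b_srf, b_int, pdcr)
    print(f"PDCR = {pdcr}: B = {b:.2f}, e^B = {math.exp(b):.2f}")
```

The same decomposition explains why a significant interaction can coexist with nonsignificant simple slopes: the interaction tests the difference between the two slopes, not whether either slope differs from zero.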
Relevance of Risk Factors for Recidivism
Finally, a hierarchical logistic regression predicting general recidivism revealed no significant interaction of either SRF and PDCR or SRF and PDCI. Similarly, no significant main effects were found for the presence of a PDCR, B = −0.17, Wald(1) = 0.19, p = .661, eB = 0.84, 95% CI [0.39, 1.82], for PDCI, B = 0.22, Wald(1) = 0.49, p = .485, eB = 1.24, 95% CI [0.67, 2.30], for the proportion of SRF considered, B = −0.31, Wald(1) = 3.75, p = .053, eB = 0.74, 95% CI [0.54, 1.00], or for the proportion of URF considered, B = 0.08, Wald(1) = 0.32, p = .572, eB = 1.08, 95% CI [0.82, 1.43] (AUC = .60; Rice & Harris, 2005).
Likewise, no significant associations were found between sexual recidivism and the presence of a PDCR, χ²(1, N = 221) = 0.13, p = .718, or between sexual recidivism and the presence of a PDCI, χ²(1, N = 221) = 0.75, p = .388. These findings indicate that neither the proportion of SRF considered nor the presence of a psychiatric diagnosis (PDCR or PDCI) was related to the risk of recidivating with a sexual or a general offense.
Discussion
Accurate risk assessment of individuals convicted of sexual offenses is crucial to prevent reoffending and unnecessary institutionalization. However, the quality of criminal risk assessment reports remains heterogeneous. Previous research has underlined the superiority of structured over unstructured approaches (e.g., Bengtson & Långström, 2007; Hanson & Morton-Bourgon, 2009; Heilbrun et al., 2016). Structured methods, such as actuarial or SPJ-based instruments, use a predetermined list of empirically derived static and/or dynamic risk (and protective) factors that show an evidence-based relation with reoffending. Risk assessment instruments furthermore provide coding rules designed to improve interrater reliability and predictive accuracy. Nevertheless, risk factors that have historically been important treatment targets and remain standard components of most treatment programs continue to be regularly considered, even though they lack empirical and meta-analytical support (Mann et al., 2010; Rettenberger, 2018; Seto et al., 2023).
In the present study, the use of empirically supported and unsupported risk factors was heterogeneous. Encouragingly, across all reports, whether structured or not, relatively more supported than unsupported risk factors were considered during risk assessment. Furthermore, the application of risk assessment instruments significantly increased the proportion of supported risk factors included in evaluations. Together, these findings provide further support for the use of structured and standardized risk assessment procedures.
Our findings also show that the consideration of supported risk factors influenced the prognostic direction of expert witness judgments. A higher proportion of supported risk factors considered increased the odds that an individual was assessed as having a (very) low risk of reoffending. However, this effect was limited to individuals without a mental health diagnosis; when such a diagnosis was present, the effect of supported risk factors was nullified. In these cases, psychiatric diagnoses were treated as separate, incremental risk factors, contrary to current best practice recommendations. Thus, individuals with a mental health diagnosis had a reduced chance of being judged as (very) low risk, regardless of the number of supported risk factors considered.
Similarly, the number of supported risk factors considered improved the accuracy of expert witness judgments only if the examinee was not diagnosed with a psychiatric disorder. The effect of supported risk factors, independent of the use of formal risk assessment instruments, further decreased over time: it was significant for the 2-year follow-up period but not for the total time at risk of more than seven years. This highlights that empirically driven risk predictions generally have their greatest predictive value shortly after discharge and are less predictive of long-term behavior. Notably, supported risk factors showed a time-independent effect on predictive accuracy when recidivism-relevant and recidivism-irrelevant diagnoses were examined separately, suggesting that the influence of risk factor consideration on predictive accuracy may be context-specific.
Theoretical Implications
Our results indicate that the presence of a mental health diagnosis moderates the probability with which expert witnesses classify individuals as having low or high recidivism risk. This finding may be interpreted in light of theoretical work on decision-making and biases in forensic practice (Neal & Grisso, 2014). Estimating recidivism risk requires a comprehensive assessment and integration of diverse subject-related information. At the same time, only recidivism-relevant factors should be considered, and aspects empirically unrelated to reoffending should not influence risk judgments. Given the complexity of this task, expert witnesses may be susceptible to implicit biases that simplify the integration process but risk compromising the accuracy of the resulting risk judgments (e.g., Dror et al., 2021). For instance, given the high prevalence of personality disorder diagnoses among individuals convicted of sexual offenses (Eher, Rettenberger, & Turner, 2019), expert witnesses might be more inclined to assign higher risk ratings to those with such a diagnosis, thereby neglecting (other) relevant case-specific information and the relatively low base rate of sexual recidivism in this population (Neal & Grisso, 2014; Oberlader & Verschuere, 2025; Rettenberger et al., 2015). As a consequence, information that may be irrelevant to risk assessment can unduly influence risk judgments.
The current finding that individuals with a mental disorder are unlikely to receive a low-risk judgment, irrespective of the number of supported risk factors considered, may reflect such a bias in the form of clinical override, that is, an intuitive adjustment of risk estimates derived from standardized assessment procedures (Oberlader & Verschuere, 2025; Rettenberger, 2018). Crucially, these biases may largely operate unconsciously (Neal & Brodsky, 2016) and, as shown in the present study and previous work (Murrie et al., 2013; Oberlader & Verschuere, 2025), cannot be entirely mitigated by adhering to structured procedures during risk assessment. Nevertheless, structured methods are promising tools that help expert witnesses critically examine and verify their assumptions.
Clinical Implications
The prevalence of mental health diagnoses is higher among individuals convicted of sexual offenses than among other offender populations, particularly for paraphilic disorders, PDs, and SUDs (Biedermann et al., 2023; Eher, Rettenberger, & Turner, 2019). Despite these high prevalence rates, relatively few studies have examined the relationship between mental health diagnoses and (sexual) reoffending. These studies indicate that mental health diagnoses are not predictive of recidivism (Bonta et al., 1998, 2013; Kingston et al., 2015), although comorbid SUDs, some PDs, particular sexual preference disorders (e.g., exhibitionism and exclusive pedophilia; Biedermann et al., 2023), and hypersexuality (Gregório Hertz et al., 2022) showed low to moderate predictive accuracy. Crucially, though, mental disorders do not seem to improve the prediction of recidivism beyond actuarial risk assessment tools (Biedermann et al., 2023; Eher, Rettenberger, & Turner, 2019). Diagnostic categories derived from the DSM or ICD also usually fail to predict recidivism among individuals convicted of sexual offenses (e.g., Eher et al., 2016; Mann et al., 2010; Seto et al., 2023).
Although the influence of mental health disorders on recidivism risk remains controversial, the present results demonstrate that psychiatric diagnoses significantly influence the forensic decision-making process. The presence or absence of a mental health diagnosis affected both the prognostic direction and the accuracy of the risk assessments and nullified the accuracy-increasing effect of considering empirically supported risk factors. Specifically, individuals with a psychiatric diagnosis had a significantly lower probability of receiving a low-risk judgment. Furthermore, when a diagnosis was present, risk assessment accuracy did not depend on the consideration of empirically supported risk factors. Together, these findings suggest that individuals with a diagnosis have a lower probability of being released, even though such diagnoses do not reliably predict reoffending.
These results point toward the substantial influence of psychiatric disorders on the direction and accuracy of forensic risk assessment of individuals convicted of sexual offenses. Importantly, follow-up analyses indicated that only diagnoses considered recidivism-relevant based on the current state of scientific knowledge, and not those considered recidivism-irrelevant, significantly predicted the direction of risk assessment: the presence of a recidivism-relevant diagnosis decreased the probability of receiving a low-risk judgment. In addition, the presence of a recidivism-relevant diagnosis reduced the probability of correct recidivism predictions. However, the mere presence of a diagnosis (whether considered recidivism-relevant or not) did not predict recidivism in the current study, supporting existing findings that question the predictive validity of mental health disorders for recidivism risk.
Based on these findings and theoretical considerations, several clinical implications emerge for ensuring evidence-based, objective risk assessment. First, systematically following a structured risk assessment approach helps assessors differentiate between empirically supported and unsupported risk factors, because standardized instruments emphasize the relevance of the included predictors while excluding irrelevant ones. Second, continuous training, particularly regarding which factors are and are not empirically linked to recidivism, is crucial for all persons involved in forensic diagnostics and risk assessment. Finally, risk assessment judgments should not rely exclusively on clinical diagnoses.
Limitations
Despite the considerable strengths of the present study, including a sample size larger than those of previous studies and the determination of interrater reliabilities, the results should be interpreted in light of some limitations. As all variables were retrospectively extracted from risk assessment reports, the data permit only correlational inferences between predictors and outcomes, and causal interpretations should be made with caution. Furthermore, the analysis of hit rates inherently excluded individuals who were not discharged following risk assessment (see also Wertz et al., 2018). This precludes a comprehensive examination of the accuracy of prognostic decisions across the full sample.
Regarding hit rates, it should further be noted that, for analytical purposes, risk predictions were dichotomized as low risk and high risk. In practice, however, expert witnesses typically differentiate between multiple levels of risk. As such, a low-risk judgment may still be considered accurate if an individual reoffends, provided the assessed risk was lower than that of those classified as high-risk.
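To make the contrast concrete, the Python sketch below compares a dichotomized hit rate, which depends on where the low/high cutoff is placed, with the AUC, which uses the full ordering of risk levels and credits a rating whenever a recidivist was rated higher than a non-recidivist. The ratings and outcomes are invented, the four-level scale is an assumption, and `hit_rate` and `auc` are hypothetical helper names; the AUC is computed in its Mann-Whitney form, with ties counted as one half.

```python
from itertools import product

# Invented data for illustration only: ordinal risk ratings
# (0 = very low ... 3 = high) and outcomes (1 = reoffended).
ratings = [0, 0, 1, 1, 1, 2, 2, 3, 0, 1, 2, 3]
reoffended = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]


def hit_rate(ratings, outcomes, cutoff):
    """Share of correct predictions after dichotomizing: rating >= cutoff means 'high risk'."""
    correct = sum((r >= cutoff) == bool(y) for r, y in zip(ratings, outcomes))
    return correct / len(ratings)


def auc(ratings, outcomes):
    """Probability that a random recidivist was rated higher than a random
    non-recidivist, with ties counted as 0.5 (Mann-Whitney form of the AUC)."""
    pos = [r for r, y in zip(ratings, outcomes) if y]
    neg = [r for r, y in zip(ratings, outcomes) if not y]
    score = sum(
        1.0 if r_pos > r_neg else 0.5 if r_pos == r_neg else 0.0
        for r_pos, r_neg in product(pos, neg)
    )
    return score / (len(pos) * len(neg))


for cutoff in (1, 2):
    print(f"cutoff {cutoff}: hit rate = {hit_rate(ratings, reoffended, cutoff):.2f}")
print(f"AUC = {auc(ratings, reoffended):.2f}")
```

With these invented data, moving the cutoff from level 2 to level 1 changes the hit rate from 9/12 to 7/12, whereas the AUC (27.5/32, about .86) does not depend on any cutoff.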
In addition, risk factors were coded as considered only if their influence on recidivism risk was explicitly mentioned in the report. While it is possible that other variables were implicitly considered by expert witnesses, the reasoning behind risk judgments should be transparent and traceable. Therefore, variables not explicitly discussed were classified as not considered.
Another limitation concerns the small number of individuals who recidivated with a sexual offense (n = 11). While desirable in practice, this precluded a differentiation between general and serious (i.e., violent or sexual) reoffending in our regression analyses. Because recidivism with serious offenses is of particular practical relevance and such offenses may be associated with distinct risk factors (Babchishin et al., 2016), future studies with larger samples should test the significant predictors from the present study specifically among individuals who recidivate with serious offenses.
Finally, research on empirically supported and unsupported risk factors has progressed substantially over the past two decades. Consequently, which risk factors are considered empirically (un)supported may have changed during the 17-year sampling period. Crucially, however, while best practice approaches may have evolved over time, a retrospective analysis of the aspects contributing to unsatisfactory prediction accuracy over such an extensive sampling period, considering both empirically supported and unsupported risk factors according to the current state of the art, can offer representative and valuable insights into the origins of the heterogeneous quality observed in forensic risk assessment.
Conclusion
The present study investigated the influence of subject-related characteristics on risk assessment in a German sample of individuals convicted of sexual offenses and elucidated potential origins of the observed qualitative heterogeneity in these reports. The results replicate previous findings of considerable variability across risk assessment reports (Haarig et al., 2012; Kunzl & Pfaefflin, 2011; Wertz et al., 2020) and demonstrate that a substantial number of empirically supported risk factors remain insufficiently discussed by expert witnesses.
The use of risk assessment instruments contributed to an empirically driven risk assessment, thereby highlighting the need to follow a standardized and structured approach. At the same time, the application of such instruments did not prevent expert witnesses from considering unsupported risk factors. Furthermore, the presence of psychiatric disorders significantly reduced the probability of low-risk judgments and eliminated the accuracy-improving effect of comprehensively considering supported risk factors, even when such disorders were considered recidivism-irrelevant.
Taken together, our findings provide further support for the use of structured and standardized risk assessment procedures. At the same time, they highlight that even empirically driven assessments remain vulnerable to the influence of psychiatric diagnoses on both prognostic direction and accuracy. The presented findings therefore caution expert witnesses not only against incorporating empirically unsupported risk factors but also against clinically overriding inferences derived from structured assessment approaches.
Supplemental Material
sj-docx-1-cjb-10.1177_00938548251397535 – Supplemental material for The Consideration of Empirically Unsupported Risk Factors in Risk Assessment Reports About Individuals Convicted of Sexual Offenses
Supplemental material, sj-docx-1-cjb-10.1177_00938548251397535 for The Consideration of Empirically Unsupported Risk Factors in Risk Assessment Reports About Individuals Convicted of Sexual Offenses by Maximilian Wertz, Maren Giersiepen, Kolja Schiltz and Martin Rettenberger in Criminal Justice and Behavior
Footnotes
Authors’ Note:
The authors would like to thank Tobias Kalenscher (Heinrich-Heine-University Düsseldorf) for his support in conducting the present study. The authors declare that they have no conflict of interest to disclose. The data are available on request due to privacy/ethical restrictions. This study was not preregistered. The authors state that no funding was involved.
