Abstract
In personality psychology, questionnaires are an established tool for assessing psychological traits. In forensic risk assessment, however, their use is often met with skepticism. The aim of this review is to critically evaluate the role of self-report information for assessing the risk of sexual recidivism. With a focus on individuals convicted of sexual offending, about 500 publications were identified through a systematic, string-based search across three electronic databases. The final sample comprised 95 publications that met the inclusion criteria – empirical studies using or investigating self-report measures in sexual risk assessment, with a minimum of 50 participants. Various risk-relevant constructs assessed by self-reports were examined. The results predominantly support the validity of self-report measures, particularly for assessing sexuality-related constructs, offense-supportive cognitions, prior offenses, and aggression. Comparisons of self-reports with other instruments showed that self-reports contribute some unique variance to the prediction of recidivism. The association with socially desirable responding was found to have an overall small effect size. Additionally, social desirability often emerged not as a response bias but as a risk-relevant trait. However, contextual factors, such as confidentiality and incentives, may significantly influence response distortion, presenting a limitation for the use of self-reports in high-stakes forensic decision-making. Overall, self-reported information appears to be a valuable complement to other assessment methods, significantly contributing to the prediction of recidivism. Nevertheless, the relationship between self-reports, contextual factors, and offender characteristics should be carefully considered when selecting the most appropriate assessment method. The findings of this study, along with its limitations and implications for future research, are discussed.
Introduction
The multimodal approach is a common and desirable methodology in psychological assessment, as it has been shown to greatly enhance the validity of different measures (Haynes et al., 1995; Walters, 2006) and to help avoid methodological deficits (Brewer & Hunter, 2006). This is particularly relevant in settings where high-stakes decisions are made based on psychological assessments. Clinical, forensic, or correctional settings, where decisions relevant to society and public safety may entail significant consequences, highlight the importance of thorough and accurate assessment of relevant constructs. Within the frame of this much-recommended methodology of multimodal assessment, some methods are more established than others: More objective data such as official records or external reports are frequently used, and their validity is predominantly relied upon, whereas self-report measures are used less commonly – in part due to ongoing concerns regarding their validity (Neal et al., 2020; Nunes & Jung, 2012). Prior work has documented discrepancies between self-reported and official records of offending (Kirk, 2006; Lauritsen, 2017), raising questions about the accuracy of retrospective self-reporting in forensic contexts.
One key reason for the comparatively limited use of self-report instruments in risk assessment is the potential for response distortion. Response biases, such as socially desirable responding (SDR), may be particularly prevalent in forensic contexts, where truthful answers may have legal consequences (Cooper, 2005). Another significant limitation is their dependence on personal information, which assumes that individuals are aware of and can report critical data – an assumption that may not apply to many mental processes. Furthermore, offender populations may struggle with introspection due to common cognitive biases, which may hinder accurate reporting. Additional challenges to the validity of self-reports include difficulties with memory recall and potential issues with comprehension.
Although the subjectivity of self-reports is their most controversial feature, it can also be seen as an advantage: Self-reports offer insights into cognitive processes, attitudes, and unrecorded criminal behavior that only the offender himself knows (Haapasalo & Moilanen, 2004). Additionally, there is some evidence that social desirability is not a confounding variable but may rather contribute relevant information to the prediction of recidivism (Mathie & Wakeling, 2011; Mills et al., 2003). Furthermore, the need for multimodal diagnostics provides another reason to further investigate the usefulness and validity of self-report instruments in forensic contexts.
In the forensic setting, self-report measures are primarily used to assess personality traits, cognitions, and attitudes in offenders’ risk assessment and treatment evaluation. A particularly important field of forensic diagnostics is the assessment of recidivism risk among offenders who may commit especially severe crimes, particularly violent and sexual offenses. The risk factors for sexual recidivism may differ to some extent from those indicative of general reoffending, implying the need for separate risk assessment of violent, sexual, and general offenders (Hanson & Bussière, 1998). Self-reporting is primarily used to assess dynamic risk factors, but there is also evidence that static risk factors may be validly assessed through self-report (e.g., criminal history; Thornberry & Krohn, 2000). More specifically, previous research has identified several risk factors for sexual reoffending, including indicators of sexual deviance (Abel et al., 1988; Haywood et al., 1990), sexual motives and intimacy (Ward et al., 1993), treatment participation and success (Hanson & Bussière, 1998; Scalora & Garbin, 2003), and antisocial or psychopathic personality (Hanson & Harris, 1998). Hanson and Harris (1998) also found problems during supervision, less social support, poor self-regulation skills, and increased anger to predict sexual recidivism. In their updated meta-analysis, Hanson and Morton-Bourgon (2004) summarized predictors of sexual recidivism with regard to the instruments used as well as the setting. While their findings remain influential, the study also highlights the relevance of more recent reviews synthesizing current evidence.
Self-report instruments are well established in many areas of psychology, such as personality, clinical, and social psychology, and a large proportion of empirical studies rely on self-reported information as a primary data source. In forensic contexts, however, decisions based on such information can carry substantial legal, ethical, or public safety consequences – such as sentencing, parole, and risk management in individuals convicted of sexual offenses (ICSO). Therefore, the validity of self-reported data requires particularly rigorous scrutiny in these settings (Venn, 2023). To address this need, the present review focuses on four key questions: First, can self-reported information be used to validly predict recidivism in offender populations? This question is central to determining whether self-reports can serve as a standalone or supplementary tool in risk assessment. Second, to what extent do self-report instruments overlap with other assessment methods, such as clinical interviews, file-based evaluations, and official records? Exploring this overlap sheds light on whether self-reports provide unique information or merely replicate what is captured through other means. Third, how strongly is self-reported information influenced by response distortions, such as SDR? Understanding the susceptibility of self-reports to manipulation is crucial for evaluating their trustworthiness in forensic decision-making. Fourth, is the validity of self-report instruments context-dependent, for example, differing between anonymous applications and high-stakes decision-making settings? This question addresses whether the reliability of self-report data changes depending on the circumstances under which it is collected. To explore these questions, we conducted a structured qualitative systematic review, aiming to synthesize and critically reflect upon the empirical findings related to each of these aspects.
Method
Search Strategy
The systematic literature search was conducted between October and November 2020 and limited to publications retrieved from three electronic databases: Frankfurt University Search Portal (Frankfurter Suchportal – hebis), PSYNDEX/PubPsych, and PubMed. This predefined time frame was selected to ensure methodological consistency and replicability. The literature was searched using the search string sex* offender* OR child molester* AND self report* OR questionnaire AND recidivism OR risk assessment. Altogether, the search yielded 1,486 hits consistent with the search syntax (duplicates included). After the publications were extracted, an additional reverse search was conducted using the reference lists of the relevant publications collected.
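The logic of the search string can be illustrated with a minimal sketch. It assumes that the string is meant as three OR-groups (population, method, outcome) joined by AND, and that the asterisk acts as a truncation wildcard; the source does not state how the databases resolved operator precedence, so the grouping, the function names, and the example titles below are hypothetical and do not reproduce the actual database queries.

```python
import re

# Assumed grouping of the search string: (population) AND (method) AND (outcome).
# This grouping is an illustration, not the documented database syntax.
SEARCH_GROUPS = [
    ["sex* offender*", "child molester*"],
    ["self report*", "questionnaire"],
    ["recidivism", "risk assessment"],
]


def compile_term(term):
    """Turn a search term into a regex; a trailing '*' means truncation."""
    pieces = []
    for word in term.split():
        stem = re.escape(word.rstrip("*"))
        pieces.append(stem + (r"\w*" if word.endswith("*") else ""))
    # Words within a term may be separated by whitespace or a hyphen.
    return re.compile(r"\b" + r"[\s-]+".join(pieces) + r"\b", re.IGNORECASE)


COMPILED_GROUPS = [[compile_term(t) for t in group] for group in SEARCH_GROUPS]


def matches_search(text):
    """True if the text contains at least one term from every group."""
    return all(
        any(rx.search(text) for rx in group) for group in COMPILED_GROUPS
    )


# Hypothetical titles used only to show the behaviour of the filter.
print(matches_search("Self-report questionnaires and sexual offender recidivism"))
print(matches_search("Questionnaire study of burglary recidivism"))
```

Under this assumed grouping, a record is a hit only if it names the population, a self-report method, and a recidivism-related outcome; a different precedence rule in a given database would return a different set of hits.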
Study Selection
The initially extracted 473 publications were evaluated using several inclusion criteria. To be eligible for inclusion, articles had to be publicly accessible via a scientific database between October and November 2020, and written in either English or German. Following an initial screening, additional eligibility constraints were applied (see Box 1), relating to sample characteristics, measurement methods, topic relevance, publication type, and publication date. Regarding topic relevance, only studies that focused on the assessment or prediction of recidivism were eligible for inclusion. The focus was on ICSOs, though studies investigating other offender types were included as comparing offender types was expected to provide further insight into the matter. Studies were excluded if they focused on unrelated constructs, did not include recidivism as a primary or secondary outcome, or did not pertain to a forensic or correctional context. With respect to the type of measurement, only studies that used standardized self-report questionnaires to assess risk-relevant constructs (e.g., cognitive distortions, sexual deviance) were included. Studies that relied solely on interviews, clinical ratings, or observational data, without incorporating self-reports, were excluded. Additionally, the outcome variable (recidivism) had to be assessed via official records (e.g., reconviction data).
Box 1. Definition of Inclusion and Exclusion Criteria.
After being selected from the identified literature for their potential relevance, the initial corpus of 473 publications was reviewed as documented in the flow diagram (Figure 1). Of these, 388 publications were excluded for several reasons, the most common being sample size or type (179 articles; see Figure 1). Other important exclusion factors were topic relevance and type of measurement, including the absence of a self-report measure in the study. Because of the explorative nature of this study, the exclusion criteria were adjusted repeatedly during this process.

Figure 1. PRISMA flow diagram of the systematic review search and exclusion process (based on Moher et al., 2009).
Ultimately, 85 publications met the criteria and were included in the review. An additional systematic database screening was conducted in October 2025 to ensure the review’s currency and relevance, which identified 227 new records. Following the identical search strategy and the multi-step screening process described above, 10 additional publications were identified, raising the final count to 95 included studies. The included empirical studies comprise a total of 130,664 participants, giving an average of approximately 1,375 participants per study. The average follow-up time of the 39 studies identified as follow-up studies was roughly 6.01 years, ranging from 6 months to 20 years. In this review, the interpretation of the effect sizes reported in the studies follows Cohen’s conventional benchmarks for small, medium, and large effects (Cohen, 1988).
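The selection counts and the effect size conventions reported above can be checked with a short sketch. The counts are taken from the text; the thresholds are Cohen's (1988) conventional benchmarks for correlation coefficients (.10, .30, .50). The function name and the label "negligible" for values below the small benchmark are assumptions introduced for illustration.

```python
# Study flow as reported in the text.
initial_corpus = 473
excluded = 388
included_2020 = initial_corpus - excluded          # publications retained in 2020
added_2025 = 10
included_total = included_2020 + added_2025        # final number of studies

total_participants = 130_664
mean_participants = total_participants / included_total


def cohen_label_r(r):
    """Label a correlation using Cohen's (1988) conventional benchmarks.

    The label 'negligible' for |r| < .10 is an assumption of this sketch;
    Cohen's conventions only name small, medium, and large effects.
    """
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"


print(included_2020, included_total, round(mean_participants))
print(cohen_label_r(-0.32))
```

The sign of a correlation is ignored when labelling its size, which matters here because several of the associations discussed in this review (e.g., with social desirability) are negative.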
Thematic categories were developed through inductive synthesis based on recurring constructs, methodological characteristics, and outcomes reported in the included studies. Studies were grouped thematically based on similarities in reported risk factors, measurement strategies, instruments applied, or contextual influences on self-report validity. A total of N = 67 studies focused on the predictive validity of self-report instruments, N = 64 on aspects of their convergent and incremental validity, N = 43 on response distortion, and N = 22 on the role of setting variables, whereby a substantial number of publications (N = 68) address multiple themes simultaneously. Conclusions were derived by evaluating the number of studies supporting a finding, consistency of effects, and methodological quality indicators (e.g., sample size, follow-up design, use of validated instruments).
Review
Predictive Validity of Self-Report Instruments
Since the validity of the recidivism risk prediction is primarily determined by the constructs measured and the instruments used, it is essential that the constructs themselves show predictive validity. This highlights the importance of examining self-report measures regarding their functional relationship with risk-relevant constructs. Research suggests self-regulation problems, family factors, offense-supportive cognitions, emotional congruence with children, and lifestyle impulsivity as risk factors for sexual recidivism (Kroner & Loza, 2001; Wakeling et al., 2011); these constructs will therefore be addressed below.
Limitations of Self-Report Measures in Risk Prediction
Firestone et al. (2000) suggested that self-reports may not be useful tools for predicting recidivism, a view supported by Wakeling et al. (2011), who reported that static risk factors may be more predictive of recidivism than self-reported dynamic risk factors. Although some self-reported information may be informative and descriptive of the offense-related conditions – particularly for child sexual abuse (Firestone et al., 2005) – self-report measures appear to be too transparent and face-valid as well as insensitive to sexual recidivism, and thus may not be sufficient as its only predictor. The issue of face validity and possible response distortion is especially relevant for personality self-report measures in forensic contexts, so many researchers advise against their standalone use in risk assessment (Firestone et al., 2000; Nunes & Jung, 2012). Although the particular value of aggression and anger as risk factors has been pointed out (Firestone et al., 2005), it should be emphasized that their predictive validity may depend on the offending type (Shechory & Ben-David, 2005) and that the overall evidence regarding their relationship with sexual recidivism is mixed (Boccaccini et al., 2010, 2013; Pettersen et al., 2015).
Mixed Evidence on Cognitions, Attitudes, and Empathy
Olver et al. (2014) suggested that changes in self-reported attitudes and cognitions may not be significant predictors of recidivism: In their study, pre- as well as post-treatment scores on self-report measures were only weakly and inconsistently related to all kinds of recidivism. The predictive validity of self-reported constructs such as cognitive distortions, as well as related constructs such as (victim) empathy, appears to vary depending on the type of offender and the specific self-report instrument used. For instance, cognitive distortions (and deviant sexual fantasies) have shown stronger associations with recidivism in contact child molesters than in offenders with non-contact offenses or mixed criminal histories (Beggs & Grace, 2011; Elliott et al., 2013, 2019; Firestone et al., 2000). Similarly, empathy deficits and victim-blaming cognitions were more predictive of reoffending among child molesters than among rapists or non-sexual offenders (Arkowitz & Vess, 2003; Marshall et al., 2001).
Nunes and Jung (2012) noted that it might be difficult for a self-report measure to accurately assess minimization or denial. However, Tierney and McCabe (2001a) found cognitions and attitudes to be relevant risk factors for recidivism, and the value of their self-report assessment has been supported by Mills, Kroner, and Hemmati (2004) as well as Rodrigues et al. (2016). Relatedly, Walters et al. (2015) found self-reported criminal thinking to have predictive validity and thus value in risk management and treatment planning, which was supported by various studies (Allan et al., 2007; Mills & Kroner, 2003). For instance, Schippers and Smid (2020) found high-risk rapists to display more explicit sexism and hostile attitudes toward women than community males. However, other specific measures of hostile cognitions, such as self-reported Rape Myth Acceptance, have been found to lack significant predictive validity for sexual or violent recidivism in men convicted of rape (Freudenthaler & Eher, 2024). The findings of Thornton et al. (2004) indicate a moderate association between low self-esteem and higher rates of sexual recidivism. Conversely, Loinaz et al. (2021) reported that self-report measures of self-esteem, empathy, and attachment often contradicted theoretical risk assumptions (e.g., showing high self-esteem), questioning the validity of self-report measures in forensic contexts. Self-report measures of emotional identification with children were found to highly correlate with risk of sexual recidivism (McPhail et al., 2018).
Predictive Potential of Specific Self-Reported Constructs
Nunes et al. (2013) showed that ICSO who reported being sexually abused as children displayed significantly more indicators of pedophilia. Expanding on this, Breiling et al. (2020) found that an accumulated score of self-reported early sexual experiences (Sexual Biographical Index) was a significant predictor of sexual recidivism, demonstrating substantial predictive utility. The constructs of impulsivity (Craig et al., 2004) and self-control (Grieger et al., 2012) were also identified as valid predictors of recidivism, whereas self-reported locus of control was shown to be a valid assessment of sex offenders’ blame attribution (Huntley et al., 2012), which is associated with risk of reconviction (Cima et al., 2007). Gillespie et al. (2015) found self-reported psychopathy scales to be valid indicators of psychopathic traits, linked to recidivism and particularly violent reoffending (Hiscoke et al., 2003), but nevertheless cautioned about the assessment of psychopathy, as dishonest responses may be common among offenders with a psychopathic personality disorder. However, recent evidence demonstrates that while psychopathy self-reports are significant predictors of general (felony) recidivism, they lack significant utility for predicting violent reoffending (Allen et al., 2024). This aligns with findings suggesting that interpreting specific psychopathy facet scores – such as the Antisocial Behavior Facet – on self-report measures predicts outcome more precisely than relying solely on the conventional total score (Gabriel et al., 2024). This precision, however, is limited to general crime: Ruchensky et al. (2025) reported that antisocial personality facets (Callousness, Risk Taking, Hostility) modestly predicted violent and non-violent recidivism as well as supervision violations, but confirmed their lack of predictive utility for sexual recidivism. 
Self-reported alcohol abuse, which has been found to be common among ICSOs (Aromäki & Lindman, 2001), was a significant predictor of sexual recidivism in the study of Firestone et al. (2000), with a weak effect. Self-reported previous suicide attempts were found to be unrelated to both sexual and violent recidivism (Monaghan et al., 2025). Stephens, Seto, et al. (2017) concluded that self-reported sexual interest represents an important but limited facet of sexuality assessment. In their study, only a minority of individuals who had sexually offended admitted to a pedophilic or hebephilic sexual interest, even though convergent indicators (e.g., phallometric data and victim information) suggested the presence of such interest in a larger proportion of cases, underscoring the tendency of self-report measures to underestimate atypical sexual interests. Self-reported sexual deviance was shown to be a valid assessment of deviant sexual interests and behaviors in ICSOs (Holland et al., 2000). Bartels et al. (2019) summarized research on the relationship between self-reported sexual fantasies and recidivism risk in ICSOs, suggesting that frequent sexual fantasies may predict sexual recidivism and be validly indicated by sexual fantasy questionnaires, though Seifert et al. (2017) caution about the impact of response bias on self-reported sexual deviance. In addition, self-reported use of deviant pornography was shown to predict general and sexual recidivism (Kingston et al., 2008).
Validity of Self-Reported Criminal History
The frequency and prevalence of offending, as well as the type and severity of offense, were found to be validly assessed by self-reports by Payne and Piquero (2016) as well as Theobald et al. (2014). Cardona et al. (2020) found emotional instability to predict the severity of violence in both sexual and non-sexual crimes. Pham et al. (2020) found self-reported offenses to predict recidivism, although Payne and Piquero (2016) mentioned that the validity of self-reports might vary depending on the seriousness of the crime, with self-reports of major crimes being less valid. Largely consistent with this, Widman et al. (2013) – despite an issue of underreporting – demonstrated that self-report measures captured many offenses, with almost 70% of the offenders endorsing previous (sexually) delinquent behavior, highlighting their potential use in forensic contexts.
Predictive Utility of Self-Reported Treatment Motivation and Change
Findings regarding the predictive validity of self-reported treatment motivation remain inconsistent (Drieschner & Boomsma, 2008). Beggs and Grace (2011) demonstrated the predictive validity of self-reported treatment change (deviation between pre- and post-treatment scores in risk factors) in offender populations, showing that effective treatment targeting dynamic factors was able to reduce (sexual) recidivism. More specifically, self-reported treatment changes of hostility measures, as well as aggression and anger, were found to be significantly correlated with recidivism in the studies of Hornsveld et al. (2014) and Pettersen et al. (2015). Kingston et al. (2012) reported that self-reported hostility was the sole statistically significant moderator of the relationship between hormones and sexual as well as violent recidivism. Contradicting some earlier findings, a large-sample longitudinal study found that both static and dynamic change scores in self-reported criminal attitudes significantly predicted violent and general recidivism over a 14.5-year follow-up, although they remained non-predictive of sexual recidivism (Olver et al., 2021). Generally, it is advised to consider the application of self-report measures with caution and to use pre- rather than post-treatment scores, as SDR has a greater impact on the latter (Beggs & Grace, 2011; Gannon & Polaschek, 2005; Howard & van Doorn, 2018; Mathie & Wakeling, 2011; Olver et al., 2014). Changes in self-reported antisocial attitudes were not found to validly predict recidivism (Howard & van Doorn, 2018; Kroner & Yessine, 2013). 
Since both general self-reported treatment change and change scores of specific risk factors were found to be associated with recidivism, and thus treatment change may in part be considered predictive, it is reasonable to assign particular importance to rapidly changing constructs and dynamic risk factors, and to assume that responsiveness to treatment is therefore crucial for treatment success in correctional settings.
The overall findings highlight the relevance of developing and utilizing self-prediction instruments, that is, self-report measures specifically designed for forensic risk assessment. Unlike general self-report instruments capturing isolated psychological traits or attitudes, these instruments assess a structured set of criminogenic risk factors to estimate recidivism risk based on the individual’s responses. Such tools are suggested to replace universal self-report measures (Loza et al., 2007), with the Self-Appraisal Questionnaire (SAQ; Loza, 2018) being a prominent example. While self-reported risk factors have shown some utility in predicting general reoffending, their predictive value appears limited in the context of sexual recidivism (Boccaccini et al., 2010; Olver et al., 2014). In general, sexual recidivism appears to be more difficult to predict, likely due to the greater differentiation in risk factors contributing to sexual offenses, necessitating a more specific risk assessment. Thus, to yield the most significant results, the design of a self-report measure should not only focus on risk assessment, but also on the prediction of sexual recidivism.
Convergent and Incremental Validity of Self-Report Instruments
Self-report measures are only one of many commonly used methods, some of which are generally considered to be more valid risk assessment tools than self-reports, emphasizing the need for a comparison between the self-report method and several other instruments in terms of convergent and incremental validity.
Self-Reports’ Convergent and Incremental Validity Regarding Official Records
Due to their objectivity, official data are the main source for information such as criminal history and other static risk factors. However, it has been argued that official records only include arrests, whereas non-convicted crimes or undetected delinquent behavior cannot be considered when relying solely on official records (Scurich & John, 2019). A substantial body of research supports the convergent validity of self-report measures, as they often yield results comparable to official data. For instance, Kroner and Loza (2001) found self-report measures to validly assess even sensitive offending information, as well as the frequency of reoffenses, as supported by Woessner and Hefner (2020), and Firestone et al. (2005) have also demonstrated strong agreement between self-reported and file-based criminal history. At the same time, the degree of overlap varies by offense type: Payne and Piquero (2016) showed that minor offenses, such as assault, were more frequently overreported, whereas severe offenses tended to be underreported, which is why self-reports were suggested to be useful for disclosing an offender’s criminal history, particularly in cases of less severe crimes. Importantly, even when convergence is high, self-report measures also show incremental validity. They capture additional, unrecorded offenses (Theobald et al., 2014) and other risk-relevant constructs – for example, Mills and Kroner (2003) reported that self-reported antisocial orientation predicted recidivism beyond official data. However, this incremental value appears context-dependent: Nunes et al. (2021) found that self-reported criminal behavior demonstrated strong convergence with other self-reports and recidivism risk scales but was unrelated to the actual number of violent convictions in official records. Pham et al. (2020) found that offenders overreported as often as they underreported, indicating unsystematic error variance rather than a systematic bias of dissimulation.
Self-Reports’ Convergent and Incremental Validity Regarding Actuarial Measures
Regarding actuarial measures, specifically designed for the purpose of offender risk assessment, Pham et al. (2020) found that the concordance between the Static-99 (Hanson & Thornton, 2000) and self-reported information was mostly high. In particular, the constructs of self-reported aggression, dominance, and sexual variables were found to significantly add to the prediction of recidivism already provided by actuarial risk measures (Allan et al., 2007; Boccaccini et al., 2010; Craig et al., 2006; Firestone et al., 2005; Olver et al., 2014). However, self-reported early sexual behavior was found to correlate significantly though weakly with the Static-99 and to demonstrate no incremental predictive utility for sexual recidivism (Breiling et al., 2020). Although self-reported criminal thinking and attitudes appeared to be predictive of recidivism, the actuarial measures were better at its prediction (Craig et al., 2004; Mills, Anderson, & Kroner, 2004; Walters et al., 2015), a finding not supported by Rodrigues et al. (2016). Self-prediction measures (Kroner et al., 2020) were found to be strongly correlated with actuarial measures yet did not have incremental validity (Kroner & Loza, 2001), which was attributed to their similarity, with mostly significant yet small correlations (Loza et al., 2004), suggesting that the SAQ may be applied as a substitute for an actuarial risk measure rather than as a complement. Overall, Beggs and Grace (2011) reported that dynamic risk factors such as sexual interests, hostility, and offense-supportive attitudes were often found to have incremental validity in predicting recidivism over actuarial risk measures. Supporting this, Olver et al. (2021) found that self-report changes in criminal attitudes converged moderately with general risk factors as assessed by the Static-99R, and demonstrated significant incremental predictive validity for the reduction of violent recidivism beyond those established measures. Likewise, Wakeling et al. (2011) demonstrated that including self-reported dynamic risk factors such as sexual deviance can improve the prediction accuracy in risk assessment. Gardner and Boccaccini (2017) as well as Watts et al. (2015) reported strong correlations between the Hare Psychopathy Checklist-Revised (PCL-R; Hare, 1991) and self-report measures of psychopathy, and both were found to have incremental validity beyond the other method (Buffington-Vollum et al., 2002), though the correlations vary depending on the measure used (Arkowitz & Vess, 2003). This pattern of convergence was recently corroborated by Allen et al. (2024), who reported significant correlations across all expert-rated and self-report psychopathy measures; nevertheless, they found that the expert rating demonstrated a clear superiority in predicting violent recidivism.
Self-Reports’ Convergent and Incremental Validity Regarding Attentional and Implicit Measures
Regarding the association between self-report instruments and attentional measures – mainly viewing time tasks used to assess sexual interest, orientation, and arousal – Babchishin et al. (2014) and Keown et al. (2010) found small and mostly nonsignificant effects. Other implicit measures have predominantly been found to correlate with self-reports, with both uniquely contributing to the prediction of recidivism (Banse et al., 2010; McPhail et al., 2018; Welsch et al., 2020), though Kanters, Hornsveld, Nunes, Huijding, et al. (2016) questioned the self-report measures’ incremental validity and found their predictive validity to be highest when combined with implicit measures.
Self-Reports’ Convergent and Incremental Validity Regarding Physiological Measures
Concerning physiological measures, the phallometric method was found to yield predominantly moderate correlations with self-report measures of sexual interest and pedophilia (Stephens, Cantor et al., 2017; Stephens, Seto et al., 2017), though Graham (2002) did not report any significant correlation between the physiological measures used in his study – such as heart rate and electromyography – and participants’ self-reported involvement in any kind of sexual offending. Laws et al. (2000) showed that the self-report measure of sexual interest in children was the only measure that significantly improved the accuracy of risk assessment beyond other methods of assessing pedophilic interest such as penile plethysmography.
Response Distortion in Self-Report Instruments
SDR is commonly considered to be a threat to the validity of self-report measures in forensic, clinical, and correctional settings (Wortley et al., 2019). The following section outlines how SDR may influence the accuracy of self-reported information in the context of forensic risk assessment.
Effects of Social Desirability in Forensic Self-Report Measures
Uzieblo et al. (2014) and Wood and Riggs (2009) found that self-reported psychopathy and cognitive distortions were negatively related to underreporting and social desirability. Even though offense-specific measures appear to be negatively related to social desirability, they appear to be less susceptible to SDR than social-functioning measures, on which SDR had a greater effect (Mathie & Wakeling, 2011). Moderate to strong correlations were found between SDR scores and self-reported responsibility, emotional loneliness, aggression, anger, hostility, vengeance, and blame attribution (Kingston et al., 2014). Although Mackaronis et al. (2011) reported that after excluding participants with high SDR scores, self-reported psychosexual traits remained unchanged, implying a negligible impact of social desirability on psychosexuality, Craig et al. (2006) found SDR to be correlated with cognitive distortions. Similarly, Allan et al. (2007) noted negative correlations between social desirability and sexual deviance as well as Static-99 scores, suggesting that higher scores of social desirability are associated with reduced sexual deviance and risk of reoffending. These findings suggest that the answers of ICSOs would still predict recidivism even if the subjects answered defensively, because scoring higher on SDR measures entailed lower self-reported sexual deviance yet also a lower risk of recidivism.
Mills and Kroner (2005) found impression management to be significantly negatively correlated with antisocial attitudes with a small effect, yet no significant changes were found in the relationship between self-reported antisocial attitudes and recidivism risk when the variance accounted for by SDR was removed (Mills & Kroner, 2005, 2006). No significant impact of SDR was found for self-reported treatment motivation (Drieschner & Boomsma, 2008) and frequency of sexual offending (Carvalho & Nobre, 2019). The effects of SDR were predominantly smaller than assumed (Mathie & Wakeling, 2011; Stevens et al., 2016).
Validity Scales and Other Techniques to Reduce SDR
Different kinds of validity scales have been regarded as valid measures of social desirability (Stevens et al., 2016). In the study by Mills et al. (2003), the SAQ – which includes an internal validity subscale addressing careless or inconsistent responding – was correlated with an external SDR measure (BIDR; Paulhus, 1998). Although significant associations between the SAQ and SDR indicators were observed, social desirability did not reduce the SAQ’s predictive validity for general and violent recidivism. This predictive power of self-report measures despite the presence of SDR may be explained by the correlation between SDR and recidivism itself, meaning that removing SDR-related variance could eliminate important predictive information (Mills et al., 2003). The authors caution that by eliminating self-report information from participants with elevated SDR scores, valuable information about risk and recidivism may ultimately be excluded from risk assessment. Further supporting the integrity of such data, post-treatment changes in self-reported criminal attitudes were found to be largely independent of cognitive ability, suggesting that the self-report captured genuine therapeutic progress rather than improved self-presentation by higher-functioning clients (Olver et al., 2021). Loza et al. (2007) showed that correlations between SDR and self-prediction measures were nonsignificant, suggesting that SDR does not affect the self-report measure. Gardner and Boccaccini (2017) note that validity scales may have greater utility in forensic contexts as the incentives common to a correctional setting may lead offenders to provide more socially desirable responses.
To increase self-report measures’ resistance against SDR, several techniques, designs, and methods were devised, mostly referring to their item design and response patterns (Arkowitz & Vess, 2003; Bumby, 1996; Wright & Schneider, 2004). Other techniques like the Randomized Response Technique (RRT; Warner, 1965), aiming to guarantee confidentiality to the respondent and thus restricting the need for SDR (Miner & Center, 2008), or the extraction of additional information from response patterns (Holland et al., 2000) may be considered negligible until further research provides evidence supporting their utility in self-report measures and risk assessment contexts. Predominantly in the treatment context, polygraphs are used to validate self-reported information as they have been commonly assumed to have rather good validity and in meta-analyses have been found to get offenders to disclose six times as many victims and prior sexual offenses as without a polygraph (Hindmann & Peters, 2001). Relatedly, postconviction sexual offender polygraph testing aims to verify the accuracy and completeness of self-reported criminal behavior, and its application is associated with an increased amount of information disclosed by offenders. However, given the high-stake nature of forensic decision-making, confidence in polygraph results has been deemed unrealistic, and their use is recommended with caution. The finding that cognitive distortions are more likely to be reported if offenders are attached to a pseudo lie detector (Gannon et al., 2007) speaks in favor of the use of (bogus) lie detectors, though not even high-risk ICSOs believing that they were attached to a lie detector revealed a general agreement with cognitive distortion items.
Challenges and Implications of SDR in Self-Report Risk Assessments
Nonetheless, these results suggest that SDR may indeed have an impact on cognitive distortion endorsements as well as on overall responding and that the use of the bogus pipeline can result in more honest responding. As the evidence suggests, controlling for SDR does not always yield higher validity of the self-report and may even remove informative and additionally predictive data of people whose responses were in fact honest (Mills et al., 2003). Therefore, measuring SDR with validity scales without knowing about its structure or definition, and subsequently removing its variance from an equation, appears to be premature, highlighting the need for a cautious application of control measures.
All in all, the question remains whether nonsignificant changes in the relationship between risk factors and recidivism after the removal of SDR are due to its lacking impact or methodological problems (Bartels et al., 2019). It may also be due to the significant correlation between SDR and self-reported dynamic risk without influencing their relationship with actual recidivism, suggesting that it should indeed be viewed as a stable personality trait (Mills et al., 2003) and that a self-report’s association with SDR does not necessarily compromise its validity. Furthermore, if the relationship between a self-reported construct and recidivism is found to decrease after the removal of SDR, social desirability may act as a construct uniquely contributing to the prediction of an outcome criterion. Both the impact of SDR on self-report measures and the frequency of response bias and socially desirable responding in offender populations appear to be smaller than assumed (Mathie & Wakeling, 2011), implying that self-report methods may be valid for individuals with both high and low SDR scores. Furthermore, social desirability may be seen as a personality trait that is inherently associated with a reduced risk of recidivism, as offenders scoring higher on SDR scales tend to have a lower risk of recidivism.
Setting Variables and Self-Report Instruments
With respect to risk assessment and the use of self-reports in a forensic setting, contextual factors also play a critical role. In particular, the distinction between research and forensic settings raises significant challenges, as the latter involve high-stake evaluations for parole, release, risk assessment, or treatment effectiveness. However, this kind of setting is difficult to maintain if the objective is, in fact, research. Looking closely, these two settings differ in several respects, such as the granting of confidentiality, anonymity, and immunity, or potential incentives and consequences, whose impact on the validity of self-reports needs to be examined.
Impact of Research Versus Forensic Settings on Self-Report Validity
Most included studies were conducted solely for research purposes so that granting confidentiality to the participants was mandatory due to ethical research guidelines (Fernandez & Marshall, 2003; Kingston et al., 2017; Rodrigues et al., 2016). Even though some doubts about the transferability of self-report validations into actual adversarial or correctional settings have been addressed – as exemplarily expressed by Pham et al. (2020), granting confidentiality to their participants – there is not much research on this topic. However, ICSOs with anonymity granted were found to endorse significantly more distorted statements than parole-eligible ICSOs, though they still reported higher cognitive distortions than community males (Gannon & Polaschek, 2005), and the additional use of immunity agreements to obtain the least distorted answers possible is recommended. Kanters, Hornsveld, Nunes, Zwets, et al. (2016) suggested that especially inpatients may still feel suspicious about potential legal consequences.
Loza et al. (2004) found the SAQ to have sound psychometric properties in authentic forensic settings, and Mills et al. (2003) demonstrated the instrument’s unchanged predictive value despite its significant correlation with setting-related response distortions and SDR. In 2007, Loza et al. conducted another study testing the self-report measure’s validity in two different settings – one without any consequences and one with potential prerelease – and found nonsignificant correlations between the SAQ scores and impression management and self-deceptive enhancement, suggesting that social desirability did not influence the offenders’ self-reported responses. Building on this, a validation of self-report measures of criminal and antisocial attitudes demonstrated that they assess these constructs equivalently across diverse, demanding forensic contexts, supporting their equivalence regardless of potential high-stakes setting bias (Mills et al., 2025).
Effects of Setting Variables, Incentives, and Relationship With Test Administrator
Bartels et al. (2019) pointed out that in a forensic setting, there may be few incentives for truthfully reporting deviancy and delinquency. On the contrary, there may be many incentives for an offender to “fake good,” such as avoiding negative legal consequences. In their non-incentivized research sample, Watts et al. (2015) found that statistically controlling for SDR in the self-report assessment of psychopathy led to smaller associations with recidivism. They did not, however, find SDR to substantially limit the validity of psychopathy self-report measures, though they highlighted the possibility that, in authentic correctional settings, incentives like evaluations for parole may have a substantial impact on the offenders’ answers. In terms of treatment evaluations, the incentive of demonstrating a successful treatment has been highlighted (Beggs & Grace, 2011). Kaplan et al. (1990) found a supposed impact of the test coordinator – who, in a forensic setting, may be the parole officer, therapist, or prison officer – on the extent of SDR depending on the offender’s relationship with this person, highlighting the importance of clarifying the legal obligations of therapists and parole officers.
Discussion
In forensic contexts, where decisions often carry significant consequences, it is essential to critically evaluate the validity of self-reported information, usually assessed with questionnaires. Therefore, the systematic review aimed to synthesize current empirical evidence concerning the role of self-report information. The results suggest that the usefulness of self-report instruments in forensic and legal settings cannot be answered with a simple yes or no – it depends on the context.
Focusing on the first question to evaluate the predictive validity of self-report instruments, we found that the overall ability of self-reports to predict outcome depends largely on the types of constructs and offenders measured, or the instrument used. Whenever focusing on risk-relevant constructs such as criminal history, offense-supportive attitudes, and cognitive distortions, self-report instruments appeared to be predictively as well as incrementally valid and commonly empirically supported. Furthermore, mental disorders such as antisocial or psychopathic personality disorder and alcohol abuse have been found to be predictive of (general) recidivism (Allen et al., 2024; Aromäki & Lindman, 2001; Gillespie et al., 2015; Hiscoke et al., 2003), though research on mentally disordered offenders remains limited. However, findings regarding the constructs of denial, victim empathy, sexual fantasies, treatment motivation, and history of child sexual abuse offenses are inconclusive and require further research, whereas anxiety may not be a relevant risk factor for sexual recidivism.
The results highlight that the predictive validity of self-report instruments depends not only on the mode of assessment – in this case self-report – but also on the specific constructs that they are designed to measure in relation to repeated sexual offending, meaning that the predictive value of self-reported information is inherently limited by the validity of the underlying construct. In this regard, only one questionnaire (Loza et al., 2004, 2007) is currently available to measure risk factors directly, indicating that the full potential of self-report methodology has yet to be realized. A further critical consideration is the heterogeneity among offender groups: It is important to note that more general offender groups on the one hand – consisting of sexual, violent, or mixed offenders – and the different kinds of sexual offenders on the other hand – such as rapists, online sexual offenders, and intra- versus extra-familial child sexual offenders – differ widely regarding risk-relevant factors, but also recidivism rates, and their proclivity to response biases. For instance, cognitive distortions and deviant sexual interest are particularly relevant in child molesters (Elliott et al., 2013, 2019), whereas general aggression, anger, and hostility are more predictive in rapists and violent offenders (Garofalo et al., 2018; Overholser & Beck, 1986). Online sexual offenders, by contrast, tend to show higher social anxiety and fewer antisocial traits, requiring different assessment foci (Carvalho & Nobre, 2019; Shechory & Ben-David, 2005; Webb et al., 2007). These differences highlight the importance of comparing offender types to identify the most appropriate assessment strategies – both in terms of relevant constructs and the selection of suitable instruments.
Another question central to the review was whether self-reported scores include specific information that adds incremental validity in the prediction of recidivism beyond general risk factors as well. The studies in this review point to the finding that self-reported dynamic and content-relevant risk factors such as aggression, hostility, offense-supportive attitudes, dominance, and sexuality-related constructs indeed add incremental validity to risk measures, which is consistent with the findings of a review by Farrington and Ttofi (2014). The self-reported assessment of risk factors shows strong correlations with actuarial instruments but does not add incremental predictive value, suggesting it could serve as a substitute when traditional tools are unavailable, such as in the absence of file information or when face-to-face contact is impossible (Kroner & Loza, 2001; Kroner et al., 2020; Loza et al., 2004). However, further research is needed to ensure their validity and reliability across diverse forensic settings. The finding that self-report measures can show incremental validity in predicting recidivism for certain constructs suggests that they may capture unique aspects of risk not accessible through other methods. When used within a multimethod approach, self-reports may therefore provide additional information by detecting different facets of a construct – particularly those related to internal states or subjective experiences. For example, cognitive distortions cannot be fully assessed through official records and are only partially captured by actuarial tools and clinical interviews, despite the well-established role of cognitions, attitudes, and mental processes as key risk factors for reoffending (Kanters, Hornsveld, Nunes, Zwets, et al., 2016).
Concerning SDR, its overall impact on self-report validity appears small but varies by construct and offender type. Mills et al. (2003) found stronger impression management effects in sexual offenders (compared to non-sexual offenders), particularly regarding anger and criminal history. Mathie and Wakeling (2011) reported elevated bias only for cognitive distortion measures. Loinaz et al. (2020) observed reduced scores on antisocial attitudes among participants with high SDR, especially in those without psychiatric comorbidities. Furthermore, emerging evidence suggests that an individual’s ability to understand and apply socially desirable response strategies may itself carry predictive value for assessing the risk of recidivism, potentially reflecting interpersonal manipulation or adaptive functioning that is relevant for risk management (Etzler et al., 2023).
Regarding the question of whether there are setting-related and other moderating variables influencing the validity of self-report assessment, the review found that confidentiality, anonymity, and incentives such as evaluations for early parole play a crucial role for the outcome of the assessment. However, other findings indicate that potential (legal) consequences accompanying the disclosure of sensitive information may not necessarily lead to a distortion of the offenders’ answers (Loza et al., 2004, 2007), thus leaving open whether there are also specific settings in high-stake decisions that ensure an acceptable validity of self-reports.
There are several limitations to the publications and studies included in the present review. Primarily, most studies were conducted with rather small sample sizes (Gannon & Polaschek, 2005). This may not pose a significant problem when measuring risk factors in offender samples, as there is considerable overlap with clinical samples and therefore a small sample size might indicate a more thorough investigation (Ildstad & Evans, 2001). However, given the potential for social desirability bias, the small sample sizes may pose a problem as the test power is limited. As a way of addressing this issue and increasing the sample size, Tierney and McCabe (2001b) suggest analyzing archival data sets or the databases of treatment programs and clinical settings. Nevertheless, legal implications, immunity, and confidentiality are persisting issues that need to be considered, especially in clinical and forensic settings. Another related problematic aspect of this review is that even with large sample sizes, there is a bias to be expected in offenders that voluntarily take part in research studies without incentives, especially in long-term studies, as this type of offender may not be representative of a wider offender population. Additionally, the measures used, their psychometric properties, and the sample types differed greatly among the studies. Furthermore, the type of offender varied not only among the studies but also within studies, potentially compromising the comparability of studies and their transferability (Tierney & McCabe, 2001b). Several of the included studies employed mixed offender samples (Payne & Piquero, 2016), potentially limiting the specificity and generalizability of conclusions for ICSO. Such studies were included when subgroup-specific results were reported or when they addressed constructs highly relevant to sexual offending (e.g., social desirability, impulsivity, self-regulation). 
Given the considerable methodological heterogeneity across studies (e.g., variations in outcome metrics, offender populations, and self-report instruments), conducting a quantitative meta-analysis was deemed inappropriate at this stage. Instead, we employed a structured thematic synthesis to enable a more nuanced evaluation of the literature and to integrate findings across diverse methodological frameworks in a systematic, theory-informed manner. This approach allowed us to identify recurring conceptual themes and methodological patterns that may guide the design and control variables of future quantitative meta-analyses once a more comparable evidence base becomes available. Thus, the present narrative synthesis serves as an exploratory step toward mapping the current state of research and delineating priorities for subsequent meta-analytic work.
In this context, several methodological strengths and limitations of the included studies should be noted. While many studies employed longitudinal designs (Allan et al., 2007) and applied previously validated self-report instruments (Cardona et al., 2020), strengthening the internal validity of their findings, notable methodological limitations also emerged across the literature. In particular, the operationalization of key constructs varied substantially, with some studies using narrow behavioral indicators and others relying on broader trait-level measures. Some studies served as initial validation studies for the self-report measures themselves, limiting the certainty regarding their predictive utility in independent samples (Loza et al., 2004; Welsch et al., 2020). Follow-up periods also differed markedly in length and consistency, complicating the comparison of predictive validity estimates. One of the set inclusion criteria was the measurement of recidivism by means of official data, though they provide only a partial picture: As the dark figure of recidivism is not included in the number of offenses officially filed (Scurich & John, 2019), the prediction of recidivism may be limited to further reconvictions instead of general reoffending, potentially underestimating delinquency, which also explains the higher number of self-reported offenses compared to official records. While there was an initial concentration of studies from the 2000s and early 2010s, likely reflecting a surge in research activity during that period, the current review was systematically supplemented by an updated search in October 2025. This crucial step ensures the inclusion of recent advancements and bolsters the temporal currency and robustness of the synthesized evidence base. A final problematic issue regarding the included studies is publication bias.
A general limitation of this review is constituted by the fact that only three databases were searched; this research might benefit from a more diverse source of data and the application of additional search strings.
To fully understand the variability in findings regarding the predictive validity of self-report measures, it is important to consider alternative explanations for nonsignificant results beyond measurement limitations. For instance, certain constructs may genuinely lack predictive value (Olver et al., 2014), or null findings may result from study-specific methodological factors such as limited sample sizes, short follow-up periods, and restricted outcome variance. Moreover, as illustrated by the mixed evidence on incremental validity (Kanters, Hornsveld, Nunes, Huijding, et al., 2016), divergent results across studies may reflect differences in sample composition, the type or scope of self-report instruments used, the type of outcome measured, or the inclusion of complementary assessment approaches (e.g., implicit measures).
A critical limitation of the reviewed literature concerns the scarce and inconsistent reporting of diversity-related sample characteristics. Most samples were drawn from Western, predominantly English-speaking countries, including Canada (n = 30), the United States (n = 23), the United Kingdom (n = 11), the Netherlands (n = 5), Australia (n = 4), Austria (n = 3), Germany (n = 4), New Zealand (n = 3), Scandinavian countries (n = 2), and one study each from Belgium, Israel, Spain, and Portugal; two studies did not report the country of origin. Only three studies employed cross-cultural or combined samples, including Loza et al. (2004). The robustness of self-report measures across diverse groups was strongly supported by Mills et al. (2025), who confirmed that self-report instruments reliably measure the same antisocial attitudes among major racial and ethnic groups in incarcerated populations. Such cross-cultural applications point to the potential of self-report instruments to reduce assessor bias in contexts where cultural or interpersonal distance exists between assessor and respondent. Self-report instruments may be particularly valuable in reducing bias stemming from ethnic or cultural mismatches, as they allow respondents to reflect on their own behavior in a less confrontational setting. However, this assumption requires empirical validation, and instrument translation, adaptation, and norming must be conducted with cultural sensitivity to ensure accurate and unbiased responses. The lack of studies focusing on minoritized populations – along with insufficient reporting of diversity variables – highlights the need for targeted validation research and culturally responsive assessment practices, as exemplarily examined by Olver et al. (2024). 
Instruments such as the SAQ show promising cross-cultural applicability, but broader investigations into cultural validity, response styles, and contextual influences on self-report data are needed to support equitable risk assessment, communication, and intervention planning.
To conclude, there is substantial evidence that self-report information should not be dismissed in forensic contexts, even though the review also points to the weaknesses of self-report measures and situations in which their use is not recommended. The existence of contradictory findings complicates a definitive conclusion about the utility of self-report measures in offender risk assessment. The findings regarding the most mentioned weakness of self-reports, their susceptibility to response distortion, suggest that the impact of SDR as well as the nature of its influence may have been overestimated (Mills et al., 2003). Most self-report measures used in the forensic context and particularly with ICSOs are commonly applied along with a validity scale, which can be considered beneficial, because even if SDR may have a smaller effect than previously assumed, its assessment as an additional variable and as a personality trait with unique predictive validity may be of interest. These findings are rather consistent with Hildebrand et al. (2018), as in a more recent meta-analysis they found that both impression management and self-deceptive enhancement were negatively associated with self-reported dynamic risk factors, and that these effects were moderated by setting and present incentives. In the prediction of recidivism, it is essential to rely primarily on instruments that assess empirically validated risk factors with established links to (sexual) reoffending. Furthermore, while the individual predictive validity of self-reports may be a subject of interest, the true practical relevance is to be found by exploring their incremental contribution to other methods and instruments in the frame of the multimodal approach, as the inclusion of self-reported static as well as dynamic risk factors and particularly criminal-thinking-related constructs increased the explained variance of a risk assessment equation in more than 50% of the multimethod studies (Walters, 2011).
The need for further research with a higher degree of differentiation beyond the “convenient conditions” is obvious, and a more standardized application of self-report measures should be targeted. There is also a clear need for the development of instruments specifically tailored to the risk profiles of different offender subgroups. For practitioners aiming to assess clinically relevant constructs, validation studies in forensic populations remain critical to ensuring high-quality assessments in terms of reliability and validity. As our review indicates, self-report measures are particularly useful when evaluating subjective and non-observable factors – such as attitudes, beliefs, and cognitive styles – as they systematically incorporate the individual’s perspective. Nevertheless, future research should further investigate how contextual factors and external incentives affect the validity of self-reported risk information.
Critical Findings of the Systematic Review.
Implications for Practice, Policy, and Research.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
