Abstract
Theory of mind (ToM) impairment is associated with poor social functioning in some psychological disorders (e.g., autism and schizophrenia). ToM deficits have also been linked with offending behavior in the theoretical literature. However, no review has examined the empirical evidence for such a link. We carried out a systematic review to provide a critical overview of studies of ToM ability in offenders. We included studies published in English that used an instrument to measure at least one aspect of ToM. Twenty-eight eligible studies were identified and coded. Overall, the literature is mixed. Taking study quality into account, our findings suggest that offenders and nonoffenders do not differ in first-order ToM. For second-order ToM, findings are mixed, even when only the highest-quality studies are examined. Studies of advanced ToM also showed mixed results overall, although the highest-quality research suggested that offenders have impairments in advanced ToM, meaning that they may have difficulty understanding mental states such as pretense, white lies, irony, double bluffs, and sarcasm. We suggest that well-controlled future studies that also measure other facets of ToM (e.g., distinguishing between cognitive and affective ToM or examining ToM content) are needed to fully understand the role of ToM in offending.
Theory of mind (ToM) is a term used to describe complex cognitive processes (Duval et al., 2011) that allow humans to understand their own mental states and those of others (Klin, 2000). This phenomenon appears cognate with the terms
ToM ability is often operationalized in terms of first-order ToM, second-order ToM, and advanced ToM. Success in first-order false-belief tasks requires understanding that another person can hold an incorrect belief (Shamay-Tsoory et al., 2005). Success in second-order false-belief tasks requires understanding that a person can hold a false belief about another person's belief. Advanced ToM tasks require insight into mental states such as those involved in jokes, sarcasm, double bluffs, and faux pas. Children typically show implicit awareness of others' perspectives from around 18 months of age (e.g., Buttelmann et al., 2009; Kovács et al., 2010; Onishi & Baillargeon, 2005; Rubio-Fernández & Geurts, 2013; Senju et al., 2009) and are thought to develop the skills needed to pass false-belief ToM tasks between the ages of 2 and 7 years (Wellman et al., 2001). Moreover, empirical findings suggest that ToM performance is affected by sociodemographic variables such as age (Brunsdon et al., 2019; Ferguson et al., 2018), socioeconomic status, and education (Li et al., 2013; Shatz et al., 2003), as well as by individual difference variables such as intelligence (Charlton et al., 2009) and executive functioning (Cane et al., 2017; Devine & Hughes, 2014).
Absent or impaired functioning of ToM is thought to be associated with psychosocial difficulties in both children and adults with various types of psychopathology (Brüne & Brüne-Cohrs, 2006), including, but not limited to, schizophrenia (Frith, 1992), autism (Baron-Cohen, 1995; Bradford et al., 2018), bipolar affective disorder (Kerr et al., 2003), and antisocial personality disorder (Richell et al., 2003). Studies also suggest that individuals exhibiting violent, antisocial, and delinquent behavior have deficits in ToM (Abu-Akel & Abushua'leh, 2004; Fonagy & Levinson, 2004). The primary aim of this review is to consolidate, synthesize, and critically evaluate existing research on the ToM-offending link. We aim to establish whether there is sufficient evidence to substantiate this link and to highlight areas for future research.
While there is no comprehensive theory that models the relationship between ToM and offending in general, theory relating to sexual offending provides a starting point for how such a model might work. Ward et al. (2000; see also Keenan & Ward, 2000) proposed that sexual offending is linked with ToM deficits. According to their model, individuals who commit sexual offenses may have failed to develop an adequate ToM, and this failure may lead those individuals to view or process information about their own or other people’s mental states in a biased or distorted way. Alternatively, these individuals may have a ToM impairment specific to particular kinds of mental states in certain relationships—for example, having a theory that is underpinned by false assumptions about women or children. In a similar vein, Elsegood and Duff (2010) suggested that ToM impairment might contribute to offending by underpinning criminogenic needs, such as intimacy deficits. Since criminogenic needs are the focus of offender treatment programs (Andrews & Bonta, 2010; Serin et al., 2009), it is important for practitioners and policy makers to know whether or not deficits in ToM represent a criminogenic need that should be targeted in treatment.
Empathy is a multidimensional construct that refers either to the affective/emotional response to another's mental state (e.g., Stotland, 1969) or to the cognitive mechanisms that enable people to understand others' perspectives (Dymond, 1949). This cognitive conceptualization of empathy therefore overlaps with ToM (Ferguson et al., 2015; Wang & Wang, 2015) and has been studied widely in forensic populations (Jolliffe & Farrington, 2004; van Langen et al., 2014). Importantly, empathy has been a key component of intervention programs (Laws & Ward, 2011) for offenders who have committed serious crimes, such as sexual and violent offenses (Day et al., 2010). However, targeting empathy in interventions is controversial because evidence for the effect of empathy treatment on later recidivism is mixed (Brown et al., 2012; Hanson & Morton-Bourgon, 2004, 2005). One factor that might explain these inconsistent results is that most treatment programs focus on training that targets generalized empathy deficits and overlook the distinction between the cognitive and affective components of empathy (Brown et al., 2012). Because cognitive empathy is closely related to ToM, and some researchers even use the two terms interchangeably, treatments targeting general empathy might actually be targeting aspects of ToM (or potentially missing important aspects of ToM). Therefore, it is important to fully understand any relationship between offending and ToM in order to inform clinical decision making and to underpin interventions for offending populations. To date, there have been no adequate reviews of the current state of the literature on ToM and offending.
Method
This systematic review was conducted in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA; Moher et al., 2009). PRISMA is an evidence-based reporting guideline that uses a 27-item checklist and a four-phase flow diagram to structure how relevant studies are identified, screened, and reported in a systematic review.
Eligibility Criteria
Research articles published in English that included one or more instruments measuring at least one aspect of ToM (e.g., first-order ToM, affective ToM) in offenders were eligible. Studies had to compare the ToM of at least two groups, including at least one offending group and a nonoffending control group. We excluded (1) articles not published in English; (2) articles not measuring an aspect of ToM; (3) articles measuring only affective empathy (see Eisenberg et al., 2006); (4) articles measuring ToM with a basic facial emotion recognition task; 1 (5) studies assessing ToM with interview methods (which measure people's evaluation of their own ToM rather than providing an objective measure of ToM; see Discussion section); (6) studies measuring ToM with questionnaires on which participants rate their own ToM (since responses to such questionnaires may not reflect participants' true ToM abilities; see Discussion section); (7) case reports, literature reviews, book reviews, and commentaries; and (8) studies with fewer than 14 participants per group in a core analysis of interest (this reflects the minimum group size for a one-tailed
Search Strategy and Screening
There was no restriction on the year of publication. We searched the following databases: PsycINFO, PsycARTICLES, Science Direct, Scopus, Criminal Justice Abstracts (from EBSCO), Open Access Theses and Dissertations, EBSCO host, and ProQuest Dissertations & Theses Global. We also performed targeted searches in Google and in the reference lists of identified studies. The main search terms were “theory of mind,” “mentalizing,” “mentalising,” “mentalization,” “mentalisation,” “mindreading,” “mind reading,” “mind-reading,” “mind perception,” “social intelligence,” “cognitive empathy,” “false belief reasoning,” “metacognition,” and “social cognition,” and each was systematically paired with each of the following key words: “incarcerated,” “offenders,” “criminals,” “offending,” “prisoners,” “inmates,” “convicts,” and “forensic.” After duplicates were removed, the titles and abstracts of the remaining articles were screened to determine whether they were eligible for this review. Furthermore, we contacted all corresponding authors of eligible papers for whom we could find current email addresses to request unpublished manuscripts that could be included in the review.
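The pairing scheme described above can be made concrete with a short sketch that crosses every ToM term with every population key word (14 × 8 = 112 pairs). The generic `"term" AND "key word"` syntax and the names `build_queries`, `TOM_TERMS`, and `POPULATION_TERMS` are illustrative assumptions; each database has its own query language, and the exact strings submitted are not reported here.

```python
from itertools import product

# ToM terms and population key words, as listed in the search strategy.
TOM_TERMS = [
    "theory of mind", "mentalizing", "mentalising", "mentalization",
    "mentalisation", "mindreading", "mind reading", "mind-reading",
    "mind perception", "social intelligence", "cognitive empathy",
    "false belief reasoning", "metacognition", "social cognition",
]
POPULATION_TERMS = [
    "incarcerated", "offenders", "criminals", "offending",
    "prisoners", "inmates", "convicts", "forensic",
]


def build_queries(tom_terms, population_terms):
    """Pair every ToM term with every population key word (hypothetical generic syntax)."""
    return [f'"{tom}" AND "{population}"'
            for tom, population in product(tom_terms, population_terms)]


queries = build_queries(TOM_TERMS, POPULATION_TERMS)
print(len(queries))  # 112
print(queries[0])    # "theory of mind" AND "incarcerated"
```

In practice each string would need translating into a database's own field tags and truncation rules; the sketch only shows the combinatorics of the pairing.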
Quality Assessment
After we identified eligible studies, we performed a quality assessment by adapting the
Results
Description of the Included Studies
A total of 6,294 articles were identified: 4,277 from Scopus, 1,515 from PsycINFO, 158 from PubMed, 162 from PsycARTICLES, 89 from Criminal Justice Abstracts, 35 from Open Access Theses and Dissertations, 31 from EBSCO host, 26 from ProQuest Dissertations & Theses Global, and one from the references of identified studies. After duplicates were removed, the remaining 2,982 studies were assessed for eligibility. Of these, 2,889 were excluded (233 were non-English articles; 1,095 were literature reviews, meta-analyses, interview studies, case reports, conference presentations, or commentaries; and 1,561 did not measure ToM; see Figure 1).

Figure 1. Flowchart of the literature review.
From the remaining 93 studies, a final sample of 28 studies spanning 16 years (2004–2019) was included in this review. Table 1 summarizes the characteristics of each study.
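As a minimal sketch, the counts reported for the search and screening stages can be checked for internal consistency with a few lines of arithmetic. The figures are hard-coded from the text; the number of duplicates removed and the number of studies excluded from the 93 remaining are derived by subtraction rather than reported, and the function name `prisma_flow` is hypothetical.

```python
# Records identified per source, as reported in the Results section.
IDENTIFIED = {
    "Scopus": 4277,
    "PsycINFO": 1515,
    "PubMed": 158,
    "PsycARTICLES": 162,
    "Criminal Justice Abstracts": 89,
    "Open Access Theses and Dissertations": 35,
    "EBSCO host": 31,
    "ProQuest Dissertations & Theses Global": 26,
    "Reference lists of identified studies": 1,
}
AFTER_DUPLICATES = 2982  # records left after duplicate removal
EXCLUDED_AT_SCREENING = {
    "non-English": 233,
    "review, meta-analysis, interview, case report, presentation, commentary": 1095,
    "did not measure ToM": 1561,
}
INCLUDED = 28  # studies in the final sample


def prisma_flow(identified, after_duplicates, excluded, included):
    """Derive the count at each stage of the flow from the reported figures."""
    total_identified = sum(identified.values())
    total_excluded = sum(excluded.values())
    remaining = after_duplicates - total_excluded
    return {
        "identified": total_identified,
        "duplicates_removed": total_identified - after_duplicates,  # derived
        "excluded_at_screening": total_excluded,
        "remaining_after_screening": remaining,
        "excluded_from_remaining": remaining - included,  # derived
        "included": included,
    }


flow = prisma_flow(IDENTIFIED, AFTER_DUPLICATES, EXCLUDED_AT_SCREENING, INCLUDED)
for stage, count in flow.items():
    print(f"{stage:>26}: {count:,}")
```

Running it confirms that the per-source counts sum to 6,294 and that 2,982 minus 2,889 leaves the 93 studies from which the final 28 were drawn.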
Table 1. Details of Studies Included in the Review.
a We excluded the second-order task in this study because few participants achieved second-order ToM.
Description of Study Characteristics
All studies were cross-sectional. Ten studies were conducted in the United Kingdom. The remaining studies took place in one of the following countries: Canada (
Summary of First-Order ToM Methods
First-order ToM tasks assess whether people can accurately infer another person's thoughts, feelings, beliefs, and intentions. In this review, nine studies measured first-order ToM, seven of which focused on cognitive first-order ToM. Five of the studies (Castellino et al., 2011; Dolan & Fullam, 2004; Hammond & Beail, 2017; Majorek et al., 2009; Proctor & Beail, 2007) used tasks such as the Smarties task (Hogrefe et al., 1986), the Sally–Anne test (Baron-Cohen et al., 1985; Wimmer & Perner, 1983), the Marble Story Task, and the Picture Sequencing Task (Brüne, 2003), each of which measured participants' ability to understand another person's false belief about the contents of an item or the location of an object. The Picture Sequencing Task also included picture sequences that assessed participants' ability to understand the intentions (cooperative and deceptive) of cartoon characters (an example of the task can be found in Brüne, 2003). Jones et al. (2007) measured ToM with an animation task (Abell et al., 2000) in which participants were required to attribute mental states to triangles based on their interactions. Engelstad et al. (2019) used the Hinting Task (Corcoran et al., 1995), in which participants were asked to explain the intention or message behind a protagonist's statement to another character. Shamay-Tsoory et al. (2010), who measured both cognitive and affective first-order ToM, used the Yoni task, in which participants had to infer the mental state of a cartoon character, Yoni, based on eye gaze (examples of the task can be found in Shamay-Tsoory et al., 2010). Robinson et al. (2007) measured affective first-order ToM using the empathy continuum (Strayer, 1993), in which participants were required to infer the mental state of a protagonist presented in video sketches and describe it in an interview.
First-Order ToM Results
Two of the nine first-order ToM studies found that offenders performed significantly worse than nonoffenders (Majorek et al., 2009; Robinson et al., 2007). 3 The remaining studies did not find performance differences in first-order ToM between offenders and nonoffenders (Castellino et al., 2011; Dolan & Fullam, 2004; Engelstad et al., 2019; Hammond & Beail, 2017; Jones et al., 2007; Proctor & Beail, 2007; Shamay-Tsoory et al., 2010).
Our quality assessment indicated that eight of the first-order studies were of low quality and one was of moderate quality. Dolan and Fullam's (2004) study, which found no first-order ToM deficits in offenders, had the highest quality rating among the first-order ToM studies because it controlled for several potential confounding variables, reduced memory load, and accounted for comprehension. Nevertheless, its results should be interpreted with caution: almost all participants passed the first-order ToM task, so the nonsignificant difference may reflect a ceiling effect arising from a task (the Sally–Anne test) that is too simple for an adult population. Other studies that used the same or similarly easy tasks (i.e., the Smarties, cartoon, or animation tasks) also found no significant first-order ToM differences between offenders and nonoffenders. However, the studies that used tasks more appropriate for adults (e.g., video stimuli) found first-order ToM deficits in offenders with a medium to large effect size (Majorek et al., 2009; Robinson et al., 2007).
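The effect sizes mentioned above are not given numerically in this section. Assuming they are standardized mean differences (Cohen's d), the sketch below shows how such a value is computed from group means and standard deviations and how Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large) classify it. The function names and the group scores in the example are hypothetical and do not come from any reviewed study.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (group a minus group b) using the pooled SD."""
    pooled_variance = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance)


def cohen_benchmark(d):
    """Label |d| with Cohen's conventional thresholds."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical ToM scores: nonoffenders (a) vs offenders (b), 20 participants each.
d = cohens_d(mean_a=8.0, sd_a=1.5, n_a=20, mean_b=7.0, sd_b=1.5, n_b=20)
print(f"d = {d:.2f} ({cohen_benchmark(d)})")  # d = 0.67 (medium)
```

The same formula also shows why a ceiling effect matters: when nearly everyone passes, the group means converge on the maximum score, so the difference in the numerator, and with it d, shrinks regardless of any underlying group difference.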
Summary of Second-Order ToM Methods
Second-order ToM tasks examine whether people can accurately understand one person's mental state about another person's mental state (what X thinks about Y's thoughts, feelings, intentions, or beliefs). Five studies examined second-order ToM, four of which used tasks evaluating cognitive second-order ToM (Castellino et al., 2011; Dolan & Fullam, 2004; Majorek et al., 2009; Proctor & Beail, 2007). Specifically, three studies used the Ice Cream Van story (Perner & Wimmer, 1985), alone or together with the Burglar Story (Happé et al., 1999); in both tasks, the protagonists hold false beliefs about the thoughts and beliefs of another person. Majorek et al. (2009) used the Picture Sequencing Task (Brüne, 2003), in which participants were required to understand a character's beliefs or thoughts about another character's intentions or thoughts. A similar task, the Yoni task, was used by Shamay-Tsoory et al. (2010), but this task evaluated both cognitive and affective second-order ToM.
Second-Order ToM Results
Results on second-order ToM tasks were particularly inconsistent across studies. While Dolan and Fullam (2004) found no difference between the second-order ToM ability of offenders and nonoffenders, Castellino et al. (2011) found second-order ToM deficits in offenders. In contrast, Proctor and Beail (2007) found that offenders performed significantly better than nonoffenders on second-order ToM. Shamay-Tsoory et al. (2010) found that offenders' second-order cognitive ToM was intact but their second-order affective ToM was impaired. In addition to these four studies, Majorek et al. (2009) used tasks that combined aspects of first- and second-order ToM (see Note 3). The results of this study suggested that offenders performed worse on these ToM tasks than nonoffenders.
Our quality assessment showed that the quality of the second-order studies ranged from low (three studies) to moderate (two studies). Dolan and Fullam (2004), who found no second-order ToM deficits in offenders, and Castellino et al. (2011), who found second-order ToM deficits in offenders, had the highest quality ratings. Caution is needed when interpreting and generalizing the results of both studies because, although they controlled for several ToM-related variables, the validity and reliability of their second-order ToM tasks for adults have yet to be established.
Summary of Advanced ToM Methods
Advanced ToM tasks investigate whether people can understand another person's higher-order mental states, such as those involved in sarcasm, jokes, double bluffs, accusing, and preoccupation. Twenty-two studies examined advanced ToM in offenders; 15 used a single measure of advanced ToM, and seven used multiple advanced ToM measures. Advanced ToM tasks in these studies fall into three groups: tasks that measure cognitive ToM, tasks that measure affective ToM, and tasks that evaluate cognitive and affective ToM simultaneously in the same task without reporting the results separately (we refer to these as testing
Cognitive advanced ToM
Six studies assessed cognitive advanced ToM using three different tasks. Two studies (Castellino et al., 2011; Nentjes, Bernstein, Arntz, Slaats, & Hannemann, 2015) employed the Strange Stories Task (Happé, 1994), which examines participants' understanding of mental states involved in jokes, pretense, white lies, irony, double bluffs, and sarcasm. Two studies (de Jong et al., 2018; Kristof et al., 2018) used the faux pas task (Baron-Cohen et al., 1999; Varga et al., 2008, respectively), in which one character says something inappropriate, hurtful, or insulting to another without realizing that it should not have been said. Two studies (Morosan et al., 2017; Newbury-Helps et al., 2017) employed the director task (Keysar et al., 2000), a perspective-taking task in which participants must mentally adopt the position of a director standing in a scene; when the director instructs them to move an object on a set of shelves, they must move the correct object, taking into account whether or not the director can see it.
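The decision logic of a critical director-task trial can be sketched as follows. The shelf layout, the size-based instruction ("move the small candle"), and the names `ShelfItem` and `choose_item` are hypothetical illustrations, not the materials used by Morosan et al. (2017) or Newbury-Helps et al. (2017); the sketch only shows that the correct response requires filtering candidate objects by what the director can see.

```python
from dataclasses import dataclass


@dataclass
class ShelfItem:
    kind: str                  # e.g., "candle"
    size_rank: int             # 1 = smallest item of its kind
    visible_to_director: bool  # False if the slot is occluded from the director's side


def choose_item(shelves, kind, take_directors_view=True):
    """Pick the item matching an instruction such as 'move the small <kind>'.

    With take_directors_view=True, only items the director can see are candidates
    (the perspective-taking response). With False, all items are candidates, which
    models the egocentric error of selecting an item the director cannot see.
    """
    candidates = [item for item in shelves if item.kind == kind]
    if take_directors_view:
        candidates = [item for item in candidates if item.visible_to_director]
    return min(candidates, key=lambda item: item.size_rank)


# Hypothetical critical trial: the smallest candle is hidden from the director.
shelves = [
    ShelfItem("candle", 1, visible_to_director=False),
    ShelfItem("candle", 2, visible_to_director=True),
    ShelfItem("candle", 3, visible_to_director=True),
    ShelfItem("apple", 1, visible_to_director=True),
]
print(choose_item(shelves, "candle").size_rank)                             # 2
print(choose_item(shelves, "candle", take_directors_view=False).size_rank)  # 1
```

Under this framing, a perspective-taking error is simply the case where the two calls disagree and the participant's choice matches the egocentric one.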
Affective advanced ToM
Two studies examined advanced affective ToM. Mariano et al. (2017) used the emotion attribution task (Blair & Cipolotti, 2000), which measures the ability to understand the emotions of other people through stories. Morosan et al. (2017) used the Geneva Emotion Recognition Test (GERT; Schlegel et al., 2014), where emotional states were presented through videos.
Cognitive–affective advanced ToM
Seventeen studies assessed advanced cognitive–affective ToM, and five of them used two cognitive–affective tasks. Eleven studies (Domes et al., 2013; Elsegood & Duff, 2010; Mariano et al., 2017; Milojević & Dimitrijević, 2014; Mundy, 2004; Nentjes, Bernstein, Arntz, van Breukelen, & Slaats, 2015; Newbury-Helps et al., 2017; Romero-Martínez et al., 2013; Schiffer et al., 2017; Spenser, 2017; Woodbury-Smith et al., 2005) used the Reading the Mind in the Eyes Test–Revised (RMET-R; Baron-Cohen et al., 2001), in which a wide range of mental states are presented through pictures of eyes. Dolan and Fullam (2004) used an earlier version of the RMET-R, and Elsegood and Duff (2010) also used a version of the test that presents images of children's eyes (The Mind in a Child's Eyes Task; Duff & Schulte-Mecklenbeck, 2010). One study (Winter et al., 2017) used the EmpaToM (Kanske et al., 2015), a task similar to the RMET-R in which different emotional states are depicted in videos. Three studies (Engelstad et al., 2019; Mayer et al., 2018; Newbury-Helps et al., 2017) used the Movie for the Assessment of Social Cognition (Dziobek et al., 2006), in which participants watch video clips and answer questions about the intentions, feelings, and thoughts of the characters. Two studies (Spenser, 2017; Spenser et al., 2015) used the Social Stories Questionnaire (Lawson et al., 2004), which resembles the faux pas task described above but also includes an affective component of ToM. Finally, Dolan and Fullam (2004) also used a cognitive–affective faux pas task, and two studies (Domes et al., 2013; Schuler et al., 2019) measured advanced ToM with the Multifaceted Empathy Test (MET; Dziobek et al., 2008), which contains pictures of people in emotionally charged situations from everyday life.
Advanced ToM results
Among the 22 studies, seven did not find differences in advanced ToM between offenders and nonoffenders (Kristof et al., 2018; Mayer et al., 2018; Mundy, 2004; Nentjes, Bernstein, Arntz, Slaats, & Hannemann, 2015; Nentjes, Bernstein, Arntz, van Breukelen, & Slaats, 2015; Winter et al., 2017; Woodbury-Smith et al., 2005), whereas nine found advanced ToM deficits in offenders (Castellino et al., 2011; Engelstad et al., 2019; Mariano et al., 2017; Milojević & Dimitrijević, 2014; Newbury-Helps et al., 2017; Romero-Martínez et al., 2013; Schuler et al., 2019; Spenser, 2017; Spenser et al., 2015). The remaining six studies reported inconsistent patterns of impairment, depending on the task used or the participant group examined. For example, Domes et al. (2013) found deficits in advanced ToM among offenders using the MET (Dziobek et al., 2008) but no group difference using the RMET-R (Baron-Cohen et al., 2001). Further, de Jong et al. (2018) found that violent offenders with a psychotic disorder scored lower on advanced ToM than both nonviolent participants with a psychotic disorder and healthy control participants. However, discriminant function analyses indicated that between-group differences were better explained by impairments in metacognition and neurocognitive function than by advanced ToM (as measured by the faux pas task). Morosan et al. (2017) found a deficit in advanced ToM among offenders using the director task and a partial impairment using the GERT, on which offenders scored lower than nonoffenders in recognizing interest, anxiety, and amusement. Additionally, Schiffer et al. (2017) found that both violent offenders with schizophrenia and nonoffenders with schizophrenia had lower advanced ToM scores than healthy controls. However, violent offenders with conduct disorder or antisocial personality disorder 4 without schizophrenia had advanced ToM scores similar to those of healthy nonoffenders.
The remaining two of these six studies demonstrated a selective impairment in advanced ToM among offenders. Dolan and Fullam (2004) found that offenders and nonoffenders did not differ in understanding complex emotions from pictures, detecting faux pas, or identifying the person who committed the faux pas. However, offenders were worse than nonoffenders at recognizing basic emotions from pictures and at understanding the mental state of either the person who committed a faux pas or the person to whom it was made. Additionally, Elsegood and Duff (2010) reported that individuals who had committed sexual offenses against children showed impairments in advanced ToM when inferring the mental states of adults but intact advanced ToM when understanding the mental states of children (i.e., the age group consistent with their victims).
The quality of the reviewed studies that tested advanced ToM ranged from low to high. Four of the 22 studies were of low quality, 17 were of moderate quality, and one was of high quality. The highest-quality study (Newbury-Helps et al., 2017) reported that offenders had lower advanced ToM scores than nonoffenders on all advanced ToM measures. This study included a control group without criminal records; selected participants who did not have a learning disability or head injury; controlled for potentially confounding variables such as age, education, and verbal intelligence; and additionally assessed participants' memory, attention, and comprehension with control questions.
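A simple vote count makes the tallies reported in this section easy to check. The sketch below encodes the outcome and quality counts given above for first-order, second-order, and advanced ToM (the category labels and names such as `deficit_share` are shorthand introduced here) and confirms that each tally accounts for every study at that level. Vote counting ignores sample size, effect size, and study quality, so the proportions it prints are descriptive only and do not replace the quality-weighted reading given in the text.

```python
from collections import Counter

# Direction of the offender vs nonoffender comparison, as summarized in this review.
OUTCOMES = {
    "first-order": Counter({"deficit": 2, "no difference": 7}),
    "second-order": Counter({"deficit": 2, "no difference": 1, "offenders better": 1,
                             "cognitive intact, affective impaired": 1}),
    "advanced": Counter({"deficit": 9, "no difference": 7, "inconsistent or selective": 6}),
}
# Quality ratings, as reported in the quality assessment paragraphs.
QUALITY = {
    "first-order": Counter({"low": 8, "moderate": 1}),
    "second-order": Counter({"low": 3, "moderate": 2}),
    "advanced": Counter({"low": 4, "moderate": 17, "high": 1}),
}
N_STUDIES = {"first-order": 9, "second-order": 5, "advanced": 22}


def deficit_share(level):
    """Proportion of studies at this ToM level reporting an offender deficit."""
    counts = OUTCOMES[level]
    return counts["deficit"] / sum(counts.values())


for level, n in N_STUDIES.items():
    # Each tally should account for every study at that level exactly once.
    assert sum(OUTCOMES[level].values()) == n, level
    assert sum(QUALITY[level].values()) == n, level
    print(f"{level:>12}: {OUTCOMES[level]['deficit']}/{n} report a deficit "
          f"({deficit_share(level):.0%})")
```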
Offense Type and ToM
In this review, we also explored the relationship between ToM and crime type. Among studies examining first-order ToM, three did not report crime type. Studies of individuals who had committed sexual offenses (Castellino et al., 2011; Hammond & Beail, 2017) and violent offenses (Dolan & Fullam, 2004; Engelstad et al., 2019; Hammond & Beail, 2017) found no first-order ToM deficits in these offender groups. Studies that found first-order ToM deficits in offenders (Majorek et al., 2009; Robinson et al., 2007) had recruited mixed offender groups reflecting six or more different offense types, including, but not limited to, sexual and violent crimes. It is possible that the presence or absence of first-order ToM deficits in offenders relates to the type of crime committed. However, the current evidence does not allow us to draw firm conclusions.
In two of the five studies examining second-order ToM, researchers did not report a breakdown of crime types in the sample. Majorek et al. (2009), whose offender group had mixed crime types, found a second-order ToM impairment in this group. Castellino et al. (2011) also reported second-order ToM deficits in individuals who had committed sexual offenses against children and adults. By contrast, Dolan and Fullam (2004), whose sample consisted of individuals who had committed violent crimes, found no second-order ToM deficits in this offending group. As with first-order ToM, the heterogeneity of findings limits the conclusions we can draw about second-order ToM and crime type, though it appears that violent crime, at least, is not strongly associated with second-order deficits.
Again, we examined whether crime type 5 relates to the pattern of findings in studies examining advanced ToM. Four studies that included participants who had exclusively committed sexual offenses found that those individuals had global or selective impairments in advanced ToM (Castellino et al., 2011; Elsegood & Duff, 2010; Schuler et al., 2019). Romero-Martínez et al. (2013) found that individuals who had perpetrated intimate partner violence had advanced ToM deficits. Studies of individuals convicted of violent crimes yielded mixed findings, with different studies and measures indicating impairment (Engelstad et al., 2019; Newbury-Helps et al., 2017), no impairment (de Jong et al., 2018; Winter et al., 2017), or selective impairment (Dolan & Fullam, 2004) in advanced ToM. Studies of offender groups comprising five or more different offense types also yielded mixed results. Some of these studies showed no deficits in advanced ToM among offenders (Mayer et al., 2018; Nentjes, Bernstein, Arntz, Slaats, & Hannemann, 2015; Nentjes, Bernstein, Arntz, van Breukelen, & Slaats, 2015; Woodbury-Smith et al., 2005). Others reported impairment on some measures or in some groups but not others, depending on the ToM task used or on participant psychopathology (Domes et al., 2013; Kristof et al., 2018; Mariano et al., 2017; Morosan et al., 2017; Schiffer et al., 2017).
Discussion
This systematic review examined ToM in offenders by reviewing 28 published studies. Overall, our review revealed inconsistent and sometimes conflicting results for first-order, second-order, and advanced ToM among offenders. There are many potential reasons for these discrepancies, most notably how little research has been conducted on ToM in offenders (recall that we used a broad range of search terms and no publication date restrictions to maximize our selection of studies). Another important reason is that the reviewed studies employed a range of different ToM tasks to measure the same domain, which made comparison across studies difficult. This fits with recent observations of wide variability in ToM performance among children, adolescents, and adults, as well as minimal correlations between ToM tasks (Warnell & Redcay, 2019). In fact, our review showed that even identical ToM measures did not always produce the same outcome across studies, even in similar offending populations (e.g., convicted individuals with antisocial personality disorder; Newbury-Helps et al., 2017; Schiffer et al., 2017). ToM should therefore be considered a multidimensional process that relies on input from a number of other abilities, and variation in those abilities is likely to have contributed to the inconsistencies seen here. Relevant factors might include differences between samples, such as sample size (offender sample sizes ranged from 15 to 200), cognitive abilities, offending history, variation in offenders' early socialization, levels of neuroticism (Dolan & Fullam, 2004), and offenders' differential relatedness, or closeness, to their victims (Möller et al., 2014). These factors were rarely measured or controlled in the studies we identified.
Contradictory results might also have stemmed from limitations in the studies themselves, as reflected in the quality ratings (see Table 2 for critical findings). For example, the majority of studies included in the review failed to control for important confounding differences in cognitive abilities (e.g., intelligence, knowledge of vocabulary, executive functioning, and working memory capacity). Thus, ToM performance might have been influenced by confounding factors, and it is not clear whether the findings from these studies truly reflect the relationship between offending and ToM ability.
Table 2. Summary of Critical Findings.
The type of crime committed by the offender groups in each study might also have contributed to the contradictory results. Although the number of available studies was small, our review suggests that different crime types may be associated with different outcomes for first-order, second-order, and advanced ToM. Specifically, sexual offending may not be related to first-order ToM deficits but may be associated with second-order and advanced ToM deficits. Violent offending appears unrelated to first- and second-order ToM, and findings for advanced ToM are inconsistent. Given the dearth of studies of ToM in sexual and violent offending, any possible relationships between these types of offending and ToM deficits should be viewed with caution. We believe that understanding the link between ToM and crime type is important for determining whether certain offending groups need a treatment program that includes a ToM component. We suggest that there is a need for rigorous ToM studies that compare distinct categories of offenders rather than combining individuals with mixed offense types.
The choice of tasks used to assess ToM in offenders is another factor likely to contribute to the inconsistent results. First, the tasks were simple response-based measures, many of which were originally developed for children or clinical samples and therefore risk ceiling performance in adults. Research on ToM in healthy adults has developed more sophisticated tasks that examine real-time inferences about others' mental states and are therefore more sensitive to subtle processing differences between individuals. These tasks provide insight not only into whether a person's ToM is impaired but also into the mechanisms and time course of these inferences (e.g., Bradford et al., 2015; Ferguson & Breheny, 2012; Kovács et al., 2010; Samson et al., 2010). Future research on ToM and offending should therefore adopt some of these more complex tasks to identify the specific nature of any ToM difficulties.
Second, our quality assessment identified concerns about the validity and reliability of some of the ToM tasks employed by the studies we reviewed. Most did not report the validity and reliability of their ToM tasks. While many of the basic ToM tasks featured in these studies have been used frequently in the literature, there is insufficient evidence that they are valid, reliable, and suitable for detecting individual differences in adults. For example, the mental state items in the faux-pas task have good test–retest reliability (Zhu et al., 2007) and excellent internal consistency, but the control items have low internal consistency, a skewed distribution, and ceiling effects (Söderstrand & Almkvist, 2012).
The widely used RMET-R has been criticized for its association with verbal intelligence (Baker et al., 2014) and for concerns that it reflects emotion recognition rather than ToM (Oakley et al., 2016). However, there is now evidence that ToM has cognitive and affective components (Shamay-Tsoory et al., 2010), and affective ToM refers to understanding others’ emotions (Gabriel et al., 2019), so emotion recognition may itself fall within the scope of ToM. Further, it has been argued that the RMET-R assesses mental states more comprehensively than the earlier version of the RMET, covering basic and complex emotions, cognitive mental states (e.g., thinking and scheming), and relational mental states such as flirting (Warrier et al., 2017). Given the current status of the RMET-R and its wide use throughout offending research, we decided to include studies that used the RMET-R for completeness. Nevertheless, resolving the controversy over what the RMET-R measures is important for helping researchers judge its suitability.
Another aspect overlooked by the majority of reviewed studies is the importance of measuring ToM as a construct with two distinct components, cognitive and affective. While accumulating evidence from empirical studies supports this distinction (Hynes et al., 2006; Shamay-Tsoory et al., 2002, 2005; Völlm et al., 2006), the reviewed studies, with the exception of Shamay-Tsoory et al. (2010), either treated ToM as a single construct or did not adequately take this distinction into account when analyzing or reporting their results. Moreover, most of the studies reported here used only a single task to measure one aspect of ToM, rather than a multimodal approach that uses a battery of ToM tasks to examine the broad spectrum of ToM skills. Failing to assess ToM as a multifaceted construct leaves the source of any ToM deficit ambiguous: it does not allow us to identify whether a deficit is cognitive or affective, nor, within these subdivisions, which specific mechanisms are impaired or intact.
We suggest that ambiguity about which specific components of ToM each task measures partly stems from a more general problem in defining cognitive and affective ToM. For example, cognitive ToM has been defined as “our ability to make inference regarding other people’s beliefs,” whereas affective ToM was described as an “inference one makes regarding others’ emotions” (Shamay-Tsoory et al., 2010, p. 669). These definitions are not explicit enough to prevent confusion over which tasks should be selected to evaluate cognitive and affective ToM. We suggest that explicit definitions of cognitive and affective ToM, together with detailed information about the qualities each measures, are essential. The definition of cognitive ToM must state clearly whether it detects only thoughts, beliefs, and intentions or whether it also detects emotions. The definition of affective ToM must state whether it simply detects the emotions and feelings of others or whether it also involves understanding those detected emotions and feelings. Without clearly defining these aspects and identifying the tasks that measure each dimension, research in this area may unintentionally mislead researchers, practitioners, and policy makers.
The studies that met our inclusion criteria assessed ToM only quantitatively. However, a small number of studies in the ToM literature examining the content of ToM suggest that although children with problem behaviors (e.g., antisocial behavior, conduct disorder) have intact ToM, the content of their ToM is problematic. They may, therefore, have a
The current systematic review also has its own limitations. For example, we included studies with small sample sizes (Castellino et al., 2011; Jones et al., 2007; Morosan et al., 2017; Proctor & Beail, 2007; Romero-Martínez et al., 2013; Shamay-Tsoory et al., 2010; Woodbury-Smith et al., 2005). Two of the included studies recruited participants with intellectual disability (i.e., Hammond & Beail, 2017; Proctor & Beail, 2007), so their conclusions may not generalize to other populations. While the studies contributing to the systematic review spanned 11 countries, all but two were based in Europe. As a result, our conclusions from these studies, tentative as they are, may not generalize to other jurisdictions or cultural contexts. Importantly, the studies included in the review focused overwhelmingly on male offending and ToM. Only around 7% of offending participants were reported as female; thus, any conclusions may not apply to females. Furthermore, we included only studies that had a nonoffender control group, which led to the omission of one study (Richell et al., 2003) that had no control group but instead compared the mean score of its experimental group with the mean score of nonoffenders who had participated in another study.
We also excluded studies that used interview methods such as the Reflective Functioning Task which focuses on participants’ attachment experiences with their parents during childhood (Fonagy et al., 1997). We had two main reasons for these exclusions. First, in tasks such as these, the accuracy of participant inferences about others’ mental states is unknown to the researchers who score participant interviews. Therefore, researchers can rate whether or not participants articulate certain mental states but cannot know whether their inference of these states is accurate. For example, a participant who states, “I thought my mother felt resentful of us, but I’m not really sure if she felt that way herself,” would get a point for mental inference in the Reflective Functioning Task, but the researcher would not know whether the mother felt resentful or not. In contrast, the types of ToM measure included in our review, in which mental inferences are presented through pictures, videos, or stories, provide researchers with certainty about the accuracy of the participants’ mental state inferences of others’ minds.
The second reason for exclusion is that studies using reflective functioning as a measure of ToM are limited by their focus on mental inferences specific to attachment figures. Fonagy et al. (1998) suggested that reflective capacity in the attachment context may not generalize to other domains. We also excluded studies that used questionnaires asking participants to self-report their ability to theorize about other people’s minds (e.g., I find it easy to put myself in somebody else’s shoes), because questionnaire scores may not reflect participants’ true ToM abilities and may not predict actual ToM abilities in everyday situations (Queirós et al., 2018). While our focus on studies that measured ToM using performance-based tasks limited the scope of our systematic review, we believe that doing so eliminated a number of potential confounding or contaminating factors.
Our quality assessment also has limitations. Evaluating task difficulty and the ecological validity of the tasks required subjective judgments. In addition, the studies that scored highest on quality showed no first-order ToM differences between offenders and nonoffenders and produced inconsistent results for second-order ToM. However, these studies were still only of moderate quality and, like the lower ranking studies, had shortcomings of their own (e.g., using an age-inappropriate task for adults, failing to control for some potential confounding variables, treating ToM as a single construct, and recruiting offenders who had various personality disorders). These factors may influence ToM task performance, directly or indirectly, because studies indicate that they are significant moderators or predictors of ToM success (Brock et al., 2018; Spenser et al., 2019). Therefore, despite their higher quality scores, caution is recommended when drawing conclusions from these studies, and future research should aim to overcome these shortcomings.
In conclusion, the current review adds to a growing body of literature on ToM in offending populations in several respects (see “Implications of the Review” section). The vast majority of the studies in the review indicated that offenders had intact first-order ToM. In contrast, results regarding second-order and advanced ToM were more mixed. Some studies found that offenders had intact second-order and advanced ToM, whereas others found that both were impaired. More curiously still, a number of studies found superior ToM among offenders or reported selective impairment in their second-order and advanced ToM. However, we note that the vast majority of studies used a single response-based measure to assess ToM as a single construct rather than considering its cognitive and affective aspects independently. Consequently, these studies do not clearly establish whether offenders had intact or impaired cognitive ToM, affective ToM, or both, nor do they identify the specific mechanisms that are impaired. This review demonstrates that the relationship between ToM and offending is complex and influenced by multiple factors. We suggest that an accurate understanding of the relationship between ToM and offending requires clear definitions of and distinctions between ToM components, valid and reliable ToM measures, and well-designed studies. Finally, it remains an open question whether ToM impairment is criminogenic and whether it forms a treatment need within offender rehabilitation.
Implications of the Review
Research
Our review shows clear gaps in the research on ToM and offending.
Researchers should ensure precision in the definition and operationalization of ToM constructs.
ToM measures should be validated for use with the population under investigation.
Confounds should be controlled where feasible.
Research designs should allow for ToM data to be examined across offense types.
Researchers should use our quality checklist to help guide study design.
Policy
There is currently no clear evidence to suggest that work on first-order, second-order, or advanced ToM should be routinely incorporated into treatment programs for individuals who have offended.
Funders should prioritize rigorous and generalizable research on ToM and offending.
Practice
Practitioners should carefully examine the weight of evidence for ToM deficits in their client group, paying close attention to the quality of studies and the limitations of the evidence base.
Practitioners should use case formulation to explore whether a facet of ToM represents a treatment need for individual clients.
Supplemental Material
Supplemental Material, sj-xlsx-1-tva-10.1177_15248380211013143 for Theory of Mind in Offending: A Systematic Review by Nilda Karoğlu, Heather J. Ferguson and Caoilte Ó Ciardha in Trauma, Violence, & Abuse
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
References
