Abstract
This meta-analysis tested whether autistic people show a marked, isolated difficulty with mentalising when assessed using the Frith-Happé Animations, an advanced test of mentalising (or ‘theory of mind’). Effect sizes were aggregated in multivariate meta-analysis from 33 papers reporting data for over 3000 autistic and non-autistic people. Relative to non-autistic individuals, autistic people underperformed, with a small effect size on the non-mentalising control conditions and a medium effect size on the mentalising condition. This indicates that studies have reliably found mentalising to be an area of challenge for autistic people, although the group differences were not large. It remains to be seen how important mentalising difficulties are in accounting for the social difficulties diagnostic of autism. As autistic people underperformed on the control conditions as well as the mentalising condition, it is likely that group differences on the test are partly due to domain-general information processing differences. Finally, there was evidence of publication bias, suggesting that true effects on the Frith-Happé Animations may be somewhat smaller than reported in the literature.
Lay abstract
Autistic people are thought to have difficulty with mentalising (our drive to track and understand the minds of other people). Mentalising is often measured by the Frith-Happé Animations task, where individuals need to interpret the interactions of abstract shapes. This review article collated results from over 3000 people to assess how autistic people performed on the task. Analysis showed that autistic people tended to underperform compared to non-autistic people on the task, although the scale of the difference was moderate rather than large. Also, autistic people showed some difficulty with the non-mentalising as well as mentalising aspects of the task. These results raise questions about the scale and specificity of mentalising difficulties in autism. It also remains unclear how well mentalising difficulties account for the social challenges diagnostic of autism.
Autism has often been associated with difficulties in mentalising, that is, our drive to track and understand the minds of other people (Frith, 2001). Several tests have been devised to measure mentalising skills; the most fundamental is the false-belief task, which requires individuals to track a character’s false belief to predict their behaviour. While such tasks can capture mentalising difficulties in younger children, they are not sensitive to the real-world social difficulties of older autistic children and adults (Frith, 1994). This has led to the development of ‘advanced’ mentalising tests, including those which
In the Frith-Happé Animations, several cartoon clips of moving triangles are presented, and the test-taker is prompted to consider what is happening in each clip, responding either through verbal descriptions or a multiple-choice format (White et al., 2011). There are three conditions, one of which targets mentalising, while two conditions control for more general skills in perceiving and interpreting movement and action. In these control conditions, the triangles either move
Given these inconsistencies, this study presents a meta-analysis to evaluate the extent to which other studies support, or undermine, the hypothesis that the Frith-Happé Animations test reveals a specific mentalising difficulty in autistic people.
Methods
A meta-analysis was carried out by screening the citation lists in Web of Science for the original studies that developed the Frith-Happé Animations (Abell et al., 2000; Castelli et al., 2002) as well as the citation list for a more recent multiple-choice version of the test (White et al., 2011). After removing duplicates, 1781 titles/abstracts were screened, and of these, 121 papers were read to determine eligibility. Papers were excluded because they were theoretical/review papers (
Studies were included if they presented the Frith-Happé Animations to a group of autistic children or adults and to a control group. Participants needed either (1) to produce verbal descriptions of the animations that were rated according to the method of Abell et al. (2000) or similar criteria set out by Castelli et al. (2002) or (2) to complete the multiple-choice questions (MCQs) devised by White et al. (2011). Under the criteria of Castelli et al. (2002), verbal descriptions are rated for appropriateness and use of mental state language (although the latter is only meaningful for the mentalising animations). The MCQ version includes two question types: classification (of each animation as showing random, physical or mental interaction) and identification of feelings (represented in the mentalising animations). As this review compared performance
The following data were extracted from all papers: sample characteristics (sample size, age, gender, verbal ability and use of autism assessments), task characteristics (number and type of animations presented, scoring criteria used and the presence of inter-rater quality check), and means and SDs for performance on each animation type (in terms of appropriateness ratings and/or the classification MCQs, depending on the procedure used in each paper). Where means and SDs were not available in the paper, effect sizes were calculated on the basis of test statistics or authors were emailed. Authors were also emailed to confirm that samples were largely independent if there was uncertainty whether samples might have overlapped, for example, if the same authors published more than one paper in quick succession.
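As an illustration of the extraction step described above, the following minimal sketch shows how a standardised mean difference might be derived either from group means and SDs or from an independent-samples t statistic. It is not the code used in this review (which was run in R with metafor); the function names and example values are hypothetical, the small-sample correction uses the common approximation J = 1 - 3/(4df - 1), and the sampling variance uses a standard large-sample approximation.

```python
import math


def small_sample_correction(n1, n2):
    """Approximate Hedges' correction factor J for df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    return 1 - 3 / (4 * df - 1)


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g for group 1 minus group 2, from means and SDs."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (m1 - m2) / pooled_sd
    return small_sample_correction(n1, n2) * d


def g_from_t(t, n1, n2):
    """Hedges' g recovered from an independent-samples t statistic."""
    d = t * math.sqrt(1 / n1 + 1 / n2)
    return small_sample_correction(n1, n2) * d


def sampling_variance(g, n1, n2):
    """Large-sample approximation to the sampling variance of g."""
    return (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))


# Hypothetical values only: autistic group (group 1) vs non-autistic group.
g = hedges_g(m1=3.1, sd1=1.2, n1=25, m2=3.8, sd2=1.0, n2=27)
v = sampling_variance(g, 25, 27)
print(round(g, 3), round(v, 3))
```

Under this sign convention, a negative value indicates lower scores in the first (autistic) group. The same variance function could in principle be applied where a paper reports only an effect size and sample sizes.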
A meta-analysis was carried out in R (R Core Team, 2020) using the metafor package (Viechtbauer, 2010). Data and code are available on the Open Science Framework (https://osf.io/qa8p2/). For each animation type reported in each paper, the standardised mean difference between autistic and non-autistic groups (Hedges’

Figure 1. Forest plots for group differences between autistic and non-autistic people on the Frith-Happé Animations by condition.
This meta-analysis depends on the close matching of autistic and non-autistic groups across studies. While most studies had attempted to match participants for verbal ability, age and gender, there were some small discrepancies, as can be seen in Table 1, so the robustness of the results to these participant variables was assessed. For all studies, it was possible to determine the standardised mean difference in verbal IQ (or an equivalent measure) between the autistic and non-autistic groups recruited into the study, so this control variable was added to the initial meta-analysis as a fixed effect alongside animation type to control for any group differences in verbal ability. Differences in age and the proportion of males versus females across groups were not always calculable from the studies. Therefore, these factors were assessed in subsequent models re-running the meta-analysis across subsets of studies reporting this information. For age, the standardised mean difference was computed, whereas for gender, the proportion of males in the autistic group was divided by the proportion of males in the non-autistic group.
Table 1. Sample characteristics.
ADOS: Autism Diagnostic Observation Schedule; DISCO: Diagnostic Interview for Social and Communication Disorders; ADI-R: Autism Diagnostic Interview-Revised; DAWBA: Development and Wellbeing Assessment; 3Di: Developmental, Diagnostic and Dimensional Interview.
For each study, the first row relates to the autistic group and the second row to the non-autistic group. All studies required autistic participants to have a clinical diagnosis of autism based on Diagnostic and Statistical Manual of Mental Disorders (DSM)/International Classification of Diseases (ICD) criteria. Some studies gave participants or their families a diagnostic interview to confirm the diagnosis; this information is shown in the far column. For age, means (SDs) are given in years; months. For verbal ability, standard scores on norm-referenced tests of verbal intelligence are given.
Verbal ability for this paper was measured using raw scores from the Spot the Word Task.
Authors kindly supplied data for the Frith-Happé Animations relating to a larger sample size than reported in the paper (exclusions in these papers were made based on fMRI criteria). Descriptive statistics given here for age, sex and verbal ability relate to the slightly smaller samples reported in the paper.
These studies did not provide descriptive statistics for verbal ability, so full-scale IQ scores have been reported here.
For this paper, verbal ability was measured using a non-standardised vocabulary test.
After running the meta-analysis, Cook’s distance was used to identify samples exerting undue influence on the results. Then, three further fixed effects – age-group (adult or child sample), format (verbal description or MCQs) and the inverse of the sample size – were included to investigate whether these possible moderators accounted for heterogeneity in individual effect sizes. The inverse of the sample size was included as a moderator to assess for publication bias, as a relationship between smaller studies and larger effects might exist if the former were only published if a large effect was found.
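The logic of using inverse sample size as a publication-bias moderator can be sketched as follows. This is a deliberately simplified, single-moderator illustration, not the multivariate model fitted in this review: it regresses effect sizes on 1/N by weighted least squares with inverse-variance weights. The function name and all data values are hypothetical.

```python
def inverse_n_regression(effects, variances, sample_sizes):
    """Weighted least-squares fit of effect size on 1/N.

    Weights are the inverse sampling variances. Returns (intercept, slope).
    """
    weights = [1.0 / v for v in variances]
    x = [1.0 / n for n in sample_sizes]
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, effects)) / total_w
    sxy = sum(
        w * (xi - x_bar) * (yi - y_bar)
        for w, xi, yi in zip(weights, x, effects)
    )
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical absolute effect sizes, sampling variances and total sample sizes.
effects = [0.90, 0.60, 0.45, 0.35]
variances = [0.22, 0.11, 0.05, 0.02]
sample_sizes = [20, 40, 90, 210]

intercept, slope = inverse_n_regression(effects, variances, sample_sizes)
print(intercept, slope)
```

In a sketch of this kind, a positive slope means that smaller studies tend to report larger effects, which is the pattern expected under publication bias, and the intercept can be read as the effect size projected for a very large study (1/N approaching zero).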
Results
Data from 1530 autistic and 1569 non-autistic people, drawn from 33 papers, were included in the meta-analysis. There were 2138 adults (1067 autistic and 1071 non-autistic) and 961 children (463 autistic and 498 non-autistic). Due to incomplete reporting, it was not possible to determine the exact gender distribution, but among adults, approximately 68% of the autistic and 59% of the non-autistic individuals were male, and among children, equivalent percentages were approximately 83% and 74%. Tables 1 and 2 present sample and task characteristics for each paper, and Figure 1 shows forest plots for effect sizes for each animation type from each paper.
Table 2. Study characteristics.
This table shows the type of session used for data collection; the number of animations used; the rating criteria (Abell et al., 2000; Castelli et al., 2002) used for verbal descriptions (with or without inter-rater checks); and use of the multiple-choice questions (MCQs) of White et al. (2011).
Whereas effect sizes in the meta-analysis were generally calculated on the basis of means and SDs reported in papers, these studies did not present SDs but did give effect sizes. Therefore, these effect sizes were directly aggregated in the meta-analysis, with sampling variances calculated on the basis of the effect size and sample size.
Although behavioural and fMRI sessions were used in this study, data on the verbal descriptions have yet to be fully published, so only effect sizes based on data collected on the MCQs in the fMRI session have been included in the meta-analysis. Note, however, that the authors reported no group difference on total scores on the verbal descriptions, so results are likely to be similar to the null results collected in the fMRI session.
Although the MCQs were administered in both these studies alongside the verbal description paradigm, the papers did not report scores broken down by animation type, so only data on verbal descriptions are analysed in this review.
These studies did not present means or SDs, so effect sizes were calculated on the basis of test statistics.
Animation type (goal-directed movement, random movement, mentalising) was investigated as a fixed effect predicting autistic participants’ interpretation of the clips. Each level of the fixed effect was significant, with autistic people showing less normative interpretations than non-autistic people. Controlling for any group differences in verbal ability, absolute effect sizes [95% confidence intervals (CIs)] were small for the animations with random,
Effect sizes across the child and adult samples.
The outlying study (Clemmensen et al., 2016) involved children and has been removed from analysis. The random movement condition was only presented in studies involving adults.
However, there was significant heterogeneity in effect sizes,
Results of the moderator analysis.
The intercept reflects performance on the goal-directed movement condition of the Frith-Happé Animations. Hedges’
Discussion
Across the studies collated in this review, autistic people experienced a gradient of difficulty on the Frith-Happé Animations, with a small effect size difference between autistic and non-autistic people on the control conditions and an additional small increase in effect size on the mentalising animations. Analysis indicated that similar effects were found across children and adults on the spectrum and that there was evidence of publication bias slightly inflating these effects. It has been claimed that ‘impairments in individuals with autism can be revealed in characteristic inaccuracies in mental state attribution to animated shapes’ (Castelli et al., 2002, p. 1845). On the one hand, this meta-analysis did find a reliable difference between autistic and non-autistic people in mentalising skills as measured by the task. On the other hand, there are questions about the scale and specificity of the difference.
In the first study using the Frith-Happé Animations with autistic adults (Castelli et al., 2002), there was a very substantial difference between autistic and non-autistic people on the mentalising animations (
A different interpretation of the results would revolve around the task: that the Frith-Happé Animations might not be sensitive to individual differences in mentalising. First, it should be noted that restricted variance is not a problem for the task when presented to general population or clinical groups, as no study included in this review reported a ceiling effect. Therefore, the question is not about the presence of individual differences on the task, but what these represent. On the one hand, the Frith-Happé Animations and other mentalising tasks do not tend to correlate highly, if at all, suggesting that we cannot be confident in precisely what accounts for variance in performance on the tests. Gernsbacher and Yergeau (2019) collate evidence for the poor convergence between different mentalising tests, and conclude that mentalising lacks construct validity. On the other hand, we could equally argue that mentalising is not a single ability but a set of multiple, specific skills – a view that is supported by neural accounts of the ‘social brain’ (Schaafsma et al., 2015). Indeed, it is within the social neuroscience literature that we find the strongest argument for the validity of the Frith-Happé Animations as a test of mentalising. Studies show that the mentalising condition reliably activates social-cognitive networks in the brain that partially overlap with activation patterns observed for other mentalising tasks (see Schurz et al., 2014 for a meta-analysis). In the largest neuroimaging study of the Frith-Happé Animations to date (
As noted above in the meta-analysis, autistic people tended to underperform on the control animations. This raises the question whether difficulties on the Frith-Happé Animations can be explained to some extent by more general difficulties with interpreting motion, whether at a perceptual or higher cognitive level, which manifest across all animation types. Perceptually, autistic people have been found to perform differently on tasks involving detection and discrimination of global and biological motion without mentalising demands (e.g. Klin et al., 2009; Milne et al., 2002; Robertson et al., 2014). Interestingly, in their meta-analysis of global and biological motion tasks, Van der Hallen et al. (2019) found that autistic people underperformed to a very similar degree as they did on the control animations in the present study,
This meta-analysis presented evidence that autistic people show less difficulty in understanding social narratives in abstract animations than early reports indicated. This suggests that we should be cautious about assuming that mentalising is necessarily an area of marked difficulty for autistic people, although we should also not underplay the subtle but reliable difference in mentalising that did emerge, which may impact on the behavioural phenotype in autism. Given that group differences between autistic and non-autistic people also emerged on the control conditions of the Frith-Happé Animations, it is possible that performance across the task is influenced, in addition to the mentalising demands, by domain-general abilities in perceiving and assigning meaning to motion.
Supplemental Material
sj-pdf-1-aut-10.1177_1362361321989152 – Supplemental material for Do animated triangles reveal a marked difficulty among autistic people with reading minds?
Supplemental material, sj-pdf-1-aut-10.1177_1362361321989152 for Do animated triangles reveal a marked difficulty among autistic people with reading minds? by Alexander C Wilson in Autism
Footnotes
Acknowledgements
The author would like to thank Professor Dorothy Bishop for comments on this article.
Funding
The author received no financial support for the research, authorship and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
