Abstract
Impaired capacity for Theory of Mind (ToM) represents one of the hallmark features of the behavioral variant of frontotemporal dementia (bvFTD) and is suggested to underpin an array of socioemotional disturbances characteristic of this disorder. In contrast, while social processing typically remains intact in Alzheimer’s disease (AD), the cognitive loading of socioemotional tasks may adversely impact mentalizing performance in AD. Here, we employed the Frith-Happé animations as a dynamic on-line assessment of mentalizing capacity with reduced incidental task demands in 18 bvFTD patients, 18 AD patients, and 25 age-matched Controls. Participants viewed silent animations in which geometric shapes interact in Random, Goal-Directed, and ToM conditions. An exclusive deficit in ToM classification was observed in bvFTD relative to Controls, while AD patients were impaired in the accurate classification of both Random and ToM trials. Correlation analyses revealed robust associations between ToM deficits and carer ratings of affective empathy disruption in bvFTD, and between ToM deficits and episodic memory dysfunction in AD. Voxel-based morphometry analyses further identified dissociable neural correlates contingent on patient group. A distributed network of medial prefrontal, frontoinsular, striatal, lateral temporal, and parietal regions was implicated in the bvFTD group, whereas the right hippocampus correlated with task performance in AD. Notably, subregions of the cerebellum, including bilateral lobules I-IV and V, were implicated in task performance irrespective of patient group. Our findings reveal new insights into the mechanisms potentially mediating ToM disruption in dementia syndromes, and suggest that the cerebellum may play a more prominent role in social cognition than previously appreciated.
INTRODUCTION
Human social behavior is governed largely by the capacity to construct a “Theory of Mind” (ToM), enabling us to infer the thoughts, beliefs, and feelings of others [1]. This aptitude to consider perspectives distinct from our own appears to be so ubiquitous that we spontaneously attribute human character traits and ascribe mental states to inanimate shapes devoid of any of the facial, emotional, or social cues present in everyday social scenarios [2, 3]. In recent years, there has been a surge in research interest seeking to clarify the neurobiological substrates of ToM using functional neuroimaging techniques (reviewed by [4]) and to delineate how alterations to this complex process relate to the emergence of maladaptive social behaviors [5]. Here we explore the neural substrates of ToM impairments in younger-onset dementia.
The behavioral variant of frontotemporal dementia (bvFTD) presents a striking illustration of the degeneration of the “social brain” [6]. This form of younger-onset dementia is characterized initially by marked changes in behavior and personality, manifesting in executive dysfunction, emotion dysregulation, and dramatic impairments in interpersonal functioning [7]. A profile of emotional blunting, decreased empathy, loss of social interest, and diminished responsiveness to the feelings of others is commonly observed [8–12], resulting in florid violations of social norms, decreased tact, and reduced interpersonal responsiveness [13, 14]. These changes are attributable to characteristic brain atrophy, which originates in medial prefrontal, frontoinsular, and paralimbic structures and encroaches in a predictable fashion into adjacent prefrontal and anterior temporal regions [15–17].
The pervasive deficits in social function observed in bvFTD have been suggested to reflect the specific vulnerability of a core mentalizing mechanism subserved by the frontal lobes [18–20]. This hypothesis is supported by converging evidence of marked ToM impairments in bvFTD across an array of experimental paradigms including first- and second-order false-belief tasks [19, 21–24], cartoon tasks requiring social inferences [8, 26], tests of faux pas recognition [27–30], and ecologically valid tasks which require social inference to detect sarcasm in conversation [31, 32]. Consistent with the early medial prefrontal cortex (PFC) and frontoinsular atrophy in bvFTD, affective ToM (i.e., attributions of emotions and feelings) appears to be uniquely vulnerable in the initial stages of the disease, followed by the emergence of cognitive ToM deficits (i.e., attributions of beliefs and intentions) [33, 34].
In contrast, socioemotional functioning tends to remain relatively intact in Alzheimer’s disease (AD), at least in the early stages of the disease trajectory, despite marked impairments in episodic memory, language, and visuospatial abilities [31, 35]. Although a number of studies have revealed social dysfunction in AD, these deficits appear to manifest predominantly as a product of general cognitive dysfunction, rather than reflecting a primary impairment in specific social or affective cognitive processes per se (see [9, 25, 36]). A recent meta-analysis concluded that, even when present, ToM deficits in AD are typically less severe than in bvFTD, and modest when considered relative to overall cognitive impairment [37]. Consequently, it has been suggested that ToM measures may be well-suited to differentiate between AD and bvFTD [38], particularly in light of mounting evidence pointing to prominent memory impairments in bvFTD [39, 40] and executive dysfunction in AD [41, 42].
The multifaceted nature of ToM, and its reliance on several interacting social processes [43–45], renders the selection of appropriate tasks in clinical populations particularly challenging. In the context of bvFTD, task demands must be considered given conflicting findings regarding the extent to which ToM processes relate to executive dysfunction [46] and semantic impairments [25, 47] in this syndrome. As yet, there remains no firm consensus on how best to capture the inherently complex nature of ToM while limiting the influence of incidental task demands on results. Here, we employed the Frith-Happé animations, a silent dynamic ToM task widely used in the developmental literature, which reliably discriminates between children with autistic spectrum disorder and matched control groups [3, 48, 49]. Participants watch simple animations of geometric shapes that move randomly (Random), in a goal-directed manner (Goal-Directed), or in response to each other’s mental states (ToM), and must label the nature of the interaction (or lack thereof) accordingly. The advantage of this approach lies in the simplicity of the visual stimuli, in contrast to previous approaches that rely upon complex spatial arrays or detailed vignettes. Lowering incidental task demands in this way is important, as the cognitive loading of social cognitive tasks has been shown to adversely impact perspective-taking performance in AD [9, 33].
The objectives of the present study were twofold. First, we sought to investigate the capacity for ‘online’ mental state attribution in well-characterized cases of bvFTD using the Frith-Happé animations, and to compare their performance with that of disease-matched AD cases. Given that no study to date has used this dynamic task in dementia syndromes, a second aim was to delineate the neural substrates of ToM performance in each patient group using whole-brain voxel-based morphometry analyses, to further illuminate our understanding of the social brain.
MATERIALS AND METHODS
Participants
A total of 61 participants were included in the study. Eighteen patients with a clinical diagnosis of probable AD with predominantly amnestic presentation were contrasted with 18 bvFTD patients presenting with socioemotional and executive dysfunction. Patient performance was compared to that of 25 cognitively intact older Controls. Participants were recruited through FRONTIER, the frontotemporal dementia research group in Sydney. Clinical diagnoses were established in accordance with current diagnostic criteria for AD [35] or bvFTD [7] by consensus among a multidisciplinary team of a senior neurologist, neuropsychologist, and occupational therapist, based on detailed cognitive assessment, clinical investigation, activities of daily living, and structural neuroimaging. Disease stage was estimated as the number of months elapsed since symptom onset. Functional status of patients was determined using the frontotemporal dementia Functional Rating Scale (FRS) [50], a dementia staging tool sensitive to changes in functional abilities, activities of daily living, and behavioral symptoms.
Healthy Controls were recruited from volunteer panels and local community groups. All controls scored 88 or above on the Addenbrooke’s Cognitive Examination-III (ACE-III) [51] and 0 on the Clinical Dementia Rating scale (CDR) [52]. Exclusion criteria for all participants included: significant head injury, movement disorders, cerebrovascular disease, alcohol and other substance abuse, significant history of mental illness, and limited English proficiency.
Ethical approval for this study was obtained from the University of New South Wales ethics committee and the South Eastern Sydney Local Health District. All participants, or their Person Responsible, provided informed consent in accordance with the Declaration of Helsinki. Participants volunteered their time and were reimbursed for travel costs.
Behavioral tasks
General cognitive screening
All participants completed a comprehensive battery of neuropsychological tests assessing integrity of the main cognitive domains. Global cognitive functioning was assessed using the ACE-III [51], which comprises orientation, memory, verbal fluency, language, and visuospatial subscales. Attention and working memory were measured using Digit Span forwards and backwards [53]. The Trail Making Test (Part B-A) [54] provided an index of executive function, while the Hayling Sentence Completion Test was included as a measure of response inhibition (Scaled Score C) [55]. Verbal episodic memory performance was measured using the Rey Auditory Verbal Learning Test (RAVLT) [56], whereas the Rey Complex Figure test (RCF) was used as an index of non-verbal episodic memory [57]. A percentage retained score was derived for RCF performance to control for executive and visuoconstructive processes (Recall score/Copy score × 100).
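The two derived scores described above are simple arithmetic. As a minimal sketch (the function names are hypothetical, and Trail Making times are assumed to be in seconds), they could be computed as follows:

```python
def rcf_percent_retained(recall, copy):
    """Rey Complex Figure % retained = recall score / copy score * 100.

    Dividing by the copy score removes the contribution of
    visuoconstructive ability from the memory index.
    """
    if copy <= 0:
        raise ValueError("copy score must be positive")
    return recall / copy * 100


def trails_b_minus_a(time_a, time_b):
    """Trail Making Test B-A difference score (assumed to be in seconds).

    Subtracting Part A removes basic processing/motor speed,
    leaving an index of set-shifting cost.
    """
    return time_b - time_a


# Example: recalling 18 of 36 copied elements gives 50% retained.
print(rcf_percent_retained(18, 36))
print(trails_b_minus_a(40, 100))
```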
Behavioral and socioemotional disturbance
Carers rated the extent of behavioral change in the patient groups via the Cambridge Behavioral Inventory (CBI) [58]. In addition, a subset of carers (AD, n = 11; bvFTD, n = 13) rated changes in socioemotional functioning in patients using the Empathic Concern (EC) and Perspective Taking (PT) subscales of the Interpersonal Reactivity Index (IRI) [59].
Assessment of Theory of Mind
Participants completed a revised version of the Frith-Happé animations task, as a dynamic assessment of ‘on-line’ mentalizing capacity [48]. Briefly, participants view a series of short silent animations, in which two geometric shapes (triangles) move about the screen. Three types of animations are presented: 1) Random, in which the movement of the triangles is purposeless and conveys little regarding the interaction, goals, or intentions of the triangles (e.g., bouncing); 2) Goal-Directed, in which the interaction between the two triangles depicts a clear behavioral purpose (e.g., dancing); 3) ToM, in which the interactions between the triangles suggest that one triangle anticipates or manipulates the “mental state” of the other (e.g., tricking). Examples of the test stimuli are provided at https://sites.google.com/site/utafrith/research.
Participants were required to view each animation and to give a concurrent verbal description of what was happening (i.e., narratives). At the end of each animation, participants then selected an appropriate multiple-choice categorization: “No Interaction” (Random), “Physical Interaction” (Goal-Directed), or “Mental Interaction” (ToM). The multiple-choice options were clearly visible on a sheet of paper in front of participants for the duration of the task (see Supplementary Material).
Participants completed two practice trials and were given feedback to ensure they were familiar with, and understood, task requirements. Then, the 12 animations were presented one at a time in a pseudo-random order. General prompts were given to ensure participants remembered task instructions but without being directive (e.g., “Can you tell me what is happening here?”). No further feedback was given. Participants could only view each animation once.
Affective inference on Theory of Mind trials
If participants correctly identified a “Mental Interaction” on a ToM trial, two additional multiple-choice questions were presented. Participants were asked to select, from five options, the feeling that best applied to each of the two triangles at the end of the animation (e.g., ‘Frustrated’, ‘Loving’, ‘Tense’, ‘Playful’, or ‘No Feelings’). These questions thus provided an index of affective state inference and were not asked if participants failed to identify ToM animations, or on Random and Goal-Directed trials.
Multiple choice scoring
Each correctly categorized animation was awarded 1 point, yielding a maximum score of 4 points per experimental condition (Random, Goal-Directed, ToM) and a total of 12 points overall. Each correct attribution of feelings on ToM trials was awarded 1 point (2 per ToM animation), yielding a maximum score of 8 points.
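The scoring rules above can be sketched as follows. The trial record layout (tuples of condition and correctness flags) is a hypothetical representation chosen for illustration, not the study's data format. Feeling questions are only scored when the ToM trial itself was classified correctly, because otherwise they were never asked.

```python
CONDITIONS = ("Random", "Goal-Directed", "ToM")


def score_classification(trials):
    """trials: hypothetical list of (condition, correct) pairs, 4 per condition.

    Returns per-condition scores (max 4 each) and the total (max 12).
    """
    per_condition = {c: 0 for c in CONDITIONS}
    for condition, correct in trials:
        if correct:
            per_condition[condition] += 1
    return per_condition, sum(per_condition.values())


def score_feelings(tom_trials):
    """tom_trials: hypothetical list of (classified_ok, feeling1_ok, feeling2_ok).

    One point per correctly identified feeling (2 per ToM animation, max 8);
    feeling answers are ignored when the trial was misclassified.
    """
    total = 0
    for classified_ok, feeling1_ok, feeling2_ok in tom_trials:
        if classified_ok:
            total += int(feeling1_ok) + int(feeling2_ok)
    return total
```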
Coding of narrative content
Narrative content of ToM trials was analyzed for Appropriateness and Intentionality, in line with previous studies [3, 49]. The Appropriateness score reflected participants’ comprehension of the central theme of the animations, rated by the experimenter as 3 = ‘appropriate descriptions’, 2 = ‘partially appropriate descriptions’ (including descriptions that focused on one aspect or character of the script), or 1 = ‘non-appropriate descriptions’ (‘don’t know’ or unrelated answers). The Appropriateness criteria were specific to the events depicted in each animation; for example, for the animation depicting ‘mocking’, the narrative needed to convey the idea that the little triangle was copying the big triangle with the intention of not being noticed, e.g., ‘pretending’, ‘hiding’, ‘being naughty’.
The Intentionality score captured participants’ appreciation of mental states based on their use of verbs (e.g., ‘floating’, ‘running’, ‘mocking’). An ‘intentionality ladder’ was used to rank verbs on a 6-point scale with lower levels indicating non-deliberate movement and no interaction between the agents, all the way to purposeful actions deliberately intended to affect the other agent’s mental state. The highest-scoring verb within each narrative was taken as that narrative’s Intentionality score. Finally, Appropriateness and Intentionality scores were averaged across ToM trials. To maximize data across participants, the average score was included provided the individual had elaborated to some degree on at least three of four ToM trials.
Intentionality rating scale
1. Non-deliberate movement and no appreciation of, or interaction with, another agent, e.g., ‘moving around’, ‘floating’.
2. Purposeful movement with no interaction, e.g., ‘walking’, ‘swimming’.
3. Purposeful action with another agent (parallel in time), e.g., ‘fighting’, ‘following’.
4. Purposeful action in response to actions of another agent (sequential in time), e.g., ‘copying’, ‘chasing’.
5. Actions in response to a mental state, e.g., ‘mocking’, ‘arguing’.
6. Actions with the goal of affecting another agent’s mental state, e.g., ‘persuading’, ‘surprising’.
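The narrative-level rule (take the highest-ranked verb) and the participant-level rule (average across ToM trials, only if at least three of four narratives can be scored) can be sketched as below. The verb lookup is hypothetical and contains only the example verbs from the rating scale; in the study, verbs were ranked by a human coder, and treating "elaborated to some degree" as "scorable" is an assumption.

```python
# Hypothetical lookup built only from the example verbs in the 6-point ladder.
INTENTIONALITY_LADDER = {
    "moving around": 1, "floating": 1,
    "walking": 2, "swimming": 2,
    "fighting": 3, "following": 3,
    "copying": 4, "chasing": 4,
    "mocking": 5, "arguing": 5,
    "persuading": 6, "surprising": 6,
}


def narrative_intentionality(verbs):
    """Highest ladder level among the coded verbs of one narrative (None if none codable)."""
    levels = [INTENTIONALITY_LADDER[v] for v in verbs if v in INTENTIONALITY_LADDER]
    return max(levels) if levels else None


def mean_tom_score(per_trial_scores, min_trials=3):
    """Average across ToM trials, reported only if >= min_trials narratives were scorable."""
    scored = [s for s in per_trial_scores if s is not None]
    if len(scored) < min_trials:
        return None
    return sum(scored) / len(scored)
```

The same averaging helper applies equally to the 1–3 Appropriateness ratings.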
Narratives were coded by A.S., who was not blind to participant group. To guard against potential bias, an independent rater (A.M.), blind to participant diagnoses and study hypotheses, scored a randomly selected subset of transcripts (n = 10) comprising Control, bvFTD, and AD narratives. Inter-rater reliability was established using the intraclass correlation coefficient, and agreement was excellent, as indexed by Cronbach’s alpha, for both the Intentionality (Random, α = 0.802; Goal-Directed, α = 0.848; ToM, α = 0.849) and Appropriateness (Random, α = 0.811; Goal-Directed, α = 0.897; ToM, α = 0.945) subscales.
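For two raters scoring the same transcripts, Cronbach’s alpha treats each rater as an “item”. It is numerically equivalent to the average-measures, consistency-type intraclass correlation. The sketch below uses only the standard library. It computes alpha = k/(k − 1) × (1 − Σ rater variances / variance of summed scores) and is an illustration, not the SPSS routine used in the study.

```python
from statistics import variance


def cronbach_alpha(*raters):
    """Cronbach's alpha across raters.

    Each argument is one rater's scores for the same ordered set of
    transcripts; sample variances (n - 1 denominator) are used throughout.
    """
    k = len(raters)
    if k < 2:
        raise ValueError("need at least two raters")
    totals = [sum(scores) for scores in zip(*raters)]  # per-transcript sum
    rater_var = sum(variance(r) for r in raters)
    return k / (k - 1) * (1 - rater_var / variance(totals))


# Perfect agreement gives alpha = 1; partial disagreement lowers it.
print(cronbach_alpha([1, 2, 3, 4], [1, 2, 3, 4]))
print(cronbach_alpha([1, 2, 3, 4], [2, 1, 4, 3]))
```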
Statistical analyses
Behavioral data were analyzed using IBM SPSS Statistics (Version 23). Univariate analyses of variance (ANOVA) investigated main effects of group (AD, bvFTD, Controls) across demographic variables and background neuropsychological tests, with the exception of sex, for which a Chi-square test (χ2) was used. Given the low range of possible scores on the experimental task, non-parametric tests were employed. Kruskal-Wallis tests (H) were used to investigate main effects of group for the multiple-choice categorization scores, as well as the Appropriateness and Intentionality content scores for ToM narratives. Simple effects were then explored using Mann-Whitney tests (U), with Šidák correction for multiple comparisons. Within-subject differences across conditions were explored using Friedman tests followed by Wilcoxon post-hoc tests, with Šidák correction for multiple comparisons. Finally, one-tailed Spearman rank correlations were conducted to explore within-group relationships between task performance and cognitive domains of interest. Statistical significance was set at p < 0.05, with the exception of the correlation analyses where a more stringent p < 0.01 was employed to guard against the potential for false positive findings. Effect sizes are reported using eta-squared (η2) for parametric, and r coefficients for non-parametric, analyses.
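The analyses themselves were run in SPSS. The pure-Python sketch below only illustrates the textbook formulas behind the non-parametric statistics named above. It covers the Kruskal-Wallis H with tie correction, the Mann-Whitney U with its tie-corrected normal-approximation Z (no continuity correction), the effect size r = Z/√N, and the Šidák adjustment p_adj = 1 − (1 − p)^m. SPSS may differ in details; for example, it reports the smaller of the two U statistics and can use exact p-values.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_sum(values):
    """Sum of (t^3 - t) over each set of t tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H across groups, with the standard tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_sum(pooled) / (n ** 3 - n))


def mann_whitney(x, y):
    """U for sample x and its tie-corrected normal-approximation Z."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum(pooled) / (n * (n - 1)))
    return u, (u - n1 * n2 / 2) / math.sqrt(var_u)


def effect_size_r(z, n_total):
    """r = Z / sqrt(N), where N is the number of observations in the comparison."""
    return z / math.sqrt(n_total)


def sidak_adjust(p, m):
    """Šidák-adjusted p-value for m comparisons."""
    return 1 - (1 - p) ** m
```

As a check against the Results, effect_size_r(-3.409, 18 + 25) gives approximately -0.52, the value reported for the AD versus Control comparison on Random trials.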
MRI acquisition
Participants underwent whole-brain imaging using a 3T Philips MRI scanner with a standard quadrature head coil (eight channels). Structural T1-weighted images were acquired with the following parameters: coronal orientation, matrix 256 × 256, 200 slices, 1 × 1 mm in-plane resolution, slice thickness 1 mm, echo time/repetition time = 2.6/5.8 ms, flip angle α = 8°. Scans were examined by a neuroradiologist for structural abnormalities; none were reported for Controls. Prior to analyses, all scans were visually inspected for significant head motion artefacts. Scans were available for 16 AD, 13 bvFTD, and 24 Control participants.
Voxel-Based Morphometry (VBM)
Structural MRI data were analyzed using the FSL-VBM toolbox [60, 61] from the FMRIB software package [62] (http://www.fmrib.ox.ac.uk/fsl/fslvbm). Briefly, images were brain-extracted using the FSL brain extraction tool [63], after which tissue segmentation was conducted using FMRIB’s Automatic Segmentation Tool [64]. Grey matter partial volumes were aligned to Montreal Neurological Institute standard space (MNI152) via the FMRIB non-linear registration technique [65, 66] using a b-spline representation of the registration warp field [67]. A study-specific template was created in which AD, bvFTD, and Control participants were equally represented, following which the native grey matter images were re-registered non-linearly to this template. The registered partial volume maps were then modulated by dividing by the Jacobian of the warp field to correct for local expansion or contraction. Modulated segmented images were smoothed using an isotropic Gaussian kernel with a sigma of 3 mm.
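FSL specifies Gaussian smoothing kernels by their standard deviation (sigma), whereas many VBM studies report kernel full width at half maximum (FWHM). The two are related by FWHM = 2√(2 ln 2) σ ≈ 2.355σ. The small helper below is illustrative only and shows that the 3 mm sigma used here corresponds to roughly a 7 mm FWHM kernel.

```python
import math


def sigma_to_fwhm(sigma_mm):
    """Convert a Gaussian kernel's standard deviation to its FWHM (same units)."""
    return 2 * math.sqrt(2 * math.log(2)) * sigma_mm


print(round(sigma_to_fwhm(3.0), 2))  # about 7.06 mm
```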
Covariate analyses
Correlations between performance on the experimental task and regions of grey matter atrophy were explored in each patient group combined with Controls, by including the total multiple-choice categorization score as a covariate in the general linear model. The total multiple-choice score was chosen as the covariate of interest given its larger range of possible scores (0–12), ensuring sufficient variability in the data to capture brain-behavior relationships. For statistical power, a covariate-only statistical model with a positive [1] t-contrast was used, providing an index of association between grey matter intensity and performance on the experimental task. Education was included as a nuisance variable in these analyses. Clusters were extracted voxelwise and reported uncorrected at p < 0.001, using a conservative cluster extent threshold of 100 contiguous voxels. This approach minimizes Type I error whilst balancing the risk of Type II error [68] and is consistent with previously published methods [69, 70].
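To make the covariate-only model concrete, the sketch below fits it at a single voxel with ordinary least squares. The design is grey matter intensity ~ intercept + total classification score + years of education, and the result is the t-statistic for a positive contrast on the score. This is a simplified parametric illustration under those assumptions. FSL-VBM’s own statistics are computed differently (e.g., with permutation-based inference across all voxels), and no cluster-extent step is shown.

```python
import numpy as np


def covariate_t(gm, score, education):
    """t for a positive effect of `score` on grey matter at one voxel.

    Model: gm = b0 + b1*score + b2*education + error (education = nuisance).
    Returns (t, residual degrees of freedom).
    """
    gm = np.asarray(gm, dtype=float)
    X = np.column_stack([np.ones_like(gm), score, education])
    beta, _, rank, _ = np.linalg.lstsq(X, gm, rcond=None)
    resid = gm - X @ beta
    dof = len(gm) - rank
    sigma2 = resid @ resid / dof
    c = np.array([0.0, 1.0, 0.0])  # contrast: positive slope for score
    var_c = sigma2 * c @ np.linalg.pinv(X.T @ X) @ c
    return float(c @ beta / np.sqrt(var_c)), int(dof)
```

Repeating this at every voxel would yield a t-map that could then be thresholded (here, t > 3.3, p < 0.001 uncorrected) before applying the 100-voxel extent criterion.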
Anatomical locations of significant results were overlaid on the MNI standard brain, with maximum coordinates provided in MNI stereotaxic space. Anatomical labels were determined with reference to the Harvard-Oxford probabilistic cortical atlas.
RESULTS
Demographics and clinical characteristics
Table 1 displays background clinical and cognitive data. The groups did not differ in terms of age (F(2, 58) = 2.861, p = 0.065) or sex distribution (χ2 = 3.963, p = 0.057). Education, however, differed between the groups (F(2, 58) = 6.800, p = 0.002), with Controls spending significantly longer in formal education relative to AD (p = 0.007) and bvFTD (p = 0.011) patients (AD versus bvFTD; p = 0.998). Patient groups did not differ in terms of disease duration (months elapsed since onset of symptoms; p = 0.995). While greater functional impairment was evident in bvFTD compared to AD (FRS: F(1, 34) = 19.031, p < 0.001), AD patients displayed greater overall cognitive impairment relative to the bvFTD group (p = 0.041). These profiles are consistent with previous reports in the literature [9, 71].
Table 1. Demographic and neuropsychological performance of the study cohortᵃ,ᵇ
ᵃScores depict mean values, with standard deviations provided in parentheses. ᵇMaximum test scores are provided in parentheses, where applicable. ᶜPatients were rated by carers; controls provided self-ratings. bvFTD, behavioral variant frontotemporal dementia; AD, Alzheimer’s disease; FRS, Frontotemporal Dementia Rating Scale; ACE-III, Addenbrooke’s Cognitive Examination – 3rd Edition; CBI, Cambridge Behavioral Inventory; RCF, Rey Complex Figure test; RAVLT, Rey Auditory Verbal Learning Test; IRI, Interpersonal Reactivity Index. Unavailable data by test and group: CBI, 5 controls; ACE-III and ACE-III Fluency, 1 bvFTD; Digit Span, 2 AD and 1 control; Hayling Scaled Score C, 9 AD, 2 bvFTD, and 2 controls; RAVLT short delay and long delay, 2 AD, 5 bvFTD, and 1 control; RAVLT recognition, 3 AD, 5 bvFTD, and 1 control; RCF copy, 3 AD and 2 bvFTD; RCF three-minute recall and % retained, 4 AD and 2 bvFTD; Trail Making Test part A, 2 AD and 2 controls; Trail Making Test part B and B-A, 6 AD, 6 bvFTD, and 2 controls; IRI perspective taking and empathic concern, 7 AD, 5 bvFTD, and 7 controls. *p < 0.05; **p < 0.01; ***p < 0.001; n.s., not significant; ‘–’, not applicable.
General cognitive functioning
Patients displayed cognitive profiles in keeping with their clinical diagnoses (Table 1). Briefly, AD patients exhibited hallmark episodic memory deficits (RCF; RAVLT; CBI Memory %), with further impairments evident in visuoconstruction (RCF Copy), processing speed (Trail Making Test A) and set-shifting (Trail Making Test B-A) relative to Controls (all p values < 0.05). BvFTD patients displayed characteristic impairments in attention (Digit Span Forwards), working memory (Digit Span Backwards), and delayed episodic recall (RAVLT) compared to Controls (all p values < 0.001).
Direct comparison of the patient groups revealed disproportionate impairments in verbal and non-verbal memory in AD versus bvFTD (RCF; RAVLT; both p values < 0.01). In contrast, carers of bvFTD patients reported higher levels of abnormal behaviors on the CBI (e.g., tactless, impulsive, embarrassing, or uncooperative behavior) relative to the AD group (p < 0.001).
Socioemotional functioning
Carer ratings on the IRI revealed lower capacity for empathy in both patient groups relative to controls. Perspective Taking (i.e., cognitive empathy) was significantly disrupted in AD and bvFTD (p values < 0.001), as was Empathic Concern (i.e., affective empathy; AD, p = 0.043; bvFTD, p < 0.001). No significant differences were observed between the patient groups for either subscale on the IRI (Perspective Taking: p = 0.484; Empathic Concern: p = 0.362).
Theory of mind performance
Overall classification of interactions
A Kruskal-Wallis test revealed a significant main effect of group for the correct classification of trials (H(2) = 17.769, p < 0.001). Follow-up Mann-Whitney U tests revealed that, irrespective of condition, both AD (U = 77.50, Z = –3.67, p < 0.001, r = –0.56) and bvFTD (U = 77.50, Z = –1.73, p = 0.001, r = –0.51) patients showed poorer categorization of the animated trials relative to Controls, with no significant differences between the patient groups (U = 138.0, Z = –0.77, p = 0.462, r = –0.13).
Classification performance by interaction type
Figure 1 displays the classification of animations by interaction type across participant groups. A significant main effect of group was observed in the Random condition (H(2) = 11.798, p = 0.003), driven exclusively by poor performance in the AD group relative to Controls (U = 98.5, p = 0.001, Z = –3.409, r = –0.52). In contrast, bvFTD patients performed in line with Controls, correctly classifying random movements (U = 165.500, p = 0.084, Z = –1.730, r = –0.26). No significant differences were evident between the patient groups (U = 111.500, p = 0.096, Z = –1.665, r = –0.28).

Figure 1. Mean number of correctly classified animated-shape trials by trial type and group. Error bars display standard error of the mean. Asterisks denote group differences relative to Controls: **p ≤ 0.01; ***p ≤ 0.001.
No main effect of group was observed on Goal-Directed trials (H(2) = 1.090, p = 0.580), indicating that patients could correctly classify the purposeful and concrete physical interactions of the animated triangles (e.g., playing tennis).
Finally, in the ToM condition, a main effect of group was observed (H(2) = 15.954, p < 0.001), with significant impairments emerging in both patient groups compared to Controls (AD, U = 118.0, p = 0.004, Z = –2.892, r = –0.44; bvFTD, U = 76.0, p < 0.001, Z = –3.935, r = –0.60). No significant differences were found between the patient groups (U = 157.00, p = 0.888, Z = –0.163, r = –0.03).
Within-group comparisons using Friedman tests in each group separately revealed a main effect of condition in Controls (χ2(2, n = 25) = 19.279, p < 0.001), with significantly poorer performance on Goal-Directed relative to Random (z = –3.363, p = 0.001, r = –0.67) and ToM (z = –3.020, p = 0.003, r = –0.60) trials. In contrast, performance was comparable across conditions for AD (χ2(2, n = 18) = 0.918, p = 0.632) and bvFTD (χ2(2, n = 18) = 3.966, p = 0.138) patients.
Identification of feelings on ToM trials
Given that the ‘feelings’ questions related exclusively to correctly classified ToM trials, a reduced pool of responses was available across participant groups (Controls 91.0%, AD 59.7%, bvFTD 61.1% of available questions). Figure 2 displays average correct responses for feelings identification on correctly classified ToM trials. A significant main effect of group was observed for accuracy on the identification of ToM Feelings (H(2) = 28.906, p < 0.001), reflecting the poorer performance of both AD (U = 35.500, Z = –4.297, p < 0.001, r = –0.68) and bvFTD (U = 34.0, Z = –4.614, p < 0.001, r = –0.71) patients relative to Controls. No significant differences were observed between the patient groups (U = 114.5, Z = –0.502, p = 0.628, r = –0.09).

Figure 2. Mean performance for the correct identification of feelings on Theory of Mind trials by group. Error bars display standard error of the mean. Asterisks denote group differences relative to Controls: ***p ≤ 0.001.
Correlations
One-tailed Spearman correlations explored associations between classification of interactions on ToM trials and cognitive domains of interest in each participant group separately (Table 2). A significant positive association was observed between verbal episodic memory and correct categorization of ToM trials in AD (r = 0.588). In contrast, carer ratings of empathic concern on the IRI correlated with ToM performance in bvFTD (r = 0.712). No other significant associations were evident at the more stringent threshold of p < 0.01.
Table 2. Correlations between classification of ToM trials and cognitive and interpersonal variables by groupᵃ
**p ≤ 0.01. ᵃUnavailable data by test and group: ACE-III and ACE-III Fluency, 1 bvFTD; Digit Span, 1 control and 2 AD; RAVLT Short Delay, 1 control, 2 AD, and 5 bvFTD; Trails A, 2 controls and 2 AD; Trails B-A, 2 controls, 6 AD, and 6 bvFTD; Hayling Scaled Score C, 9 AD, 2 bvFTD, and 2 controls; RCF copy, 3 AD and 2 bvFTD; CBI Total, 5 controls; IRI-EC and IRI-PT, 7 controls, 7 AD, and 5 bvFTD.
ToM narrative descriptions
Constraining our focus to ToM trials, participants’ accompanying narratives were analyzed in terms of Appropriateness (i.e., accurate description of the activities portrayed in the animation) and Intentionality (i.e., the use of verbs reflecting the appreciation of mental states) (see Fig. 3). Representative transcripts are included in Supplementary Material.

Figure 3. Breakdown of narrative content on Theory of Mind trials in terms of (A) Appropriateness and (B) Intentionality across participant groups. Error bars display the standard error of the mean. Asterisks denote group differences relative to Controls: **p ≤ 0.01; ***p ≤ 0.001.
Appropriateness
A significant main effect of group was observed for Appropriateness (H(2) = 28.350, p < 0.001), reflecting the fact that both AD (U = 57.000, Z = –4.222, p < 0.001, r = –0.64) and bvFTD (U = 40.0, Z = –4.634, p < 0.001, r = –0.71) patients gave less accurate descriptions of the activities portrayed by the shapes compared to Controls. AD and bvFTD groups did not differ in Appropriateness (U = 140.5, Z = –0.726, p = 0.468, r = –0.12).
Intentionality
A significant main effect of group was also observed for Intentionality (H(2) = 13.474, p = 0.001), reflecting compromised mental state attributions in both the AD (U = 121.0, Z = –2.595, p = 0.009, r = –0.40) and bvFTD (U = 94.000, Z = –3.268, p = 0.001, r = –0.50) groups relative to Controls. No significant differences were observed between the patient groups for Intentionality (U = 117.500, Z = –1.443, p = 0.149, r = –0.24).
Controlling for verbal generativity
During scoring, it was noted that patients generated less verbal content overall than Controls. To control for verbal generativity, the number of words in each ToM narrative was counted and averaged across ToM trials. Audio recordings were not available for 2 Control, 2 AD, and 2 bvFTD participants; in these cases, the missing word count was imputed using mean substitution.
An analysis of covariance (ANCOVA), with word count as a covariate, continued to reveal main effects of group for Appropriateness (F(2, 57) = 8.176, p = 0.001, η2 = 0.19) and Intentionality (F(2, 57) = 4.114, p = 0.021, η2 = 0.12). Post hoc simple effects analyses, however, revealed an altered pattern of findings. After controlling for word count, the AD group no longer differed significantly from Controls, either in how well they captured the intended underlying script (Appropriateness, p = 0.087) or in their appreciation of mental states (Intentionality, p = 0.528). In contrast, the bvFTD group continued to show significant deficits relative to Controls for Appropriateness (p < 0.001) and Intentionality (p = 0.020).
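The word-count adjustment can be sketched as a comparison of two nested least-squares models. The full model contains group dummies plus word count, and the reduced model contains word count only. The F-statistic for the group effect is ((RSS_reduced − RSS_full)/df1) / (RSS_full/df2). The text does not state whether missing word counts were replaced by the group mean or the whole-sample mean, so the hypothetical impute_mean helper simply averages whatever list it is given. It can be called per group or on the full sample. Effect sizes are not computed here.

```python
import numpy as np


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def ancova_group_f(y, groups, covariate):
    """F-test for the group effect after adjusting for one covariate (no interaction).

    Returns (F, df1, df2).
    """
    y = np.asarray(y, dtype=float)
    cov = np.asarray(covariate, dtype=float)
    levels = sorted(set(groups))
    ones = np.ones_like(y)
    # Dummy-code every group level except the first (reference) level.
    dummies = [np.array([1.0 if g == lvl else 0.0 for g in groups]) for lvl in levels[1:]]
    x_reduced = np.column_stack([ones, cov])
    x_full = np.column_stack([ones, cov] + dummies)

    def rss(x):
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ beta
        return float(resid @ resid)

    rss_full, rss_reduced = rss(x_full), rss(x_reduced)
    df1 = len(levels) - 1
    df2 = len(y) - x_full.shape[1]
    f = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f, df1, df2
```

With 61 participants in three groups and one covariate, this design gives df = (2, 57), matching the degrees of freedom reported above.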
Neural correlates of task performance
Figure 4 shows the significant regions to emerge from the covariate analyses investigating overall task performance (i.e., correct classification of interactions across conditions) in AD and bvFTD, controlling for education.

Figure 4. Voxel-based morphometry covariate analyses showing brain regions that correlate significantly with task performance in AD (green) and bvFTD (red). Colored voxels show regions that were significant in the analyses at p < 0.001 uncorrected. All clusters reported t > 3.3. Education is included as a covariate in all analyses. R, right. For full description of clusters and relevant coordinates, please refer to Table 3.
In AD, overall classification performance correlated with the integrity of the right hippocampus and of lobules I-IV and V of the cerebellum, bilaterally. In contrast, a distributed network of regions was implicated in the bvFTD group, including the bilateral medial prefrontal, orbitofrontal, and frontoinsular cortices, and the caudate. Left lateral anterior temporal regions also emerged as significant in the analyses, as did regions in the left lateral parietal cortex and the bilateral precuneus. Finally, subregions of the cerebellum, including bilateral lobules I-IV and V, right Crus I and Crus II, and left lobule VI, were implicated (Table 3).
Voxel-based morphometry results showing regions of significant grey matter intensity decrease associated with classification of interactions in AD and bvFTD, combined with Controls
MRI scans were not available for 2 AD, 5 bvFTD, and 1 Control participant. All clusters are reported using voxel-wise contrasts at p < 0.001 uncorrected, with a cluster extent threshold of 100 contiguous voxels. Years of education were included as a nuisance variable in all contrasts. All reported clusters have t > 3.3. L, left; R, right; B, bilateral; MNI, Montreal Neurological Institute.
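To make the thresholding described in the table note concrete, the sketch below applies a voxel-wise height threshold (t > 3.3) and then discards connected clusters smaller than 100 voxels. It is a minimal pure-Python illustration, not the VBM software used in the study: it assumes the statistic map is stored as a dictionary keyed by integer (x, y, z) voxel coordinates, and it treats voxels as contiguous only when they share a face (6-connectivity), whereas neuroimaging packages may use a different neighbourhood definition. The function name and the toy map are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]

# Face-adjacent neighbours (6-connectivity); an assumption of this sketch
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cluster_extent_threshold(
    tmap: Dict[Voxel, float], t_thresh: float = 3.3, min_extent: int = 100
) -> List[List[Voxel]]:
    """Return clusters of voxels with t > t_thresh that contain at least min_extent voxels."""
    supra = {v for v, t in tmap.items() if t > t_thresh}  # height threshold
    seen = set()
    clusters = []
    for start in supra:
        if start in seen:
            continue
        # Breadth-first flood fill collects every supra-threshold voxel connected to start
        queue = deque([start])
        seen.add(start)
        cluster = []
        while queue:
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for dx, dy, dz in OFFSETS:
                nb = (x + dx, y + dy, z + dz)
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_extent:  # extent threshold
            clusters.append(cluster)
    return clusters


# Toy 20 x 20 x 20 map: background t = 1.0, plus a 5 x 5 x 5 block (125 voxels)
# and a separate 3 x 3 x 3 block (27 voxels), both at t = 4.0
tmap = {v: 1.0 for v in product(range(20), repeat=3)}
for v in product(range(5), repeat=3):
    tmap[v] = 4.0
for v in product(range(10, 13), repeat=3):
    tmap[v] = 4.0

surviving = cluster_extent_threshold(tmap)
print(len(surviving), len(surviving[0]))
# 1 125
```

Both blocks pass the height threshold, but only the 125-voxel block survives the 100-voxel extent threshold; the 27-voxel block is discarded.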
DISCUSSION
This study demonstrates comparable deficits in bvFTD and AD in the classification of ToM interactions and of the characters' feelings, using the Frith-Happé animations. Whereas ToM disruption in AD appears to be primarily mediated by hippocampal degeneration, ToM impairments in bvFTD reflect the breakdown of a distributed set of regions implicated in specific social and affective cognitive processes. We discuss our findings in terms of the different drivers of ToM disruption across dementia syndromes, and how damage to discrete brain regions impacts the capacity for social inference.
The most striking finding to emerge from our study was the marked impairment in mental state attribution in bvFTD on a simplified task designed to minimize cognitive load. Critically, this impairment was not attributable to a general difficulty in comprehending ambiguous movements or interactions, as bvFTD patients scored in line with Controls for Random and Goal-Directed classification. Moreover, this mentalizing deficit spanned both cognitive and affective branches of ToM: even when bvFTD patients successfully determined that a ToM interaction had taken place, they could not accurately identify the predominant feelings of the main characters. Our findings thus reinforce a large body of evidence pointing to bvFTD as a disorder of social cognition, with marked deficits evident irrespective of ToM domain or the cognitive loading of the task [9, 72].
Correlation analyses did not show significant associations between overall task performance and neuropsychological tests of executive function, episodic memory, or semantic comprehension in bvFTD. This lack of association between ToM performance and executive function contrasts with the proposal of a domain-general executive component to mental state attributions [21, 73], and may reflect the limited scope of our executive battery (Digit Span, Trail Making Test, Hayling Test). The relationship between executive dysfunction and ToM disruption in bvFTD remains poorly understood [74] and depends on the nature of the ToM and executive tasks employed, as well as on the disease severity of the patient samples [22]. Instead, the only significant association to emerge was with carer ratings of empathic concern on the Interpersonal Reactivity Index, a measure of the capacity to share the feelings of others. While our findings reinforce the close correspondence between ToM dysfunction and the characteristic loss of empathy displayed in everyday social interactions in bvFTD [75], the cognitive mechanisms underlying these symptoms remain unclear.
Turning to the AD group, significant impairments were evident not only for ToM attributions but also for Random movement classifications. Interestingly, AD patients scored in line with Controls on Goal-Directed trials, in which the triangles moved purposefully in a concrete pattern. This profile of responses may therefore reflect an inability to build a figurative interpretation from ambiguous movements, consistent with the general deterioration in abstract reasoning commonly observed in AD [76]. Overall task performance correlated with immediate episodic recall on the RAVLT, suggesting that even the short delay between viewing the animations and making a judgment may have further impacted AD performance.
While we did not directly assess response times, we noted during testing that participants tended to give their classification response while viewing Goal-Directed trials, whereas for Random and ToM trials they tended to wait until the events of the animation had unfolded before responding. Similarly, judgments of affective ToM were as impaired in AD as in bvFTD, with some AD patients commenting that they simply could not remember what had transpired during the animation and so could not answer the affective questions. Concordant with recent findings [36], we suggest that the ToM deficit in AD is multifactorial, in this case reflecting difficulties in interpreting the ambiguous nature of the stimuli, coupled with hallmark impairments in episodic memory.
Analysis of participants’ narratives provided further insights into the nature of the ToM impairment in bvFTD and AD. Both patient groups showed comparable difficulties in conveying an accurate description of the animations (Appropriateness) and in using suitable verbs to reflect the underlying mental states (Intentionality). Importantly, this on-line capacity to interpret the animations as they unfold in real time is proposed to mirror the fast-paced nature of social interactions, lending ecological validity to the task [48]. Nevertheless, producing verbal narratives in this manner depends heavily on generative processes, which are known to be affected in both syndromes [77]. Controlling for the overall amount of content produced during narration ameliorated the Appropriateness and Intentionality deficits in the AD group, but not in the bvFTD group. This finding suggests that the relative paucity of verbal material generated by AD patients may, at least partially, underlie the diminished quality of their narrative descriptions. In contrast, deficits on both measures persisted in the bvFTD group despite controlling for verbal production.
VBM analyses allowed us to further explore the potential mechanisms driving ToM impairments in each patient group. In keeping with previous studies, we demonstrated significant associations between task performance in bvFTD and atrophy in a distributed set of brain regions, including bilateral medial and orbitofrontal, frontopolar, insular, lateral temporal, and occipitoparietal cortices. These regions have previously been implicated in cognitive perspective-taking deficits in bvFTD [9], suggesting a common neural mechanism mediating cognitive aspects of social inference. Moreover, activity in this network is consistently reported in functional neuroimaging studies of ToM [78] and increases disproportionately with mentalizing level [79]. Notably, many of the regions implicated in ToM dysfunction in bvFTD are critical nodes of the brain’s ‘Salience Network’, a distributed functional network posited to play a central role in processing socially salient internal and external stimuli [80]. Degeneration of the Salience Network has been proposed to underlie the florid socioemotional difficulties characteristic of bvFTD, limiting the capacity to rapidly process, integrate, and respond to socially relevant information [81, 82].
In addition, task performance in bvFTD was associated with atrophy in dorsal (caudate nucleus, putamen) and ventral (nucleus accumbens) striatal regions, which have previously been implicated in cognitive and affective aspects of ToM attribution, respectively [83]. Striatal contributions to ToM are seldom discussed, although a number of studies have documented striatal activity during mentalizing tasks [83, 84]. Given the dense connections of the striatum with cognitive, motor, and limbic circuits, striatal activity on ToM tasks may reflect the coordination of cortical and subcortical information [85] in the service of goal-directed behavior [86]. As fronto-striatal atrophy is disproportionately present in bvFTD compared with AD [87], it will be important for future studies to determine the precise role of the striatum in higher-order social cognitive processes.
Overall classification performance in AD related exclusively to grey matter intensity decreases in the right hippocampus and the cerebellum, bilaterally. The significant hippocampal contribution complements our behavioral findings, implicating episodic memory disruption as a key driver of ToM impairment in AD. While not typically associated with ToM capacity, the hippocampal declarative memory system is proposed to support a number of processing features that may be crucial for social cognition [88]. First, the hippocampus supports representational flexibility, enabling memories to be accessed across different processing systems in the service of diverse cognitive capacities [88]. Second, the hippocampus supports on-line processing of complex configurations, enabling information to be held “in mind” in the service of task performance [89, 90]. Patients with hippocampal damage display stark alterations in socioemotional functioning [88, 91], attributable to the breakdown of representational flexibility and on-line processing [88]. In the context of the current study, we suggest that disruption of these hippocampus-dependent processes in AD impedes the ability to recognize the shifting status of unpredictable trials (Random, ToM) and to communicate the unfolding of events in a coherent manner.
Finally, an interesting and somewhat unexpected finding was the significant cerebellar contribution to overall task performance. The subregions implicated irrespective of patient group were lobules I-IV and V, bilaterally. While not typically associated with mentalizing, the cerebellum has previously been implicated in fMRI studies of higher-order intentionality [79] and basic ToM processing [92, 93], and in the emergence of mentalizing deficits in neurodevelopmental disorders such as autism spectrum disorder [94]. Importantly, a recent meta-analysis of cerebellar activation in social cognitive tasks suggests that the cerebellar contribution may be crucial for mentalizing when the level of abstraction is high [92]. Such abstract judgments are essential for successful social interactions and require us to move away from the concrete “here and now” to consider abstract personality traits, hypothetical scenarios, or social group characteristics [92]. Further, a recent study suggests that cerebellar involvement in higher-order intentionality reasoning may reflect the cerebellum's coordination of multiple cognitive processes, particularly when it is necessary to keep track of, and differentiate between, several mental states simultaneously [79]. On this view, keeping track of the states of mind of the two triangles under highly abstract conditions is likely to disproportionately tax the cerebellum. Notably, our finding of common anterior lobe involvement in higher-level cognitive functions resonates with previous studies [95, 96] and challenges the prevalent dichotomy between an anterior sensorimotor and a posterior cognitive/emotional cerebellum [97].
A number of methodological issues warrant consideration. First, our sample sizes are relatively modest, reducing our power to detect significant brain-behavior relationships using conservative correction methods. Accordingly, it will be important for future studies to replicate these findings in larger patient samples. The Frith-Happé animations offer a novel way to assess on-line interactions between agents, stripped of various semantic, executive, and attentional demands. This inherent simplicity, however, comes at a cost to ecological validity: by paring the task back to the movements of the triangles, the animations lose the contextual information and naturalistic social cues on which we invariably rely to make social judgments and inferences in daily life. In addition, although not originally intended for this purpose, the task failed to distinguish between the AD and bvFTD groups on any of the ToM subscales, limiting its clinical utility in the differential diagnosis of dementia syndromes. While we found no evidence of an association between the ToM task and visuospatial functioning, it remains possible that the demands placed on shape and movement perception impede task performance in advanced stages of AD; this is an important consideration for future studies. Moreover, the simplicity of the triangle stimuli may, paradoxically, prove too abstract for dementia patients to conceptualize as social agents, in the face of increasingly concrete styles of thinking. Finally, a clear limitation of this task lies in the fact that, by its nature, participants are reduced to spectators rather than active participants in the social scenarios.
This spectator role is divorced from the complex way in which we fluidly interact in social scenarios and further limits the ecological validity of the task. The challenge for future studies will therefore be to develop ecologically valid tasks that foster the active participation of individuals within the test scenario while keeping cognitive demands to a minimum, in order to dissociate cognitive from affective contributions to social dysfunction. Further, as neurodegenerative disorders are characterized by widespread network disturbances [17, 98], it will be crucial for future work to elucidate how alterations in structural and functional connectivity differentially impact ToM capacity across dementia syndromes.
Conclusions
In summary, we have demonstrated the pervasive nature of ToM deficits in bvFTD, which manifest across cognitive and affective domains even when a relatively simple task is employed. These deficits appear attributable to the vulnerability of a distributed brain network consistently implicated in social cognitive function. In contrast, while AD patients display ToM impairment of a magnitude comparable to that observed in bvFTD, these deficits appear largely cognitively driven. A novel finding to emerge from this study was the common involvement of the cerebellum in task performance irrespective of patient group. Given mounting evidence of selective vulnerability of the cerebellum across a broad range of psychiatric [99] and neurodegenerative [95] disorders, delineating the specific contribution of the cerebellum to social cognitive function represents a major challenge for dementia research.
Footnotes
ACKNOWLEDGMENTS
The authors are grateful to the patients and their families for their continued support of our research. The authors wish to acknowledge Jody Kamminga and Nadene Dermody for their assistance with participant recruitment and testing. This work was supported in part by funding to Forefront, a collaborative research group dedicated to the study of frontotemporal dementia and motor neuron disease, from the National Health and Medical Research Council (NHMRC) of Australia program grant (APP1037746) and the Australian Research Council (ARC) Centre of Excellence in Cognition and its Disorders Memory Program (CE110001021). MI is supported by an ARC Future Fellowship (FT160100096). FK is supported by an NHMRC-ARC Dementia Research Development Fellowship (APP1097026). YC is supported by the State Scholarship Fund of China (No. 201608200010). OP is supported by an NHMRC Senior Research Fellowship (APP1103258).
