Abstract
Evidence has shown that older adults have lower accuracy in Theory of Mind (ToM) tasks compared with young adults, but it remains unclear whether the difficulty older adults have in decoding mental states stems from not looking at critical areas, particularly in the ageing Asian population. Most ToM studies use static images or short vignettes to measure ToM, but these stimuli are dissimilar to everyday social interactions. We investigated this question using a dynamic task that measured both accuracy and error types, and examined the links between accuracy, error types, and eye gaze fixation at critical areas (e.g., eyes, mouth, body). A total of 82 participants (38 older, 44 young adults) completed the Movie for the Assessment of Social Cognition (MASC) task while their eye gaze was recorded with an eye tracker. Results showed that older adults had a lower overall accuracy with more errors in the ipo-ToM (under-mentalising) and no-ToM (lack of mentalisation) conditions compared with young adults. We analysed the eye gaze data using principal components analysis and found that increasing age and looking less at the face were related to lower MASC accuracy in our participants. Our findings suggest that ageing deficits in ToM are linked to a visual attention deficit specific to the perception of socially relevant nonverbal cues.
Theory of Mind (ToM) involves the attribution of independent mental states to self and others to predict and explain behaviour (Baron-Cohen et al., 1985). Having good ToM skills is thought to enable individuals to understand social contexts and to adopt socially adequate behaviours. It is known that older adults who are socially engaged tend to adjust better to the process of ageing (von Humboldt & Leal, 2014), which implies that having good social interactions is also a result of having good ToM skills (Lecce et al., 2017). Most researchers agree that older adults perform worse than young adults in ToM tasks despite the considerable heterogeneity in the materials used (Fernandes et al., 2021; Henry et al., 2013). In their meta-analysis, Henry et al. (2013) raised an important theoretical point as to whether the difficulty on ToM tasks in older adults is due to a domain-specific decline in mental state understanding or to general cognitive changes that impact their perception and understanding of cues in social interaction.
Another interesting question is whether culture has an effect on ToM performance. Quesque and colleagues (2022) examined 587 participants from 12 countries across 18 sites and found that approximately 20% of the variation in ToM performance was attributable to cultural and location differences after controlling for age, gender, and education, and that those who identified English as their first language tended to perform better in a ToM task compared with non-English speakers. In terms of non-White populations, one study showed that older Malaysians were worse in ToM accuracy (as measured using the faux pas test) compared with older British adults, and that older adults in general were worse than young adults (Yong et al., 2022). Other studies conducted in China reported age-related declines in false belief reasoning (Li et al., 2013) and lower accuracy in recognising social transgressions (Wang & Su, 2006; Zhou et al., 2019), and found that these results were mediated by education levels, working memory, and motivation (although see Kong et al. (2022), who found that ToM difficulties were independent of working memory, processing speed, and fluid intelligence). Taken together, we know little about cultural differences in ToM, particularly in non-White populations.
Studies examining ToM tend to use a wide range of measures such as still images or dynamic videos, silent or busy speech interactions (although Henry et al. (2013) argued that older adults still perform poorer despite differences in methodology). Nevertheless, many standard ToM tasks do not measure skills typically needed in everyday social interactions as these tasks use two-dimensional still images (Adams et al., 2010; Baron-Cohen et al., 2001; Dalton et al., 2005) and written vignettes (Happé, 1994; White et al., 2009), which are quite different from real-life social interactions that often involve the perception of subtle dynamic social cues. Performance in verbal tasks is more dependent on executive function and working memory (Moran, 2013), whereas real-life interactions are likely to be more reliant on the understanding of rapid nonverbal cues interleaved with verbal cues in a complex manner (Carmichael & Mizrahi, 2023). Indeed, interaction with others is necessary when judging others’ goals/desires, intentions, and personality characteristics as demonstrated in one study (De Lillo & Ferguson, 2022). Unlike most studies, they found that older adults had higher accuracy in judging goals/desires and intentions, suggesting that a lifetime experience of making such inferences has facilitated wisdom (De Lillo & Ferguson, 2022). Taken together, more evidence is required to examine this methodological and ecological point with a dynamic task that is more comparable to real-life stimuli than still photos or written stories.
In a typical social interaction, there are multiple cues in both verbal and nonverbal form. For example, the speaker conveys important cues through facial expressions and bodily movements, which may contain emotional content. The facial expression of the speaker helps the listener to understand and infer the speaker’s mental states. Real-time perspective-taking studies have shown that, compared with young adults, older adults have difficulties in perceiving social cues (Pavic et al., 2021; Saryazdi & Chambers, 2021). One reason could be that decoding visual social information in older adults requires a high cognitive load (Pavic et al., 2021) and older adults are possibly unable to inhibit looking at less relevant areas (Störmer et al., 2013). For instance, a recent study showed that older adults attended less to the face during face-to-face conversation, and also less to people generally when navigating in the real world (De Lillo et al., 2021), providing further evidence that older adults are more susceptible to distraction than young adults. When still photos are employed and the face is split into eyes- versus mouth-looking, studies indicate that young adults spend more time looking at eyes than older adults (Firestone et al., 2007; Murphy & Isaacowitz, 2010; Sullivan et al., 2007; Wong et al., 2005; although see Ebner et al., 2011). Chaby et al. (2017) suggested that older adults adopted a focused-gaze strategy on the lower half of the face, thus limiting their ability to identify some emotions shown in the upper half, while young adults adopted an exploratory-gaze strategy which has a wider scanning area. Consistent with this idea, when just the eyes or just the mouth are presented to participants, older adults have more consistent difficulty identifying emotions from the eyes than the mouth compared with young adults (Kong et al., 2022).
Such findings suggest that older adults might look at mouths more because they find the emotion information in mouths more informative than that in the eyes.
Another explanation is that older adults have a preference to gaze away from negative stimuli (Knight et al., 2007) consistent with the socioemotional selectivity theory proposed by Charles and Carstensen (2010). Other than faces, body movements can also convey important cues although, here again, older adults tend to be worse than young adults at identifying emotion (Ruffman et al., 2008). Furthermore, evidence has shown that congruent contextual information identified from actor bodies is more prominent to older adults compared with faces (Noh & Isaacowitz, 2013; Vicaria et al., 2015), suggesting that older adults obtain additional useful information from bodily movements to assist them in recognition and making overall emotion judgements.
Unlike studies that used still images/faces only, the studies that have examined bodily cues have employed different modalities as well. For example, Noh and Isaacowitz (2013) included images that were accompanied with three possible contexts; a congruent context (e.g., raised fist with an angry face), incongruent context (holding dirty underwear with angry face), and “neutral” with no context, and Vicaria et al. (2015) included video clips depicting a young adult male–female dyad having a debate on controversial topics with the clips containing cues such as gestures, synchrony, proximity, and other forms of overt behaviour that either encourages or discourages rapport building. Another study had their participants viewing facial expressions (anger, disgust, fear, neutral) that were contextualised by other faces, objects and scenes, and were told explicitly to ignore the context (Study 1) or were informed of their (ir)relevance (Study 2) (Ngo & Isaacowitz, 2015). Results showed that older adults were significantly more influenced by context when the target was paired with an incongruent context. However, when participants were informed about the irrelevant context, they were able to reduce the contextual influence of body posture. In summary, while all three studies demonstrated a lower accuracy in older adults compared with young adults, their eye gaze results showed that older adults had more fixations in the context region (such as actors’ bodies, hand gestures, objects) compared with the eye region (Noh & Isaacowitz, 2013), and that older adults were able to use the social gestures and cues to correctly infer the actors’ rapport judgement (Ngo & Isaacowitz, 2015; Vicaria et al., 2015). 
This suggests that there is some combination of bottom–up and top–down attentional control when making judgements and that older adults were more likely to use contextual information (scenes, bodily cues) than faces to help them form a judgement as an indication of understanding. Seeing that context may play an important role in older adults’ attention, information from the eye tracker may help us to identify whether accuracy and errors are related to a specific region in the face or non-face context. Alternatively, information in bodies or wider contextual information might help to offset any difficulties that older adults have in recognising emotion from faces alone.
Following this, one question remains as to whether eye gaze fixations could help us understand older adults’ ToM detection and understanding. Here, we report some studies that use real-time social interactions as stimuli. One study found that older and younger adults had different viewing approaches when making rapport judgements, in that older adults looked less at faces and used bodily cues to make accurate rapport judgements (Vicaria et al., 2015). Prior evidence suggests that having additional contextual information would be beneficial for older adults in social interactions with others, but this was not the case in one study (Grainger et al., 2019). Their results showed no significant relations between viewing faces or bodies and ToM accuracy in either the young or the older group in The Awareness of Social Inference Test (TASIT) task. Another study reported similar findings: there was no age difference in gaze patterns on a film-based task across three age groups (young, middle-aged, older adults), as all groups looked equally at target and background (Grainger & Henry, 2023). These findings suggest that older adults relied equally on both faces and background/contextual information to make decisions, yet they have shown some viewing preferences in other studies. In sum, we do not know whether older adults’ eye gaze would fixate more on specific regions of the face (eyes, mouth), on bodies, or on the background as contextual information when making social cognition judgements.
Other than fixation areas, there has been some evidence suggesting differential patterns of eye gaze between Western and Asian samples when gazing at an emotional expression. The studies mentioned earlier provided minimal or no information on the ethnicity of the participants, making it difficult to draw any conclusions as to whether ethnicity or cultural specificity had an effect on eye gaze. However, Westerners tend to have a more distributed gaze pattern (taking in both higher and lower face regions and avoiding the very middle nose region), whereas Asians’ looking tends to be more centralised to the eyes and nose (Blais et al., 2008; Caldara et al., 2010; Jack et al., 2009). So far, three studies have examined eye gaze fixation in Asian samples, and all three were located in Eastern countries, namely Chinese samples (Bi & Han, 2015; Fung et al., 2008) and Japanese samples (Saito et al., 2020). Two studies reported that Asian older adults had a reduced looking preference for positive stimuli and a similarly fast reaction to negative stimuli (Fung et al., 2008; Saito et al., 2020), but the opposite was reported in a Chinese sample (Bi & Han, 2015), in which the older adults showed a positivity effect. Chinese and Japanese societies are known for having homogeneous cultural norms (Hofstede, 2011), which could influence how they express mental state cues. Unlike China and Japan, Malaysia is a multi-ethnic society with less homogeneous social rules, and Malaysians may be more expressive and, in turn, possibly better at decoding such expressions (Gelfand et al., 2011). Some authors have proposed that this differential gaze preference in Asians reflects the interdependent society in which older adults have an interest in maintaining harmony with others, so that the ability to recognise negative emotions is more important than recognising positive stimuli.
However, Bi and Han (2015) did not find an emotional bias in the recognition task despite older adults showing a positivity effect when viewing emotional stimuli, suggesting that attentional processes might be triggered for negative stimuli only. Further research is required to examine this with a more diverse (non-Eastern) Asian sample, to learn about the processes of attention to emotion and facial features and, subsequently, whether fixation at a specific region such as the eyes, mouth, or body leads to detecting and understanding ToM.
Thus, previous research leaves it unclear whether the difficulty in decoding mental states in older adults stems from not looking at critical areas, particularly in an ageing Asian population. We investigated this question using a dynamic task: the Movie for the Assessment of Social Cognition (MASC) (Dziobek et al., 2006). This allowed us to examine the links between accuracy and error types and eye gaze fixation at critical areas (e.g., eyes, mouth, and body). First, our aim was to replicate Lecce et al.’s (2019) study in a Malaysian sample to determine whether age effects were affecting specific types of ToM errors. In the MASC task, errors were divided into three categories: (1) “under-mentalising,” which refers to a literal or overly simplistic form of reasoning about mental states; (2) “over-mentalising,” which refers to exaggerated or overly complex mental state reasoning; and (3) “no-ToM,” which refers to a lack of evidence of ToM.
The present study
The question we examined concerned when older adults attempt to explain behaviour; do they make different error choices compared with young adults because of a complete failure to attribute mental states, or do they utilise mental state explanations but under- or over-mentalise? Lecce et al. (2019) examined the errors of Western young and older adults using the MASC and found that the pattern of errors was different between the two age groups, in that older adults made more under-mentalising choices while young adults made more over-mentalising choices. The authors suggested that over-mentalising errors were likely due to young adults making more physical causality inferences. However, our interest was in Malaysian participants. Although one study reported that Malaysian older adults were poorer in a faux pas task compared with young adults (Yong et al., 2022), they did not report the type of errors made by the participants or examine eye gaze at the stimuli. Furthermore, the faux pas task used in Yong et al. (2022) was in a graphical image format, which minimises the ecological validity of the task. Nevertheless, on the basis of previous findings we predicted that older adults would have an overall lower accuracy than young adults, and that they would make more under-mentalising errors similar to Lecce et al. (2019).
A second aim was to extend Lecce et al.’s (2019) study by exploring whether accuracy in the MASC might be related to differences in the way young and older adults viewed a specific area in the video (e.g., more fixations for the eyes in young adults and mouth in older adults as reported in past studies). We hypothesised that higher fixation durations directed at the face would result in higher MASC accuracy. However, based on past studies, we also predicted that, relative to young adults, older adults would have shorter eye fixations at the eyes region of the face and longer fixations at the mouth (Murphy & Isaacowitz, 2010; Sullivan et al., 2017; Wong et al., 2005). While the mouth region is very informative for older adults (Kong et al., 2022), we were unsure whether their difficulties interpreting eyes information might impair any face looking they engaged in.
We also examined patterns of fixations at elements of the visual scene not related to faces such as bodies or parts of the social scene that were considered “objects of interest.” This is because some studies indicate that there may be age effects for how this non-face information might be attended to (Ngo & Isaacowitz, 2015; Noh & Isaacowitz, 2013; Vicaria et al., 2015). This might be particularly true for Eastern individuals because Asian older adults may evaluate low-arousal or non-threatening areas as “good” (Kwon et al., 2009; Nisbett & Masuda, 2003), or negative facial expressions might lead to gaze aversion consistent with socioemotional selectivity theory.
Method
Participants
We recruited 82 participants for this study from university bulletin services, local senior citizen social community clubs, and referrals from other participants in Kuala Lumpur, Malaysia. There were 44 young adults (30 females, age range: 18–29 years, M = 22.83, SD = 3.28) and 38 older adults (15 females, age range: 60–85 years, M = 70.32, SD = 6.79). A power analysis was conducted using the findings of Lecce et al. (2019) regarding MASC accuracy between young and young-old adults to determine the appropriate sample size. To detect an effect size of d = 1.78 with a power of 1 − β = .80 and α = .05, we would need five participants per group. With two groups in the present study, this meant we would need 10 participants. All participants had self-reported normal or corrected-to-normal vision and hearing, and were living independently in the community.
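The arithmetic behind this sample-size estimate can be illustrated with a minimal sketch. It assumes a two-sided, two-sample comparison and uses the normal approximation n = 2(z_{1−α/2} + z_{1−β})² / d². The text does not say which software or test settings were used for the power analysis, so the function name and the choice of approximation are illustrative assumptions. An exact calculation based on the noncentral t distribution can give a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided, two-sample comparison.

    Uses the normal approximation (an assumption; not necessarily the
    procedure used in the study).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile corresponding to desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


n = n_per_group(1.78)   # effect size taken from the text
print(n, 2 * n)         # 5 per group, 10 in total
```

Under these assumptions the approximation reproduces the five participants per group reported above.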
Participants were screened for autism using the Autism Spectrum Quotient (AQ-10: Allison et al., 2012), as those with autism may have problems with ToM understanding (Ruzich et al., 2015). Older adults also completed the Mini-Mental State Examination (MMSE-2 Standard Version: Folstein et al., 1975) for possible cognitive impairment. Ethical approval for this study was obtained from the Institution Research Ethics Committee and we obtained written consent from all participants prior to testing. All participants received a token reimbursement for their travel costs.
Stimuli and procedure
MASC
We used the MASC (Dziobek et al., 2006) accompanied by an eye tracker (model TX300, Tobii Technology, Sweden; 23-in. screen, sampling rate set at 120 Hz, spatial resolution of 0.4° for binocular vision). We did not modify the videos or questions in any form, other than ensuring that the format matched the eye tracker technical requirements. The eye tracker was set at T-mode, meaning that the system was immobile and the distance and viewing angle were optimally fixed. All video stimuli were presented on a black background using Tobii Pro Studio software (version 3.4.7) and subtended a vertical visual angle of 10.4° at a viewing distance of 60 cm. All participants’ eye gaze was calibrated using Tobii Pro Studio before completing the task.
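To make the viewing geometry concrete, the following sketch converts the reported vertical visual angle and viewing distance into an on-screen stimulus height using the standard relation size = 2 · distance · tan(angle / 2). The function name is hypothetical, and the resulting height is derived here for illustration rather than reported in the study.

```python
from math import radians, tan


def stimulus_size_cm(angle_deg: float, distance_cm: float) -> float:
    """On-screen extent (cm) that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * tan(radians(angle_deg) / 2)


# Values from the text: 10.4 degrees vertical angle at 60 cm.
height = stimulus_size_cm(10.4, 60)
print(round(height, 1))  # 10.9
```

Under this relation, the videos would have been roughly 11 cm tall on the screen.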
The scenario enacted in the MASC involved four adults of European ethnicity (two males, two females) planning and subsequently spending an evening together. There were 43 videos totalling 15.76 min with an average of 22 s per video (range: 3–67 s, SD = 15.68 s). Each video was followed by one multiple-choice question (MCQ) with four possible answers (note that two questions were included for videos #6 and #22). The 45 questions were related to the actors’ feelings (e.g., “What is Betty feeling?”) or thoughts (e.g., “What is Cliff thinking?”) and mostly referred to complex mental states such as first- and second-order false belief, deception, faux pas, persuasion, metaphor, irony, and sarcasm. For each MCQ, the four choices were either (1) correct; (2) literal or overly simplistic mental state reasoning, “under-mentalising”; (3) exaggerated or overly complex mental state reasoning, “over-mentalising”; or (4) lacking evidence of ToM, “no-ToM.” For example, Question 1 was, “What is Sandra feeling?,” and the correct answer was, “She is flattered but somewhat taken by surprise.” The three other choices were under-mentalising: “She is pleased about his compliment”; over-mentalising: “She is exasperated about Michael coming on too strong”; and no-ToM: “Her hair does not look that nice.” Following Dziobek et al. (2006), we calculated the total number of correct responses as overall accuracy, as well as totals for each error type (under-mentalising, over-mentalising, no-ToM) and for the control questions. The control questions were used to determine participants’ attention to the task and comprehension of the overall plot.
We asked each participant to identify any scenes in which they (1) were unfamiliar with the words or phrases, such as “putting my foot in my mouth,” or (2) felt that there was a violation of personal cultural norms. For the former, Malay is the official language in Malaysia and, while most people are fluent in English, certain proverbs and metaphors are not likely to be well understood considering that English is their second language (Charteris-Black, 2002). For the latter, there were several scenes with alcohol consumption, which is prohibited among the Muslim community, and Malaysia is a Muslim-majority country. Despite our concerns, none of the participants reported any such difficulties or being uncomfortable during the task.
Eye tracking
We obtained eye gaze data by defining areas of interest (AOIs) such as the face, eyes, mouth, body, or objects in the videos using Tobii Pro Studio. The Tobii eye tracker collected the raw eye movement data points, which were processed into fixations. We then drew the AOIs using the oval-shaped and free-hand AOI tool available in the Tobii software (see the online supplementary Figure S1 as an example of multiple AOIs in a single frame in one video). We drew one AOI for every video frame, which allowed us to differentiate the eyes, mouth, body, and objects. The “face” included “eyes” + “mouth” + “nose.” As each video is followed by a question, we were able to tabulate total fixation duration for each question, and domain using Tobii Pro Studio. Our main interest was in eye fixation duration using the standard Tobii fixation filter (two or more consecutive samples falling within a 35-pixel radius), similar to the Murphy and Isaacowitz (2010) guidelines for trackable participants. The eye gaze duration was measured in seconds or fractions of a second.
Procedure
Participants were tested individually in a quiet room. Each participant was given some time to familiarise themselves with the wireless mouse and to determine volume loudness as they deemed fit. When ready, they were shown still images of each actor and then followed the set of instructions on the screen before proceeding to the main task. They were also informed that the task was not timed and to press the space bar after each MCQ to indicate that they were ready for the next video. The short video clips and the corresponding MCQs were then shown sequentially to the participants as designed by Dziobek et al. (2006).
Analysis plan
We first examined the data for normality and outliers, defining outliers as values more than ±3 SD from the mean for all variables. For MASC accuracy and errors, all data followed a normal distribution (Shapiro–Wilk tests, all ps > .05, and visual inspection of the histograms) except for two older adults who had very low MASC accuracy. Nevertheless, the pattern of results was the same with these two participants included in the analysis, so they were retained for all analyses. We used independent t-tests to compare age differences for MASC accuracy and error types.
To examine the links between eye gaze fixations across young and older adults and MASC accuracy, we used principal component analysis (PCA) because the eye tracking data provided many points in which the PCA method could then extract the principal components to explain most of the variance (Cozzolino et al., 2019). We inputted all fixations from each AOI (face, eyes, mouth, body, and objects) into a PCA. The resulting eigenvalues provided a quantitative measure of variability as PCA uses a data-driven approach to select the eye-fixation patterns that account for the most variance (Field, 2017). To aid interpretability, we used a Promax oblique rotation which assumes that components are correlated. Loadings were interpreted using cut-offs that are robust, which was as follows: loadings greater than or equal to 0.45 were relevant, loadings greater than or equal to 0.55 were good, and loadings greater than or equal to 0.63 were very good. We used Stata Version 17 to run the PCA for both age groups and AOIs.
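The loading cut-offs described above can be expressed as a small helper. This is a sketch only: the function name is hypothetical, the example loadings are made-up illustrative values rather than results from the study, and taking the absolute value (so that negative loadings are judged by magnitude) is an assumption not stated in the text.

```python
def classify_loading(loading: float) -> str:
    """Label a component loading using the cut-offs quoted in the text."""
    size = abs(loading)  # assumption: sign is ignored when judging strength
    if size >= 0.63:
        return "very good"
    if size >= 0.55:
        return "good"
    if size >= 0.45:
        return "relevant"
    return "below cut-off"


# Illustrative values only (not the study's loadings).
for value in (0.70, 0.58, -0.47, 0.20):
    print(value, classify_loading(value))
```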
Results
MASC behavioural task
Accuracy
Young adults had significantly higher accuracy (M = 32.16, SD = 3.90) compared with older adults (M = 25.71, SD = 4.99), t(80) = 6.56, Cohen’s d = 1.45.
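As a check on how these statistics relate, the sketch below recomputes Cohen’s d and the independent-samples t statistic from the reported group means, standard deviations, and group sizes (44 young, 38 older adults). It assumes a pooled-variance (Student’s) t-test, which is consistent with the reported 80 degrees of freedom. The function name is hypothetical.

```python
from math import sqrt


def pooled_d_and_t(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d (pooled SD) and Student's t from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    d = (m1 - m2) / sqrt(pooled_var)
    t = d / sqrt(1 / n1 + 1 / n2)
    return d, t, df


# Young adults vs. older adults, MASC accuracy (values from the text).
d, t, df = pooled_d_and_t(32.16, 3.90, 44, 25.71, 4.99, 38)
print(round(d, 2), round(t, 2), df)  # 1.45 6.56 80
```

The recomputed values match the reported t(80) = 6.56 and d = 1.45.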
MASC error types
We conducted a 3 (MASC Error Type: under-, over-, no-ToM) × 2 (Age: Young, Old) repeated-measures analysis of variance (ANOVA). There was a main effect of MASC Error Type, F(2, 160) = 63.18, p < .001, ηp² = .44, with under-mentalising (M = 7.27, SD = 3.67) being the most common selection compared with over-mentalising (M = 5.87, SD = 2.65) and no-ToM (M = 2.68, SD = 2.18). Likewise, there was a main effect of Age, F(1, 80) = 43.61, p < .001, ηp² = .35, with older adults making more errors than young adults.
There was also a significant two-way interaction between MASC Error Type and Age, F(2, 160) = 5.66, p = .004, ηp² = .066. Using Holm’s correction to keep the familywise error rate at α = .05, we found that older adults chose under-mentalising options more frequently (M = 9.21, SD = 3.79), t(80) = 5.09, p < .001, d = 1.13, as well as no-ToM options (M = 3.79, SD = 2.30), t(80) = 4.82, p < .001, d = 1.07, compared with young adults (under-mentalising: M = 5.59, SD = 2.62; no-ToM: M = 1.73, SD = 1.55). There was no significant difference between the two age groups for over-mentalising (older adults: M = 6.29, SD = 2.88; young adults: M = 5.50, SD = 2.41), t(80) = 1.35, p = .18, d = 0.30. See Table 1 for descriptive statistics and Figure 1 for accuracy and error types in the MASC.
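Holm’s step-down procedure can be sketched as follows. The implementation is a generic illustration, not the authors’ analysis code. Because two of the comparisons are reported only as p < .001, the value .001 is used here as a conservative upper bound for them, which is an assumption made purely for illustration.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects H0."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


labels = ["under-mentalising", "no-ToM", "over-mentalising"]
p_values = [0.001, 0.001, 0.18]  # 0.001 is an upper bound for "p < .001"
for label, rejected in zip(labels, holm_reject(p_values)):
    print(label, "significant" if rejected else "not significant")
```

With these inputs, the under-mentalising and no-ToM comparisons remain significant and the over-mentalising comparison does not, in line with the pattern reported above.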
Table 1. Participants’ demographic information and MASC scores, with means and standard deviations in brackets (n = 82).
AQ-10: Autism Spectrum Quotient; MMSE: Mini-Mental State Examination; MASC: Movie for the Assessment of Social Cognition; M: males; F: females.
* p < .05, ** p < .01, *** p < .001.

Figure 1. Mean overall accuracy and error types (under-mentalising, over-mentalising, no-ToM) for young and older adults.
Principal components
For the PCA, the dependent variable was MASC accuracy. If the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy exceeded the recommended value of 0.6 and Bartlett’s test of sphericity showed a significant p-value, this indicated an adequate sample size for factor analysis. Items with a factor loading coefficient of 0.30 or greater, and differences of at least 0.2 between the loadings on two factors, were included in the component. In the extraction, only factors with eigenvalues more than 1 were accepted (Tabachnick & Fidell, 2007). Items that did not meet these criteria were removed from the analysis. Cattell’s (1966) scree plot was used to plot each of the factor eigenvalues and check on the relative importance of each factor.
The KMO value was 0.603, indicating that the sample was adequate for this statistical analysis. Bartlett’s test of sphericity was significant (p < .001), showing that the correlations between items were sufficiently large for the PCA. The scree plot was used to determine the number of components retained. Results showed eight possible components, but only three components had eigenvalues above 1. We extracted three components, and the values for each component and respective eigenvector were then reviewed (see Table 2). The three components explained 91.75% of the total variance. Specifically, the first principal component (PC1) accounted for 37.24% of the total variance, and this corresponded to the effect for Age in that younger participants had higher MASC accuracy overall. The second highest variance (35.47%) was accounted for by PC2, which concerned looking at the mouth, object, and body AOIs. The third component (PC3) accounted for 19.04% of the total variance and related to looking at the upper half of the facial region, which involves the eyes.
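The relation between the reported variance shares and the eigenvalue-above-1 retention rule can be illustrated with a short calculation. It assumes the PCA was run on the correlation matrix of eight standardised variables (inferred from the eight possible components mentioned above), so that the eigenvalues sum to 8. Under that assumption each eigenvalue equals its variance share multiplied by 8. The implied eigenvalues are derived for illustration and are not reported in the study.

```python
# Percentage of total variance per retained component (values from the text).
variance_share = {"PC1": 37.24, "PC2": 35.47, "PC3": 19.04}

total = round(sum(variance_share.values()), 2)
print(total)  # 91.75

# Assumption: eight standardised variables, so eigenvalues sum to 8.
n_variables = 8
implied_eigenvalues = {
    pc: round(share / 100 * n_variables, 2)
    for pc, share in variance_share.items()
}
print(implied_eigenvalues)  # {'PC1': 2.98, 'PC2': 2.84, 'PC3': 1.52}

# Kaiser criterion: retain components with eigenvalue > 1.
retained = [pc for pc, ev in implied_eigenvalues.items() if ev > 1]
print(retained)  # ['PC1', 'PC2', 'PC3']
```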
Table 2. Eigenvectors for eye gaze from PCA using the oblique Promax method (n = 82).
PC: principal component.
Blanks indicate that the loading factor was below 0.3.
Links between fixation durations and ToM accuracy
We ran correlational analyses to determine whether fixation durations at specific AOIs were correlated with MASC accuracy. To control for multiple comparisons, Tables 3 and 4 also indicate the correlations that remain statistically significant at a much more conservative p-value of .0063 (Bonferroni correction). Given that older adults had worse MASC performance, we computed separate correlations for young and older adults to determine whether they had a different pattern of correlations. For older adults, there were significant negative relations between accuracy and under-mentalising, p < .001, and no-ToM responses, p = .005, but not over-mentalising, p = .056. There were no significant relations between any AOI and accuracy or error type, all ps > .024 (not significant after correcting for multiple correlations). Unlike older adults, young adults showed negative relations between accuracy and all three error types (under-mentalising, over-mentalising, and no-ToM), all ps < .005. However, there were no significant relations between any of the AOIs and accuracy or error choices, all ps > .024.
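The Bonferroni threshold quoted above is consistent with dividing α = .05 by eight comparisons (.05 / 8 = .00625, which rounds to .0063). The number of comparisons is inferred from that threshold rather than stated in the text, so it is treated as an assumption in this sketch. The p-values checked are the two exact values reported for older adults.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Assumption: eight comparisons, inferred from the reported threshold.
threshold = bonferroni_threshold(0.05, 8)
print(threshold)  # 0.00625

reported = {
    "accuracy vs no-ToM (older adults)": 0.005,
    "accuracy vs over-mentalising (older adults)": 0.056,
}
for label, p in reported.items():
    print(label, "significant" if p < threshold else "not significant")
```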
Table 3. Correlations between MASC accuracy, error types, and AOI type for older adults (N = 38).
Numbers in bold indicate that the p value is less than .00625.
Table 4. Correlations between MASC accuracy, error types, and AOI type for young adults (N = 44).
Numbers in bold indicate that the p value is less than .00625.
Discussion
To the best of our knowledge, this is the first study to show age effects on both accuracy and eye fixation measures of a dynamic ToM task in an Asian sample. On the behavioural response measures, older adults had lower accuracy compared with young adults, a finding consistent with previous research (Hayes et al., 2020; Henry et al., 2013). More specifically, we found that older adults made more errors overall and selected more under-mentalising and no-ToM choices, rather than over-mentalising choices, similar to Lecce et al. (2019). However, unlike Lecce et al. (2019), we did not find any age difference for the over-mentalising option. This suggests that the older adults in our sample erred because of a pattern of reduced ToM understanding.
Our PCA results for the eye fixation data may help to shed light on the reasons for the behavioural responses. PC1 corresponded to the effect of Age, and it is clear that increasing age had a negative effect on accuracy. PC2 is of interest in that more eye gaze fixations were directed to non-facial AOIs. This suggests that our participants may have adopted an exploratory-gaze strategy and scanned a wider area that included bodily movements/gestures and surrounding objects (Chaby et al., 2017). Past research also showed that older adults tend to view the lower half of the face (Chaby et al., 2017), consistent with their relatively intact emotion recognition from the lower half (Kong et al., 2022), and that older adults are able to obtain useful information from bodily movements (Ngo & Isaacowitz, 2015; Noh & Isaacowitz, 2013; Vicaria et al., 2015). These factors might explain their viewing preferences. We were unable to ascertain whether the positive stimuli (e.g., happiness on Michael’s face) gave an advantage to older adults, consistent with socioemotional selectivity theory (Charles & Carstensen, 2010), or whether the negative stimuli (e.g., disgust on Sandra’s face) gave a disadvantage to older adults, consistent with their difficulties recognising negative emotions (Ruffman et al., 2008). However, considering the relatively large variance shown in PC2 compared with PC3, we do not think that emotional expressions had a large role in determining their fixation. Furthermore, PC3, which reflects the eye region, suggests that our participants did not glean as much information from this area as initially thought. Our results are also concordant with other real-time (De Lillo et al., 2021) and contextually rich (Grainger et al., 2019) tasks in that older adults looked less at faces compared with young adults.
Taken together, the results indicate that our participants had a pattern of less sustained visual fixations at a variety of cues that are essential to an understanding of complex situations.
Some models of ageing have suggested that ageing deficits in ToM are linked to a visual attention deficit specific to the perception of socially relevant nonverbal cues (Owsley, 2011; Störmer et al., 2013). The idea is that older adults fail to recognise emotions because they fail to look at the correct regions. This idea is consistent with evidence that emotion recognition is poorer in older adults and that this difficulty may be linked to deficits in visual selective attention (Hayes et al., 2020; Phillips et al., 2002; Ruffman et al., 2008). This argument implies that directing older adults’ attention to relevant areas might improve their emotion recognition. Consistent with this idea, Sullivan et al. (2007) found that whereas young adults look at eyes about 67% of the time and mouths 33% of the time, older adults look comparatively more at mouths (52% at eyes and 48% at mouths). However, an alternative explanation is that older adults look less at relevant areas because they have difficulty interpreting information from these regions (Ruffman et al., 2008; Sullivan et al., 2007). According to this argument, looking at these regions would not help older adults’ social understanding; reduced looking is instead a symptom of their failure to understand. Consistent with this idea, Kong et al. (2022) compelled participants to look only at the eyes or only at the mouth by presenting just the eyes or just the mouth. In that study, older adults had difficulty recognising one of six expressions from mouths, but were worse on four of six expressions from eyes. Thus, that study shows that even when compelled to look at the eyes, older adults are worse at detecting emotional expressions.
We did not find a significant relation between eye gaze and accuracy in our young adult sample, unlike other studies (Chaby et al., 2017; Grainger et al., 2017; Murphy & Isaacowitz, 2010). While these findings differ from those obtained with Westerners, they do lend more support to the idea that face processing might be a product of distinct flexible looking strategies in different regions of the world (Blais et al., 2008; Kelly et al., 2011). The findings are also consistent with the idea that Asian individuals deploy different strategies in interpreting the social cues around them, as demonstrated in other face processing studies (Tan et al., 2012, 2016). A recent comprehensive review noted that the eye tracking data on the other-race effect (ORE) are mixed (Stelter et al., 2021), in that the eye region seems to be more beneficial for White faces, but the lower half of the face seems more useful in recognising Black and Asian faces. Yet even these findings are questionable given that Kong et al. (2022) found that the lower half of the face was substantially more informative than the upper half for both European faces and Asian faces. Thus, there are too many inconsistencies to reach firm conclusions regarding ethnicity and face region. Furthermore, greater looking at a particular area (e.g., the eyes) could signal either confusion and focused attention, or an understanding of the significance of information from the eyes. These two possibilities would counteract each other, making looking time uninformative.
Some might argue that the MASC task duration is too long and, therefore, a disadvantage to older adults with poor memory. Future studies could include a working memory task to determine whether accuracy on the task is mediated by this factor. Other than memory, some might argue that the task itself is not suitable for the Asian population because of its Western cultural inferences and social norms and the English language proficiency it requires. However, the accuracy demonstrated by young adults in our study is similar to that of other neurotypical adults in Western populations (Dziobek et al., 2006; Montag et al., 2010, 2011), suggesting that the social interactions shown in the MASC are clearly understandable. The accuracy of our older adults was also at similar levels to another study involving older adults (Lecce et al., 2019), indicating that the MASC is robust and clearly able to distinguish age-related differences regardless of cultural differences. Nevertheless, future studies could look into a more culturally appropriate task that may address this cultural perspective.
In conclusion, our findings showed that older adults had an overall lower accuracy compared with young adults in the MASC. In addition, our results showed that older adults committed more under-mentalising and ToM-unrelated errors, suggesting a pattern of reduced ToM understanding. Our PCA and correlational analyses found a pattern of looking at non-facial regions that corresponds to MASC accuracy. Our results are consistent with theoretical models suggesting that ageing deficits in ToM performance may be related to a decreased ability of older adults to infer nonverbal cues.
Supplemental Material
Supplemental material, sj-docx-1-qjp-10.1177_17470218241235811 for Effects of age on behavioural and eye gaze on Theory of Mind using movie for social cognition by Min Hooi Yong, Muhammad Waqas and Ted Ruffman in Quarterly Journal of Experimental Psychology
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the Ministry of Higher Education Malaysia (FRGS/1/2016/SS05/SYUC/03/2) awarded to M.H.Y.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Sunway University Research Ethics Committee (SUREC2016/050). Informed consent was obtained from all individual participants included in the study.
Data availability statement
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Supplementary material
The supplementary material is available at qjep.sagepub.com.
References
