Abstract
Telepractice (TP) refers to the use of telecommunication devices for remote psychological and medical assessment and treatment. To date, no study involving healthy adults has combined TP with Theory of Mind (ToM), that is, the ability to understand and attribute mental states and use this knowledge to explain actions and behavior. In this study, we evaluated the feasibility and effectiveness of the Theory of Mind Assessment Scale (Th.o.m.a.s.) administered via TP. Th.o.m.a.s. is a semi-structured interview that investigates various aspects of ToM (first- and third-person, first- and second-order ToM, egocentric vs. allocentric perspectives) in healthy adults. It consists of 37 open-ended question items on four scales: Scale A (I-Me) examines first-person ToM in the egocentric perspective; Scale B (Other-Self) examines third-person ToM in the allocentric perspective; Scale C (I-Other) examines third-person ToM in the egocentric perspective; and Scale D (Other-Me) examines first-person ToM in second-order ToM. The study sample was 80 healthy adults (36 men) divided into two groups, with one group assessed remotely and the other in-person (controls). There were no statistically significant differences in any of the measures between the two groups. Interrater agreement and internal consistency were consistently high. Th.o.m.a.s. proved to be a valid instrument for assessing ToM in TP. The present results have practical implications; a future area of focus could be to conduct remote assessment with Th.o.m.a.s. across different clinical or educational contexts.
Plain Language Summary
Telepractice, also known as telehealth, refers to healthcare delivered via electronic devices such as computers or smartphones. Online psychological and medical assessment tools have been evaluated for their feasibility; however, no studies to date have shown whether tools measuring the ability to understand other people’s minds, known as Theory of Mind (ToM), are equally effective when administered online. To fill this gap, we applied the Theory of Mind Assessment Scale (Th.o.m.a.s.; Bosco et al., 2016), which has been validated for in-person assessment, to determine whether it is also sensitive and effective when used remotely. Th.o.m.a.s. is a semi-structured interview consisting of questions that explore how people comprehend different facets of ToM, such as first- and third-person and first- and second-order ToM, as well as various mental states: beliefs, emotions, and desires. Some questions investigate the knowledge a person may have about their own mental states (first-person ToM), for example, emotions (“Do you happen to experience emotions that make you feel good?”), while others investigate the knowledge a person may have about another person’s mental state (third-person ToM; “Do you notice when other people feel good?”), and so on. The study sample was 80 healthy adults (36 men) divided into two groups: assessment was conducted remotely in one group and in-person in the other. Our hypothesis was that there would be no difference in results between the two groups. As expected, our analysis showed comparable results for administration of the tool in-person and remotely, indicating that Th.o.m.a.s. is also reliable when administered via telepractice.
Introduction
Telepractice (TP), also known as telehealth or telemedicine, refers to the use of telecommunication services (e.g., computer-based videoconferencing software and the Internet) for remote assessment and treatment (Joint Task Force for the Development of Telepsychology Guidelines for Psychologists, 2013). Telepractice-based assessment is a safe, cost-effective, and time-saving means to mitigate the physical distance between healthcare professionals and patients. Its benefits to residents of remote rural areas are that it can reduce travel time and costs and associated stress (Hu et al., 2025; Martin et al., 2020). Furthermore, telepractice has been proven a safe way to minimize the risk of disease transmission, particularly in the frail and the elderly (Wadsworth et al., 2017) or those with clinical illness (Weidner & Lowman, 2020). Finally, telepractice can provide for a larger, geographically more representative population sample than conventional methods, which typically include participants only from one area (Henrich et al., 2010).
Although telepractice is not new, it began to gain wider acceptance during the coronavirus disease 2019 (COVID-19) pandemic. To address the challenges of the pandemic, clinicians explored new assessment approaches, and researchers began to compare remote and conventional in-person assessment. There is now ample evidence for the validity, reliability, and utility of tele-assessment in neuropsychology and language (Marra et al., 2020; Raiford & Wright, 2020; Ruffini et al., 2022).
Teleneuropsychology (TNP), a specialized subset of telehealth services that relies on audiovisual technologies to conduct cognitive and neuropsychological assessment (Bilder et al., 2020) and interventions remotely (De Nocker & Toolan, 2023; Zheng et al., 2023), has amassed a large body of research on language skills (Ruffini et al., 2022). More recently, TNP has been employed for the assessment of communicative-pragmatic ability, that is, the ability to use language and other expressive means in a given context (Bischetti et al., 2024; Traetta et al., 2025). Language assessment tests adapted for use in telepractice include the Token Test (Vestal et al., 2006), the Boston Naming Test-15 (BNT-15, Wadsworth et al., 2017), and the Western Aphasia Battery–Revised (WAB-R; Rao et al., 2022). Tests for language assessment rely primarily on verbal interaction and visual materials that can be easily shared via computer screens, thus facilitating their adaptation for online use. Other validated telepractice neuropsychological tests, for example, the mini-mental state examination (MMSE; Folstein et al., 1975; Munro Cullum et al., 2014), the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS; Galusha-Glasscock et al., 2016; Randolph et al., 1998), and the Cambridge Neuropsychological Test Automated Battery (CANTAB; Green et al., 2019; Luciana & Nelson, 2002), assess general cognitive functions and specific skills in attention and working memory. For a systematic review of neuropsychological testing in telepractice in Italy, see Zanin et al. (2022). Other telepractice tools are self-report questionnaires for investigating personality, such as the Minnesota Multiphasic Personality Inventory (MMPI; Corey & Ben-Porath, 2020; Menton et al., 2022).
The remote evaluation of social cognition, and Theory of Mind (ToM) in particular, is understudied. The term ToM, originally proposed by Premack and Woodruff (1978), refers to the ability to comprehend and attribute mental states, emotions, intentions, desires, and beliefs to ourselves and to others and then use this knowledge to anticipate, interpret, and explain actions and behaviors. Much of the literature focuses on examining the development of ToM in childhood and how a deficit in ToM ability may explain psychopathological conditions such as autism and schizophrenia (Baron-Cohen, 1997; Baron-Cohen et al., 1985; Frith, 2004; Leslie, 1987). In contrast, very little research has focused on ToM in healthy adults (Olderbak et al., 2015; Semerari et al., 2012).
Although ToM was originally considered a unitary ability, empirical data have revealed its complex nature (Bosco et al., 2009b; Mazza et al., 2001; Nichols & Stich, 2003; Spaulding, 2020; Vogeley et al., 2001). A major distinction is made between first- and third-person ToM (Nichols & Stich, 2003). The former refers to the ability to be aware of one’s own mental states, while the latter refers to the ability to attribute mental states to others. These abilities appear to rely on both shared and distinct brain areas. In the first-person perspective, activation has been observed in the mesial cortex, the posterior cingulate cortex, the superior temporal cortex, the right temporoparietal junction, and the medial aspects of the superior parietal lobe. In contrast, the third-person perspective is associated with activation in the precuneus, the right superior parietal area, the anterior cingulate cortex, and the right prefrontal cortex. Common brain regions are the anterior cingulate cortex and the parietal regions, suggesting their involvement in more general processes of perspective taking (Vogeley et al., 2001, 2004).
Within the frame of first- and third-person ToM, a further distinction between the egocentric and the allocentric perspective was made by Frith and De Vignemont (2005). The egocentric perspective refers to the ability to attribute mental states to others in relation to oneself, whereas the allocentric perspective refers to the representation of mental states independent of oneself. Since this distinction is orthogonal to the first-/third-person distinction, additional ToM components can be analyzed. Indeed, the ego-/allocentric distinction has been found useful in clinical conditions such as bulimia nervosa (BN), for example, where third-person ToM from an allocentric perspective and third-person ToM from an egocentric perspective were found to differ (Laghi et al., 2014). In their study, Laghi et al. (2014) noted that, compared with the healthy controls, the individuals with BN had difficulty in third-person ToM in the allocentric perspective but not in third-person ToM in the egocentric perspective.
Another distinction is made between first- and second-order ToM. First-order ToM refers to the ability to understand another person’s mental states (Wimmer & Perner, 1983), whereas second-order ToM involves the ability to infer what someone thinks about a third person’s mental state (Perner & Wimmer, 1985). Developmental studies have reported that first-order ToM tasks are easier to solve than second-order ones (Mazza et al., 2001; Wellman & Liu, 2004).
In addition, ToM comprises a variety of mental states, including emotions (Harris, 1994), beliefs (Wimmer & Perner, 1983), and desires (Wellman et al., 1990), which are the main types of mental states that a person can understand (Tirassa, 1999; Tirassa & Bosco, 2008; Tirassa et al., 2006). For example, children are known to progress in their understanding of mentalistic content, beginning with the mental states related to emotions, then the volitional states (e.g., desires), and finally the epistemic states (e.g., beliefs; Wellman & Liu, 2004).
Difficulties in emotion recognition and ToM can arise in people with developmental, neurological, or psychiatric conditions that may severely impair quality of life. Furthermore, social cognitive measures can help identify individuals at risk of developing dementia in the early stages of the disease or serve as a marker for autism or schizophrenia (Cotter et al., 2018).
Conventional tasks for the empirical investigation of ToM are the False Belief tasks devised to test children’s developmental abilities (Baron-Cohen et al., 1985; Wimmer & Perner, 1983). The tasks entail understanding and acknowledging another person’s beliefs when they differ from one’s own, which is considered fundamental evidence for the presence of ToM. However, such tasks focus primarily on first- and second-order ToM, with second-order tasks proving more difficult, especially for clinical populations (Perner & Wimmer, 1985). Since most tasks are designed for use with children, they may be ill-suited for adults, given that a healthy adult would probably perform at ceiling on such tasks and thus produce potentially confusing results (Karmakar & Dogra, 2019). Tools for assessing advanced ToM include Strange Stories (Happé, 1994) and the Theory of Mind test (TOM test; Muris et al., 1999). Happé’s task entails the detection of double bluff, persuasion, white lies, and misunderstanding presented in stories. The TOM test consists of story comprehension about which a child (age range, 5–12 years) is asked a series of questions investigating various aspects of ToM, including advanced ones such as second-order beliefs. Finally, two other sophisticated paradigms are the Hinting Test (Corcoran et al., 1995) and the Faux Pas task (Baron-Cohen et al., 1999).
Another aspect of ToM ability that can be evaluated is emotion recognition, that is, the ability to recognize and discriminate basic emotions in others, typically through facial expression. The Reading the Mind in the Eyes test (RMET; Baron-Cohen et al., 2001a) and the Faces Test (Baron-Cohen et al., 1997) were developed specifically to assess an individual’s ability to recognize emotional states by reading the eyes or decoding facial expressions. While several task types assess ToM, few studies have examined their psychometric properties (Ahmadi et al., 2015), except for the TOM test (Muris et al., 1999), the RMET (Baron-Cohen et al., 2001a), ToM Storybooks (Blijd-Hoogewys et al., 2008), and the Hinting Test (Frøyhaug et al., 2019). In summary, studies examining the psychometric properties of assessment tools are fundamental for ensuring the reliability and validity of a test and the quantifiability of its results.
Very few studies to date have investigated ToM in telepractice. Schidelko et al. (2021) compared the performance of children (age range, 3–4 years) on the False Belief Task (Wimmer & Perner, 1983) in face-to-face and remote modalities. The study was conducted using BigBlueButton, a free open-source platform, with slides and video animations; no significant difference was found between online and in-person assessment. Engel et al. (2014) used the RMET and other individual personality scales (e.g., Five Factor Personality Inventory, McCrae & Costa, 1987) to determine whether ToM in the general population can predict collective intelligence, a measure of overall group effectiveness. They found that ToM skills are as crucial for group effectiveness in online text-based activities as they are in face-to-face interactions. This finding suggests that the RMET measures a deeper aspect of social reasoning beyond the mere recognition of facial expressions. Another study demonstrated the suitability of an online adaptation of the RMET as an effective tool to examine ToM in Mexican adults (age range, 18–87 years) and found observable differences in response by sex and age (Téllez-Alanís et al., 2022). Di Girolamo et al. (2019) compared the online version of the Cognitive and Affective Empathy Questionnaire (Reniers et al., 2011), a 31-item self-report measure of cognitive and affective empathy, with the face-to-face format in two adult samples (paper-and-pencil vs. online). The questionnaire showed good psychometric properties for both online and face-to-face administration.
Despite the limited research on remote assessment of ToM, the available evidence supports the comparable effectiveness of online and in-person assessment. However, the tools currently available for online administration assess only certain aspects of ToM. Indeed, ToM goes beyond the mere deduction of false beliefs or emotions to include attributing intentions, seeing oneself from a different perspective, understanding others’ mental states, and making connections between mental states and actions. Because a person may experience difficulty with one ToM aspect while performing well in another, a tool that can distinguish between various mentalistic abilities is essential for overall assessment (Karmakar & Dogra, 2019). To the best of our knowledge, there is no tool for a comprehensive investigation of ToM components in healthy adults via remote assessment. Furthermore, most tasks conceive ToM as a unitary skill, without taking into account its various aspects. Given the importance of understanding mental states in social interactions and daily life, it is crucial to examine ToM in all its facets.
The Theory of Mind Assessment Scale (Th.o.m.a.s.; Bosco et al., 2009a, 2016) was developed to take these aspects into account. Th.o.m.a.s. is a validated semi-structured interview for assessing ToM on different levels of articulation rather than as a singular phenomenon. The tool has shown good psychometric properties in construct validity and reliability and good interrater agreement and internal consistency (Bosco et al., 2016). Unlike other classical ToM tasks, such as the False Belief Task (Baron-Cohen et al., 1985), the Strange Stories (Happé, 1994), or the Reading the Mind in the Eyes test (Baron-Cohen et al., 2001a), Th.o.m.a.s. was created to capture the complex nature of ToM (Bosco et al., 2009b) and to investigate several ToM facets at the same time. Indeed, Th.o.m.a.s. distinguishes between first- and third-person perspectives and between first- and second-order ToM, explores egocentric and allocentric perspectives, and examines mental states (beliefs, desires, positive and negative emotions). Moreover, each scale is divided into three subscales: awareness, relation, and realization. These relate to how the interviewee perceives different types of mental states, how the interviewee understands the causal relationships among mental states and between mental states and the agent’s visible behaviors, and how the interviewee perceives and imagines the possibility of influencing one’s own mental states and those of others.
The advantage of having a single instrument that can capture multiple facets of ToM is that it can directly compare how these functions differ within the same individual or clinical group/control group. Moreover, Th.o.m.a.s. has been successfully used to assess ToM in healthy individuals from preadolescence and adolescence to adulthood (Bosco et al., 2014b, 2016) and in various clinical conditions, including schizophrenia (Bosco et al., 2009a), borderline personality disorder (Colle et al., 2019), alcohol use disorder (Bosco et al., 2014a), sex offenders (Castellino et al., 2011), congenital heart disease (Chiavarino et al., 2015), bulimia nervosa (Laghi et al., 2014), opiate dependence (Gandolphe et al., 2018), nonsuicidal self-injury in adolescents (Laghi et al., 2016), treatment for non-psychotic disorders (Francesconi et al., 2016), medication-overuse headache and episodic migraine (Romozzi et al., 2022), and autism spectrum disorders in adolescents (Fadda et al., 2024).
The aim of the present study was to compare the feasibility and the effectiveness of Th.o.m.a.s. administered remotely with those of in-person administration. To do this, we assessed the tool’s validity by calculating interrater reliability and internal consistency. The tool’s effectiveness was evaluated by comparing two groups: one interviewed remotely and the other interviewed in-person. Our hypothesis was that there would be no statistically significant difference between the two groups on any of the four scales (scales A, B, C, D) for any of the dimensions (beliefs, desires, positive and negative emotions) and subscales (awareness, relation, realization).
Materials and Methods
Study Sample
The study sample was 80 healthy Italian-speaking adults (36 men, age range 20–73 years, M = 46.10, SD = 15.38) from various parts of the country (Piedmont, Apulia, Veneto, Tuscany, Liguria, Calabria regions). The invitation to participate was advertised in flyers, on the research group’s website, and via popular social media channels (Facebook, Instagram, X). The sample was subdivided into two groups: 40 participants (18 men; age range 20–73 years, M = 46.03, SD = 15.48; education level 8–18 years, M = 12.75, SD = 3.46) were evaluated remotely (telepractice, TP) and 40 participants matched for demographic characteristics were evaluated in-person (face-to-face, FtF). The two groups were similar in male-to-female ratio (18 men and 22 women in each group), age (t-test, t[78] = 0.04; p = .96), and years of education (t-test, t[78] = 0.31; p = .75; see Table 1).
Table 1. Demographic Characteristics and Comparison Between the Telepractice and Face-to-Face Groups.
Inclusion criteria were being a native Italian speaker and aged between 20 and 75 years. Exclusion criteria were history of neurological and/or psychiatric disorders, history of alcohol or drug abuse or drug treatment, ongoing or previous psychotherapy. An additional inclusion criterion for the TP group was having access to a telecommunication device with a well-functioning Internet connection. The criteria were listed at the bottom of the information letter and verified in a preliminary screening interview by the experimenter at the beginning of the interview session. All participants were informed of the research objectives and procedures and gave their informed written consent in accordance with the Declaration of Helsinki. The research was approved by the Bioethics Committee of the University of Turin (protocol number 202271).
Materials and Procedure
Theory of Mind Assessment Scale (Th.o.m.a.s.)
The Theory of Mind Assessment Scale (Th.o.m.a.s.; Bosco et al., 2009a, 2016) is a semi-structured interview originally created in Italian to assess components of ToM. It consists of 37 open-ended question items in response to which interviewees can freely express their thoughts. The interview is divided into four scales, each focusing on a specific knowledge area related to ToM:
Scale A, I-Me investigates the interviewee’s (I) knowledge of his/her/their own mental states (Me). For example, “Do you happen to experience emotions that make you feel good?” evaluates first-person ToM in an egocentric perspective.
Scale B, Other-Self explores the knowledge that other individuals (Other) have about their own mental states (self), independent of the interviewee’s perspective. For example, “Do the other persons happen to experience emotions that make them feel good?” evaluates third-person ToM in an allocentric perspective.
Scale C, I-Other examines the interviewee’s knowledge (I) of the mental states of other individuals (Other). The scale is similar to scale B, since both investigate third-person ToM and the mental states of others, but here the interviewee is asked to take an egocentric perspective. For example, “Do you notice when other people feel good?” evaluates third-person ToM in an egocentric perspective.
Scale D, Other-Me investigates the knowledge that other individuals (Other) have of the interviewee’s mental state (Me). For example, “Do other people notice when you feel good?” can be compared with a second-order ToM task since its abstract form of questioning is “What do you believe others think about your thoughts?” The scale evaluates first-person ToM at the second order.
Each scale is further divided into three subscales:
Awareness refers to the interviewee’s ability to recognize different mental states (e.g., beliefs, desires, emotions) in him/her/themselves and in others. Discerning diverse types of mental states is a prerequisite for comprehending their interconnections with one another and with the external world. For example, “Do you think you understand others’ wishes?”
Relation investigates the interviewee’s ability to recognize causal relationships between mental states and between them and resulting behaviors. For example, “When you feel bad, do you feel you understand why?” The ability to connect and integrate various mental states and to understand their mutual relations with perception and action is important for formulating an explanatory ToM and the social world.
Realization refers to the ability to adopt effective strategies in achieving desired states. For example, “Do you succeed in getting what you want? If so, how?” To achieve adaptive behavior, one needs to possess a theory of the causal relationships between mental states and their connection with the world and then use this knowledge effectively to influence one’s own mental states and behaviors, as well as those of others.
Consistent with the theorizing of the most important types of mental states that an agent possesses (Bosco et al., 2009a, 2009b; Tirassa et al., 2006), each scale examines the interviewee’s perspective on different mental states involved in ToM, that is, dimensions (beliefs, desires, positive and negative emotions). The aim is to evaluate the interviewees’ views on how various agents engage with their environment and how this mental dynamic shapes their perception of and interaction with the environment.
Procedure and Scoring
The TP group was interviewed via videoconference using a computer or tablet. They could avail themselves of technical support, such as having someone initiate the video call if unfamiliar with the technology. The FtF group met the experimenter in-person.
All interviews were conducted individually in a quiet room; the participants were asked to mute their phones and other devices to prevent distraction or disturbance. The interview took about 45 min to complete. All interviews were audio-recorded with the interviewees’ consent for transcription and offline scoring by two independent raters who had not participated in the interview and were blinded to whether a participant belonged to the TP or the FtF group. Each rater assigned a score between 0 and 4 points to each response according to the criteria outlined in the guide. A more detailed description of scoring criteria, together with examples, is provided in Bosco et al. (2009a, 2016).
Statistical Analysis
The scores were entered into a dataset for analysis. Analysis was performed with IBM SPSS Statistics 29 software. The interrater reliability of the four scales for each group was tested using the intraclass correlation coefficient (ICC) in a two-way mixed-effects (absolute agreement) model. Interrater agreement was assessed on 40% (16 interviews) of each group. As a general rule, an ICC <.5 indicated poor reliability, .5 to .75 indicated moderate reliability, .75 to .9 indicated good reliability, and >.9 indicated excellent reliability (Koo & Li, 2016). Cronbach’s alpha (α) was used to assess score consistency across the four scales. The equivalence of internal consistency between remote and in-person assessment (TP vs. FtF) was tested using the Feldt and Kim (2006) independent samples Cronbach alpha difference test. Finally, an independent samples t-test was conducted to reveal between-group differences in demographics (see Study Sample) and the Th.o.m.a.s. total score. Multivariate analysis of variance (MANOVA) was used to assess the differences in responses between the two groups on the scales, dimensions, and subscales.
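The interrater analysis described above can be illustrated with a minimal sketch. It assumes the single-measure form of the two-way, absolute-agreement ICC (the text does not state whether single or average measures were reported), and the function names `icc_absolute_agreement` and `icc_band` are hypothetical. The handling of values falling exactly on a band boundary (.5, .75, .9) is likewise an assumption, since the quoted ranges overlap at those points. This is not the SPSS procedure used in the study, only an approximation of the same quantity.

```python
def icc_absolute_agreement(ratings):
    """Single-measure, two-way, absolute-agreement ICC (assumed form).

    ratings: list of n rows (one per interview), each holding k rater scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between interviews
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def icc_band(icc):
    """Rule-of-thumb bands quoted in the text (Koo & Li, 2016).

    Boundary handling at exactly .5, .75, and .9 is an assumption.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Toy data (not from the study): two raters, the second always one point higher.
toy = [[1, 2], [2, 3], [3, 4]]
value = icc_absolute_agreement(toy)
print(round(value, 3), icc_band(value))
```

Because absolute agreement is requested, a constant offset between raters lowers the coefficient even though the two raters rank the interviews identically; a consistency-type ICC would not penalize that offset.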
Results
Interrater Agreement and Internal Consistency
The ICC for remote assessment was .79, .79, .78, and .86, respectively, for scales A (I-Me), B (Other-Self), C (I-Other), and D (Other-Me) and .83 for the Th.o.m.a.s. total score, all of which indicated good interrater agreement. Internal consistency was good for all four scales, with Cronbach’s alpha (α) between .72 and .92 (Table 2).
Table 2. Inter-Rater Agreement and Internal Consistency (Cronbach’s Alpha) of the Four Th.o.m.a.s. Scales and Total Score in TP Modality (N = 16) and FtF Modality (N = 16).
The ICC for in-person assessment was between .67 and .92: scale A (I-Me) and scale D (Other-Me) showed excellent reliability (.91 and .92, respectively), scale B (Other-Self) and scale C (I-Other) showed moderate reliability (.72 and .67, respectively), and the Th.o.m.a.s. total score showed good reliability (.84). Internal consistency for all four scales and the Th.o.m.a.s. total score was good, with Cronbach’s alpha (α) ranging between .74 and .87 (Table 2).
The Feldt independent samples test revealed no statistically significant differences between the two groups for the Cronbach alpha values. The internal consistency coefficients did not differ significantly for the Th.o.m.a.s. total score (F[39, 39] = 1.63, p = .134), scale A (F[39, 39] = 1.18, p = .604), scale B (F[39, 39] = 1.10, p = .764), scale C (F[39, 39] = 1.31, p = .400), or scale D (F[39, 39] = 0.71, p = .298).
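As an illustration of the internal-consistency comparison, the sketch below computes Cronbach’s alpha from raw scores and the basic Feldt ratio of two independent alphas. It is a simplified stand-in under stated assumptions: the function names `cronbach_alpha` and `feldt_ratio` are hypothetical, the alpha values in the example are invented for illustration and are not taken from Table 2, and the Feldt and Kim (2006) procedure used in the study may apply refinements not shown here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha.

    scores: list of respondents, each a list of k item (or scale) scores.
    """
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def feldt_ratio(alpha_1, alpha_2):
    """Basic Feldt statistic W = (1 - alpha_2) / (1 - alpha_1).

    Under the null hypothesis of equal alphas, W is referred to an F
    distribution whose degrees of freedom depend on the two sample sizes
    (39 and 39 for two groups of 40 respondents).
    """
    return (1 - alpha_2) / (1 - alpha_1)


# Hypothetical alphas, chosen only to show the arithmetic.
print(round(feldt_ratio(0.90, 0.85), 2))
```

A ratio near 1 indicates similar internal consistency in the two groups. The p value would be obtained from the F distribution, for example with a statistics package; that step is omitted here to keep the sketch dependency-free.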
Comparison Between the TP and the FtF Group
The responses of the two groups were compared based on the mean total scores of all interview question items. The independent samples t-test showed no statistically significant differences in the total score (t = 1.42, p = .159, d = 0.31) between the two groups (Figure 1).

Figure 1. Telepractice versus face-to-face group: Th.o.m.a.s. total mean scores and mean values of the scales (scales A, B, C, and D). Bars indicate standard errors of the mean.
As expected, MANOVA with group as the between-subject factor (TP vs. FtF) and the four scales (A, I-Me; B, Other-Self; C, I-Other; D, Other-Me) as the dependent variables revealed no statistically significant group effect: scale A (F[1, 78] = 0.47; p = .491; partial η² = .006), scale B (F[1, 78] = 1.95; p = .166; partial η² = .02), scale C (F[1, 78] = 2.77; p = .100; partial η² = .03), and scale D (F[1, 78] = 1.17; p = .281; partial η² = .01; Figure 1).
There was no statistically significant group effect on the dimensions: beliefs (F[1, 78] = 0.64; p = .423; partial η² = .008), desires (F[1, 78] = 3.94; p = .050; partial η² = .04), positive emotions (F[1, 78] = 1.95; p = .166; partial η² = .02), and negative emotions (F[1, 78] = 0.15; p = .699; partial η² = .002; Figure 2).

Figure 2. Telepractice versus face-to-face group. Mean scores on the dimensions of Th.o.m.a.s. (beliefs, desires, positive and negative emotions). Bars indicate standard errors of the mean.
Finally, there was no statistically significant group effect on the subscales: awareness (F[1, 78] = 3.01; p = .087; partial η² = .03), relation (F[1, 78] = 1.86; p = .176; partial η² = .02), and realization (F[1, 78] = 0.42; p = .515; partial η² = .005; Figure 3).

Figure 3. Telepractice versus face-to-face group. Mean scores on the subscales of Th.o.m.a.s. (awareness, relation, and realization). Bars indicate standard errors of the mean.
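The effect sizes reported in this section can be approximately recovered from the test statistics. The sketch below assumes a pooled-variance Cohen’s d for the independent samples t-test and the standard conversion from a univariate F ratio to partial eta squared; the function names are hypothetical. Because the published t and F values are rounded, recomputed effect sizes may differ from the reported ones in the last digit.

```python
from math import sqrt


def cohens_d_from_t(t, n1, n2):
    """Cohen's d from an independent samples t statistic (pooled variance)."""
    return t * sqrt(1 / n1 + 1 / n2)


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# Total score comparison: t = 1.42 with 40 participants per group.
print(round(cohens_d_from_t(1.42, 40, 40), 2))

# Scale A: F[1, 78] = 0.47.
print(round(partial_eta_squared(0.47, 1, 78), 3))
```

Both quantities express the size of the group difference independently of sample size, which is useful here because the hypothesis concerns the absence of a difference rather than its presence.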
Discussion
With the present study we wanted to determine the reliability, validity, and feasibility of Th.o.m.a.s. (Bosco et al., 2009a, 2016) in telepractice. To do this, we compared the responses of two groups of healthy adults: the first group was administered the instrument remotely and the other group, matched for gender, age, and years of education, was administered the instrument in-person. Consistent with our hypotheses, we found no statistically significant differences between the two groups for the overall Th.o.m.a.s. score or for the scores on the four scales (A, B, C, D) or the subscales (awareness, relation, realization) or in the dimensions (beliefs, desires, positive and negative emotions).
The tool’s interrater reliability and internal consistency were analyzed separately for each group. Previous studies (Bosco et al., 2016) demonstrated the instrument’s good psychometric properties when administered in-person. This was confirmed in the present study, with high reliability of the total score and moderate-to-excellent reliability of the four scales separately for in-person administration. We expected to observe similar results for remote administration. Indeed, the intraclass correlation coefficient indicated good reliability of the total score and across the various scales. Internal consistency was also good, with high Cronbach’s alpha (α) calculated for each scale in both groups. Even higher internal consistency in the total score was noted for the TP group. Furthermore, the results indicated equivalent internal consistency for the two assessment modalities, thus confirming the reliability and robustness of Th.o.m.a.s. administered in-person and its comparability and suitability when administered remotely.
These results are in line with previous studies that reported comparable performance in telepractice and face-to-face assessment in other contexts. Scores consistent with conventional in-person assessment have been found for tasks that assess language in clinical conditions (Rao et al., 2022; Theodoros et al., 2008; Vestal et al., 2006; Wadsworth et al., 2017; Weidner & Lowman, 2020) and for videoconferencing-based neuropsychological assessment tools such as the MMSE (Munro Cullum et al., 2014). The same can be stated for telerehabilitative interventions (Kaiser et al., 2021; Marino et al., 2023).
Various assessment tools have shown moderate-to-high internal consistency and construct validity when used in telepractice, for example, a self-reported questionnaire investigating cognitive and affective empathy (Di Girolamo et al., 2019), the web version of the Pediatric Quality of Life Inventory Multidimensional Fatigue Scale (PedsQL MFS; Lassandro et al., 2020), the MMSE to assess general cognitive functioning (Munro Cullum et al., 2014; Vanacore et al., 2006), and semi-structured tools to assess quality of life in assisted elderly (De Leo et al., 1992) or, more recently, the APACS Brief Remote to assess pragmatic ability (Bischetti et al., 2024). These findings underscore the validity and feasibility of telepractice and its application via various assessment tools and interventions.
Schidelko et al. (2021) assessed ToM ability in children aged 3 to 4 years and reported that performance on false belief tasks was comparable between the groups tested remotely and in-person. Moreover, Di Girolamo et al. (2019) reported comparable responses in two adult samples, one administered the Cognitive and Affective Empathy Questionnaire (Reniers et al., 2011) remotely and the other in-person. Furthermore, our findings are consistent with studies that used the Th.o.m.a.s. in various populations. The tool showed high interrater reliability when administered in-person to people with schizophrenia (Bosco et al., 2009a), alcohol use disorders (Bosco et al., 2014a), congenital heart disease (Chiavarino et al., 2015), bulimia nervosa (Laghi et al., 2014), or autism spectrum disorders (Fadda et al., 2024), as well as to sex offenders (Castellino et al., 2011) and adolescent inpatients with non-suicidal self-injury (Laghi et al., 2016).
Remote administration of the Th.o.m.a.s. has shown good reliability and feasibility. Telepractice provides residents of remote or rural areas with greater access to healthcare services, and its flexibility can boost access to assessment and training. Nonetheless, some limitations should be acknowledged. Telepractice requires a basic level of technological proficiency, the availability of appropriate telecommunication devices, and a stable internet connection. Individuals who lack confidence with digital technologies or do not have access to a reliable internet connection may be less willing to engage in telepractice. Moreover, technical issues can affect the continuity and overall quality of remote assessment. Certain limitations can be overcome with simple support, as seen in our study: some older participants in the telepractice group received assistance from family members, who helped them establish the connection and set up the video call before the assessment session. In addition, telepractice may reduce the control an experimenter has over environmental conditions, for example, background noise, interruptions by family members, or other distractions that may diminish participant concentration and performance.
Moreover, an area for future research is to investigate age-related differences in ToM performance, as previous studies have shown that ToM changes across the lifespan and that the average performance on ToM measures increases with age in adolescents (Bosco et al., 2014b) and then again between adolescence and young adulthood (Meinhardt-Injac et al., 2020). Similarly, Bosco et al. (2016) found that adolescents performed lower on average on the four Th.o.m.a.s. scales than young adults and adults. In their study involving older adults, Lee et al. (2021) found that performance on social cognition tasks, including ToM ability, declined with age from 66 to 105 years. Similarly, Fischer et al. (2017) reported age-related differences in both cognitive and affective ToM, that is, the ability to infer meta-cognitive beliefs and intentions (cognitive ToM) and to recognize and interpret emotional mental states (affective ToM), with younger adults performing better than older adults. Since the present study sample included participants up to the age of 74 years, future studies should include older adults and explore differences in ToM performance in late adulthood and between age groups.
Another area for future studies is gender differences. Previous studies have shown that women outperform men on mentalization tests (Baron-Cohen et al., 2001b; Bosco et al., 2014b; Téllez-Alanís et al., 2022) and on both cognitive and affective ToM tasks (Białecka-Pikul et al., 2017). Some evidence suggests that women interpret affective cues embedded in facial expressions with greater accuracy (Connolly et al., 2019; Wingenbach et al., 2018), especially for negative emotions (e.g., anger, sadness, fear, or disgust) compared with positive emotions (e.g., happiness; Thompson & Voyer, 2014), and are more accurate when inferring others’ beliefs, intentions, and thoughts (e.g., Charman et al., 2002; Proverbio, 2021). Similarly, Bosco et al. (2014b) found that preadolescent and adolescent girls outperformed age-matched boys on all Th.o.m.a.s. scales (A, I-Me; B, Other-Self; C, I-Other; D, Other-Me), subscales (awareness, relation, realization), and dimensions (beliefs, desires, positive emotions, negative emotions). While previous research has examined gender differences in preadolescents and adolescents with the Th.o.m.a.s. administered in-person (Bosco et al., 2014b), future studies could examine whether similar patterns emerge when the interview is delivered remotely and whether such differences persist or change across age groups, including adults and older adults. Lastly, future studies could generate normative data for the Th.o.m.a.s. to establish the presence of a ToM deficit and the degree of impairment severity. This could be done in different languages, since the instrument can be translated and adapted to cultural contexts outside the Italian context (see, e.g., Gandolphe et al., 2018).
The tool’s good psychometric properties and the lack of statistically significant differences in scores between the two groups in the present study sample suggest that its validity holds in telepractice, making it useful for scenarios where reaching the patient/participant may be difficult or risky due to safety or practical issues. Overall, the Th.o.m.a.s. is suitable for a wide range of applications across different clinical conditions owing to its ability to assess multiple facets of ToM, including the understanding of different perspectives and mental states (emotions, desires, beliefs), as well as several abilities related to mental states (awareness, relation, realization). In clinical contexts, the Th.o.m.a.s. can provide a detailed profile of ToM strengths and weaknesses and can be effectively used to assess and monitor ToM difficulties in clinical conditions such as schizophrenia (Bosco et al., 2009a), personality disorders (Colle et al., 2019), and autism spectrum disorders (Fadda et al., 2024). It can serve as a valuable tool to guide targeted therapeutic interventions, improve continuity of care, and strengthen therapeutic goals. Furthermore, in educational contexts, the Th.o.m.a.s. may aid in the early detection of difficulties in ToM and so guide the development of personalized interventions to improve social-cognitive skills in children and adolescents, especially those at risk or with special educational needs.
When delivered remotely, the Th.o.m.a.s. offers greater flexibility in scheduling, making it easier for both participants and healthcare providers to arrange appointments for assessment and reducing logistical barriers such as travel. The flexibility of the Th.o.m.a.s. in telepractice may be particularly attractive for adolescents and young adults, who are often more engaged and motivated when using digital tools in remote sessions than in in-person sessions (Neavel et al., 2022). It can also be used to reach people with clinical conditions or movement limitations (e.g., migraine or physical immobility) and facilitate reassessment as part of follow-up for disease monitoring over time.
Conclusions
The present study adds to the growing body of evidence in psychological research that telepractice modalities are comparable with conventional methods of delivering healthcare services. Our data indicate that remote assessment with the Th.o.m.a.s. is robust and comparable in psychometric quality to in-person administration. Furthermore, our findings help fill a gap in the availability of tasks and tools that examine various aspects of ToM in healthy adults, thus expanding the set of tools for researchers and clinicians interested in exploring the complex dimensions of ToM abilities across the lifespan. The versatility and adaptability of the Th.o.m.a.s. make it a valuable tool for both research and clinical application in psychology, psychiatry, and educational contexts.
Footnotes
Ethical Considerations
The research was approved by the Bioethics Committee of the University of Turin, Italy (protocol number 202271).
Consent to Participate
All participants were informed of the research objectives and procedures and gave their informed written consent in accordance with the Declaration of Helsinki.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was financially supported by Fondazione Cassa di Risparmio di Torino, ‘Scongiurare l’isolamento sociale oltre lo schermo: valutazione delle abilità sociali in telepractice con la Theory of Mind Assessment Scale’ (Grant No. 113475/2025.0328).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data are not publicly available owing to privacy or ethical restrictions; however, they will be made available by the corresponding author on reasonable request.
