Abstract
Speakers typically describe scenes from their own spatial perspective, but have the ability to use allocentric perspectives as well. In two experiments, we examined how cultural, language, and situational factors affect which perspective speakers choose when describing the spatial locations of objects. Experiment 1 found that native Chinese speakers were more likely to use an allocentric perspective when describing scenes that included a person with an opposite perspective than scenes that did not; moreover, they did so to a greater extent than native English speakers. Experiment 2 compared spatial perspective taking in native Chinese speakers using either their native language (Chinese) or a non-native language (English) while measuring their degree of collectivism. Non-native language use but not higher collectivism was associated with an increase in allocentric perspective use. These results suggest that cultural factors beyond collectivism, language nativeness, and the presence of a person play a role in spatial perspective taking in language production. Thus, future academic research as well as professionals across various industries should take into account these factors when spatial perspective taking is relevant.
Introduction
Speakers can frame a message from different perspectives (Tversky, 1996). For instance, to tell someone on the other side of a table about a particular bottle, they can take their own (egocentric) perspective by saying the bottle on my left or their addressee’s (allocentric) perspective by saying the bottle on your right (or even a perspective based on absolute reference points by saying the bottle to the east). But what makes speakers take a particular perspective? Spatial perspective taking (henceforth perspective taking for brevity) is influenced by characteristics associated with the referent, for example its precise location and situational context (e.g., Tosi et al., 2020). But is it also influenced by factors related to speakers, such as their cultural background and whether they are using their native language? Examining perspective taking in language production is crucial for effective communication, as perspective taking happens frequently in many different types of interactions, and often in high-stakes situations (e.g., in air traffic control coordinating planes landing from various locations at an airport). In these cases, when speakers and listeners adopt different perspectives, inefficient coordination and misunderstandings can occur. Thus, understanding the factors that lead speakers to take a particular perspective can suggest techniques for better aligning perspectives in communication. In addition, perspective taking is a fundamental cognitive process, and examining how speaker-internal and speaker-external factors affect perspective taking can give insight into the mechanisms behind this process. In this paper, we report two experiments that investigate how such factors affect the choice between egocentric and allocentric spatial perspectives.
Perspective Taking in Language Production
To produce an utterance, speakers need to conceptualize a preverbal message out of the information they seek to convey, before they can turn the message into linguistic output (Levelt, 1989). One aspect of conceptualization is the selection of information to be expressed. For example, speakers may describe a bottle simply by saying the bottle or may include additional information by saying the green bottle with a cork. But in addition, there may be different ways to frame a message relating to choice of spatial perspective. For instance, in the above example a speaker can take an egocentric perspective (i.e., using their own perspective) and say the bottle on my left or even the bottle on the left; alternatively, they can take an allocentric perspective (i.e., using someone else’s perspective) and say the bottle on your right or even the bottle on the right.
Speakers often tend to take an egocentric perspective (Levelt, 1984; though see Levinson, 1996, on cultural variation in perspective taking systems 1 ), but they adopt an allocentric perspective in certain circumstances. For instance, Tversky and Hard (2009) had participants read a question that required them to describe the spatial relation between two objects positioned on a table; they could describe it egocentrically or allocentrically. Critically, some of the scenes additionally included a person behind the table, who either was just looking at the objects or was reaching out in the direction of one object. Overall, participants tended to use an egocentric perspective, but they used an allocentric perspective more when the scene included a person than when it did not. In particular, when the question mentioned the person in the scene and cued participants to attend to the person’s reaching act (e.g., In relation to the bottle, where does he place the book?), they used an allocentric perspective most of the time. In addition, participants mostly use allocentric perspectives when they are instructed to describe spatial relations in a way that would be clearest for their partner (who has a known different perspective) in a cooperative task (Mainwaring et al., 2003). Allocentric descriptions are also more likely when speakers believe that their addressee has low spatial cognition abilities or is struggling to do a task (Schober, 2009).
Likewise, Tosi et al. (2020, Experiment 3) had participants view scenes of two objects on a table and describe where one object was in relation to the other, with a person either being in the scene or not. When present, the person had either the same or the opposite spatial perspective as the participant and could either see (and potentially act on) the objects on the table or not. Participants used more allocentric perspectives when the person had the opposite than the same perspective as themselves, especially when the person could see the objects. These results suggest that participants simulated the person’s perspective, which influenced their choice of perspective when formulating their response. Overall, these studies show that perspective taking is influenced by factors external to speakers, including the presence of an entity with a differing perspective.
In addition, other research has shown that speakers adopt allocentric perspectives when presented with a cue to a person with a different spatial perspective even without a person actually present. Quesque et al. (2020) had participants describe the spatial relation between objects in scenes which featured combinations of a person and/or a chair (i.e., an inanimate object), both of which appeared with different perspectives across the experiments (e.g., forward-facing and side-facing). Participants often allocentrically adopted the perspective of the person (replicating Tversky & Hard, 2009), but also often adopted the perspective of the chair. They did so even when the chair could not be easily sat upon because it was broken or had an object on it, and also when the chair was not referred to in the experimental questions probing the positions of the target objects. Overall, these results suggest that people may simulate people in positions in which they are often found (e.g., sitting in a chair, even when the chair does not afford sitting) or just simulate the perspective of an inanimate object with an unambiguous front side, and are thereby induced to take a corresponding (allocentric) perspective.
Possible Effects of Culture and Language Nativeness on Perspective Taking
While previous research has demonstrated how perspective taking can be influenced by speaker-external factors, there is less evidence about whether speaker-internal characteristics influence perspective taking. One such characteristic is cultural background, which can affect a wide range of cognitive processes and behaviors (Bender & Beller, 2013), including the representation of space (Haun et al., 2011; Majid et al., 2004) and processes of attention; for example, the directional orientation of writing systems affects where people attend most (Bergen & Chan, 2005), and Asians show stronger memory than Americans for the context in which items are presented (e.g., in an underwater or neutral background for pictures of fish; Masuda & Nisbett, 2001). A cultural factor that may be especially relevant to spatial perspective taking is collectivism, which refers to the degree to which a culture prioritizes the group over the individual. Previous research shows that collectivism affects spatial representations (Cohen & Gunz, 2002) and social attentional processes (e.g., judging the social pain of others and suppressing emotion as a strategy to preserve social relations; Atkins et al., 2016; Chiang, 2012).
Evidence for cultural effects on cognition via collectivism stems largely from comparisons between Asian and Western cultures, with Asian cultures regarded as more collectivistic than Western cultures in general (Kitayama et al., 2007; Markus & Kitayama, 1991). Collectivist cultures put a greater emphasis on interpersonal harmony maintenance (Chiang, 2012), where having accurate insights into others’ emotions is paramount, and various studies have found that Asian participants perform better on tasks that require them to infer the mental state of others or to take an allocentric perspective than Western participants (Atkins et al., 2016; Cohen & Gunz, 2002; Wu & Keysar, 2007). For example, Atkins et al. found that Asian participants were more accurate at judging the social pain of others experiencing exclusion or devaluation in valued social relationships than their British counterparts. Moreover, Asian participants were less self-focused in their autobiographical memories and their emotions when judging the emotions of others (Cohen & Gunz, 2002): When recalling memories of events in which they were the center of attention, they more often visualized the event allocentrically than Western participants, and were less likely to project their own emotions onto others. Furthermore, in a communicative task involving perspective taking, Chinese participants were more attentive to their partner’s differing perspective (and thus took their partner’s perspective more) compared to American participants (Wu & Keysar, 2007; though in a similar task Wang et al., 2019 found no difference in egocentrism between British and Taiwanese participants). Note, however, that these studies did not directly measure collectivism, and hence the differences found between Asian and Western participants may have been due to other cultural factors.
In addition, more recent work examining collectivism differences between Asian and Western populations has shown that these differences are becoming smaller (Lomas et al., 2022; Pelham et al., 2022).
Thus, collectivism may be associated with predominantly allocentric cognition, and individualism with predominantly egocentric cognition (Markus & Kitayama, 1991; Triandis & Gelfand, 2012). Speakers’ preferred spatial perspective may therefore be affected by the degree of collectivism of their culture, with greater collectivism causing increased attention to the mental states and needs of others, and leading to more allocentric perspective-taking. We might expect these group-level effects of collectivism also to be reflected at the individual level, so that individuals who score higher in collectivism would also show more allocentric perspective use than those who scored lower.
Another speaker-internal factor that may also be relevant to perspective-taking is language nativeness. Speakers’ use of a native versus non-native language may affect a variety of behavioral and cognitive processes during comprehension. For instance, they are less risk averse when responding to problems narrated in a non-native than a native language (e.g., Keysar et al., 2012). They are also more utilitarian in moral decision making when using a non-native than a native language; for example, when faced with a hypothetical moral dilemma in which participants could either do nothing and let a train kill five people or push one heavy man onto the train tracks (thereby killing one person but saving five others; Thomson, 1985), participants chose the utilitarian option of pushing the man more often when the scenario was framed in their non-native language than in their native one (Costa et al., 2014).
Overall, these effects may stem from non-native language use being more cognitively demanding than native language use (e.g., Costa & Sebastián-Gallés, 2014), causing people to shift from automatic and intuitive cognitive processing to a more deliberative style of thinking when using a non-native language. Therefore, the findings of less risk aversion and more utilitarianism with non-native language use may be due to these behaviors being aided by increased cognitive deliberation. Likewise, non-native language use may also affect perspective taking via increasing deliberation. Taking an egocentric perspective is hypothesized to come first and be automatic, whereas taking an allocentric perspective may require additional and conscious effort (Keysar, 2008; Keysar et al., 2000). Thus, people may be more likely to take an allocentric perspective when using a non-native than a native language. However, because non-native language use requires more cognitive resources, it may instead do the opposite: rather than shifting cognition in general to a more deliberative mode, it may take away cognitive resources and shift thinking to be more automatic. Under this scenario, non-native language use should cause people to take an egocentric perspective more often than when using a native language. Thus, given these contrasting hypotheses regarding non-native language use and perspective taking, we make a non-directional hypothesis with regard to these effects.
In addition, while there are many other factors that affect perspective taking, such as lifestyle and ecological ones, this paper focusses on examining populations that in general are urban and educated. One factor that is particularly relevant to this study is the difference between urban and rural populations. Urban environments have been found to promote egocentric perspective taking more so than rural environments, especially when conceptualizing spaces on small scales (Bohnemeyer, 2014; Majid et al., 2004; Marghetis et al., 2024; Pederson, 1995). This may be due to urban populations spending more time inside and having less access to and knowledge of salient landmarks that could be used to ground allocentric reference points (e.g., the store is to the left of the pond, compared to the store is a minute’s walk from me). Furthermore, many cultures, and especially rural ones, use coordinate systems based largely on salient rural landmarks such as waterways and mountains, or rely on cardinal directions such as north and south rather than on relative terms like left and right (e.g., Majid et al., 2004). For example, Hindi speakers in rural Nepal tend to take allocentric perspectives by using local ecological landmarks as reference points, whereas Hindi speakers in urban India tend to take more egocentric perspectives (Mishra et al., 2003).
Nonetheless, we focused on an urban and educated demographic in order to make this study comparable to the majority of past research on perspective taking (e.g., Tosi et al., 2020), and because people across the world are becoming increasingly urbanized and educated (United Nations, 2025), with over half of the global population currently living in urban areas and more than two-thirds projected to be urban by 2050. The task we use in this study (locating objects on a table in an inside setting) also reflects a more urban setting (e.g., in offices) rather than an outdoor setting such as a farm, setting up the task naturally to reflect a choice between an egocentric perspective and a perspective coming from the directly opposite side of the table. The scenes do not lend themselves well to taking a perspective based on cardinal directions or ecological landmarks (as neither a compass nor landmarks feature in the scenes), making this study most appropriate for researching perspective taking in populations used to perspective taking in small and enclosed environments. Thus, all claims made in this study should be interpreted in light of our participants (i.e., speakers) being relatively urban and educated, with these claims possibly not holding up for different groups. Future research should therefore examine how culture and language nativeness affect perspective taking in a wide variety of populations and contexts, especially more rural populations and scenes.
The Current Study
The current study investigates whether perspective taking in spatial descriptions is affected by two speaker-internal characteristics, cultural collectivism and language nativeness, and by speaker-external characteristics such as the presence and positioning of another person in a scene. Past research has mainly focused on perspective taking from a language comprehension perspective (Wang et al., 2019; Wu & Keysar, 2007), or only on how speaker-external factors affect perspective taking (Mainwaring et al., 2003; Quesque et al., 2020; Tosi et al., 2020; Tversky & Hard, 2009). In addition, the current study is the first to examine whether language nativeness affects perspective taking. Overall, our research design allows us to address these gaps in the literature and to provide further nuance on the mechanisms behind perspective taking.
Our research draws on the theoretical frameworks of dual-process and social-cultural accounts of perspective taking. On a dual-process cognitive account, egocentric perspective taking is seen as the automatic default that people use (Levelt, 1984), and various speaker-internal factors (such as culture) and speaker-external factors modulate the degree to which cognitive resources are employed to allow allocentric perspectives to override egocentric ones. For example, cultural factors such as higher levels of collectivism may increase other-oriented cognitive predispositions (e.g., Chiang, 2012), and hence increase allocentrism (Markus & Kitayama, 1991; Triandis & Gelfand, 2012). In a similar fashion, using a non-native language is cognitively demanding (e.g., Costa & Sebastián-Gallés, 2014) and may cause people to switch from automatic to deliberative styles of cognition. This switch may then increase the tendency for allocentric perspective taking, as deviating from the default egocentric perspective may require deliberative cognitive effort (Keysar, 2008; Keysar et al., 2000).
To this end, we conducted two online experiments based on Tosi et al. (2020, Experiment 3). As in Tosi et al., we used an object-location query task. Participants were shown a picture containing a table with two objects on top of it, and with or without a person. The person could either see or not see (and thus potentially act upon; henceforth action potentiality) the objects on the table and had either the same or a different visual orientation (i.e., was looking in the same or the opposite direction) as the participants (see Figure 1). We kept both of these manipulations from Experiment 3 of Tosi et al. in order to make our Experiment 1 directly comparable with it, as well as to examine whether visual orientation and action potentiality affect perspective taking to differing extents depending on collectivism. Participants were then asked On which side of the X is the Y? and typed an answer. Here, an egocentric response would be The bottle is on the right side of the apple, while an allocentric response would be The bottle is on the left side of the apple.

The conditions and example stimuli 2 in Experiment 1 (see Appendix 1 for the full stimuli list). Note that both a Chinese and English question are shown in the figure for reference, but in the experiment a participant saw only one version on a specific trial.
Experiment 1 tested whether cultural collectivism affects spatial perspective taking in language production by comparing Chinese and English native speakers. Tosi et al. (2020) observed more allocentric responses when participants viewed scenes involving people with opposite perspectives from their own than with the same perspective as their own, a finding that is consistent with perspective taking being influenced by the presence of others with different perspectives. In Experiment 1, we first tested whether this effect would differ between Chinese- and English-speaking participants using their respective native languages (i.e., Chinese and English) in order to examine whether perspective taking in spatial language production is influenced by culture. To this end, we compared the native Chinese-speaking participants (from a collectivist culture) with the native English-speaking participants (from an individualist culture). If cultural background (collectivist vs. individualist) influences perspective taking, then we predict that native Chinese-speaking participants should show more allocentric perspective-taking than the native English-speaking participants, as collectivism should reduce egocentricity and cause more attention to be directed toward others’ perspectives.
Experiment 2 tested the effect of language nativeness on perspective taking, making use of Chinese-English bilinguals (with Chinese as their native language and English as their non-native language). Note that it is possible that the prevalence of egocentric responses in Tosi et al. (2020) was due to the use of a native language: As we have noted, using one’s native language compared with one’s non-native language may affect other aspects of cognition, including more automatic and less deliberative decision making, which might in turn affect perspective taking by increasing egocentricity. Thus, in Experiment 2, we compared Chinese-English bilingual participants producing descriptions in their native language and in a non-native language. If native-language use does indeed affect deliberative decision making, then we predict that Chinese-English bilingual participants should show more allocentric perspective-taking when using a non-native language than when using a native language. We also tested whether individual differences in collectivism affect perspective taking, with the prediction that higher levels of collectivism would be associated with more allocentric perspective taking. The data and analytical scripts for both experiments can be found at https://osf.io/8ytfb/?view_only=28d77b2725084d97bfa1ca240c71de13.
Experiment 1
Methods
Participants
We recruited 97 native speakers of Chinese (mean age = 21.72; 18 male, 79 female) from various universities in Guangzhou (southern China), and 94 native speakers of English (mean age = 32.91; 41 male, 53 female) from the crowdsourcing platform Prolific Academic (https://www.prolific.co/). All of the participants in Experiment 1 (as well as in Experiment 2) gave their informed consent to take part in the study. Ethical approval was granted for this study by the Survey and Behavioural Research Ethics Committee at The Chinese University of Hong Kong (ethics code SBRE-22-0464).
Materials
These were the same as in Experiment 3 of Tosi et al. (2020). There were five conditions (see Figure 1): Four conditions had a depicted agent (i.e., person) in the scene, crossing action potential (whether or not the agent could act on the objects on the table; can-act-agent vs. cannot-act-agent, corresponding to Tosi et al.’s can act and cannot act conditions respectively) and orientation (whether or not the agent was facing the same or opposite direction as the participant; same orientation vs. opposite orientation); the fifth condition was a no-agent condition (serving as a baseline) that did not include an agent and was included to allow for comparisons with experimental conditions in which agent-related perspectives were available (which we will term agent-present conditions for exposition).
There were 12 experimental items, each consisting of two paired sets of five photographed scenes, with one set corresponding to one of the two possible left/right arrangements of the objects on the table (e.g., apple on the right/bottle on the left) and the other set corresponding to the other arrangement (e.g., bottle on the right/apple on the left). Each set contained five photographs corresponding to the five conditions together with a spatial description question (in Chinese or English, according to condition; see Figure 1). The two objects had no apparent relationship with each other and had symmetrical left/right sides and no clear front/back. The agent had his or her arms crossed and looked directly at the camera. There were four individual agents (two females and two males, each appearing in three items). Each experimental item also included a question about the spatial relationship of the two objects in the scene (e.g., Chinese: 苹果在瓶子的哪一边?; English: On which side of the bottle is the apple?). There were 16 filler items, consisting of pictures of objects arranged on a table, with questions about the color or size of the objects (i.e., not about spatial locations: Chinese: 碗是什么颜色; English: What color is the bowl?).
From these items, we created 12 lists, each consisting of 24 experimental and 16 filler trials. The order of trials was randomized for each participant. For the experimental trials, each list had 8 no-agent experimental items, and 4 experimental items from each of the other conditions. In each list, each of the 12 experimental items occurred twice, in two different conditions (i.e., no experimental item occurred in the same condition on two trials). The position of each individual object was queried once per list (e.g., for the apple/bottle item, the spatial locations of the apple and bottle were each queried once, on different trials).
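The list constraints described above can be illustrated with a short sketch. The rotation scheme below is a hypothetical construction, not the procedure used to build the actual lists: the condition labels, the `CONDITION_PAIRS` table, and the `build_list` function are all assumptions introduced for illustration. The sketch only shows that one simple Latin-square-style rotation satisfies the stated constraints (24 experimental trials per list, 8 no-agent trials, 4 trials in each agent-present condition, each item appearing twice in two different conditions, and each object queried once). Filler trials, the two object arrangements, and trial-order randomization are omitted.

```python
from collections import Counter

# Hypothetical condition labels (assumed names, not taken from the materials).
N, A, B, C, D = (
    "no-agent",
    "can-act/same",
    "can-act/opposite",
    "cannot-act/same",
    "cannot-act/opposite",
)

# Twelve pairs of distinct conditions whose totals are 8 no-agent trials
# and 4 trials for each of the four agent-present conditions.
CONDITION_PAIRS = [
    (N, A), (N, A), (N, B), (N, B), (N, C), (N, C), (N, D), (N, D),
    (A, B), (C, D), (A, C), (B, D),
]


def build_list(list_index, n_items=12):
    """Return the 24 experimental trials of one list (assumed rotation scheme)."""
    trials = []
    for item in range(n_items):
        # Rotating the pair index by the list index gives every item
        # every condition pair exactly once across the 12 lists.
        pair = CONDITION_PAIRS[(item + list_index) % len(CONDITION_PAIRS)]
        for queried_object, condition in enumerate(pair):
            trials.append(
                {"item": item, "condition": condition, "queried_object": queried_object}
            )
    return trials


lists = [build_list(i) for i in range(12)]
print(Counter(trial["condition"] for trial in lists[0]))
```

In an actual experiment the 16 filler trials would be added to each list and the trial order shuffled separately for each participant.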
Procedure
The experiment was conducted online on the survey platform Qualtrics (https://www.qualtrics.com/). After giving their consent, participants were instructed in their native language that they would be viewing various pictures and would need to answer questions about these pictures, and were shown a picture of a practice item with an example of an appropriate response in the relevant language. Thus, the Chinese-speaking participants saw a practice item with an example response in Chinese underneath, and the English-speaking participants saw this practice item with an English example response. During this instruction phase and throughout the entire experiment there was no mention of the people featured in the pictures. Participants then started the main trials, in which they were presented with a scene and a question below the scene at the same time. They typed a sentence or phrase as an answer into a text box underneath the scene. They pressed a continue button when they had completed their response and then the next trial appeared. The experiment took around 10 min to complete. Chinese- and English-speaking participants were paid 10 Yuan (about 1.1 British pounds) and 1 British pound respectively after completing the experiment.
Data Coding
The Chinese and English typed responses were manually coded by a native speaker of Chinese and English respectively as egocentric, allocentric, or other. A response was coded as egocentric if it described the spatial relation of the two objects from the perspective of the participant (e.g., 苹果在左边; the apple is on the left). A response was coded as allocentric if it described the spatial relation of the two objects from the perspective of the opposite viewpoint to the participant (e.g., 苹果在右边; the apple is on the right). Otherwise, responses were coded as other. These include unclear perspective descriptions (e.g., 苹果在桌上; the apple is on the table), use of both perspectives (e.g., 苹果在我的左边, 他们的右边; the apple is on my left and their right), and irrelevant answers (e.g., 苹果更大; the apple is larger). Other responses were excluded from further analyses. In addition, we coded responses on filler trials for accuracy to check for lack of attention and potential random responding, given that the questions were straightforward. For example, if a participant responded on a filler trial that the color of the bowl in the scene is red (when in fact it is green; see Figure 1 for this filler trial), then this response would be coded as incorrect. All of the data, experimental items, and analytical scripts from Experiment 1 (and also Experiment 2) can be found at https://osf.io/8ytfb/?view_only=28d77b2725084d97bfa1ca240c71de13.
Research Design
Experiment 1 employed a mixed experimental design to investigate how the speaker-internal characteristic of culture and the speaker-external characteristic of the presence of an agent in a scene affect perspective taking. In particular, cultural background was a between-subjects factor, with native Chinese and native English speakers recruited to take part in the study. Within these cultural groups, perspective taking was examined across four within-subjects scene conditions that crossed the agent’s visual orientation (same vs. opposite to the participant) and the agent’s action potentiality (can-act-agent vs. cannot-act-agent), following Tosi et al. (2020). A fifth within-subjects condition consisted of scenes without any agent present.
We adapted the object-location query paradigm of Tosi et al. (2020, Experiment 3), whereby participants described the spatial relation between two objects placed on a table, sometimes in the presence of a human agent that had the same or different perspective as the participant and either could see or not see the objects on the table (i.e., could easily or not easily interact with the objects). For each trial, participants viewed a scene of two objects on a table and, depending on the trial, an agent with a particular visual orientation relative to the participant and ability to act on the objects. Participants then answered a question of the form of On which side of the X is the Y? in their native language. Responses were coded as egocentric, allocentric, or other, with the primary dependent variable being whether participants responded with an egocentric versus allocentric description.
Results
Of the 191 participants, four participants had an accuracy rate lower than 90% in their responses on the filler trials and were excluded from further analyses. Among the responses in experimental trials from the remaining 187 participants, 24 were coded as other responses and were excluded from further analyses. Of the remaining 4,664 responses, participants produced allocentric descriptions 19.4% of the time and egocentric descriptions 80.6% of the time.
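The exclusion and summary steps can be sketched as follows. This is a minimal illustration under assumed data structures (lists of booleans for filler accuracy and lists of string labels for coded responses); the function names are hypothetical and this is not the analysis script from the OSF repository. With 16 filler trials, an accuracy below 90% corresponds to two or more incorrect filler responses, since 15/16 = 93.75% and 14/16 = 87.5%.

```python
from collections import Counter

FILLER_ACCURACY_CUTOFF = 0.90  # participants below this accuracy are excluded


def keep_participant(filler_correct):
    """filler_correct: one boolean per filler trial (assumed representation)."""
    return sum(filler_correct) / len(filler_correct) >= FILLER_ACCURACY_CUTOFF


def allocentric_proportion(codes):
    """codes: 'egocentric', 'allocentric', or 'other' labels for experimental trials.

    Responses coded as 'other' are dropped before the proportion is computed.
    """
    counts = Counter(code for code in codes if code != "other")
    n_valid = counts["egocentric"] + counts["allocentric"]
    return counts["allocentric"] / n_valid if n_valid else float("nan")


# Toy data, invented purely for illustration.
print(keep_participant([True] * 15 + [False]))       # one filler error
print(keep_participant([True] * 14 + [False] * 2))   # two filler errors
print(allocentric_proportion(["egocentric"] * 4 + ["allocentric", "other"]))
```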
We first carried out a binomial logit mixed effects (LME) analysis on the egocentric and allocentric responses (with egocentric responses as the reference level) on the four experimental conditions, using language, orientation, and action potential (contrast coded) as interacting predictors (see Table 1). We also compared the no-agent condition between Chinese and English speakers in a further LME analysis. These analyses (as well as all further analyses) used forward model comparison with a .2 alpha level to establish the optimal random-effects structure as justified by the data (e.g., Bates et al., 2015; Matuschek et al., 2017; see Dunn & Cai, 2025 for an example of similar analyses). More specifically, this forward comparison process starts with a base model (termed model 0) that includes only the random intercepts and no random slopes, and then iteratively compares models that include additional random slopes (termed model 1 for one additional random slope, model 2 for two additional random slopes, etc., up to model max for the maximum number of additional random slopes). Models that did not converge were not included in this process because of the instability of their parameter estimates. First, model 0 is statistically compared with model 1 using an ANOVA to assess whether the inclusion of an additional random slope significantly (in a statistical sense) increases the fit of the model (i.e., how well the model describes the data). If model 1 shows a statistically significant improvement in fit over model 0 (with a p-value less than .2), then model 1 is compared to model 2, and so forth until the best-fitting model is found. The .2 threshold was chosen over the usual .05 to be relatively liberal in allowing random slopes that improve model fit to be added to the final model.
Overall, this forward model comparison provides a systematic and best-practice approach to determining the optimal number of random slopes through a data-driven method, as merely including the maximum number of random slopes in a model runs the risk of including many parameters that do not contribute to the quality of the model.
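The forward comparison procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the reported analyses: it assumes that each model comparison is a likelihood-ratio (chi-square) test between nested models, and that the candidate models have already been fitted elsewhere. The `FittedModel` container, the function names, and the log-likelihood values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FittedModel:
    name: str
    loglik: float      # maximized log-likelihood
    n_params: int      # number of estimated parameters
    converged: bool = True


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # closed form for df = 1
    while a < df / 2.0 - 1e-9:                 # step up to df / 2
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt_p_value(simpler, richer):
    """Likelihood-ratio test of a richer model against a nested simpler one."""
    stat = max(2.0 * (richer.loglik - simpler.loglik), 0.0)
    return chi2_sf(stat, richer.n_params - simpler.n_params)


def forward_select(models, alpha=0.2):
    """Walk from model 0 upward; stop at the first step that does not improve fit."""
    candidates = [m for m in models if m.converged]  # skip non-converged models
    best = candidates[0]
    for nxt in candidates[1:]:
        if lrt_p_value(best, nxt) < alpha:
            best = nxt
        else:
            break
    return best


# Hypothetical fits, ordered by number of random slopes.
models = [
    FittedModel("model 0", -1000.0, 5),
    FittedModel("model 1", -998.0, 6),
    FittedModel("model 2", -997.0, 7, converged=False),
    FittedModel("model 3", -997.8, 8),
]
chosen = forward_select(models)
```

With these made-up values, model 1 improves on model 0 at the .2 level, the non-converged model is skipped, and model 3 does not improve on model 1, so the procedure stops at model 1.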
LME Results for the Agent Condition With Language, Orientation, and Action Potential.
Note. Predictors coded as language (Chinese = 0.5/English = −0.5), orientation (opposite orientation = 0.5/same orientation = −0.5), and action potential (can-act-agent = 0.5/cannot-act-agent = −0.5).
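To make the ±0.5 contrast coding concrete, the sketch below shows how the coefficients relate to cell means on the log-odds scale. It is a simplified illustration that assumes a balanced design, ignores random effects and the action potential factor, and uses hypothetical cell values; the function names are introduced here for the example only.

```python
CODES = {"Chinese": 0.5, "English": -0.5, "opposite": 0.5, "same": -0.5}


def coefficients_from_cells(logit):
    """logit maps (language, orientation) to a cell mean on the log-odds scale."""
    orient_cn = logit[("Chinese", "opposite")] - logit[("Chinese", "same")]
    orient_en = logit[("English", "opposite")] - logit[("English", "same")]
    lang_opp = logit[("Chinese", "opposite")] - logit[("English", "opposite")]
    lang_same = logit[("Chinese", "same")] - logit[("English", "same")]
    return {
        "intercept": sum(logit.values()) / 4,            # grand mean
        "language": (lang_opp + lang_same) / 2,          # average language difference
        "orientation": (orient_cn + orient_en) / 2,      # average orientation difference
        "language:orientation": orient_cn - orient_en,   # difference of differences
    }


def predict(coef, language, orientation):
    """Rebuild a cell mean from the coefficients and the contrast codes."""
    l, o = CODES[language], CODES[orientation]
    return (coef["intercept"] + coef["language"] * l
            + coef["orientation"] * o + coef["language:orientation"] * l * o)


# Hypothetical log-odds of an allocentric response in each cell.
cells = {
    ("Chinese", "opposite"): -0.5,
    ("Chinese", "same"): -2.0,
    ("English", "opposite"): -1.5,
    ("English", "same"): -1.8,
}
coef = coefficients_from_cells(cells)
```

Under this coding, each main-effect coefficient is the difference between the two levels of that factor averaged over the other factor, and the interaction coefficient is the difference between the two orientation effects.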
In addition, for all of the models in this study, we adopted an exploratory approach in which all statistical tests were two-sided. Even though some one-sided predictions were implied when discussing the included effects (e.g., collectivism being hypothesized to increase allocentric perspective taking), testing for competing predictions allows for a more flexible and robust strategy for complex phenomena such as perspective taking.
There was no main effect of language, with Chinese speakers producing a similar proportion of allocentric responses (24.2%) as English speakers (19.9%; see Figure 2). There was a main effect of orientation, with participants producing more allocentric responses when viewing scenes of agents with opposite orientations (29.1%) than when viewing scenes of agents with same orientations (15.0%). More importantly, there was an interaction of language and orientation, indicating that the effect of orientation was greater for Chinese speakers (23.4%) than English speakers (4.9%). No effects involving action potential were significant.

Proportion of allocentric descriptions for language, orientation, and action potential (error bars show the 95% CIs). The plotted values represent the sample means, and the confidence intervals represent the uncertainty around these sample means, with this being the case for all of the CIs in the figures of this study.
Importantly, for the no-agent (baseline) condition (see Figure 3), there was a main effect of language: When an agent was not present in the scene, Chinese speakers used allocentric perspectives (11.3%) less often than English speakers (16.9%) (β = −.93, SE = 0.39, z = −2.38, p = .017).
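The reported test statistic can be checked from the coefficient and its standard error. The sketch below assumes that z is a Wald statistic (the estimate divided by its standard error) with a two-sided p-value from the standard normal distribution; the function name is introduced here for illustration.

```python
import math


def wald_z_test(beta, se):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Values reported for the main effect of language in the no-agent condition.
z, p = wald_z_test(-0.93, 0.39)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = -2.38, p = 0.017
```

Because the inputs are the rounded values given in the text, small discrepancies in the last digit are possible in other cases.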

Proportion of allocentric descriptions for language for the no-agent condition (error bars show the 95% CIs).
Discussion
Experiment 1 confirmed Tosi et al.’s (2020) finding that the presence of an agent with an opposite orientation in a scene increased the use of allocentric perspectives to describe the location of an object. It replicated their results for English speakers, and it demonstrated the same pattern of results for Chinese speakers. Critically, this tendency to use allocentric perspectives in contexts with an agent with an opposite orientation was larger in Chinese speakers than in English speakers. In contrast, in scenes without an agent, English speakers showed a stronger tendency to use allocentric perspectives than Chinese speakers.
This overall pattern of results is consistent with English speakers being more self-focused in the task and thus paying less attention to contextual cues (i.e., if there is another perspective present) than Chinese speakers when choosing a spatial perspective, leading to relatively more allocentric perspectives in the no-agent condition and fewer allocentric perspectives in the agent-present conditions. This pattern of results also suggests that the difference in perspective-taking between groups was unlikely to be due to a difference in linguistic properties (e.g., that the allocentric perspective is more compatible with Chinese than English, which would lead to more allocentric perspective taking in Chinese speakers than English speakers across all conditions). Perspective taking was not affected by whether the agent could act on the object or not, in contrast to Tosi et al.’s findings (although the [non-significant] trend of our results was consistent, i.e., more allocentric perspective-taking when an agent with an opposite orientation could act on the object).
Experiment 2
In Experiment 2, we investigated whether speakers’ perspective taking is affected by whether or not they are using their native language. Specifically, we asked whether they are more likely to take an allocentric perspective when they use a non-native language than a native language. Additionally, we further explored the role of culture by investigating whether individual differences in collectivism affected speakers’ propensity toward allocentric perspective taking. We also examined both language nativeness and collectivism with respect to orientation in order to assess the interplay between these factors. To this end, we had native Chinese speakers describe spatial relations (via typing) using either their native (Chinese; simplified characters) or non-native (English) language, and then answer a questionnaire measuring collectivism. Note that in this experiment, the within-participant manipulation of language (Chinese vs. English) means that any observed difference in perspective taking between the two languages within each participant could not be due to cultural differences. Additionally, based on the greater use of allocentric perspectives by English- than Chinese-speaking participants in the no-agent (i.e., baseline) condition in Experiment 1, we can be confident that increased allocentric perspective taking in the presence of an opposite perspective would not be attributable to a language-specific linguistic property of Chinese.
We were interested in the effect of language nativeness on allocentric perspective taking in scenes both with and without opposite perspectives present. However, we were not interested in the relationship between action potential and language nativeness. We therefore dropped the same-orientation agent conditions (as in these conditions the participant’s and the agent’s spatial perspectives were the same, and these conditions also yielded the same rate of allocentric perspective taking as the no-agent condition, suggesting that the same-orientation agent conditions and the no-agent condition modulate perspective taking similarly). We also dropped the opposite-orientation cannot-act-agent condition, as the difference between the opposite-orientation cannot-act-agent and can-act-agent conditions did not affect perspective taking in Experiment 1. Thus, the experiment had a design of 2 (agent: opposite-orientation can-act agent vs. no-agent) × 2 (language: native vs. non-native). Chinese-speaking participants viewed scenes either with or without an opposite-orientation agent who could see (and potentially act on) the objects in the scene, together with a question written in Chinese (their native language, in one block) or in English (their non-native language, in the other block), with language order counterbalanced between participants. Participants were instructed to answer the question in the same language by typing into a text box.
If using a non-native language promotes deliberation in decision making (Keysar et al., 2012), and if this then leads to increased consideration of other perspectives, then our participants should use more allocentric responses to describe scenes in English than Chinese. And if the effects of language nativeness on perspective taking are similar to the effects of culture in Experiment 1, they should do so especially in cases that feature an opposite perspective to their own. Additionally, if the differences in spatial perspective-taking found in Experiment 1 were due to cultural differences associated with collectivism, we would expect participants with higher collectivism scores to use more allocentric responses than participants with lower collectivism scores, and this tendency should be particularly strong for scenes containing opposite perspectives (i.e., opposite-orientation can-act agent condition).
Methods
Participants
A further 70 native Chinese-speaking participants (mean age = 22.52; 19 male, 51 female) were recruited from universities in Guangzhou (southern China).
Materials
We used all items (i.e., the same object combinations in the scenes) from the no-agent condition and the opposite-orientation can-act-agent condition in Experiment 1 (i.e., excluding the same orientation conditions and the opposite orientation cannot-act condition). These two conditions were labeled in Experiment 2 as the opposite-orientation can-act agent condition and the no-agent condition.
We created two lists with 8 filler items and 24 experimental items, each consisting of 12 opposite-orientation can-act agent trials and 12 no-agent trials. Within each list, the position of each individual object was queried only once, and each object orientation was used only once as well. Between the two lists, each object orientation was used in a different condition of agent (e.g., a right-side target with the opposite-orientation can-act agent condition in List 1 and the same target on the left-side with the no-agent condition in List 2). For each list, we created a native language version (where the trial questions were in Chinese, and participants were given instructions in Chinese before starting the trials to produce their responses in Chinese) and a non-native language version (where the trial questions were in English, and participants were given instructions in Chinese before starting the trials to produce their responses in English). Participants saw both lists, one in each language, with list order counterbalanced between participants.
Procedure
The procedure was the same as Experiment 1, except that participants were instructed to produce typed descriptions in Chinese and in English (the language to be used was prompted in Chinese before each block, e.g., 请用英语回答以下问题; translation: please use English to answer the following questions). Half of the participants produced Chinese descriptions in the first block and English descriptions in the second block; the other half produced English descriptions first and Chinese descriptions second.
After the main experiment, participants completed a 14-item questionnaire in Chinese (Sivadas et al., 2008) designed to measure collectivism (see Appendix 2). Each question used a Likert scale from 1 to 9. These questions generally probed the emphasis people place on themselves versus a larger entity (e.g., I usually sacrifice my self-interest for the benefit of my group, I enjoy being unique and different from others in many ways; see Appendix 2 for full list). A collectivism score was obtained for each participant by averaging their questionnaire responses (8 items pertained to collectivism and 6 items pertained to individualism, with items pertaining to individualism reverse-coded). Participants also provided their self-reported English proficiency on a scale from 0 to 100. Participants were paid 15 Chinese Yuan (about 1.6 British pounds) after completing the experiment.
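The scoring of the questionnaire can be sketched as below. This is a minimal illustration rather than the scoring script used in the study: the positions of the individualism items are hypothetical (the actual item order is given in Appendix 2), reverse-coding is assumed to map a response r on the 1 to 9 scale to 10 − r, and the standardization step (the collectivism predictor is mean-centered and standardized in the analysis) is assumed to use the sample standard deviation.

```python
from statistics import mean, stdev

SCALE_MIN, SCALE_MAX = 1, 9


def collectivism_score(responses, individualism_items):
    """Average the item responses after reverse-coding the individualism items."""
    adjusted = [
        (SCALE_MIN + SCALE_MAX - r) if i in individualism_items else r
        for i, r in enumerate(responses)
    ]
    return mean(adjusted)


def standardize(scores):
    """Mean-center and scale the scores across participants."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


# Hypothetical layout: items 0-7 are collectivism items, items 8-13 individualism items.
INDIVIDUALISM_ITEMS = set(range(8, 14))

high = collectivism_score([9] * 8 + [1] * 6, INDIVIDUALISM_ITEMS)
neutral = collectivism_score([5] * 14, INDIVIDUALISM_ITEMS)
low = collectivism_score([1] * 8 + [9] * 6, INDIVIDUALISM_ITEMS)
z_scores = standardize([low, neutral, high])
```

A participant who fully endorses every collectivism item and fully rejects every individualism item receives the maximum score of 9, and the reverse pattern yields the minimum score of 1.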
Data Coding
This was the same as in Experiment 1.
Research Design
Experiment 2 employed a similar research design to Experiment 1 to investigate how collectivism and native/non-native language use affect perspective taking. In particular, Experiment 2 measured collectivism at the individual level and employed a within-subjects design for the factor of native/non-native language use by recruiting native Chinese speakers with English as their non-native language to take part in the study. For these bilingual participants, perspective taking was examined across two within-subjects scene conditions that featured either an agent with an opposite perspective relative to the participant who could see the objects in the scene, or no agent in the scene.
As in Experiment 1, we adapted the object-location query paradigm of Tosi et al. (2020, Experiment 3), whereby participants described the spatial relation between two objects placed on a table, either in the presence of a human agent who had a different perspective from the participant and could see the objects on the table, or with no human agent present. For each trial, participants viewed a scene of two objects on a table and, depending on the trial, with or without an agent in the scene. Participants then answered a question of the form On which side of the X is the Y? in their native language (Chinese) or their non-native language (English; this within-subjects condition used a blocked design, whereby participants responded either in their native language in the first half of the trials and in their non-native language in the second half, or vice versa). Responses were coded as egocentric, allocentric, or other, with the primary dependent variable being whether participants responded with an egocentric versus allocentric description.
Results
We excluded five participants for low accuracy (less than 90% correct) on the filler trials. For the remaining 65 participants, following the same exclusion criteria as in Experiment 1, we further excluded 24 “other” responses (out of 3,094 target responses).
An initial LME analysis (Table 2) was conducted with the remaining responses, using agent, nativeness, collectivism (mean-centered and standardized), and the order of the language blocks (termed order) as interacting predictors. There was a main effect of agent: Participants were more likely to use an allocentric perspective when there was an (opposite-orientation) agent (17.3%) than when there was no agent (11.3%; see Figure 4). There was also a main effect of nativeness: Participants were more likely to use an allocentric perspective when they used their non-native language (20.3%) than when they used their native language (8.3%). The interaction between agent and nativeness was significant (see Figure 4); separate analyses revealed that the effect of agent occurred for the native language responses (β = −3.52, SE = 1.46, p < .05) but not for the non-native language responses (β = .11, SE = 0.51, p > .05). In addition, there was a three-way interaction among agent, nativeness, and collectivism (and likewise a significant interaction between agent and collectivism). Separate analyses showed that there was a trend toward an interaction between nativeness and collectivism for the no-agent scenes (β = −1.25, z = −1.66, p = .098) but no significant interaction for the opposite-orientation can-act agent scenes (β = .73, SE = 1.02, p = .477; see Figure 5). Crucially, there was no main effect of collectivism (β = .21, SE = 0.51, p = .683) nor an interaction between collectivism and agent (β = .31, SE = 0.29, p = .290), suggesting that collectivism does not increase allocentric perspective taking in general, nor increase allocentrism specifically in scenes with an opposite perspective present.
LME Results of Experiment 2 for Agent, Nativeness, Order, and Collectivism.
Note. Predictors coded as agent (opposite orientation can-act agent = 0.5/no-agent = −0.5), nativeness (native = 0.5/non-native = −0.5), and order (Chinese first = 0.5/English first = −0.5).

Proportion of allocentric descriptions for agent and nativeness (error bars show the 95% CIs).

Proportion of allocentric responses for agent, nativeness, and collectivism.
The order predictor had significant two-way interactions with both agent and nativeness (as well as a significant three-way interaction among nativeness, order, and collectivism). To rule out the possibility that the effects involving agent, nativeness, and collectivism were due to the blocked design of Experiment 2, a model was built using only the first language block for each participant (i.e., only native language responses for participants who used their native language in the first block, and only non-native language responses for participants who used their non-native language in the first block). Because this analysis examines only the first block for each participant, no ordering effects are possible. Thus, this first-block-only model included only agent, nativeness, and collectivism as interacting predictors (the full model results can be found in Appendix 3). This model produced comparable results to the model that included order as an interacting predictor: All the significant effects from the main analysis involving agent, nativeness, and collectivism remained significant (i.e., the main effects of agent and nativeness, the interaction between agent and nativeness, and the three-way interaction among agent, nativeness, and collectivism), with the addition of a significant main effect of collectivism and a significant interaction between agent and collectivism. These results indicate that the effects of interest found in Experiment 2 are stable and are not primarily driven by language order effects.
In order to further examine the effects of language-specific properties on perspective taking, we conducted another LME analysis (Table 3) that compared the English responses in the no-agent condition in Experiments 1 and 2, with nativeness as a predictor (i.e., native for the Experiment 1 English responses and non-native for the Experiment 2 English responses). There was no main effect of nativeness: English-speaking participants were as likely to use an allocentric perspective for no-agent scenes when responding in their native language (16.9%) as Chinese-speaking participants responding in their non-native language (18.1%; see Figure 6). We also compared the Chinese responses in the no-agent condition in Experiments 1 and 2 (Table 4 and Figure 7); participants were more likely to use an allocentric perspective in Experiment 1 (11.3%) than Experiment 2 (4.5%). However, a caveat must be made regarding these between-experiment analyses, in that Experiment 1 contained relatively more trials of scenes with an agent (two-thirds of trials) than Experiment 2 (half of trials); it is possible that this difference might have affected the use of allocentric perspectives on no-agent trials. That is, these results may reflect increased priming of allocentric perspectives in Experiment 1, which could mask the effects of language nativeness for the Chinese participants by increasing allocentric perspective taking in Experiment 1 to a similar extent as non-native language use increases allocentrism in Experiment 2.
LME Results for No-Agent English Responses With Nativeness.
Note. Predictor coded as nativeness (native = 0.5/non-native = −0.5).

Proportion of allocentric descriptions for nativeness for no-agent English responses (error bars show the 95% CIs).
LME Results for No-Agent Chinese Responses With Experiment.
Note. Predictor coded as experiment (Experiment 1 = 0.5/Experiment 2 = −0.5).

Proportion of allocentric descriptions across the experiments for no-agent native Chinese responses (error bars show the 95% CIs).
Discussion
In Experiment 2, Chinese speakers undertook the same spatial description task as in Experiment 1, in both their native language (Chinese) and a non-native language (English). They produced more allocentric descriptions for scenes with an opposite-orientation can-act agent compared to scenes without an agent (replicating the critical finding from Experiment 1). These results provide further evidence that, when describing spatial relations, speakers can be influenced by the perspective of an agent in a scene. Speakers also produced more allocentric descriptions when responding in a non-native language than their native language, suggesting that non-native language use induces greater allocentric perspective-taking. Note that language nativeness was manipulated within-participants, and so the increase in allocentric perspective-taking in the non-native language could not be due to differences between speakers’ cultural characteristics such as collectivism. Furthermore, the finding of more allocentric perspectives when using English than Chinese suggests that if any context-related priming of cultural differences did occur (e.g., if the use of Chinese primed collectivism), it was outweighed by factors relating to language nativeness.
In addition, differences in allocentric perspective taking between opposite orientation can-act-agent and no-agent scenes were smaller in non-native language descriptions than in native language descriptions, indicating that the perspective-cueing effect of an agent is weaker when using a non-native language than a native language. However, there was no difference in perspective-taking between non-native and native responses in English when no agent was present in the scene. Higher collectivism in individuals was not associated with an increase in allocentric perspective taking overall, or in the particular case of scenes with an opposite perspective present. However, collectivism was associated with an increase in allocentric perspective taking in the very specific context of using a non-native language to describe scenes without an agent.
General Discussion
In two experiments that investigated spatial perspective taking in Chinese and English language production, we found that perspective taking can be influenced by both speaker-external factors such as the presence of a person in the scene and speaker-internal factors such as culture and language nativeness. Although egocentric responses constituted the majority of responses in both experiments, speakers were more likely to use an allocentric perspective in the presence of an agent with an opposite perspective from their own than an agent with the same perspective as their own or no agent (Experiments 1 and 2). Importantly, Chinese native speakers were more likely to use an allocentric perspective than English native speakers when viewing scenes with an agent with an opposite orientation (but English native speakers used more allocentric perspectives than Chinese native speakers when there was no agent in the scene; Experiment 1). Moreover, using a non-native language increased allocentric perspective taking overall compared with using a native language (Experiment 2) in a way that was not due to language-specific properties (see section “Culture”). In contrast to these group-level effects, at an individual level higher collectivism was not associated with an increase in allocentric perspective taking overall, or when describing scenes in which an agent was present (Experiment 2). The following discussion explores how the presence and positioning of a person in a scene, culture of the participants, and language nativeness contribute and interact to affect spatial perspective taking.
Perspective Cueing
The increase in allocentric perspectives when describing scenes containing people with opposite perspectives is consistent with earlier findings that viewing agents with opposite perspectives can induce allocentric perspective taking (e.g., Furlanetto et al., 2013; Tosi et al., 2020; Tversky & Hard, 2009). However, our findings suggest that this increase in allocentric perspectives is not affected by whether or not the agent with an opposite perspective can easily act on the relevant object (in contrast to Tosi et al., 2020), as viewing an agent with an opposite perspective who could not act on the object induced similar increases in allocentric perspective taking compared to an agent with an opposite perspective who could do so (Experiment 1). This suggests that the mere presence of an agent with an opposite perspective is sufficient to induce allocentric perspective taking, and that action potentiality (i.e., being able to see and easily act on the relevant objects) is not necessary to affect allocentrism. It is possible that this study may have lacked the experimental power to detect any effects of action potentiality (and indeed the numerical tendencies give tentative support to the notion that people pay increased attention to the perspective of agents that can see and easily act on the relevant objects). Regardless, this study shows that the perspective of an agent has larger effects on perspective taking than action potentiality.
Culture
Chinese native speakers used allocentric perspectives more often than English native speakers when cued by an opposite perspective. In principle, it could be the case that, relative to English, Chinese has some linguistic properties that promote allocentric responses. However, two observations argue against this linguistic-property account. First, when viewing scenes without an agent (i.e., containing no cue to another perspective), Chinese native speakers actually exhibited less allocentric perspective taking behavior than English native speakers (Experiment 1), thus suggesting that use of Chinese is not driving increased allocentric perspectives. Second, in Experiment 2 participants provided more allocentric descriptions when they used English than Chinese, contrary to the prediction of the linguistic-property account (according to which we would have expected more allocentric responses in Chinese than in English).
Therefore, we suggest that cultural differences – likely not related to collectivism, but rather to other factors such as theory of mind (ToM) or high/low context differences – are more likely than language differences to have caused the divergence in perspective taking between Chinese and English speakers, supporting a social-cultural account for variation in perspective taking. For example, various findings suggest that Asians on average have enhanced ToM abilities compared to westerners (Atkins et al., 2016; Cohen & Gunz, 2002; Wu & Keysar, 2007), which may in part explain differences in perspective taking between Chinese and English speakers. ToM reflects the ability to understand others’ mental states (Premack & Woodruff, 1978), and therefore may aid in simulating others’ spatial perspectives. However, other studies have found similar profiles between Chinese and English speakers in some specific ToM abilities such as spatial perspective taking (Wang et al., 2019), belief attribution (Bradford et al., 2018), and children’s false belief reasoning (Sabbagh et al., 2006). Therefore, while ToM may have an influence on perspective taking, it is unclear whether ToM differences at the group level (if any) and at the individual level affected spatial perspective taking in our study.
Another cultural factor that may explain perspective taking differences between Chinese and English speakers is cross-cultural contexting theory (Hall, 1976; Hall & Hall, 1990), which postulates that Asian cultures are “high-context cultures,” whereby the meaning conveyed in communication is more implicit and context driven. In contrast, Western cultures are viewed as “low-context cultures,” whereby communication is more explicit (i.e., expressed directly via language) and less context driven. Therefore, in the perspective-taking task used in this study, participants from a high-context culture (i.e., Chinese native speakers) may have contextualized the communication scenario as one that implicitly referenced the needs of the agent, leading these participants to be more likely to use the perspective of the agent (i.e., an allocentric perspective in cases where the agent had an opposite perspective). In contrast, participants from a low-context culture (i.e., English native speakers) may have taken a more explicit approach to the task by answering the question as literally stated and not taking into account the broader context with an opposite-facing agent, leading to more egocentric perspective taking compared to Chinese native speakers.
But why does collectivism (measured in our study at an individual level) not seem to increase the tendency to take an allocentric perspective? Collectivism in other domains has been demonstrated to have a variety of behavioral and cognitive consequences, for example affecting group reward allocation to favor giving rewards to others (Hui et al., 1991), increasing group-interdependent self-conceptualizations (Eaton & Louw, 2000), and enhancing holistic visual processing in detecting global shapes in scenes with cluttered backgrounds (Chua et al., 2021). These findings relate back to the core principle of collectivism, namely an increased emphasis away from the self and toward the group. However, these effects of collectivism are typically observed in (and may only arise in) tasks with more direct references to others and/or consequential behaviors (e.g., allocating money to others: Hui et al., 1991; tasks involving self-descriptions: Eaton & Louw, 2000). These tasks may invoke greater awareness of one’s behavior toward or relationship with other people, allowing collectivism to play a larger role in modulating behavior. In contrast, the task in our study made no reference to the agent and presented participants with relatively neutral, non-consequential questions. Thus, future research should examine if collectivism plays a role across a range of perspective taking tasks that vary in degree of references and consequential behaviors toward others.
Nonetheless, one marginal effect of collectivism was found when conducting follow-up analyses on the three-way interaction among agent, nativeness, and collectivism, in that more collectivist participants showed a weak tendency to be more allocentric when using a non-native language in scenes with no agent (Experiment 2). This may suggest that collectivism could have an effect on perspective taking but only in very limited circumstances. However, this effect was not originally predicted (and it was only marginally significant), and also goes against the finding in Experiment 1 that culture had an effect on perspective taking primarily in scenes featuring an agent with an opposite perspective. Therefore, while collectivism may play a small role in perspective taking, these results suggest that other cultural and scene-external factors are more influential in modulating perspective taking.
Language Nativeness
We found that participants in Experiment 2 took an allocentric perspective more often when using a non-native language (English) than a native language (Chinese). This pattern could arise because using a non-native language is more cognitively demanding than using a native language (e.g., Costa & Sebastián-Gallés, 2014), and may therefore lead non-native language users to shift toward slower, more deliberative thinking in many decision-making tasks (e.g., Keysar et al., 2012), in line with a dual-process account of perspective taking. As a result, using a non-native language may increase awareness of different perspectives in spatial description compared to using a native language, and thus lead to increased allocentric responses. However, this language-nativeness effect was reduced in scenes featuring an agent with an opposite perspective compared to scenes with no agent present, suggesting that the presence of a person in the scene is a salient cue for an allocentric perspective that may have overridden some of the effects of non-native language use on perspective taking.
This pattern of results regarding language nativeness may also reflect the relatively straightforward nature of our task, in which each trial involved only a short sentence followed by a single-word response to complete the task. This simplicity may have allowed participants to allocate the necessary cognitive resources toward initiating allocentric perspective-taking (Segalowitz & Hulstijn, 2005). In contrast, other, more demanding tasks might yield a different pattern: Because non-native language use is associated with increased cognitive demand, it is plausible that using a non-native language would inhibit the ability to take other perspectives, thereby reducing instead of increasing allocentric perspective taking. The non-native allocentrism effect that we found in our study may therefore arise only in tasks that use relatively few cognitive resources. Future research should examine this in more complex tasks to assess the extent of the effects of non-native language use in spatial perspective taking.
Language nativeness may also act in concert with other factors to affect perspective taking. In our study, participants who were higher in collectivism tended to use more allocentric perspectives than participants who were lower in collectivism when using a non-native language to describe scenes that did not include an agent with an opposite perspective – but this pattern was not found when they used their native language or when they described scenes that included an agent with an opposite perspective. We tentatively suggest that perspective taking is particularly susceptible to the salient cue of another person’s perspective, and that the presence of such a cue will typically override the effects of other factors such as language nativeness and collectivism in isolation. However, the co-occurrence of non-native language use and high collectivism may together provide additive effects and thus create a sufficiently strong influence to boost allocentric perspective taking. The results provide some preliminary evidence – requiring further empirical support – that spatial perspective taking may be modulated by a complex interplay between language nativeness and collectivism, while also highlighting the relative primacy of the presence of a person with an opposite perspective in promoting allocentric perspectives.
It should be noted that the increase in allocentric perspective taking in a non-native language could in theory be due to higher error rates (i.e., confusing left and right) in non-native language use, which could produce higher rates of apparently allocentric responses. However, our findings as well as past research provide evidence against this alternative explanation. In a statistical model of Experiment 2 that includes self-rated English proficiency as a covariate, this covariate is not significant (p = .428), and the effect of increased allocentrism when using a non-native language remains significant (p < .001; nor do any of the other results change in terms of statistical significance). Moreover, previous research using a more linguistically challenging task (involving moral decision making after reading short dilemma-based scenarios) did not find that non-native language use resulted in more random/mistaken responding than native language use (Costa et al., 2014). Overall, these findings suggest that the cognitive differences between native and non-native language use described by the dual-process account, rather than differences in linguistic error, modulate perspective taking.
A General Mechanism of Spatial Perspective Taking
Overall, across the dimensions of different perspectives, culture, and language nativeness, the choice to take an allocentric rather than egocentric perspective seems to have a common basis, stemming crucially from an increase in attention directed toward perspectives besides one’s own. First, being in the presence of a person with an opposite (or simply different) perspective is a salient reminder that other perspectives exist, and indeed in such contexts people seem to take notice of another person’s perspective and use it in subsequent spatial perspective taking. This cueing effect may have an automatic component, as it happens even without mention of another perspective or an explicit associated task-related goal, and even in the mere presence of inanimate objects that do not have an inherent visual perspective but suggest a perspective that a person could potentially have (e.g., a chair; Quesque et al., 2020). In addition, this effect also seems to have a goal-oriented component, as allocentric perspective taking increases with explicit mention of the other person (Tversky & Hard, 2009), when speakers are asked to provide clear instructions for the other person (Mainwaring et al., 2003), and when the other person is struggling with the task or is believed to have lower spatial cognitive abilities (Schober, 2009). Similarly, the effects of non-native language use and culture on spatial perspective taking may also be driven by an increase in attention directed toward other perspectives (e.g., Costa et al., 2014; Wu & Keysar, 2007) that plausibly also reflects both automatic and goal-oriented mechanisms (although probing these mechanisms is beyond the scope of this study).
While we have shown that non-native language use and culture affect spatial perspective taking in language production, we hypothesize that these effects may also apply more broadly, such as in comprehension of spatial language. For example, either being immersed in Eastern cultures or listening to a non-native language may cause people to be more attentive to the perspective of a partner, similar to how Chinese participants are more attentive to others’ perspectives than American participants in a communication task (Wu & Keysar, 2007). Thus, these factors may affect perspective taking in comprehension by increasing the rate of allocentric interpretations of spatial descriptions. These effects may also extend to other (non-spatial and/or non-linguistic) forms of perspective taking, such as in judging social pain and emotions (e.g., Atkins et al., 2016; Cohen & Gunz, 2002), and inferring mental states of others (e.g., Wu & Keysar, 2007).
Conclusion
In two experiments investigating spatial descriptions in Chinese and English, we found that cultural factors and language nativeness, as well as the presence of another person, affect perspective taking in language production. Overall, scenes featuring a person with an opposite perspective increased allocentric perspective taking. This effect was stronger in native Chinese speakers than in native English speakers, a difference that did not appear to be associated primarily with cultural factors related to collectivism. Chinese speakers also exhibited increased allocentric perspective taking when using a non-native rather than a native language. We conclude that spatial perspective taking in language production is modulated by a complex interplay involving both speaker-related and contextual factors that include language nativeness, cultural factors beyond collectivism, and the presence of another agent.
Appendices
Appendix 1
The target and filler items.
| Item type | Question (English translation) |
|---|---|
| Target | On which side of the apple is the bottle? |
| Target | On which side of the salt is the can? |
| Target | On which side of the cup is the lid? |
| Target | On which side of the tennis ball is the ashtray? |
| Target | On which side of the bagel is the jar? |
| Target | On which side of the pineapple is the candle? |
| Target | On which side of the bowl is the pepper? |
| Target | On which side of the plate is the glass? |
| Target | On which side of the sugar is the wine? |
| Target | On which side of the tape is the pear? |
| Target | On which side of the squash is the vase? |
| Target | On which side of the wool is the toilet paper? |
| Target | On which side of the bottle is the apple? |
| Target | On which side of the can is the salt? |
| Target | On which side of the lid is the cup? |
| Target | On which side of the ashtray is the tennis ball? |
| Target | On which side of the jar is the bagel? |
| Target | On which side of the candle is the pineapple? |
| Target | On which side of the pepper is the bowl? |
| Target | On which side of the glass is the plate? |
| Target | On which side of the wine is the sugar? |
| Target | On which side of the pear is the tape? |
| Target | On which side of the vase is the squash? |
| Target | On which side of the toilet paper is the wool? |
| Filler | What color is the bowl? |
| Filler | What color is the coffee cup? |
| Filler | What color is the box? |
| Filler | What color is the desk lamp? |
| Filler | What color is the charger? |
| Filler | What color is the shoe? |
| Filler | What color is the mixer? |
| Filler | What color is the glove? |
| Filler | Is the scarf larger or smaller than the glasses? |
| Filler | Is the electric kettle larger or smaller than the coffee maker? |
| Filler | Is the grater larger or smaller than the bag? |
| Filler | Is the blender larger or smaller than the cup? |
| Filler | Is the toothpaste larger or smaller than the sewing machine? |
| Filler | Is the phone larger or smaller than the book? |
| Filler | Is the umbrella larger or smaller than the lipstick? |
| Filler | Is the rope larger or smaller than the boots? |
Appendix 2
The collectivism questionnaire (Sivadas et al., 2008).
Ethical Considerations
Ethical approval was granted for this study by the Survey and Behavioural Research Ethics Committee at the Chinese University of Hong Kong (ethics code SBRE-22-0464).
Consent to Participate
All participants gave their informed consent to take part in the study.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a General Research Fund grant (Project Number: 14600220).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
