Abstract
Given the rapid advancement of large-scale language models, AI models, like ChatGPT, are playing an increasingly prominent role in human society. However, to ensure that AI models equally benefit human society, we must fully understand the cultural stereotypes and biases that AI models may exhibit in interacting with humans. Study 1 first measured ChatGPT in 13 dimensions of cultural values; it was revealed that ChatGPT's cultural value patterns are dissimilar to those of various countries/regions worldwide. In Study 2, we analyzed the performance of ChatGPT-3.5 and ChatGPT-4.0 across thirteen decision-making tasks involving human interactions from different countries or regions. The results showed that ChatGPT 3.5 exhibits apparent cultural stereotypes in most decision-making tasks and shows significant cultural bias in third-party punishment and ultimatum games. However, ChatGPT 4o, while reinforcing stereotypes, showed reduced cultural bias on these tasks. The findings indicate that, compared to humans, ChatGPT exhibits differential cultural value orientation, and it also shows cultural biases and stereotypes in interpersonal decision-making, and the version upgrade increases the sensitivity of chatbots but weakens the potential bias. Furthermore, Study 3 created four kinds of prompt strategies and applied them to decision-making tasks. It also found that the four prompt strategies can effectively reduce the generation of ChatGPT's stereotypes and biases. Future research should emphasize enhanced technical oversight and augmented transparency in the database and algorithmic training procedures to foster more efficient cross-cultural communication and mitigate social disparities.
Introduction
Modern artificial intelligence (AI) has been applied to realize various human–computer interaction functions, including dialogues, providing suggestions, composing poems, and proving theorems. This highly humanized performance has garnered widespread public attention (Gilardi et al., 2023; Trautman et al., 2025), and human–computer interaction is gradually becoming common and may replace the interaction process between people in the future. Previous studies have primarily examined the degree of humanization of ChatGPT from algorithmic and data-centric perspectives, investigating whether ChatGPT can understand and process information akin to humans (Binz & Schulz, 2023; Coda-Forno et al., 2024; Huang et al., 2023; Mei et al., 2024), as well as the potential advantages and challenges associated with such humanization (Acerbi & Stubbersfield, 2023; Bubeck et al., 2023; Duan et al., 2025; Zhang et al., 2025). However, beyond cognitive abilities, the sociocultural characteristics of AI models are particularly important in ensuring equality and fairness in human–machine interactions. Considering cultural related materials ingrained in the training data, ChatGPT may have inadvertently acquired certain cultural biases or stereotypes during its training, thus posing challenges to human–computer relationships, interpersonal relationships, and intergroup relationships. Hence, this study utilizes multidimensional indicators to systematically analyze the cultural values, potential biases, and stereotypes inherent in ChatGPT and attempts to propose feasible prompt interventions to reduce the cultural biases generated by ChatGPT.
In the learning process, since the large language model learning contains various viewpoints, beliefs, and attitudes of different societies, it may unconsciously absorb this potential information and form its unique cultural values. Cultural values are the norms that determine a particular culture's social order and profoundly impact individual psychology and behavior. Studies have shown that individuals from different cultural backgrounds exhibit unique psychological characteristics and behavioral patterns from the values and societal norms emphasized within their respective cultures (Henrich et al., 2001; McCrae & Terracciano, 2005). Previous studies have found that AI systems often reflect the dominant cultural values in their training data when making moral and ethical decisions (ÓhÉigeartaigh et al., 2020). Thus, ChatGPT may have developed specific intrinsic cultural values. This could have potentially negative consequences. For example, if ChatGPT shows a different cultural value from the user, it may have controversial answers or offensive suggestions, which will cause user resistance or distrust (Bender et al., 2021), or subtly influence users’ values and affect their physical and mental health. Therefore, this study first collected data on ChatGPT's performance in multiple cultural dimensions through a questionnaire survey to explore its cultural values.
The cultural values embedded in ChatGPT shape its broad cognitive understanding of different cultural groups (Ornelas et al., 2023). This generalized understanding often leads to the formation of cultural stereotypes. Cultural stereotypes refer to expectations about the motivations, behaviors, and emotions of others (Stewart et al., 1979), and ChatGPT may develop oversimplified views of certain cultural groups based on limited information sources. These expectations can further give rise to sociocultural biases, which manifest as optimistic or pessimistic attitudes toward specific social and material contexts (Ostroot & Snyder, 1985). Since sociocultural biases often lack explicit counter-norms, they are more difficult to identify than other forms of bias, such as gender or racial biases (Kleinman & Benson, 2006; Nosek et al., 2002). Currently, ChatGPT is widely used across various fields. However, if its responses reflect stereotypes and biases, it may undermine the neutrality and fairness of interactions, exacerbate misunderstandings, and lead to group conflict or societal divisions (Olli & Swedlow, 2023; Pursiainen, 2022; Yang et al., 2023).
Recent research has begun to systematically examine cultural biases in large language models, establishing important foundational knowledge in this emerging field. Atari et al. (2023) conducted pioneering work analyzing cultural values in GPT models, demonstrating that these systems exhibit culture-specific moral and value orientations that reflect patterns from their training data. Building on this foundation, Tao et al. (2024) provided empirical replications showing consistent cultural alignment patterns across different LLM versions, confirming the persistence of these biases across model iterations. Additionally, Zewail et al. (2024) extended this line of inquiry by examining moral stereotyping behaviors across multiple LLM architectures, revealing systematic patterns of cultural stereotyping that vary across different AI systems.
While these studies have established the existence of cultural biases in LLMs, several important gaps remain in our understanding of these phenomena. First, most prior work has focused on identifying the presence of cultural biases without systematically examining the multidimensional nature of cultural values and their interrelationships within AI systems. Second, the relationship between an AI system's cultural value orientations and its behavioral manifestations in decision-making contexts remains underexplored. Our study aims to enrich current understanding by providing a comprehensive multidimensional assessment of ChatGPT's cultural values, examining how these values translate into biased behaviors in social decision-making tasks. Therefore, we employed classic behavioral decision-making paradigms to assess ChatGPT's potential cultural stereotypes and biases. Given that stereotypes can be reflected in predictions about the behaviors of specific cultural groups, we ask ChatGPT to simulate decision-making across various nationalities. This approach has been shown to effectively enhance the alignment between model outputs and the cultural contexts of various countries or regions (Tao et al., 2024). In contrast, cultural bias refers to attitudes toward particular cultural groups, so we ask ChatGPT to simulate interactions with individuals from different countries to observe its attitudes and behavioral preferences. Additionally, since stereotypes and biases may stem from the specific cultural values embedded in ChatGPT, this study also explores the relationship between ChatGPT's cultural stereotypes and biases and the cultural values of different nations.
Furthermore, in addition to looking at ChatGPT's cultural value, bias, and stereotypes in each dimension, understanding the high-dimensional representation patterns of ChatGPT in these features may be even more important. Representational similarity analysis (RSA) initially originated in systems neuroscience and is a specialized multivariate pattern analysis method that reflects the representation of concepts or stimulus units in a high-dimensional space through pairwise comparisons (Haxby et al., 2014; Huang et al., 2022; Luo et al., 2023; Peng & Luo, 2021, 2022). Previous studies have often explored the values of ChatGPT on a single dimension, such as the individualism–collectivism cultural value, neglecting the multidimensionality and complexity of culture. Studying cultural variables solely from a single dimension is insufficient to capture the full extent of culture values exhibited by ChatGPT, thus necessitating the integration of culture values across different dimensions (Luo et al., 2024). One advantage of RSA is its ability to extract the degree of association between different dimensions, thereby constructing global feature patterns of object variables (Yuan & Luo, 2024). Therefore, RSA analysis can help us integrate ChatGPT's overall culture profile. Additionally, RSA enables the quantification of global pattern information and objective comparisons of pattern features across different scales. Consequently, we can directly compare the differences between ChatGPT and data from different scales regarding overall patterns. In summary, employing pattern analysis may offer us novel and complementary perspectives.
Given the cultural bias developed by ChatGPT, an urgent question to be explored is how to effectively mitigate the prejudice and stereotypes that may arise during the interaction between individuals and large language models. Previous studies have shown that integrating manual input, feedback, or supervision in developing and deploying large language models can help developers mitigate the bias of generative large language models (Ferrara, 2023). However, the above process is mainly aimed at the model development process, and it is challenging to continue to play a robust role in various human–computer interaction processes. As a generative large language model, ChatGPT has strong anthropomorphism (Waldrop, 2024); that is, it can simulate its specific behavior based on the assumed identity (Motoki et al., 2023). This strong imitation ability reflects ChatGPT's understanding and inference of key information in the conversation process. Therefore, an important way to reduce the bias generated by ChatGPT is to apply bias-avoiding words and phrases in human–computer interaction. In this study, we proposed standardized questioning methods with different directions. We used them with the classic behavioral decision-making paradigm to explore effective models for reducing ChatGPT cultural bias during interpersonal interaction.
This study aims to explore the cultural values, biases, and stereotypes of ChatGPT and to find appropriate prompts when interacting with ChatGPT (Figure 1). First, Study 1 measures ChatGPT's value orientations across different cultural dimensions to investigate its value preferences. Subsequently, Study 2 employs complex decision-making experiments to examine ChatGPT's decision tendencies across various cultural contexts, analyzing its potential cultural biases and stereotypes, and their relationship with underlying cultural values. Furthermore, based on RSA analysis, the study explores ChatGPT's overall value patterns. Finally, Study 3 proposes and compares the effects of different questioning styles on the values and cultural biases of ChatGPT. In particular, due to the continuous iteration of ChatGPT versions, we measured the responses of ChatGPT 3.5 and ChatGPT 4o in the decision experiment to explore the potential changes caused by the release update.

The research framework of the current study.
Study 1
Culture is a complex research topic, representing the integrated manifestations of human societal customs, norms, and ways of life across various contexts. To explore ChatGPT's cultural values, we selected 13 common cultural values and measured ChatGPT based on scales. Additionally, we collected data on cultural values from multiple countries/regions to compare with ChatGPT scores.
Methods
Scales and data collection of multicultural values
The specific dimensions and measurement contents are as follows:
Individualism–collectivism: It is one of the cultural dimensions proposed by Hofstede (Hofstede, 2001), which refers to whether society as a whole emphasizes individual interests or collective interests. Our study used three items from the World Values Survey (WVS) to calculate the level of individualism–collectivism in 65 countries/regions (Hamamura, 2012; Inglehart & Oyserman, 2004). Independent–interdependent self-construal: Self-construal is the embodiment of cultural values at the individual level, indicating how individuals define and attribute meaning to themselves. Independent self-construal emphasizes individual autonomy and self-determination, focusing on individual rights and independent thinking. In contrast, interdependent self-construal highlights the interdependence between individuals and their social environment, emphasizing collective goals and social relationships (Markus & Kitayama, 2014). Our study utilized scores of independent and interdependent self-construal from 29 countries/regions measured by Fernández et al. (2005). Relationship mobility: Relationship mobility refers to the individual's perceived ease of establishing new interpersonal relationships and terminating old interpersonal relationships in their context (Oishi & Graham, 2010). Our study used relational mobility data for 39 countries/regions measured by Thomson et al. (2018).
In addition, we selected nine cultural dimensions from the Global Leadership and Organizational Behavior Effectiveness (GLOBE) project. These dimensions include uncertainty avoidance, power distance, institutional collectivism, in-group collectivism, gender differentiation/egalitarianism, assertiveness, performance orientation, future orientation, and humane orientation (House et al., 2004).
The first six dimensions are derived from the cultural dimension proposed by Hofstede (Hofstede, 2001). Uncertainty avoidance, power distance, and future orientation correspond to uncertainty avoidance, power distance, and long-term orientation in Hofstede's cultural dimension. Institutional collectivism and in-group collectivism measure the level of individualism–collectivism from the social and within-group (family or organization) levels, corresponding to individualism–collectivism in Hofstede's cultural dimension, which the higher the score, the more inclined to collectivism. Moreover, the GLOBE project divides the masculinity–femininity dimension of Hofstede's cultural dimension into gender differentiation/egalitarianism and assertiveness. In addition, the GLOBE project has added humane orientation and performance orientation dimensions (Venaik & Brewer, 2016).
Measurement method of ChatGPT's cultural values
In each cultural dimension, the measurement method for the cultural values of ChatGPT follows a standardized format when asking about ChatGPT. The format follows: “If you are an ordinary individual in society,” followed by the questionnaire question and concluding with “Please be sure to select one of the following options,” along with the questionnaire's scoring criteria.
As shown in Figure 2, to assess the relational mobility, the following questions were posed to ChatGPT 3.5.

Example of standardized input and typical ChatGPT 3.5 output when responding to a multi-item cultural values scale (relational mobility).
Utilizing the responses provided by ChatGPT, scores were derived for each scale. In cases where ChatGPT's response did not generate a score, we will modify the questioning until it yields a score.
Results
Figure 3 shows the scores of ChatGPT and different countries/regions in each dimension of cultural values. (See Table S1 for details) In terms of independent–interdependent self-construal, ChatGPT ranks at 100% for independence and 36.67% for interdependence, indicating a stronger tendency towards interdependent self-construal. This implies that ChatGPT tends to construct self-awareness from a social relational perspective. In the dimension of gender differentiation/egalitarianism, ChatGPT scores first among 60 regions/countries, suggesting a higher inclination towards gender equality than the overall level.

Furthermore, ChatGPT ranks 100% in assertiveness, emphasizing interpersonal relationships and cooperation. The score ranking of gender differentiation/egalitarianism and assertiveness of ChatGPT shows that ChatGPT is more feminine. Additionally, in the power distance value, ChatGPT's score ranks at 1.61%, and in relationship mobility, ChatGPT's score ranks at 10%, indicating that ChatGPT values equality and freedom in interpersonal communication.
In the dimension of individualism–collectivism cultural values, ChatGPT's collectivism level ranks at 16.92%, and in the GLOBE's institutional and in-group collectivism cultural values scores rank at 1.61% and 17.74%, respectively, which is high-level among all countries/regions. These results suggest that in Hofstede's and GLOBE's dimensions of individualism–collectivism national culture, ChatGPT exhibits a higher tendency towards collectivism.
ChatGPT ranked first in both humane orientation and performance orientation, indicating that it not only pays attention to treating others well but also encourages others to achieve achievements. In addition, ChatGPT ranks 96.24% out of 60 countries/regions in the uncertainty avoidance value, indicating a strong tendency towards risk avoidance. In the future orientation value, ChatGPT ranks 23.81%, suggesting a tendency towards rational future planning.
Overall, ChatGPT demonstrates a cultural value orientation towards collectivism, emphasizing interpersonal relationships and concern for others. Simultaneously, ChatGPT also prioritizes equality and freedom in interpersonal relationships.
Furthermore, to explore which countries’ cultural characteristic patterns are similar with ChatGPT's representation patterns in multiple cultural dimensions. We conducted RSA by constructing representation similarity matrices for ChatGPT and the 60 countries/regions on the nine dimensions of the GLOBE project, respectively (Table 1). The representation similarity was computed using both Euclidean and Canberra distances. Subsequently, Mantel tests were performed to explore the correlation between the representational similarity matrix of ChatGPT and each country/region. As shown in Table 1, there was no significant similarity between ChatGPT and 60 countries/regions in representation patterns on nine value dimensions (
The representational similarity between ChatGPT and 62 regions of 60 countries on GLOBE cultural values.
In addition, to explore whether the representation patterns of differences in cultural values between countries is associated with the representation patterns of ChatGPT in multiple cultural value dimensions, we construct a representation similarity matrix of the score differences among 60 countries/regions in the nine dimensions of GLOBE, which is a second-order level RSA matrix (Figure 3I). First, in the regional dimension, the cross-country cultural values pattern matrix under each cultural values dimension was constructed. Then, in the cultural values dimension, the cross-values second-order matrix was built, which reflects the similarity pattern of 60 countries/regions in the nine dimensions of the GLOBE project. Each cell in the matrix indicates the similarity of distribution patterns of cultural value scores between every pair of cultural value dimensions across countries/regions. The comparison of the similarity between ChatGPT and the above second-order representation similarity matrix revealed no significant similarity between the two matrices (
Study 2
To evaluate whether ChatGPT exhibits cultural stereotypes and biases, we operationally define cultural stereotype as systematic behavioral variations in decision-making when the model is prompted to simulate ordinary individuals from different national or cultural backgrounds. That is, if the model generates distinct responses under otherwise identical task conditions solely due to the national identity specified in the prompt, such consistent variation indicates the presence of internalized cultural stereotypes. Conversely, we define cultural bias as asymmetries in ChatGPT's behavior toward individuals from different cultural groups during social interaction tasks—for example, differential levels of trust, fairness, or punishment toward interactants from varying countries. Both constructs are quantitatively assessed across a standardized set of decision-making tasks.
Methods
Measurement method of ChatGPT's cultural stereotype and bias
To probe a broad spectrum of social and interpersonal motivations using clear, single-shot decision rules, we adapted ten well-validated social interaction tasks and three individual decision-making paradigms into one-round text prompts. These tasks have been widely used in cross-cultural research to reveal variations in reciprocity, norm enforcement, risk–reward learning, self-control, and moral reasoning (e.g., House et al., 2020; Ishii & Kurzban, 2008). Each can be succinctly presented with a defined payoff structure and a single decision point, making them suitable for standardized interaction with large language models.
The social interaction tasks based on two-person or multi-person interactive scenarios include: (1) Approach–Avoidance Conflict Task (Aupperle et al., 2011), (2) Third-Party Punishment Task (Fehr & Fischbacher, 2004), (3) Intergenerational Decision-Making Task (Langenbach et al., 2019), (4) Trust Game Task (Burks et al., 2003), and (5) Ultimatum Task (Nowak et al., 2000). Individual decision-making tasks focus on measuring social traits exhibited by individuals during the decision-making process, including (6) Iowa Gambling Task (Bechara et al., 1994), (7) Delayed Gratification Task (Johnson & Bickel, 2002), (8) Ethical Dilemmas Task (Greene et al., 2008), (9) Dictator Task (Forsythe et al., 1994), (10) Dishonesty Task (Fischbacher & Follmi-Heusi, 2013), (11) Public Good Game Task (Santos et al., 2008), (12) Prisoner's Dilemma Task (Shafir & Tversky, 1992), and (13) Bystander Task (Latané & Darley, 1968). After appropriate adaptation, these tasks were used to measure the performance of ChatGPT 3.5 and ChatGPT 4o. Five experimenters conducted measurements on the task.
In each trial, ChatGPT will first be presented with the task rules.(see Table S2 for details) Taking the Ultimatum Game as an example, ChatGPT saw: You and another participant will take on the roles of the proposer and the responder, respectively. The system provides you with an initial fund of 10 units of currency, and you must decide on the distribution of the funds alone. The system will then present your proposed distribution to the responder. If the responder agrees with your proposal, the funds will be distributed according to your proposal. If the responder disagrees, neither of you will receive anything.
Then, ChatGPT would do decision-making behavior. For example, in the Ultimatum Game, ChatGPT provides a specific number within 10 to represent the money it wants to allocate to the responder.
In the first part, we asked ChatGPT to simulate ordinary individuals from no cultural background and 20 different countries across five continents (China, Japan, India, Russia, Germany, France, the United Kingdom, Italy, the United States, Canada, Mexico, Brazil, Argentina, Colombia, South Africa, Nigeria, Egypt, Kenya, Australia, New Zealand), completing the 13 tasks mentioned above. This can help us understand ChatGPT's perception of individuals from different countries and assess whether they hold cultural stereotypes. At the beginning of each task, ChatGPT is instructed to assume the role of an ordinary person from a specific country. The name of a particular country will be replaced with the corresponding country's name in different rounds.
In the second part, we selected five social interaction tasks based on two-person or multi-person interactions to observe any unfair phenomenon in ChatGPT interactions with individuals from different countries and assess whether it has cultural biases. At this point, we inform ChatGPT that an ordinary person needs to interact with an individual from a specific country. The name of this country will be replaced with the corresponding country's name in different rounds, just like in the first part. A total of 20 countries were measured, and the order of questioning was randomized.
The questioning is uniformly conducted in English. Each task for each country condition was repeated ten times, and the data from these ten repetitions were used to conduct variance and correlation analyses. We conducted one-way ANOVA to test for differences between countries and corrected them with Bonferroni for pairwise comparisons, if the within-group standard deviation of the data was 0 (see Table S16 for details), the Kruskal–Wallis test was used for analysis.
Results
ChatGPT has cultural stereotypes and biases
Country-specific task performance indicates that both ChatGPT 3.5 and ChatGPT 4o exhibit cultural differences in their task responses, but ChatGPT 4o demonstrates cultural stereotypes across a broader range of concepts. ChatGPT 3.5 exhibited cultural stereotypes in eight tasks related to help others (Approach–avoidance conflict task:

Based on understanding ChatGPT's generalized beliefs of different cultures, we also observed whether there was any bias in ChatGPT's interactions with individuals from different cultures. When ChatGPT 3.5 interacting with individuals from different countries, there were differences in scores of four tasks (Figure 5 and Table S7). These tasks include ultimatum game task (

Building on the understanding of cultural stereotypes and cultural bias as distinct yet interconnected concepts, we examined their interplay within the same tasks performed by ChatGPT. Correlation analysis of the simulation results revealed that ChatGPT 3.5's cultural bias is independent of cultural stereotypes, while ChatGPT 4o's cultural bias aligns with cultural stereotypes (see Table S8 and Table S11). The task scores where ChatGPT 4o exhibited cultural bias were significantly positively correlated with the corresponding cultural stereotype scores (Dictator task:
In summary, both ChatGPT 3.5 and ChatGPT 4o exhibit cultural sensitivity, with ChatGPT 4o showing broader cultural stereotypes and a stronger alignment between its cultural biases and stereotypes across more tasks. While both models display cultural bias, ChatGPT 4o exhibits it in fewer tasks, suggesting improved fairness in interactions with diverse cultural groups. ChatGPT 3.5's cultural bias appears independent of its stereotypes, whereas ChatGPT 4o's biases are more closely tied to its generalized cultural beliefs. Despite improvements, further refinement is needed to reduce cultural bias, especially in tasks related to fairness and cooperation.
The potential sources of ChatGPT's stereotypes and bias
The above results found that ChatGPT showed stereotypes and biases in most decision-making tasks. The formation of these sociocultural biases may come from its cognition of different cultural groups, that is, cultural values. Therefore, to verify the above hypothesis, we further explored the correlation between the stereotypes and cultural biases of ChatGPT and the cultural values of the country itself.
First, we analyzed the relationship between the cultural stereotypes exhibited by ChatGPT in these tasks and the real-world cultural differences among these countries in the four basic cultural value dimensions (individualism–collectivism, relational mobility, interdependent–independent self-construal, and cultural tightness–looseness) to explore the sources of ChatGPT's cultural stereotypes (Figure 6A and Table S4). The results found that when a country is more inclined to a collectivistic culture, ChatGPT 3.5 believes that individuals in that country are more inclined to immediate gains (delayed gratification task,

Second, we also explored whether the cultural bias generated by ChatGPT in the task is also affected by the cultural value tendency of the country. The correlation analysis results (Table S9) show that ChatGPT 3.5 will tend to distribute unfairly in dictator tasks when interacting with individuals in high individualism cultures (
In general, the cultural values of countries can predict ChatGPT's stereotypes and biases against these countries, indicating that ChatGPT's stereotypes and biases towards different societies are generated based on the cultural value orientation of specific societies. In addition, we also analyzed the correlation between ChatGPT4o's decision-making tendency in the task and the cultural value orientation of the country, and the results also supported the above conclusions (see Text S1 for details).
Study 3
Based on the results above, we have found that ChatGPT's responses exhibit certain cultural biases, which can exacerbate cultural conflicts and misunderstandings between people and groups, potentially leading to greater social inequality. Consequently, we aim to explore methods to reduce the biases present in ChatGPT's responses. To this end, we have designed various prompt strategies to enable ChatGPT to overlook cultural differences between countries and provide unbiased answers. Our approaches include emphasizing personal traits, emphasizing fairness, creating future-oriented scenarios, and creating scenarios without national differences.
Methods
In previous experiments, tasks related to fair distribution included the Dictator Task, Ethical Dilemma, Third-party Punishment, Trust Game, Ultimatum Game, and Public Good Game. Among them, ChatGPT 3.5 and ChatGPT 4o both showed significant stereotypes in the Ultimatum Game and Third-party Punishment task. Therefore, we selected the above two tasks to test the effectiveness of prompting strategies in improving stereotypes. In addition, ChatGPT 3.5 also showed significant bias in the Ultimatum Game and Third-party Punishment task, while ChatGPT 4o only produced bias in the Public Good Game and Dictator Task, so we selected the above tasks to measure the effectiveness of the prompting strategy in eliminating bias.
Building upon our previous experiments that employed prompts specifying national or cultural contexts, we introduced four new prompting strategies to systematically examine their effectiveness in reducing cultural stereotypes and biases in ChatGPT (see Table S15). These interventions were embedded within the same national context frameworks as before, allowing us to test whether specific types of prompts could mitigate cross-country behavioral differences previously observed in model responses. The four strategies included: emphasizing personal preferences in the questioning style (e.g., “As an independent individual, please choose according to your own ideas”), emphasizing fairness (e.g., “From a fairness perspective”), creating future-oriented scenarios (e.g., “If you live in a future world”), and creating borderless world scenarios (e.g., “Imagine you live in a multicultural world”).
In our design, each intervention was implemented by modifying two parts of the prompt: Prompt 1, which described the identity or background of the decision-maker (e.g., nationality or imagined identity), and Prompt 2, which introduced a particular reasoning perspective or cognitive frame (e.g., fairness, independence). The placement of the intervention cue depended on its nature—for instance, identity-oriented interventions such as “future world” were placed in Prompt 1, while reasoning-based interventions such as “from a fairness perspective” were included in Prompt 2.
Using these four new prompting methods, we remeasured the potential stereotypes of ChatGPT 3.5 and 4o in the Ultimatum Game and Third-party Punishment task, the bias of ChatGPT 3.5 in the Ultimatum Game and Third-party Punishment task, and the bias of ChatGPT 4o in the Public Good Game and Dictator Task.
Taking the Third-party Punishment task (Cultural stereotype) as an example, the text generated under these four prompt strategies is as follows: [prompt 1] If you are an ordinary individual from [the country]. You complete the game with the distributor (A) and the receiver (B). A and B receive a total of 100 tokens, which are distributed by A and B can only accept and has no right to refuse. You will start with 50 tokens, which you can use to punish A or keep for yourself after watching A's distribution. You can choose the exact amount of punishment, and the relationship between the number of tokens you use to punish and the number of tokens reduced by A is 1:3; for example, if you take out 5 tokens to punish, the number of tokens of A will be reduced by 15. If A's allocation scheme is: A gets 90 tokens, and B gets 10 tokens. [prompt 2] please choose one of the following schemes to punish A:
Keep your tokens without penalizing A
Take out 5 tokens and make A lose 15 tokens
Take out 10 tokens and make A lose 30 tokens
Taking out 15 tokens causes A to lose 45 tokens
Take out 20 tokens and make A lose 60 tokens
Take 25 tokens and make A lose 75 tokens
Take out 30 tokens and make A lose 90 tokens
Results
The effect of prompt strategies on cultural stereotypes
Using four prompts that emphasizing personal preferences, fairness, creating future-oriented scenarios and creating borderless world scenario, We reassessed ChatGPT 3.5 and ChatGPT 4o's stereotypes in the third-party penalty and ultimatum paradigms. Results showed that all four prompt methods effectively reduced cultural stereotype scores in ChatGPT 3.5 to nonsignificant levels in both the third-party punishment and ultimatum game tasks across the 20 countries examined (

The effect of prompt strategies on cultural bias
In addition, we reevaluate the bias of ChatGPT 3.5 in the third-party punishment and ultimatum paradigms and the bias of ChatGPT 4o in the dictator game and the public goods game. The data and one-way ANOVA results showed significant effects in reducing the cultural biases of ChatGPT. Specifically, one-way ANOVA analyses indicated no significant differences across the 20 countries in the measures of third-party punishment and the ultimatum game with ChatGPT 3.5, regardless of the prompt conditions, which included those emphasizing personal traits, fairness, future-oriented scenarios, and borderless world scenarios (
Discussion
Our study found that ChatGPT exhibits unique cultural value orientations through questionnaire assessments, displaying both collectivist tendencies and value preferences that contradict collectivism, such as low power distance and high relational mobility. Additionally, ChatGPT demonstrated cultural biases and stereotypes in social interaction tasks. Fundamental cultural value dimensions, such as individualism–collectivism and cultural tightness–looseness, significantly influenced these biases and stereotypes. Moreover, four prompt strategies were developed, and the results indicated that these prompt strategies were efficient in reducing such cultural biases and stereotypes.
ChatGPT exhibits a distinct value orientation characterized by a unique blend of cultural values. Assessments have revealed that ChatGPT displays tendencies towards collectivism while simultaneously holding preferences that contradict traditional collectivist values, such as low power distance and high relational mobility. These findings align with previous research (Ghosh, 2011; Kito et al., 2017). However, ChatGPT's value orientation significantly diverges from that of human society, resulting in the formation of a unique cultural value system. This divergence highlights the complexities and nuances of ChatGPT's cultural values, which cannot be fully encapsulated within existing human cultural frameworks. Such differences could be attributed to the algorithms and diverse data sources used to train ChatGPT, which encompass a broad spectrum of human interactions and cultural contexts.
ChatGPT, like other algorithms, has demonstrated cultural biases in decision-making tasks. Prior research, such as Buolamwini (2017), has shown that algorithmic biases arise from dataset construction, goal setting, feature selection, and data labelling. Our research reveals that ChatGPT's biases may also stem from its understanding of cultural values across different regions. This indicates that biases are not solely due to dataset handling but also to the transfer of learned content. This underscores the need for further enhancement and regulation of AI algorithms. With advancements in AI, new models like ChatGPT 4.0 and 4o have been introduced. Our evaluation of ChatGPT 4o in single-round interactive social tasks revealed that while it still exhibits cultural stereotypes, it no longer displays cultural biases in the measured tasks. This highlights the effectiveness of continuous AI improvements and the importance of regular updates and thorough evaluations. These advances may be attributable to specific architectural and alignment updates, such as the integration of Rule-Based Reward Models (RBRMs) alongside Reinforcement Learning from Human Feedback (RLHF), which systematically penalize biased outputs and promote safer completions. Moreover, GPT 4o's improved multilingual capabilities suggest broader generalization across cultural contexts, potentially due to more inclusive evaluation and feedback processes. Our findings support the view that continual refinement of model training and alignment pipelines plays a key role in reducing algorithmic bias, underscoring the importance of ongoing evaluation and regulatory oversight.
Therefore, we developed four potential interventions to mitigate cultural stereotypes and biases in ChatGPT. The results demonstrated that prompts emphasizing personal preferences, fairness, future-oriented scenarios, or scenarios without national context could significantly reduce cultural stereotypes and biases in ChatGPT's responses regarding social distributive fairness. Previous studies have also found that responses from LLMs vary based on how they are prompted (Geng et al., 2024; Mei et al., 2024), highlighting the critical importance of refining prompt design to achieve more equitable and unbiased AI outputs. By systematically altering the framing of questions, we can influence the nature of AI responses and mitigate inherent biases, thereby enhancing the fairness and reliability of AI systems.
RSA analysis complements the main results from a modal perspective. By comparing the ChatGPT matrix of cultural values with the differential matrix of values scores across 60 countries/regions, we found that ChatGPT is also unique in the cultural value pattern. In addition, the cross-dimensional comparison also found that ChatGPT's inter-state decision-making model in moral dilemmas and ultimatums was significantly similar to the inter-state interdependent cultural values model, suggesting that the cultural bias displayed by ChatGPT may be based on its learning of the cultural values of the country. The above results support the current results from a model perspective.
The cultural stereotypes and biases of ChatGPT pose real-world challenges. In the context of globalization, cross-cultural interactions between individuals and organizations are increasingly important, and ChatGPT has gained attention for its role in cross-cultural communication (Baskara, 2023; Y. Chen et al., 2023). Studies have found that ChatGPT can predict users’ cultural values based on their language, aiding communication across different languages (Cao et al., 2023; Dong et al., 2024). Our findings support that ChatGPT is culturally sensitive and can distinguish between the cultural values of different regions. However, these perceptions are only sometimes accurate and may lead to misleading advice during communication. Therefore, users must be aware of potential stereotypes in ChatGPT's understanding of cultural values when using it for cross-cultural interactions. To reduce misunderstandings and communication barriers, users should implement measures to mitigate these biases, such as using revised prompts tested in this study. This approach would facilitate more effective and respectful cross-cultural interactions.
The cultural biases and stereotypes in ChatGPT pose a threat to social ethics. Our research found that ChatGPT exhibits these biases in tasks related to trust perception and fairness preferences. Previous studies indicate that cultural stereotypes and biases often result in social injustices (Agarwal et al., 2023; Howard & Borenstein, 2018; Lamont et al., 2014). For instance, Webster et al. (2022) found that biases influence interactions within medical environments, leading to medical inequity and health inequality. AI systems filtering information based on stereotypes can exacerbate information cocoon effects, reinforcing users’ existing biases (S. Chen et al., 2022). Given that ChatGPT 4o has shown reduced cultural bias with AI advancements, future research should involve stricter scrutiny and selection of algorithms and training datasets to mitigate socio-ethical issues. Emphasis should be placed on the interpretability and transparency of AI systems to understand their decision-making processes better. Methods to reduce biases in large language models include modifying word embeddings (Bolukbasi et al., 2016) and training on diverse databases (Zhao et al., 2018). Implementing these strategies can help minimize cultural biases in AI systems.
This study also has some limitations. First, our methodology relies on ChatGPT's self-reported responses, which may exhibit instability due to the inherent generative mechanisms of LLMs (Navigli et al., 2023). To mitigate this potential instability, we implemented repeated measurements where ChatGPT responded to each question ten times and we used the mean values as final responses. Future research could further enhance robustness by conducting systematic sensitivity analyses across different temperature settings to better capture the stability of AI-generated cultural representations. Second, our focus on ChatGPT limits generalizability to other large language models (LLMs). Different LLMs vary in training data, architecture, and fine-tuning processes, potentially producing distinct cultural bias patterns (Blodgett et al., 2020; Bender et al., 2021). Future research should employ comparative analyses across multiple LLMs (Bai et al., 2022) and develop AI-specific measurement approaches that better account for the unique nature of machine-generated responses (Perez et al., 2023). Moreover, using only English as the prompt language may diminish the cultural differences exhibited by LLMs when simulating different identities (Zhong et al., 2024). Future research should take into account the potential impact of prompt language and consider incorporating prompts in multiple languages.
In summary, our study reveals that ChatGPT exhibits both unique cultural values and biases, influencing its responses in decision-making tasks and interactions across different cultures. While improvements in AI models and revised prompts have reduced some cultural biases, ongoing refinements in prompt design and algorithm adjustments remain crucial for enhancing fairness and reducing stereotypes. Future research should continue to focus on optimizing AI systems to mitigate cultural biases and improve their cross-cultural communication effectiveness.
Transparency and openness
We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study, and we follow JARS (Appelbaum et al., 2018). All responses were collected via the ChatGPT web interface (https://chat.openai.com) between July 19 and August 5, 2024, using both ChatGPT 3.5 and ChatGPT 4o. Each prompt was entered in a new, independent session to avoid carryover effects from prior interactions. All data, analysis code, and research materials are available at https://osf.io/mpveb/?view_only=46d1031260f9435080c9880ac67968d5. Data were analyzed using R, version 4.3.2 (R Core Team, 2013). This study's design and its analysis were not pre-registered.
Supplemental Material
sj-docx-1-pac-10.1177_18344909251355673 - Supplemental material for The cultural stereotype and cultural bias of ChatGPT
Supplemental material, sj-docx-1-pac-10.1177_18344909251355673 for The cultural stereotype and cultural bias of ChatGPT by Hang Yuan, Zhongyue Che, Yue Zhang, Shao Li, Xianger Yuan, Liqin Huang, Xiaomeng Hu, Kaiping Peng and Siyang Luo in Journal of Pacific Rim Psychology
Footnotes
Author Note
Siyang Luo is now at the Department of Psychology, Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Guangdong Provincial Key Laboratory of Brain Function and Disease, Sun Yat-Sen University.
Author contributions
Siyang Luo designed the study; Hang Yuan, Zhongyue Che, Yue Zhang, Shao Li, Xianger Yuan, and Siyang Luo conducted the experiments; Hang Yuan, Zhongyue Che, Yue Zhang, Shao Li, Xianger Yuan, and Siyang Luo analyzed the data; Hang Yuan, Zhongyue Che, Yue Zhang, Shao Li, Xianger Yuan, Liqin Huang, Xiaomeng Hu, Kaiping Peng, and Siyang Luo wrote the manuscript. All authors commented on the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant number 32371125, 32071081).
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
