Abstract
The purpose of this review is to unveil mechanisms underlying cross-cultural differences in facial emotion perception. We synthesized findings from 105 studies across five thematic areas in facial emotion perception across cultures: contextual influence, processing facial features, display rules and interpretation, affiliation with ethnic and social groups, and emotion conceptualization. Nine key mechanisms were identified to explain cross-cultural differences in facial emotion perception, categorized into attention allocation (attention between context vs. face, modalities, eye/mouth regions, left/right hemifield) and social-cognitive interpretation (cognitive representations, knowledge of cultural display rules, prejudice and stereotypes, motivation, and emotion conceptualization). These mechanisms were analyzed through a two-stage model of emotion perception adapted for cross-cultural contexts, offering a structured framework for understanding how cultural factors influence emotion recognition.
The concept that facial expressions are integral to understanding emotions, dating back to Aristotle and later expanded by Charles Darwin in “The Expression of the Emotions in Man and Animals” (1872/1956), suggests that emotional expressions are universal aspects of human nature. Empirical studies in the 1960s (Ekman et al., 1969; Ekman & Friesen, 1971; Tomkins & McCarter, 1964) supported this universality hypothesis, demonstrating that even individuals from preliterate tribes, with minimal exposure to Western culture, could recognize different categories of emotions in faces from other cultures. Despite this compelling research, there is consistent evidence for cultural variations in the perception of facial emotions. For instance, a review by Russell (1994) found that while people worldwide generally recognize emotions in American faces, the accuracy of this recognition declines with increasing cultural distance from Western societies. This pattern indicates that cultural factors play a substantial role in how facial emotions are interpreted, leading to a shift in research emphasis toward understanding the reasons behind these cultural disparities in facial emotion perception.
This new focus on cultural variations, when considered alongside the earlier evidence for universality, has given rise to what was termed “the great debate” concerning the nature of emotions (Barrett et al., 2007). This ongoing debate reflects the unresolved question of whether emotions and their expression and perception are primarily biological and universal, or significantly shaped by cultural factors that impact one’s conceptual processes. The importance of resolving this debate is underscored by the vital role of emotion communication in human social interaction, facilitating cooperation, conflict resolution, and social bonding (Keltner & Haidt, 1999). However, cultural differences can impede facial emotion perception (Elfenbein & Ambady, 2003), leading to misunderstandings and social friction, highlighting the practical implications of this theoretical dispute.
While emotion perception involves multiple sensory modalities, including vocal, bodily, and tactile cues, this review focuses specifically on facial emotion perception for several reasons. First, faces are often the most immediate and accessible source of emotional information in social interactions. Second, a wealth of research has been conducted on facial expressions, providing a rich foundation for cross-cultural comparisons (Elfenbein & Ambady, 2002b). Finally, the face has been central to debates about emotion universality and cultural specificity, making it a crucial area for understanding the interplay between biology and culture in emotion perception (Barrett et al., 2019; Russell, 1994).
Given this emphasis on facial display in emotion research, cross-cultural studies in this area typically adopt one of two main approaches, focusing on the cultural background of either the producer or the perceiver of the facial emotion. Studies on producers examine how ways of expressing emotion vary or remain consistent across cultures (e.g., Cowen et al., 2021), investigating cultural influences on the generation and configuration of emotional signals. Conversely, research on perceivers investigates how cultural norms and experiences influence emotion recognition and interpretation. This approach encompasses a wide range of culturally specific interpretative processes, in addition to examining responses to signal differences.
Both approaches offer valuable insights into the complex interplay between culture and facial emotion perception. This review focuses on the latter, the perceivers’ side, examining how cultural factors influence the way individuals interpret and respond to emotional expressions across different cultural contexts. This focus is particularly pertinent in an era where global interconnectedness is the norm. Grasping the nuances of why different cultures perceive facial emotions differently is crucial for fostering effective cross-cultural interactions and for the development of culturally sensitive and responsive technologies and practices.
Research into cross-cultural differences in facial emotion perception is characterized by its complexity and diversity, encompassing a wide spectrum of research goals and methodologies. For example, some research has focused on the intersection of ethnicity and facial emotion perception in relation to prejudice (e.g., Hugenberg & Bodenhausen, 2003), while other research fuels the debate over whether emotions are innate or learned (e.g., Gendron et al., 2014), and yet other research examines how different cultures vary in their contextual awareness (e.g., Masuda et al., 2008). These examples represent only a fraction of the broader array of topics in this field, which also employs a diverse range of methodologies, from conventional emotion judgment tasks to advanced techniques like eye-tracking and brain imaging. Such diversity in both subject matter and research approaches highlights the multifaceted and expansive nature of studies into cross-cultural facial emotion perception.
The current challenge lies in integrating these diverse insights into a cohesive theoretical framework, a crucial step not only for deepening our understanding of cross-cultural facial emotion perception but also for laying a clear and intuitive groundwork that will inform future research endeavors in this vital area.
To date, there have been several significant systematic attempts to explore various aspects of cross-cultural emotion perception, including the universality hypothesis (Russell, 1994), the ingroup advantage (Elfenbein & Ambady, 2002b), methodology concerns and cultural dimensions (van Hemert et al., 2007), and cultural effects on emotion expressions (Wood et al., 2016). In general, each of these detailed reviews found evidence for cross-cultural differences in the manner in which emotions are perceived. However, while these reviews have contributed valuable insights, it remains unclear which specific mechanisms drive cross-cultural variations. In part, this issue arises from the very diversity of methodological approaches used to date. These make it difficult to see patterns that speak to systematic causes for cross-cultural differences. As such, there is a need for (a) the development of a unified research framework with which to classify the research, (b) a systematic consolidation of findings within this framework, and (c) use of this framework to attempt to identify potential mechanisms underlying cross-cultural differences and how these may be related.
Current Study
The current study aimed to review the relevant published literature to develop a unified research framework for understanding the mechanisms underpinning cross-cultural differences in facial emotion perception. Given the breadth and range of published studies in this field, this review was constrained to studies that explicitly investigated why these differences occur. Cross-cultural studies on facial emotion perception that did not explicitly report a study designed to address underlying mechanisms were excluded. This focus was necessary so as to align with the research question as to “why” cross-cultural differences occur and to enable a comprehensive synthesis of mechanism-oriented studies. In doing so, it is acknowledged that the review possibly overlooked published studies that, while not stating this explicitly, also provided insights into potential mechanisms. With this caveat in mind, a large number of diverse studies were accessed providing a broad coverage of many of the mechanisms researched to date.
The objective of this review was threefold. First, we aimed to categorize studies by thematic areas, providing a structured framework for understanding how cultural variations in facial emotion perception have been investigated. Second, we synthesized findings across individual studies, to uncover patterns and relationships that may not be apparent when examining studies in isolation. This synthesis sought to clarify what has been discovered to date. Third, and most importantly, we aimed to consolidate and summarize the proposed underlying mechanisms that account for cross-cultural differences in this field, addressing the critical question of why these variations exist.
Method
Considering the varied designs and measurement approaches employed in the identified studies, a narrative synthesis was considered the most suitable method for this systematic review. The review adhered to the reporting guidelines outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Page et al., 2021). The protocol for this review was registered on the Open Science Framework at https://osf.io/dqh9v/files/osfstorage/61b1bbb00db6591042835aa1.
Identifying Studies
The strategy was designed to encompass peer-reviewed studies on cross-cultural comparisons of emotion perception from facial expressions with no restrictions on the publication date. Five databases were utilized: Scopus, Web of Science, PsycINFO, Embase, and PubMed. Keywords focusing on emotion, culture, and facial expressions were applied to the title, keywords, and abstract of each article. A subset of these keywords is presented in Table 1 (see Supplementary Table 1 for a complete list).
Table 1. Sample Keywords to Search the Web of Science Database.
The selection of keywords for the article search reflects the current conceptual framework for the three key concepts. “Culture” terms encompass diverse regions and ethnicities and, following Elfenbein and Ambady (2002b), also include in-group and out-group distinctions relevant to cross-cultural contexts. “Emotion” keywords cover a range of specific affective states, excluding more general facial responses like pain or distress. “Face” terms relate to stimulus types that are likely to include facial expressions. This approach ensures an inclusive yet focused search strategy. Table 2 provides detailed definitions and exclusions for each concept.
Table 2. Key Concepts.
Study Selection
Given the breadth of the field and the complexity of the methods and approaches, study selection used liberal inclusion criteria initially to identify all potentially suitable research, followed by screening using basic exclusion criteria (Eligibility round 1) and then strict exclusion criteria (Eligibility round 2) to enable the study to focus on research that had the potential to investigate cognitive mechanisms behind cross-cultural differences.
Inclusion Criteria
Articles were included if they contained at least one keyword from both the culture and face categories, with emotion-related keywords being confined to the title. This strategy’s effectiveness was confirmed by its alignment with a prior meta-analysis (Elfenbein & Ambady, 2002b), capturing 82 of the 87 articles included in that review. The initial search was conducted in February 2021, followed by updates in April 2022 and July 2023.
Exclusion Criteria Round 1
All references were screened to identify peer-reviewed articles focusing on cross-cultural comparisons of facial emotion perception among healthy adult populations. Specifically, articles were excluded if they focused on clinical conditions, focused on infants or children, did not examine facial expressions, did not use an emotion perception measure, did not make a cross-cultural comparison, did not contain empirical data, were not written in English, or were not published in peer-reviewed journals.
Exclusion Criteria Round 2
In adherence to the defined scope of this review, the second stage applied a more detailed screening to the articles shortlisted in the first round, excluding studies that could not provide evidence about the cognitive functions underpinning cross-cultural differences. This applied to studies on universality and to studies in which cognitive mechanisms were implied but not examined empirically, for example, studies that proposed cultural differences (e.g., display rules) implying a hypothetical cognitive mechanism that was not itself directly examined. Also excluded were studies that confounded multiple sources of cross-cultural variation, investigated the recognition of the overall emotional tendency of a crowd, or examined the interaction between bilingual language organization and cognitive mechanisms.
The exclusion criteria for both screening stages, along with the number of articles meeting each criterion, are detailed in the eligibility boxes of Figure 1. For an in-depth understanding of the exclusion criteria applied in the second round of screening, Table 3 provides a detailed explanation of these criteria.

Figure 1. PRISMA Flowchart.
Table 3. Explanation of Exclusion Criteria for Round 2 Article Screen.
An exception to this exclusion criterion was made for studies that only compared emotions of different valence (e.g., happiness vs. anger), or smiling faces only. In such cases, the potential influence of cultural dialects on emotion expression configuration was considered less critical, as the primary focus was on the distinction between positive and negative emotions rather than on subtle variations within each valence category.
Figure 1 depicts the flowchart for the study selection process. The initial search yielded a total of 13,032 articles. Upon the removal of duplicates, 7,884 articles remained. Screening of titles and abstracts reduced this number to 620 articles, with an additional 15 articles identified from reference lists, making them eligible for full-text examination. After the first round of screening, 402 articles were excluded, and a further 113 articles were removed in the second round for not meeting the inclusion criteria. Consequently, the final count of articles included in this review amounted to 105.
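The counts reported in this paragraph can be tallied in a few lines. The sketch below is illustrative only: the variable names are hypothetical, and it assumes that both rounds of exclusions are subtracted from the 620 articles retained after title and abstract screening. How the 15 articles from reference lists enter that pool is not specified in the text and is not modeled here.

```python
# Counts taken from the study selection paragraph; variable names are illustrative.
initial_records = 13_032
after_deduplication = 7_884
full_text_pool = 620      # assumption: the pool from which both rounds subtract
excluded_round_1 = 402
excluded_round_2 = 113

duplicates_removed = initial_records - after_deduplication
after_round_1 = full_text_pool - excluded_round_1
final_included = after_round_1 - excluded_round_2

print(f"Duplicates removed: {duplicates_removed}")
print(f"Remaining after round 1: {after_round_1}")
print(f"Included in review: {final_included}")
```

Under this reading, the tally arrives at the 105 articles reported as the final count.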
The screening process was primarily undertaken by the first author. To ensure objectivity, a subset comprising 10% of the articles was proof-screened by co-author M.F. up to the eligibility screening of the first round, achieving a concordance rate of 93.75%. Any discrepancies were reconciled through mutual discussion between these two authors. In the second round of screening, a selection of articles was subjected to joint deliberation with co-author Y.X. to determine their inclusion. This collaborative approach was applied to articles where the decision on inclusion was not immediately clear or where the insights from both authors were deemed necessary for a consensus.
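A concordance rate of this kind is usually a simple percent agreement: the share of double-screened articles on which both screeners reached the same decision. A minimal sketch follows, assuming binary include/exclude decisions. The function name and the example decisions are hypothetical and are not the review's data.

```python
def percent_agreement(rater_a, rater_b):
    """Return the percentage of items on which two raters gave the same decision."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must judge the same items")
    if not rater_a:
        raise ValueError("At least one item is required")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Made-up include/exclude decisions for eight articles (not the review's data).
screener_1 = ["include", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "exclude"]
screener_2 = ["include", "exclude", "exclude", "include",
              "exclude", "include", "include", "exclude"]

agreement = percent_agreement(screener_1, screener_2)
print(f"{agreement:.2f}%")
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa is a common complement.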
Data Charting
The data charting process was spearheaded by the first author, encompassing a comprehensive extraction of information across various dimensions. This included research topic, sample size, cultural backgrounds of participants, emotions depicted in stimuli, facial stimulus sets used, cultural background of posers, emotion perception measures, additional measures, paradigms employed, statistical methods, and key findings.
Summarizing
Initially, the eligible articles were categorized based on their specific research topics. These topics were then consolidated into broader themes to establish a unified analytical framework while preserving the original topics as sub-themes. This thematic integration was facilitated through collaborative discussions between the first author and co-author Y.X. A comparative analysis within each thematic category was then performed, systematically assembling the findings from the studies. This comparative process aimed to identify significant patterns or characteristics, providing an overview of the key findings within each thematic area. Finally, potential mechanisms explaining these findings were summarized. This was accomplished by examining additional measures analyzed in individual studies or by interpreting methodological patterns revealed from the aggregated data within each thematic group.
Risk of Bias
To fit this specific cohort of research, we integrated criteria from several established checklists, including the NIH Study Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies (National Institutes of Health, 2014), the Joanna Briggs Institute (JBI) Critical Appraisal Tools for Cohort, Prevalence, and Randomized Controlled Studies (Porritt et al., 2014), and the Mixed Methods Appraisal Tool (MMAT; Pace et al., 2012). Furthermore, we added two additional criteria specific to cross-cultural research using facial expressions: language validity and stimuli validity.
The final risk of bias assessment checklist comprised 14 questions, as detailed in the Appendix. Each question was assigned one point, except for questions 6a to 6d, which were assigned half a point each. The differentiated weighting of questions in this assessment framework reflects the need to balance the relative importance of various methodological aspects across different study designs, especially considering that these questions were selected from multiple assessment tools. Risk of bias scores were calculated as the proportion of points attributable to negative answers (risk score/total score, ranging from 0 to 1), with lower scores indicating a lower risk of bias. It is important to note that while we use the term “risk of bias,” lower risk can also be seen as a proxy for better overall methodological quality.
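The scoring rule described above can be sketched as a weighted proportion. In the sketch below, the question labels, the split of the 14 questions into ten full-point questions plus 6a to 6d, and the example answers are all assumptions made for illustration; the actual checklist is the one given in the Appendix.

```python
# Assumed layout: questions 1-5 and 7-11 carry one point, 6a-6d half a point each.
WEIGHTS = {str(q): 1.0 for q in (1, 2, 3, 4, 5, 7, 8, 9, 10, 11)}
WEIGHTS.update({f"6{part}": 0.5 for part in "abcd"})


def risk_of_bias_score(negative_answers):
    """Weighted share of checklist points lost to negative answers (0 = lowest risk)."""
    unknown = set(negative_answers) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"Unknown question labels: {sorted(unknown)}")
    risk_points = sum(WEIGHTS[q] for q in set(negative_answers))
    return risk_points / sum(WEIGHTS.values())


# Hypothetical study answering "no" to questions 4, 6a, 6b, and 7.
score = risk_of_bias_score(["4", "6a", "6b", "7"])
print(round(score, 2))
```

With this assumed layout the maximum is 12 points, so a study's score depends on which questions it fails as well as how many.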
All studies were evaluated by the first author. To improve impartiality, following the method reported in Aival-Naveh et al. (2019), co-author H.Q. independently assessed a representative sample of one-third of the eligible studies. This method balanced thoroughness with resource efficiency. The process yielded a good inter-rater agreement of 95.31%, with any disagreements resolved through discussion between the raters. The complete risk of bias assessment for each article is available in Supplementary Table 2.
The analysis yielded an average risk of bias score of 0.31 (SD = 0.12, Median = 0.30), with individual scores ranging from 0 to 0.7. While there is no universally accepted threshold for “satisfactory” quality, we interpreted these scores as indicating generally acceptable quality with room for improvement.
A noteworthy finding was that over half of the eligible studies did not meet the criteria set out in questions 4, 5, 6a, 6b, and 7 of the checklist. These questions address crucial aspects of study design: sample representativeness (Q4), baseline comparisons between groups (Q5, 6a, 6b), and sample size determination (Q7). This pattern highlights areas requiring attention in future research, particularly regarding generalizability, group comparability, and statistical power. A visual representation of this analysis is provided in Supplementary Figure 1.
Results
This review encompassed 105 articles, involving 20,238 participants from 228 different samples across 27 countries and regions. Details of the participant composition are available in Supplementary Figure 2. The articles were thematically organized into five distinct categories, each reflecting common research topics. These themes and their sub-themes are visually represented in Figure 2.

Figure 2. Five Themes Were Identified From the Studies, With the Outer Ring Naming Each Theme and Indicating the Number of Included Studies.
The five themes reflect a range of research foci, spanning sensory/perceptual processing through to increasingly conceptual aspects of emotion understanding. The “Contextual Influence” theme comprised studies examining cross-cultural responses to various contextual factors, such as the audio or visual background, temporal factors, and the provision of vignettes to supply a semantic context. It sheds light on how cultural factors influence reliance on different characteristics of the context when interpreting emotional faces. “Processing Facial Features” explored cross-cultural differences in the use of both whole-face expressions and specific facial regions when making emotion perception judgments.
The remaining themes addressed increasingly abstract aspects of the interpretive process. The “Display Rules and Interpretation” theme included studies that examined the varying assumptions across cultures as to how an internal experience is reflected in the external expression. In the “Affiliation with Ethnic and Social Groups” theme, research investigated how group affiliations and motivations to affiliate affect the interpretation of emotional expressions. Finally, “Emotion Conceptualization” centered on understanding how cultures differ in their comprehension of emotion concepts, conceptual knowledge structures, and mentalization levels, and how these influence facial emotion perception.
Aligned with the objectives of this review, findings were primarily synthesized around mechanisms identified in each theme. Sub-themes were introduced as subheadings where they contributed to the organizational clarity of the findings. A methodology summary of each article can be found in Supplementary Table 3.
Contextual Influences
This theme encompassed 19 studies exploring cross-cultural variations in responses to contextual cues. The studies were divided into four sub-themes based on context type: three on audio context, nine on visual background context, four on temporal context, and four on vignette context, with one falling into two sub-themes. This section synthesizes results according to context type, enabling the identification of coherent patterns that emerged along the East-West cultural axis and potential mechanisms for cross-cultural differences.
Audio Context
One mechanism that emerged from research into contextual factors was that of modality reliance. The impact of audio contexts on facial emotion perception was explored through the Audio-Visual Stroop Task, requiring participants to judge facial expressions while listening to prosody, or to assess the prosody accompanying facial displays. Three studies employing this method revealed notable cross-cultural variations, indicating a preference for specific sensory channels rather than a general sensitivity to context. Eastern participants showed reduced interference from irrelevant facial cues and heightened interference from audio information, in contrast to their Western counterparts (Chen et al., 2023; Liu et al., 2015; Tanaka et al., 2010). This distinct modality reliance appeared to be influenced by the ethnic match between the poser and the perceiver (Chen et al., 2023), suggesting that contextual familiarity significantly contributes to how different cultures process audio-visual emotional cues.
Visual Context
Seven of the nine studies using a visual background as context showed that Eastern participants were similarly (Ito et al., 2012; Ito et al., 2013) or more (Goto et al., 2013; Ito et al., 2013; Ko et al., 2011; Masuda et al., 2008, 2012, 2022¹) influenced by the context than their Western counterparts. Conversely, one study suggested that visual context aids emotion detection more for Western than Eastern participants (Stanley et al., 2013), and another indicated that the effect of contextual influence on facial emotion perception varied by emotion type across cultures (Hess et al., 2016), interpreted through display rules. Two potential mechanisms were explored to account for these observed trends.
One proposed mechanism was the extent to which viewers attend to the visual context. Two eye-tracking studies found that compared to Western participants, Eastern participants showed increased frequency and duration of fixations on the background. This pattern was associated with greater contextual influence on facial emotion perception, as demonstrated by larger differences in intensity judgments when backgrounds were either congruent or incongruent with the central expression (Masuda et al., 2008, 2012). Another eye-tracking study revealed that Western participants exhibited a greater distinction in attention between the target face and surrounding contextual faces compared to Eastern participants, a focus shift linked to better emotion recognition accuracy (Stanley et al., 2013).
A second potential mechanism, referred to as the “autonomy belief” effect, posits that while both Westerners and Easterners are influenced by emotions in visual contexts involving others, Westerners tend to minimize this influence, viewing facial emotions more as indicators of individual internal states than as reflections of group dynamics. Ito et al. (2013) discovered differences between Caucasian Canadians and Japanese in their sensitivity to contextual influences involving people, but not with inanimate scenes. Further investigation into autonomy belief, defined by the intensity of negative emotions felt by observers when assessing emotions against backgrounds of congruent landscapes versus people, suggested that cognitive efforts to counteract contextual influences might elicit negative feelings in observers. Among Caucasian participants, heightened negative feelings correlated with smaller differences in intensity ratings between congruent and incongruent contexts, implying cognitive efforts to reduce contextual impacts. Nonetheless, this effect needs to be replicated across cultural groups. Furthermore, caution is advised in interpreting the link between negative feelings and autonomy belief, given that indirect factors may be involved.
Temporal Context
The four studies on temporal context revealed mixed findings. Two studies using morphing movies found that Chinese participants were more attuned to facial expressions preceding (Fang et al., 2021) or following (Fang et al., 2023) the target face compared to Caucasians or Dutch. Conversely, another study indicated that Caucasians or Dutch were more influenced by initial emotions than Chinese when the beginning and ending emotional expressions were visually similar (Fang et al., 2018). In addition, one study reported no notable cross-cultural differences between Japanese and Caucasians in reactions to static landscapes vs. emotional faces presented before the target face (Ito et al., 2012). Given this mixed picture, current research does not provide robust evidence for determining how temporal context contributes to differences in facial emotion perception between Eastern and Western cultures. The inconsistency points to potential moderators, such as emotion type, that may influence the extent of cross-cultural differences in sensitivity to sequential context, should such differences exist.
Vignette Context
Studies using text vignettes to provide context yielded varied results on how background stories influence facial emotion perception. In scenarios with matching vignettes and facial expressions, Americans perceived emotions as more intense than Koreans (Song et al., 2020). Similarly, in assessing smile genuineness, a separate study found that Americans perceived smiles as less genuine when paired with negative vignettes compared to Chinese (Mui et al., 2020). Both studies thus demonstrated a larger contextual influence among Westerners when the vignette context did not apparently contradict the facial expression. In contrast, it was reported that Japanese and Koreans agreed more with vignette-expressed emotions that conflicted with facial expressions, indicating a greater contextual influence in Easterners (Matsumoto et al., 2012). Similarly, another study noted that Japanese participants had a larger discrepancy in ratings between face-alone and polite situations (e.g., displaying smiles in roles where politeness is expected, such as that of a retail assistant) than UK Caucasians, suggesting a stronger contextual influence in Easterners (Namba et al., 2020). These findings point to no clear-cut pattern across Eastern and Western cultures but rather to a variety of moderating factors, such as specific emotions and cultural nuances.
Summary
Research in audio and visual contexts has identified three potential mechanisms: modality reliance, attention to visual context, and autonomy belief, the last being tenuous and reported in only one study. These mechanisms may help explain the observed differences between Eastern and Western populations in processing audio-visual emotional cues, with the majority of studies suggesting Easterners show heightened sensitivity to visual backgrounds. In contrast, the results from studies on temporal and vignette contexts were less consistent, highlighting the importance of various moderating variables in understanding the differences in how Eastern and Western cultures integrate contextual information into facial emotion perception.
Processing Facial Features
This theme comprised 27 articles: 10 centered on whole-face stimuli, and the other 17 concentrated on interpreting specific facial areas. Of these 17, 14 focused on the upper and lower parts of the face, and three examined the left and right sides of the face.
Differences in Understanding the Whole Face
Culturally specific ways of understanding facial configurations were initially studied by comparing two kinds of stimuli: facial emotions generated by posers instructed to activate specific facial muscles according to the Pictures of Facial Affect (PoFA; Ekman & Friesen, 1976), which controls for facial display, and culturally specific self-posed or spontaneous expressions. These studies revealed that while there was no same-ethnic advantage for emotions expressed using the PoFA system, culturally styled facial displays consistently created an ingroup advantage (Dailey et al., 2010; Elfenbein & Ambady, 2003; Elfenbein et al., 2007; S. M. Kang & Lau, 2013; Tsikandilakis et al., 2019; Tsikandilakis et al., 2021). This finding suggested that culturally specific ways of expressing emotions, rather than ethnic appearance, drive the ingroup advantage in emotion recognition.
This idea was further supported by a study noting a significant disparity in the recognition of imitated fearful faces between Caucasian and Japanese participants, regardless of the posers’ ethnic background. While 14 out of 16 Caucasian observers correctly identified imitated fear expressions, all 16 Japanese participants perceived them as surprise, a discrepancy that was also reflected in their distinct brain activation (Moriguchi et al., 2005). This finding demonstrates that the PoFA system, which was developed based on Western norms, may not adequately capture the way emotions are expressed and perceived in non-Western cultures. Instead, facial expressions are refined by cultural influences, with increased familiarity with one’s cultural facial cues leading to improved recognition accuracy.
One potential mechanism that may account for this is how facial expressions are cognitively represented. Cognitive representations can be operationalized as the most prototypical facial configuration for each emotion and have been directly revealed by reverse correlation. Specifically, European Caucasians and East Asians were asked to choose meaningful facial displays for each emotion from semi-randomly generated facial configurations. Using this approach, the facial representations of Westerners and Easterners were found to differ in nuanced ways within each emotion (Jack et al., 2016). Western Caucasian internal representations predominantly highlighted the eyebrows and mouth (Jack, Caldara, & Schyns, 2012), whereas East Asian internal representations exhibited a preference for expressive information in the eye region (Jack, Caldara, & Schyns, 2012; Jack, Garrod, et al., 2012).
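The logic of reverse correlation can be illustrated with a toy simulation. The sketch below is a generic version of the technique and not the procedure used in the cited studies: the three-feature "face", the simulated observer, its internal template, and the trial count are all hypothetical. Random feature patterns are shown to an observer who accepts those that resemble an internal template. Subtracting the mean rejected pattern from the mean accepted pattern then indicates which features the template weights most.

```python
import random

FEATURES = ["eyebrows", "eyes", "mouth"]
# Hypothetical internal template: this simulated observer relies mostly on the eyes.
TRUE_TEMPLATE = [0.1, 0.9, 0.2]


def simulate_reverse_correlation(template, n_trials=2000, seed=0):
    """Estimate an observer's template from accept/reject responses to random stimuli."""
    rng = random.Random(seed)
    accepted, rejected = [], []
    for _ in range(n_trials):
        # Each stimulus is a random activation level for every facial feature.
        stimulus = [rng.gauss(0.0, 1.0) for _ in template]
        # The observer accepts a stimulus when it matches the template on balance.
        match = sum(t * s for t, s in zip(template, stimulus))
        (accepted if match > 0 else rejected).append(stimulus)

    def mean_pattern(stimuli):
        return [sum(column) / len(stimuli) for column in zip(*stimuli)]

    # Classification image: mean accepted pattern minus mean rejected pattern.
    return [a - r for a, r in zip(mean_pattern(accepted), mean_pattern(rejected))]


estimate = simulate_reverse_correlation(TRUE_TEMPLATE)
most_weighted = FEATURES[estimate.index(max(estimate))]
print(dict(zip(FEATURES, (round(v, 2) for v in estimate))), most_weighted)
```

In this toy setting the recovered weights are largest for the feature the simulated observer relies on, which mirrors how the technique exposes the facial regions emphasized in an observer's internal representation.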
Differences in Understanding Facial Regions
As with the prior work examining whole face perception, studies focusing on the eye and mouth regions have revealed East-West disparities in sensitivity to these areas. When contrasting emotion judgment accuracy between the entire face and isolated regions (eyes or mouth), Western participants displayed a greater accuracy drop with eye-only stimuli (W. Kang et al., 2019; Saito et al., 2022), while Eastern participants showed a larger decline with mouth-only images (Low et al., 2022). Similarly, in studies altering the shape of facial features, Westerners generally were more accurate (Bennett & Sabanovic, 2015) or demonstrated greater changes in valence ratings (Gao & VanderLaan, 2020; Koda & Ruttkay, 2017; Yuki et al., 2007) when the mouth was altered. This contrasted with Easterners, who were more sensitive to changes in the eye region when using abstract faces like emoji (Gao & VanderLaan, 2020; Koda & Ruttkay, 2017; Yuki et al., 2007), although not realistic faces (Massaro & Ellison, 1996; Matsumoto, 1989). In addition, a study involving Turkish participants provided a unique perspective, indicating that Turkish individuals, compared to Americans and Japanese, had a markedly different approach to interpreting emotions from altered eyes and mouth in sketch faces (Cuceloglu, 1970). This finding suggests that the typical East-West dichotomy may be too simplistic to encompass the diversity in cognitive representations of emotions across cultures, and it underscores the need for more culturally diverse and inclusive research to capture the full spectrum of facial emotion perception.
Expanding upon these insights, further research has explored the nuanced functioning of these mechanisms. For instance, one study suggested that individuals may hold different cognitive representations of people from various cultural backgrounds. This was evidenced by findings that Chinese participants relied on Duchenne signs to judge the genuineness of smiles expressed by Caucasian Canadians but not Chinese or Gabonese individuals (Thibault et al., 2012), aligning with cultural variations in the authenticity of smiles (Elfenbein et al., 2007). Moreover, culturally specific expressions are often more distinctly observed in the left (Elfenbein et al., 2004) and lower regions of the face (Yan et al., 2016), as indicated by increased accuracy in categorizing ingroup expressions presented in these areas.
One mechanism that has been suggested to account for differential face processing across cultures is attention allocation across facial features. Eye-tracking data revealed that, consistent with other research, East Asian observers predominantly fixated on the eye regions, while Westerners distributed their attention more evenly across facial features. This pattern, also validated by a computer model, suggested that the Eastern approach to processing facial information can lead to inaccuracies in perceiving emotions with similar eye expressions (Jack et al., 2009). However, this focus on the eyes by Chinese observers did not prevent them from processing the whole face holistically. Research demonstrated that both Chinese and Caucasian participants were more adept at rapidly identifying expressions from either the upper or lower half of faces with misaligned halves, each conveying different emotions, in contrast to faces with aligned, but mismatched halves. This observation underscored a consistent inclination toward holistic processing in recognizing facial emotions across these cultural contexts (Yan et al., 2017).
A related attentional mechanism may explain a relationship between script reading direction and the tendency to focus on the left or right side of a face. This mechanism was explored in studies with participants who typically read left-to-right (as in English) or right-to-left (like Arabic), where they were asked to choose the “happier face” from asymmetric chimeric pairs, each displaying a smile on one side and a neutral expression on the other. Observers accustomed to reading from left to right tended to select faces with a smile on their left side more often (Heath et al., 2005; Vaid & Singh, 1989). Given that facial expressions of emotions are frequently asymmetrical, this uneven distribution of attention between the left and right side of a face can influence their overall perception and interpretation of the expression.
Summary
The research within this theme collectively highlighted cross-cultural differences, particularly between Eastern and Western cultures, in interpreting facial configurations of emotion per se. To account for these disparities, three primary mechanisms were identified. First, culturally distinct cognitive representations of the visual configurations of emotions suggest fine-tuned, culturally specific ways in the mental representation of facial emotions. Second, varying attention between the eyes and mouth was observed, with East Asian observers predominantly fixating on the eye regions, while Western observers distributed their attention more evenly across facial features. Third, differential attention to the left and right sides of the face was noted, primarily affected by script reading direction that differs across cultures due to their dominant languages.
Display Rules and Interpretation
In evaluating facial emotions, individuals not only decode the visible expression but also deduce the underlying emotional state. The kinds of inferences made are, arguably, influenced by knowledge of cultural display rules and represent another potential mechanism. Display rules are norms that dictate the appropriateness of expressing certain emotions within a cultural framework (Ekman & Friesen, 1969). Previous research has indicated that East Asian cultures, which emphasize interdependence, tend to suppress the expression of negative emotions, in contrast to Western cultures, which value direct and unambiguous emotional communication (Giri, 2006; Matsumoto et al., 2008). Five studies explored the variance between observed facial emotions and inferred internal states across cultures, utilizing a range of methodologies.
In four studies utilizing emotion judgment tasks based on the standard or intensified (125%) JACFEE stimuli, it was consistently observed that American participants perceived the external emotional intensity to be greater than what they inferred the internal emotion to be (Matsumoto, 2007; Matsumoto et al., 1999, 2002, 2018). On the other hand, when a lower-intensity version of JACFEE (50%) was used, two studies found that Japanese participants inferred a more intense subjective experience from the external display compared to Americans (Matsumoto et al., 2002, 2018). This pattern points to a cultural contrast in the interpretation of emotional intensity, which may reflect a strategic adaptation to mitigate potential misinterpretations of emotional signals.
The fifth study utilized the Prisoner’s Dilemma game, which requires players to balance self-interest (defection) with cooperation, to examine the impact of external facial displays, conveyed through emojis, on decision-making. American participants showed a higher tendency to choose to defect as punishment after a round in which their counterparts expressed joy (via an emoji) at winning points by their own defection, highlighting the influence of a counterpart’s positive expression on their decisions. Conversely, this pattern was less pronounced among Korean participants, who showed a reduced inclination to factor external expressions into their decision-making process (Ji et al., 2022). This observation suggested that individuals from Eastern cultures may prioritize actual behavior over external displays to avoid misinterpreting emotional expressions.
The differential reliance on facial expressions across cultures suggested by the study above is consistent with the notion that people in different cultures make inferences about an emotional expression that are guided by their cultural knowledge of display rules. Although knowledge of display rules is widely cited to explain cross-cultural variations in facial emotion perception, only two empirical studies have directly examined their impact on facial emotion perception across cultures, yielding inconclusive results. Matsumoto et al. (2018) utilized the Display Rule Assessment Inventory (DRAI) to explore display rules’ mediating role in cross-cultural differences in internal and external intensity ratings of emotions, but no overall mediation effect was reported. In the prior study using the Prisoner’s Dilemma game, display rules were assessed by comparing participants’ chosen emoji faces with their actions (cooperation or defection). The hypothesis, based on cultural differences in emotional suppression, predicted that Koreans would select incongruent emoji faces more frequently than Americans, especially after defection. However, the results showed no significant differences between the two groups, challenging the expected cultural distinctions in display rules (Ji et al., 2022).
Summary
Studies comparing the interpretation of external emotional displays and internal experiences have revealed a distinct cognitive process: knowledge of cultural display rules. Preliminary findings suggest that Eastern cultures amplify weak expressions or sometimes disregard external cues, while Western cultures minimize emphasis on strong intensity expressions. However, despite their importance, culturally specific display rules are rarely measured directly in research, underscoring the need for more comprehensive studies that assess participants’ understanding of these rules.
Affiliation With Ethnic and Social Groups
Within this theme, 49 articles were analyzed, comprising 39 focusing on ethnic groups and 10 on social groups. Despite their distinct areas of focus, these studies were combined for analysis due to consistent patterns in their findings, which were then categorized as examining two distinct mechanisms. The first mechanism, examined in 22 articles, investigated the impact of an observer’s group membership and consequent prejudice on their propensity to ascribe varying emotions to members of their own group (ingroup) versus those outside it (outgroup). The second mechanism, examined in 27 articles, reflected social motivation, i.e., how distinctions between ingroup and outgroup memberships affect the observer’s sensitivity when perceiving emotions in general.
Prejudice and Stereotypes
A general preference for social groups, commonly known as prejudice and often manifesting as an ingroup preference, seemed to affect the propensity to see positive and negative emotions. This preference was evidenced by quicker categorization speeds for ingroup happy faces and outgroup negative faces (Bijlstra et al., 2010; Craig et al., 2012; Hugenberg, 2005; Kozlik & Fischer, 2020), a more frequent endorsement of happiness to ingroup expressions (Kret & Fischer, 2018; Tae et al., 2020) and negative emotions to outgroup expressions (Kret & de Gelder, 2012; Kret & Fischer, 2018), higher positive rating of ingroup than outgroup expressions (Iidaka et al., 2008; Yang & Yeh, 2018), extended duration of perceived happiness in ingroups than outgroups (Hugenberg & Bodenhausen, 2003), and sometimes a tendency to perceive negative expressions from outgroup members as more intense than those from ingroup members (Hutchings & Haddock, 2008). One exception was observed in which the ethnicity of the poser, whether Caucasian or African, did not affect the speed at which Caucasian judges detected anger in a visual search task (Lipp et al., 2014).
The ingroup bias in facial emotion perception, both in its direction and magnitude, was subject to change and could even be reversed in different contexts. Studies have shown that this bias is influenced by elements such as the ethnicity (Damjanovic et al., 2010; Hu et al., 2017; Li & Tse, 2016; Marinetti et al., 2012), religious background (Korb et al., 2021), and gender (Craig et al., 2017; Li & Tse, 2016; Lipp et al., 2015; Marinetti et al., 2012) of the poser, as well as the ethnicity and gender of other figures presented alongside them (Craig et al., 2017). These findings suggest that preference in facial emotion perception, although sometimes tied to ingroups, is a more fluid evaluative attitude, heavily contingent upon the perceiver’s knowledge and experience.
In addition to this general evaluative account, studies showed that emotions congruent with typical stereotype-ethnicity association are more likely to be endorsed (Kommattam et al., 2017), and recognized faster (Bijlstra et al., 2010) with greater accuracy (Primbs et al., 2022) compared to emotions that do not align with typical stereotypes. This trend, particularly notable in the African-anger association, is ascribed to socially constructed perceptions rather than to inherent facial features characteristic of each ethnicity (Adams et al., 2022).
Prejudice, when measured independently, has been shown to influence facial emotion perception. This bias has been quantified using explicit methods, like questionnaires, or implicitly through tools such as the Implicit Association Test (IAT). In two studies employing the IAT, a clear association was found between implicit attitudes and ethnic biases in facial emotion perception, demonstrated by longer lingering over anger in African faces than in Caucasian faces (Hugenberg & Bodenhausen, 2003), along with a tendency to categorize racially ambiguous angry faces as African, and to attribute higher intensity to anger in African faces when perceived as African rather than Caucasian (Hutchings & Haddock, 2008). On the other hand, only one (Korb et al., 2021) out of three studies (Hugenberg & Bodenhausen, 2003; Kret & de Gelder, 2012) utilizing explicit measures of prejudice reported an association between high prejudice and a more negative perception of emotions, suggesting that explicit measures of prejudice do not consistently correlate with facial emotion perception performance.
Motivation
Motivation, manifesting in various forms such as heightened sensitivity, attention, cognitive vigilance, concern, or engagement in identifying others’ emotions, seems to be a key predictor in determining whether an ingroup or outgroup advantage emerges in recognition accuracy. Motivation is often intensified by emphasizing group distinctions through methods like priming with experiences of ethnic discrimination (Chang & Kang, 2022), accentuating ethnic groups in the nation (Kommattam et al., 2019), employing stimuli with conspicuous group information (Stevenson et al., 2012; Thibault et al., 2006; Young, 2017; Young & Hugenberg, 2010; Young & Wilson, 2018), or using more engaging materials such as images with direct eye gaze (Kramer et al., 2013).
Studies with enhanced motivation consistently showed some form of ingroup or outgroup advantage. Two studies revealed ingroup advantage in accurately judging the basic six emotions in both participating groups (Young & Hugenberg, 2010; Young & Wilson, 2018). Likewise, another study identified an outgroup advantage in distinguishing genuine from fake smiles across all participant groups (Young, 2017). The remaining studies uncovered a partial ingroup advantage, observed solely in a single participating group (Kommattam et al., 2019; Thibault et al., 2006), or exclusively under specific conditions such as when posers and perceivers shared the same ethnicity in addition to social group membership (Stevenson et al., 2012), or specifically when assessing negative emotions (Chang & Kang, 2022).
On the other hand, when a motivation manipulation was absent, 11 out of 19 studies reported no ingroup or outgroup advantage. This was evidenced by a lack of interaction between the ethnic backgrounds of the poser and the perceiver (Beaupre & Hess, 2005; Cooper et al., 2022; Elfenbein et al., 2007; Fang et al., 2020; S. M. Kang & Lau, 2013; Kret et al., 2021; Matsumoto, 1992; Shioiri et al., 1999), and a lack of an effect of posers’ ethnicity within each participant group (Hutchison et al., 2018; MacPherson et al., 2006; Matsumoto & Ekman, 1989), in terms of accuracy or intensity rating of emotions.
Four studies revealed a partial ingroup advantage, where a higher accuracy (Friesen et al., 2019; Gulbetekin et al., 2023; Jiang et al., 2022; Ma-Kellams & Blascovich, 2012) was noted for expressions of ingroup members, but only within one participant group. Conversely, three studies reported a partial outgroup advantage, with increased accuracy for outgroups observed in just one of the participant groups (Brooks et al., 2019; Lee et al., 2005), or faster recognition of one specific emotion in outgroup members (Gul & Humphreys, 2014). Only one study reported an ingroup advantage in both participant groups across emotions. This study assessed how accurately Caucasian and Asian Chinese individuals judged self-posed Caucasian faces and Asian faces that posed an emotion modeled on prototypical American expressions of the six basic emotions, and found that both groups recognized their ingroup expressions more accurately (Low et al., 2022).
The above studies suggest an emerging trend showing that motivation is crucial in affecting ingroup advantage. In addition, two studies have independently measured motivation and correlated it with the facial emotion perception performance of ingroups and outgroups. Stevenson et al. (2012) used self-reported questions to assess participants’ ingroup affiliation and found that this affiliation predicted the accuracy in judging complex emotions displayed by stimuli from the same group. However, Young (2017), focusing on differentiating genuine and fake smiles, did not observe such a correlation. One potential explanation for this incongruent finding was that ingroup preference, which could lead to a bias where all smiles from ingroup members are perceived as genuine, outweighed the motivation to discriminate a smile between genuine and fake when the smile was displayed by an ingroup member.
Summary
The reviewed studies on ingroup-outgroup dynamics in facial emotion perception suggest two mechanisms: emotion attribution and emotion differentiation sensitivity. The first mechanism highlighted the role of prejudice and stereotypes. These studies showed that emotions aligned with ethnic or social group preferences and cultural stereotypes are more readily detected, reflecting how preconceived knowledge about an ethnic or social group, which can be heavily influenced by one’s cultural background, shapes emotional expectations. The second mechanism underscored the impact of motivation. These studies indicated that the likelihood of observing an ingroup or outgroup advantage in emotion detection was substantially influenced by increased motivation, suggesting that cultural affiliation can enhance sensitivity to emotional cues.
Emotion Conceptualization
The “Emotion Conceptualization” theme was addressed in ten articles, dissecting three key aspects: four articles examined individual emotion concepts, two delved into the structure of conceptual knowledge, and four focused on mentalization. Together, these studies presented emotion conceptualization as a mechanism underlying the varied perception of emotions across different cultures.
Three Aspects of Conceptualization
Three studies focused on the translational equivalence of specific emotions by having participants match emotion words to facial expressions and then creating aggregated profiles for each emotion word based on the expressions linked to it. These emotion profiles were then compared cross-culturally to evaluate consensus in emotion recognition, as reflected by the similarity of their facial expression profiles. Notably, the emotions of disgust and excitement exhibited a low correlation (< .8) between Japanese, Chinese, and Americans (Russell & Sato, 1995), and disgust similarly showed a low correlation among English, Hindi, and Malayalam speakers (Kollareth & Russell, 2017). In addition, a study across 12 emotions revealed that only happiness was universally recognized by American and Palestinian cultures, highlighting cultural differences in the interpretation and facial expression of emotions (Kayyal & Russell, 2013).
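The profile-comparison logic described above can be illustrated with a minimal sketch. It is not the procedure of any cited study: the helper names (`emotion_profile`, `pearson`), the three stimulus labels, and the response data are all invented for illustration, and we assume that each group's responses are already coded as (emotion word, chosen expression) pairs, with the word standing in for that language's translation equivalent.

```python
from collections import Counter
from math import sqrt

def emotion_profile(responses, word, expressions):
    """Share of a word's matches that went to each expression (hypothetical helper)."""
    counts = Counter(expr for w, expr in responses if w == word)
    total = sum(counts.values())
    return [counts[e] / total if total else 0.0 for e in expressions]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles; NaN if either is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

# Invented toy data: (emotion word, chosen expression) pairs for two groups.
expressions = ["face_a", "face_b", "face_c"]
group_1 = [("disgust", "face_a"), ("disgust", "face_a"),
           ("disgust", "face_b"), ("anger", "face_c")]
group_2 = [("disgust", "face_b"), ("disgust", "face_c"),
           ("disgust", "face_c"), ("anger", "face_a")]

# Build one profile per group for the same emotion word, then correlate them.
profile_1 = emotion_profile(group_1, "disgust", expressions)
profile_2 = emotion_profile(group_2, "disgust", expressions)
r = pearson(profile_1, profile_2)
print(round(r, 2))
```

The toy groups were deliberately constructed to disagree, so the resulting correlation is strongly negative. With real data, a high correlation between two groups' profiles for a word would indicate that the word and its translation are attached to similar expressions, whereas a value below a chosen cutoff (such as the .8 mentioned above) would flag weaker equivalence.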
Interestingly, while differing emotion concepts affected how facial expressions were labeled, they did not seem to alter the ability to perceptually discriminate between different emotions. For instance, Yucatec Maya speakers, who do not linguistically differentiate between disgust and anger, were still able to effectively discriminate between facial expressions that straddled the boundary of anger and disgust, showing perceptual skills comparable to German speakers (Sauter et al., 2011). This suggested that perceptual recognition of emotions can be independent of specific linguistic categorizations.
Two studies mapped the conceptual structure of emotions from similarities between emotion word pairs and associated them with facial emotion perception. The first of these found that the conceptual structure of emotions systematically differs between Chinese and British Caucasians. In addition, the action units that make up the facial representations of these emotions also showed systematic variations across these cultural groups (Jack et al., 2016). The second study indicated that the conceptual similarity between emotions predicted the propensity of misinterpreting the facial expressions associated with those emotions, as well as the corresponding brain activation, at the individual level across cultures. Given that Americans demonstrated greater precision in categorizing emotions compared to Japanese individuals, these discrepancies could be reasonably attributed to cultural variances in conceptual understanding (Brooks et al., 2019).
Four studies explored cross-cultural variations in attributing emotional expressions to mental states versus behavior. Mentalization entails deducing the underlying reasons for a facial expression, moving beyond mere identification of the action depicted, thus indicating a deeper level of emotion conceptualization. Studies showed Western participants favored emotion terms for labeling facial expressions, contrasting with small-scale isolated African societies’ preference for action-related descriptions (Gendron et al., 2014, 2020; Tcherkassof & de Suremain, 2005). However, a comparative study involving Western, Chinese, and Japanese participants, who are from more industrialized societies, revealed no marked cross-cultural differences in the preference for emotion or action labels (Yik, 1999).
Summary
This theme focused on how emotions are conceptualized across cultures and how this affects facial emotion perception. This mechanism was explored through three distinct subthemes: single emotion concepts, the structural configuration of these concepts, and the extent to which facial emotion perception involved the evocation of mental states. Cross-cultural variations in the conceptual understanding of emotions significantly affected how emotions were labeled when interpreting facial expressions, but they did not necessarily correlate with the perceptual ability to distinguish between various emotional expressions on faces. While conceptually understanding emotion is intricately linked to the language prevalent in each culture, the extent to which emotions are linked to mentalization emerges as a more complex cultural phenomenon, meriting deeper exploration.
Discussion
This review aimed to dissect and understand the factors proposed to underpin cross-cultural differences in facial emotion perception, through a systematic exploration of pertinent scholarly works. A total of 105 research studies met criteria for inclusion. Notably, over half of these studies failed to meet critical risk of bias criteria, including sample representativeness, sample size determination, and baseline comparisons between groups. We also need to entertain the possibility of a publication bias, such that many papers not demonstrating a cultural effect failed to be accepted for publication. The case for real differences in emotion perception across cultures thus remains tentative and should not be overstated, given clear evidence for universality at least at a basic level. With these caveats in mind, this review was designed to identify the kinds of mechanisms that published researchers have investigated to explain the cross-cultural differences that they observed in emotion perception, and it was helpful that a limited number of themes were found to underpin this superficially diverse literature.
Five distinct yet interconnected themes (illustrated in Figure 2) were identified: the influence of context, processing of facial features, display rules and interpretation, the role of affiliation with ethnic and social groups, and the conceptualization of emotion. Within each thematic domain, key findings were synthesized to reveal common mechanisms. In essence, themes appeared to either describe the kinds of cues that people across cultures pay attention to (such as context, parts of the face) or the kinds of cultural knowledge and beliefs that they hold which influence how they interpret what they perceive. This notion of cue processing versus interpretation aligns well with neuroscientific models of emotion perception, such as that advocated by Adolphs (2002a, 2002b) developed through psychological and neurological research in monocultural contexts. According to Adolphs, there are two fundamental processes in facial emotion perception: perceptual processing (the collection and initial processing of sensory input) and conceptual processing (the refinement and categorization of this input based on existing knowledge). Importantly, neuroscientific models encompass feed forward (bottom up) and feedback (top-down) mechanisms whereby attention feeds information forward to be processed and categorized and conversely, social knowledge can influence attention. The research themes identified in this review fit well with such a conceptualization and also suggest a reciprocal relationship between attention/processing of cues and social understanding and knowledge when engaged in emotion perception.
In Figure 3, we have developed a process model in which identified mechanisms associated with the research themes (see Figure 2) have been slotted in along a continuum from attention allocation to social knowledge/conceptualization, in line with the interactive, sequential processes suggested by Adolphs and others. In all, we identified nine mechanisms for which evidence has been provided in two or more studies and which collectively encompass mechanisms identified in all themes identified in Figure 2.

Figure 3. Integrated Models of Cross-Cultural Variations in Facial Emotion Perception.
According to this model, the initial stage encompasses the acquisition of perceptual information, as evidenced by observed cultural variations in attention allocation. These mechanisms illustrate how individuals from diverse cultural backgrounds employ distinct strategies to gather and preliminarily process facial data, thereby constructing detailed representations. Specifically, cross-cultural differences manifest in several key areas: the differential allocation of attention between visual context and facial features, the prioritization of various emotion-conveying modalities, the emphasis on particular facial regions (notably the eyes and mouth), and the relative focus on the left versus right sides of the face. Collectively, these attentional biases shape the initial perceptual input available for subsequent processing, laying the foundation for more high-level conceptual categorization.
The second stage, corresponding to the interpretation phase, encompasses a number of mechanisms that align with the notion of conceptualization and interpretation. These mechanisms relate to how the perceptual information gathered in the first stage is integrated and interpreted through culturally specific knowledge and belief systems. It is premature to suggest a hierarchy in these processes, and indeed, contemporary neuroscientific evidence (which we are using as a heuristic base for our model) would argue for interactive systems rather than linear processes. Nonetheless, intuitively, in Figure 2 we have ordered these from more perception-linked through to more abstract. Adjacent to culturally varying attention allocation patterns, we have placed cognitive representation, whereby viewers map what they see to an internal representation of what they expect in terms of facial expression. So too, familiarity with culturally specific display rules may influence how such perceptual representations are categorized. This interpretative process will be modulated by a complex interplay of other sociocultural factors, including prejudices and stereotypes associated with cultural groups, and motivations linked to group affiliation. Ultimately, these processes are filtered through culturally specific emotion conceptualization, which reflects how different cultures fundamentally understand and categorize discrete emotions and shapes the very framework within which other cultural differences in emotion perception operate. Each of these culturally determined factors exerts influence on the perception and comprehension of facial emotions, underscoring the intricate relationship between cultural background and emotion recognition.
As with the neuroscientific model of emotion perception, the model in Figure 3 also emphasizes feedback pathways between perceptual processing and interpretation. While a detailed exploration of these reciprocal interactions extends beyond the scope of this review, the possibility of such interplay warrants consideration when examining how these mechanisms influence each other. For instance, studies have demonstrated that the perceived ethnicity of an observed face can modulate attention allocation during facial emotion perception, exemplifying top-down influences on attentional processes based on social cues (Friesen et al., 2019; Hu et al., 2017). Should autonomy belief be established as a valid concept, this too may influence where attention is allocated in early processing of perceptual input.
While relevant evidence remains limited, future research could fruitfully explore these dynamic interactions between cultural knowledge and initial perceptual processing in facial emotion perception. Furthermore, future studies should explore the applicability of this model to non-facial forms of emotion perception, such as vocal and bodily expressions. This cross-modal investigation could help determine whether the Two-Stage Model’s principles of attention allocation and social-cognitive interpretation are generalizable across different emotion communication channels.
Ingroup Advantage: An Example to Apply the Findings
Not only does the synthesis of research represented in this review speak to theoretical understanding of cross-cultural differences in facial emotion perception, but hopefully, it provides valuable insights with which to address real-world problems. This section uses the ingroup advantage in facial emotion recognition as an example.
The concept of ingroup advantage, often synonymously termed ethnic bias in the literature (e.g., Cooper et al., 2022; Hills & Hill, 2018; Kilbride & Yarczower, 1983), has garnered significant scholarly interest and debate (e.g., Elfenbein & Ambady, 2002a; Matsumoto, 2002). This terminology suggests an intuitive linkage between ingroup advantage and ethnicity, potentially stemming from the “other-race effect” observed in face recognition research. The other-race effect, characterized by enhanced facial identity recognition within one’s racial cohort, is primarily ascribed to increased familiarity (Meissner & Brigham, 2001) and elevated motivation (Hugenberg et al., 2007) for processing ethnically congruent faces.
In the domain of facial emotion perception, the influence of motivation is magnified across various ethnic and social groups. Diverging from the “other-race effect” for identity recognition, the research reviewed here has challenged the direct relationship between ethnic-specific geometric facial features and ingroup advantage, especially in studies lacking explicit motivational inducements. This observation aligns with research findings suggesting that distinct neural systems are responsible for processing various types of facial information, including ethnicity, age, gender, and emotions (Bruce & Young, 1986; Haxby et al., 2000).
Therefore, while motivation is a foundational element in both identity recognition and facial emotion perception and is influenced by ethnic bias, the processing of ethnically specific geometric facial features appears to be less critical to facial emotion perception. Instead, the ingroup advantage in this domain is primarily linked to an individual’s familiarity with culturally specific ways of expressing emotions (Elfenbein et al., 2007), a process deeply embedded within the emotional cognitive representation system. The manifestation of ingroup or outgroup advantages is thus not merely a consequence of ethnic affiliation but rather the result of a dynamic interplay between cultural familiarity with emotional expressions and the motivational impetus evoked by ethnicity in diverse contexts.
Future Directions
The research reviewed here provides many potential avenues for further exploration. Most urgently, studies of cross-cultural effects in emotion perception need rigorous research designs that minimize the risk of bias, so as to strengthen confidence in many of the areas discussed here. Beyond the research avenues previously discussed, several additional areas warrant investigation. While display rules are instrumental in understanding cross-cultural differences in facial emotion perception, their connection to other mechanisms lacks solid evidence. Further research might consider methods for operationalizing and measuring display rules and their relationship to facial emotion perception mechanisms. Similarly, certain mechanisms, like autonomy belief, have been identified in a single cultural context and reported in a single study. Replication studies are needed to establish their generalizability.
Following suggestions provided in Matsumoto and Yoo (2006), future explorations could incorporate diverse cultural dimensions (e.g., high-context versus low-context communication, see Hall, 1976), especially emerging ones, when identifying mechanisms underlying cross-cultural variations in emotion perception. Ishii et al. (2011) demonstrated that anxious attachment style fully mediated cultural differences in detecting the offset of happiness in morphing displays, establishing attachment as an important cultural dimension closely associated with emotion perception. The authors theoretically proposed that these attachment patterns might connect to physiological differences through serotonin transporter polymorphism (5-HTTLPR), though this hypothesis remains empirically untested. Incorporating cultural dimensions provides a comprehensive framework for understanding cross-cultural variations, effectively connecting cultural backgrounds with cultural characteristics and cognitive/physiological processes—a methodologically sound approach for future investigations.
In addition, moving beyond static images or morphing movies, future studies could employ stimuli that better reflect real-world complexities. This approach would examine the interplay between context and the dynamic nature of facial expressions, as well as the interactivity in emotional exchanges during interpersonal interactions, to unravel the multifaceted aspects of facial emotion perception in more naturalistic settings.
Contributions
This review has provided a synthesis of a diverse and heterogeneous literature to identify common themes. Furthermore, it has integrated these themes into a model, consistent with neuroscientific evidence, that may provide a useful heuristic device for future research aimed at furthering understanding of this complex field. By providing a clearer landscape of this research area, this framework aims to facilitate future studies in related fields.
This review also has potential implications that extend beyond academic research. The findings may serve as a foundation for developing more culturally sensitive and effective practices in various fields where cross-cultural understanding is crucial. Specifically, these insights could help enhance training programs in cross-cultural communication for international business and diplomacy, guide the development of culturally sensitive diagnostic tools and therapeutic approaches in mental health practices, inform the design of more adaptive AI systems and user interfaces in human-computer interaction, and contribute to the creation of more inclusive educational environments and teaching strategies in multicultural settings.
Limitations
The review’s primary limitation lies in its focused scope, concentrating specifically on studies investigating mechanisms underlying cross-cultural differences in facial emotion perception. While this strategy allowed for a focused and manageable review, it potentially omitted exploration of certain populations, methodologies, and modalities of emotion expression. In addition, the exclusion of unpublished studies limited its ability to assess potential publication bias. Consequently, for a comprehensive overview of cross-cultural comparisons in facial emotion perception, these findings should be considered alongside (a) studies not designed to explore the reasons for these differences, such as those examining the universality hypothesis or documenting cross-cultural variations without investigating underlying mechanisms, (b) research focusing on non-facial modalities of emotion expression, including vocal cues, body language and gestures, and tactile communication, and (c) unpublished studies.
Another limitation stems from the decision to forgo a meta-analytic approach. This choice was necessitated by the substantial variability in conceptual and methodological frameworks across the reviewed studies. While this decision was pragmatic given the current state of the field, it means this review lacks the quantitative insights typically offered by meta-analyses.
This synthesis, while extensive within its defined scope, reflects the current landscape of published, peer-reviewed research. This review integrates the established body of knowledge on cross-cultural variations in facial emotion perception, drawing from thoroughly evaluated studies. As the field continues to evolve, future research may uncover new insights and perspectives, underscoring the dynamic nature of the field and pointing to the potential for future studies.
Conclusions
To our knowledge, this review presents the first comprehensive synthesis of cross-cultural research on the mechanisms influencing facial emotion perception across cultures. By identifying mechanisms across five research themes and integrating them into a two-stage model, we provide a novel framework for understanding cultural differences in this crucial aspect of human interaction. We hope this systematic review establishes a solid foundation for future investigations, offering researchers and practitioners a structured approach to evaluate and address cultural differences in facial emotion perception, within and beyond academia.
Supplemental Material
Supplemental material, sj-doc-1-jcc-10.1177_00220221251334811 for Why Do Cultures Affect Facial Emotion Perception? A Systematic Review by Ranran Li, Halle Quang, Michaela Filipčíková, Yi Xu, Fiona Kumfor, Branka Spehar and Skye McDonald in Journal of Cross-Cultural Psychology
Authors’ Note
This article is based on a chapter from the first author’s doctoral dissertation completed at the University of New South Wales. A part of these findings was presented orally at the 2023 International Neuropsychology Society conference, in San Diego, USA.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Australian Government Research Training Program (RTP) Scholarship as part of the first author’s doctoral candidature.
Supplemental Material
Supplemental material for this article is available online.
