Abstract
Over 5 days at the Nag’s Head Conference Center, USA in 1987, social and cross-cultural psychologists discussed what would be required if research relating to culture were to gain greater attention from psychology in general, and in particular from what was perceived at the time as its mainstream. The criteria for gaining greater credibility laid down by three leading social psychologists proved daunting in relation to the cross-cultural work presented at the meeting but subsequently inspired cross-culturalists to “raise their game.” In this paper, we describe these crucial challenges and how they have been addressed more recently by cross-cultural psychologists. We assess the extent to which studies focused on cultural differences are now thoughtfully represented in social, personality, and organizational psychology by briefly surveying the content of a single year’s issues of the Journal of Personality and Social Psychology, Journal of International Business Studies, and the Journal of Personality in relation to the concurrent content of the Journal of Cross-Cultural Psychology. We identify the perils of assimilation to psychology in general by diluting the concept of culture and by tyrannizing research with over-specified criteria of statistical rectitude. We also identify studies published in top-rated journals that have nonetheless advanced our field. We reiterate the need for defensible measures of cultural difference and methods for identifying and examining them as a basis for multi-level explanations of cultural effects and cultural change. We conclude by proposing a gold standard for assaying cross-cultural studies of psychological processes and outcomes.
“One of the strengths of scientific inquiry is that it can progress with any mixture of empiricism, intuition, and formal theory that suits the convenience of the investigator. Many sciences develop for a time as exercises in description and empirical generalization. Only later do they acquire reasoned connections within themselves and with other branches of knowledge.”
Setting the Agenda in 1987
“There lies the port; the vessel puffs her sail: There gloom the dark broad seas.” Tennyson, Ulysses
The background for this article is a 5-day encounter between two groups of social psychologists selected to consider how the concept of culture could be sensibly incorporated into the study of the social phenomena researched by social psychologists. In 1986, Michael Harris Bond invited four accomplished social psychologists, the “mainstreamers,” whose work had not yet referred to culture, to meet with around 20 cross-cultural social psychologists, the “cross-culturalists.” With hindsight, we acknowledge that by building the meeting around the notion that certain approaches to psychology can be considered as the “mainstream,” we yielded some ground that would better have been contested. Later research has increasingly underlined that many aspects of cultural groups in North America are distinctively different from those studied elsewhere (Henrich et al., 2010; Hofstede, 1980; Kağıtçıbaşı, 1996). The “mainstream” has proved to comprise an indigenous culture of its own, but the research methods and journals embodied within it remain globally predominant. The Nag’s Head meeting was built around the notion that there is a mainstream psychology, and we retain it here for the sake of developing our argument.
Over 5 days at the Nag’s Head Conference Center in 1987, the attendees discussed what would be required if research relating to culture were to gain greater attention from psychology in general, and social psychology in particular. The entry criteria laid down by the mainstreamers who attended the meeting proved daunting in relation to the cross-cultural work presented, inspiring the cross-culturalists in attendance to “raise their game.” The edited book that emerged from that conference (Bond, 1988) has helped guide succeeding generations of cross-cultural psychologists to refine their understanding of culture and how culture interfaces with the psychological processes they study within their own cultural group and beyond.
The Encounter
The mainstreamers’ challenges
The title of the book emerging from the conference was, “The cross-cultural challenge to social psychology”; its content was framed as a constructive debate. The social psychologists opened the discussion by presenting their initial considerations about current and ideal cross-cultural studies of social psychology. We have selected some of the ways in which they represented these concerns: Tedeschi (1988a) began by acknowledging that, “There are many critical voices pointing to the limitations of a social science that does not consider indigenous psychologies, cultural context, and methodological pluralism.” (p. 15). He further observed that social psychologists might be ignoring the potential inputs of cross-cultural psychology because of “skepticism about its methodology, and a failure of cross-cultural psychology to devise theories or carry out research that captures the imagination of the mainstreamers.” (ibid).
In his opening statement, Messick (1988a) acknowledged the difficulties of doing rigorous comparisons of social psychological phenomena across cultures. He also pointed out that extant theorizing about how culture functions to impact social psychological phenomena tended to focus on conceptualizing culture as an individual difference variable “subject to all the ambiguities that plague research on individual differences within a culture.” (p. 46). He proposed instead that culture would be more sensibly conceptualized as “differences between the institutions, norms, or expectations that elicit different behavior patterns.” (p. 47). He allowed that, “The question of why different cultures have different norms and institutions is an interesting question, but it is a question that falls more into the domain of history and anthropology than social psychology.” (ibid). Thus, he conceived the proper role of social psychology as elucidation of universal social processes and of cross-cultural psychology as the exploration of moderators of these processes.
The cross-cultural psychologists’ reply
In their initial presentations, the cross-culturalists claimed that prior research comparing cultural groups and samples of their members had already shown meaningful cross-cultural differences in psychological processes and their outcomes; these were relevant to content areas of social psychology. They had done and were continuing to do social psychological research across cultures that was important for advancing the generalizability of the field against the claims of triviality (Ring, 1967), historical embeddedness (Gergen, 1973), and intellectual colonialism (Jahoda, 1979) then roiling the discipline.
Specifically, and as later summarized by Messick (1988b), the cross-culturalists asserted that they were addressing three shortfalls of social psychology as then practiced:
Parochialism: The findings upon which the edifice of social psychology was being built were typically derived from a select group of persons: “North American college students.”
Methods: Replication of studies with participants from other demographic and cultural groups was rarely practiced, whereas historical data and sociobiological orientations were ignored.
Theory: Cultural influences were being ignored in theorizing and research on the social psychology of values, beliefs, attitudes, norms, and behavior.
The mainstreamers’ reassessment
After considering the cross-culturalists’ inputs, Tedeschi (1988b) responded by acknowledging the merit of their goal to develop “a theory of human behavior that includes cultural context.” However, he pointed out that establishing “transhistorical universal theories of human behavior” must include well-articulated concepts of culture and their change across human history to accomplish this lofty achievement. He noted that many constructs already studied by mainstream social psychology were “indigenous” because of their developmental provenance from “the West” and would require careful refinement in concert with indigenous constructs proposed from other cultural traditions to justify any claims to universality; temporal processes of cultural change must also be built into such theories in testable ways. He balefully concluded that, “The problems and challenges ahead of the cross-cultural psychologists are formidable.” (p. 285).
In his rejoinder, Messick (1988b) repeated his concern that, “cross-cultural work will have an impact only when it articulates clearly with social psychological theory.” (p. 286). He argued that social psychologists everywhere, whether concerned with culture or not, shared the basic problem of describing “how internal qualities, for example, attitudes or personality dispositions influence behavior” and that, “there is much conceptual work to be done before social psychology will have a good understanding of cultural influences on behavior” (pp. 288–289). He concluded that, if cross-cultural social psychologists were going to incorporate cultural variation into their theorizing and research, then they would need to begin considering cultural rather than individual differences, that is, “the institutions, norms, or expectations that elicit different behavior patterns.” (Messick, 1988a, p. 47).
Meeting These Challenges Post-1987
“To strive, to seek, to find, and not to yield.” Tennyson, Ulysses
We extract what we consider the crucial aspects of this watershed encounter documented in the 22 chapters of “The cross-cultural challenge to social psychology” in order to structure our overview of subsequent developments in cross-cultural social and organizational psychology. We next outline what will be required if we are to conclude that more recent studies have made progress in addressing these challenges. Following this assessment, we examine the yield of cross-cultural studies that appeared during 2020 in three of the key journals that deal with personality, social, and organizational psychology. We then cast our net more widely, presenting our cull of recent studies that have addressed these concerns more comprehensively. Finally, we propose a “futurescape” for doing sensible work in cross-cultural social psychology.
Key Issues Identified at the Nag’s Head Conference
Results from simple replication
Sharon and Amir (1988) noted the discouraging results when they attempted to replicate prior US findings with samples drawn from Israel. We now know that replicable results are by no means certain even among samples drawn from the same nation (Klein et al., 2018). We also have come to appreciate, however, that replicability itself is of lesser interest than the identification of variables that can explain when replication will or will not be found. The identification of such variables only becomes possible, however, when a broad range of cultural samples has been surveyed.
Perspectives on culture
If we are to generate adequate understandings of cultural variation, we need to incorporate cultural voicings from non- or under-represented cultures. Richer understanding will arise if cross-culturalists draw from other culture-relevant disciplines, such as socio-biology, history, evolutionary psychology, sociology, and anthropology (Kukla, 1988; Trimble, 1988).
Dimensions of cultural variation
Triandis (1988) provided a crucial early exposition of the utility of distinguishing individual and group-level variability. He noted the promise of culture-level individualism-collectivism, tightness-looseness, and traditionality-modernity as explanatory variables in accounting for individual behaviors in a given cultural context. Such dimensions can help to explain how individual differences in personality can find expression in different models of personhood (Miller, 1988).
Cultural context and change
In understanding cultures, we need to take account of both continuities and change. Gabrenya (1988) reiterated the preoccupation of early students of comparative cognition with the material circumstances within which child development occurs. Miller (1988) asked what specific features of ecological context elicit any given cultural adaptation. Yang (1988) asked in what ways industrial modernization drives cultural adaptations. Each of these contributors stressed the need to address ecological determinants of cultural adaptations across changing circumstances.
Where should our understandings of culture be focused?
What aspects of social behavior are universal and what are variable (Triandis, 1988)? More broadly focused discussions of how to conceptualize culture were notably absent from the 1988 book. Instead, the principal emphasis was on the search for variations in empirical effects across geographically separate cultural groups. The differences that were found were most frequently attributed to contrasts between individualism and collectivism and its close associate, power distance. No discussion of what might constitute a testable theory of cultural variation beyond collectivism-individualism was then envisaged.
Having noted the salience of these five issues at the Nag’s Head meeting, we next lay out our perspective on how they may be, and to some extent, have been addressed subsequently.
Proposed Ways to Move the Field Forward
“My purpose holds to sail beyond the sunset and the baths of all the Western stars until I die.” Tennyson, Ulysses
We are cross-cultural social-personality-organizational psychologists with a keen interest in the developmental processes of enculturation. Based on our experience in practicing cross-cultural psychology over the last four decades, we have selected five areas crying out for improvement.
We Need to Study Cultural Groups Other Than Nations
The title of this journal, the Journal of Cross-Cultural Psychology, reflects the comparative focus that characterized many early studies and was given particular emphasis by the pioneering work of Hofstede (1980) on nations. While there are certainly other useful ways of studying culture, our continuing preference is to work within the method of difference as first defined by Mill (1843), and still firmly embedded within the mainstream of social psychology. Hofstede’s achievement in distinguishing some types of nation-level variability has bequeathed a continuing emphasis in our field on the study of differences between nations rather than between other collective groupings, like organizations of various types, teams, families, and so forth.
Critics of Hofstede were quick to assert that variability between individuals is far greater than variability between nations. We now know this claim to be true. The proportion of total variance that lies between nations ranges between 7% and 25% in values (Fischer & Schwartz, 2011) and in personality (Poortinga & Van Hemert, 2001). This is important: if we are to attempt explanations of cultural differences, there must be an adequate degree of variability between samples to permit valid tests of hypotheses. The issue to be debated is whether around 10% of variance is sufficiently important to be of interest, or whether samples other than nations, like the rural/urban divide, provinces/states, districts, cities, language groups, ethnic groups, religious groups, and so forth, would provide more worthwhile data bases to address.
We certainly do not argue that use of comparison between nations is a requirement for a cutting-edge study. Comparison between nations is a convenience, based on the wide and growing availability of such data, and the fact that a range of institutional, linguistic, and political factors serves to sustain national cultures, giving differences between them a continuing practical importance (Akaliyski et al., 2021). The cultural unit selected for study should be one that has plausible connections with the proposed outcomes of interest. Our first criterion for a cutting-edge study of any type of cultural unit is that it should report measures such as intra-class correlations that inform us of the degree of variability between whichever type of samples has been employed.
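To make this first criterion concrete, the following is a minimal sketch of how an intra-class correlation, ICC(1), might be computed from a one-way analysis of variance. The function name `icc1` and the toy scores are hypothetical illustrations, not data or procedures drawn from any study cited here; the sketch assumes each sample is supplied as a plain list of individual scores.

```python
from statistics import mean


def icc1(groups):
    """ICC(1): share of total variance attributable to group membership.

    `groups` is a list of lists, one inner list of individual scores per
    cultural sample. Uses the one-way random-effects ANOVA estimator.
    """
    n_groups = len(groups)
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    group_means = [mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / total_n

    # Between-sample and within-sample sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2
                     for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (total_n - n_groups)

    # Effective group size; equals the common size when samples are balanced.
    k = (total_n - sum(n * n for n in sizes) / total_n) / (n_groups - 1)

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores from three small samples.
samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(round(icc1(samples), 2))  # 0.4
```

With real data, a value of this kind indicates how much of the variability in a measure lies between the chosen cultural units, which is the information the criterion asks authors to report. The estimate can be negative when samples differ less than chance would predict.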
We Need Much More Extensive Sampling
Much of the initial impetus within cross-cultural social psychology was built upon attempted replications of prior US studies and on comparisons of data from two nations, one of which was typically the United States, and the other was from East Asia. Such comparisons were most often described as comparisons between “the East” and “the West.” More recent major surveys have provided continuing evidence of substantial variability between “Eastern” nations and “Western” nations, as well as those that do not fall easily within either broad category (e.g., Minkov et al., 2017; Welzel, 2013). If we wish to undertake valid tests of explanations for results that show differences between samples, we need to undertake much broader sampling. We can now do so by sampling nations located at distinctive positions on any of the many dimensions of cultural variability that have so far been identified.
Our preference for studies that examine variations along known dimensions indicates that we see greater potential in studies whose design is etic rather than emic (Berry, 1989); we favor designs where the same measures are employed in each of the samples surveyed, with checks made as to their appropriateness within each specific context. While there is certainly value in charting local formulations of social processes (Allwood, 2018), such formulations cannot be considered as distinctive or indigenous until such time as their predictive validity has been tested in other cultural groups; they may in fact be pan-culturally useful.
Concepts claimed as indigenous to a particular location have rarely been tested in this way. We note three instances where such tests have been made. The indigenous personality dimension of interpersonal relatedness first identified in China (Cheung et al., 1996) was subsequently found to be predictive in the USA (Cheung et al., 2003). Indigenous personality dimensions identified in South Africa (Fetvadjiev et al., 2015) are found also to be predictive in New Zealand (Fetvadjiev et al., 2021). Evidence is found that the indigenous Chinese process of guanxi has parallels in Lebanon and in Brazil (Smith, Huang et al., 2012; Smith, Torres et al., 2012). As Berry (2013) has noted, the value of emic studies is that they can enhance the derivation of dimensions of cultural variation that are informed by broad sampling rather than by imposition of a single prior perspective. Merely claiming a construct to be indigenous does not make it so until research in other cultures has shown that its meaning differs in these other cultures.
Even with wider geographical sampling, it will still be difficult to interpret the results of comparisons between any two nations, because nations differ on a multitude of factors that are not readily controlled. Despite this initial limitation, the majority of published cross-cultural studies continues to be based on two-nation comparisons. Within this journal, 53% of the cross-cultural comparisons reported between 2008 and 2015 were two-nation comparisons (Boer et al., 2018). Only 18% of the 382 cross-national studies sampled 20 or more nations. Within the one mainstream social psychology journal also surveyed by Boer et al. (2018), Personality and Social Psychology Bulletin, the proportion of two-nation comparisons was even higher at 77%.
Two-nation studies can ease the path toward more extensive sampling, but they have declining value in relation to the investigation of cross-cultural variability that continues to preoccupy our field (Brouwers et al., 2004). Two-nation studies may have future potential when greater attention is given to similarities rather than differences. Designs can be envisaged where tests are made of whether cultural factors mediate similarities rather than differences between individuals located in contrasting cultural groups. For instance, are there continuities between the actions of, say, individualists within differing cultural groups? We leave exploration of such options for another occasion.
Our second criterion for a cutting-edge study is therefore that it includes 20 or more samples. Only by doing so does it provide the basis for estimating whether the variability between samples in the study’s chosen explanatory variable is significantly associated with variations in the outcomes of interest. With many cultural groups, extraneous sources of variability can be excluded, and alternative explanations of effects can be tested competitively against one another (e.g., Van de Vliert & Postmes, 2012). Extensive sampling permits the use of scores on dimensions of sample-level variation as predictors, whether these be derived from Hofstede’s (1980) original dimensions or the many alternative dimensions that have been defined subsequently by culturologists, be they “nationologists” (e.g., Akaliyski et al., 2021; Minkov, 2012) or “organizationologists” (e.g., Hofstede et al., 1990). Where the dependent measures are available only for nations, this requires sampling of 20 or more nations. Where dependent measures are available for smaller groupings, the number requirement could be met by sampling different genders, different regions, or different occupational groups, for instance.
The samples selected are important not only in terms of their numerousness, but also in terms of whether they are relevant to the dimension(s) of cultural variability that provide the focus of the study (Boehnke et al., 2011). Representativeness has importance also, given the type of cultural group being explored. Of course, there are substantial impediments to such extensive sampling, even with the most readily accessible samples, but these impediments are less than they used to be, given the increasing availability of big data, such as the World Values Survey. The increasing frequency of studies involving multi-national samples suggests that colleagues around the world are increasingly willing to cooperate in facilitating this wider sampling of cultures. Doing so may result in the destabilizing of established stereotypes about the cultural systems of various countries and even the discovery of novel dimensions of difference across these larger samples of cultural units.
We Need to Establish and Use Equivalent Measures
Valid comparisons of effects obtained from different samples require that the researchers involved establish equivalence in the meanings of the measures that are employed. However, it should be remembered that, given variability between participants and their understandings of the nature of the study in which they are involved, the goal of exact equivalence can rarely be achieved even within experimental studies conducted within a single sample of highly selected participants.
Cross-culturally, the first impediment to address is that of linguistic equivalence. Given the global increase in the use of spoken English, some researchers have employed English language instruments at each of the surveyed sites. This procedure is a serious threat to measurement validity. We know that respondents with English as a second language give different answers when surveyed in English than they do when surveyed in their local language (Chen & Bond, 2010; Harzing, 2005). Translation of surveys and experimental instructions into local languages by competent bilinguals, with checks provided by independent back-translation into the original language, is an essential contribution toward measurement validity (Van de Vijver & Leung, 1997). Even with such procedures in place, there is the need for judgment calls by the researchers involved on whether the use of literal translation or of “decentred” phrases that are more familiar in the second language can provide greater equivalence (Hambleton & Zenisky, 2011). Some items developed in one language culture may prove to be so culture-bound and untranslatable that they need to be dropped from consideration.
A second impediment to equivalence of measures is posed by cultural variations in styles of response to survey questions. It is well established that respondents to surveys differ in their use of scale midpoints, extreme responses, and in the tendency to acquiesce with items regardless of item content (Johnson et al., 2011; Smith, 2004). Opinions differ as to when these variations are best considered as artifacts of measurement or as valid indications of different cultural orientations (Fischer, 2004; Smith, 2011). For instance, respondents’ ratings may vary because they differ in the assumptions they make as to reference points that are relevant to the judgments or opinions that they have been asked to express (Heine et al., 2002). Differing reference group effects may be considered artifactual, or they may equally be treated as an integral aspect of the disposition of members of some cultural groups to take greater account of their context. Survey response formats using Likert-type scales elicit different styles from those found for instance when using the Schwartz et al. (2001) value survey (Smith et al., 2016). Any controls for response style must therefore be specific to a given response format.
Measurement equivalence is likely to be enhanced through the design of measures that reduce or eliminate variations in response style. These can include the use of scales with reversed items (e.g., Vignoles et al., 2016), scales with greatly reduced response choice (e.g., Minkov et al., 2017), scales which use lexical universals (Goddard & Wierzbicka, 1994), and scales whose scores can be adapted by using anchoring vignettes that have been included in the survey. Anchoring vignettes ask respondents to rate hypothetical persons or objects located at moderate and extreme scale points, thus providing estimates of the way in which each individual uses the response scales. He et al. (2017) have empirically compared the extent to which several different procedures modify scale scores. The question is currently unresolved as to their relative merits and disadvantages, but it appears that the use of anchoring vignettes may be the preferred option for at least some types of response scales (He et al., 2020).
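As an illustration of why scales with reversed items help, the sketch below shows one simple way in which a balanced scale allows acquiescence to be separated from substantive scale scores. It is an assumption-laden example, not the procedure used in any of the studies cited above: the function names, the 1 to 5 response format, and the item layout are all hypothetical.

```python
def acquiescence_index(responses, midpoint=3.0):
    """Mean raw agreement across a balanced item set, relative to the midpoint.

    On a balanced scale (equal numbers of positively and negatively keyed
    items), content cancels out in the raw mean, so a positive value
    suggests a tendency to agree regardless of item content.
    """
    return sum(responses) / len(responses) - midpoint


def score_balanced_scale(responses, reversed_items, scale_min=1, scale_max=5):
    """Scale score after reverse-keying the items listed in `reversed_items`."""
    keyed = [
        (scale_min + scale_max - r) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return sum(keyed) / len(keyed)


# Hypothetical respondent who answers "5" to every item on a four-item
# scale in which items 2 and 3 (zero-based) are reverse-keyed.
answers = [5, 5, 5, 5]
print(acquiescence_index(answers))            # 2.0
print(score_balanced_scale(answers, {2, 3}))  # 3.0
```

In this toy case, uniform agreement yields a high acquiescence index but only a midpoint scale score, so the response style does not inflate the substantive score. Because response styles differ by format, any such index would need to be computed separately for each response format in use.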
Tests of measurement equivalence
Whatever translation procedures and response style controls are employed, substantive evidence is also required as to the degree of measurement equivalence that is achieved between samples. In their survey of publications in this journal between 2008 and 2015, Boer et al. (2018) found that measurement equivalence tests were reported in only 17% of the papers. Of course, some of the papers published used methods that do not lend themselves to these types of tests. Where such tests were employed, the principal procedure was multigroup confirmatory factor analysis, which was reported in just 11% of studies. In none of the studies that tested for equivalence was full metric equivalence established. However, partial metric equivalence was accepted in most of these studies as a basis for interpreting the results obtained.
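Multigroup confirmatory factor analysis requires specialist software, but the underlying question of whether a measure is structured similarly in two samples can be illustrated with a much simpler index. The sketch below computes Tucker’s congruence coefficient between two sets of factor loadings. This is a swapped-in, simpler technique, not the multigroup procedure surveyed by Boer et al. (2018), and the function name and loadings are hypothetical.

```python
from math import sqrt


def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors.

    Values near 1 indicate that the same items load on the factor in
    proportionally similar ways in both samples.
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("Both samples must provide loadings for the same items.")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return cross / norm


# Hypothetical loadings of the same four items in two cultural samples.
sample_one = [0.70, 0.65, 0.80, 0.60]
sample_two = [0.68, 0.30, 0.82, 0.59]
print(round(tucker_phi(sample_one, sample_two), 3))
```

Conventions differ as to how high the coefficient must be before a factor is treated as equivalent across samples, and the index addresses only similarity of structure, not the stricter metric or scalar levels of equivalence.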
The hazards involved in comparing scores from different samples where measurement equivalence has not been established are well known (Boer et al., 2018; Fischer & Smith, 2021; Van de Vijver & Leung, 2011). It is nonetheless apparent that most studies published in this journal in recent years did not test for equivalence. The issue to be considered is how serious a threat this presents to the findings that are most typically presented. The first point of relevance is that this difficulty is not distinctive to the cross-cultural field. Well-known measures such as the Big Five personality dimensions do not satisfy criteria for full metric equivalence even when sampled only within the US (Marsh et al., 2004, 2010). The criteria for full equivalence may simply be too stringent for the variability that is inevitable between substantial numbers of samples. Simulation studies using randomly generated data show that as the number of items per scale and the number of included samples rises, invariance criteria are increasingly breached (Svetina & Rutkowski, 2017).
A more radical critique of equivalence testing has been advanced by Welzel, Brunkert et al. (2021). These authors note that if a given cultural group is distinctive from others, respondents in that group are inherently likely to agree with one another in their responses to a given measurement instrument. Their scores on items comprising such an instrument will tend toward extremity, reflecting a relative cultural consensus on the acceptance or rejection of the attribute in question. When means on such measures are compared between groups, the individual-level distribution of scores will differ between samples, precisely because of the culture-level difference being tested. Multigroup confirmatory factor analysis will consequently detect differences between the distributions of scores in the different samples, thereby indicating that invariance is not fully established. While this tendency to detect non-invariance may not be strong in comparisons between two or three cultural groups, it will become ever stronger as sampling is extended.
Given the current emphasis on achievement of measurement equivalence as a criterion for publication acceptance, authors will feel pressure to drop specific scale items that are identified as contributing more strongly to non-invariance (e.g., Alemán & Woods, 2016; Sokolov, 2018). By doing so, authors can provide evidence of the partial equivalence that has been reported in some of the papers published in this journal (Boer et al., 2018). Authors may also drop one or more samples for the same purpose. Such procedures enhance the cross-cultural reliability of the measures in use, but given the critique advanced by Welzel, Brunkert et al. (2021), these procedures also reduce measurement validity.
Welzel, Brunkert et al. (2021) propose that an alternative basis on which to establish the validity of a given measure of cultural difference is to explore its nomological net, by comparing scores on a broad range of indices for which there are theory-based reasons to predict associations among the measured constructs. Of course, some such indices may also be vulnerable to criticism of measurement inequivalence, whereas others may derive from the objective criteria that we will discuss in a later section. The critique advanced by Welzel et al. is not found persuasive by those who doubt the utility of these alternative indices of validity (Fischer et al., 2021). For instance, correlations between scores on variables defining the nomological net of a given variable may frequently be themselves affected by factors such as survey response style, rather than for reasons related to theory.
The relative merits of testing for measurement equivalence and of exploring nomological nets remain unknown and are likely to vary depending on the specific hypothesis that is being investigated. What is particularly important is that neither procedure provides total confidence in measurement validity (Fischer & Smith, 2021). Partial measurement equivalence is better than not testing for equivalence at all; controlling for variations in response style is better than not testing for it; exploring whatever nomological nets are available is better than reliance on face validity.
We Must Distinguish the Culture Level From the Individual Level in Our Analyses
In the early days of cross-cultural psychology, researchers measured a psychological variable, like locus of control (Bond & Tornatsky, 1973), or a set of variables, like the Big Five (Bond, 1979), in two, three, or four cultural groups presumed to differ. If differences were found by comparing average scores in the results, they were explained in terms of the hypotheses about cultural variation initially proposed or by ad hoc speculations based upon literature on culture marshaled for the task.
Extensions of this approach soon emerged with psychologists conducting larger multi-cultural, usually multi-national, studies of psychological outcomes of interest, like values (Ng et al., 1981) or mate preferences (Buss, 1989). These studies yielded culture scores from the averages of individual scores that differed within each of the constituent cultural groups. At this point in the field’s development, sufficient cultural differences in psychological measures were available to begin associating the size of these cultural differences with other established differences between cultural groups, whether these be objective variables, like GNP, or “psychological” variables, like Hofstede’s (1980) four dimensions of work values. The focus of interest was differences among cultures, not among individuals within and across those cultures.
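The culture-level procedure described above can be sketched in a few lines: individual scores are averaged within each nation, and the resulting nation means are correlated with an external nation-level index. This is a minimal illustration only; the nation labels, scores, and index values are hypothetical, and the helper names `nation_means` and `pearson_r` are our own.

```python
from math import sqrt
from statistics import mean


def nation_means(scores_by_nation):
    """Aggregate individual scores to one mean score per nation."""
    return {nation: mean(scores) for nation, scores in scores_by_nation.items()}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


# Hypothetical individual-level scores (illustrative only, not real data).
scores_by_nation = {
    "A": [3.0, 4.0, 5.0],
    "B": [2.0, 3.0, 4.0],
    "C": [4.0, 5.0, 6.0],
    "D": [1.0, 2.0, 3.0],
}
# Hypothetical nation-level index, standing in for an objective indicator.
nation_index = {"A": 20.0, "B": 15.0, "C": 30.0, "D": 5.0}

means = nation_means(scores_by_nation)
nations = sorted(means)
# The unit of analysis is the nation: one pair of values per nation.
r = pearson_r([means[n] for n in nations], [nation_index[n] for n in nations])
```

Note that the correlation is computed over nations, not individuals, so it describes differences among cultures and licenses no inference about individuals within them.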
As these larger, multi-cultural data sets emerged and disciplinary statistical sophistication developed, some authors have shown that the preponderance of variance lies between individuals rather than between groups, as noted earlier, while others have shown that the degree of individual-level variability varies between cultural groups (Williams & Best, 1982, 1990). These two discoveries have different implications. Some cross-cultural researchers have concluded that, if one’s focus is on understanding individual variation, then cultural differences are not a matter of compelling interest. A more balanced conclusion, however, is to cast a more direct focus upon the relationship between the cultural and the individual levels of analysis. From this perspective, if we are to explain cultural differences adequately, we must first estimate individual-level variance within and between samples rather than ignore it, and then explore the interrelations between individual experiences and the culturally normative contexts in which individuals’ varying responses take place (Smith & Bond, 2019; Van de Vijver et al., 2008).
Since multi-level analysis provides procedures whereby this cross-level integration may be accomplished (Nezlek, 2011), there has been a commendable and growing tendency over the past decade to employ such methods. These progressive stages in the development of cross-cultural psychology have been described in greater detail by Bond and Van de Vijver (2011).
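The contrast between variance lying between individuals and variance lying between groups can be made concrete with a simple descriptive partition of the total sum of squares. The sketch below is not a multi-level model and does not reproduce any analysis cited here; it computes the between-group share of variance (eta-squared) for hypothetical data, and the function name `variance_partition` is our own.

```python
from statistics import mean


def variance_partition(groups):
    """Return (between_share, within_share) of the total sum of squares.

    `groups` maps a group label to a list of individual scores.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    group_means = {label: mean(scores) for label, scores in groups.items()}
    # Between-group part: group means around the grand mean, weighted by n.
    ss_between = sum(
        len(scores) * (group_means[label] - grand_mean) ** 2
        for label, scores in groups.items()
    )
    # Within-group part: individuals around their own group mean.
    ss_within = sum(
        (x - group_means[label]) ** 2
        for label, scores in groups.items()
        for x in scores
    )
    ss_total = ss_between + ss_within
    return ss_between / ss_total, ss_within / ss_total


# Hypothetical scores for two cultural groups (illustrative only).
groups = {"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [2.0, 3.0, 4.0, 5.0, 6.0]}
between_share, within_share = variance_partition(groups)
```

In this constructed example most of the variance lies within groups, which mirrors the pattern described above; multi-level models estimate the corresponding quantities while also taking sampling error into account.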
Effective conduct of cross-cultural, multi-level analyses requires the use of defensible measures of culture-level variability. Ideally, these differences should derive from the variability found between the cultural groups being studied. However, we note that there is also value in the use of known dimensions of cultural variation as moderator variables where meta-analyses are conducted of individual-level studies that have been reported from a variety of individual nations. For instance, Bond and Smith (1996) have explored variation between nations in rates of conformity in this way.
We Need Measures That Link Culture to the Individual
The suggestion by Messick (1988a) that psychology should concentrate on universals and cross-culturalists should explore the norms that moderate such universals unfortunately begs the question of how we are to determine what it is about human behavior that is universal. As Jahoda (1988) noted in contributing to the Nag’s Head meeting, what is or is not a psychological universal has more often been a matter of faith than of investigation. More recently, Norenzayan and Heine (2005) have usefully distinguished a hierarchy of types of psychological universal, dependent on the varying degrees of differentiation that are found between a variety of samples. The search for evidence for each of these types of universal is inextricably linked with sampling across cultures. Thus, we must work within a formulation of the interrelation over time of individuals and cultures.
In the early days of cross-cultural psychology, Berry (1976) presented a schematic model linking characteristics of cultural groups to the psychological characteristics of individuals constituting their cultural group. The component terms of the model and their linkages one to another were consistent with the concepts characterizing the fields of psychology and culture studies available at the time, so the model was accorded general assent. This model has provided a useful template for conceptualizing and developing researchable hypotheses in cross-cultural psychology up to the present. As shown in Figure 1, Berry (2017) continues to elaborate his earlier model, focusing particularly on intercultural relations and cross-cultural adaptation.

Figure 1. Linkage between cultural context and the individual.
The Berry model and its components are pitched at a general level requiring practitioners to position concepts of interest within the components of the model and then to theorize about the relationships among their chosen concepts. Some elements specified in the model have been highlighted by evolutionary psychologists (e.g., Sng et al., 2018) and others by cross-cultural psychologists. These choices need to be performed by knowledgeable and informed researchers who clearly understand that they are dealing with different levels of analysis (Leung & Bond, 1989), that “psycho-logic” is different from “eco-logic” (Leung & Bond, 2007), and that plausible stories of cross-level influence need to be developed. Until the late 20th century, the statistical tools required to perform such cross-level analyses were lacking, so theories could not be tested empirically. Increasingly, however, the required amalgam of conceptual innovation and statistical-empirical skills has been forged, and studies embodying such sophistication have become widely heralded (e.g., Gelfand et al., 2011).
In thinking through such a cross-level analysis, one needs to specify features of what Berry termed “Background Variables,” that is, measures of “ecological” and “socio-political” context, and “Process Variables,” in particular “cultural transmission” and “acculturation.” Fortunately, elaborations of the ecological context have been provided by researchers, such as Van de Vliert on temperature (Van de Vliert, 2013), Welzel on “cool water” (e.g., Santos Silva et al., 2017), and Fincher and Thornhill (2012) on frequency of pathogen stress. Elaborations of socio-economic political context have been provided, for example, by multi-national research on the GINI coefficient (Cheung & Lucas, 2016) and Welzel’s (2021) work on types of democracy. Elaborations of the cultural transmission process have been provided by Bond and Lun’s (2014) work on priorities for the cultural socialization of qualities in children, by Berry’s own work on inter-cultural adaptation (e.g., Cameron et al., 2020; Tatarko et al., 2020), and by recent studies of genetic contributions to cross-cultural differences in psychological responses (e.g., Minkov & Bond, 2015). So, at this stage in the development of our field, different features within the various components of Berry’s cross-level model are becoming increasingly available for crafting into more complex cross-cultural research designs.
The Treatment of Cultural Differences in the Contemporary Mainstream
“It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”
Attributed to Albert Einstein
To gain an indication of the ways in which culture is currently being treated in contemporary mainstream social and organizational psychology journals, we surveyed the content of the 2020 issues of three key mainstream journals. We relaxed our proposed sampling criterion and defined a paper as having cross-cultural content if it included analyses of data from three or more different nations. Of course, many papers sampling fewer than three nations can address issues of interest to cross-cultural psychologists, but we are particularly concerned to identify where papers are to be found that make use of the multi-level perspective that we have proposed.
The Journal of Personality and Social Psychology contained two such papers from among the 131 that were published. Buss et al. (2020) surveyed the criteria perceived to be indicative of high status by male and female respondents from 14 nations. They found no significant differences by sample or by sex and interpreted these results in terms of evolutionary theory. DeFranza et al. (2020) examined text databases in 45 languages, comparing gendered and non-gendered languages. The presence of gender prejudice in a sample was inferred from the frequency of association between terms denoting gender and positive and negative attributes. They found prejudice to be greater where languages are gendered.
In the Journal of Personality, five papers from the 84 that were published satisfied our criterion to be considered as cross-cultural studies. Lee and Ashton (2020) reported sex differences in HEXACO personality dimensions among adults across 48 nations. The size of nation-level differences in Emotionality was correlated with the gender equality index. Kaiser et al. (2020) also reported on sex differences in personality, sampling six English-speaking nations using the 16PF measure. Measurement equivalence was tested only in relation to sex differences, however, not in terms of the 16 factors of the measure itself. Jonason et al. (2020) surveyed endorsement of Dark Triad personality traits across 49 countries. Following tests of measurement equivalence, means by sex and by nation were regressed on various sample-level indicators, including indices of Schwartz’s nation-level value dimensions. Narcissism was found to be higher in nations low on the Human Development Index and on Schwartz’s measure of embeddedness. Lee, Gardiner et al. (2020) reported on the frequency of different types of situational experiences across a sample of students from 62 nations. Nation-level means were correlated with Hofstede dimensions and nation-level Big Five personality dimensions derived from prior studies. Thomas et al. (2020) obtained measures of mate preferences from male and female students who were predominantly from five nations. They concluded that there were no differences between “Eastern” and “Western” responses and interpreted their results in terms of evolutionary theory.
This brief sampling of cross-cultural papers published in two contemporary mainstream social psychology journals reveals a predominant preoccupation with the identification of universal or pan-cultural aspects of social behavior. In so far as cultural differences are addressed, they are mostly “explained” through correlational analyses using secondary, nation-level indices. Tests of measurement equivalence were made in only two of these seven studies (Jonason et al., 2020; Kaiser et al., 2020). None of these studies included multi-level hypothesis tests.
Three of 83 papers in the Journal of International Business Studies satisfied our cross-cultural sampling criterion. Graafland and Noorderhaven (2020) tested the hypothesis that national economic freedom and Minkov and Hofstede’s (2012) dimension of Long-Term Orientation would together predict company-level corporate social responsibility. Using secondary data on 3,045 companies headquartered in 43 nations, the hypothesis was supported. Rockstuhl et al. (2020) conducted a meta-analysis of the relation between employees’ perceived level of organizational support and their attitudinal and behavioral responses. Outcomes were coded as more relevant to social exchange processes or more relevant to identification with the organization. The 827 effects from 54 nations were then assigned to one of two groups using a combination of Hofstede’s (2001) scores for collectivism and power distance. As their theory predicted, perceived support was a stronger predictor of outcomes relevant to social exchange in individualistic than in collectivistic nations, whereas it was a stronger predictor of outcomes relevant to organizational identification in collectivistic nations than in individualistic nations. Watts et al. (2020) also reported a meta-analysis. Drawing on data from 17 nations, they found that the positive relation between supervisor transformational leadership and innovation by individuals and teams was significantly stronger in nations scoring higher on uncertainty avoidance (Hofstede et al., 2010).
These papers published in the leading international business journal thus have a more explicitly cross-cultural focus than those in the social psychology mainstream. However, each of these papers involves secondary data analyses and all three drew on versions of nation-level indices derived from the work of Hofstede (1980) and its subsequent elaborations. Many of the effects summarized in the two meta-analyses came from single-nation studies, so issues of translation and equivalence of measures will rarely have been addressed.
Where Are the Cutting-Edge Studies to Be Found?
“Come, my friends, ’Tis not too late to seek a newer world.”
Tennyson, Ulysses
Each of the studies identified in our brief survey of the contemporary mainstream no doubt has substantive value, but if we wish to identify studies that move us closer to encompassing all of the criteria outlined earlier, we must search for them elsewhere. In the sections that follow, we identify 15 studies that do more closely approach the criteria that we have proposed. We note that five of these were published in this journal and no more than two in any other journal. However, to render our comparison with papers published in mainstream journals a fair one, we should first note that only 5 of the 43 papers published in the 2020 issues of this journal satisfy the criteria that we have proposed. Evidently, there are varying perspectives as to how best to conduct research within the field of cross-cultural psychology, just as there are within the mainstream.
Which Recent Studies Approach Our Idealized Criteria?
We distinguish here between studies with a cross-sectional design and those with a longitudinal design. The cross-sectional studies that we select for discussion typically draw on dimensions of cross-cultural variation identified in earlier studies and report their impact on individual and culture-level variance in their selected outcomes of interest. Longitudinal studies are more directly able to draw on eco-cultural predictors of effects.
Cross-Sectional Studies
Culture as a moderator of effects
Two studies examined variation in the extent to which Schwartz’s (2006) measures of values predict behavior in different cultural contexts. Elster and Gelfand (2021) explored the consequences of cultural tightness versus looseness (Gelfand et al., 2011). They predicted that, where norms are relatively loose, values will be more strongly predictive, whereas, where norms are tight, values will no longer be predictive. This hypothesis was supported across 24 nations, using three measures of reported behaviors drawn from World Values Survey data. Multi-level analysis showed that values predicted individual reported behaviors, but that Gelfand et al.’s (2011) culture-level tightness-looseness index moderated this effect.
Rudnev and Vauclair (2018) studied the relationship between values and reported alcohol consumption across 21 nations that had contributed to the European Social Survey. Multi-level analysis showed that while individual values predicted drinking behavior, the effects were significantly moderated by measures of both of Schwartz’s major dimensions of culture-level variation in values. In one instance, a significant reversal of the individual-level effect was found.
Lee, Hu et al. (2020) analyzed data relating to endorsement of traditional household gender roles that were drawn from representative samples in 41 nations. Multi-level analysis revealed that endorsement of traditional roles was stronger in nations scoring higher on latent dimensions derived from the earlier work of Hofstede (2001), Schwartz (2007), and Welzel (2013).
These three recent studies illustrate the way in which broader sampling permits multi-level analyses. Such studies can refine our understandings of cultural variation, but they depend on secondary data analyses, which restrict the extent to which they can address issues of measurement equivalence directly.
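The kind of cross-level moderation tested in studies such as these can be written as a generic two-level random-coefficients model. The notation below is a conventional textbook formulation offered as a sketch; the symbols are our own assumptions and are not taken from any of the cited papers. Here $y_{ij}$ is the outcome for individual $i$ in nation $j$, $x_{ij}$ is an individual-level predictor such as a value score, and $z_j$ is a nation-level moderator such as a tightness index.

```latex
\begin{align}
  y_{ij}     &= \beta_{0j} + \beta_{1j}\, x_{ij} + e_{ij}
             && \text{(individual level)} \\
  \beta_{0j} &= \gamma_{00} + \gamma_{01}\, z_j + u_{0j}
             && \text{(nation-level intercepts)} \\
  \beta_{1j} &= \gamma_{10} + \gamma_{11}\, z_j + u_{1j}
             && \text{(nation-level slopes)}
\end{align}
```

In this formulation $\gamma_{10}$ is the average individual-level effect, and $\gamma_{11}$ captures the cross-level interaction, that is, the extent to which the nation-level variable strengthens or weakens the individual-level relationship.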
Stamkou et al. (2019) studied the reactions of young adults from 19 nations to scenarios depicting norm violations or norm adherence. They predicted and found that within cultural groups that are more collectivistic and have tighter norms, violators evoked greater moral outrage and were perceived as less powerful than were those in cultural groups that are individualistic and have looser norms. This study included measures of respondents’ actual endorsement of collectivism and tightness-looseness and employed Tucker’s phi (Van de Vijver & Leung, 1997) to establish the configural equivalence of these measures across samples. The hypotheses were tested through multi-level analysis.
Bond et al. (2020) used data from 18 nations contributing to wave 6 of the World Values Survey to examine the relations between brief measures of Big Five personality dimensions and life satisfaction. The equivalence of personality dimensions across samples was tested through computation of congruence coefficients (a procedure conceptually equivalent to the use of Tucker’s phi). Life satisfaction was greater among those scoring higher on agreeableness, conscientiousness, and emotional stability. However, multi-level analysis showed that these effects were moderated by national wealth and global competitiveness. It was reasoned that traits such as agreeableness, conscientiousness, and emotional stability elicit greater satisfaction in cultural contexts that more strongly favor economic activity; extroversion, only in nations favoring economic competitiveness. The distinctive strength of this analysis rests on its use of objective nation-level measures as moderators of psychological-level processes.
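For readers unfamiliar with the congruence coefficient, Tucker’s phi for two vectors of factor loadings is the sum of their cross-products divided by the square root of the product of their sums of squares. The sketch below implements that formula directly; the loading values are hypothetical and the function name `tucker_phi` is our own, so it should not be read as the procedure used in the studies cited.

```python
from math import sqrt


def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors.

    phi = sum(a_i * b_i) / sqrt(sum(a_i ** 2) * sum(b_i ** 2))
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = sqrt(
        sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b)
    )
    if denominator == 0:
        raise ValueError("a loading vector with all zeros has no defined phi")
    return numerator / denominator


# Hypothetical loadings of the same items on one factor in two samples.
sample_1 = [0.70, 0.60, 0.80]
sample_2 = [0.65, 0.55, 0.85]
phi = tucker_phi(sample_1, sample_2)
```

Unlike a Pearson correlation, phi does not center the loadings, so it is unaffected by rescaling either vector but is sensitive to differences in sign and relative pattern.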
Smith, Easterbrook, Al-Selim, et al. (2020) tested alternative explanations for variation in the magnitude of sex differences across cultures. Students from 24 nations described themselves on three measures of independent versus interdependent self-construal (Vignoles et al., 2016). Configural equivalence of the self-construal scales was established with the use of congruence coefficients. Multi-level analysis revealed that sex differences on these measures were greater in samples scoring higher on religiosity and pathogen stress (Fincher & Thornhill, 2012), but lower in wealth and the Human Development Index. Three of the four nation-level moderators employed in this study were also objective indicators.
The preceding three studies thus not only use multi-level analysis but also highlight the need to employ measures of cultural variability whose structure can be established, either through demonstrating measurement equivalence or through using objective measures that are in some defensible way relevant to cultural variability that impacts upon psychological processes and outcomes (Smith & Bond, 2019).
New and revised dimensions of culture
An alternative recent trend has highlighted the need for measures of cultural variability that go beyond the predominant earlier focus on broad measures of individualism versus collectivism. Several studies have focused on the identification of new dimensions, employing both tests of measurement equivalence and the exploration of the nomological nets of the new measures proposed. Some of these provide enhanced precision in differentiating separate aspects of individualism-collectivism. For instance, Vignoles et al. (2016) formulated measures of seven different aspects of independent versus interdependent self-construal, using 63 samples from 35 nations. The validity of these measures was assessed through tests indicating partial measurement equivalence, followed by multi-level analysis exploring their nomological net.
Bond and Lun (2014) constructed a new set of measures using items concerning parents’ goals for child socialization that were surveyed in representative samples from 55 nations contributing to Wave 5 of the World Values Survey. Sample-level factor analysis yielded factors that were named as Self-directedness versus Other-directedness, and Civility versus Practicality. Exploration of the nomological net of these dimensions showed for instance that socialization goals for Self-directedness and for Civility were more favored in more affluent, less corrupt, and more gender-equal societies. These dimensions provide a potential linkage between the concerns of developmental cross-cultural psychologists and the regrettable prior propensity of cross-cultural social psychologists to treat cultural differences as a given.
Smith, Easterbrook, Koc, et al. (2020) have reported an initial attempt to create measures of cultural difference that reflect differing cultural emphasis upon dignity, honor, and face (Leung & Cohen, 2011). Single-item measures showed an initially promising nomological net, but measures permitting adequate tests of measurement equivalence will be required.
A more radical approach has been advanced by Muthukrishna et al. (2020), who constructed an index of cultural distance from the USA. Individual-level datapoints were accessed from the 80 nations contributing to waves 5 and 6 of the World Values Survey. This procedure ignores the typical distinction between individual-level and sample-level variance, simply computing distance between a given data point and all other datapoints. A second index of cultural distance from China was also constructed. Nation-level means for cultural distance were then correlated with the cultural dimension scores provided by Hofstede (2001), Schwartz (2007), Gelfand et al. (2011), Big Five personality dimensions (McCrae & Terracciano, 2005), and objective indices. Substantial correlations were found between scores on many of these dimensions and the index of US cultural distance, but there were many fewer significant effects when using the Chinese index. The authors conclude that this contrast underlines the distinctiveness or cultural WEIRDness (Henrich et al., 2010) of the USA.
Gardiner et al. (2020) compared the reliability of two measures of happiness among students from 63 nations. One measure focused on independent aspects of happiness, while the other emphasized interdependent happiness. The nomological net of sample means for these two measures was then compared using objective criteria, Schwartz’s (2006) dimensions of values, and Muthukrishna et al.’s (2020) measure for cultural WEIRDness. Supporting the conclusions of Muthukrishna et al., the measure of interdependent happiness was more consistently reliable across cultures, while the measure of independent happiness achieved greater reliability in samples from WEIRD nations. The use of measurement reliability as a dependent measure across cultures is innovative and can enhance the development of dimensional measures with broader validity.
These studies reflect a continuing hunger for the development of improved measures of cultural difference. However, so long as such measures are used only in cross-sectional designs, no progress will be possible in explicating the causal sequences specified for instance in Berry’s (2017) model linking context and culture. To make progress in this direction, longitudinal studies are required.
Longitudinal Studies
The need to conceptualize the origins and continuing creation of cultures has been increasingly signaled in recent years (Sng et al., 2018; Varnum & Grossmann, 2017). We here examine two types of studies. The first identifies variables predicted to influence the development of specific cultural adaptations. Predictions are based on indices of ecological context that are reasonably presumed to have been in existence long before cultural effects developed. These studies might better be considered as quasi-longitudinal. True longitudinal studies involve taking repeated measurements over time.
Recent quasi-longitudinal studies have frequently been influenced by the model of eco-cultural stressors developed by Gelfand et al. (2011). Jackson et al. (2019) drew on World Values Survey data from 26 nations to test hypotheses about the level of prejudice. They predicted that greater frequency of ecological threat would favor the development of cultural tightness, which would in turn elicit enhanced levels of prejudice. Ecological threat was defined in terms of historic population density, food scarcity, disease frequency, natural disasters, and past territorial conflicts. While controlling for alternative explanations, multi-level analysis confirmed that tightness mediated the relation between threat and prejudice. A similar result was obtained concerning predictors of rated levels of prejudice across 47 historical societies. The contemporary relevance of cultural tightness as an adaptation to eco-cultural threat is underlined by the findings of Gelfand et al. (2021). While controlling for alternative explanations, across 57 nations the frequency of deaths from COVID was significantly and negatively predicted by cultural tightness.
Van de Vliert et al. (2018) tested a climatic theory of the evolution of contemporary values over past centuries. These authors reasoned that dairying was only able to evolve in environments that lacked very cold winters and very hot summers. Furthermore, populations within which dairying was possible would be best placed to derive maximal benefit from dairy products where they had evolved lactose tolerance. The specified climatic factors were found to predict data on lactose tolerance from 1500 CE. Furthermore, over time the benefits of enhanced nutrition and health in these regions were predicted to facilitate the emergence of empowered resources and emancipative values (Welzel, 2013). A measure of empowered resources derived from 1800 was found to interact with the presence of lactose tolerance in predicting the nation-level endorsement of contemporary emancipative values. This effect was sustained while controlling for a variety of possible alternative explanations.
Truly longitudinal studies have relied upon comparisons of results from successive waves of the World Values Survey. Welzel (2013) has mapped the progressive emergence over time of what he terms emancipatory values, summarized as preference for autonomy, choice, equality, and voice. These changes are attributed to the interaction between climatic factors and the growth of affluence in many societies (Welzel, Kruse et al., 2021). However, such changes are by no means uniform. Kaasa and Minkov (2020) examined change in three types of values across a 15-year period in 18 nations contributing to the World Values Survey. Change was found to be relatively uniform in relation to values about child socialization but differed between nations in relation to moral values. In a similar way, Akaliyski and Welzel (2020) showed that after the breakup of the Soviet bloc, emancipatory values increased within those nations that joined the European Union. However, no such effect was found in nations that did not join the Union and in which affiliation with Orthodox religions or Islam was prevalent.
Longitudinal studies can thus test hypotheses relevant to causal effects, but most of the studies in which this type of design has been employed have tested predictions of sample-level differences; they have not undertaken multi-level analyses or addressed issues of measurement equivalence. The study by Jackson et al. (2019) is notable in this respect, because their sample-level findings were augmented by an individual-level study in which perceived threat levels were experimentally manipulated. While results of this particular study were inconclusive, their approach underlines the need for the use of multiple methods in testing theories about causation.
Conclusions and Recommendations
“The scientist must . . . be concerned to understand the world and to extend the precision and scope with which it has been ordered. That commitment must, in turn, lead him to scrutinize, either for himself or through colleagues, some aspect of nature in great empirical detail. And, if that scrutiny displays pockets of apparent disorder, then these must challenge him to a new refinement of his observational techniques or to a further articulation of his theories.”
Kuhn (1962, p. 42), The Structure of Scientific Revolutions
We have reviewed our understandings of how the field of cross-cultural psychology, particularly its personality-social-organizational variant, has developed since the watershed conference held at Nag’s Head in 1987. We identified themes of concern that emerged during that meeting and discussed how they have been addressed and debated by cross-cultural practitioners, as they continue to refine their craft. We have been especially focused on how these emerging developments have been incorporated by mainstream psychologists who have more recently adopted a cross-cultural approach toward understanding the possible influence of cultural context on their psychological processes and outcomes of interest. We note that it is less easy in 2022 than it was in 1987 to define where the “mainstream” is located. Our selected studies were published in nine journals, and the interdisciplinary focus of many of these challenges the notion that there is in fact a single mainstream toward which cross-cultural psychologists might address themselves.
Acknowledging the inherent difficulties involved, we have noted shortfalls and halting progress toward a gold standard of acceptable fineness in contemporary cross-cultural psychology. Our considered gold standard for conducting any type of cross-cultural study includes:
Drawing a background distinction about whether one is studying “culturology,” that is, examining average psychological responses within each cultural group across all the cultural groups involved, or studying cross-cultural psychology, that is, examining how individual responses within each cultural group are influenced by dimensions of variation distinguishing the cultural groups involved; culturology is not cross-cultural psychology, though culturology’s findings are of crucial importance for improving the conceptual heft of studies in cross-cultural psychology.
If one is studying culturology, then including sufficient cultural groups whose position along this conceptual definition of culture enables the researcher to empirically test alternative claims about the cultural dimension involved in the process or outcome of interest.
If one is studying cross-cultural psychology, then conceptually linking theoretically meaningful dimensions of cultural group variation to the individual-level process and/or outcomes of interest, using a sufficient number of cultural groups in one’s sample.
Establishing levels of metric equivalence for the individual-level constructs used that are sufficient to justify the claim that the same constructs are being assessed in each cultural group involved.
Performing statistical tests that allow the scientific community to assess the balance between culture-general and culturally nuanced effects revealed by the research as actually conducted.
It is our earnest hope that future research in cross-cultural psychology will continue to approach these exacting standards.
“Ah, but a man’s reach should exceed his grasp, or what’s a heaven for?”
Browning, Andrea del Sarto
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
