Abstract

Far more than any other species, humans rely on learning from others, with both genetic and cultural inheritance systems interacting to shape human psychology (Boyd & Richerson, 1985; Henrich, 2016). Large bodies of theory and empirical evidence suggest variability in many aspects of human thinking, perception, and behavior is driven by different ecological niches and social norms (Heine & Norenzayan, 2006; Laland, 2018; Smaldino et al., 2019). Mainstream psychology has historically sampled mostly from a thin slice of humanity (Arnett, 2008; Thalmayer et al., 2020), constraining knowledge of human diversity and contributing to psychology’s limited generalizability and possibly replicability (Adetula et al., 2022; Nosek et al., 2022; Open Science Collaboration, 2015). However, the role of population diversity in replicability and generalizability remains understudied, and robust theories that predict when to expect variation in psychological phenomena remain underused and underdeveloped.
The multisite project Many Labs 2 (ML2; Klein et al., 2018) had the important goal of assessing how population, site, and setting variability moderate the replicability of psychological findings. Klein et al. (2018) attempted to replicate 28 studies with 15,305 participants distributed across 125 sample sites in 36 countries. They found that 15 out of 28 studies showed a statistically significant effect in the same direction as the original study. In addition, there was little detected effect-size heterogeneity regarding replicability across different contexts and settings in 11 of the 28. In this commentary, we focus on ML2’s exploratory investigation of the moderating role of culture (operationalized as “WEIRDness”) and reflect on three important choices related to the selection of samples, measures, and effects. In doing so, we emphasize the central role of theory and connect our arguments with previous recommendations for cross-cultural research (Berry, 2011; Kitayama & Cohen, 2010). Next, we offer suggestions for future research to assess the role of population diversity in generalizability efforts in psychology. We conclude by discussing how researchers can thread the needle between searching for cultural variation and confirming replicability and generalizability of psychological phenomena.
Sample Selection: Sampling WEIRD People From Around the World
Despite the large sample size and number of sites, ML2’s samples were culturally relatively homogeneous. Multisite projects have much improved samples compared with most psychological studies but are still naturally limited by coordination, reachability, and availability constraints. Furthermore, because researchers and the associated participant samples (e.g., university subject pools) self-selected into the project, some level of selection bias is likely. Sampling representatively across the globe is challenging, and random sampling is even more difficult. For instance, the majority of ML2’s subject pool were U.S.-based participants (39%) or participants from other WEIRD (Western, educated, industrialized, rich, and democratic) countries. Indeed, a large fraction of the participants were obtained from a small set of countries (see Fig. 1).

Fig. 1. Frequency of participants per source country in Many Labs 2 (Figshare link: https://doi.org/10.6084/m9.figshare.24038496.v1).
Furthermore, the participants in ML2 were mostly recruited from university subject pools and Amazon’s Mechanical Turk. These samples are likely skewed toward participants with higher education, digital literacy, English proficiency, and socioeconomic status (SES), especially in less WEIRD countries. Having higher levels of formal education correlates with a greater cultural alignment with the United States (White & Muthukrishna, 2023). As a result, despite sampling from 36 countries, the cross-cultural variation in many of the samples may have been minimal. The lack of cultural variation is possibly explained by the fact that ML2 had not planned to investigate the moderating role of culture but did so in an ex post exploratory analysis. Future studies that plan to explore how cultural variation affects replicability, however, should opt for a more limited yet more representative and culturally diverse set of samples. New methodological approaches can help to select samples that show significant cultural variation (e.g., Muthukrishna et al., 2020; Obradovich et al., 2022). Adopting such strategies can offer a more robust test of cultural variability than enlarging the sample size, particularly if the larger sample remains self-selected and culturally uniform. This includes comparing populations both between and within countries. Older approaches have already demonstrated that obtaining truly random samples (with participation rates over 95%) from some of the most remote communities on the planet is feasible (Ensminger & Henrich, 2014).
Moving forward, we urge researchers to consider the statistical power of tests that are sensitive to culture’s moderating role rather than solely emphasizing the main effect’s power. Simulations can be instrumental in crafting studies with adequate sample sizes that can identify effects, considering anticipated population-level variance. For example, researchers can simulate a multisite setting to explore how population-level variance influences behavior and can assess whether studies have low power to detect the moderating role despite having large samples. For further exploration of how simulations can inform design choices when investigating the moderating influence of population-level variation on generalizability in psychological studies, see Schimmelpfennig et al. (2023).
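The kind of multisite simulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Schimmelpfennig et al. (2023): the function names, the site-level "culture score," and all parameter values (60 sites, 60 participants per condition, a mean effect of 0.3, a moderation slope of 0.1, residual between-site heterogeneity of 0.1) are hypothetical. Each site's observed standardized effect is drawn around a true effect that depends linearly on its culture score, and power is estimated as the share of simulated studies in which the main effect or the moderation slope is significant, using a normal approximation to the test statistic.

```python
import random
import statistics


def simulate_site_effects(n_sites, n_per_group, mean_effect, slope, tau,
                          culture_sd, rng):
    """Simulate one multisite study; return (culture_scores, observed_effects).

    All parameters are hypothetical and chosen for illustration only.
    """
    # Approximate sampling SD of a standardized mean difference (equal groups).
    se = (2.0 / n_per_group) ** 0.5
    scores, effects = [], []
    for _ in range(n_sites):
        c = rng.gauss(0.0, culture_sd)  # site-level culture score
        true_d = mean_effect + slope * c + rng.gauss(0.0, tau)
        scores.append(c)
        effects.append(rng.gauss(true_d, se))
    return scores, effects


def slope_z(x, y):
    """OLS slope of y on x divided by its standard error."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return b / ((rss / (n - 2) / sxx) ** 0.5)


def mean_z(y):
    """Mean site effect divided by its standard error."""
    return statistics.fmean(y) / (statistics.stdev(y) / len(y) ** 0.5)


def estimate_power(culture_sd, n_sims=1000, seed=1, n_sites=60,
                   n_per_group=60, mean_effect=0.3, slope=0.1, tau=0.1):
    """Return (power_main_effect, power_moderation) at two-sided alpha = .05."""
    rng = random.Random(seed)
    crit = statistics.NormalDist().inv_cdf(0.975)  # normal approximation
    hits_main = hits_mod = 0
    for _ in range(n_sims):
        x, y = simulate_site_effects(n_sites, n_per_group, mean_effect,
                                     slope, tau, culture_sd, rng)
        hits_main += int(abs(mean_z(y)) > crit)
        hits_mod += int(abs(slope_z(x, y)) > crit)
    return hits_main / n_sims, hits_mod / n_sims


if __name__ == "__main__":
    # Compare culturally homogeneous sites with culturally diverse sites.
    for label, sd in [("homogeneous", 0.3), ("diverse", 1.5)]:
        main, mod = estimate_power(culture_sd=sd)
        print(f"{label}: main-effect power={main:.2f}, moderation power={mod:.2f}")
```

Under these assumed values, the total number of participants is identical in both scenarios; only the spread of the culture score across sites differs. Comparing the two moderation-power estimates illustrates how a study can be well powered for a main effect yet poorly powered to detect cultural moderation when the sampled sites are culturally similar.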
Measure Selection: Operationalizing and Measuring Cultural Variation
The selection of adequate measures of culture needs to be underpinned by a theoretical understanding of how to measure cultural variation and at which level cultural differences are likely to matter. That is, measuring cultural variation necessitates not only an effective tool to gauge such variation but also careful attention to the unit of analysis, such as differences within or between countries. In ML2, cultural moderation was assessed by decomposing the letters of the WEIRD backronym (Henrich et al., 2010), an approach that has since been adopted by other published articles (e.g., Van Assche et al., 2023). The backronym, however, was originally intended as only a rhetorical device. Although deconstructing the WEIRD backronym may seem pragmatically sound, the approach lacks grounding in a theoretical understanding of cultural variation. A more effective approach would entail conceptualizing cultural distinctions through empirical and theory-driven methodologies (Muthukrishna & Henrich, 2019). Inductive techniques that use more fine-grained measures for defining cultural differences, such as tightness–looseness (Gelfand et al., 2011), individualism–collectivism (Hofstede, 2001), or cultural distance (e.g., cultural fixation index [CFST]; Muthukrishna et al., 2020), are preferable.
In addition, the sound measurement of cultural variation depends not only on choosing a measure that accurately captures cultural variation but also on anticipating the levels at which culture varies and should thus be measured. That is, theoretical predictions should precede both the selection of samples representative of the relevant populations and the selection of adequate instruments for measuring cultural variation at that level. Multisite studies often draw on instruments that measure cultural background at the country level based on the country of origin of that site (Gelfand et al., 2011; Hofstede, 2001; Klein et al., 2018; Muthukrishna et al., 2020). Although an understandable design choice, clustering by the country of the sample site can conceal potential psychological variation within the country (Greenfield, 2014), such as variation related to the migration background of participants or their parents. Culture is embedded in overlapping distributions of cultural traits within societies, as evidenced in studies focusing on factors such as people’s long-term orientation (Harati & Talhelm, 2023), political preferences (Talhelm et al., 2015), religious beliefs (White et al., 2021), SES (Kraus et al., 2017), language (Faessler et al., 2023), and ethnicity (Desmet et al., 2017). Significant cultural divides also sometimes arise when contrasting state with nonstate societies (Henrich et al., 2005), such as comparing Tanzanians broadly with the specific Hadza community of foragers within Tanzania. Which dimensions of cultural variation should be measured in a study depends on the research question being asked. When selecting instruments for measuring cultural variation, researchers thus need to make deliberate choices about what level of cultural variation they are interested in. Country-level measures are generally a poor choice, so psychologists should seek out more finely grained levels of analysis.
What is more, not all population-level variation is due to cultural transmission. Whereas cultural variation may play a role in explaining psychological and behavioral differences, such differences might also be explained by environmental cues, ecological patterns, or even genetic differences (Uchiyama et al., 2022). Especially in the absence of exogenous variation that could isolate the moderating role of culture (see, e.g., Faessler et al., 2023; Lonati et al., in press), researchers should be cautious about ascribing all population-level variation to cultural differences.
Effect Selection: Theorizing About Cultural Moderation and Generalizability
People’s minds are shaped by millions of years of genetic evolution, thousands of years of cultural evolution, and a short lifetime of individual experiences (Muthukrishna et al., 2021; Schimmelpfennig & Muthukrishna, 2023). Cultural-evolutionary theory describes how people evolved as a cultural species, how culture itself evolves, and how this process influences people’s psychology and behavior (Boyd & Richerson, 1985; Cavalli-Sforza & Feldman, 1981). Cultural-evolutionary theory helps researchers understand the interplay of culture and psychology and offers a way to make principled predictions about which aspects of human psychology and behavior will vary and across which populations.
Without a sound theory to explain possible aspects that vary, researchers are running blind (Muthukrishna & Henrich, 2019). Theory matters and helps navigate the much-needed methodological changes in psychological research (Gervais, 2021). Cultural-evolutionary theory can serve as an inclusive framework to predict the importance of cultural differences, both cross-societal and cross-temporal (Atari & Henrich, 2023; Muthukrishna et al., 2021). The human psyche is cumulatively transmitted via social learning, and differences in these social-learning dynamics can help explain and predict differences in psychology. The selection of effects for testing the generalizability should therefore be based on theoretical, preregistered predictions about whether one would expect the effects to vary across populations (Stroebe, 2019). An effective test would align effects with populations known for significant variations in dimensions relevant to the theory (Norenzayan & Heine, 2005). If initial tests validate cultural differences, subsequent examinations can then expand to wider populations. This incremental approach is not only more refined but also minimizes risks before running expansive multisite projects. It provides a focused yet powerful empirical examination before initiating a broader global generalizability study, which may be challenging to replicate at the same magnitude.
Implications for Future Research
Global research collaborations are crucial for progressing in psychological science and can help address both the replication and generalizability crisis and the narrow sampling from WEIRD populations. Multilab projects like ML2 have the potential to facilitate understanding of cultural variation on cross-societal levels if theory guides methodological and effect selection choices. Naturally, cross-cultural studies should seek to include not just samples but also researchers from different cultures. Ultimately, cross-cultural research would also facilitate the empowerment of emancipated research originating from different cultures (Hansen & Heu, 2020).
Theory is also the key to understanding how culture influences the relationship between generalizability and replicability in psychology. Replicating findings in the same population as the original study is critical to a rigorous scientific protocol. First and foremost, such replications are a tool to ensure that a given scientific effect is not an idiosyncratic finding of a specific research approach. Note that many psychological phenomena may represent universal human states, whereas others may not, and tests of these replications across samples and settings (i.e., generalizability studies) are needed. A better theoretical understanding of how a cultural context matters thus defines the difference between searching for generalizability and seeking to identify cultural differences. Practically speaking, if theory predicts the universality of a given effect, one could show this by testing it in a few very different cultural contexts. If theory predicts a moderating role of culture, researchers should selectively target confirmation and nonconfirmation in different contexts (House et al., 2013, 2020). This would help researchers understand when and why to expect deviations from theoretical predictions (e.g., when they are driven by cultural factors) and when these deviations may not be so surprising after all.
Conclusion
Many Labs 2 was a milestone in testing variation in the generalizability of psychological findings across 36 countries. We discussed the degree to which its data can inform researchers about how cultural variation moderates the generalizability of psychological phenomena. By emphasizing targeted sample selection, appropriate cultural measures, and theoretically informed effect choices, we hope to encourage greater attention to population diversity in replicability efforts and a nuanced approach to understanding the complex role of culture in shaping psychological phenomena.
Footnotes
Transparency
Action Editor: David A. Sbarra
Editor: David A. Sbarra
Author Contribution(s)
R. Schimmelpfennig and R. Spicer share first authorship.
