The Myth of the Need for Diversity Among Subjects in Theory-Testing Research: Comments on “Racial Inequality in Psychological Research” by Roberts et al. (2020)

Abstract

Roberts and colleagues focus on two aspects of racial inequality in psychological research, namely an alleged underrepresentation of racial minorities and the effects attributed to this state of affairs. My comment focuses only on one aspect, namely the assumed consequences of the lack of diversity in subject populations. Representativeness of samples is essential in survey research or applied research that examines whether a particular intervention will work for a particular population. Representativeness or diversity is not necessary in theory-testing research, where we attempt to establish laws of causality. Because theories typically apply to all of humanity, all members of humanity (even American undergraduates) are suitable for assessing the validity of theoretical hypotheses. Admittedly, the assumption that a theory applies to all of humanity is also a hypothesis that can be tested. However, to test it, we need theoretical hypotheses about specific moderating variables. Supporting a theory with a racially diverse sample does not make conclusions more valid than support from a nondiverse sample. In fact, cause-effect conclusions based on a diverse sample might not be valid for any member of that sample.

Keywords

diversity of subject populations, philosophy of science, methodology, external validity

Publisher’s Note

This article is part of a collection of articles related to the December 2022 APS Vote of No Confidence in the Editor-in-Chief of Perspectives on Psychological Science. Please see the editorial [https://doi.org/10.1177/17456916241246556] in this issue for further details on the collection.

In their analysis of “racial inequality in psychological research,” Roberts and colleagues focus on two aspects, namely an alleged underrepresentation of racial minorities in all levels of psychological research and the effects they attribute to this underrepresentation. Hommel (2024) has persuasively argued that the perception of an underrepresentation might be the result of a base-rate fallacy. Therefore, I will focus on the effects such an underrepresentation might have on psychological research. Because space restrictions prevent me from commenting on all these alleged effects, I will focus on the section in which they deplore the lack of diversity in participants in psychological research, which supposedly makes this research unrepresentative.

The lack of representativeness in psychological subject populations is a complaint that has frequently been aired in psychological publications. As early as 1946, McNemar complained that “the existing science of human behavior is largely the science of the behavior of sophomores” (p. 333).¹ More recently, Arnett (2016) argued that “research on the whole of humanity is necessary for creating a science that truly represents the whole of humanity” (p. 603). Because research on all of humanity is a tall order, Henrich et al. (2010) suggested more realistic options, for example, that universities should create “non-student subject pools – for example, by setting up permanent psychological and behavioral testing facilities in bus terminals, Fijian villages, rail stations, airports and anywhere diverse where subjects might find themselves with extra time” (p. 82). Compared with these proposals, the demand to increase the number of minority subjects in psychological research that highlights race seems extremely reasonable.

Before social psychologists begin to worry about how they could guarantee that an experimental manipulation developed to operationalize a theoretical construct for American undergraduates would reflect the same construct for inhabitants of Fijian villages, or for business executives rushing to catch their plane or train, I can reassure readers that representativeness of subject populations is unnecessary in theory-testing research. This argument has frequently been advanced before (e.g., Calder et al., 1982; Mook, 1983; Stroebe et al., 2018; Stroebe & Nijstad, 2009). It is probably due to the popularity of the inductivist notion of “external validity” (Campbell & Stanley, 1966) that these arguments have never gained wider acceptance among psychologists, even though it is quite easy to demonstrate their correctness.²

Most psychological theories apply to all of humanity. Although they do not state this explicitly, the fact that they do not specify a particular subpopulation to which they apply implies a claim to apply to the totality of mankind. Thus, any subsample of members of mankind (e.g., American undergraduates) would be appropriate subjects for testing such theories. If a theoretical hypothesis is not supported in an experimental study with a group of undergraduates, then the theory has to be rejected. If the hypothesis is supported, then the theory has been supported.

This does not mean, however, that such confirmation proves a theory to be true (Popper, 1959).³ A theory can never be proven to be true because it is impossible to rule out all alternative explanations for a given finding. In testing a theory, a researcher has to translate theoretical variables into manipulations and into measures of the effect of these manipulations. There is always the possibility that these “auxiliary hypotheses” (Gadenne, 1976; Trafimow, 2012), which link abstract and unobservable theoretical concepts to empirical manipulations, could be wrong. Alternatively, the experimenter might not have been successful in eliminating potential third variables that might have been responsible for the effect. But the more strong empirical tests a theory has successfully undergone, the better supported it can be considered.

The assumption that a theory applies to all of humanity is also only a hypothesis that can be proven wrong. However, conducting research in bus terminals is not a suitable procedure to address this problem. If a theory is supported by experiments in two bus terminals, we cannot be certain that it would not have been rejected by a study conducted in an airport (or Fijian village). Similarly, if a theory is supported in a bus terminal but not in an airport, we do not know the reason for this discrepancy. To be informative, any test of the assumption that a theory applies only to specific subsections of humanity has to be guided by theory.

Let me use a classic study by Hovland et al. (1949) to illustrate this point. They tested whether one-sided or two-sided communications were more persuasive. They used army soldiers as subjects and found no difference: Both types of communications appeared to be equally persuasive. However, when they divided their participants into subgroups according to their level of education, they found two-sided communications to be more effective with the more highly educated participants and one-sided communications to be more effective with individuals with lower levels of education. Thus, if they had conducted their research with samples of undergraduates, they would have concluded that two-sided communications were more effective, and if they had done the study with factory workers, they might have found one-sided communications to be more effective. But most importantly, if they had conducted their study with a random sample of humanity (should such a feat be possible), they would have found no difference. And this last conclusion would have been invalid for most members of their sample. Thus, “it is a misconception to assume that research on the whole of humanity would create a science that truly represents the whole of humanity. If no moderation is expected, any subgroup of the population will do equally well, even the often maligned undergraduate students” (Stroebe & Nijstad, 2009, p. 569). Similarly, if race moderates effects, the findings of a study conducted with a racially diverse sample might result in conclusions that do not apply to any of the racial subgroups in that sample.
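The arithmetic behind this point can be made explicit with a minimal sketch. The numbers below are hypothetical and chosen only for illustration; they are not data from Hovland et al. (1949). The `Subgroup` class and the `pooled_effect` function are likewise assumptions introduced here. The sketch assumes a crossover pattern in which the advantage of two-sided over one-sided communications is positive in one subgroup and equally negative in the other, so that the share-weighted average over the pooled sample is zero.

```python
from dataclasses import dataclass


@dataclass
class Subgroup:
    name: str
    share: float   # proportion of the pooled sample (hypothetical)
    effect: float  # two-sided minus one-sided persuasion, arbitrary units (hypothetical)


def pooled_effect(subgroups):
    """Share-weighted average effect across all subgroups."""
    total_share = sum(g.share for g in subgroups)
    return sum(g.share * g.effect for g in subgroups) / total_share


# Hypothetical crossover moderation by education; not Hovland et al.'s data.
subgroups = [
    Subgroup("higher education", share=0.5, effect=0.4),
    Subgroup("lower education", share=0.5, effect=-0.4),
]

overall = pooled_effect(subgroups)
print(f"pooled sample: {overall:+.2f}")
for g in subgroups:
    print(f"{g.name}: {g.effect:+.2f}")
```

Under these assumed values the pooled estimate is zero although the effect is nonzero in every subgroup, which is the sense in which a conclusion drawn from the diverse sample would hold for none of its members.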

Therefore, a psychological science that acknowledges potential racial differences in psychological processes has to start out with theories about such differences. To be psychologically meaningful, such differences have to be linked to measurable psychological constructs. Merely demonstrating that minority subjects respond differently from white participants is not psychologically meaningful or interpretable, unless that difference can be attributed to some psychological construct (e.g., attitude, personality trait). However, once such a construct has been identified, it is likely that the racial difference in responses is due to the fact that this particular construct is more or less frequent among different racial groups. Thus, I would not expect that any of the recommendations made by Roberts and colleagues (2020) (e.g., “establish a diversity task force”; “release public diversity reports annually”) is likely to result in a psychological science that reflects racial diversity.

Representativeness of samples, although not important for theory-testing research, is important for many forms of applied research. For example, when researchers aim to determine the percentage of a population that has a particular characteristic (e.g., votes Republican; holds a particular attitude), representativeness of samples is important. In this research, one does not attempt to establish laws of causality but wants to infer from a small sample how certain features are distributed in a large population. Similarly, if one wants to assess the effectiveness of a planned mass media campaign to persuade Fijian villagers to eat more vegetables, one must pretest that intervention with respondents who are representative of that population (i.e., Fijian villagers rather than American undergraduates). But this limitation does not apply to theory testing, which after all is the main focus of research conducted in psychology departments.
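Why representativeness matters for this kind of descriptive inference can also be sketched numerically. The strata, shares, and rates below are hypothetical assumptions, as is the `prevalence` helper; the sketch only shows that when a characteristic is distributed unevenly across strata, a sample whose composition differs from that of the population yields a different estimate of its overall frequency.

```python
def prevalence(shares, rates):
    """Share-weighted proportion of people holding a characteristic."""
    total = sum(shares.values())
    return sum(shares[k] * rates[k] for k in shares) / total


# Hypothetical proportion of each stratum holding a particular attitude.
rates = {"stratum A": 0.30, "stratum B": 0.60}

# Hypothetical composition of the population and of a convenience sample.
population_shares = {"stratum A": 0.5, "stratum B": 0.5}
convenience_shares = {"stratum A": 0.9, "stratum B": 0.1}

population_value = prevalence(population_shares, rates)
sample_estimate = prevalence(convenience_shares, rates)

print(f"population: {population_value:.2f}")
print(f"convenience sample: {sample_estimate:.2f}")
```

With these assumed values the convenience sample understates the population figure, because the stratum in which the attitude is rarer is overrepresented in it.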

Footnotes

Transparency

Action Editor: Klaus Fiedler

Editor: Klaus Fiedler

ORCID iD

Wolfgang Stroebe

References

Arnett, J. J. (2016). The neglected 95%: Why American psychology needs to become less American. American Psychologist, 63, 602–614.

Calder, B. J., Phillips, L. W., & Tybout, A. M. (1982). The concept of external validity. Journal of Consumer Research, 9(3), 240–244.

Campbell, D. T., & Stanley, J. G. (1966). Experimental and quasi-experimental designs for research. Rand McNally.

Christie (1965). Some implications of research trends in social psychology. In Klineberg & Christie (Eds.), Perspectives in social psychology (pp. 141–152). Holt, Rinehart & Winston.

Gadenne (1976). Die Gültigkeit psychologischer Untersuchungen [The validity of psychological research]. Kohlhammer.

Henrich, Heine, S. J., & Norenzayan (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83.

Hommel (2024). Dealing with diversity in psychological science or ideology? Perspectives on Psychological Science, 19(3), 558–563. doi:10.1177/17456916241236170

Hovland, C. I., Lumsdaine, & Sheffield, F. D. (1949). Experiments on mass communication. Princeton University Press.

McNemar (1946). Opinion-attitude methodology. Psychological Bulletin, 43(4), 289–374.

Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38(4), 379–387.

Popper (1959). The logic of scientific discovery. Routledge.

Roberts, S. O., Bareket-Shavit, Dollins, F. A., Goldie, P. D., & Mortenson (2020). Racial inequality in psychological research: Trends of the past and recommendations for the future. Perspectives on Psychological Science, 15(6), 1295–1309.

Stroebe, Gadenne, & Nijstad, B. A. (2018). Do our psychological laws apply only to college students? External validity revisited. Basic and Applied Social Psychology, 40(6), 384–395.

Stroebe & Nijstad (2009). Do our psychological laws apply only to Americans? American Psychologist, 64, 569.

Stroebe & Strack (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9(1), 59–71.

Trafimow (2012). The role of auxiliary assumptions for the validity of manipulations and measures. Theory & Psychology, 22(4), 486–498.