Abstract
Psychological science holds substantial promise for informing policy decisions but faces challenges in realizing its potential. One widely recognized challenge is bridging the gap between the nonrepresentative study samples commonly used to evaluate interventions and the broader populations that policymakers aim to serve. To address this challenge, we introduce causal effect generalizability, an approach from causal inference and epidemiology, in the form of an accessible, nontechnical tutorial for psychological and behavioral scientists. We use publicly available data from a real-world psychology intervention study to illustrate why causal effects in a nonrepresentative study sample may systematically differ from those in a broader population. We provide a step-by-step guide with user-friendly R functions, enabling researchers to generalize causal effects from a study sample back to the full target population. This approach allows researchers to assess intervention effects in broader populations, offering valuable insights to guide evidence-based policy development. We hope this nontechnical introductory material will assist scholars in enhancing the policy relevance and real-world impact of psychological science.
Psychological science holds great potential for addressing real-world problems and informing policy. Leveraging psychological theories, researchers have developed interventions aimed at reducing prejudice (Paluck et al., 2021), changing climate beliefs and behaviors (Vlasceanu et al., 2024), reducing partisan animosity and anti-democratic attitudes (Voelkel et al., 2024), and reducing self-harm (Witt et al., 2021), among many others. Policy relevance is frequently highlighted and positioned as a priority of the field (Constantino et al., 2022; Dodge et al., 2024; Kross et al., 2023; Paluck et al., 2019; Van Bavel et al., 2020; Verduyn et al., 2017; Williams & Nida, 2014).
Strengthening psychology’s policy relevance involves challenges (IJzerman et al., 2020; Kohrt et al., 2020; Maton & Bishop-Josef, 2006; Siegel et al., 2021; Walker et al., 2018). A widely recognized challenge concerns the study samples commonly used to evaluate interventions (Coppock et al., 2018; Diener et al., 2022; Sears, 1986): for example, college students, volunteers recruited from social media sites, or participants from online crowdsourcing platforms such as Prolific. Although a randomized study conducted in a convenience (nonrepresentative) sample can provide causal evidence of the intervention’s effectiveness within that sample, the evidence can rarely be extended to broader populations. As a result, the findings of many interventions do not necessarily generalize to the target populations that policymakers aim to serve, such as the general population of a nation. Without a clear understanding of an intervention’s effectiveness in a target population, policymakers risk forming distorted perceptions that can misguide decision-making. This can result in wasted funds, time, and resources spent rolling out ineffective interventions; missed opportunities to implement promising ones; undermined public trust in psychological science; and, ultimately, the persistence of societal problems.
The pitfalls of using nonrepresentative samples have been extensively discussed in psychological science and are well recognized by researchers (Arnett, 2008; Bauer, 2022; Henrich et al., 2010; Rad et al., 2018; Thalmayer et al., 2021). However, despite researchers’ best intentions and efforts, securing a representative sample is not always feasible. Many factors, such as limited resources, time constraints, study sites’ inaccessibility, and participants’ self-selection, make it difficult to achieve representative samples. These barriers are particularly pronounced for scholars working in underprivileged positions or underfunded institutions. While a complete and attainable solution remains elusive, one step in the right direction is explicitly acknowledging the use of nonrepresentative samples as a limitation in the discussion sections of empirical articles (Clarke et al., 2023).
But beyond acknowledging the lack of generalizability as a limitation, can scholars strengthen the generalizability of intervention studies through statistical methods? More precisely, how can scholars formally quantify an intervention’s effects in a full target population given a nonrepresentative study sample (Lund & Matthews, 2024; Mumford & Schisterman, 2019)? In this article, we introduce a practical approach that empowers substantive researchers to use an intervention study’s findings to inform effects in a target population: causal effect generalizability. First developed in the causal inference and epidemiological literature (Bareinboim & Pearl, 2013, 2016; Cole & Stuart, 2010; Kern et al., 2016; Lesko et al., 2017; Pearl & Bareinboim, 2014) and grounded in the potential outcomes framework (Imbens & Rubin, 2015), this approach is increasingly used in the health and medical sciences. For example, scholars have used this approach to generalize causal evidence of biomedical treatments from clinical study samples to target real-world populations (for a recent review, see Levy et al., 2024).
We aim to extend this approach to psychological science to enhance policy insights in the field. Here, we present an accessible, nontechnical introductory tutorial for psychological scientists. A glossary of causal inference terms we use throughout is presented in Table 1 for readers unfamiliar with the potential outcomes framework. The remainder of this article is organized as follows: We first introduce a real-world study from the psychology literature as a running example. We then formalize why the causal effect of interest in a nonrepresentative study sample can systematically differ from that in the target population. Next, we illustrate how to generalize the causal effect from the intervention study back to the full target population. We have developed user-friendly R (R Core Team, 2024) scripts to aid scholars in implementing the estimation methods; these are presented in boxes as part of the illustrative example. Finally, we offer practical recommendations for scholars seeking to generalize causal effects to improve the policy relevance of their developed interventions.
Glossary of Key Causal Inference Terms
An Illustrative Example
For illustrative purposes, imagine a research team searching for solutions to reduce the impact of misinformation in the United States. The team learns about a promising intervention from a psychology article: the fact-checking intervention (Hoes et al., 2024). The intervention emphasizes the source (e.g., a politician or news outlet) of inaccurate claims to increase skepticism and prevent misperceptions (i.e., perceived accuracy of false statements; Hoes et al., 2024). In a randomized study, the intervention reduced participants’ misperceptions by 0.35 points (an effect of −0.35 on a 4-point scale, where 1 = not at all accurate and 4 = very accurate; Hoes et al., 2024). The study sample comprised participants in the United States recruited by an opinion polling company (Hoes et al., 2024). Would this causal effect extend to the research team’s target population, the general population of the nation?
Why the Effect May Differ
The causal effect of the fact-checking intervention within the study sample would systematically differ from that in the full target population when two conditions jointly hold. First, the distributions of baseline characteristics differ between the study sample and the target population. Second, the causal effect varies across levels of these characteristics (i.e., the characteristics moderate the effect). Together, these conditions produce different patterns of effect heterogeneity in the study sample and the target population, and hence different average causal effects. Continuing our example, suppose that (a) individuals without a college degree were underrepresented in the study sample and (b) the fact-checking intervention was effective only among college-educated participants, with no impact among participants without a degree. Together, these two factors would lead to a much weaker average effect in the general population than what was observed within the study sample.
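To see this concretely, consider the following toy R calculation. All numbers are hypothetical and chosen only to illustrate the arithmetic; they are not estimates from Hoes et al. (2024).

```r
# Toy illustration of how sample composition plus effect heterogeneity shift
# the average causal effect. All numbers are hypothetical.
effect_college   <- -0.50  # assumed effect among college-educated individuals
effect_no_degree <-  0.00  # assumed effect among individuals without a degree
p_college_sample <-  0.60  # assumed share college-educated in the study sample
p_college_target <-  0.35  # assumed share college-educated in the population

ate_sample <- p_college_sample * effect_college +
  (1 - p_college_sample) * effect_no_degree   # -0.30 in the study sample
ate_target <- p_college_target * effect_college +
  (1 - p_college_target) * effect_no_degree   # -0.175 in the target population
c(sample = ate_sample, target = ate_target)
```

Because college-educated individuals (the only group for whom the intervention works in this toy setup) are overrepresented in the study sample, the sample average effect (−0.30) overstates the target-population average effect (−0.175).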
We formally define these causal effects using established concepts from the potential outcomes framework, commonly called the “Neyman-Rubin causal model” (Holland, 1986; Rubin, 1990; Splawa-Neyman et al., 1990). Using our running misinformation example, let A denote the intervention (A = 1 if an individual receives the fact-checking intervention and A = 0 otherwise), let Y(1) and Y(0) denote the individual’s potential misperception outcomes under the intervention and under control, respectively, and let S indicate study participation (S = 1 for members of the study sample and S = 0 for nonparticipants in the target population). The average causal effect within the study sample is

$$E[Y(1) - Y(0) \mid S = 1], \tag{1}$$

where the expectation is taken over individuals in the study sample (S = 1).
In contrast, the corresponding effect among the full target population is

$$E[Y(1) - Y(0)], \tag{2}$$

where the expectation is now taken over all individuals in the target population (both S = 1 and S = 0).
The target average causal effect in Equation 2 answers the following causal query: Among individuals in the full target population, what is the intervention’s potential impact if everyone hypothetically experienced the intervention versus did not experience the intervention?
The crucial difference between Equations 1 and 2 is not merely that the former averages over the subpopulation delineated by S = 1. As described above, when the study sample and the target population differ in baseline characteristics that moderate the intervention’s effect, the two estimands can take systematically different values, and an estimate of Equation 1 can be a poor guide to Equation 2.
Illustration of Generalizing a Causal Effect With Sample R Code
In this section, we use our running example to illustrate generalizing the causal effect of an intervention carried out in a study sample back to the full target population. The causal effect of interest is the effect of the fact-checking intervention (A) on misperceptions of false statements (Y), as studied by Hoes et al. (2024), generalized to the general population in the United States.
To ease exposition, for the target population, we used information from 35,000 participants in a nationally representative sample quota-matched on key demographics (Voelkel et al., 2024). These participants were recruited from nonprobability opt-in internet panels by three different panel suppliers; for further details, see the Sampling Plan subsection of S0.2 in the Supplemental Online Materials of Voelkel et al. (2024).
The first step is to harmonize baseline covariates by aligning those in the study sample, reported in the experiment of Hoes et al. (2024), with those in the target population, reported in Voelkel et al. (2024). Both studies recorded the following baseline covariates: age, gender, race, education, political-party affiliation, and political ideology. Where necessary, we recoded covariates so that they were on the same scale in the study sample and the target population. For example, to be consistent with the 5-point political-ideology scale in the study sample, we transformed political ideology, originally measured on a 7-point scale in the target population, to a 5-point scale (a code sketch of this recoding step follows Table 2). Summaries of these covariates (jointly denoted by X) in the study sample and the target population are presented in Table 2.
Baseline Covariates in the Study Sample and Target Population for the Misinformation Example
Note: All results were rounded to one decimal place. The full target population comprises the study sample (S = 1) and nonparticipants (S = 0). IQR = interquartile range.
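As a concrete illustration of the recoding step described above, the sketch below collapses the target population’s 7-point ideology measure onto the study’s 5-point scale. The variable names and the specific mapping rule are our own illustrative assumptions and may differ from the recoding used in the actual analysis.

```r
# Hypothetical sketch: map a 7-point political-ideology measure (target
# population) onto a 5-point scale (study sample). Variable names and the
# mapping rule are illustrative assumptions, not the original recoding.
library(dplyr)

target <- target %>%
  mutate(ideology_5 = case_when(
    ideology_7 %in% c(1, 2) ~ 1,  # very liberal / liberal
    ideology_7 == 3         ~ 2,  # somewhat liberal
    ideology_7 == 4         ~ 3,  # moderate
    ideology_7 == 5         ~ 4,  # somewhat conservative
    ideology_7 %in% c(6, 7) ~ 5   # conservative / very conservative
  ))
```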
Next, we composed the full target population by merging individual-level data from the study sample (S = 1) with those from nonparticipants in the target population (S = 0); for nonparticipants, the intervention status and outcome are unobserved.
Snapshot of Merged Individual-Level Data for the Study Sample (S = 1) and Nonparticipants (S = 0)
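A minimal sketch of how such a merged data set may be composed, assuming harmonized data frames study (with intervention indicator A and outcome Y observed) and nonparticipants (covariates only); the object names are illustrative:

```r
# Hypothetical sketch: stack the study sample (S = 1) and nonparticipants
# (S = 0) into one data set; A and Y are unobserved for nonparticipants.
library(dplyr)

study$S <- 1
nonparticipants$S <- 0
nonparticipants$A <- NA_real_
nonparticipants$Y <- NA_real_

merged <- bind_rows(study, nonparticipants)
```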
Finally, we used the merged data to generalize the causal effect from the study sample back to the target population. To help researchers implement this in practice, we developed a single R function that computes three effect estimators (Dahabreh et al., 2019): an outcome-model-based estimator, an inverse-probability-of-sampling-weights (ISW) estimator, and a doubly robust (DR) estimator; a sketch is given in the box below.
Sample Analysis Code to Generalize a Causal Effect
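The article’s own function is not reproduced here; the sketch below is a minimal implementation of the three estimators under the assumptions that the merged data contain columns S, A, and Y as defined above and that the intervention was marginally randomized within the study sample. The function name generalize_ate and the covariate names are illustrative.

```r
# Minimal sketch of the three estimators (Dahabreh et al., 2019):
# outcome-model (OM), inverse probability of sampling weights (ISW),
# and doubly robust (DR). Names are illustrative assumptions.
generalize_ate <- function(dat, covariates) {
  trial <- subset(dat, S == 1)

  # (1) OM: fit arm-specific outcome regressions in the study sample,
  # then average the predicted effects over the full target population.
  g1 <- lm(reformulate(covariates, response = "Y"), data = subset(trial, A == 1))
  g0 <- lm(reformulate(covariates, response = "Y"), data = subset(trial, A == 0))
  g1_hat <- predict(g1, newdata = dat)
  g0_hat <- predict(g0, newdata = dat)
  om <- mean(g1_hat - g0_hat)

  # (2) ISW: model the probability of study participation given covariates,
  # then weight participants by its inverse (normalized within arms).
  ps    <- glm(reformulate(covariates, response = "S"),
               data = dat, family = binomial)
  p_hat <- fitted(ps)                  # Pr(S = 1 | X)
  e_hat <- mean(trial$A)               # randomization probability in the trial
  a <- ifelse(is.na(dat$A), 0, dat$A)  # placeholders; weights are 0 when S = 0
  y <- ifelse(is.na(dat$Y), 0, dat$Y)
  w1 <- dat$S * a / (p_hat * e_hat)
  w0 <- dat$S * (1 - a) / (p_hat * (1 - e_hat))
  isw <- sum(w1 * y) / sum(w1) - sum(w0 * y) / sum(w0)

  # (3) DR: augment the OM predictions with weighted residuals; consistent
  # if either the participation model or the outcome models are correct.
  dr <- mean(w1 * (y - g1_hat) + g1_hat) - mean(w0 * (y - g0_hat) + g0_hat)

  c(OM = om, ISW = isw, DR = dr)
}

# Example call (covariate names are illustrative):
# generalize_ate(merged, c("age", "gender", "race", "education",
#                          "party", "ideology"))
```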
Nonparametric percentile bootstrap confidence intervals (Davison & Hinkley, 1997) may be constructed by randomly resampling individuals with replacement from the merged data and then applying the above function to each bootstrap sample. Moreover, it is also feasible to estimate the difference between the effect within the study sample and the effect in the target population. For example, the former may be estimated by fitting a simple linear regression of the outcome on the intervention indicator within the study sample; bootstrapping both quantities together then yields a confidence interval for their difference.
Example Model Syntax for Calculating Bootstrap Confidence Intervals
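A sketch of this procedure, assuming the merged data and the generalize_ate function from the previous boxes (2,000 resamples, matching the note to Table 3):

```r
# Hypothetical sketch: nonparametric percentile bootstrap CIs for the three
# estimators, resampling individuals with replacement from the merged data.
set.seed(1)  # an arbitrary seed, for reproducibility
covs <- c("age", "gender", "race", "education", "party", "ideology")

boot_est <- replicate(2000, {
  idx <- sample(nrow(merged), replace = TRUE)  # resample individuals
  generalize_ate(merged[idx, ], covs)
})

# 95% percentile intervals: 2.5th and 97.5th percentiles per estimator
apply(boot_est, 1, quantile, probs = c(0.025, 0.975))
```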
The results are shown in Table 3. All three estimates were slightly further from zero, suggesting a somewhat stronger causal effect in the full target population. (The difference between the study-sample and target-population effects, however, was not distinguishable from zero given sampling variability.) The analysis suggests that the intervention’s effect in the target population is similar to, or slightly stronger than, its effect within the study sample, indicating that the intervention holds promise for reducing misperceptions of false statements in the general population.
Average Causal Effect Estimates in the Misinformation Example
Note: Nonparametric percentile bootstrap CIs were calculated using 2,000 resamples with replacement. All results were rounded to two decimal places. CI = confidence interval; ISW = inverse probability of sampling weights; DR = doubly robust.
Practical Recommendation: Record Relevant Covariates, Even in Experiments
As demonstrated above, harmonizing covariates is a key step in generalizing causal effects. However, covariate information is routinely absent from many existing randomized studies. We encourage researchers to record a comprehensive set of preintervention baseline covariates in randomized experiments, especially when a nonrepresentative study sample is used for whatever reason (e.g., resource constraints or participants’ self-selection). Carefully recorded covariate information makes it more feasible to generalize the findings to target populations of interest, thereby improving the study’s broader relevance and policy implications.
Although this recommendation seems straightforward, researchers should attend to how the covariates are measured. Generalizing a causal effect requires merging individual-level data from the study sample and from nonparticipants in the target population. In our illustration, for example, we recoded covariates during the analytic stage. Such covariate harmonization can be practically difficult or even impossible (Ikesu et al., 2024; Power et al., 2022). Mismatches between the covariates collected in the study and those available in the target population can arise for various reasons, such as different timings, operational definitions, or measurement scales. Certain covariates may apply only to specific outcomes, such as preintervention outcome measurements, or may be context-specific to a target population, such as socioeconomic status. To overcome this difficulty, we recommend that researchers identify data from their target population(s) early in the study-design phase, if possible, so that pertinent covariates, consistent with the measures available in the target population(s), can be measured in the experiment.
Discussion and Conclusion
In this nontechnical tutorial, we introduced causal effect generalizability to psychological researchers. We provided a step-by-step guide with user-friendly R functions to make causal effect generalizability more accessible to a diverse readership. By applying methods for effect generalizability, researchers can conduct rigorous quantitative evaluations of interventions’ effects in clearly defined target populations without implementing the intervention in these populations. This practical approach can offer valuable insights to guide evidence-based policy development. We hope this nontechnical introduction will assist scholars in enhancing the policy relevance and real-world impact of psychological science.
Because of accessibility and space considerations, we did not provide a comprehensive review of all relevant work within this tutorial. We encourage readers to explore the literature beyond the material we covered here. In addition to the references from the causal inference and epidemiological literature cited in the introduction, readers may be interested in other estimation methods (Ackerman et al., 2019; Kern et al., 2016). We implemented three effect estimators developed by Dahabreh et al. (2019) using linear and logistic regression models, which are widely used by and thus familiar to a broad audience of psychology researchers. Alternative estimation methods leveraging machine-learning algorithms, such as Bayesian additive regression trees (Chipman et al., 2010) and targeted maximum likelihood estimation (Van der Laan & Rose, 2018), are available in the R package generalize (Ackerman, 2020). The method can also be applied beyond psychology, such as in education research. For example, Tipton (2014) provided a generalizability index to assess how generalizable a study sample is to a target population, and Tipton and Olsen (2018) provided a general guide to and discussion of generalizing causal effects in education research. Finally, like any other statistical method, causal effect generalizability relies on assumptions. These assumptions are formalized in Appendix A and were discussed in depth by Degtiar and Rose (2023).
Although not the focus of this article, scholars can use the same framework to apply methods for causal effect transportability (Bareinboim & Pearl, 2013; Levy et al., 2024; Pearl & Bareinboim, 2014; Westreich et al., 2017). Whereas generalizability is used to extend causal effects from a study sample back to the full target population (within which the study sample is nested), transportability is used to extend causal effects to a target population distinct from and nonoverlapping with the study sample (Lesko et al., 2017). For example, scholars may aim to transport findings from studies conducted in urban areas to rural settings, across countries, or from historical periods to the present. Although effect transportability offers exciting possibilities, it also introduces complexities. For example, interventions in psychological science often target behaviors, attitudes, and beliefs that can be deeply rooted in political and cultural contexts. These factors may constrain the feasibility of transporting a behavioral intervention to disparate populations potentially incompatible with the study sample. Nonetheless, effect transportability can be a valuable tool, relying on the same class of methods as effect generalizability. We hope this article serves as an accessible introduction and a starting point for researchers interested in exploring these approaches.
Appendix A
Appendix B
Acknowledgements
We are grateful to Cande V. Ananth, Özge Gürcanlı Fischer-Baum, and Michelle Helinski for discussions that seeded the ideas for this article.
Transparency
Action Editor: Pamela Davis-Kean
Editor: David A. Sbarra
Author Contributions
Both authors contributed equally to this article and shared joint first authorship, but they are listed alphabetically by last name.
