
Abstract

Addressing core questions in diversity science requires quantifying causal effects (e.g., what drives social inequities and how to reduce them). Conventional approaches target the average causal effect (ACE), but ACE-based analyses suffer from limitations that undermine their relevance for diversity science. In this article, we introduce a novel alternative from the causal inference literature: the so-called incremental propensity score (IPS). First, we explain why the IPS is well suited for investigating core queries in diversity science. Unlike the ACE, the IPS does not demand conceptualizing unrealistic counterfactual scenarios in which everyone in the population is uniformly exposed versus unexposed to a causal factor. Instead, the IPS focuses on the effect of hypothetically shifting individuals’ chances of being exposed along a continuum. This allows seeing how the effect may be graded, offering a more realistic and policy-relevant quantification of the causal effect than a single ACE estimate. Moreover, the IPS does not require the positivity assumption, a necessary condition for estimating the ACE but which rarely holds in practice. Next, to broaden accessibility, we provide a step-by-step guide on estimating the IPS using R, a free and popular software. Finally, we illustrate the IPS using two real-world examples. The current article contributes to the methodological advancement in diversity science and offers researchers a more realistic, relevant, and meaningful approach.

Keywords

causal inference, inverse probability weighting, potential outcomes, social inequality, social justice

Social equity is a fundamental tenet of diversity science. Extensive research seeks to identify the causes of inequalities, evaluate their consequences, and develop strategies for transformation; see, for example, Devine and Ash (2022); Hebl et al. (2020); Juvonen et al. (2019); National Academies of Sciences, Engineering, and Medicine (2023); Paluck et al. (2021); and Valantine and Collins (2015). Example questions include: To what extent does racism drive racial disparities in life expectancies? What is the impact of social exclusion on mental health among the LGBTQ+ community? Do voters’ misperceptions of equality-enhancing policies spur their opposition to the policies? Does supervisor support have the potential to reduce gender disparities at work? Addressing these queries necessitates meaningful and realistic quantifications of causal effects.

When assessing the impact of a focal causal factor, researchers routinely estimate the so-called average causal effect (ACE). In this article, we explain why the ACE is inherently ill-suited for answering core causal queries in diversity science. We introduce an alternative causal quantity: the so-called incremental propensity score (IPS), a novel approach recently developed in the statistical literature (Kennedy, 2019; for recent applications in criminology and epidemiology, respectively, see Jacobs et al., 2023; Naimi, Rudolph, et al., 2021). In the following sections, we first explain what the IPS is and put forth that the IPS, although originally developed for quantifying the effects of interventions, can be applied broadly beyond intervention studies. In particular, we explain why the IPS is uniquely well suited for addressing important causal questions in diversity science. Next, we describe how to estimate the IPS. To increase the accessibility of the material for our target audience of social science and psychological researchers, we illustrate the procedure using the widely used and freely accessible statistical software R (R Core Team, 2023). Finally, we apply the IPS to two publicly available real-world data sets. All R scripts are available online at https://github.com/wwloh/ips-diversity.

Challenges of Using the ACE for Diversity Science

We first define and explain the ACE causal quantity routinely used in diversity science and social sciences broadly. We draw on concepts from the established potential outcomes framework, commonly called the Neyman-Rubin causal model (Rubin, 1990; Splawa-Neyman et al., 1990). Let $X$ denote whether an individual was exposed to a focal causal factor $(X = 1)$ , such as being socially excluded, or unexposed $(X = 0)$ and let $Y$ denote the outcome of interest. Let $Y^{1}$ and $Y^{0}$ denote an individual’s potential outcomes had the individual, possibly counter to actual exposure, been exposed or unexposed, respectively.¹ Then, the ACE (Rubin, 1974) utilized in routine analyses is defined as

$$\mathrm{ACE} \equiv E (Y^{1}) - E (Y^{0}) . \tag{1}$$

The ACE quantifies the average difference in potential outcomes for the entire study population. Continuing with our example, the ACE conceives the effect of social exclusion as the average difference between two extreme counterfactual scenarios: one in which all individuals are excluded and another in which all individuals are included.

Having defined the ACE causal quantity, we can now clarify two challenges of adopting the ACE for diversity science. First, the ACE may be uninterpretable. Although it is clear-cut to conceive competing counterfactual scenarios for a treatment whereby all individuals could be either exposed or unexposed in reality (e.g., a drug, the color of a visual stimulus, momentary sadness, spending time with a friend), this is rarely the case in diversity science. Causal factors of substantive interest in diversity science (e.g., the experiences of minoritized individuals, diversity-related attitudes and perceptions, and access to support or resources in times of need) are rarely experienced uniformly by all individuals in real-world settings. Instead, individuals’ chances of being exposed vary along a continuum, possibly as a function of different dimensions (e.g., intersecting social identities and socioeconomic status, among others; see, e.g., Bhattacharyya & Berdahl, 2023; Hebl et al., 2020; Kurzban & Leary, 2001; Moss-Racusin, 2021). This renders the contrast between two one-size-fits-all counterfactual scenarios (all individuals being exposed vs. unexposed) uninterpretable. The contrast also undermines the ACE’s practical relevance for policy development: It is unrealistic for any intervention to result in either scenario.

Second, causal factors of interest in diversity science are often inextricably interrelated, such as unemployment and social exclusion (Boardman et al., 2022; Morrish & Medina-Lara, 2021), living conditions and access to health care (Caldwell et al., 2016; O’Shea et al., 2023), or minority status and diversity beliefs (Avery, 2011; Skinner-Dorkenoo et al., 2023). This can lead to violations of the so-called positivity assumption—a necessary condition when targeting the ACE. The positivity assumption (Petersen et al., 2012; Westreich & Cole, 2010) states that among individuals with the same values of the baseline or pretreatment covariates $C$ , the probability of being exposed to the focal causal factor $(X = 1)$ must be strictly between 0 and 1; formally, for all unique values of $C = c$ ,

$$0 < \Pr (X = 1 | C = c) < 1 . \tag{2}$$

Note that the conditional probability of being exposed $(X = 1)$ given the covariates $C$ is termed the “propensity score” (Rosenbaum & Rubin, 1983; West et al., 2014), which we hereafter denote by $π (C)$ to emphasize its dependence on $C$.

Continuing our example, this assumption stipulates that there must be individuals who are exposed $(X = 1)$ and who are unexposed $(X = 0)$ to social exclusion within every stratum defined by a unique value of $C$ in the study population. Suppose there are three baseline covariates: race, immigration background, and unemployment. Then, excluded $(X = 1)$ and included $(X = 0)$ individuals must exist for each unique combination of these covariates.

In randomized experiments, positivity can be satisfied by design because researchers determine the chances of assignment (e.g., $0.5$). But in naturalistic observational studies, positivity is likely to be violated because some individuals have zero or unit probability of being exposed (e.g., of being socially excluded); that is, their propensity score is exactly $π (C) = 0$ or $π (C) = 1$, respectively.² In principle, positivity violations can be diagnosed empirically, such as by judging whether the distribution of the covariates fails to overlap between excluded and included individuals. But such a diagnosis can be difficult or impossible in practice when violations occur only in certain combinations of two or more covariates (Westreich & Cole, 2010). For instance, there may be individuals with a specific combination of covariates (e.g., racial minorities, immigrants, and unemployed) who are perfectly certain to experience social exclusion.³
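As a rough illustration of such an empirical check, the following R sketch cross-tabulates exposure against every observed combination of three binary covariates and flags strata in which only exposed or only unexposed individuals appear. The data set and all variable names (`minority`, `immigrant`, `unemployed`, `X`) are hypothetical, and an empty cell in a finite sample is only suggestive of a positivity violation, not proof of one.

```r
# Hypothetical toy data: three binary covariates and a binary exposure X.
dat <- data.frame(
  minority   = c(1, 1, 1, 0, 0, 0, 0, 1),
  immigrant  = c(1, 1, 1, 0, 0, 1, 1, 0),
  unemployed = c(1, 1, 1, 0, 0, 0, 0, 0),
  X          = c(1, 1, 1, 0, 1, 0, 1, 0)
)

# One stratum per observed combination of the covariates
# (labels read minority.immigrant.unemployed).
stratum <- interaction(dat$minority, dat$immigrant, dat$unemployed, drop = TRUE)

# Cross-tabulate strata against exposure; a zero cell means that no
# exposed (or no unexposed) individuals were observed in that stratum.
counts <- table(stratum, X = dat$X)
flagged <- rownames(counts)[counts[, "0"] == 0 | counts[, "1"] == 0]

print(counts)
print(flagged)
```

With many covariates or continuous covariates, such tables quickly become sparse, which is one reason the check can be impractical.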

IPS as a Better-Suited Alternative to ACE for Assessing Causal Queries in Diversity Science

We now introduce a fresh approach for investigating the core causal questions in diversity science: the IPS (Kennedy, 2019). The IPS effectively addresses both challenges of the ACE, as we outlined in the previous section. First, the IPS quantifies an alternative causal estimand from shifting individuals’ chances for being exposed; hence, the IPS avoids the ACE-based comparison of extreme counterfactual scenarios for the entire population. Second, the IPS does not require the positivity assumption; hence, the IPS is robust to violations of the positivity assumption that would otherwise preclude ACE-based analyses. In the following sections, we explain what the IPS is and how to estimate its implied causal estimand.

The IPS aims to answer a straightforward question: “What would outcomes look like, on average, if the chances of exposure were changed by a given amount?” The IPS approach starts by envisaging a desired interventional probability of exposure to a causal factor $(X = 1)$ and then evaluating the average potential outcome under this desired exposure level. Therefore, the IPS quantifies the average outcome if the naturally occurring propensity scores $π (C)$ could be hypothetically shifted to a different distribution instead. This answers the query of what the average outcome would be if an individual’s chances of being exposed were shifted or changed to a different predetermined amount. The ACE is equivalent to specific instances of the IPS because the ACE quantifies the difference in average outcomes between two extreme counterfactual scenarios in which all individuals’ chances of exposure are set to exactly 0 or 1. We elaborate on this later.

To focus ideas, consider an individual whose probability of being socially excluded is $0.6$ ; that is, their naturally occurring propensity score in the observed data is $π (C) = 0.6$ . Using the IPS, researchers can answer the causal query: How would this individual’s health outcome improve if the individual’s probability of being excluded could be hypothetically decreased from the status quo (e.g., from $0.6$ to $0.3$ )? We further illustrate the IPS and its comparison with the ACE using a hypothetical example in Box 1.

Box 1:

Why Is the Incremental Propensity Score More Relevant Than the Average Causal Effect? A Hypothetical Example

Dr. A is interested in reducing the prevalence of depression among racial minorities. Social exclusion is an established risk factor for depression among this population. How should Dr. A quantify the causal effect of exclusion $(X)$ on depression $(Y)$ among racial minorities? The routine approach would be to use the average causal effect (ACE). The ACE contrasts two extreme, unrealistic scenarios: One in which all individuals who identify as racial minorities are deterministically excluded versus another in which none are excluded. The contrast between these two contrived scenarios makes the ACE a poor choice because it fails to provide a meaningful answer to the scientific query. Furthermore, the ACE has little to no real-world relevance or implications because it is unfeasible for a policy to either enforce or eliminate exclusion for everyone.

Realistically, it is more meaningful to consider what would happen if individuals’ chances of being excluded could be reduced (instead of uniformly eliminated). This is precisely what the incremental propensity score (IPS) offers. Using the IPS, Dr. A can answer the causal question: What would depression, possibly contrary to fact, be on average if individuals’ propensity of being excluded were reduced by a given amount over the status quo? The IPS causal effect, therefore, represents the change in average depression among racial minorities that would be brought about if the propensity of being excluded were reduced (without necessarily being forced to zero). With this causal evidence in hand, researchers, practitioners, and policymakers can design and develop interventions toward a realistic goal—reducing the propensity of experiencing exclusion among racial minorities. Such interventions hold the promise of improving mental health outcomes in this population. (Note that the IPS is not intended to inform the specific designs of putative interventions; we return to this point in the Discussion section.)

We formally define an interventional probability distribution to answer this new causal query. Let $Q_{δ} (X = 1 | C)$ denote the desired interventional probability of being socially excluded $(X = 1)$ given $C$, where $δ$ is a shift parameter describing the desired shift in the probability. Suppose that the shift parameter $δ$ describes the odds ratio between the desired probability $Q_{δ} (X = 1 | C)$ and the naturally occurring propensity score $π (C)$:

$$δ = \frac{Q_{δ} (X = 1 | C)}{1 - Q_{δ} (X = 1 | C)} \bigg/ \frac{π (C)}{1 - π (C)} . \tag{3}$$

Solving for $Q_{δ} (X = 1 | C)$ yields the functional form:

$$Q_{δ} (X = 1 | C) = \frac{δ π (C)}{1 + (δ - 1) π (C)} . \tag{4}$$

We emphasize that $π (C)$, which we term the “organic” propensity score, denotes the (typically unknown) naturally occurring probability of being socially excluded $(X = 1)$ given $C$. In contrast, $Q_{δ} (X = 1 | C)$ characterizes the desired interventional probability of being socially excluded given $C$; it is thus termed an “incremental propensity score intervention” (Bonvini et al., 2023; Kennedy, 2019), referred to hereafter as IPS for simplicity.
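To make Equations 3 and 4 concrete, the following R sketch computes the shifted probability for the earlier example of moving a probability of exclusion from 0.6 to 0.3. The function name `shift_ps` is our own illustrative choice and does not come from the original scripts.

```r
# Equation 4: interventional probability given the organic propensity
# score pi_c and the odds-ratio shift parameter delta.
shift_ps <- function(pi_c, delta) {
  delta * pi_c / (1 + (delta - 1) * pi_c)
}

# Equation 3: odds ratio needed to move a probability from 0.6 to 0.3.
delta <- (0.3 / (1 - 0.3)) / (0.6 / (1 - 0.6))
delta                       # 2/7, about 0.286

# Plugging this delta back into Equation 4 recovers the target of 0.3.
shift_ps(0.6, delta)

# Organic probabilities of exactly 0 or 1 are left unchanged by any delta.
shift_ps(c(0, 0.25, 1), 5)
```

The last line illustrates why the shift is well defined even for individuals whose organic propensity score sits on the boundary: their probability simply stays at 0 or 1.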

The average potential outcome under a given IPS $Q_{δ} (X = 1 | C)$ is then defined as

$$E (Y^{Q_{δ}}) \equiv \sum_{c \in \mathcal{C}} \sum_{x = 0}^{1} E (Y^{x} | C = c) \, Q_{δ} (X = x | C = c) \Pr (C = c), \tag{5}$$

where $E (A | B = b)$ denotes the conditional expectation of a random variable $A$ among those with a specific value(s) of the random variable(s) $B = b$, and $\mathcal{C}$ denotes the covariate space (i.e., the set of all possible values $c$ of the covariates $C$). Uppercase letters represent random variables or potential outcomes, and lowercase letters indicate specific or realized values. In words, the IPS causal estimand (IPSCE) in Equation 5 comprises a weighted average of the potential outcomes $Y^{0}$ and $Y^{1}$, with weights corresponding to the IPSs $Q_{δ} (X = 0 | C)$ and $Q_{δ} (X = 1 | C)$, respectively.

The IPSCE can be interpreted as the average potential outcome when the given IPS $Q_{δ} (X = 1 | C)$ describes the desired (or hypothetical) propensity of being socially excluded. Crucially, this estimand merely conceptualizes changing—or shifting—each individual’s chances of being excluded; hypothetical exclusion status remains stochastic. In contrast, the components of the ACE, $E (Y^{0})$ and $E (Y^{1})$ , are interpreted as the average potential outcome when hypothetical exclusion or inclusion is deterministically imposed on the entire population. In Appendix A, we elaborate on the comparisons between the IPSCE in Equation 5 and the ACE, including the extreme scenarios for the IPS that are equivalent to the ACE.
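A small numerical illustration may help fix ideas. The R sketch below evaluates Equation 5 for a hypothetical population with two covariate strata; every number is made up for illustration, and the conditional means of the potential outcomes are treated as known rather than estimated.

```r
# Hypothetical two-stratum population (all numbers are made up).
p_c  <- c(0.5, 0.5)   # Pr(C = c) for the two strata
pi_c <- c(0.6, 0.2)   # organic propensity score in each stratum
m1   <- c(10, 8)      # E(Y^1 | C = c)
m0   <- c(6, 4)       # E(Y^0 | C = c)

ipsce <- function(delta) {
  q1 <- delta * pi_c / (1 + (delta - 1) * pi_c)  # Equation 4
  sum((m1 * q1 + m0 * (1 - q1)) * p_c)           # Equation 5
}

ipsce(1)      # 6.6: propensity scores left unshifted
ipsce(1e8)    # very large delta: approaches E(Y^1) = 9
ipsce(1e-8)   # delta near zero: approaches E(Y^0) = 5
```

In this toy setting, where every organic propensity score lies strictly between 0 and 1, pushing $δ$ toward infinity or toward zero reproduces the two components of the ACE (here, 9 and 5), whereas intermediate values of $δ$ trace out the graded effect in between.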

We emphasize that the IPS $Q_{δ} (X = 1 | C)$ is a probability that must take values between 0 and 1. This ensures that $Q_{δ} (X = x | C)$ is a valid probability distribution for $x = 0, 1$ . For this reason, the IPS in Equation 4 has the desirable property of ensuring that $Q_{δ} (X = 1 | C)$ is between 0 and 1 regardless of the value of $π (C)$ . We visualize this in Figure 1 by plotting $Q_{δ} (X = 1 | C)$ and $π (C)$ under different values of the shift parameter $δ$ . All the lines representing transformations between $Q_{δ} (X = 1 | C)$ and $π (C)$ are within the interior of the unit square, which ensures that $Q_{δ} (X = 1 | C)$ remains a valid probability. Under this IPS, the shift parameter $δ$ can take any value in the interval $(0, \infty)$ , with a unit value $(δ = 1)$ corresponding to no desired shift in the propensity score.⁴ In the next section, we provide a step-by-step guide on estimating the IPSCE.

Fig. 1.

The incremental propensity score $Q_{δ} (X = 1 | C)$ as a function of the organic propensity score $π (C)$ and the shift parameter $δ$, where $δ$ encodes the odds ratio. Each plotted line corresponds to a value of $δ$ that encodes the relationship between $Q_{δ} (X = 1 | C)$ and $π (C)$ as stated in the heading. For visual clarity, thicker lines correspond to larger differences between $Q_{δ} (X = 1 | C)$ and $π (C)$. Values of $δ$ where $Q_{δ} (X = 1 | C) > π (C)$ are drawn as solid lines; values of $δ$ where $Q_{δ} (X = 1 | C) < π (C)$ are drawn as broken lines. The null where $Q_{δ} (X = 1 | C) = π (C)$ is indicated by the thickest solid diagonal line.

How to Estimate the IPSCE

In this section, we describe how to estimate the IPS causal estimand $E (Y^{Q_{δ}})$, that is, the IPSCE. The estimation procedure comprises four steps. For the sole purpose of exposition, we continue with our social exclusion example wherein interest is in the effect of social exclusion $(X; 0 = included, 1 = excluded)$ on depression $(Y)$, with two covariates (age and gender). Sample R code (R Core Team, 2023) is included at each step as a practical guide. A more technical presentation with mathematical details is provided in Appendix C for interested readers.

Step 1

Specify an outcome model that regresses $Y$ on all the covariates as predictors. For example, a linear regression model can be used for continuous $Y$ . Fit the outcome model separately to each subgroup defined by the observed $X$ , that is, among included individuals (with $X = 0$ ) and among excluded individuals (with $X = 1$ ). For example, R code for fitting linear regression models with main effects only⁵ to the observed data may be as follows:
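A minimal sketch of this step, written under our own assumptions, is given here. The data are simulated stand-ins, and the object and column names (`dat`, `X`, `Y`, `age`, `gender`, `FIT_Y0`, `FIT_Y1`) are illustrative choices that need not match the scripts in the online repository. The sketch also carries out the prediction step described in the next paragraph.

```r
# Simulated stand-in data; all variable names are assumptions.
set.seed(1)
n <- 200
dat <- data.frame(
  age    = round(runif(n, 18, 65)),
  gender = factor(sample(c("woman", "man"), n, replace = TRUE))
)
dat$X <- rbinom(n, 1, plogis(-1 + 0.02 * dat$age))
dat$Y <- 2 + 1.5 * dat$X + 0.03 * dat$age + rnorm(n)

# Main-effects linear outcome models, fitted separately by observed exposure.
fit_y0 <- lm(Y ~ age + gender, data = dat, subset = X == 0)
fit_y1 <- lm(Y ~ age + gender, data = dat, subset = X == 1)

# Predicted outcomes under each exposure level for every individual,
# regardless of the exposure that was actually observed.
dat$FIT_Y0 <- predict(fit_y0, newdata = dat)
dat$FIT_Y1 <- predict(fit_y1, newdata = dat)
head(dat)
```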

Then, for each individual, extract the predicted potential outcomes under both levels of $X$ (i.e., one under $X = 1$ and another under $X = 0$ , regardless of their observed value) and record them as two new variables (e.g., “FIT_Y”).

Step 2

Fit a propensity score model that regresses $X$ on all the covariates as predictors. For example, R code for fitting a logistic regression model with main effects only to the observed data may be as follows:
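A minimal sketch of this step follows, again using simulated stand-in data. The names `dat`, `X`, `age`, and `gender` are illustrative assumptions; `FIT_PS` follows the variable name suggested in the text.

```r
# Simulated stand-in data; all variable names are assumptions.
set.seed(2)
n <- 200
dat <- data.frame(
  age    = round(runif(n, 18, 65)),
  gender = factor(sample(c("woman", "man"), n, replace = TRUE))
)
dat$X <- rbinom(n, 1, plogis(-1 + 0.02 * dat$age))

# Main-effects logistic regression for the propensity score.
fit_ps <- glm(X ~ age + gender, family = binomial(), data = dat)

# Predicted organic propensity score for every individual.
dat$FIT_PS <- predict(fit_ps, type = "response")
summary(dat$FIT_PS)
```

Setting `type = "response"` returns predictions on the probability scale rather than the log-odds scale, which is the scale needed in the later steps.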

Then, for each individual, obtain the fitted value of $X$ (i.e., predicted organic propensity score) and record it as a new variable (e.g., “FIT_PS”).

Step 3

For a given value of $δ$ (i.e., the odds ratio encoding the change in the chances of being excluded over the status quo), calculate the estimate and 95% confidence interval (CI) of the IPSCE using the predicted outcomes and propensity score for each individual from Steps 1 and 2, respectively. To implement this in practice, we have written an R function called “PHIhatQdelta” that takes in the following inputs:

“data,” the observed data set;

“mu.hat,” the predicted outcomes from Step 1;

“pi.hat,” the predicted (organic) propensity score from Step 2;

“delta.fixed,” the given value of $δ$ ;

“treat.name,” the variable name for $X$ ;

“outcome.name,” the variable name for $Y$ .

Continuing our example, the R code for estimating the IPSCE for $δ = 1$ would be as follows:
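The authors' “PHIhatQdelta” function is available in the online repository and is not reproduced here. As a rough stand-in, the sketch below defines a hypothetical function, `ipsce_sketch`, that accepts the same six inputs and implements what we assume to be the relevant estimator: the average of the (uncentered) efficient influence function for a single time point described by Kennedy (2019), with a Wald-type 95% CI. The function body, the two-column layout assumed for `mu.hat`, and the simulated data are all our assumptions.

```r
# Hypothetical stand-in for "PHIhatQdelta"; the body is an assumption
# based on the single-time-point estimator in Kennedy (2019).
ipsce_sketch <- function(data, mu.hat, pi.hat, delta.fixed,
                         treat.name, outcome.name) {
  x <- data[[treat.name]]
  y <- data[[outcome.name]]
  mu0 <- mu.hat[, 1]  # assumed layout: column 1 = predictions under X = 0
  mu1 <- mu.hat[, 2]  # assumed layout: column 2 = predictions under X = 1
  d <- delta.fixed
  denom <- d * pi.hat + 1 - pi.hat
  # Uncentered influence function values, one per individual.
  phi <- (d * x * (y - mu1) + d * pi.hat * mu1 +
            (1 - x) * (y - mu0) + (1 - pi.hat) * mu0) / denom +
    d * (mu1 - mu0) * (x - pi.hat) / denom^2
  est <- mean(phi)
  se <- sd(phi) / sqrt(length(phi))
  c(estimate = est,
    lower = est - qnorm(0.975) * se,
    upper = est + qnorm(0.975) * se)
}

# Simulated stand-in data with one covariate.
set.seed(3)
n <- 500
dat <- data.frame(age = rnorm(n))
dat$X <- rbinom(n, 1, plogis(0.5 * dat$age))
dat$Y <- 1 + dat$X + 0.5 * dat$age + rnorm(n)

# Steps 1 and 2: predicted outcomes and predicted propensity scores.
mu_hat <- cbind(
  predict(lm(Y ~ age, data = dat, subset = X == 0), newdata = dat),
  predict(lm(Y ~ age, data = dat, subset = X == 1), newdata = dat)
)
pi_hat <- predict(glm(X ~ age, family = binomial(), data = dat),
                  type = "response")

# Step 3 for delta = 1 (no shift in the propensity score).
res <- ipsce_sketch(dat, mu_hat, pi_hat, delta.fixed = 1,
                    treat.name = "X", outcome.name = "Y")
res
```

Under this sketch, the estimate for $δ = 1$ coincides with the sample mean of $Y$, because the influence function values reduce algebraically to the observed outcomes when no shift is applied.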

Step 4

Repeat Step 3 for a user-selected sequence of values of $δ$ . The resulting estimates and 95% CIs of the IPSCE can then be plotted against $δ$ . In the next section, we present what such plots may look like and how to interpret them using data from real-world research.
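The following sketch illustrates this step on simulated data. It uses a compact stand-in estimator (`est_one`, a hypothetical helper that averages the influence function values described by Kennedy, 2019, for a single time point) rather than the authors' own function, and the grid of $δ$ values is an arbitrary choice made for illustration.

```r
# Simulated stand-in data with one covariate.
set.seed(4)
n <- 500
age <- rnorm(n)
x <- rbinom(n, 1, plogis(0.5 * age))
y <- 1 + x + 0.5 * age + rnorm(n)

# Steps 1 and 2: predicted outcomes and predicted propensity scores.
new_dat <- data.frame(age = age)
mu0 <- predict(lm(y ~ age, subset = x == 0), newdata = new_dat)
mu1 <- predict(lm(y ~ age, subset = x == 1), newdata = new_dat)
ps <- fitted(glm(x ~ age, family = binomial()))

# Compact stand-in estimator (an assumption, not the authors' function).
est_one <- function(d) {
  denom <- d * ps + 1 - ps
  phi <- (d * x * (y - mu1) + d * ps * mu1 +
            (1 - x) * (y - mu0) + (1 - ps) * mu0) / denom +
    d * (mu1 - mu0) * (x - ps) / denom^2
  se <- sd(phi) / sqrt(n)
  c(estimate = mean(phi),
    lower = mean(phi) - qnorm(0.975) * se,
    upper = mean(phi) + qnorm(0.975) * se)
}

# Step 4: repeat Step 3 over a user-selected grid of delta values.
log_delta <- seq(-2, 2, by = 0.5)
res <- t(sapply(exp(log_delta), est_one))  # one row per value of delta

# Point estimates with vertical 95% CI bars, plotted against log(delta).
plot(log_delta, res[, "estimate"], ylim = range(res),
     xlab = "log(delta)", ylab = "Estimated IPSCE")
segments(log_delta, res[, "lower"], log_delta, res[, "upper"])
```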

Illustrations of IPS Using Real-World Data

In this section, we used two publicly available real-world data sets to illustrate the IPS.

Misperception of equality-enhancing policy hinders equality

Why do people oppose equality-enhancing policies? Brown et al. (2022) proposed misperceptions of equality-enhancing policies as a potential explanation. To examine the effect of misperceptions on policy opposition, the authors used a sample of White and Asian registered voters in the November 2020 California general election. Participants completed surveys regarding California Proposition 16 (which proposed removing the ban on affirmative action in public employment and public university admissions decisions) at Time 1 (between October 12 and 19, 2020) and Time 2 (between October 27 and November 2, 2020). Further details on the study are provided in Brown et al.⁶

For the sole purpose of illustration, we analyzed whether misperceiving Proposition 16 as harmful (i.e., harms advantaged in-group resource access) at Time 1 $(X)$ predicted opposition to the policy at Time 2 $(Y)$ . Misperception was measured using one item: “How do you think Proposition 16 will affect non-underrepresented people’s (e.g., non-Hispanic White, Asian/Asian American) chances of gaining placement in public employment, public education, and public contracting positions in CA?” Of the sample, 41.7% responded that the policy would harm their chances $(X = 1)$ , and the rest reported that chances would be maintained or improved $(X = 0)$ . The policy support outcome $(Y)$ was the response to a single item (“Overall, how much do you oppose or support Proposition 16?”) on a 7-point Likert scale (1 = strongly oppose, 7 = strongly support), which we reverse-coded so that larger values indicated stronger opposition.

To simplify the illustration, we assumed that the following self-reported and demographic variables sufficed for no unmeasured confounding to hold: age, gender, race, explicit prejudice, social dominance orientation, system-justifying beliefs, zero-sum beliefs, and overall political orientation. We restricted the analysis to the 645 participants with complete data on these variables.

As we explained in the previous sections, the ACE demands conjuring up extreme, unrealistic scenarios in which everyone uniformly believes that the policy is harmful versus not harmful. Instead, the IPSCE allows us to answer a more meaningful causal question: What is the effect of hypothetically reducing people’s chances of adopting this misperception?

Another motivation for using the IPSCE is when positivity is violated. One way to check for this empirically is by plotting the predicted (organic) propensity scores for each group $X = 0$ and $X = 1$ , such as in Figure 2. Note how there were participants in the $X = 0$ and $X = 1$ groups with propensity scores very close to 0 and 1, respectively, and the interquartile ranges for the two groups had limited overlap. These suggested possible violations of positivity that further justified the use of IPS.
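A plot of this kind can be produced along the following lines. The sketch uses simulated data with two hypothetical covariates (`c1`, `c2`) rather than the Brown et al. (2022) data, so it shows only the mechanics of the check.

```r
# Simulated stand-in data; covariate names are assumptions.
set.seed(5)
n <- 300
dat <- data.frame(c1 = rnorm(n), c2 = rnorm(n))
dat$X <- rbinom(n, 1, plogis(2 * dat$c1 - dat$c2))

# Predicted (organic) propensity scores from a main-effects logistic model.
dat$FIT_PS <- fitted(glm(X ~ c1 + c2, family = binomial(), data = dat))

# Side-by-side boxplots of the predicted propensity scores by exposure group.
boxplot(FIT_PS ~ X, data = dat, ylim = c(0, 1),
        xlab = "Exposure group (X)", ylab = "Predicted propensity score")

# Numerical companion: range and quartiles within each group.
ps_summary <- tapply(dat$FIT_PS, dat$X, quantile,
                     probs = c(0, 0.25, 0.5, 0.75, 1))
ps_summary
```

Predicted scores that pile up near 0 in the unexposed group or near 1 in the exposed group, together with little overlap between the groups, would point to possible positivity problems.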

Fig. 2.

Estimates of the organic propensity score for each treatment group in the equality-enhancing policy example.

For illustration, we postulated a set of values for $δ$ as $Δ = \{e^{-4}, e^{-3.8}, \dots, e^{0} = 1, \dots, e^{4}\}$. We considered values based on the (natural) logarithmic or “log” transformation $\log (δ)$ because it simplified positing an equally spaced grid symmetric around $δ = 1$, where $\log (δ) = 0$, and eased visually interpreting the results, as we show next. After calculating the estimates and 95% CIs for each posited value of $δ \in Δ$, we plotted them against $\log (δ)$, as shown in Figure 3.
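The grid can be constructed in one line of R, as sketched here. The sketch also translates a few values of $δ$ into shifted probabilities for a hypothetical individual with an organic propensity score of 0.4; that value and the helper name `shift_ps` are illustrative assumptions.

```r
# Equally spaced grid on the log scale, from -4 to 4 in steps of 0.2.
log_delta <- seq(-4, 4, by = 0.2)
delta_grid <- exp(log_delta)
length(delta_grid)   # 41 values, with delta = 1 in the middle

# Equation 4: shifted probability for a given organic propensity score.
shift_ps <- function(pi_c, delta) delta * pi_c / (1 + (delta - 1) * pi_c)

# Hypothetical individual with an organic propensity score of 0.4:
# shifted probabilities at the smallest delta, delta = 1, and delta = exp(2.4).
round(shift_ps(0.4, exp(c(-4, 0, 2.4))), 3)
```

Translating odds ratios back to probabilities in this way can make large or small values of $δ$ easier to interpret for a given starting probability.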

Fig. 3.

Estimates of the incremental propensity score causal estimand (IPSCE) in the equality-enhancing policy example. Each circle corresponded to an IPSCE estimate, and each vertical line corresponded to the 95% confidence interval (CI) for a given value of $δ$ on the horizontal axis. The horizontal broken lines corresponded to the bounds of the 95% CI for the IPSCE under $δ = 1$ . Values of $δ$ where the CIs did not intersect the horizontal broken lines are indicated in red.

We now interpret the results. As expected, the estimate equaled 0 (the sample mean after mean-centering the outcomes) when $δ = 1$ . Increasing or decreasing $δ$ led to estimated average outcomes that differed from the observed value. This suggested that as participants’ misperceptions of Proposition 16 shifted, their opposition to the policy would correspondingly change. For instance, the 95% CIs for specific values of $\log (δ) \geq 2.4$ did not intersect the 95% CI for $δ = 1$ (whose bounds were drawn as two horizontal broken lines). Therefore, raising the odds of adopting the misperception by $δ = \exp (2.4) \approx 11$ times the organic odds would significantly increase opposition to the policy. Conversely, to significantly reduce opposition to the policy, one would have to make (more) substantial effort to lower the odds of adopting the misperception to $δ = \exp (- 4) \approx 1 / 55$ of the organic odds. These results demonstrate that voters’ misperceptions drove their opposition to this equality-enhancing policy. To increase public support of equality-enhancing policies, intensive and enduring interventions may be required to refute misperceptions about equality as zero-sum.

Impact of supervisor support among women and men in the workplace

Does supervisor support represent a protective factor for workers’ well-being? How does the effect differ between women and men? We used a data set collected by McIlroy et al. (2021) to explore these questions. McIlroy et al. conducted an online survey to evaluate the effect of supervisor support on employees’ well-being (along with other performance and relational outcomes). A sample of workers in the UK recalled whether they received support from their supervisor after requesting it in the past month. Participants then answered questions about their attitudes, feelings, and behaviors following the situation they described. For the sole purpose of illustration, we assessed whether the absence of supervisor support affected workers’ well-being.⁷

Supervisor support $(X)$ was measured using one item. Participants indicated whether they received their supervisor’s help after asking for it $(X = 1)$ or not $(X = 0)$ . The well-being outcome was the reversed mean score of five items measuring emotional exhaustion (e.g., “I felt emotionally drained”) on a 7-point scale (1 = strongly disagree, 7 = strongly agree) so that larger values indicated better well-being. To simplify our illustration, we assumed that these self-reported variables sufficed for no unmeasured confounding to hold: age, gender, hours working per week on average, years in current job, and being in a management or supervisor position. We considered the 265 participants with complete data recorded on these variables for the analysis. We followed the same steps, as outlined in the first example, using the entire sample to estimate the IPSCE. As shown in Figure 4, changing the chances of supervisor support $(δ)$ led to significant changes in workers’ well-being.

Fig. 4.

Estimates of the incremental propensity score causal estimand (IPSCE) in the supervisor support example. Each circle corresponded to an IPSCE estimate, and each vertical line corresponded to the 95% confidence interval (CI) for a given value of $δ$ on the horizontal axis. The horizontal broken lines corresponded to the bounds of the 95% CI for the IPSCE under $δ = 1$ . Values of $δ$ where the CIs did not intersect the horizontal broken lines are indicated in red.

Next, we investigated gender differences. We first plotted the organic propensity scores for each gender subgroup⁸ in Figure 5. We can see that women workers were less likely to receive supervisor support even after conditioning on the other covariates. We then estimated the conditional IPSCE for each gender subgroup (men or women). Details on calculating these estimators are provided in Appendix C.

Fig. 5.

Estimates of the organic propensity score for each gender in the supervisor-support example.

The resulting estimates and 95% CIs were then plotted against the values in $Δ$, as shown in Figure 6. As the plot showed, women had lower average well-being than men, as indicated by the lower outcomes for women given each posited value of $δ$. More important, the effect of supervisor support on workers’ well-being differed between women and men. Whereas reducing the chances of receiving supervisor support would lead to poorer well-being for both women and men, increasing the odds of receiving supervisor support to at least $δ = \exp (2) \approx 7$ times the organic odds would significantly improve well-being for women only. However, note that the resulting improved well-being among women would be only at a level similar to male workers’ observed well-being under the status quo (i.e., under $δ = 1$). These results showed that improving supervisor support for women holds the promise of addressing gender disparities in the workplace. Therefore, to reduce gender disparities at work, organizations and policymakers should consider developing training programs and instituting policies for supervisors to better support women employees.

Fig. 6.

Estimates of the incremental propensity score causal estimand (IPSCE) for each gender subgroup in the supervisor support example. Each circle corresponded to an IPSCE estimate, and each vertical line corresponded to the 95% confidence interval (CI) for a given value of $δ$ on the horizontal axis. The horizontal broken lines corresponded to the bounds of the 95% CI for the IPSCE under $δ = 1$ . Values of $δ$ where the CIs did not intersect the horizontal broken lines are indicated in red.

Discussion

In this article, we introduced the IPS as a novel approach to answer core causal questions in diversity science. A central strength of the IPS for diversity science is that it offers a more realistic and policy-relevant quantification of the causal effect than a single ACE. Using the IPS, researchers can readily investigate a wide range of causal questions in diversity science, such as quantifying the impact of adverse exposures that are often experienced by minority groups (e.g., racism, sexism, bullying, microaggression, harassment, dehumanization), examining the effects of diversity-related perceptions or misperceptions, understanding what hinders the public’s support for equality-enhancing policies, and identifying possible exposures (e.g., access to supervisor support among women employees) that hold the promise of reducing inequalities, among others. Note that these exposures or factors are all conceptually manipulable causes. The effects of nonmanipulable causes, such as gender and race, are pertinent and essential for diversity science. For example, what is the gender difference in income? How does a person’s race affect the person’s chances of accessing health care? These are important research questions in diversity science but require different conceptual (see e.g., VanderWeele & Hernán, 2012) and methodological (see e.g., Loh & Ren, 2023b) considerations beyond the context of this article.

The IPS is not intended to inform the specific designs of putative interventions or programs.⁹ Continuing our example of gender disparities at work, various strategies may increase women’s chances of receiving supervisor support. For example, organizations may consider offering training sessions for managers on best supporting women employees, creating tools and resources for women to be heard and their needs addressed, and formalizing mentoring programs in which women may seek advice and support. However, the design of such interventions or programs, such as the content of the materials or length of the sessions, is outside of what IPS offers. Researchers should rely on their domain expertise and subject-matter knowledge in interpreting and gauging the feasibility of putative interventions for changing the (organic) propensity score.

Although the focus of this article is on the IPS, three general issues that are not unique to the IPS require careful attention from researchers seeking to draw causal conclusions. First, as with all causal inferences about nonrandomized treatments, the assumption of no unmeasured confounding (formally stated as Equation C1 in Appendix C) is a prerequisite for the IPSCE to be consistently estimated. This assumption cannot be verified from data alone; it must be grounded in and justified by theoretical knowledge and subject-matter expertise (Hernán & Robins, 2020; Steiner et al., 2010; VanderWeele, 2019). We reiterate two suggestions from the literature. Researchers should strive to record and adjust for all predictors of the outcomes of interest (VanderWeele, 2019). Adjusting for covariates that are strongly associated with the outcome can improve the finite-sample precision of the estimators, even when those covariates are only weakly associated (or unassociated) with the treatment and thus contribute little to confounding adjustment (Brookhart et al., 2006; Loh & Ren, 2023a). In contrast, covariates that are associated solely with the treatment, and not with the outcome, are redundant for confounding adjustment; adjusting for them yields inefficient estimators that are prone to finite-sample bias simply because of sampling variability (Brookhart et al., 2006; Kelcey, 2011).

Second, we assumed that all covariates in $C$ were accurately measured. However, adjusting for imprecisely measured covariates can produce biased estimates of causal effects (Sengewald et al., 2019). Developing methods that account for measurement error in error-prone predictors in the outcome and propensity score models for the IPS is an avenue for future work.

Finally, model misspecification may arise when modeling the propensity score or outcome. In the current presentation, we used parametric regression models, an approach familiar to psychologists and consistent with prevalent practices in the field. However, these models may have been incorrectly specified, which can induce biases (Naimi, Mishler, & Kennedy, 2021). To avoid the risks of such biases when modeling the propensity score or outcome, researchers may consider using (supervised) machine or statistical learning prediction algorithms, such as generalized additive models (Hastie et al., 2009), least absolute shrinkage and selection operator (LASSO; Tibshirani, 2011), or tree-based algorithms (e.g., random forests; Breiman, 2001), among others. Such flexible, nonparametric machine learning–based estimators of the IPS causal effects are readily implemented using the ipsi function in the excellent R package npcausal (Kennedy, 2021). We refer interested readers to Kennedy (2019) for the underlying statistical theory that ensures unbiased estimation and valid CIs while using such flexible methods.
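For orientation, the role of the propensity score model can be seen in a simple weighting form of the estimator. The display below is a sketch in notation assumed here rather than taken from this section: $A$ denotes the binary exposure, $Y$ the outcome, $\pi(C) = \Pr(A = 1 \mid C)$ the observed propensity score, $\hat\pi$ its fitted counterpart, and $\delta > 0$ the user-chosen factor by which each individual's odds of exposure are multiplied, following the single-time-point formulation in Kennedy (2019).

$$
q_\delta(C) = \frac{\delta\,\pi(C)}{\delta\,\pi(C) + 1 - \pi(C)},
\qquad
\hat\psi_{\mathrm{IPW}}(\delta) = \frac{1}{n}\sum_{i=1}^{n} \frac{(\delta A_i + 1 - A_i)\,Y_i}{\delta\,\hat\pi(C_i) + 1 - \hat\pi(C_i)}.
$$

Under this formulation, the weight attached to each $Y_i$ is at most $\max(\delta, 1/\delta)$ whatever the value of $\hat\pi(C_i)$, which is one way to see why propensity scores near 0 or 1 do not destabilize the estimator, and setting $\delta = 1$ recovers the sample mean of $Y$. A misspecified $\hat\pi$ would still bias this simple weighting estimator, which is what motivates the flexible estimators cited above.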

Conclusion

In conclusion, the IPS is an appealing alternative to the ACE for addressing core causal questions in diversity science. The IPS accounts for nonuniform exposures to causal factors, enabling more realistic and policy-relevant interpretations. Moreover, the IPS does not require the stringent positivity assumption, which is necessary for estimating the ACE but is likely to be violated in this context. We hope this nontechnical introduction to the IPS empowers researchers to engage in more meaningful assessments and methodical investigations of the core causal questions in diversity science.

Acknowledgements

W. W. Loh thanks Thomas S. Richardson of the University of Washington for the inspiration behind the plot in and B1.

Transparency

Action Editor: Yasemin Kisbu-Sakarya

Editor: David A. Sbarra

Author Contributions

Both authors contributed equally to this article and share joint first authorship; they are listed alphabetically by last name.

Wen Wei Loh: Conceptualization; Methodology; Software; Visualization; Writing – original draft; Writing – review & editing.

Dongning Ren: Conceptualization; Writing – original draft; Writing – review & editing.

ORCID iDs

Wen Wei Loh

Dongning Ren

References

Avery, D. R. (2011). Support for diversity in organizations: A theoretical exploration of its origins and offshoots. Organizational Psychology Review, 1(3), 239–256. https://doi.org/10.1177/2041386611402115

Bhattacharyya, & Berdahl, J. L. (2023). Do you see me? An inductive examination of differences between women of color’s experiences of and responses to invisibility at work. Journal of Applied Psychology, 108(7), 1073–1095. https://doi.org/10.1037/apl0001072

Boardman, Killaspy, & Mezey (2022). Social inclusion and mental health: Understanding poverty, inequality and social exclusion (2nd ed.). Cambridge University Press. https://doi.org/10.1017/9781911623601

Bonvini, McClean, Branson, & Kennedy, E. H. (2023). Incremental causal effects: An introduction and review. In J. R. Zubizarreta, E. A. Stuart, D. S. Small, & P. R. Rosenbaum (Eds.), Handbook of matching and weighting adjustments for causal inference (pp. 349–372). Chapman & Hall/CRC. https://doi.org/10.1201/9781003102670

Breiman (2001). Random forests. Machine Learning, 45(1), 5–32.

Brookhart, M. A., Schneeweiss, Rothman, K. J., Glynn, R. J., Avorn, & Stürmer (2006). Variable selection for propensity score models. American Journal of Epidemiology, 163(12), 1149–1156. https://doi.org/10.1093/aje/kwj149

Brown, N. D., Jacoby-Senghor, D. S., & Raymundo (2022). If you rise, I fall: Equality is prevented by the misperception that it harms advantaged groups. Science Advances, 8(18), Article eabm2385. https://doi.org/10.1126/sciadv.abm2385

Caldwell, J. T., Ford, C. L., Wallace, S. P., Wang, M. C., & Takahashi, L. M. (2016). Intersection of living in a rural versus urban area and race/ethnicity in explaining access to health care in the United States. American Journal of Public Health, 106(8), 1463–1469. https://doi.org/10.2105/AJPH.2016.303212

Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187–199. https://doi.org/10.1093/biomet/asn055

Devine, P. G., & Ash, T. L. (2022). Diversity training goals, limitations, and promise: A review of the multidisciplinary literature. Annual Review of Psychology, 73(1), 403–429. https://doi.org/10.1146/annurev-psych-060221-122215

Hastie, Tibshirani, & Friedman (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer Science & Business Media.

Hebl, Cheng, S. K., L. C. (2020). Modern discrimination in organizations. Annual Review of Organizational Psychology and Organizational Behavior, 7(1), 257–282. https://doi.org/10.1146/annurev-orgpsych-012119-044948

Hernán, M. A., & Robins, J. M. (2020). Causal inference: What if. Chapman & Hall/CRC.

Imbens, G. W., & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.

Jacobs, L. A., McClean, Branson, Kennedy, E. H., & Fixler (2023). Incremental propensity score effects for criminology: An application assessing the relationship between homelessness, behavioral health problems, and recidivism. arXiv. https://doi.org/10.48550/arXiv.2305.14040

Juvonen, Lessard, L. M., Rastogi, Schacter, H. L., & Smith, D. S. (2019). Promoting social inclusion in educational settings: Challenges and opportunities. Educational Psychologist, 54(4), 250–270. https://doi.org/10.1080/00461520.2019.1655645

Kelcey (2011). Covariate selection in propensity scores using outcome proxies. Multivariate Behavioral Research, 46(3), 453–476. https://doi.org/10.1080/00273171.2011.570164

Kennedy, E. H. (2019). Nonparametric causal effects based on incremental propensity score interventions. Journal of the American Statistical Association, 114(526), 645–656. https://doi.org/10.1080/01621459.2017.1422737

Kennedy, E. H. (2021). npcausal. https://github.com/ehkennedy/npcausal

Kilpatrick, R. D., Gilbertson, Brookhart, M. A., Polley, Rothman, K. J., & Bradbury, B. D. (2013). Exploring large weight deletion and the ability to balance confounders when using inverse probability of treatment weighting in the presence of rare treatment decisions. Pharmacoepidemiology and Drug Safety, 22(2), 111–121. https://doi.org/10.1002/pds.3297

Kurzban, & Leary, M. R. (2001). Evolutionary origins of stigmatization: The functions of social exclusion. Psychological Bulletin, 127(2), 187–208. https://doi.org/10.1037/0033-2909.127.2.187

Loh, W. W., & Ren, D. (2023a). Data-driven covariate selection for confounding adjustment by focusing on the stability of the effect estimator. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000564

Loh, W. W., & Ren, D. (2023b). Understated gender disparities due to outcome-dependent selection: Comment on Mackelprang et al. (2023). American Psychologist, 78(6), 811–813. https://doi.org/10.1037/amp0001167

McIlroy, T. D., Parker, S. L., & McKimmie, B. M. (2021). The effects of unanswered supervisor support on employees’ well-being, performance, and relational outcomes. Journal of Occupational Health Psychology, 26(1), 49–68. https://doi.org/10.1037/ocp0000270

Morrish, & Medina-Lara (2021). Does unemployment lead to greater levels of loneliness? A systematic review. Social Science & Medicine, 287, Article 114339. https://doi.org/10.1016/j.socscimed.2021.114339

Moss-Racusin, C. A. (2021). Psychology of gender: Addressing misconceptions and setting goals for the field. American Psychologist, 76(9), 1429–1441. https://doi.org/10.1037/amp0000930

Naimi, A. I., Mishler, A. E., & Kennedy, E. H. (2021). Challenges in obtaining valid causal effect estimates with machine learning algorithms. American Journal of Epidemiology, 192(9), 1536–1544. https://doi.org/10.1093/aje/kwab201

Naimi, A. I., Rudolph, J. E., Kennedy, E. H., Cartus, Kirkpatrick, S. I., Haas, D. M., Simhan, & Bodnar, L. M. (2021). Incremental propensity score effects for time-fixed exposures. Epidemiology, 32(2), 202–208. https://doi.org/10.1097/EDE.0000000000001315

National Academies of Sciences, Engineering, and Medicine. (2023). Advancing antiracism, diversity, equity, and inclusion in STEMM organizations: Beyond broadening participation (G. A. Barabino, S. T. Fiske, L. A. Scherer, & E. A. Vargas, Eds.). The National Academies Press. https://doi.org/10.17226/26803

O’Shea, T. M., McGrath, Aschner, J. L., Lester, Santos, H. P., Marsit, Stroustrup, Emmanuel, Hudak, McGowan, Patel, Fry, R. C., Smith, P. B., Newby, K. L., Jacobson, L. P., Parker, C. B., & on behalf of program collaborators for Environmental influences on Child Health Outcomes. (2023). Environmental influences on child health outcomes: Cohorts of individuals born very preterm. Pediatric Research, 93(5), 1161–1176. https://doi.org/10.1038/s41390-022-02230-5

Paluck, E. L., Porat, Clark, C. S., & Green, D. P. (2021). Prejudice reduction: Progress and challenges. Annual Review of Psychology, 72(1), 533–560. https://doi.org/10.1146/annurev-psych-071620-030619

Pearl (2010). On the consistency rule in causal inference: Axiom, definition, assumption, or theorem? Epidemiology, 21(6), 872–875.

Petersen, M. L., Porter, K. E., Gruber, Wang, & Van Der Laan, M. J. (2012). Diagnosing and responding to violations in the positivity assumption. Statistical Methods in Medical Research, 21(1), 31–54.

R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350

Rubin, D. B. (1986). Comment: Which ifs have causal answers. Journal of the American Statistical Association, 81(396), 961–962.

Rubin, D. B. (1990). [On the application of probability theory to agricultural experiments. Essay on principles. Section 9.] Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5(4), 472–480. https://doi.org/10.1214/ss/1177012032

Sengewald, M.-A., Steiner, P. M., & Pohl (2019). When does measurement error in covariates impact causal effect estimates? Analytic derivations of different scenarios and an empirical illustration. British Journal of Mathematical and Statistical Psychology, 72(2), 244–270. https://doi.org/10.1111/bmsp.12146

Skinner-Dorkenoo, A. L., George, Wages, J. E., Sánchez, & Perry, S. P. (2023). A systemic approach to the psychology of racial bias within individuals and society. Nature Reviews Psychology, 2(7), 392–406. https://doi.org/10.1038/s44159-023-00190-z

Splawa-Neyman, Dabrowska, D. M., & Speed, T. P. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4), 465–472. https://doi.org/10.1214/ss/1177012031

Steiner, P. M., Cook, T. D., Shadish, W. R., & Clark, M. H. (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15(3), 250–267. https://doi.org/10.1037/a0018719

Tibshirani (2011). Regression shrinkage and selection via the LASSO: A retrospective. Journal of the Royal Statistical Society Series B: Statistical Methodology, 73(3), 273–282. https://doi.org/10.1111/j.1467-9868.2011.00771.x

Tsiatis, A. A. (2007). Semiparametric theory and missing data. Springer. https://doi.org/10.1007/0-387-37345-4

Valantine, H. A., & Collins, F. S. (2015). National Institutes of Health addresses the science of diversity. Proceedings of the National Academy of Sciences, USA, 112(40), 12240–12242. https://doi.org/10.1073/pnas.1515612112

VanderWeele, T. J. (2009). Concerning the consistency assumption in causal inference. Epidemiology, 20(6), 880–883. https://doi.org/10.1097/EDE.0b013e3181bd5638

VanderWeele, T. J. (2019). Principles of confounder selection. European Journal of Epidemiology, 34(3), 211–219. https://doi.org/10.1007/s10654-019-00494-6

VanderWeele, T. J., & Hernán, M. A. (2012). Causal effects and natural laws: Towards a conceptualization of causal counterfactuals for nonmanipulable exposures, with application to the effects of race and sex. In Berzuini, Dawid, & Bernardinelli (Eds.), Causality: Statistical perspectives and applications (pp. 101–113). John Wiley & Sons. https://doi.org/10.1002/9781119945710.ch9

West, S. G., Cham, Thoemmes, Renneberg, Schulze, & Weiler (2014). Propensity scores as a basis for equating groups: Basic principles and application in clinical treatment outcome research. Journal of Consulting and Clinical Psychology, 82(5), 906–919. https://doi.org/10.1037/a0036387

Westreich, & Cole, S. R. (2010). Invited commentary: Positivity in practice. American Journal of Epidemiology, 171(6), 674–677. https://doi.org/10.1093/aje/kwp436
