Sage Journals: Discover world-class research

Abstract

Obtaining an accurate understanding of group-based disparities is an important pursuit. However, unsound study designs can lead to erroneous conclusions that impede this crucial work. In this article, we highlight a critical methodological challenge to drawing valid causal inferences in disparities research: selection bias. We describe two commonly adopted study designs in the literature on group-based disparities. The first is outcome-dependent selection, when the outcome determines whether an observation is selected. The second is outcome-associated selection, when the outcome is associated with whether an observation is selected. We explain the methodological challenge each study design presents and why it can lead to selection biases when evaluating the actual disparity of interest. We urge researchers to recognize the complications that beset these study designs and to avoid the insidious impact of inappropriate selection. We offer practical suggestions on how researchers can improve the rigor and demonstrate the defensibility of their conclusions when investigating group-based disparities. Finally, we highlight the broad implications of selection mechanisms for psychological science.

Keywords

discrimination disparities or inequalities diversity science outcome-dependent sampling selection bias sensitivity analysis

An accurate understanding of what drives inequality is an important first step toward eliminating inequality. Research from diverse disciplines has frequently shown that dimensions such as gender, race, ethnicity, and socioeconomic status are fundamental causes of inequality (Amis et al., 2021; C. C. Miller & Katz, 2024; Phelan & Link, 2015; Pickett & Wilkinson, 2015; Sbarra & Whisman, 2022). For example, scholars have sought to understand the sources of inequalities in academia by examining gender gaps in receiving recognition (Card et al., 2023), racial disparities in funding rates (Erosheva et al., 2020), and the impact of socioeconomic background on academic job placement and performance (Morgan et al., 2022). Such group-based disparities are well documented in various contexts across social, economic, and health domains. The invaluable knowledge accumulated in these scholarly works informs diversity policies and practices and is paramount in pursuing a more equitable society.

Despite the broad interest in and importance of group-based disparities research, there is a lack of methodological discussions on accurately estimating the effect of interest. In this article, we elucidate an often neglected methodological challenge of group-based disparities research by focusing on how inappropriate selection can lead to selection bias (Hernán et al., 2004). Motivated by the preceding examples, we situate our discussion of selection bias in the context of causal inquiries seeking to shed light on disparate outcomes because of differing group memberships (e.g., What is the gender gap in publishing?). Descriptive research, such as trend studies (e.g., What is the share of women among published authors over time?) or studies concerning a single group’s representation (e.g., Are women underrepresented among published authors?), is valuable, but it is outside the scope of this article.

The remainder of this article is organized into two parts. In the first part, we present two prevalent study designs used in the existing literature to answer questions about group-based disparities: “outcome-dependent” selection and “outcome-associated” selection. Leveraging the causal inference framework, we demonstrate that notwithstanding both designs being intuitive, they involve inappropriate selection of a population subset that can induce systematic (i.e., nonrandom) differences between the true disparity of interest and the empirical quantity among the subset selected. This distortion is termed “selection bias” in the epidemiological literature (Arah, 2019; Hernán et al., 2004; Hernán & Monge, 2023). Although selection bias has been well recognized in other fields, such as medicine (Hernán & Monge, 2023), public health (Smith, 2020), and sociology (Elwert & Winship, 2014), it has yet to receive much attention in psychology.

Within each study design, we organize our presentation following a “causal roadmap” (Ahern, 2018; Hernán, 2018). First, we clarify the causal effect of interest when investigating group-based disparities research. Second, we explain how each study design implies certain causal conditions by drawing on concepts from the established causal diagram framework (M. M. Glymour, 2006; J. J. Lee, 2012; Pearl et al., 2016). Causal diagrams facilitate visualizing nonparametric structural relationships between each pair of variables (Grosz et al., 2020; Pearl, 2012). Throughout this article, we assume that the causal structures depicted in the posited causal diagrams, such as the correct set of variables, the presence and ordering of directed edges, and the absence of cyclic (e.g., bidirectional) relationships, can be defensibly justified using theoretical and empirical evidence and temporal-logical constraints (Grosz et al., 2020).¹ Third, we explain the shortcomings of each design and why they are fraught with selection bias that can lead to erroneous or even entirely misleading conclusions. Throughout the article, we use a hypothetical example of a gender gap in academic recognition for expository purposes. We emphasize that the causal inference reasonings about the challenges and implications of inappropriate selection broadly apply to group-based disparities research (e.g., disparities because of race or socioeconomic status). Crucially, as we discuss toward the end of the article, selection bias is prevalent and has wide-ranging implications in psychological science beyond disparities research.

In the second part, to help researchers best address these challenges, we offer concrete suggestions that can be readily implemented in practice to circumvent these common pitfalls and strengthen the defensibility of their causal inferences. The R (R Core Team, 2023) scripts to reproduce the sensitivity analysis and simulation studies in this article are available online on OSF: https://osf.io/n6tuc/. We hope this article will equip researchers to recognize and mitigate the impact of selection bias in group-based disparities research and psychological science at large.

Running Example: Part 1

We first introduce our running hypothetical example. Suppose there is a professional organization representing psychologists—individuals who have obtained a doctoral degree in psychology—in a nation. The organization gives a prestigious leadership award to individuals who have significantly contributed to the community. The newly elected president is interested in finding out what the gender gap is between women and men in receiving this award. The president invited interested scholars in the nation to investigate this question, and two teams responded and carried out their research separately.

The first team, led by Dr. A, compiled a list of psychologists who received the award since its inception. The team inspected the list of award recipients and found that 50% of the recipients identified as women and that 45% of the recipients identified as men (5% identified as other gender identities). Dr. A’s team concluded that women were more likely to receive the award than men: Women’s chances were 1.11 times those of men (where $50 % / 45 % \approx 1.11$ ).

The second team, led by Dr. B, also compiled the same list of award recipients as their study sample. However, they adopted a different analytic strategy. They first calculated the share of women among award recipients (50%). They then compared this share of women among award recipients to a “base rate,” the proportion of women in the population of interest (i.e., all psychologists eligible for the award), which was 60%. The comparison ( $50 % < 60 %$ ) showed that women were underrepresented among award recipients. The team further calculated the ratio of these two proportions ( $50 % / 60 % \approx 0.83$ ) and reported it as a measure of gender disparity.

Which team was correct? Neither. We explain why in the next section.

Outcome-Dependent Selection

The two teams adopted different analytic strategies, but both teams selected only award recipients as their study sample. Such a design offers limited (or possibly even no useful) information on the true gender disparity of interest. The core underlying problem is outcome-dependent selection (also termed “outcome-dependent sampling”): Observations are selected based solely on their outcome. Put differently, the outcome of interest (receiving the award) determines the selected subset of the population.

We use Figure 1 to provide a conceptual understanding of the problem. The left side depicts all eligible candidates for the award since its inception (i.e., the population of interest). The right side depicts all award recipients from this population. As Figure 1 illustrates, the chances of receiving the award among women were $100 / 900 \approx 11 %$ , and the chances among men were $90 / 360 = 25 %$ . Thus, the true gender disparity for women versus men (as a ratio) was $0.44$ (where $11 % / 25 % \approx 0.44$ ). In other words, women’s chances of receiving the award were 0.44 times those of men.

Fig. 1.

Dr. A’s and Dr. B’s study designs delimited by outcome-dependent selection. Reflecting the traditional focus of gender-gap research, the organization’s president, in our running hypothetical example, was interested in the gender gap between women and men. Hence, solely for illustrative purposes, individuals of other gender identities are not visualized in this figure.

It is straightforward to see that neither team accurately uncovered the true disparity. Despite women being less likely to receive the award than men, Dr. A’s team concluded the opposite. Although Dr. B’s team correctly reported that women were underrepresented among award recipients, they underestimated the true gender disparity (for a brief commentary, see Loh & Ren, 2023). This numerical example illustrates the perils of outcome-dependent selection: It can understate the true disparity at best or lead to opposite conclusions at worst. A formal mathematical derivation of the numerical results presented here is provided in Section A of the Supplemental Material available online.

We now formalize the methodological issue with the help of the causal diagram in Figure 2. Let the focal predictor of interest (referred to hereafter as “exposure”), for example, gender, be denoted by $X$ ; the outcome, for example, receiving the award, by $Y$ ; and the inclusion or selection criteria by $S$ .² Therefore, the gender disparity of interest can be defined as the causal effect of $X$ on $Y$ as encoded by the arrow from $X$ to $Y$ (i.e., causal estimand), that is, the average effect in the entire population of interest. The dependence of selection $S$ on the outcome $Y$ , and not on $X$ directly, is represented by the directed arrow from $Y$ to $S$ and the absence of an arrow from $X$ to $S$ , respectively. We draw a box around $S$ to emphasize that the analysis is restricted to selected population subsets with specific values of $S$ , such as $S = 1$ (i.e., award recipients). Under such a causal structure, as well documented in the epidemiological, medical, and statistical literature (e.g., Didelez et al., 2010; Lu et al., 2022; Sjölander, 2023), the effect of $X$ on $Y$ (e.g., the gender gap in receiving the award) cannot be unbiasedly estimated even with an infinitely large sample.³

Fig. 2.

Causal diagram depicting outcome-dependent selection in the hypothetical example. The dashed arrow between $Y$ and $S$ emphasizes a deterministic dependence of $S$ on $Y$ . Reflecting the traditional focus of gender-gap research, the organization’s president, in our running hypothetical example, was interested in the gender gap between women and men. Hence, solely for illustrative purposes, individuals of other gender identities are not visualized in this figure.

We further illustrate the problem using the language of probabilities. Continuing our running example, let $X = 1$ denote the psychologist’s gender being a woman, and let $X = 0$ denote being a man; let $Y = 1$ denote receiving the award, and let $Y = 0$ denote not receiving the award; let $S = 1$ denote being selected into the study sample, and let $S = 0$ denote not being selected. The true disparity of interest is the contrast of the chances of receiving the award $(Y = 1)$ between women $(X = 1)$ and men $(X = 0)$ . These two probabilities, written as $Prob (Y = 1 | X = 1)$ and $Prob (Y = 1 | X = 0)$ , respectively, can be contrasted using a ratio, that is, $Prob (Y = 1 | X = 1) / Prob (Y = 1 | X = 0)$ .

Now, we turn to the descriptive quantities calculated by each team. Dr. A’s team quantified the gender disparity using a ratio of two proportions: the proportion of women among award recipients and the proportion of men among award recipients. This ratio can be written as $Prob (X = 1 | Y = 1) / Prob (X = 0 | Y = 1)$ . Dr. B’s team quantified the gender disparity by comparing the proportion of women among award recipients, $Prob (X = 1 | Y = 1)$ , with a corresponding value in the population, $Prob (X = 1)$ . This comparison as a ratio is $Prob (X = 1 | Y = 1) / Prob (X = 1)$ . Neither descriptive quantity equals the true disparity of interest. Hence, unsurprisingly, neither team uncovered the true gender disparity.

Could either team have improved their analytic approach (e.g., by performing a secondary data analysis of the award recipients) to get closer to the true gender disparity of $0.44$ as illustrated in Figure 1? Unfortunately, no. We formally demonstrate why in Section A of the Supplemental Material. The crux of the problem lies not in their analytic approach but in their conceptually flawed study design. When an observation is selected $(S = 1)$ if and only if a person received an award $(Y = 1)$ , it implies that nonrecipients have zero chances of being selected, that is, $Prob (S = 1 | Y = 0) = 0$ . Hence, this form of outcome-dependent selection precludes calculating the true disparity of interest from the selected population subset. Any subsequent analyses based on this selected subset—under the simplest possible setting as shown in Figure 2 and without any sampling uncertainty—at best understates and at worst offers no meaningful insight into the true disparity of interest.

Running Example: Part 2

We continue with the running hypothetical example. After receiving two conflicting conclusions, the organization’s president sought a third opinion from another research team in the nation led by Dr. C. Knowing the perils of outcome-dependent selection, Dr. C requested access to the official records of all psychologists who would have been potentially eligible for the award—the population of interest (as shown in the left side of Fig. 1). The president sighed, “Alas, I’m afraid our organization did not maintain consistent records over the years.” Without these records, Dr. C’s team decided to construct a proxy for the population. They scraped the web pages of all psychology departments at research institutions nationwide and compiled a list of faculty members as their selected study sample; see the left side of Figure 3. They then checked the list of award recipients and recorded whether each faculty member in their selected sample received an award. They calculated the proportions of women versus men in their selected sample who received the award (Fig. 3, right side vs. left side). The proportions were $90 / 450 = 20 %$ and $54 / 270 = 20 %$ , respectively, which led Dr. C’s team to conclude that there was no gender disparity. This conclusion, once again, is nowhere close to the true gender disparity ( $0.44$ , as illustrated in Fig. 1). But what went wrong?

Fig. 3.

Dr. C’s study design showing susceptiblity to outcome-associated selection. Reflecting the traditional focus of gender-gap research, the organization’s president, in our running hypothetical example, was interested in the gender gap between women and men. Hence, solely for illustrative purposes, individuals of other gender identities are not visualized in this figure.

The approach by Dr. C’s team is predicated on selecting faculty members at research institutions as a proxy for the population. This selected subset was not representative of the true population of all psychologists in the nation. The key issue is that securing a faculty position at a research institution (i.e., selection) and receiving the award (i.e., outcome) share several common causes. Possible common causes include the prestige of the doctoral-degree-granting program, research area, academic record, and professional-network ties. These common causes of the outcome and selection lead to systematic differences in outcomes between individuals selected and individuals unselected that are not due to gender. As a result, the calculated gender gap among the selected subset is tainted with selection bias.

So, how did Dr. C’s team reach the conclusion that there was no gender disparity? We offer a numerical illustration in Table 1. As a simple probative example, we assumed that being a faculty membeer at a research institution and receiving the award shared a single common cause: whether the psychologist’s PhD was from an elite university $(U = 1)$ or a nonelite university $(U = 0)$ . The true population is presented in the top of Table 1; the selected subset is in the bottom of Table 1. As the numerical example shows, despite the gender disparity of $0.44$ in the true population, it was zero in the subset selected by Dr. C’s team.

Table 1.

Population (Top) and Subset Selected by Dr. C ( $S = 1$ ; Bottom) in the Hypothetical Example

Population	Did not receive award $(Y = 0)$		Received award $(Y = 1$ )
Gender	Nonelite ( $U = 0$ )	Elite ( $U = 1$ )	Nonelite ( $U = 0$ )	Elite ( $U = 1$ )
Woman ( $X = 1$ )	450	350	0	100
Man ( $X = 0$ )	180	90	0	90
Selected subset	Did not receive award $(Y = 0$ )		Received award $(Y = 1)$
Gender	Nonelite ( $U = 0$ )	Elite ( $U = 1$ )	Nonelite ( $U = 0$ )	Elite ( $U = 1$ )
Woman ( $X = 1$ )	45	315	0	90
Man ( $X = 0$ )	162	54	0	54

Note: Reflecting the traditional focus of gender-gap research, the organization’s president, in our running hypothetical example, was interested in the gender gap between women and men. Hence, solely for illustrative purposes, individuals of other gender identities are not visualized in this table.

Outcome-Associated Selection

We formalize the problem by visualizing the study design Dr. C’s team used with the causal diagram in Figure 4. Here, the causal estimand remains the causal effect of $X$ on $Y$ in the entire population, as encoded by the arrow from $X$ to $Y$ . We describe three features of Figure 4 that distinguish it from Figure 2. First, there is no arrow from $Y$ to $S$ . This arrow could be ruled out because Dr. C’s team selected their study sample (faculty members at research institutions) without prior knowledge of the outcome (receiving an award or not). Second, it is likely that selection and outcome are effects of shared common causes (Didelez et al., 2010; Hernán, 2017). This is readily represented by arrows from $U$ entering both $S$ and $Y$ . Third, to account for the possibility of gender-driven differences in faculty hiring at research institutions, we included an arrow from $X$ to $S$ to allow for $X$ to influence $S$ .

Fig. 4.

Causal diagram depicting outcome-associated selection in the hypothetical example. Reflecting the traditional focus of gender-gap research, the organization’s president, in our running hypothetical example, was interested in the gender gap between women and men. Hence, solely for illustrative purposes, individuals of other gender identities are not visualized in this figure.

This is an example of outcome-associated selection. Under such a scenario, selection $S$ is a collider on the pathway $X \to S \leftarrow U \to Y$ . The bias in an effect estimator resulting from adjusting for or conditioning on a collider on a pathway—thereby “opening” (or “unblocking”) said pathway—is termed “collider (stratification) bias” (Cole et al., 2009; Elwert & Winship, 2014; Greenland, 2003). Because $S$ is inevitably conditioned on as a consequence of the study design, this pathway is rendered open—when the noncolliders $U$ on the same path are unadjusted for—and induces a noncausal association between $X$ and $Y$ . This noncausal association generates a systematic bias in that the effect in the study sample, even as sampling variability becomes negligible with more data collected, is distorted from the true causal effect in the population. Even absent any effect of X on Y, there will be an association within the selected sample. In general, the selection bias can lead to an estimator that is either greater or smaller in magnitude than or even has the opposite sign from the true effect of interest in the population. As illustrated in our hypothetical example, Dr. C concluded there was no gender disparity even when women were less likely to receive the award in the population than men. We derive the selection bias formally using a single binary covariate $U$ (see Section B.1 in the Supplemental Material) and empirically using a Monte Carlo simulation study (see Section B.2 in the Supplemental Material).

Note that $S$ need not be a collider on the pathway $X \to S \leftarrow U \to Y$ for selection bias to arise (Hernán, 2017). We provide an example in Figure 5. The only deviation from Figure 4 is that here we assume there is no $X \to S$ arrow. Continuing our hypothetical example, this represents assuming no gender differences in faculty hiring at research institutions. In this causal diagram, $S$ is not a collider on any path linking $X$ and $Y$ . Nonetheless, selection bias can arise (Hernán, 2017). We formally demonstrate the outcome-associated selection bias under this scenario (see Section C in the Supplemental Material).

Fig. 5.

Causal diagram depicting outcome-associated selection in which selection $S$ is not a collider on any path between the exposure $X$ and outcome $Y$ in the hypothetical example. Reflecting the traditional focus of gender-gap research, the organization’s president, in our running hypothetical example, was interested in the gender gap between women and men. Hence, solely for illustrative purposes, individuals of other gender identities are not visualized in this figure.

Under the causal diagrams depicted in Figures 4 and 5, the structural bias induced by outcome-associated selection can be mitigated or eliminated by carefully recording and adjusting for $U$ . In practice, determining covariates that are putative common causes of $Y$ and $S$ can be daunting or unfeasible. Nonetheless, we urge researchers to ascertain variables that can be reasoned as putatively being $U$ based on the specific context of each research study. We return to this point in the recommendations section.

Recommendations to Mitigate Selection Bias

We offer three practical suggestions researchers can readily adopt to mitigate selection bias and strengthen their causal conclusions in group-based disparities research. These suggested strategies are complementary, and we encourage researchers to practice all three.

Avoid outcome-dependent selection

As a first step, researchers should avoid study designs that beget outcome-dependent selection designs. Researchers should ensure that their inclusion or selection criteria can be justifiably unaffected by—and, if feasible, blinded to—the outcome. Achieving this objective may not be straightforward. A unique challenge in disparities research is that outcomes of interest are often those that have occurred before the time of investigation (e.g., receiving the award in years past). Thus, researchers often work with historical data (e.g., from employment or administrative records or archived websites), leaving it possible that sample selection is influenced by the outcome of interest.

How to select a study sample to avoid outcome-dependent selection? Continuing our running hypothetical example of examining the gender gap in academic recognition, one could construct a “risk set” of psychologists nationwide to enumerate the population, such as procuring graduation records of all psychology PhDs from the department of education. Here, selection into the risk set $S$ (graduated with a psychology PhD) justifiably precedes the outcome $Y$ (receiving the award or not), precluding outcome-dependent selection based on temporal-logical constraints. Spoon et al. (2023) adopted such a study design to investigate whether there were gender differences in academic attrition rates (outcome $Y$ ). Their analytic sample comprised administrative records from an employment census of faculty that was not contingent on remaining in academia at the time of data collection ( $S = 1$ ).

We briefly describe three other examples from the field of gender-disparities research. Card et al. (2023) investigated whether there was a gender gap in researchers being inducted into the National Academy of Science or the American Academy of Arts and Science (outcome $Y$ ). Their analytic sample comprised active researchers who were eligible but had yet to be inducted $(S = 1)$ . Cech and Blair-Loy (2019) investigated whether there were gender imbalances in science, technology, engineering, and math (STEM) full-time employment trajectories of new parents (outcome $Y$ ). Their analytic sample comprised full-time employees in a STEM field who were initially childless $(S = 1)$ . M. G. Miller and Sutherland (2023) investigated whether there was a gender disparity in being interrupted during congressional committee hearings (outcome $Y$ ). Their analytic sample comprised original committee transcripts from congressional hearings published by the Government Printing Office that were unprocessed for interruptions $(S = 1)$ . In all these real-world examples, the possibility of outcome-dependent selection was ruled out by design.

Covariate adjustment to mitigate outcome-associated selection bias

Next, we recommend researchers incorporate covariate adjustment as a crucial part of their analytic strategy. Selection is inevitably conditioned upon as part of a study design and is unavoidable in the subsequent data analysis. Researchers should thus strive to record variables that suffice to block the paths between $X$ and $Y$ rendered open from conditioning on $S$ because of the study design. For example, under the scenario shown in Figure 4, in which $U$ are common causes of selection $(S)$ and outcome $(Y)$ , $U$ should be adjusted (or statistically controlled) for when analyzing the effect of $X$ on $Y$ (VanderWeele & Robinson, 2014, Appendix 2). As part of the analysis, $U$ can be readily included as additional independent or explanatory (i.e., statistical control) variables when regressing $Y$ on $X$ , or they can be stratified on by analyzing the association between $Y$ and $X$ within different levels of $U$ .⁴ Finally, these pre-exposure selection factors should be drawn up and rationalized at the study-design stage to be measured during data collection. As we have shown in this article, the causal relations linking the variables to selection can be readily visualized using a causal diagram.

Sensitivity analysis to assess the impact of selection bias

Covariate adjustment alone may be insufficient to mitigate selection bias. Researchers should conduct a sensitivity analysis to empirically evaluate how selection can affect the reported results.⁵ We describe an approach developed in the epidemiological literature (Thompson & Arah, 2014). This approach employs inverse probability of selection weights (IPSW) to remove the impact of selection. IPSW eliminates selection by weighting individuals selected to represent a pseudopopulation comprising a mixture of selected and unselected individuals. For example, if a selected individual has a weight of 4, that individual accounts for four individuals (one selected and three unselected) in the pseudopopulation. In this pseudopopulation, the distributions of the predictors of selection (e.g., $U$ and $X$ in Fig. 4) are approximately similar across the selected $(S = 1)$ and unselected $(S = 0)$ individuals. In other words, after weighting, the selected individuals can be considered as being “representative” of a target population (we elaborate on this in Section F in the Supplemental Material). Hence, conceptually, in the pseudopopulation, there would be no arrows entering $S$ (e.g., in Fig. 4, there would be no $U \to S$ or $X \to S$ arrows and thus no $X \to S \leftarrow U \to Y$ path).

We describe how to conduct a sensitivity analysis using IPSW in the following steps:

Determine the observed predictors of selection $S$ . For simplicity, we consider the causal diagram in Figure 4, in which selection depends on $X$ and $U$ , with the latter being a single common cause of $S$ and $Y$ .

Specify a statistical model for the selection mechanism (i.e., “selection model”) based on the determined predictors. For example, suppose that the probability of being selected $(S = 1)$ depends on $U$ and $X$ according to some specified parametric function, that is, $Prob (S = 1 | U, X; β) = σ (U, X; β)$ . The notation emphasizes the dependence of the selection probability on the posited value of the selection parameters β .

Posit a numerical (vector) value for the selection parameters β; this is held fixed for the rest of these steps. Calculate each individual’s predicted probability of being selected. For example, given a value of $β = (0.7, 1.4)$ , the predicted probability for an individual with observed values of $U = 0$ and $X = 1$ is $σ (U = 0, X = 1; β = (0.7, 1.4))$ .

With the predicted selection probabilities in hand, assign each individual in the selected subset ( $S = 1$ ) a weight that is the inverse of the individual’s probability, that is,

W (β) = \frac{1}{σ (U, X; β)} .

(1)

5. Estimate the effect of $X$ on $Y$ among individuals selected using the individual weights from the previous step. For example, regress the outcome $Y$ on $X$ using weighted least squares. In addition to the point estimates, confidence intervals and standard errors for statistical inference⁶ can be readily calculated conditional on the fixed value of the selection parameters posited in Step 3.

We make a few remarks about the sensitivity analysis procedure. First, we formally demonstrate the unbiasedness of using IPSW in Section D in the Supplemental Material. Monte Carlo simulation studies empirically demonstrating the unbiasedness are also provided in Sections B.2 and E (see rightmost column of Table S1) in the Supplemental Material.

Second, the selected study sample consists only of observations with $S = 1$ , with no data available for individuals with $S = 0$ (i.e., not selected for the study). Hence, fitting the specified selection model $σ (U, X; β)$ to the observed data is impossible. Instead, it necessitates positing values for the selection parameters β in Step 3. Researchers should use relevant empirical information or theoretical knowledge from different sources (Geneletti et al., 2011) about individuals who remained unselected (the $S = 0$ stratum) to inform the posited values of β.

Third, we suggest adopting a functional form for the selection model that ensures the probability of being selected is bounded between 0 and 1 to avoid negative weights. As a probative example, a logistic model with main effects only would be:

Prob (S = 1 | U, X; β) = expit (β_{1} U + β_{2} X),

(2)

where $expit (a) = \frac{\exp (a)}{1 + \exp (a)}$ . We offer a numerical example as follows. Suppose that based on a different study sample drawn from the same population, researchers may posit that the odds of being selected given a unit increase in $U$ increases by about twofold. Then, this would correspond to a $β_{1} = \log (2) \approx 0.7$ . Hence, a benefit of using a logistic model is the ease with which an odds ratio of selection can readily translate into a posited coefficient value. Because the ability to counteract selection bias depends on correctly specifying the selection model, we recommend including substantively meaningful higher-order and interaction terms when multiple predictors are present to improve the chances of correctly specifying the selection model. When the selected sample can be stratified based on U and X, stratum-specific selection probabilities can be posited instead. The functional form of the selection model should be based on tenable assumptions about the selection mechanisms guided by the problem-specific context of the research question.

Fourth, it is rarely straightforward or desirable to posit a single value for the selection parameters in practice. Therefore, we recommend systematically varying the values of β and repeating Steps 3 to 5. The resulting estimates given each posited value of β can then be plotted. We demonstrate this using a single simulated example described in Section B.2 in the Supplemental Material. We specified the selection model in Equation 2 and considered a discrete grid of values for $β_{1}$ between 0 and 1, holding $β_{2}$ fixed at (its true value) $1.4$ to simplify our illustration. We then calculated the effect of $X$ on $Y$ using the weights $W (β)$ for each given value of β in turn. The resulting estimates are shown in Figure 6, where they changed curvilinearly with $β_{1}$ . A solid black horizontal line indicates a naive effect estimate of $- 0.10$ (corresponding to positing $β_{1} = β_{2} = 0$ ; i.e., ignoring the possibility of selection depending on $U$ or $X$ ). A broken red horizontal line indicates the true effect of $X$ , which is always unknown in practice. We show that the value of $(β_{1}, β_{2}) = (0.7, 1.4)$ for the selection parameters recovered the true effect. Therefore, such a sensitivity analysis permits visually judging the extent to which the results are (in)sensitive to a range of values for the sensitivity parameters encoding varying selection mechanisms.

Fig. 6.

Sensitivity analysis showing the estimated effect of $X$ on $Y$ given each posited value of the selection parameters $β = (β_{1}, β_{2})$ for a single simulated example. The solid black horizontal line indicates the estimated effect among the selected subset only, and the broken red horizontal line indicates the true effect that is unknown in practice. The true value of $β_{1}$ is indicated by a broken black vertical line. The value of $β_{2}$ was fixed at its true value, as stated in the main text.

Fifth, when $X$ is nonrandomized and there are measured confounders, denoted by, for example, $C$ , that suffice to ensure conditional ignorability between $X$ and $Y$ , then the weights $W (β)$ may be multiplied by the inverse probability of treatment weights (Thoemmes & Kim, 2011) to remove the impact of measured confounding of the effect of $X$ on $Y$ that is due to $C$ . These latter weights are the reciprocal of the probability of the observed $X$ . For example, let $e (C) \equiv Prob (X = 1 | C)$ denote the propensity score fitted to the observed data. Then, the weights used in Step 4 would be

W (β) \times {\frac{X}{e (C)} + \frac{1 - X}{1 - e (C)}} .

(3)

Discussion

In this article, we sought to raise scholars’ awareness of selection bias in group-based disparities research and offered suggestions for mitigating selection bias. We emphasize that selection bias is only one methodological issue in disparities research. There are other methodological challenges we did not unpack here. For example, in the current material, we did not consider other threats to causal inference. A long-standing debate in the formal counterfactual literature that is especially relevant to disparities research is whether an exposure must be realistically (or hypothetically) manipulable to be regarded as a cause. We offer further context of this debate in Box 1 for interested readers.

Box 1.

Is Manipulability Strictly Necessary for Causation?

Following the seminal article of Rubin (1974), the Neyman-Rubin potential outcomes framework (Holland, 1986; Splawa-Neyman et al., 1990) offers a formal quantitative framework for conceptualizing causal effects using counterfactual contrasts. However, within this framework, causes of interest in disparities research (e.g., gender, race, and socioeconomic status, among others) may be viewed as “immutable characteristics” (Greiner & Rubin, 2011). Hence, some have stipulated experimental manipulability (either hypothetical or realized) as a strictly necessary condition (Greiner & Rubin, 2011; Holland, 1986) for establishing causality. But many others have argued that specific aspects or components of such ill-defined or “nonmanipulable” causes can have meaningful counterfactual interpretations and implications (C. Glymour & Glymour, 2014; Greenland, 2017; Sen & Wasow, 2016; VanderWeele & Hernán, 2012; VanderWeele & Robinson, 2014). As C. Glymour and Glymour (2014) stated in no uncertain terms in their title, “Race and sex are causes.” The mantra of “no causation without manipulation” (Holland, 1986) was refuted by Pearl (2009):
True, manipulation is one way (albeit a crude one) for scientists to test the workings of mechanisms, but it should not in any way inhibit causal thoughts, formal definitions, and mathematical analyses of the mechanisms that propel the phenomena under investigation. It is for that reason, perhaps, that scientists invented counterfactuals; it permits them to state and conceive the realization of antecedent conditions without specifying the physical means by which these conditions are established. (p. 361)
This follows from interpreting the hypothetical fixing of (possibly endogenous) causes, such as gender in the running example, at specific levels in terms of Pearl’s “do-operator” under the structural causal model (SCM) framework (Pearl, 2009); for a detailed illustration, see Gische et al. (2021). Pearl (2018) offered a detailed examination of the sufficient conditions for establishing a factor as a cause, explaining how counterfactuals can be well defined under the SCM framework (but generally unobserved) even with nonmanipulable causes.
There is a sizable literature debating this single issue of whether or how such “nonmanipulable” exposures can be framed as causes within the quantitative causal inference framework. We encourage readers seeking causal conclusions in group-based disparities research to engage with the different perspectives and be judicious about precisely defining aspects or components of their causes of interest that are amenable to formal causal inference. In addition to the references above and in the main text, other examples of extant work unpacking this issue include Krieger (2014), Pearce and Vandenbroucke (2020), Rubin (2010), and VanderWeele (2016), among others. Comparisons of different counterfactual frameworks, such as the Neyman-Rubin framework and Pearl’s SCM framework, have also been presented elsewhere in philosophy (Markus, 2011; Weinberger, 2023).
Finally, we also mention that a distinct so-called “causes of effects” approach has been developed for investigating questions similar to those raised in this article in empirical legal-studies research (Dawid et al., 2013); although for a counterpoint, see Pearl (2014). This approach builds on an alternative framework for conducting counterfactual reasoning using decision theory, free of assumptions about the simultaneous existence of incompatible counterfactuals (Dawid, 2000).

We acknowledge that causes of interest in disparities research, such as race (Howe et al., 2022; Sen & Wasow, 2016; VanderWeele & Robinson, 2014), are inherently complex and multifaceted. These are undeniably established and fundamental causes of inequality (Amis et al., 2021; Phelan & Link, 2015; Pickett & Wilkinson, 2015). But the extent to which they can be unambiguously and meaningfully defined depends on the unique context in which they are applied. For example, perceived gender, race, or social class are well-defined and experimentally manipulable causes (Goldin & Rouse, 2000; C. C. Miller & Katz, 2024; Quillian & Midtbøen, 2021). Therefore, we encourage researchers to explicate whether their research question is causal or associational (Hernán, 2018). If interest is in addressing a causal question, researchers may find it useful to follow the causal roadmap (Ahern, 2018; Hernán, 2018), starting with clarifying the population of interest (Lu et al., 2022) and precisely specifying the hypothetical counterfactual scenarios used to define the causal estimands (VanderWeele, 2016; VanderWeele & Robinson, 2014).

Throughout the article, we used a hypothetical example concerning the gender gap between women and men. We reiterate that this is an expository example from the mainstream literature, and the causal reasonings we present in the article apply broadly to group-based disparities research. For example, when investigating the effect of gender, scholars are increasingly adopting a more inclusive approach by examining different gender groups beyond the “gender binary” (Aghi et al., 2024; Hyde et al., 2019; Ledgerwood et al., 2023). Scholars also frequently examine other forms of inequality because of dimensions or a combination of dimensions, such as race, sexual orientation, disability, migration background, marital status, socioeconomic positions, and neighborhood (Gaskin et al., 2013; Slaughter-Acey et al., 2023; Turney & Wildeman, 2015). Regardless of the specific groups being compared, scholars engaging in disparities research face similar challenges of minimizing the impact of selection bias.

Finally, we emphasize that the issue of selection bias applies broadly across various research areas in psychological science. In Box 2, we briefly present putative examples from psychological science beyond the context of disparities research to illustrate the importance of carefully considering selection mechanisms. Of these, Examples $1 - to 3$ are prone to outcome-dependent selection, and Examples $4 - to 6$ are prone to outcome-associated selection. Crucially, the examples presented here are nonexhaustive; many others may be susceptible to selection bias. Selection bias can arise from other causal structures beyond those shown in Figures 2, 4, and 5 (for other possibilities, see Figs. S2 and S3 in the Supplemental Material). We used probative examples in this article to simplify demonstrating the insidious impact of selection bias and its wide-ranging implications. We encourage researchers to thoughtfully consider how selection factors into their study design not only in the context of disparities research but also in all other domains of psychology research.

Box 2.

Examples Illustrating Broader Implications of Selection Bias in Psychological Science

Examples prone to outcome-dependent selection include the following:
1. Interest was in whether being single versus being in a romantic relationship

(X)

affected health outcomes

(Y)

. Individuals who were relatively healthy were more likely to participate in the study

(S = 1)

than individuals who struggled with health problems, that is,

Y \to S

.
2. Interest was in whether resilience

(X)

led to higher economic outcomes

(Y)

. Researchers studied executives of Fortune 500 companies

(S = 1)

, that is,

Y \to S

.
3. Interest was in whether sleep disturbance

(X)

exacerbated chronic pain

(Y)

. Researchers recruited patients who visited a research clinic to receive in-person pain evaluations

(S = 1)

, that is,

Y \to S

.
Examples prone to outcome-associated selection include the following:
4. Interest was in the effect of workplace discrimination

(X)

on work productivity

(Y)

in a company. Employees received a link to an online survey. Filling out the survey

(S = 1)

and work productivity may share common causes, such as organizational commitment

(U)

, that is,

S \leftarrow U \to Y

.
5. Interest was in the effect of children’s digital play at 5 years

(X)

on cognitive development assessed at 8 years

(Y)

. Researchers invited families for in-person interviews during working hours on a university campus

(S = 1)

. Only parents who could afford to take time off work during working hours could bring their children in for the interview. These parents tend to have higher socioeconomic status

(U)

, a determinant of children’s cognitive development, that is,

S \leftarrow U \to Y

.
6. Interest was in whether prosocial behavior, such as volunteering at local community events

(X)

, reduced loneliness

(Y)

among U.S. adults. Researchers distributed flyers in English at public spaces in the local community to recruit participants

(S = 1)

. Here, residents without language barriers

(U)

were more likely to participate in the study. At the same time, language barriers could have led to loneliness, that is,

S \leftarrow U \to Y

Conclusion

Group-based disparities research is critical to achieving a more equitable society. But inappropriate selection of a population subset impedes or undermines this essential scientific pursuit (Arah, 2019; Hernán et al., 2004; Smith, 2020). With this article, we sought to raise researchers’ awareness of the threat selection bias poses and provide guidance on mitigating it. Note that selection bias is not unique to group-based disparities research but is prevalent in causal pursuits across the psychological sciences. We hope that this article aids researchers in minimizing the potential for inaccuracies because of selection biases toward a stronger psychological science.

Supplemental Material

sj-pdf-1-amp-10.1177_25152459241260256 – Supplemental material for Advancing Group-Based Disparities Research and Beyond: A Cautionary Note on Selection Bias

Supplemental material, sj-pdf-1-amp-10.1177_25152459241260256 for Advancing Group-Based Disparities Research and Beyond: A Cautionary Note on Selection Bias by Dongning Ren and Wen Wei Loh in Advances in Methods and Practices in Psychological Science

Footnotes

Transparency

Action Editor: David A. Sbarra

Editor: David A. Sbarra

Author Contributions

Both authors contributed equally to this article.

Dongning Ren: Conceptualization; Methodology; Visualization; Writing – original draft; Writing – review & editing.

Wen Wei Loh: Conceptualization; Methodology; Software; Visualization; Writing – original draft; Writing – review & editing.

ORCID iDs

Dongning Ren

Wen Wei Loh

Supplemental Material

Additional supporting information can be found at

Notes

References

Aghi

Anderson

B. M.

Castellano

B. M.

Cunningham

Delano

Dickinson

E. S.

von Diezmann

Forslund-Startceva

S. K.

Grijseels

D. M.

Groh

S. S.

Guthman

E. M.

Jayasinghe

Johnston

Long

McLaughlin

J. F.

McLaughlin

Miyagi

Rajaraman

Sancheznieto

. . . Weinberg

Z. Y.

(2024). Rigorous science demands support of transgender scientists. Cell, 187(6), 1327–1334. https://doi.org/10.1016/j.cell.2024.02.021

Ahern

(2018). Start with the “C-Word,” follow the roadmap for causal inference. American Journal of Public Health, 108(5), 621–621. https://doi.org/10.2105/AJPH.2018.304358

Amis

J. M.

Brickson

Haack

Hernandez

(2021). Taking inequality seriously. Academy of Management Review, 46(3), 431–439. https://doi.org/10.5465/amr.2021.0222

Arah

O. A.

(2019). Analyzing selection bias for credible causal inference: When in doubt, DAG it out. Epidemiology, 30(4), 517–520. https://doi.org/10.1097/EDE.0000000000001033

Card

DellaVigna

Funk

Iriberri

(2023). Gender gaps at the academies. Proceedings of the National Academy of Sciences, USA, 120(4), Article e2212421120. https://doi.org/10.1073/pnas.2212421120

Cech

E. A.

Blair-Loy

(2019). The changing career trajectories of new parents in STEM. Proceedings of the National Academy of Sciences, USA, 116(10), 4182–4187. https://doi.org/10.1073/pnas.1810862116

Cole

S. R.

Platt

R. W.

Schisterman

E. F.

Chu

Westreich

Richardson

Poole

(2009). Illustrating bias due to conditioning on a collider. International Journal of Epidemiology, 39(2), 417–420. https://doi.org/10.1093/ije/dyp334

Dawid

A. P.

(2000). Causality without counterfactuals (with discussion). Journal of the American Statistical Association, 95, 407–448. https://doi.org/10.2307/2669377

Dawid

A. P.

Faigman

D. L.

Fienberg

S. E.

(2013). Fitting science into legal contexts: Assessing effects of causes or causes of effects? Sociological Methods & Research, 43(3), 359–390. https://doi.org/10.1177/0049124113515188

10.

Didelez

Kreiner

Keiding

(2010). Graphical models for inference under outcome-dependent sampling. Statistical Science, 25(3), 368–387. https://doi.org/10.1214/10-STS340

11.

Elwert

Winship

(2014). Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology, 40(1), 31–53. https://doi.org/10.1146/annurev-soc-071913-043455

12.

Erosheva

E. A.

Grant

Chen

M.-C.

Lindner

M. D.

Nakamura

R. K.

Lee

C. J.

(2020). NIH peer review: Criterion scores completely account for racial disparities in overall impact scores. Science Advances, 6(23), Article eaaz4868. https://doi.org/10.1126/sciadv.aaz4868

13.

Ferguson

K. D.

McCann

Katikireddi

S. V.

Thomson

Green

M. J.

Smith

D. J.

Lewsey

J. D.

(2020). Evidence synthesis for constructing directed acyclic graphs (ESC-DAGs): A novel and systematic method for building directed acyclic graphs. International Journal of Epidemiology, 49(1), 322–329. https://doi.org/10.1093/ije/dyz150

14.

Gaskin

D. J.

Thorpe

R. J.

McGinty

E. E.

Bower

Rohde

Young

J. H.

LaVeist

T. A.

Dubay

(2013). Disparities in diabetes: The nexus of race, poverty, and place. American Journal of Public Health, 104(11), 2147–2155. https://doi.org/10.2105/AJPH.2013.301420

15.

Geneletti

Mason

Best

(2011). Commentary: Adjusting for selection effects in epidemiologic studies: Why sensitivity analysis is the only “solution.” Epidemiology, 22(1), 36–39. http://www.jstor.org/stable/29764677

16.

Gische

West

S. G.

Voelkle

M. C.

(2021). Forecasting causal effects of interventions versus predicting future outcomes. Structural Equation Modeling: A Multidisciplinary Journal, 28(3), 475–492. https://doi.org/10.1080/10705511.2020.1780598

17.

Glymour

M. R.

(2014). Commentary: Race and sex are causes. Epidemiology, 25(4), 488–490. https://doi.org/10.1097/EDE.0000000000000122

18.

Glymour

M. M.

(2006). Using causal diagrams to understand common problems in social epidemiology. In Oakes

J. M.

Kaufman

J. S.

(Eds.), Methods in social epidemiology (pp. 393–428). Jossey-Bass/Wiley.

19.

Goldin

Rouse

(2000). Orchestrating impartiality: The impact of “blind” auditions on female musicians. American Economic Review, 90(4), 715–741. https://doi.org/10.1257/aer.90.4.715

20.

Greenland

(2003). Quantifying biases in causal models: Classical confounding vs collider-stratification bias. Epidemiology, 14(3), 300–306. http://www.jstor.org/stable/3703850

21.

Greenland

(2017). For and against methodologies: Some perspectives on recent causal and statistical inference debates. European Journal of Epidemiology, 32(1), 3–20. https://doi.org/10.1007/s10654-017-0230-6

22.

Greiner

D. J.

Rubin

D. B.

(2011). Causal effects of perceived immutable characteristics. The Review of Economics and Statistics, 93(3), 775–785. http://www.jstor.org/stable/23016076

23.

Grosz

M. P.

Rohrer

J. M.

Thoemmes

(2020). The taboo against explicit causal inference in nonexperimental psychology. Perspectives on Psychological Science, 15(5), 1243–1255. https://doi.org/10.1177/1745691620921521

24.

Hernán

M. A.

(2017). Invited commentary: Selection bias without colliders. American Journal of Epidemiology, 185(11), 1048–1050. https://doi.org/10.1093/aje/kwx077

25.

Hernán

M. A.

(2018). The C-Word: Scientific euphemisms do not improve causal inference from observational data. American Journal of Public Health, 108(5), 616–619. https://doi.org/10.2105/AJPH.2018.304337

26.

Hernán

M. A.

Hernández-Díaz

Robins

J. M.

(2004). A structural approach to selection bias. Epidemiology, 15(5), 615–625. https://doi.org/10.1097/01.ede.0000135174.63482.43

27.

Hernán

M. A.

Monge

(2023). Selection bias due to conditioning on a collider. The BMJ, 381, Article 1135. https://doi.org/10.1136/bmj.p1135

28.

Holland

P. W.

(1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960. https://doi.org/10.1080/01621459.1986.10478354

29.

Holland

P. W.

Rubin

D. B.

(1988). Causal inference in retrospective studies. Evaluation Review, 12(3), 203–231. https://doi.org/10.1177/0193841X8801200301

30.

Howe

C. J.

Bailey

Z. D.

Raifman

J. R.

Jackson

J. W.

(2022). Recommendations for using causal diagrams to study racial health disparities. American Journal of Epidemiology, 191(12), 1981–1989. https://doi.org/10.1093/aje/kwac140

31.

Hyde

J. S.

Bigler

R. S.

Joel

Tate

C. C.

van Anders

S. M.

(2019). The future of sex and gender in psychology: Five challenges to the gender binary. American Psychologist, 74(2), 171–193. https://doi.org/10.1037/amp0000307

32.

Kaplan

Lee

(2018). Optimizing prediction using Bayesian model averaging: Examples using large-scale educational assessments. Evaluation Review, 42(4), 423–457. https://doi.org/10.1177/0193841X18761421

33.

Krieger

(2014). On the causal interpretation of race. Epidemiology, 25(6), 937. https://doi.org/10.1097/EDE.0000000000000185

34.

Ledgerwood

Lawson

K. M.

Kraus

Vollhardt

Adetula

Leach

C. W.

Martinez

J. E.

Remedios

Tate

Todd

A. R.

Weltzien

Buchanan

N. T.

Gonzalez

Montilla Doble

L. J.

Romero-Canyas

Westgate

Zou

(2023). Disrupting racism and global exclusion in academic publishing: Recommendations and resources for authors, reviewers, and editors. PsyArXiv. https://doi.org/10.31234/osf.io/whc76

35.

Lee

B. K.

Lessler

Stuart

E. A.

(2010). Improving propensity score weighting using machine learning. Statistics in Medicine, 29(3), 337–346. https://doi.org/10.1002/sim.3782

36.

Lee

J. J.

(2012). Correlation and causation in the study of personality. European Journal of Personality, 26(4), 372–390. https://doi.org/10.1002/per.1863

37.

Loh

W. W.

Ren

(2023). Understated gender disparities due to outcome-dependent selection: Comment on Mackelprang et al. (2023). American Psychologist, 78(6), 811–813. https://doi.org/10.1037/amp0001167

38.

Cole

S. R.

Howe

C. J.

Westreich

(2022). Toward a clearer definition of selection bias when estimating causal effects. Epidemiology, 33(5), 699–706. https://doi.org/10.1097/EDE.0000000000001516

39.

Markus

K. A.

(2011). Real causes and ideal manipulations: Pearl’s theory of causal inference from the point of view of psychological research methods. In McKay Illari

Russo

Williamson

(Eds.), Causality in the sciences (pp. 240–270). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199574131.003.0012

40.

McCaffrey

D. F.

Ridgeway

Morral

A. R.

(2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9(4), 403–425. https://doi.org/10.1037/1082-989X.9.4.403

41.

Miller

C. C.

Katz

(2024). What researchers discovered when they sent 80,000 fake résumés to U.S. jobs. The New York Times, April 8. https://www.nytimes.com/2024/04/08/upshot/employment-discrimination-fake-resumes.html

42.

Miller

M. G.

Sutherland

J. L.

(2023). The effect of gender on interruptions at congressional hearings. American Political Science Review, 117(1), 103–121. https://doi.org/10.1017/S0003055422000260

43.

Morgan

A. C.

LaBerge

Larremore

D. B.

Galesic

Brand

J. E.

Clauset

(2022). Socioeconomic roots of academic faculty. Nature Human Behaviour, 6(12), 1625–1633. https://doi.org/10.1038/s41562-022-01425-4

44.

Pearce

(2018). Bias in matched case–control studies: DAGs are not enough. European Journal of Epidemiology, 33(1), 1–4. https://doi.org/10.1007/s10654-018-0362-3

45.

Pearce

Vandenbroucke

J. P.

(2020). Educational note: Types of causes. International Journal of Epidemiology, 49(2), 676–685. https://doi.org/10.1093/ije/dyz229

46.

Pearl

(2009). Causality: Models, reasoning and inference (2nd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511803161

47.

Pearl

(2012). The causal foundations of structural equation modeling. In Hoyle

R. H.

(Ed.), Handbook of structural equation modeling (pp. 68–91). The Guilford Press. https://doi.org/10.21236/ADA557445

48.

Pearl

(2014). Causes of effects and effects of causes. Sociological Methods & Research, 44(1), 149–164. https://doi.org/10.1177/0049124114562614

49.

Pearl

(2018). Does obesity shorten life? Or is it the soda? On non-manipulable causes. Journal of Causal Inference, 6(2). https://doi.org/doi:10.1515/jci-2018-2001

50.

Pearl

Glymour

Jewell

N. P.

(2016). Causal inference in statistics: A primer. John Wiley & Sons.

51.

Phelan

J. C.

Link

B. G.

(2015). Is racism a fundamental cause of inequalities in health? Annual Review of Sociology, 41(1), 311–330. https://doi.org/10.1146/annurev-soc-073014-112305

52.

Pickett

K. E.

Wilkinson

R. G.

(2015). Income inequality and health: A causal review. Social Science & Medicine, 128, 316–326. https://doi.org/10.1016/j.socscimed.2014.12.031

53.

Quillian

Midtbøen

A. H.

(2021). Comparative perspectives on racial discrimination in hiring: The rise of field experiments. Annual Review of Sociology, 47, 391–415. https://doi.org/10.1146/annurev-soc-090420-035144

54.

R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

55.

Rosenbaum

P. R.

Rubin

D. B.

(1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55. https://doi.org/10.1093/biomet/70.1.41

56.

Rubin

D. B.

(1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350

57.

Rubin

D. B.

(2010). Reflections stimulated by the comments of Shadish (2010) and West and Thoemmes (2010). Psychological Methods, 15(1), 38–46. https://doi.org/10.1037/a0018537

58.

Sbarra

D. A.

Whisman

M. A.

(2022). Divorce, health, and socioeconomic status: An agenda for psychological science. Current Opinion in Psychology, 43, 75–78. https://doi.org/10.1016/j.copsyc.2021.06.007

59.

Scharfstein

D. O.

Rotnitzky

Robins

J. M.

(1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448), 1096–1120. https://doi.org/10.1080/01621459.1999.10473862

60.

Sen

Wasow

(2016). Race as a bundle of sticks: Designs that estimate effects of seemingly immutable characteristics. Annual Review of Political Science, 19(1), 499–522. https://doi.org/10.1146/annurev-polisci-032015-010015

61.

Sjölander

(2023). Selection bias with outcome-dependent sampling. Epidemiology, 34(2), 186–191. https://doi.org/10.1097/EDE.0000000000001567

62.

Slaughter-Acey

Simone

Hazzard

V. M.

Arlinghaus

K. R.

Neumark-Sztainer

(2023). More than identity: An intersectional approach to understanding mental-emotional well-being of emerging adults by centering lived experiences of marginalization. American Journal of Epidemiology, 192(10), 1624–1636. https://doi.org/10.1093/aje/kwad152

63.

Smith

L. H.

(2020). Selection mechanisms and their consequences: Understanding and addressing selection bias. Current Epidemiology Reports, 7(4), 179–189. https://doi.org/10.1007/s40471-020-00241-6

64.

Splawa-Neyman

Dabrowska

D. M.

Speed

T. P.

(1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4), 465–472. https://doi.org/10.1214/ss/1177012031

65.

Spoon

LaBerge

Wapman

K. H.

Zhang

Morgan

A. C.

Galesic

Fosdick

B. K.

Larremore

D. B.

Clauset

(2023). Gender and retention patterns among U.S. faculty. Science Advances, 9(42), Article eadi2205. https://doi.org/10.1126/sciadv.adi2205

66.

Thoemmes

F. J.

Kim

E. S.

(2011). A systematic review of propensity score methods in the social sciences. Multivariate Behavioral Research, 46(1), 90–118. https://doi.org/10.1080/00273171.2011.540475

67.

Thompson

C. A.

Arah

O. A.

(2014). Selection bias modeling using observed data augmented with imputed record-level probabilities. Annals of Epidemiology, 24(10), 747–753. https://doi.org/10.1016/j.annepidem.2014.07.014

68.

Turney

Wildeman

(2015). Detrimental for some? Heterogeneous effects of maternal incarceration on child wellbeing. Criminology & Public Policy, 14(1), 125–156. https://doi.org/10.1111/1745-9133.12109

69.

VanderWeele

T. J.

(2016). Commentary: On causes, causal inference, and potential outcomes. International Journal of Epidemiology, 45(6), 1809–1816. https://doi.org/10.1093/ije/dyw230

70.

VanderWeele

T. J.

Hernán

M. A.

(2012). Causal effects and natural laws: Towards a conceptualization of causal counterfactuals for nonmanipulable exposures, with application to the effects of race and sex. In Berzuini

Dawid

A. P.

Bernardinelli

(Eds.), Causality (pp. 101–113). https://doi.org/10.1002/9781119945710.ch9

71.

VanderWeele

T. J.

Robinson

W. R.

(2014). On the causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology, 25(4), 473–484. https://doi.org/10.1097/EDE.0000000000000105

72.

Vansteelandt

Goetghebeur

Kenward

M. G.

Molenberghs

(2006). Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica, 16(3), 953–979.

73.

Weinberger

(2023). Comparing Rubin and Pearl’s causal modelling frameworks: A commentary on Markus (2021). Economics and Philosophy, 39(3), 485–493. https://doi.org/10.1017/S0266267121000353

74.

West

S. G.

Cham

Thoemmes

Renneberg

Schulze

Weiler

(2014). Propensity scores as a basis for equating groups: Basic principles and application in clinical treatment outcome research. Journal of Consulting and Clinical Psychology, 82(5), 906–919. https://doi.org/10.1037/a0036387

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.23 MB