The Target Study: A Conceptual Model and Framework for Measuring Disparity

Abstract

We present a conceptual model to measure disparity—the target study—where social groups may be similarly situated (i.e., balanced) on allowable covariates. Our model, based on a sampling design, does not intervene to assign social group membership or alter allowable covariates. To address nonrandom sample selection, we extend our model to generalize or transport disparity or to assess disparity after an intervention on eligibility-related variables that eliminates forms of collider-stratification. To avoid bias from differential timing of enrollment, we aggregate time-specific study results by balancing calendar time of enrollment across social groups. To provide a framework for emulating our model, we discuss study designs, data structures, and G-computation and weighting estimators. We compare our sampling-based model to prominent decomposition-based models used in healthcare and algorithmic fairness. We provide R code for all estimators and apply our methods to measure health system disparities in hypertension control using electronic medical records.

Keywords

equity disparity ethics fairness target study emulation conceptual model framework

Introduction

Measuring disparity is a key step in making progress toward health equity. Disparity measures underlie descriptive reports and trends and serve as benchmarks for evaluating the effects of interventions and policies (Cooper, Hill, and Powe 2002). Although the measurement of disparity is critical and there has been much discussion and debate about what constitutes a disparity (Institute of Medicine Committee on Understanding and Eliminating Racial Ethnic Disparities in Healthcare 2003; Braveman 2006; Duran and Pérez-Stable 2019), there has been limited discussion about best practices and principles for measurement of disparity, especially when using secondary data not collected for research purposes.

Conceptual models serve as important guides for the analysis and interpretation of secondary data. For example, consider the target trial framework (Hernán and Robins 2016), which lays out the hypothetical randomized controlled trial one would conduct if the goal were to estimate the effect of a treatment strategy to inform clinical decision-making. The elements of the trial (eligibility, treatment strategies, outcome follow-up) guide the design and analysis of a study based on secondary data and help ensure that the measure of association has a causal interpretation that applies to (a) the population of interest, (b) treatment strategies of interest, and (c) outcomes of interest, all of which are critical for informing treatment policy decisions.

The target trial framework cannot guide a descriptive measurement of disparity where there is no intervention. Still, without a conceptual guide, the population of interest and the follow-up period that pertain to unjust processes or outcomes may be unclear, which can impede appropriate policymaking. Without a conceptual model, it is difficult to justify and interpret covariate adjustment in health disparities research (Kaufman 2017). Causal models have been used to define disparities (Duan et al. 2008), but they have stringent assumptions and abstract away important realities. Meanwhile, there are intense discussions about nonrandom sample selection and its impact on related concepts such as discrimination (Knox, Lowe, and Mummolo 2020; Gaebler et al. 2022). Outlining the hypothetical study one could do in the real world to measure disparity will provide clarity on these issues.

We present a novel conceptual model—the target study—to address these issues and provide a framework for emulating it. The paper is organized as follows. We begin by introducing our motivating example. The section “Conceptual Issues in Measuring Disparity” reviews key issues in disparity measurement. The section “A Target Study Conceptual Model for Measuring Disparity” presents our model under the case where investigators wish to capture all effects of nonrandom sample selection. The section “Extension of the Target Study to Address Non-Random Sample Selection” expands the model to address nonrandom sample selection by generalizing to a broader population, transporting to a different population, or estimating disparity in a counterfactual population where certain consequences of non-random sample selection are absent. The section “Emulation of the Target Study with Secondary Data” proposes data structures and estimators to emulate the target study. The section “Contributions and Comparison to Existing Literature” outlines our contributions and compares our model to others widely used to study disparity in healthcare and algorithmic fairness. The section “Discussion” discusses strengths and limitations. To aid readability, we use modular sections with ample cross-referencing so readers may skip directly to sections of interest.

Motivating Example

Consider the measurement of racial disparities in hypertension outcomes of primary care patients diagnosed with hypertension who receive care at a large regional health system in the USA. The outcomes of interest are a health-related quantity Y (e.g., hypertension control) or a healthcare decision D made by a clinician (e.g., to intensify hypertension treatment). We are concerned with average outcomes across a categorical social grouping, such as race R, where a socially disadvantaged (henceforth referred to as marginalized) group (e.g., Black persons) is denoted as $R = 1$ and a socially advantaged (henceforth referred to as privileged) group (e.g., White persons) is denoted as $R = 0$ . The available data are electronic medical records stamped at time $h$ (e.g., in minutes, hours, seconds) from office-based primary care visits over multiple years that include measures of prior hypertension, demographics $X_{h}$ (e.g., age and sex assigned at birth), comorbidity and socioeconomic status (SES) $L_{h}$ , hypertension control as of that visit, denoted by $Y_{h}$ (1: yes, 0: no), antihypertensive treatment intensification (i.e., initiation, change in dose, or change in class) within the 14 days after the visit, denoted by $D_{h}$ (1: yes, 0: no), and time-stamped enrollment in an electronic patient portal program (EPPP) for care management. The notation $h$ refers to the timing of a variable's measurement (e.g., the date/time of the visit) rather than the time at which its value is realized (e.g., some date/time before the visit). Later on, we will coarsen measurement time $h$ into some chosen unit of calendar time k (e.g., months). Finally, note that persons may have multiple visits per day or week.

Conceptual Issues in Measuring Disparity

Defining Disparity

In medicine and public health, the definition of disparity depends on whether the outcome is a health status (e.g., hypertension control $Y$ ) or a healthcare commodity (e.g., treatment intensification $D$ ). For health status outcomes, the Healthy People 2020 report committee defined disparities as “systematic, plausibly avoidable health differences adversely affecting socially disadvantaged groups” (Braveman et al. 2011). This builds upon a World Health Organization definition (Whitehead 1992) and relates to the National Institute of Minority Health and Health Disparities (NIMHD) definition (Duran and Pérez-Stable 2019): “a health difference that adversely affects disadvantaged populations, based on one or more health outcomes” where outcomes range from health behaviors, to diagnosis- or stage-specific-clinical endpoints or self-reported measures, to overall mortality. For healthcare outcomes (e.g., treatment), the Institute of Medicine (IOM) report “Unequal Treatment” (Institute of Medicine Committee on Understanding and Eliminating Racial Ethnic Disparities in Healthcare 2003) defines disparities as

differences in the quality of care that are not due to access-related factors or clinical needs, preferences, and appropriateness of intervention…[where] analysis is focused at two levels: 1) the operation of the health systems and the legal and regulatory climate…; 2) discrimination at the individual, patient-provider level. (emphasis added)

Disparity reflects society's failure to achieve equity in health, defined as “everyone having a fair and just opportunity to be as healthy as possible” (Whitehead 1992; Braveman et al. 2011).¹

Temporal Framing

To aid decision-makers, community members, and other stakeholders, disparity refers not to a universal, general phenomenon, but to outcomes among people nested in a particular context at a point in (or span of) calendar time. For example, we could describe the disparity in prevalent uncontrolled hypertension for primary care visits during each month during the peak of the COVID-19 pandemic in 2020–2022. If we include one care episode per person per month, we can meaningfully summarize disparity over the entire period by averaging over the month-specific estimates of disparity. Such a summary measure would be interpreted as an average disparity over populations indexed by calendar months. By accounting for calendar time when producing such summary estimates, we avoid confounding by time-specific trends in enrollment and the outcome. To obtain a summary estimate of disparity between social groups with the same person-time experience of the health system, the summary must properly account for calendar time.
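The month-level framing above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the visit records, the function names, and the rule of keeping the earliest visit in each month are hypothetical choices, not something the text prescribes.

```python
from datetime import datetime

# Hypothetical visit records: (person_id, visit timestamp h). Illustrative only.
visits = [
    ("p1", datetime(2021, 1, 5, 9, 30)),
    ("p1", datetime(2021, 1, 20, 14, 0)),
    ("p1", datetime(2021, 2, 3, 10, 15)),
    ("p2", datetime(2021, 1, 11, 8, 45)),
]

def coarsen_to_month(h):
    """Map a visit timestamp h to a calendar-month index k = (year, month)."""
    return (h.year, h.month)

def one_episode_per_person_month(records):
    """Keep one visit per person per month.

    Assumption: the earliest visit in the month is kept; other rules
    (e.g., the last visit) are equally possible.
    """
    kept = {}
    for person, h in records:
        key = (person, coarsen_to_month(h))
        if key not in kept or h < kept[key]:
            kept[key] = h
    return kept

episodes = one_episode_per_person_month(visits)
```

Each key of `episodes` is a (person, month) pair, so every person contributes at most one care episode to each month-specific disparity estimate.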

Allowability

In the IOM definition, disparity compares groups who are similarly situated (i.e., balanced) on “allowable” covariates. Allowable covariates are those whose differential distribution does not lead to inequitable outcomes. For a distributed good outcome D (e.g., healthcare) they are factors that, on moral arguments, are appropriate for determining allocation (Jackson 2021). For example, disparities in healthcare treat clinical need as allowable based on clinical guidelines (McGuire et al. 2006; Cook et al. 2009). For a state outcome Y (e.g., health), the differential distribution of allowable covariates does not contribute to worse outcomes among the marginalized group (Jackson 2021). For example, if the marginalized group is younger and increased age predicts worse hypertension control, the younger age of the Black population does not contribute to the disparate distribution of hypertension control at the population-level. Not treating age as allowable could mask disparity from barriers to hypertension control that the Black population disproportionately faces (e.g., neighborhood disadvantage and limited options for healthy diet, physical activity, and pharmacies; Mueller et al. 2015).

Is a Causal Framing of Disparity Necessary?

A fundamental question in conceiving of disparity is how the groups come to be similarly situated (i.e., balanced) on the allowables. Many authors conceive of disparity as comparing populations that are similarly situated through an intervention where an external actor makes existing groups similar by changing the value(s) of each person's allowable covariate(s)² (McGuire et al. 2006; Duan et al. 2008; Cook et al. 2009; Kaufman 2017). For example, disparate pulse oximeter performance is assessed in desaturation studies where hypoxia is induced among healthy volunteers (Food and Drug Administration 2024). Disparate healthcare utilization is assessed in statistical analyses that hypothetically modify individuals’ health and project utilization after this modification. This causal reading of the IOM definition is justified by its phrase “not due to,” interpreted as “not caused by,” where disparity compares social groups who are made similar (on the allowables) by intervention, to isolate the mediating role of inappropriate factors (e.g., SES) in producing differences in healthcare utilization (McGuire et al. 2006). But the phrase “not due to” also permits a noncausal framing where, by design, disparity compares social groups who are already alike on the allowables. Early work that applied the IOM definition of disparity was motivated by non-causal studies where patients of different social groups with the same underlying need for medical treatment are compared in their distribution of appropriate medical treatment received, and actually framed such studies as IOM concordant (Cook et al. 2009: 2–3). We argue that approaches that balance allowables by design (e.g., our model) align with the IOM definition. More broadly, arguments about the exact causes of disparity are not needed to view disparity with concern (Braveman 2006). Moral concern may arise from the impact that disparity has on the human rights of marginalized groups (Hutler 2022).

Non-Random Sample Selection

Some frameworks for health equity acknowledge that non-random sample selection may impact a disparity measure (Kilbourne et al. 2006). Consider the causal graph of Figure 1a, where variables $W_{k}$ establishing eligibility $Q_{k}$ (eligibility-related variables) are prior hypertension, established care in the health system, enrollment in the electronic portal program (EPPP), and current visit, with k indexing the calendar time at which a variable is measured, and J indexing the outcome's follow-up time.

Figure 1.

Causal directed acyclic graphs depicting causal relationships between historical processes H, race R, demographics age and sex $X_{k}$, and comorbidities and adult socioeconomic status $L_{k}$, hypertension control $Y_{k + J}$, and eligibility-related variables: $W_{k}$ (prior hypertension, established care, electronic patient portal program [EPPP] enrollment, current visit [in (a)]), $W_{k}^{‡}$ (prior hypertension and established care [in (c)], and additionally current visit [in (b)]), $W_{k}^{†}$ (EPPP enrollment [in (b) and (c)]), and $W_{k}^{≀}$ (current visit [in (c)]), all measured at calendar time k. Full eligibility $Q_{k}$ is based on all eligibility-related variables $W_{k}$. Similarly, indicators of partial eligibility $Q_{k}^{‡}$, $Q_{k}^{†}$, $Q_{k}^{≀}$ are based on their corresponding subsets of eligibility-related variables $W_{k}^{‡}$, $W_{k}^{†}$, $W_{k}^{≀}$. The subscript k marks calendar time and J the interval of time until $Y_{k + J}$ is measured. Dashed lines emphasize selective paths that become unblocked when conditioning on the indicator of eligibility $Q_{k}$ or all indicators of partial eligibility ($Q_{k}^{‡}$, $Q_{k}^{†}$, and $Q_{k}^{≀}$). Note that the subscript k refers to the calendar time at which a variable (e.g., a person's level of SES in adulthood) is measured (e.g., the calendar time of the visit) rather than the time at which its value was realized (e.g., some time before the visit).

Lack of generalizability occurs when a disparity measure is unbiased for the study sample (e.g., those enrolled in the EPPP) but biased for the broader population of interest (e.g., the entire health system). Lack of transportability occurs when the disparity measure is unbiased for the study sample but biased for a different population of interest (e.g., not enrolled in EPPP) (Smith 2020). Either scenario arises when (i) a risk factor $L_{k}$ (e.g., SES) has different associations with the outcome $Y_{k + J}$ (e.g., hypertension control) across social groups R and (ii) the risk factor $L_{k}$ 's distribution depends on eligibility-related variables $W_{k}$ (e.g., it differs by EPPP enrollment), even with no difference in eligibility $Q_{k}$ across social groups R.³ They also arise when the effect of eligibility-related variables on the outcome differs across social groups.

Collider stratification creates an association between social group R and the outcome $Y_{k + J}$ among those who are eligible $Q_{k} = 1$ (VanderWeele and Robinson 2014). It can occur when eligibility-related variables $W_{k}$ (e.g., EPPP enrollment) are affected by (i) social group R and (ii) a risk factor $L_{k}$ (e.g., SES) for the outcome $Y_{k + J}$ (Hernán, Hernández-Díaz, and Robins 2004; Elwert and Winship 2014). Recent work (Shahar and Shahar 2017; Nguyen, Dafoe, and Ogburn 2019) implies that collider stratification occurs if the probability ratio of being eligible $Q_{k} = 1$ (comparing levels of social group $R$) varies across levels of a risk factor $L_{k}$ for the outcome $Y_{k + J}$, even if $L_{k}$ has homogeneous associations with $Y_{k + J}$ across social groups R.
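A small exact calculation may help make the probability-ratio condition concrete. The sketch below is a toy setting, not the authors' analysis: it assumes a binary risk factor L that is independent of R in the source population and an outcome that depends on L only. All probabilities and names are hypothetical.

```python
def mean_outcome_among_eligible(p_l, p_elig, p_y_given_l, r):
    """E[Y | Q=1, R=r], assuming L is independent of R in the source
    population and Y depends on L only (no direct dependence on R)."""
    # P(L=l | Q=1, R=r) is proportional to P(Q=1 | R=r, L=l) * P(L=l)
    joint = {l: p_elig[(r, l)] * p_l[l] for l in p_l}
    total = sum(joint.values())
    return sum(p_y_given_l[l] * joint[l] / total for l in p_l)

p_l = {0: 0.5, 1: 0.5}          # P(L = l), identical in both social groups
p_y_given_l = {0: 0.4, 1: 0.8}  # P(Y = 1 | L = l), no dependence on R

# Keys are (r, l); values are P(Q = 1 | R = r, L = l).
# Scenario 1: the eligibility probability ratio (R=1 vs R=0) is 0.5 at both levels of L.
constant_ratio = {(0, 0): 0.4, (0, 1): 0.8, (1, 0): 0.2, (1, 1): 0.4}
# Scenario 2: the ratio is 1.0 when L=0 but 0.25 when L=1.
varying_ratio = {(0, 0): 0.4, (0, 1): 0.8, (1, 0): 0.4, (1, 1): 0.2}

def additive_gap(p_elig):
    """Difference in mean outcome among the eligible, R=1 minus R=0."""
    return (mean_outcome_among_eligible(p_l, p_elig, p_y_given_l, 1)
            - mean_outcome_among_eligible(p_l, p_elig, p_y_given_l, 0))

gap_constant = additive_gap(constant_ratio)
gap_varying = additive_gap(varying_ratio)
```

In this toy setting the gap among the eligible is zero when the eligibility probability ratio is constant across L, and nonzero (about -0.13) when the ratio varies, even though R and the outcome are unrelated in the source population.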

Because collider stratification due to non-random sample selection can induce an association between social group and the outcome among the sample (e.g., those enrolled in the EPPP) that is not present among the broader population (e.g., irrespective of EPPP enrollment), it is often viewed as a bias (VanderWeele and Robinson 2014; Knox, Lowe, and Mummolo 2020; Rojas-Saunero, Glymour, and Mayeda 2023). There are reasons to include contributions of collider stratification to disparity. First, if disparity is measured in a meaningfully defined population of interest,⁴ the contributions are substantively grounded as they reflect that population of interest (VanderWeele and Robinson 2014). Consider when eligibility is defined by a condition (e.g., hypertension) that gives meaning to the outcome (e.g., hypertension control). For example, persons without history of hypertension can have elevated blood pressure due to hypertension onset or due to exercise, but these reasons do not represent uncontrolled hypertension. The disparity is only defined among eligible persons. Second, for a meaningful population of interest, when collider stratification disadvantages the marginalized group on baseline covariates leading to a worse outcome distribution compared to the privileged group, this aligns with definitions of disparity (see the subsection “Defining Disparity”). Third, the contribution of collider stratification is amenable to intervention by changing how covariates affect eligibility or the outcome.⁵ However, when collider stratification advantages the marginalized group, it may mask disparity from other sources and investigators may choose to exclude it from disparity.

A Target Study Conceptual Model for Measuring Disparity

Overview

We now describe the elements of our conceptual model for measuring disparity, the target study. In this heuristic, an eligible population [denoted as $Q_{k}$ (1: eligible, 0: otherwise)] from two or more social groups (e.g., Black $R = 1$ and White persons $R = 0$) is selected from an eligible source population (e.g., established care in system, prior hypertension, enrolled in EPPP, current visit) within a given moment or span of coarsened time k representing the enrollment period (e.g., the month of January 2023). This selection occurs at the end of the enrollment window through a two-stage sampling strategy. The first stage of sampling addresses non-random selection into the study. The second stage of sampling similarly situates (i.e., balances) the social groups on the allowable covariates (if any are chosen) so that for both social groups, the allowables follow a distribution from a within-sample standard population (chosen by the investigator). Thus, the disparity estimate in the final sample is not due to differences in the allowable distributions. After both stages of sampling, those enrolled are followed for a specified period of time. A specified statistical comparison of outcomes across social groups provides the measure of disparity $ψ_{k}$, which is indexed by the enrollment period k.⁶ The additive and ratio disparity are contrasts of mean outcomes (prevalence or risk with binary outcomes):

$$ψ_{k}^{add} = μ_{k}(1) - μ_{k}(0) \quad \text{and} \quad ψ_{k}^{rel} = μ_{k}(1) / μ_{k}(0) \tag{1}$$

where $μ_{k}(r)$ denotes $E_{Ω}[Y_{k + J} \mid Q_{k} = 1, R = r, k]$, the average outcome $Y_{k + J}$ in social group $R = r$ who are eligible $Q_{k} = 1$ (based on criteria for eligibility-related variables $W_{k}$) and enrolled in the target study sample $Ω$ at calendar time k with follow-up time $(0, \dots, J)$.
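As a minimal sketch of the contrasts in (1), the following Python computes the additive and ratio disparity from a final sample for one enrollment period. The records and function names are hypothetical, and the outcome is assumed to be binary and coded as a shortfall.

```python
def group_mean(records, r):
    """Mean outcome among enrolled, eligible persons with R = r."""
    ys = [y for (group, y) in records if group == r]
    return sum(ys) / len(ys)

def disparity(records):
    """Return (additive, ratio) disparity: mu(1) - mu(0) and mu(1) / mu(0)."""
    mu1, mu0 = group_mean(records, 1), group_mean(records, 0)
    return mu1 - mu0, mu1 / mu0

# Hypothetical final sample for one enrollment period k: (R, Y_{k+J}),
# with Y coded as a shortfall (1 = uncontrolled hypertension).
sample_k = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
psi_add, psi_rel = disparity(sample_k)
```

With these made-up records the group means are 0.5 and 0.25, giving an additive disparity of 0.25 and a ratio disparity of 2.0.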

Under this conceptual model, the target population (about which inference is made) operationally consists of the source population that, within the enrollment period, is eligible, sampled, and enrolled. That is, in real life, if one wanted to make inferences about disparity in a population that exists within a certain span of time, one would carry out the protocol of the target study. At any calendar time unit k, a person only enrolls once into a target study. (Each unit of calendar time is of equal length.) Results of studies $ψ_{k}$ carried out at distinct calendar times k can be aggregated into a summary measure of disparity $Ψ$. A weighted average of disparity measures $ψ_{k}$ indexed at each calendar time k can be taken as:

$$Ψ = \frac{\sum_{k} γ_{k} ψ_{k}}{\sum_{k} γ_{k}} \tag{2}$$

where $γ_{k}$ is a weight specific to calendar time k.

We discuss the choice of the weights $γ_{k}$ in the subsection “Statistical Analysis” where we discuss the analysis of target studies.
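The weighted average in (2) can be sketched as follows. The month-specific means and enrollment counts are hypothetical. The weights are taken, as one possible choice, to be the share of standard-population enrollment falling in each month, and for the ratio scale each weight is additionally multiplied by the privileged group's mean outcome, following the suggestion in the subsection “Statistical Analysis”.

```python
def weighted_average(psi, gamma):
    """Equation (2): sum_k gamma_k * psi_k / sum_k gamma_k."""
    num = sum(gamma[k] * psi[k] for k in psi)
    den = sum(gamma[k] for k in psi)
    return num / den

# Hypothetical month-specific mean outcomes and standard-population enrollment counts.
mu1 = {"2021-01": 0.40, "2021-02": 0.50, "2021-03": 0.60}  # marginalized group
mu0 = {"2021-01": 0.30, "2021-02": 0.30, "2021-03": 0.30}  # privileged group
n_standard = {"2021-01": 50, "2021-02": 30, "2021-03": 20}

# gamma_k: share of standard-population enrollment occurring in month k
n_total = sum(n_standard.values())
gamma = {k: n / n_total for k, n in n_standard.items()}

psi_add = {k: mu1[k] - mu0[k] for k in mu1}
psi_rel = {k: mu1[k] / mu0[k] for k in mu1}

Psi_add = weighted_average(psi_add, gamma)
# For the ratio scale, each weight is multiplied by mu_k(0)
gamma_rel = {k: gamma[k] * mu0[k] for k in gamma}
Psi_rel = weighted_average(psi_rel, gamma_rel)
```

Here `Psi_add` equals the difference of the standardized means (0.47 - 0.30) and `Psi_rel` equals their ratio (0.47 / 0.30), which is the interpretation the summary measure is meant to carry.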

We begin with our default model (Design 1) where we choose to enroll all eligible persons (or a simple random sample of eligible persons) during the first stage of sampling. Recall that eligibility criteria cause persons to be non-randomly selected from the source population, so our default model includes all contributions of this non-random selection of persons to disparity. Adaptations to deal with such non-random sample selection (i.e., Designs 2, 3, or 4) are discussed in the section “Extension of the Target Study to Address Non-Random Sample Selection”. Design 1 has minimal structural constraints on the underlying causal relations between all relevant variables.⁷

Enrollment Window(s)

To conduct a target study, we first choose a specific moment or narrow span in calendar time, denoted by k, to enroll persons. This requires choosing a level of granularity for calendar time (e.g., hours, days, months, years) and a specific moment k as the enrollment period (e.g., the month of January 2023). For each person, all eligibility-related variables $W_{k}$ (e.g., prior diagnosis of hypertension, established care in the health system, with a current visit in the window) and all allowable covariates $A_{k}$ (e.g., age, sex) are defined and measured at or before the end of this period. Thus, relative to the end of the enrollment window, eligibility $Q_{k}$ is based on having acceptable current or prior values $w_{k}$ of the eligibility-related variables $W_{k}$ . At any calendar time k, a person may only enroll in one study.

Enrollment Groups

The definitions of disparity in the subsection “Defining Disparity” compare groups with persistently different levels of social advantage, privilege, power, wealth, or prestige because of their position in society (Braveman 2006). Within the USA, the NIMHD's concept of a disparity specifies social groups such as racial and ethnic minoritized groups (versus majoritized groups), underserved rural residents (versus urban residents), lower socioeconomic status (versus higher socioeconomic status), and sexual and gender minorities (versus sexual and gender majorities) (Duran and Pérez-Stable 2019). The National Institute of Mental Health (NIMH) further specifies groups with serious mental illness (versus those without) who have experienced long-standing stigmatization, discrimination, social exclusion, and loss of agency in society. Reflecting an intersectional perspective that mechanisms of social injustice combine to uniquely shape experience (Collins and Bilge 2020), social groups may be defined by joint membership along multiple axes (e.g., Black women versus White men) (Jackson, Williams, and VanderWeele 2016). This list is not exhaustive, and our model accommodates categorical⁸ and time-varying definitions⁹ of social groups.

Eligibility Criteria

The eligibility criteria can define the population of interest, reflecting issues of scope, societal level, and timing. In terms of scope, the criteria can restrict to places (e.g., the Mid-Atlantic region), institutions (e.g., a particular health system), or shared experiences or conditions (e.g., diagnosis of hypertension) that define a meaningful population. In terms of societal level, the criteria can focus on persons under the purview of a specific decision-maker (e.g., a clinical provider), facility (e.g., a clinic), or institution (e.g., a health system). In terms of timing, the criteria can focus on critical life stages, such as birth or a milestone event (e.g., myocardial infarction) where outcomes (e.g., appropriate medical treatment) are given meaning by that event. From here, we will use the following criteria: prior hypertension, established care in the health system, EPPP enrollment before calendar time k, and a recent primary care visit within calendar time k.

Allowable Covariates

We choose the covariates on which the social groups are to be similarly situated (i.e., balanced) by the end of the enrollment process. These allowable covariates $A_{k}$ are ones not implicated in generating disparate outcomes among the marginalized group, as described in the subsection “Allowability”. Because the allowables $A_{k}$ are used to guide the enrollment process by defining the sampling fractions, they must be defined and measured by the end of the enrollment window when enrollment occurs. Investigators may also choose no allowable covariates at all (i.e., $A_{k} = ⊘$), as discussed in the subsection “Enrollment Process (for Design 1)”. Although the eligibility variables $W_{k}$ (used to form the sampling frame) are separate from the allowables $A_{k}$ (used to define the sampling fractions), $W_{k}$ are conceptually deemed allowable as the enrollment process similarly situates social groups on them.

Standard Distribution

During enrollment, we sample individuals so that the distribution of allowables $A_{k}$ is the same for social groups, following a within-sample standard distribution chosen by the investigator. If the disparity varies across the allowables $A_{k}$ (e.g., age is considered to be allowable and disparity is higher in mid-life), the choice of the standard population, denoted by $T = 1$, that defines the standard distribution may impact the magnitude and direction of the disparity measure $ψ_{k}$. The choice may be motivated by normative, theoretical, or practical concerns. If the marginalized group is the standard, its experience is emphasized (Thurber et al. 2022). Then, the disparity measure $ψ_{k}$ compares the experience of the marginalized group (enrolled through simple random sampling) to the experience of a privileged group (enrolled through stratified sampling) that shares the marginalized group's distribution of allowable covariates $A_{k}$ (e.g., its age structure).

To balance the allowables $A_{k}$ across social groups R, the values of $A_{k}$ found among the standard population must be within those of each social group $R = r$ at each time k, which is an overlap assumption:

$$P(A_{k} = a_{k} \mid Q_{k} = 1, R = r, k) > 0 \tag{3}$$

for all values $a_{k}$ with $P(A_{k} = a_{k} \mid Q_{k} = 1, T = 1, k) > 0$ and all k.

The overlap assumption (3) requires that, at each time k, we look among the standard population (denoted by $T = 1$) and note which values of the allowable covariates $A_{k}$ occur. Then, for each of those strata, we need to find (among eligible persons) members of each social group $R = r$. Otherwise, the sampling strategies we now describe will not be able to balance the allowables according to the standard distribution.
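An empirical check of the overlap condition might look like the sketch below. It is illustrative only: the records, the age bands, and the function name are hypothetical, and a sample-based check of this kind can flag empty strata but cannot by itself establish that (3) holds in the population.

```python
def overlap_violations(records, standard_flag, groups):
    """Return (a, r) pairs where an allowable stratum a observed in the
    standard population has no eligible member of social group r."""
    standard_strata = {a for (r, a, t) in records if t == standard_flag}
    observed = {(a, r) for (r, a, t) in records}
    return sorted((a, r) for a in standard_strata for r in groups
                  if (a, r) not in observed)

# Hypothetical eligible persons at one time k: (R, allowable stratum A_k, T).
# Here the marginalized group (R = 1) is taken as the standard population (T = 1).
eligible = [
    (1, "40-59", 1), (1, "60+", 1), (1, "18-39", 1),
    (0, "40-59", 0), (0, "60+", 0),
]
violations = overlap_violations(eligible, standard_flag=1, groups=[0, 1])
```

In this made-up example the stratum "18-39" appears in the standard population but has no eligible member of the privileged group, so balancing on age by sampling would not be possible without coarsening or restricting that stratum.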

Enrollment Process (for Design 1)

At any time k, each person enrolls (once) into one study through multiple stages of sampling.¹⁰ Limiting participation to a single enrollment in a single study per unit of calendar time maps inference to well-defined populations at each unit of calendar time. There is a pre-stage where eligible individuals are selected, a first stage that addresses the contribution of selective mechanisms to disparity $ψ_{k}$ , and a second stage that balances allowable covariates $A_{k}$ in the final sample. For any target study design $D$ , at each stage $ℓ$ , a person is selected via a known sampling fraction $α_{k, ℓ}^{D}$ defined as the ratio of sample $S_{k, ℓ}$ 's size $N_{k, ℓ}$ to the sampling frame $S_{k, ℓ - 1}$ 's size $N_{k, ℓ - 1}$ (Lohr 2022). The sampling fractions $α_{k, ℓ}^{D} (v_{k})$ may be stratified by and vary across covariate levels $V_{k} = v_{k}$ .

We assume that the sampling process is innocuous with respect to the outcome:

$$f_{Y}^{\text{sampled}}(Y_{k + J} \mid Q_{k} = 1, V_{k} = v_{k}, k) = f_{Y}^{\text{unsampled}}(Y_{k + J} \mid Q_{k} = 1, V_{k} = v_{k}, k) \tag{4}$$

where $f(\cdot)$ is the conditional probability mass function for discrete outcomes (conditional density for continuous outcomes), for all k.

In words, the conditional distribution of the outcome given that persons are eligible and have covariate values $V_{k} = v_{k}$ is the same for those enrolled and those not enrolled. Sampling does not affect the outcome. A person is randomly selected (without replacement) using a probability equal to their rescaled sampling fraction $α_{k, ℓ}^{*, D} (v)$ that is bounded between zero and one.¹¹ We draw a uniformly distributed random number bounded between zero and one (i.e., $U [0, 1]$ ) and, if it is equal to or less than a person's rescaled sampling fraction, they are included (Sunter 1977). The sampling process is separate for each social group $R = r$ .
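The uniform-draw inclusion rule described above can be sketched as follows. This is a simplified illustration rather than the authors' procedure: the rescaled sampling fractions are assumed to be supplied by a hypothetical `fraction` function, the realized sample size is random under this simple rule, and the frame and seed are made up.

```python
import random

def bernoulli_sample(frame, fraction, rng):
    """Include each person whose uniform draw is <= their sampling fraction.

    `fraction` is a hypothetical function returning a person's rescaled
    sampling fraction, which must lie between zero and one.
    """
    selected = []
    for person in frame:
        p = fraction(person)
        if not 0.0 <= p <= 1.0:
            raise ValueError("sampling fraction must lie in [0, 1]")
        if rng.random() <= p:
            selected.append(person)
    return selected

# Sampling is carried out separately within each social group R = r.
frame_by_group = {1: ["a", "b", "c", "d"], 0: ["e", "f", "g", "h"]}
rng = random.Random(2023)  # fixed seed only so the illustration is reproducible
sample_by_group = {
    r: bernoulli_sample(frame, lambda person: 0.5, rng)
    for r, frame in frame_by_group.items()
}
```

Because each person appears once in the frame and is considered once, no one can be selected twice, which matches sampling without replacement.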

The pre-stage $ℓ = 0$ is at the end of the enrollment window for time k. All eligibility criteria $W_{k}$ and all covariates $V_{k}$ used in the enrollment process are measured by this point. From the source population $P_{k} (r)$ , we select the eligible population $S_{k, 0} (r)$ of chosen size $N_{k, 0} (r)$ . Full eligibility $Q_{k}$ is defined as:

$$Q_{k} = I(W_{k} \in w) \tag{5}$$

where $w$ represents the eligible values $w_{k}$ of the eligibility-related variables $W_{k}$.
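The indicator in (5) can be sketched as a simple membership test. The variable names below are hypothetical, and representing the eligible set as one set of acceptable values per variable is an assumption made for illustration.

```python
def is_eligible(w_k, eligible_values):
    """Q_k = I(W_k in w): 1 if every eligibility-related variable takes an
    acceptable value, else 0.

    Assumption: the eligible set w is a product of per-variable sets.
    """
    return int(all(w_k[name] in allowed
                   for name, allowed in eligible_values.items()))

# Hypothetical eligibility criteria from the running example
eligible_values = {
    "prior_hypertension": {True},
    "established_care": {True},
    "eppp_enrolled": {True},
    "visit_in_window": {True},
}

person_a = {"prior_hypertension": True, "established_care": True,
            "eppp_enrolled": True, "visit_in_window": True}
person_b = dict(person_a, eppp_enrolled=False)

q_a = is_eligible(person_a, eligible_values)
q_b = is_eligible(person_b, eligible_values)
```

Here `person_a` meets every criterion and `person_b` fails only on EPPP enrollment, so only `person_a` would enter the first-stage sampling frame.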

In the first stage $ℓ = 1$ , a sample $S_{k, 1}^{D 1} (r)$ of chosen size $N_{k, 1} (r)$ is selected from the first-stage sampling frame $S_{k, 0} (r)$ of size $N_{k, 0} (r)$ (eligible persons) using the sampling fraction $α_{k, 1}^{D 1} (r)$ :

$$α_{k, 1}^{D1}(r) = \frac{N_{k, 1}(r)}{N_{k, 0}(r)} \tag{6}$$

Recall that eligible persons have been non-randomly chosen from the entire source population. They may all be selected at this stage [i.e., when $N_{k, 0}(r) = N_{k, 1}(r)$]; otherwise, $α_{k, 1}^{D1}(r)$ leads to simple random sampling of eligible persons in each social group $R = r$. Under either choice, all selective mechanisms, including that of non-generalizability and collider-stratification, contribute to the disparity estimate $ψ_{k}$.

In the second stage $ℓ = 2$ , a sample $S_{k, 2}^{D 1} (r)$ of chosen size $N_{k, 2} (r)$ is selected from the second-stage sampling frame $S_{k, 1}^{D 1} (r)$ (selected in stage one) of size $N_{k, 1} (r)$ using the sampling fraction $α_{k, 2}^{D 1} (r, a_{k})$ :

$$α_{k, 2}^{D1}(r, a_{k}) = \frac{N_{k, 2}(r)}{N_{k, 1}(r)} \times \frac{P(a_{k} \mid Q_{k} = 1, T = 1, k)}{P(a_{k} \mid Q_{k} = 1, R = r, k)} \tag{7}$$

The final sample $S_{k, 2}^{D1}$ [collapsed over R] is where disparity $ψ_{k}$ is measured. If no allowable covariates are specified, stage 2 enrolls all persons [i.e., when $N_{k, 1}(r) = N_{k, 2}(r)$] or selects by simple random sampling for each social group $R = r$. Otherwise, the sampling fractions $α_{k, 2}^{D1}(r, a_{k})$ similarly situate the social groups on the allowables $A_{k}$ according to their distribution in the standard population, denoted $T = 1$.
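To see how stratified second-stage fractions balance the allowables, consider the sketch below. It is a hypothetical illustration: the age bands and counts are invented, and `size_ratio` stands for the ratio of second-stage sample size to sampling-frame size, following the general definition of a sampling fraction given earlier in this subsection.

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of allowable strata."""
    counts = Counter(values)
    n = len(values)
    return {a: c / n for a, c in counts.items()}

def second_stage_fractions(group_strata, standard_strata, size_ratio):
    """size_ratio * P(a | standard) / P(a | group) for each stratum a found
    in the standard population (a KeyError signals an overlap violation)."""
    p_std = distribution(standard_strata)
    p_grp = distribution(group_strata)
    return {a: size_ratio * p_std[a] / p_grp[a] for a in p_std}

# Hypothetical first-stage samples: one allowable stratum (age band) per person.
marginalized = ["young"] * 60 + ["old"] * 40  # standard population (T = 1)
privileged = ["young"] * 30 + ["old"] * 70

alpha = second_stage_fractions(privileged, marginalized, size_ratio=0.4)

# Expected stratum counts and shares after sampling the privileged group
expected = {a: alpha[a] * privileged.count(a) for a in alpha}
expected_share = {a: n / sum(expected.values()) for a, n in expected.items()}
```

With these numbers the privileged group is sampled at 0.8 among the young and about 0.23 among the old, so its expected second-stage age distribution (60% young, 40% old) matches that of the standard population.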

Time Zero

Time zero indicates the temporal anchor during calendar time for the start of follow-up for the outcomes $Y_{k + j}$, where the time-scale of follow-up (i.e., time on study) is denoted as $j = 0, 1, 2, \dots, J$ (with J the end of follow-up). Time zero determines when outcomes are counted towards disparity. We propose to anchor time zero at calendar time k, for two reasons: (1) to avoid differential alignment of outcomes from the point of eligibility; (2) to avoid underestimating disparity from a relevant point in time. For example, disparities (e.g., in appropriate treatment) may occur early after enrollment (e.g., after hospital discharge for myocardial infarction). If early treatment is critical for preventing adverse outcomes (e.g., a second myocardial infarction), we want to characterize that disparity by setting time zero right after enrollment.

Follow-up and Outcome Ascertainment

We specify how outcomes are defined (e.g., incident or prevalent), what constructs are considered, how they are measured, and for how long they will be assessed. These details add precision that can aid future interventional work or policy actions to reduce disparity. For example, if our enrollment window is indexed around an incident diagnosis of hypertension, resolving disparities early on may require a focus on addressing patient knowledge, awareness, and structures that prevent adherence to a healthy diet and regular physical activity. Resolving disparities five years post-onset also involves supports to improve medication adherence and enable home-based blood-pressure monitoring, as well as resources and protocols to facilitate timely and appropriate treatment intensification by clinicians for patients with uncontrolled hypertension.

Statistical Analysis

Last, we need to specify how the data will be analyzed. We choose the scale (e.g., additive or ratio) and coding of the outcome (shortfall [e.g., uncontrolled hypertension] or gain [controlled hypertension]) for reporting disparity. For repeatedly measured outcomes or time-to-event outcomes, we also choose whether to present measures indexed at the end of follow-up (i.e., $k + J$ ) or to present a graphical summary of the disparity or of group-specific measures indexed at each time $k + j$ during follow-up (i.e., at time on study $j = 0, \dots, J$ ) such as group-specific growth curves, cumulative incidence curves, or survival curves.

If there are multiple target studies across calendar times k, we can always present trends in disparity or group-specific outcomes across calendar time k. We may also provide a summary measure $Ψ$ , which is a weighted average of calendar-specific disparity estimates $ψ_{k}$ for outcomes indexed at any point during follow-up $k + j$ (or the end of follow-up $k + J$ ), as in (2). To summarize the additive disparity $ψ_{k}^{a d d}$ using (2), we suggest the weight $γ_{k} = P (k | Q_{k} = 1, T = 1)$ which is the probability that an instance of enrollment among the standard population has calendar time k. To summarize the relative disparity $ψ_{k}^{r e l}$ , the weight $γ_{k}$ is multiplied by $μ_{k} (0)$ , the mean outcome among the enrolled privileged group at time k (Miettinen 1972). This approach standardizes the additive disparity $ψ_{k}^{a d d}$ (or the relative disparity $ψ_{k}^{r e l}$ ) to the distribution of enrollment timing among the standard population (see the subsection “Standard Distribution”), removing impacts of differential enrollment timing (see the subsection “Temporal Framing”).¹² The summary measure $Ψ$ is interpretable as a difference in standardized mean outcomes (for $ψ_{k}^{a d d}$ ) or a ratio of standardized mean outcomes (for $ψ_{k}^{r e l}$ ). Thus, one may alternatively pool instances of enrollment and weight each instance in the statistical analysis by $λ_{k} (r)$ :

λ_{k} (r) = \frac{P (k | Q_{k} = 1, T = 1)}{P (k | Q_{k} = 1, R = r)}

(8)

Such a pooled analysis permits aggregation of trends in the outcome over the target study timescale j.
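As an illustration of the pooled weights in (8), the sketch below uses hypothetical records (k, r, y) with the marginalized group $R = 1$ as the standard population and empirical proportions in place of modeled probabilities. It checks that weighting each instance of enrollment by $λ_{k} (r)$ gives the same group mean as averaging time-specific means over the standard population's distribution of enrollment timing. All names and data are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def time_dist(records, group):
    """Empirical P(k | R = group) among enrolled records (k, r, y)."""
    counts = Counter(k for k, r, _ in records if r == group)
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}

def pooled_weighted_mean(records, group, standard_group):
    """Pool enrollments in one group and weight each by lambda_k(r)."""
    p_std = time_dist(records, standard_group)
    p_grp = time_dist(records, group)
    num = den = 0.0
    for k, r, y in records:
        if r != group:
            continue
        lam = p_std.get(k, 0.0) / p_grp[k]
        num += lam * y
        den += lam
    return num / den

def standardized_mean(records, group, standard_group):
    """Average time-specific means over P(k | standard population)."""
    p_std = time_dist(records, standard_group)
    sums, counts = defaultdict(float), Counter()
    for k, r, y in records:
        if r == group:
            sums[k] += y
            counts[k] += 1
    return sum(p_std[k] * sums[k] / counts[k] for k in p_std)

# Hypothetical data: the groups enroll at different calendar times.
records = ([(1, 1, 1)] * 4 + [(1, 1, 0)] * 4 + [(2, 1, 1)] * 1 + [(2, 1, 0)] * 1 +
           [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1 + [(2, 0, 1)] * 2 + [(2, 0, 0)] * 6)

tau0 = pooled_weighted_mean(records, 0, 1)
tau1 = pooled_weighted_mean(records, 1, 1)
psi_add = tau1 - tau0
print(tau0, standardized_mean(records, 0, 1), psi_add)
```

Here the crude difference in means is 0.2, whereas the difference after balancing enrollment timing is 0.05, which shows how differential enrollment timing alone can distort a pooled comparison.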

A person may be eligible many times (e.g., they may have prior hypertension at multiple visits). Of course, under certain eligibility criteria (e.g., recent onset of hypertension) a person may only be eligible at one point in calendar time. When persons enroll in multiple studies over calendar time, this leads to correlated outcomes which can be addressed by using a stratified cluster bootstrap (Davison and Hinkley 1997; Field and Welsh 2007; Ren et al. 2010; Huang 2018) to obtain confidence intervals.
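One way to picture a stratified cluster bootstrap is sketched below, assuming hypothetical records (person, r, y) in which a person may contribute several enrollments. Persons, not records, are resampled with replacement within each social group, and a percentile interval is taken. The crude mean difference stands in for the estimator only to keep the example short; in practice the weighting or G-computation estimator would be re-fit in every resample. The function names, the number of replicates, and the seed are assumptions.

```python
import random
from collections import defaultdict

def mean_difference(records):
    """Crude additive disparity: mean(Y | R = 1) - mean(Y | R = 0)."""
    ys = {0: [], 1: []}
    for _, r, y in records:
        ys[r].append(y)
    return sum(ys[1]) / len(ys[1]) - sum(ys[0]) / len(ys[0])

def stratified_cluster_bootstrap(records, estimator, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval from resampling persons within social group."""
    rng = random.Random(seed)
    by_person = defaultdict(list)
    for rec in records:
        by_person[rec[0]].append(rec)
    strata = defaultdict(list)
    for person, recs in by_person.items():
        strata[recs[0][1]].append(person)  # social group is fixed within person
    estimates = []
    for _ in range(n_boot):
        sample = []
        for persons in strata.values():
            # Draw as many persons as the stratum holds, with replacement,
            # and carry along every record of each drawn person.
            for person in rng.choices(persons, k=len(persons)):
                sample.extend(by_person[person])
        estimates.append(estimator(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical data: persons 1 and 4 enroll twice.
records = [(1, 1, 1), (1, 1, 1), (2, 1, 0), (3, 1, 1),
           (4, 0, 0), (4, 0, 1), (5, 0, 0), (6, 0, 0)]
lower, upper = stratified_cluster_bootstrap(records, mean_difference)
print(mean_difference(records), lower, upper)
```

Resampling within social group keeps every group represented in each replicate, and resampling whole persons preserves the within-person correlation of outcomes.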

Extension of the Target Study to Address Nonrandom Sample Selection

Overview

When investigators wish to include all contributions of non-random sample selection to disparity, the target study described in the section “A Target Study Conceptual Model for Measuring Disparity” is sufficient. To address non-random sample selection, we introduce sampling strategies that allow data from the eligible population described in the previous section, denoted by $Q_{k} = 1$, to support inference about disparity that may exist in a broader population (Design 2), a different population (Design 3), or a counterfactual population (Design 4) in which collider stratification from selecting on full or partial eligibility does not occur. Designs 2 and 3 allow inference to studies where eligibility criteria are changed whereas Design 4 allows inference to target studies that, through intervention, change who is eligible. Design 4, if chosen, adds a causal element to the model in that the eligibility-related variables are intervened on, but Design 4 remains descriptive with respect to social group membership and allowables. We will see that, unlike Designs 2 and 3, Design 4 may be used when eligibility-related variables affect the outcome.

In addition to the innocuous sampling assumption (4) and variants of the overlap assumption (3), each modified sampling design relies on independence (or exchangeability) assumptions and positivity assumptions. In each design, these additional assumptions may partly depend on a set of non-allowable covariates $N_{k}$ (e.g., socioeconomic status) that are measured by the time of enrollment k and, when specified, are factored into the sampling design. Specifically, the first-stage and second-stage sampling fractions $α_{k, 1}^{D} (\cdot)$ and $α_{k, 2}^{D} (\cdot)$ may depend on the allowables $A_{k}$ and non-allowables $N_{k}$. When the target study design uses non-allowables $N_{k}$, ultimately, it does not balance them across social groups $R = r$.¹³

Designs 2 and 3 operate under the same minimal structural constraints as Design 1.¹⁴ The sampling strategy for Design 4, invoking counterfactuals, has more constraints which we discuss later. Aside from the sampling plan and aggregation over calendar time, the other elements are unchanged from Design 1.

Design 2: Sampling as if from a Broader Population (Generalizability)

As in Figure 1b, suppose that the indicator of full eligibility $Q_{k}$ (1: yes, 0: no) is based on partial eligibility $Q_{k}^{‡}$ (1: yes if has prior hypertension, established care in the health system, and a visit within k; 0: otherwise) and partial eligibility $Q_{k}^{†}$ (1: yes if enrolled in the EPPP; 0: otherwise), i.e., $Q_{k} = Q_{k}^{†} \times Q_{k}^{‡}$. Suppose we must study persons with prior hypertension, established care, and a current visit who are enrolled in the EPPP (i.e., $Q_{k} = 1$ as in Figure 2a) but want to assess disparity regardless of EPPP enrollment (i.e., a broader version of eligibility $Q_{k}^{*} = 1$ based on $Q_{k}^{‡} = 1$ alone, as in Figure 2b). For this we use first-stage sampling fractions $α_{k, 1}^{D 2} (r, a_{k}, n_{k})$ that act as if we sample the broader population defined by $Q_{k}^{‡} = 1$ alone:

α_{k, 1}^{D 2} (r, a_{k}, n_{k}) = \frac{N_{k, 1} (r)}{N_{k, 0} (r)} \times \frac{P (n_{k} | Q_{k}^{‡} = 1, R = r, a_{k}, k)}{P (n_{k} | Q_{k} = 1, R = r, a_{k}, k)} \times \frac{P (a_{k} | Q_{k}^{‡} = 1, R = r, k)}{P (a_{k} | Q_{k} = 1, R = r, k)}

(9)

Figure 2.

Venn diagrams depicting populations eligible and inferred to under Designs 2, 3, and 4 using partial eligibility indicators (labeled $Q_{k}^{‡}$, $Q_{k}^{†}$, and $Q_{k}^{≀}$ [1: yes, 0: no]) based on eligibility-related variables $W_{k}^{‡}$, $W_{k}^{†}$, and $W_{k}^{≀}$ (where $W_{k}^{†}$ affects $W_{k}^{≀}$). The first row pertains to a target study (Design 2) that (a) enrolls a fully eligible population defined by $Q_{k} = 1$, i.e., $Q_{k}^{‡} = 1$ and $Q_{k}^{†} = 1$ but (b) infers to a broader population defined by $Q_{k}^{‡} = 1$. The second row pertains to a target study (Design 3) that (c) enrolls a fully eligible population defined by $Q_{k} = 1$, i.e., $Q_{k}^{‡} = 1$ and $Q_{k}^{†} = 1$ but (d) infers to a different population defined by $Q_{k}^{‡} = 1$ and $Q_{k}^{†} = 0$. The third row pertains to a target study (Design 4) that (e) enrolls a fully eligible population defined by $Q_{k} = 1$, i.e., $Q_{k}^{‡} = 1$, $Q_{k}^{†} = 1$, and $Q_{k}^{≀} = 1$ but (f) infers to a fully eligible counterfactual population defined by $Q_{k}^{G_{k}} = 1$, i.e., $Q_{k}^{‡} = 1$, ${Q_{k}^{†}}^{G_{k}} = 1$, and ${Q_{k}^{≀}}^{G_{k}} = 1$ after an intervention $G_{k}$ that eliminates collider stratification through $W_{k}^{†}$. A target study (Design 1) may enroll and infer to the same fully eligible population defined by $Q_{k} = 1$.

$N_{k, 0} (r)$ is the size of the first-stage sampling frame $S_{k, 0} (r)$ , the source population $P_{k} (r)$ with $Q_{k} = 1$ . We use second-stage sampling fractions $α_{k, 2}^{D 2} (r, a_{k})$ to create a final sample $S_{k, 2}^{D 2} (r)$ where the allowables $A_{k}$ are balanced across groups to follow their distribution in the standard population, defined among $S_{k, 2}^{D 2}$ [collapsed over R]:

α_{k, 2}^{D 2} (r, a_{k}) = \frac{N_{k, 2} (r)}{N_{k, 1} (r)} \times \frac{P (a_{k} | Q_{k}^{‡} = 1, T = 1, k)}{P (a_{k} | Q_{k}^{‡} = 1, R = r, k)}

(10)

When the marginalized group in the broader population is the standard population, the marginalized group undergoes simple random sampling in stage 2, so that its expected outcome in the target study and in the broader population are the same (i.e., the inference for the marginalized group is purely descriptive of the marginalized group in the broader population).

The design permits inference to the broader population under an independence assumption:

Y_{k + J} ∐ Q_{k}^{†} | Q_{k}^{‡} = 1, R = r, N_{k} = n_{k}, A_{k} = a_{k}, k for all k

(11)

In words, the outcome $Y_{k + J}$ (e.g., hypertension control) must be independent of partial eligibility $Q_{k}^{†}$ (e.g., based on EPPP enrollment) among the broader population (denoted by $Q_{k}^{‡} = 1$) given social group $R = r$, the allowables $A_{k}$ (e.g., age and sex), and non-allowables $N_{k}$ (e.g., SES). This assumption would hold in Figure 1b if $W_{k}^{†}$ (e.g., EPPP enrollment) did not affect the outcome $Y_{k + J}$ (i.e., the arrow $W_{k}^{†} \to Y_{k + J}$ is absent). A positivity assumption is also required:

P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, N_{k} = n_{k}, A_{k} = a_{k}, k) > 0

(12)

for all $(a_{k}, n_{k})$ with $P (N_{k} = n_{k}, A_{k} = a_{k} | Q_{k}^{‡} = 1, R = r, k) > 0$ and all k.

For each social group $R = r$ , we note at each time k the pattern of allowable $A_{k}$ and non-allowable $N_{k}$ covariate values among the broader population (e.g., denoted only by $Q_{k}^{‡} = 1$ ). For each pattern, we must observe persons who belong to the population that our target study enrolls (i.e., persons who meet our narrower version of full eligibility $Q_{k} = 1$ based on partial eligibility indicators $Q_{k}^{†} = 1$ and $Q_{k}^{‡} = 1$ ). The overlap assumption (3) needs to hold among the broader population defined by $Q_{k}^{‡} = 1$ .¹⁵
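The positivity condition can be screened empirically. The sketch below assumes hypothetical rows (k, r, a, n, q_dagger) drawn from the broader population $Q_{k}^{‡} = 1$, where q_dagger is the indicator $Q_{k}^{†}$, and lists every observed covariate pattern that contains no enrolled person. An empty cell in a sample does not prove that the population probability is zero, so the output is a diagnostic and not a test of (12).

```python
from collections import defaultdict

def positivity_violations(rows):
    """Observed patterns (r, a, n, k) with no person having Q_dagger = 1."""
    enrolled, total = defaultdict(int), defaultdict(int)
    for k, r, a, n, q_dagger in rows:
        key = (r, a, n, k)
        total[key] += 1
        enrolled[key] += q_dagger
    return sorted(key for key in total if enrolled[key] == 0)

# Hypothetical broader population at calendar time k = 1.
rows = [
    (1, 1, "young", "low", 1),
    (1, 1, "young", "low", 0),
    (1, 1, "old", "high", 0),   # pattern never seen among EPPP enrollees
    (1, 0, "young", "low", 1),
    (1, 0, "old", "high", 1),
]
violations = positivity_violations(rows)
print(violations)
```

Any pattern returned here has no counterpart in the enrolled population $Q_{k} = 1$, so the sampling fractions for that pattern would be undefined unless covariate categories are coarsened or the target population is restricted.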

Design 3: Sampling as if from a Different Population (Transportability)

Suppose again that full eligibility $Q_{k} = 1$ is based on partial eligibility $Q_{k}^{‡} = 1$ (prior hypertension, established care, current visit) and partial eligibility $Q_{k}^{†} = 1$ (enrolled in the EPPP) as in Figure 1b. The target study again enrolls those who are fully eligible (i.e., $Q_{k} = 1$ as in Figure 2c) but we want to assess disparity for those not enrolled in the EPPP (i.e., a different version of full eligibility $Q_{k}^{* *} = 1$ based on $Q_{k}^{‡} = 1$ and $Q_{k}^{†} = 0$ , as in Figure 2d). For this we use modified first-stage sampling fractions $α_{k, 1}^{D 3} (r, a_{k}, n_{k})$ that act as if we sample the different population defined by $Q_{k}^{‡} = 1$ and $Q_{k}^{†} = 0$ :

α_{k, 1}^{D 3} (r, a_{k}, n_{k}) = \frac{N_{k, 1} (r)}{N_{k, 0} (r)} \times \frac{P (n_{k} | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, R = r, a_{k}, k)}{P (n_{k} | Q_{k} = 1, R = r, a_{k}, k)} \times \frac{P (a_{k} | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, R = r, k)}{P (a_{k} | Q_{k} = 1, R = r, k)}

(13)

$N_{k, 0} (r)$ is the size of the first-stage sampling frame $S_{k, 0} (r)$, i.e., the source population $P_{k} (r)$ with $Q_{k} = 1$ (and hence $Q_{k}^{‡} = 1$). We use second-stage sampling fractions $α_{k, 2}^{D 3} (r, a_{k})$ to create a final sample $S_{k, 2}^{D 3} (r)$ where allowables are balanced across groups to their distribution in the standard population, defined among $S_{k, 2}^{D 3}$ [collapsed over R]:

α_{k, 2}^{D 3} (r, a_{k}) = \frac{N_{k, 2} (r)}{N_{k, 1} (r)} \times \frac{P (a_{k} | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, T = 1, k)}{P (a_{k} | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, R = r, k)}

(14)

When the marginalized group in the different population is the standard population, the marginalized group undergoes simple random sampling in stage 2, so that its expected outcome in the target study and in the different population is the same (i.e., the inference for the marginalized group is purely descriptive of the marginalized group in the different population).

The design permits inference to the different population under the independence assumption (11) which, again, would hold in Figure 1b if $W_{k}^{†}$ (e.g., EPPP enrollment) did not affect the outcome $Y_{k + J}$ (e.g., hypertension control), i.e., if the arrow $W_{k}^{†} \to Y_{k + J}$ is absent. A positivity assumption is also required:

P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, N_{k} = n_{k}, A_{k} = a_{k}, k) > 0

(15)

for all $(a_{k}, n_{k})$ with $P (N_{k} = n_{k}, A_{k} = a_{k} | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, R = r, k) > 0$ and all k.

For each social group $R = r$, we note at each time k the pattern of allowable $A_{k}$ and non-allowable $N_{k}$ covariate values among the different population (e.g., denoted by $Q_{k}^{‡} = 1$ and $Q_{k}^{†} = 0$). For each pattern, we must observe persons who belong to the population our target study enrolls (i.e., who meet our narrower version of full eligibility $Q_{k} = 1$ based on partial eligibility indicators $Q_{k}^{†} = 1$ and $Q_{k}^{‡} = 1$). The overlap assumption (3) needs to hold among the different population defined by $(Q_{k}^{†} = 0, Q_{k}^{‡} = 1)$.¹⁶

Design 4: Sampling as if from a Counterfactual Population (Inference in a Selected Population)

As in Figure 1c, now we express full eligibility $Q_{k}$ (1: yes, 0: no) with finer partial eligibility indicators $Q_{k}^{‡}$ (prior hypertension, established care), $Q_{k}^{†}$ (e.g., EPPP enrollment), and $Q_{k}^{≀}$ (e.g., current visit), i.e., $Q_{k} = Q_{k}^{≀} \times Q_{k}^{†} \times Q_{k}^{‡}$, where $W_{k}^{‡}$ may affect $W_{k}^{†}$ which may affect $W_{k}^{≀}$.¹⁷ Suppose we want to infer to those with full eligibility (Figure 2e) but worry that selecting persons enrolled in the EPPP (i.e., $Q_{k}^{†} = 1$) induces collider-stratification that masks disparity. Suppose that we accept collider stratification through conditioning on other partial eligibility indicators $Q_{k}^{‡}$ (e.g., prior hypertension, established care) and conditioning on $Q_{k}^{≀}$ (e.g., current visit) as part of disparity. To avoid collider stratification through $Q_{k}^{†}$, we infer to a counterfactual population where such collider-stratification is absent (Figure 2f). Let $G_{k}$ denote an intervention to allocate¹⁸ the partial eligibility variables $W_{k}^{†}$ (e.g., EPPP enrollment) according to a distribution $g_{k} (\cdot)$ that does not simultaneously depend on (i) social group R and (ii) non-allowables $N_{k}$ (e.g., risk factors $L_{k}$) (Table 1). Let $V_{k}^{G_{k}}$ be the potential outcome of a variable $V_{k}$ under intervention $G_{k}$. Our use of a superscript to denote the intervention $G_{k}$ differs from our use of a superscript to denote a sampling design $D$. We use sampling fractions $α_{k, 1}^{D 4} (r, a_{k}, n_{k})$ that act as if we sample the counterfactual population defined by $Q_{k}^{G_{k}} = 1$:

α_{k, 1}^{D 4} (r, a_{k}, n_{k}) = \frac{N_{k, 1} (r)}{N_{k, 0} (r)} \times \frac{P (n_{k} | Q_{k}^{G_{k}} = 1, R = r, a_{k}, k)}{P (n_{k} | Q_{k} = 1, R = r, a_{k}, k)} \times \frac{P (a_{k} | Q_{k}^{G_{k}} = 1, R = r, k)}{P (a_{k} | Q_{k} = 1, R = r, k)}

(16)

Table 1.

Example Interventions to Eliminate Forms of Collider-Stratification Under Design 4.

Sub-design	Definition of the intervention $G_{k}$ to allocate $W_{k}^{†}$ according to the distribution $g_{k} (\cdot)$	The allocation strategy $g_{k} (\cdot)$ for $W_{k}^{†}$ under $G_{k}$	Distribution $q_{k} (\cdot)$ of partial eligibility $Q_{k}^{†} = 1$ under $G_{k}$
4a	$W_{k}^{†} \sim g_{k} (\cdot) = P (w_{k}^{†} | Q_{k}^{‡} = 1, k)$	Randomly assign $W_{k}^{†}$ (e.g., EPPP enrollment) by the observed probability of $W_{k}^{†} = w_{k}^{†}$ among those with $Q_{k}^{‡} = 1$ (e.g., prior hypertension) at calendar time k. This removes associations between $W_{k}^{†}$ and each of the allowables $A_{k}$ (e.g., age $X_{k}$), non-allowables $N_{k}$ (e.g., SES $L_{k}$), and social group R.	$P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, k)$
4b	$W_{k}^{†} \sim g_{k} (\cdot) = P (w_{k}^{†} | Q_{k}^{‡} = 1, R = r, a_{k}, k)$	Randomly assign $W_{k}^{†}$ (e.g., EPPP enrollment) by the observed probability of $W_{k}^{†} = w_{k}^{†}$ among those with $Q_{k}^{‡} = 1$ (e.g., prior hypertension) given social group $R = r$ and the allowables $A_{k}$ (e.g., age $X_{k}$) at calendar time k. This removes direct associations (with respect to social group R and allowables $A_{k}$) between $W_{k}^{†}$ and non-allowables $N_{k}$ (e.g., SES $L_{k}$).	$P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, a_{k}, k)$
4c	$W_{k}^{†} \sim g_{k} (\cdot) = P (w_{k}^{†} | Q_{k}^{‡} = 1, T = 1, n_{k}, a_{k}, k)$	Randomly assign $W_{k}^{†}$ (e.g., EPPP enrollment) by the observed probability of $W_{k}^{†} = w_{k}^{†}$ among those with $Q_{k}^{‡} = 1$ (e.g., prior hypertension) in the standard population $T = 1$ given the allowables $A_{k}$ (e.g., age $X_{k}$) and a set of non-allowables $N_{k}$ (e.g., SES $L_{k}$) that satisfy exchangeability (18) or (19) at calendar time k. This removes direct associations (with respect to allowables $A_{k}$ and non-allowables $N_{k}$) between $W_{k}^{†}$ and social group R.^a	$P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, T = 1, n_{k}, a_{k}, k)$

^a As explained at the end of the subsection “Design 4: Sampling as if from a Counterfactual Population (Inference in a Selected Population)” and in Footnote 22, when the non-allowables $N_{k}$ are multivariate and follow certain causal structures, Design 4c may leave residual contributions from collider stratification that would be eliminated under Designs 4a and 4b.

$N_{k, 0} (r)$ is the size of the first-stage sampling frame $S_{k, 0}^{G_{k}} (r)$ , i.e., the counterfactual source population $P_{k}^{G_{k}} (r)$ with $Q_{k}^{G_{k}} = 1$ . We use second-stage sampling fractions $α_{k, 2}^{D 4} (r, a_{k})$ to create a final sample $S_{k, 2}^{D 4} (r)$ where the allowables $A_{k}$ are balanced according to the standard distribution defined among $S_{k, 2}^{D 4}$ [collapsed over R]:

α_{k, 2}^{D 4} (r, a_{k}) = \frac{N_{k, 2} (r)}{N_{k, 1} (r)} \times \frac{P (a_{k} | Q_{k}^{G_{k}} = 1, T = 1, k)}{P (a_{k} | Q_{k}^{G_{k}} = 1, R = r, k)}

(17)

This design permits inference to the counterfactual population via an exchangeability assumption:¹⁹

(Y_{k + J}^{G_{k}}, {Q_{k}^{≀}}^{G_{k}}) ∐ Q_{k}^{†} | Q_{k}^{‡} = 1, R = r, N_{k} = n_{k}, A_{k} = a_{k}, k for all k

(18)

In words, the potential outcome $Y_{k + J}^{G_{k}}$ (e.g., hypertension control) and the potential value of partial eligibility ${Q_{k}^{≀}}^{G_{k}}$ (e.g., current visit) must be jointly independent of observed partial eligibility $Q_{k}^{†}$ (e.g., EPPP enrollment) given partial eligibility $Q_{k}^{‡} = 1$, social group $R = r$, the allowables $A_{k}$, and non-allowables $N_{k}$, i.e., no unmeasured selection-bias. Positivity (12) is required as well as consistency: the intervention $G_{k}$ returns observed values for $Y_{k + J}$ and $Q_{k}^{≀}$ when it assigns a person's observed values for $W_{k}^{†}$. The overlap assumption (3) needs to hold among the counterfactual population defined by $Q_{k}^{G_{k}} = 1$.²⁰ Now, if no partial eligibility variables occur after $W_{k}^{†}$ (i.e., if $W_{k}^{≀} = ⊘$, the empty set), exchangeability (18) simplifies to:

Y_{k + J}^{G_{k}} ∐ Q_{k}^{†} | Q_{k}^{‡} = 1, R = r, N_{k} = n_{k}, A_{k} = a_{k}, k for all k

(19)

The exchangeability assumption (18) or (19) of Design 4 can hold where the independence assumption (11) of Designs 2 and 3 fails: when $W_{k}^{†}$ affects the outcome $Y_{k + J}$, i.e., when the arrow $W_{k}^{†} \to Y_{k + J}$ is present in Figure 1c (see Figure 4). Design 4 operates under additional structural constraints compared to Designs 1, 2, and 3.²¹

Table 1 specifies interventions for $G_{k}$ (Designs 4a, 4b, and 4c) that eliminate collider stratification through partial eligibility $Q_{k}^{†}$. Design 4a makes partial eligibility $Q_{k}^{†}$ random given partial eligibility $Q_{k}^{‡} = 1$ and calendar time k. Design 4b makes partial eligibility $Q_{k}^{†}$ random with respect to non-allowables $N_{k}$ given partial eligibility $Q_{k}^{‡} = 1$, social group $R = r$, the allowables $A_{k}$, and calendar time k. Design 4c makes partial eligibility $Q_{k}^{†}$ random with respect to social group R given partial eligibility $Q_{k}^{‡} = 1$, the allowables $A_{k}$, a set of non-allowables $N_{k}$ that satisfy exchangeability (18) or (19), and calendar time k. These designs target different counterfactual populations and may return different estimates of disparity. To choose, one may consider the design's feasibility (in how $W_{k}^{†}$ is allocated) or the design's inferential utility. When there are no downstream partial eligibility-related variables (i.e., $W_{k}^{≀} = ⊘$), Design 4a generalizes to the broader population defined by $Q_{k}^{‡} = 1$ even when eligibility variables $W_{k}^{†}$ affect the outcome $Y_{k + J}$ (unlike Design 2). Under the same conditions, when the marginalized group is the standard population, Design 4c reduces to simple random sampling among the marginalized group across both stages of sampling. Then, the model fully describes the fully eligible marginalized group defined by $Q_{k} = 1$. One may also consider meaningfulness. Designs 4b and 4c do not remove all social group differences in partial eligibility $Q_{k}^{†}$, which may be a more ‘realistic’ setting for characterizing disparity in outcomes. Finally, one may also consider effectiveness. When $N_{k}$ is multivariate, under certain causal structures Design 4c, unlike Designs 4a and 4b, may leave some residual collider-stratification through conditioning on partial eligibility $Q_{k}^{†}$.²²

Modified Statistical Analysis

The procedures discussed in the subsections “Overview” and “Statistical Analysis” to aggregate results over calendar time use the distribution in the standard population implied by the design. This is the broader population under Design 2, the different population under Design 3, and the counterfactual population under Design 4.²³

Emulation of the Target Study with Secondary Data

Overview

In theory, the target study protocol could be implemented in real life to measure disparity. Often, a target study will have to be emulated through the design and analysis of secondary data. We outline data structures and estimators to emulate the target study under our motivating example of assessing racial disparity in hypertension control in a healthcare system among those with prior hypertension, established care, and a current visit who are (a) enrolled in the EPPP (Design 1); (b) enrolled or not enrolled in the EPPP (Design 2); (c) not enrolled in the EPPP (Design 3); or (d) enrolled in the EPPP under a hypothetical allocation of EPPP (Design 4). These applications are plausible when we only have outcomes $Y_{k + J}$ measured in the EPPP. The example target study protocols and emulation steps for each design are shown in Table 2. In this example, we assume multiple target studies across calendar time whose results are to be aggregated. Recall that with multiple target studies, a person may be eligible for and enroll in multiple studies. We show how the emulation simplifies with one target study. For each design $D$, we present estimators for each social group's mean outcome aggregated over calendar time k, $τ^{D} (r)$ (see the subsections “Overview”, “Statistical Analysis”, and “Modified Statistical Analysis”). The aggregated additive disparity is $Ψ^{a d d} = τ^{D} (1) - τ^{D} (0)$ and the relative disparity is $Ψ^{r e l} = τ^{D} (1) / τ^{D} (0)$.

Table 2.

Example Target Study Protocol Specification and its Emulation With Secondary Data.

	Target Study	Emulation with EMR data
Enrollment windows	Weekly over 2015	Same
Enrollment groups	Self-reported Non-Hispanic Black persons and Non-Hispanic White Persons	Same, implemented as self-reported race/ethnicity as recorded in EMR
Eligibility criteria	Established care in health system, prior diagnosis of hypertension, not pregnant, not diagnosed with ESKD, enrolled in EPPP, and current visit for primary care	Same, implemented as 2+ primary care visits in past 2 years, diagnosis of hypertension in past 2 years, not pregnant, not diagnosed with ESKD, enrolled in EPPP before current visit
Allowable covariates	Age and sex assigned at birth	Same, implemented using age and sex^a in EMR at current visit
Standard population	Black population	Same
Enrollment Process	Stratified sampling…	With pooled data (Figure 4), apply…
…Design 1 for inference in the fully eligible population	…to balance age and sex (allowable), sampling from the eligible population	…G-Computation (20) or weighting (21) using age and sex (allowable)
…Design 2: for inference in the broader population (e.g., regardless of EPPP enrollment)	…to balance age and sex (allowable), and account for comorbidity and SES (non-allowable), sampling as if from the population regardless of EPPP enrollment	…G-Computation (22) or weighting (23) using age, sex^a (allowable), and comorbidity, and SES^b (non-allowable)
…Design 3: for inference in a different population (e.g., not enrolled in EPPP)	…to balance age and sex (allowable), and account for comorbidity and SES (non-allowable), sampling as if from the population not enrolled in EPPP	…G-Computation (24) or weighting (25) using age, sex^a (allowable), and comorbidity, and SES^b (non-allowable)
…Design 4: for inference in a counterfactual population after intervention to allocate EPPP enrollment (e.g., to remove the impact of collider-stratification)	…to balance age and sex (allowable), and account for comorbidity and SES (non-allowable), sampling as if from a counterfactual eligible population after intervening to allocate EPPP enrollment	…G-Computation (28) or weighting (29) using age, sex^a (allowable), and comorbidity, and SES^b (non-allowable); or simplified versions of G-computation and weighting, i.e., (22) and (23) under Design 4a, (30) and (31) under Design 4b, or (32) and (33) under Design 4c
Time zero	Time of current visit	Same
Outcome assessment	Uncontrolled hypertension at current visit (i.e., systolic blood pressure ≥140 mm Hg or diastolic blood pressure ≥90 mm Hg)	Same
Statistical analysis	Mean difference in uncontrolled hypertension	Same
…Aggregation of results	Weighted average of results according to the distribution of calendar time enrollment in the Black population	Same

Abbreviations: EMR = Electronic Medical Records; EPPP = Electronic Patient Portal Program; ESKD = End Stage Kidney Disease; SES = Socioeconomic Status; Hg = Mercury.

^a Sex as recorded in the EMR.

^b In the EMR, SES is approximated by health insurance type and categorized CDC Social Vulnerability Index.

We present two types of estimators that, given the appropriate data structure, are used to emulate the sampling-based enrollment and aggregation. G-computation (Snowden, Rose, and Mortimer 2011), akin to model-based standardization, sequentially regresses the outcome and predicted values. Weighting (Hernán and Robins 2006), which takes a weighted average of the outcome, models membership in the social group $R = r$ , the standard population $T = 1$ and, for Designs 2 through 4, indicators of partial eligibility. G-computation estimators are usually more efficient (Ren et al. 2023) but weighting is more objective as the weights are constructed without outcome data. To construct confidence intervals that account for clustering by individual, we use a cluster bootstrap that samples each individual with replacement (Davison and Hinkley 1997; Field and Welsh 2007; Ren et al. 2010; Huang 2018). We abbreviate a weighted mean, $\sum_{i} Y_{i} ω_{i} / \sum_{i} ω_{i}$ where i represents the unit of observation, as $E [Y \times ω]$ .

Data Structure

To emulate a target study, we specify a unit of calendar time for enrollment windows (e.g., months), enrollment groups R (e.g., Black persons [marginalized $R = 1$ ] and White persons [privileged $R = 0$ ]), and full eligibility $Q_{k} = 1$ (e.g., prior hypertension, established care, and enrolled in EPPP before k, and current visit within $k$ ). For Design 1, we do not disaggregate full eligibility $Q_{k}$ into partial eligibility. For Designs 2 and 3, we distinguish partial eligibility $Q_{k}^{†}$ that we generalize or transport over (e.g., EPPP enrollment) from partial eligibility $Q_{k}^{‡}$ that we do not (e.g., prior hypertension, established care, current visit). For Design 4, we distinguish partial eligibility $Q_{k}^{†}$ allocated by intervention (e.g., EPPP enrollment), partial eligibility $Q_{k}^{≀}$ affected by the allocation (e.g., current visit), and partial eligibility $Q_{k}^{‡}$ not affected by the allocation (e.g., prior hypertension, established care). We choose allowable covariates $A_{k}$ to similarly situate social groups (e.g., age $X_{k}$ ) and the within-sample standard population, coded $T = 1$ , determining their within sample distribution (e.g., the Black group). We choose non-allowable covariates $N_{k}$ to meet assumptions of independence (11) for Designs 2 and 3 or exchangeability (18) or (19) for Design 4 (e.g., SES $L_{k}$ ).

We form a ‘long’ dataset where every row is the vector $O_{i, k} = (i, k, Q_{k}^{‡}, Q_{k}^{†}, Q_{k}^{≀}, Q_{k}, R, T, X_{k}, L_{k})$ for an individual i at month k (Figure 4). Each person i contributes one record per calendar time k. The data $O$ only include calendar times k where all social groups of interest are represented. The indicator T of membership in the standard population is constructed (e.g., if the marginalized group $R = 1$ is the standard population, we set T as equal to $R$ ). For Design 1, we subset the data $O$ to those fully eligible at time k, i.e., by $Q_{k} = 1$ . For Designs 2, 3, and 4, we subset the data $O$ to those partially eligible by $Q_{k}^{‡} = 1$ . We attach outcomes $Y_{k + J}$ at follow-up time $k + J$ to records indexed at time k.
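A minimal sketch of how such a long dataset could be assembled follows. The field names, the toy person-month inputs, the follow-up length J = 1, and the choice of the marginalized group as the standard population (so that T equals R) are all assumptions made for illustration. Full eligibility is formed as the product of the three partial indicators, following the finer decomposition used for Design 4.

```python
J = 1  # hypothetical follow-up length, in months

# Hypothetical person-month inputs:
# (i, k, R, X_k, L_k, Q_ddagger, Q_dagger, Q_wr)
raw = [
    (1, 1, 1, "old", "low", 1, 1, 1),
    (1, 2, 1, "old", "low", 1, 1, 0),
    (2, 1, 0, "young", "high", 1, 0, 1),
    (2, 2, 0, "young", "high", 1, 1, 1),
    (3, 3, 0, "old", "high", 0, 0, 1),
]
# Hypothetical outcome measurements keyed by (person, calendar month).
y_at = {(1, 2): 1, (1, 3): 0, (2, 2): 0, (2, 3): 1}

groups = {rec[2] for rec in raw}
groups_by_time = {}
for rec in raw:
    groups_by_time.setdefault(rec[1], set()).add(rec[2])

long_rows = []
for i, k, r, x, l, q_dd, q_d, q_wr in raw:
    if groups_by_time[k] != groups:
        continue  # keep only calendar times where every social group appears
    long_rows.append({
        "i": i, "k": k,
        "Q_ddagger": q_dd, "Q_dagger": q_d, "Q_wr": q_wr,
        "Q": q_dd * q_d * q_wr,      # full eligibility as the product
        "R": r, "T": r,              # assumed: R = 1 is the standard population
        "X": x, "L": l,
        "Y": y_at.get((i, k + J)),   # outcome at k + J; None if not observed
    })

design1 = [row for row in long_rows if row["Q"] == 1]
designs_2_to_4 = [row for row in long_rows if row["Q_ddagger"] == 1]
print(len(long_rows), len(design1), len(designs_2_to_4))
```

Month 3 is dropped because only one social group is observed there. Person 1 contributes two records, which is why later interval estimation has to respect clustering by individual.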

Identification and Estimation for Design 1 (Default Model)

Under overlap (3) and innocuous sampling (4), we can identify the aggregated mean $τ^{D 1} (r)$ as:

E_{a, k} (E [Y_{k + J} | Q_{k} = 1, R = r, a_{k}, k] | Q_{k} = 1, T = 1)

(20)

where $E_{a, k} (\cdot)$ is over $P (a_{k}, k | Q_{k} = 1, T = 1)$.

To estimate (20) by G-computation, in step 1 we fit a model $η_{1}^{D 1} (a_{k}, k)$ for the outcome $Y_{k + J}$ (e.g., hypertension control) given the allowables $A_{k}$ (e.g., age) and calendar time k (e.g., months) among those fully eligible $Q_{k} = 1$ (e.g., prior hypertension, established care, current visit, enrolled in EPPP) in the social group $R = r$ . In step 2, we obtain predicted values $p_{1}^{D 1}$ from the model $η_{1}^{D 1} (a_{k}, k)$ on those fully eligible $Q_{k} = 1$ . In step 3, we average $p_{1}^{D 1}$ in the pooled, fully eligible $Q_{k} = 1$ standard population $T = 1$ to estimate the aggregated mean $τ^{D 1} (r)$ (conditionally on calendar time k for time-specific means, $μ_{k}^{D 1} (r)$ ).
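The three steps can be sketched with a saturated outcome model, that is, stratum-specific means over a discrete allowable and calendar time. This is an assumption made to keep the example dependency-free; a regression model would normally take its place. The rows (k, r, a, y) are hypothetical, all are taken to be fully eligible, and $R = 1$ is assumed to be the standard population.

```python
from collections import defaultdict

def stratum_means(pairs):
    """Saturated 'model': the mean of the values within each key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def g_computation_design1(rows, group, standard_group):
    # Step 1: model Y given (a, k) among the fully eligible in group r.
    eta1 = stratum_means(((a, k), y) for k, r, a, y in rows if r == group)
    # Step 2: predict for the rows averaged in step 3. A KeyError here
    # means a standard-population stratum is unobserved in group r,
    # i.e., overlap fails in the sample.
    p1 = [eta1[(a, k)] for k, r, a, _ in rows if r == standard_group]
    # Step 3: average the predictions over the pooled standard population.
    return sum(p1) / len(p1)

def stratum(k, r, a, ones, size):
    """Helper building `size` hypothetical rows, `ones` of them with y = 1."""
    return [(k, r, a, 1)] * ones + [(k, r, a, 0)] * (size - ones)

rows = (stratum(1, 1, "young", 2, 4) + stratum(1, 1, "old", 3, 4) +
        stratum(2, 1, "old", 1, 2) +
        stratum(1, 0, "young", 1, 2) + stratum(1, 0, "old", 1, 4) +
        stratum(2, 0, "old", 1, 4))

tau1 = g_computation_design1(rows, 1, 1)
tau0 = g_computation_design1(rows, 0, 1)
psi_add = tau1 - tau0
print(tau1, tau0, psi_add)
```

With these toy data the crude mean in the privileged group is 0.30, while its mean standardized to the standard population's joint distribution of allowables and calendar time is 0.35.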

We may also estimate the aggregated mean outcome $τ^{D 1} (r)$ by the weighting estimator:

E [Y_{k + J} \times ω_{r, Q_{k} = 1, k}^{D 1 - p o o l} | Q_{k} = 1, R = r]

(21)

where

ω_{r, Q_{k} = 1, k}^{D 1 - p o o l} = \frac{P (T = 1 | Q_{k} = 1, a_{k}, k)}{P (R = r | Q_{k} = 1, a_{k}, k)} \times \frac{P (R = r | Q_{k} = 1)}{P (T = 1 | Q_{k} = 1)}

The first term of the weight is the ratio of (i) the probability of belonging to the standard population $T = 1$ among those fully eligible $Q_{k} = 1$ , conditional on the allowables $A_{k}$ and calendar time k, to (ii) the corresponding probability of belonging to the person's observed social group $R = r$ . The second term of the weight is the inverse of this ratio but with unconditional probabilities (with respect to $A_{k}, k$ ). The probabilities in the first term may be estimated by predictions from models for $T = 1$ and $R = r$ , and the probabilities in the second term are estimated directly. A weighted average of the outcome $Y_{k + J}$ (e.g., hypertension control) in the fully eligible $Q_{k} = 1$ social group $R = r$ estimates the aggregated mean $τ^{D 1} (r)$ . For time-specific means, $μ_{k}^{D 1} (r)$ , all terms condition on calendar time k.
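The weight for (21) can be sketched with empirical proportions standing in for model-based predictions. This Python sketch is illustrative only: the toy data and the function name `weighted_mean_design1` are hypothetical, and a real analysis would typically estimate the conditional probabilities from fitted models.

```python
from collections import Counter

# Hypothetical fully eligible (Q_k = 1) records: (R, a, k, y).
# The marginalized group R = 1 is the standard population, so T = R.
data = [
    (1, "old", 1, 1), (1, "old", 1, 0), (1, "young", 1, 1), (1, "old", 2, 0),
    (0, "old", 1, 1), (0, "young", 1, 1), (0, "young", 1, 1),
    (0, "old", 2, 1), (0, "old", 2, 0),
]


def weighted_mean_design1(data, r, standard=1):
    """Weighted mean of Y in group r using the weight described for (21)."""
    n_cell = Counter((a, k) for R, a, k, y in data)
    n_cell_T = Counter((a, k) for R, a, k, y in data if R == standard)
    n_cell_r = Counter((a, k) for R, a, k, y in data if R == r)
    n = len(data)
    n_T = sum(n_cell_T.values())
    n_r = sum(n_cell_r.values())
    total = 0.0
    for R, a, k, y in data:
        if R != r:
            continue
        cell = (a, k)
        p_T_cond = n_cell_T[cell] / n_cell[cell]  # P(T = 1 | Q = 1, a, k)
        p_r_cond = n_cell_r[cell] / n_cell[cell]  # P(R = r | Q = 1, a, k)
        # First term: conditional ratio; second term: inverse, unconditional.
        weight = (p_T_cond / p_r_cond) * ((n_r / n) / (n_T / n))
        total += y * weight
    # E[Y x weight | Q = 1, R = r] is a plain average within group r.
    return total / n_r


tau_1_w = weighted_mean_design1(data, r=1)
tau_0_w = weighted_mean_design1(data, r=0)
```

With saturated (nonparametric) probabilities, the weighted estimate for a given group agrees with what cell-mean G-computation would give on the same data. That agreement is a property of this toy setup, not a general guarantee under parametric models.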

Identification and Estimation for Design 2 (Generalizability)

Under a version of overlap (3) (see footnote 16), innocuous sampling (4), independence (11), and positivity (12), we can identify the aggregated mean $τ^{D 2} (r)$ of the social group in the broader population $Q_{k}^{‡} = 1$ (e.g., prior hypertension, established care in the health system, regardless of EPPP enrollment) as:

E_{a, k} [[E_{n} (E [Y_{k + J} | Q_{k} = 1, R = r, n_{k}, a_{k}, k] | Q_{k}^{‡} = 1, R = r, a_{k}, k) | Q_{k}^{‡} = 1, T = 1]]

(22)

where $E_{n} (\cdot)$ is over $P (n_{k} | Q_{k}^{‡} = 1, R = r, a_{k}, k)$

and $E_{a, k} [[\cdot]]$ is over $P (a_{k}, k | Q_{k}^{‡} = 1, T = 1)$

To estimate (22) by G-computation, in step 1 we fit a model $η_{1}^{D 2} (n_{k}, a_{k}, k)$ for the outcome $Y_{k + J}$ (e.g., hypertension control) given the allowables $A_{k}$ (e.g., age), non-allowables $N_{k}$ (e.g., SES), and calendar time k (e.g., months) among the fully eligible $Q_{k} = 1$ (e.g., prior hypertension, established care, EPPP enrollment, current visit) social group $R = r$ . In step 2, we obtain predicted values $p_{1}^{D 2}$ from the model $η_{1}^{D 2} (n_{k}, a_{k}, k)$ in the broader population $Q_{k}^{‡} = 1$ (e.g., regardless of EPPP enrollment). In step 3, we fit a model $η_{2}^{D 2} (a_{k}, k)$ for the predicted values $p_{1}^{D 2}$ given the allowables $A_{k}$ and calendar time k on the broader $Q_{k}^{‡} = 1$ social group $R = r$ . In step 4, we obtain predicted values $p_{2}^{D 2}$ from the model $η_{2}^{D 2} (a_{k}, k)$ on the broader population $Q_{k}^{‡} = 1$ . In step 5, we average $p_{2}^{D 2}$ in the pooled broader $Q_{k}^{‡} = 1$ standard population $T = 1$ to estimate the aggregated mean $τ^{D 2} (r)$ (conditionally on calendar time k for time-specific means, $μ_{k}^{D 2} (r)$ ).
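The five steps can be sketched with saturated cell-mean models on discrete covariates. This Python sketch is illustrative and rests on assumptions: the toy data, the helper `cell_means`, and the function `gcomp_design2` are hypothetical, and cell means stand in for whatever regression models an analyst would actually fit.

```python
from collections import defaultdict

# Hypothetical records for the broader population (Q_ddag = 1):
# (R, a, n, k, Q_dag, y); y is None when Q_dag = 0 (not fully eligible).
# The marginalized group R = 1 is the standard population, so T = R.
data = [
    (1, "old", "low", 1, 1, 0), (1, "old", "low", 1, 0, None),
    (1, "old", "high", 1, 1, 1), (1, "young", "high", 1, 1, 1),
    (0, "old", "low", 1, 1, 1), (0, "old", "high", 1, 1, 1),
    (0, "old", "high", 1, 0, None), (0, "old", "high", 1, 1, 0),
    (0, "young", "low", 1, 1, 1), (0, "young", "low", 1, 0, None),
]


def cell_means(pairs):
    """Saturated 'model': the mean of the values observed in each cell."""
    acc = defaultdict(list)
    for cell, value in pairs:
        acc[cell].append(value)
    return {cell: sum(v) / len(v) for cell, v in acc.items()}


def gcomp_design2(data, r, standard=1):
    # Step 1: outcome model given (n, a, k) among the fully eligible in R = r.
    eta1 = cell_means(((n, a, k), y) for R, a, n, k, q, y in data
                      if R == r and q == 1)
    # Step 2: predictions p1 for the broader group R = r
    # (a KeyError would signal a positivity violation).
    broader_r = [(a, n, k) for R, a, n, k, q, y in data if R == r]
    # Step 3: model for p1 given (a, k) in the broader group R = r.
    eta2 = cell_means(((a, k), eta1[(n, a, k)]) for a, n, k in broader_r)
    # Steps 4 and 5: predictions p2, averaged over the broader standard
    # population T = 1.
    p2 = [eta2[(a, k)] for R, a, n, k, q, y in data if R == standard]
    return sum(p2) / len(p2)


tau2_1 = gcomp_design2(data, r=1)
tau2_0 = gcomp_design2(data, r=0)
```

The second model (step 3) averages the first-stage predictions over the non-allowables within levels of the allowables, which is what keeps the non-allowables distributed as they are in each social group.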

We may also estimate the aggregated mean outcome $τ^{D 2} (r)$ by the weighting estimator:

E [Y_{k + J} \times ω_{r, Q_{k} = 1, k}^{D 2 - p o o l} | Q_{k} = 1, R = r]

(23)

where

ω_{r, Q_{k} = 1, k}^{D 2 - p o o l} = \frac{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r)}{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k)} \times \frac{P (T = 1 | Q_{k}^{‡} = 1, a_{k}, k)}{P (R = r | Q_{k}^{‡} = 1, a_{k}, k)} \times \frac{P (R = r | Q_{k}^{‡} = 1)}{P (T = 1 | Q_{k}^{‡} = 1)}

The second and third terms are similar to (21) but are defined by the broader $Q_{k}^{‡} = 1$ population (e.g., prior hypertension, established care, current visit) rather than by those fully eligible $Q_{k} = 1$ (e.g., also enrolled in the EPPP). The first term is the ratio of (i) the unconditional (with respect to $A_{k}, N_{k}, k)$ probability of partial eligibility $Q_{k}^{†} = 1$ (e.g., enrolled in the EPPP) in the broader $Q_{k}^{‡} = 1$ population (e.g., prior hypertension, established care, current visit) in the social group $R = r$ , to (ii) the corresponding conditional probability given the non-allowables $N_{k}$ , allowables $A_{k}$ , and calendar time k. The denominator is estimated by predictions from a model for $Q_{k}^{†} = 1$ and the numerator is estimated directly. A weighted average of the outcome $Y_{k + J}$ (e.g., hypertension control) in the fully eligible $Q_{k} = 1$ social group $R = r$ estimates the aggregate mean $τ^{D 2} (r)$ . For time-specific means, $μ_{k}^{D 2} (r)$ , all terms condition on calendar time k.
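The three-term weight for (23) can be sketched with empirical proportions. This is a hedged Python illustration: the toy data and the function `weighted_mean_design2` are hypothetical, and empirical cell proportions stand in for predictions from a fitted model for $Q_{k}^{†} = 1$ .

```python
from collections import Counter

# Hypothetical records for the broader population (Q_ddag = 1):
# (R, a, n, k, Q_dag, y); y is None when Q_dag = 0 (not fully eligible).
# The marginalized group R = 1 is the standard population, so T = R.
data = [
    (1, "old", "low", 1, 1, 0), (1, "old", "low", 1, 0, None),
    (1, "old", "high", 1, 1, 1), (1, "young", "high", 1, 1, 1),
    (0, "old", "low", 1, 1, 1), (0, "old", "high", 1, 1, 1),
    (0, "old", "high", 1, 0, None), (0, "old", "high", 1, 1, 0),
    (0, "young", "low", 1, 1, 1), (0, "young", "low", 1, 0, None),
]


def weighted_mean_design2(data, r, standard=1):
    rows_r = [row for row in data if row[0] == r]
    rows_T = [row for row in data if row[0] == standard]
    # Counts for P(Q_dag = 1 | Q_ddag = 1, R = r, n, a, k).
    n_nak = Counter((n, a, k) for R, a, n, k, q, y in rows_r)
    n_nak_q = Counter((n, a, k) for R, a, n, k, q, y in rows_r if q == 1)
    # P(Q_dag = 1 | Q_ddag = 1, R = r), estimated directly.
    p_q = sum(n_nak_q.values()) / len(rows_r)
    # Counts by (a, k) in the broader population, by group.
    n_ak_r = Counter((a, k) for R, a, n, k, q, y in rows_r)
    n_ak_T = Counter((a, k) for R, a, n, k, q, y in rows_T)
    total, count = 0.0, 0
    for R, a, n, k, q, y in rows_r:
        if q != 1:
            continue  # the weighted average is taken among the fully eligible
        term1 = p_q / (n_nak_q[(n, a, k)] / n_nak[(n, a, k)])
        # P(T = 1 | a, k) / P(R = r | a, k): the common denominator cancels.
        term2 = n_ak_T[(a, k)] / n_ak_r[(a, k)]
        term3 = len(rows_r) / len(rows_T)  # P(R = r) / P(T = 1)
        total += y * term1 * term2 * term3
        count += 1
    return total / count


tau2_1_w = weighted_mean_design2(data, r=1)
tau2_0_w = weighted_mean_design2(data, r=0)
```

The first term upweights fully eligible persons whose covariate profile is under-represented among the partially eligible $Q_{k}^{†} = 1$ , which is how the fully eligible sample is made to resemble the broader population.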

Identification and Estimation for Design 3 (Transportability)

Under a version of overlap (3) (see footnote 17), innocuous sampling (4), independence (11), and positivity (15), we identify the aggregated mean $τ^{D 3} (r)$ of the social group in the different population ( $Q_{k}^{†} = 0,$ $Q_{k}^{‡} = 1$ ) (e.g., prior hypertension, established care, current visit, but not enrolled in the EPPP) as:

\begin{aligned} E_{a, k} [[E_{n} (E [Y_{k + J} | Q_{k} = 1, R = r, n_{k}, a_{k}, k] | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, \\ R = r, a_{k}, k) | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, T = 1]] \end{aligned}

(24)

where $E_{n} (\cdot)$ is over $P (n_{k} | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, R = r, a_{k}, k)$

and $E_{a, k} [[\cdot]]$ is over $P (a_{k}, k | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, T = 1)$

To estimate (24) by G-computation, in step 1 we fit a model $η_{1}^{D 3} (n_{k}, a_{k}, k)$ for the outcome $Y_{k + J}$ (e.g., hypertension control) given the allowables $A_{k}$ (e.g., age), non-allowables $N_{k}$ (e.g., SES), and calendar time k (e.g., months) among the fully eligible $Q_{k} = 1$ (e.g., prior hypertension, established care, current visit, enrolled in EPPP) social group $R = r$ . In step 2, we obtain predicted values $p_{1}^{D 3}$ from the model $η_{1}^{D 3} (n_{k}, a_{k}, k)$ in the different population ( $Q_{k}^{†} = 0, Q_{k}^{‡} = 1)$ (e.g., not enrolled in EPPP). In step 3, we fit a model $η_{2}^{D 3} (a_{k}, k)$ for the predicted values $p_{1}^{D 3}$ given the allowables $A_{k}$ and calendar time k among the different ( $Q_{k}^{†} = 0, Q_{k}^{‡} = 1)$ social group $R = r$ . In step 4, we obtain predicted values $p_{2}^{D 3}$ from the model $η_{2}^{D 3} (a_{k}, k)$ on the different population ( $Q_{k}^{†} = 0, Q_{k}^{‡} = 1)$ . In step 5, we average $p_{2}^{D 3}$ in the pooled different ( $Q_{k}^{†} = 0, Q_{k}^{‡} = 1)$ standard population $T = 1$ to estimate the aggregated mean $τ^{D 3} (r)$ (conditionally on calendar time k for time-specific means, $μ_{k}^{D 3} (r)$ ).

We may also estimate the aggregated mean outcome $τ^{D 3} (r)$ by the weighting estimator:

E [Y_{k + J} \times ω_{r, Q_{k} = 1, k}^{D 3 - p o o l} | Q_{k} = 1, R = r]

(25)

where

ω_{r, Q_{k} = 1, k}^{D 3 - p o o l} = \frac{P (Q_{k}^{†} = 0 | Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k)}{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k)} \times \frac{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r)}{P (Q_{k}^{†} = 0 | Q_{k}^{‡} = 1, R = r)}

\times \frac{P (T = 1 | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, a_{k}, k)}{P (R = r | Q_{k}^{†} = 0, Q_{k}^{‡} = 1, a_{k}, k)} \times \frac{P (R = r | Q_{k}^{†} = 0, Q_{k}^{‡} = 1)}{P (T = 1 | Q_{k}^{†} = 0, Q_{k}^{‡} = 1)}

The third and fourth terms of the weight are similar to those used in (21), except that they are among the different population ( $Q_{k}^{†} = 0, Q_{k}^{‡} = 1)$ (e.g., prior hypertension, established care, current visit, but not enrolled in EPPP) rather than those fully eligible $Q_{k} = 1$ (e.g., also enrolled in the EPPP). The first term of the weight is equivalent to the inverse odds of being fully eligible ( $Q_{k}^{†} = 1, Q_{k}^{‡} = 1)$ versus in the different population ( $Q_{k}^{†} = 0, Q_{k}^{‡} = 1)$ , conditional on social group $R = r$ , the non-allowables $N_{k}$ , allowables $A_{k}$ , and calendar time k. It is estimated by fitting a model for partial eligibility $Q_{k}^{†} = 1$ , making predictions, obtaining their complement, and taking the ratio of the complement to the prediction. The second term is the odds but is unconditional (with respect to $A_{k}, N_{k}, k)$ and is estimated directly. A weighted average of the outcome $Y_{k + J}$ (e.g., hypertension control) among the fully eligible $Q_{k} = 1$ social group $R = r$ estimates the aggregated mean $τ^{D 3} (r)$ . For calendar-time specific means, $μ_{k}^{D 3} (r)$ , the probabilities in the second and fourth terms of the weight condition on calendar time k.
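The inverse odds weight for (25) can be sketched with empirical proportions. This Python sketch is illustrative only: the toy data and the function `weighted_mean_design3` are hypothetical, and empirical proportions replace model-based predictions.

```python
from collections import Counter

# Hypothetical records with Q_ddag = 1: (R, a, n, k, Q_dag, y).
# y is None when Q_dag = 0. The marginalized group R = 1 is the standard
# population, so T = R. The target is the population with Q_dag = 0.
data = [
    (1, "old", "low", 1, 1, 0), (1, "old", "low", 1, 0, None),
    (1, "old", "high", 1, 1, 1), (1, "young", "high", 1, 1, 1),
    (0, "old", "low", 1, 1, 1), (0, "old", "high", 1, 1, 1),
    (0, "old", "high", 1, 0, None), (0, "old", "high", 1, 1, 0),
    (0, "young", "low", 1, 1, 1), (0, "young", "low", 1, 0, None),
]


def weighted_mean_design3(data, r, standard=1):
    rows_r = [row for row in data if row[0] == r]
    # Counts by (n, a, k) within group r, split by Q_dag.
    n1 = Counter((n, a, k) for R, a, n, k, q, y in rows_r if q == 1)
    n0 = Counter((n, a, k) for R, a, n, k, q, y in rows_r if q == 0)
    # Second term: unconditional odds of Q_dag = 1 versus 0 within group r.
    odds = sum(n1.values()) / sum(n0.values())
    # Counts by (a, k) in the different population (Q_dag = 0), by group.
    m_r = Counter((a, k) for R, a, n, k, q, y in data if R == r and q == 0)
    m_T = Counter((a, k) for R, a, n, k, q, y in data
                  if R == standard and q == 0)
    # Fourth term: P(R = r | Q_dag = 0) / P(T = 1 | Q_dag = 0).
    ratio = sum(m_r.values()) / sum(m_T.values())
    total, count = 0.0, 0
    for R, a, n, k, q, y in rows_r:
        if q != 1:
            continue  # the weighted average is taken among the fully eligible
        count += 1
        # First term: conditional inverse odds, (1 - p) / p.
        inv_odds = n0[(n, a, k)] / n1[(n, a, k)]
        if inv_odds == 0:
            continue  # profile absent from the target population: weight 0
        # Third term: P(T = 1 | Q_dag = 0, a, k) / P(R = r | Q_dag = 0, a, k).
        third = m_T[(a, k)] / m_r[(a, k)]
        total += y * inv_odds * odds * third * ratio
    return total / count


tau3_1 = weighted_mean_design3(data, r=1)
tau3_0 = weighted_mean_design3(data, r=0)
```

Fully eligible persons whose covariate profile never occurs among the non-enrolled receive zero weight, so the weighted sample reflects only profiles present in the different population.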

Identification and Estimation for Design 4 (Inference in a Counterfactual Selected Population)

Identification and estimation under Design 4 uses special weights $ϕ_{k}$ and $θ_{k}^{p o o l}$ that generally are:

ϕ_{k} = \frac{q_{k} (\cdot) P (Q_{k}^{≀} = 1 | Q_{k}^{†} = 1, Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k)}{E_{n} [q_{k} (\cdot) P (Q_{k}^{≀} = 1 | Q_{k}^{†} = 1, Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k) | Q_{k}^{‡} = 1, R = r, a_{k}, k]}

(26)

where $E_{n} [\cdot]$ is over $P (n_{k} | Q_{k}^{‡} = 1, R = r, a_{k}, k)$

\begin{aligned} θ_{k}^{p o o l} \\ = \frac{E_{n} [E_{r} \{q_{k} (\cdot) | Q_{k}^{‡} = 1, T = 1, n_{k}, a_{k}, k\} P (Q_{k}^{≀} = 1 | Q_{k}^{†} = 1, Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k) | Q_{k}^{‡} = 1, T = 1, a_{k}, k]}{E_{(n, a, k)} [E_{r} \{q_{k} (\cdot) | Q_{k}^{‡} = 1, T = 1, n_{k}, a_{k}, k\} P (Q_{k}^{≀} = 1 | Q_{k}^{†} = 1, Q_{k}^{‡} = 1, T = 1, n_{k}, a_{k}, k) | Q_{k}^{‡} = 1, T = 1]} \end{aligned}

(27)

where $E_{r} \{\cdot\}$ is over $P (r | Q_{k}^{‡} = 1, T = 1, n_{k}, a_{k}, k)$ ,

$E_{n} [\cdot]$ is over $P (n_{k} | Q_{k}^{‡} = 1, T = 1, a_{k}, k)$ ,

and $E_{(n, a, k)} [\cdot]$ is over $P (n_{k}, a_{k}, k | Q_{k}^{‡} = 1, T = 1)$

In (26) and (27) $q_{k} (\cdot)$ is the probability of being partially eligible $Q_{k}^{†} = 1$ (e.g., enrolled in EPPP) in the counterfactual source population given partial eligibility $Q_{k}^{‡} = 1$ (e.g., prior hypertension, established care), social group $R = r$ , allowables $A_{k}$ , non-allowables $N_{k}$ and calendar time k which, under the interventional distribution $q_{k} (\cdot)$ for the Designs 4a, 4b, and 4c, is shown in the third column of Table 1. The term $q_{k} (\cdot)$ can thus be estimated as the predicted value from an appropriate model for partial eligibility $Q_{k}^{†} = 1$ .²⁴

The weights $ϕ_{k}$ and $θ_{k}^{p o o l}$ account for how non-random selection on $Q {_{k}^{≀}}^{G_{k}}$ affects the distribution of non-allowables $N_{k}$ and allowables $A_{k}$ , as $W_{k}^{≀}$ (e.g., current visit) may be affected by $W_{k}^{†}$ (e.g., EPPP enrollment), the allowables $A_{k}$ , and the non-allowables $N_{k}$ (e.g., as in Figure 1c and Figure 3). They incorporate a product $ρ_{k}$ of two parts (i) and (ii). The first part (i) is the conditional probability of partial eligibility $Q_{k}^{≀} = 1$ (e.g., current visit) given other indicators of partial eligibility $Q_{k}^{†} = 1$ (e.g., EPPP enrollment) and $Q_{k}^{‡} = 1$ (e.g., prior hypertension, established care), non-allowables $N_{k}$ , allowables $A_{k}$ , calendar time k, and membership in the social group $R = r$ (in the case of $ϕ_{k}$ ) or the standard population $T = 1$ (in the case of $θ_{k}^{p o o l}$ ). The second part (ii) is either (ii-a) $q_{k} (\cdot)$ as described above (in the case of $ϕ_{k}$ ) or (ii-b) $q_{k} (\cdot)$ standardized over the conditional distribution of social group $R = r$ within the standard population $T = 1$ (in the case of $θ_{k}^{p o o l}$ ).²⁵ For $ϕ_{k}$ , the numerator is this product $ρ_{k}$ and the denominator standardizes $ρ_{k}$ over the conditional distribution of the non-allowables $N_{k}$ . For $θ_{k}^{p o o l}$ , the numerator standardizes $ρ_{k}$ over the conditional distribution of the non-allowables $N_{k}$ , while the denominator further standardizes $ρ_{k}$ over the conditional joint distribution of the allowables $A_{k}$ and calendar time k. Both $ϕ_{k}$ and $θ_{k}^{p o o l}$ may be estimated by G-computation (see sample code in the Supplementary Material).

Figure 3.

Single World Intervention Graph (Richardson and Robins 2013) depicting Design 4a (a) without intervention and (b) with intervention $G_{k}$ to set the partial eligibility variable $W_{k}^{†}$ [e.g., enrollment in an electronic patient portal program (EPPP)] according to a random draw. $Y_{k + J}$ is the outcome (e.g., hypertension control) at time $k + J$ , R represents social group membership (e.g., race), and $X_{k}$ (e.g., age) and $L_{k}$ (e.g., SES) are covariates that may be deemed allowable $A_{k}$ or non-allowable $N_{k}$ . $Q_{k}^{†}$ is the partial eligibility indicator (1: yes, 0: no) for the partial eligibility variable $W_{k}^{†}$ (e.g., EPPP enrollment). $W_{k}^{≀}$ represents another partial eligibility variable (e.g., current visit) affected by $W_{k}^{†}$ , with its partial eligibility indicator $Q_{k}^{≀}$ (1: yes, 0: no). For simplicity, the historical process variable H and the partial eligibility variables $W_{k}^{‡}$ (e.g., prior hypertension, established care in health system) and their indicator $Q_{k}^{‡}$ that appear in Figure 1c are omitted but, if included, would inherit their causal relationships from Figure 1c. Note that in (a) independence (11) $Y_{k + J} ∐ Q_{k}^{†} | X_{k}, L_{k}, R$ does not hold because $W_{k}^{†}$ affects $Y_{k + J}$ . However, in (b) exchangeability (18) $(Y_{k + J}^{G_{k}}, Q {_{k}^{≀}}^{G_{k}}) ∐ Q_{k}^{†} | X_{k}, L_{k}, R$ does hold. Note also that (i) though $W {_{k}^{†}}^{G_{k}}$ is randomly assigned, $(X_{k}, L_{k})$ are not independent of $Q {_{k}^{†}}^{G_{k}}$ given $Q {_{k}^{≀}}^{G_{k}}$ , and (ii) there is no collider stratification between R and $Y_{k + J}^{G_{k}}$ from conditioning on $Q {_{k}^{†}}^{G_{k}}$ .

Figure 4.

Example data structure to emulate the target study $(i, k, Q_{k}^{‡}, Q_{k}^{†}, Q_{k}^{≀}, R, T,$ $X_{k}, L_{k}, Y_{k + J})$ where i is a person identifier, k represents the coarsened moment of calendar time, R represents social group membership, T is an indicator of membership in the standard population, and $X_{k}, L_{k}$ are the covariates from which we select allowable covariates $A_{k}$ (for all designs) and, if needed, non-allowable covariates $N_{k}$ (for Designs 2, 3, and 4 that address non-random sample selection). $Q_{k}$ represents full eligibility, $Q_{k}^{‡}$ and $Q_{k}^{†}$ are partial eligibility indicators for Designs 2 and 3, $Q_{k}^{≀}$ is an additional partial eligibility indicator for Design 4, and $Y_{k + J}$ is the outcome. Assuming Design 4, Person 1 is evaluated for seven studies and eligible (i.e., $Q_{k} = 1$ ) for two (at $k = 4, 10$ ). Person 2 is evaluated for three studies and eligible for one (at $k = 7$ ). For Design 1, we subset the data to those fully eligible $Q_{k} = 1$ . For Designs 2, 3, and 4, we subset the data to those partially eligible by $Q_{k}^{‡} = 1$ .

Under a version of overlap (3) (see footnote 21), innocuous sampling (4), exchangeability (18), positivity (12), and consistency, we identify the aggregated mean $τ^{D 4} (r)$ of the social group in the fully eligible counterfactual population $Q_{k}^{G_{k}} = 1$ (e.g., prior hypertension, established care, enrolled in EPPP, current visit) after an intervention $G_{k}$ on partial eligibility-related variables $W_{k}^{†}$ (e.g., EPPP enrollment) as:

\begin{aligned} E_{a, k} [[θ_{k}^{p o o l} E_{n} (ϕ_{k} E [Y_{k + J} | Q_{k} = 1, R = r, n_{k}, a_{k}, k] | Q_{k}^{‡} = 1, \\ R = r, a_{k}, k) | Q_{k}^{‡} = 1, T = 1]] \end{aligned}

(28)

where $E_{n} (\cdot)$ is over $P (n_{k} | Q_{k}^{G_{k}} = 1, R = r, a_{k}, k)$ , identified by $ϕ_{k} P (n_{k} | Q_{k}^{‡} = 1, R = r, a_{k}, k)$ , and $E_{a, k} [[\cdot]]$ is over $P (a_{k}, k | Q_{k}^{G_{k}} = 1, T = 1)$ , identified by $θ_{k}^{p o o l} P (a_{k}, k | Q_{k}^{‡} = 1, T = 1)$ .

To estimate (28) by G-computation, it suffices to follow the same procedure as for Design 2 (see the subsection “Identification and Estimation for Design 2 (Generalizability)”) with two slight changes: weight the model in step 3 by $ϕ_{k}$ , and use $θ_{k}^{p o o l}$ as the weight for a weighted average in step 5. For calendar-time specific means, we condition the last expectation and all terms in $θ_{k}^{p o o l}$ on k.

We may also estimate the aggregated mean outcome $τ^{D 4} (r)$ by the weighting estimator:

E [Y_{k + J} \times ω_{r, Q_{k} = 1, k}^{D 4 - p o o l} | Q_{k} = 1, R = r]

(29)

where

ω_{r, Q_{k} = 1, k}^{D 4 - p o o l} = ϕ_{k} \times θ_{k}^{p o o l} \times \frac{P (Q_{k}^{≀} = 1 | Q_{k}^{†} = 1, Q_{k}^{‡} = 1, R = r)}{P (Q_{k}^{≀} = 1 | Q_{k}^{†} = 1, Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k)} \times \frac{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r)}{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k)} \times \frac{P (T = 1 | Q_{k}^{‡} = 1, a_{k}, k)}{P (R = r | Q_{k}^{‡} = 1, a_{k}, k)} \times \frac{P (R = r | Q_{k}^{‡} = 1)}{P (T = 1 | Q_{k}^{‡} = 1)}

The first two terms are $ϕ_{k}$ and $θ_{k}^{p o o l}$ . The third term is the ratio of the unconditional probability (with respect to $A_{k}, N_{k}, k)$ of being partially eligible $Q_{k}^{≀} = 1$ (e.g., current visit) given other indicators of partial eligibility $Q_{k}^{†} = 1$ (e.g., EPPP enrollment) and $Q_{k}^{‡} = 1$ (e.g., prior hypertension, established care), and social group $R = r$ , to the corresponding conditional probability given the non-allowables $N_{k}$ , allowables $A_{k}$ , and calendar time k. The remaining terms are identical to the expressions for the weight (23) used in Design 2. For calendar-time specific means, we condition all terms in the weight and all terms in $θ_{k}^{p o o l}$ on k.

Identification and Estimation of Design 4 Under Simplifying Conditions

Emulation of Design 4 simplifies greatly with no indicators of partial eligibility $Q_{k}^{≀}$ affected by an intervention on the partial eligibility-related variable $W_{k}^{†}$ . All terms for $Q_{k}^{≀}$ in $ϕ_{k}$ (26), in $θ_{k}^{p o o l}$ (27), and in the estimators (28) and (29) disappear. Under Design 4a (i.e., randomly assign $W_{k}^{†}$ ; Table 1), the term $q_{k} (\cdot)$ cancels and the estimators (28) and (29) reduce to those of Design 2, i.e., (22) and (23). Then, Design 4a generalizes the results $Ψ$ to those partially eligible by $Q_{k}^{‡} = 1$ even when $W_{k}^{†}$ affects the outcome.
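The cancellation under Design 4a can be made explicit with a short derivation. It is a sketch under a stated assumption: Table 1 is not reproduced in this section, so we take "randomly assign" to mean that $q_{k} (\cdot)$ equals a constant $c$ that does not vary with $r$ , $n_{k}$ , $a_{k}$ , or $k$ .

```latex
% Assumptions (for illustration): no Q_k^{\wr} terms, and q_k(\cdot) = c.
\begin{aligned}
\phi_{k}
  &= \frac{q_{k}(\cdot)}
          {E_{n}\left[ q_{k}(\cdot) \mid Q_{k}^{\ddagger} = 1, R = r, a_{k}, k \right]}
   = \frac{c}{c} = 1, \\
\theta_{k}^{pool}
  &= \frac{E_{n}\left[ E_{r}\{ q_{k}(\cdot) \mid Q_{k}^{\ddagger} = 1, T = 1, n_{k}, a_{k}, k \}
           \mid Q_{k}^{\ddagger} = 1, T = 1, a_{k}, k \right]}
          {E_{(n, a, k)}\left[ E_{r}\{ q_{k}(\cdot) \mid Q_{k}^{\ddagger} = 1, T = 1, n_{k}, a_{k}, k \}
           \mid Q_{k}^{\ddagger} = 1, T = 1 \right]}
   = \frac{c}{c} = 1 .
\end{aligned}
% Substituting \phi_k = \theta_k^{pool} = 1 into (28) gives the expression in (22);
% in the weight for (29), the Q_k^{\wr} ratio drops out and the remaining
% factors are those of the weight in (23).
```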

Under Design 4b, (i.e., randomly assign $W_{k}^{†}$ given $A_{k}, R = r$ , and $k$ ; Table 1), the identifying expression for $τ^{D 4 b} (r)$ behind the G-computation estimator reduces to:

E_{a, k} [[E_{n} (E [Y_{k + J} | Q_{k} = 1, R = r, n_{k}, a_{k}, k] | Q_{k}^{‡} = 1, R = r, a_{k}, k) | Q_{k} = 1, T = 1]]

(30)

where $E_{n} (\cdot)$ is over $P (n_{k} | Q_{k}^{‡} = 1, R = r, a_{k}, k)$ and $E_{a, k} [[\cdot]]$ is over $P (a_{k}, k | Q_{k} = 1, T = 1)$

The weighting estimator reduces to:

E [Y_{k + J} \times ω_{r, Q_{k} = 1, k}^{D 4 b - p o o l} | Q_{k} = 1, R = r]

(31)

where

ω_{r, Q_{k} = 1, k}^{D 4 b - p o o l} = \frac{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, a_{k}, k)}{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k)} \times \frac{P (T = 1 | Q_{k} = 1, a_{k}, k)}{P (R = r | Q_{k} = 1, a_{k}, k)} \times \frac{P (R = r | Q_{k} = 1)}{P (T = 1 | Q_{k} = 1)}

Then, Design 4b addresses non-random selection while retaining social group differences in eligibility. Note that if the chosen allowables $A_{k}$ are sufficient to satisfy independence (11) or exchangeability (18), so that no non-allowables $N_{k}$ are needed, the estimators (30) and (31) reduce to (20) and (21) of Design 1.

Under Design 4c (i.e., randomly assign $W_{k}^{†}$ given $A_{k}$ , $N_{k}$ , and $k$ as in $T = 1$ ; Table 1), $θ_{k}^{p o o l}$ reduces to one, and the identifying expression for $τ^{D 4 c} (r)$ behind the G-computation estimator reduces to:

E_{a, k} [[E_{n} (ϕ_{k}^{c} E [Y_{k + J} | Q_{k} = 1, R = r, n_{k}, a_{k}, k] | Q_{k}^{‡} = 1, R = r, a_{k}, k) | Q_{k} = 1, T = 1]]

(32)

where $E_{n} (\cdot)$ is over $P (n_{k} | Q_{k}^{G_{k}} = 1, R = r, a_{k}, k)$ , identified by $ϕ_{k}^{c} P (n_{k} | Q_{k}^{‡} = 1, R = r, a_{k}, k)$ , $E_{a, k} [[\cdot]]$ is over $P (a_{k}, k | Q_{k} = 1, T = 1)$ , and

ϕ_{k}^{c} = \frac{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, T = 1, n_{k}, a_{k}, k)}{E_{n} [P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, T = 1, n_{k}, a_{k}, k) | Q_{k}^{‡} = 1, R = r, a_{k}, k]}

with $E_{n} [\cdot]$ over $P (n_{k} | Q_{k}^{‡} = 1, R = r, a_{k}, k)$ .

The weighting estimator reduces to:

E [Y_{k + J} \times ω_{r, Q_{k} = 1, k}^{D 4 c - p o o l} | Q_{k} = 1, R = r]

(33)

where

ω_{r, Q_{k} = 1, k}^{D 4 c - p o o l} = ϕ_{k}^{c} \times \frac{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, a_{k}, k)}{P (Q_{k}^{†} = 1 | Q_{k}^{‡} = 1, R = r, n_{k}, a_{k}, k)} \times \frac{P (T = 1 | Q_{k} = 1, a_{k}, k)}{P (R = r | Q_{k} = 1, a_{k}, k)} \times \frac{P (R = r | Q_{k} = 1)}{P (T = 1 | Q_{k} = 1)}

Estimation under Design 4c is thus similar to that of Design 4b except with $ϕ_{k}^{c}$ factored in. If the marginalized group is the standard population, its aggregated mean outcome (32) reduces to its pooled mean $E [Y_{k + J} | Q_{k} = 1, R = 1]$ and its weight (33) reduces to one. Then, the target study model is purely descriptive of the fully eligible $Q_{k} = 1$ marginalized group (i.e., its expected outcome in the target study and in the fully eligible population are the same), even when addressing non-random sample selection. It does so by making the selection process of the privileged group match that of the marginalized group.

Contributions and Comparison to Existing Literature

Target Trial Emulation

Our target study model is inspired by the trial emulation framework (Hernán and Robins 2016) used to evaluate causal effects of treatment strategies. Our work differs in that (i) there are no treatment groups, only observed social groups; (ii) we balance allowable covariates by design (sampling) rather than balance confounders by intervention (randomization); (iii) under multiple studies across calendar time, we enforce one person per unit of time, all social groups compared must be represented at each unit of time, and our aggregated estimators adjust for calendar time enrollment by balancing it across groups, without any assumption that disparity is homogeneous over time. Our summary estimate is interpretable as a weighted average of disparity measures for populations indexed at different points in calendar time, and it avoids potential bias due to differential timing of enrollment by social groups. Our model can incorporate treatment strategies and within-group randomization post sampling to evaluate causal effects of treatment strategies on disparity (Jackson et al. 2024).

Sampling as a Conceptual Model

Tipton (2013), Westreich et al. (2017), and Dahabreh et al. (2021), use sampling designs of target populations to generalize or transport results from randomized trials. VanderWeele (2020) proposes simple random sampling to define causal effects of observing social groups to measure inequality. Lundberg (2022) uses simple random sampling to critique inference about causal effects on inequality in a superpopulation. Moreno-Betancur (2021), in a commentary on Jackson (2021), mentions enrollment in a target trial with balanced allowables to motivate causal inference with somewhat vague interventions. Our model uses stratified sampling of an eligible population to construct a sample with desirable properties (i.e., distributions of covariates that reflect populations of interest to address non-random sample selection, balance of allowable covariates across social groups) to measure disparity. Our formal presentation discusses the sampling designs and emulation procedures in considerable detail.

Alternative Conceptual Models of Disparity

Other conceptual models that employ allowability are widely used to define disparity in healthcare or used to audit or improve algorithmic fairness. We outline these models and then compare them to our model. We ignore issues of non-random sample selection to focus on core differences between the models. Each of these models, including our own, presumes that the constructs measured by allowables have the same meaning for each social group (i.e., that there is no differential interpretation or measurement error).

Cook et al. (2009) frame disparity as the difference in healthcare utilization outcomes unexplained by differences in the allowable covariates, a generalization of the Oaxaca-Blinder Decomposition (Blinder 1973; Oaxaca 1973) originally used to measure labor discrimination.²⁶ Under this interpretation, they define an IOM concordant disparity that compares outcomes between an observed marginalized group $R = 1$ and a counterfactual privileged group $R = 0$ with certain distributional properties. Its joint distribution of the allowables $A$ and non-allowables $N$ , $f_{A, N}^{*} (A, N | R = 0)$ , must: (a) return $f_{A} (A | R = 1)$ , the factual marginal distribution of $A$ among the marginalized group $R = 1$ , when integrated over $N$ ; and (b) return $f_{N} (N | R = 0)$ , the factual marginal distribution of $N$ among the privileged group $R = 0$ , when integrated over $A$ . Criterion (a) ensures balance of the allowables $A$ . Criterion (b) picks up the mediating role of $N$ leading to differences in healthcare utilization in the overall population. They avoid strict causal assumptions by not placing further constraints on $f_{A, N}^{*} (A, N | R = 0)$ . They adapt a ‘Rank and Replace’ procedure (McGuire et al. 2006) to implement this model by producing a counterfactual privileged population satisfying criteria (a) and (b).

Duan et al. (2008) agree with the decomposition perspective but argue that Cook et al.’s (2009) criteria and ‘Rank and Replace’ procedure lead to implausible populations not relevant for policymaking. With a factual marginalized group, a decomposition involves an intervention to assign the privileged group the allowable $A$ distribution of the marginalized group, and such an intervention will impact the non-allowable $N$ distribution when $A$ causes $N$ . This yields a causal decomposition, one that intervenes on the allowables among the privileged group and compares the observed marginalized group $R = 1$ to a counterfactual privileged group $R = 0$ where the joint distribution of $A$ and $N$ depends on the causal relationship between them. When $N$ causes $A$ (Figure 5a), the joint distribution is $f_{N} (N | R = 0) \times f_{A | N} (A | R = 1, N)$ , their ‘marginal framework’, where they assign $A$ within levels of $N$ . When $A$ causes $N$ (Figure 5b), the joint distribution is instead $f_{N | A} (N | R = 0, A) \times f_{A} (A | R = 1)$ , their ‘conditional framework’, where they assign $A$ irrespective of $N$ . To implement the model, they propose a density ratio weighting procedure. The model relies on ‘nature preserving assumptions’ that allow the counterfactuals to be mapped to data, wherein the intervention on $A$ does not change how the outcome is conditionally distributed. The model does not extend to settings where $N$ and $A$ cause each other over time (Figure 5c).

Figure 5.

Directed acyclic graphs depicting causal relations between historical processes H, social group R, a set of allowable covariates $A$ , a set of nonallowable covariates $N$ , and a decision-based outcome D. In (a), the nonallowables affect the allowables. In (b) the allowables affect the nonallowables. In (c) there is causal feedback between allowables $A$ and non-allowables $N$ over time.

Many authors (Pearl 2001; Zhang, Wu, and Wu 2016; Kilbertus et al. 2017; Nabi and Shpitser 2018; Zhang and Bareinboim 2018; Chiappa 2019; Weinberger 2022) define discrimination as a direct effect of assigning social group membership R (or its perception) on an outcome D made by a decider.²⁷ The direct effect does not occur through allowable covariates $A$ appropriate for decision-making, capturing inappropriate causal paths.²⁸ In Figure 5a and b, discrimination is reflected by the direct path set: $R \to D$ and $R \to N \to D$ (again, we say ‘direct’ as the path set avoids $A$ ). In Figure 5c, discrimination is reflected by the path set: $R \to D$ , $R \to N_{1} \to D$ , $R \to N_{2} \to D$ , and $R \to N_{1} \to N_{2} \to D$ . A direct effect can be defined by potential outcomes $D^{r, A^{r}}$ under interventions that jointly assign social group R and allowables $A$ . For exposition, consider the direct effect defined among the marginalized group, where persons' social group R is set to marginalized $r = 1$ versus privileged $r = 0$ but the allowables $A$ are held to what they would be under marginalized status, i.e., $A^{r = 1}$ (to measure discrimination):

E [D^{r = 1, A^{r = 1}} | R = 1] - E [D^{r = 0, A^{r = 1}} | R = 1]

Under consistency and composition assumptions (VanderWeele and Vansteelandt 2009), the expression $E [D^{r = 1, A^{r = 1}} | R = 1]$ is identified as the marginalized group's factual mean, $E [D | R = 1]$ . Under Figure 5b, if the graph includes all confounders of R's effects on $A$ and D (e.g., $H$ ), all confounders of $A$ 's effect on D, and no confounder is affected by R (i.e., the recanting witness criterion holds; Chen, Shpitser, and Pearl 2005; Shpitser 2013), the ‘cross-world’ expression $E [D^{r = 0, A^{r = 1}} | R = 1]$ is identified as:

\sum_{h, a, n} E [D | R = 0, n, a, h] P (n | R = 0, a, h) P (a, h | R = 1)

The expression shows a joint distribution $f_{N | A, H} (N | R = 0, A, H) \times f_{A, H} (A, H | R = 1)$ where the confounder H is treated as allowable.²⁹ Under Figure 5a and c, the direct effect is not identified because the non-allowables $N$ confound the effect of the allowables $A$ on the outcome D but are affected by social group R.

Our model defines disparity by comparing groups who are (distributionally) similarly situated (i.e., balanced) on the allowables $A$ by design. We argued in the subsection “Is a Causal Framing of Disparity Necessary?” that this design is IOM concordant.³⁰ Our model's estimand is also interpretable as a standardized measure, where a disparity metric is calculated for each level of the allowables $A = a$ and these metrics are standardized to a common distribution. In contrast, the Cook et al. (2009), Duan et al. (2008), and direct effect models decompose either a crude difference or a total effect into a portion that captures unjust differences (e.g., disparity, discrimination) and another that does not.³¹ Direct effects are defined by nuanced interventions that either act in a ‘cross-world’ sense (Andrews and Didelez 2021), act on distinct mechanisms that stem from assigning social group R (Robins, Richardson, and Shpitser 2021), or change how information flows from assigning R (Díaz 2023).
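The standardized-measure interpretation can be written out for the default model. The display below is a sketch under assumptions not stated in the text: the disparity metric is taken to be a difference in means, and the allowables and calendar time are treated as discrete so that the outer expectation in (20) is a sum.

```latex
% Assumptions (for illustration): difference metric; discrete (a_k, k).
\begin{aligned}
\tau^{D1}(1) - \tau^{D1}(0)
  = \sum_{a_{k}, k}
    \Big\{ & E\left[ Y_{k+J} \mid Q_{k} = 1, R = 1, a_{k}, k \right]
          - E\left[ Y_{k+J} \mid Q_{k} = 1, R = 0, a_{k}, k \right] \Big\} \\
    & \times P\left( a_{k}, k \mid Q_{k} = 1, T = 1 \right)
\end{aligned}
% Each term in braces is a stratum-specific disparity; the weights
% P(a_k, k | Q_k = 1, T = 1) are the common (standard) distribution.
```

This follows from (20) by linearity of expectation, since both group-specific means are averaged over the same standard distribution.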

Our default model, Design 1, assumes overlap (3) of allowables $A$ ³² and does not specify H (i.e., determinants of R as in Figure 5) or non-allowables $N$ . That is, unlike the Cook et al. (2009), Duan et al. (2008), and direct effect models, our model does not require investigators to understand all the non-allowable factors that lead to the outcome. If we do express H and $N$ and choose the marginalized group as the standard population, our model would compare the observed outcomes of the marginalized group to a privileged group with a joint distribution $f_{N, H | A} (N, H | R = 0, A) \times f_{A} (A | R = 1)$ for $A$ , $N$ , and H (i.e., H is treated as if it were non-allowable). Our default model is agnostic about causal structure. Our model is identified under Figure 5a to c, whereas the Duan et al. (2008) model does not cover Figure 5c, and the direct effect model is not identified in Figure 5a or c. Our model also assumes that the sampling process does not impact the data-generating process (4). The Duan et al. (2008) and direct effect models assume that their interventions leave aspects (e.g., the conditional outcome distribution) of the data generation process intact. This assumption is very demanding given how society and social groups are currently structured (Kohler-Hausmann 2019; Jackson and Arah 2020).

Selection Bias (Including Generalizability and Transportability)

Nonrandom sample selection can bias the causal effect of assigning perceived group membership (e.g., a measure of discrimination) (Greiner and Rubin 2011; Malinsky, Shpitser, and Richardson 2019; Knox, Lowe, and Mummolo 2020; Gaebler et al. 2022; Stensrud et al. 2022). It also impacts a descriptive measure of inequality (VanderWeele and Robinson 2014). We underscored that nonrandomly selected populations can be inherently meaningful and that collider stratification may affect baseline covariates to further disadvantage a marginalized group on outcomes, aligning with the Healthy People 2020 and NIMHD definitions of disparity (see the subsections “Defining Disparity” and “Non-Random Sample Selection”). We provided Design 1 for use in these settings.

We argued that non-random sample selection may be addressed whenever it (a) limits the ability to infer to a population of interest or (b) masks disparity through collider-stratification. We provided Designs 2 and 3 for use in setting (a), and Design 4 for use in setting (b). Design 4 may be contrasted with Design 1 to understand how much non-random sample selection impacts disparity. Designs 2 and 3 use assumptions similar to those used to generalize or transport descriptive measures and the results of randomized trials (Degtiar and Rose 2023). When no allowables are specified, the combined sample (collapsed over social group R) is the standard population, and there is only one study (i.e., indexed at a single moment of calendar time), the identifying expressions of Designs 2 and 3 reduce to those of Bareinboim, Tian, and Pearl (2014), the weighting estimators for Design 2 reduce to inverse selection weights of Cole and Stuart (2010), the weighting estimators for Design 3 reduce to the inverse odds sampling weights of Westreich et al. (2017), and the G-computation estimators for Design 2 reduce to those of Lesko et al. (2017) and also Dahabreh et al. (2020).
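To make the special case above concrete, here is a small Python sketch of the two familiar weights it mentions: inverse probability of selection weights (generalizing to the full population that gave rise to the sample) and inverse odds of sampling weights (transporting to those not sampled). It is an illustration only, not the estimators developed in this article. The stratum counts, selection probabilities, and stratum means are made-up numbers, the selection probabilities are treated as known, and the outcome is assumed independent of selection given the covariate.

```python
def weighted_mean(strata, weight_fn):
    """Weighted mean of the outcome among sampled units.

    strata: list of dicts, one per covariate stratum, with keys
      n_sampled (number sampled), p_select (P(S = 1 | X)), and
      mean_y (mean outcome among the sampled in that stratum).
    weight_fn: maps p_select to the weight given to each sampled unit.
    """
    num = sum(s["n_sampled"] * weight_fn(s["p_select"]) * s["mean_y"] for s in strata)
    den = sum(s["n_sampled"] * weight_fn(s["p_select"]) for s in strata)
    return num / den


def inverse_probability(p):
    # Weight 1 / P(S = 1 | X): the sampled stand in for sampled plus non-sampled.
    return 1.0 / p


def inverse_odds(p):
    # Weight P(S = 0 | X) / P(S = 1 | X): the sampled stand in for the non-sampled.
    return (1.0 - p) / p


# Hypothetical population of 200: 100 per stratum, selected with probability 0.8 or 0.2.
strata = [
    {"n_sampled": 80, "p_select": 0.8, "mean_y": 0.3},
    {"n_sampled": 20, "p_select": 0.2, "mean_y": 0.6},
]

naive = weighted_mean(strata, lambda p: 1.0)              # about 0.36, the sample mean
generalized = weighted_mean(strata, inverse_probability)  # about 0.45, the full population
transported = weighted_mean(strata, inverse_odds)         # about 0.54, the non-sampled
```

The three numbers differ only because the sampled units over-represent the first stratum. In practice the selection probabilities would be estimated, and the designs in this article additionally balance allowables and calendar time across social groups.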

Design 4 is a novel approach to address non-random sample selection. It envisions an intervention to allocate certain (i.e., not necessarily all) eligibility-related variables, even when other eligibility-related variables (not intervened upon) are affected by the intervention. By comparison, Dahabreh et al. (2019) envision an intervention to scale up trial participation, the last step in study enrollment. The exchangeability assumption of Design 4, like that of Dahabreh et al. (2019), holds when eligibility-related variables or study participation affect the outcome, whereas the independence assumption that Designs 2 and 3 use to generalize or transport does not. Estimation under Design 4 simplifies greatly when the intervention point is the last step in enrollment (e.g., an intervention to allocate study participation).

We focused on non-random sample selection. Missing data, loss to follow-up, and competing risks during follow-up may also bias a disparity estimate (Howe and Robinson 2018). These may be addressed in the statistical analysis of the target study and its emulation. Events before enrollment (e.g., death, or hospital discharge in a study of inpatient disparity in prognosis) may affect who (non-randomly) selects into the study sample (Rojas-Saunero, Glymour, and Mayeda 2023). If survivors at enrollment (i.e., those without the event) are not considered to be a meaningful population, an extension of Design 4 with a sustained intervention may help if the event is manipulable. We do not develop such an extension here and leave it for future work.

Discussion

We have proposed a conceptual model for measuring disparity and a framework to emulate it with secondary data. Through a sampling plan, the model similarly situates social groups on allowable covariates at baseline to map to meaningful definitions of disparity (Design 1). The model extends to address non-random sample selection in various ways to permit generalizability (Design 2), transportability (Design 3), or inference in a selected population without inducing undesirable forms of collider-stratification (Design 4). We motivated Design 4 for when collider stratification due to non-random sample selection attenuates disparity, but it is difficult to know when this will occur (Nguyen, Dafoe, and Ogburn 2019). Investigators may emulate both Designs 1 and 4 to report how selective mechanisms may contribute to or attenuate disparity. Unlike existing models used to measure disparity, our model involves no intervention to assign social group membership and no intervention to manipulate the allowable covariates. Only under Design 4 does the model intervene on variables used to establish eligibility. Under Designs 1‒3, the model can recover the crude expected outcomes of a social group (e.g., the marginalized group) by using it to determine the standard distribution of the sampling plan. This is also true of Designs 4a and 4c under simplifying conditions (see the subsection “Identification and Estimation of Design 4 Under Simplifying Conditions”). We have described data structures and provided weighting and G-computation estimators to emulate the model in complex data. Our model and emulation procedures avoid bias due to differential enrollment over calendar time without invoking any assumption that disparity is homogeneous over time. The summary estimates are weighted averages of estimates for populations at points in calendar time. In the Supplementary Material, we provide sample code, a data application to electronic medical records, and proofs of all results.
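The point about calendar time can be illustrated with a short Python sketch. It is a simplified illustration rather than the estimators provided in the Supplementary Material. The enrollment counts and time-specific means are made-up numbers, the time-specific means are assumed to be already balanced on allowables, and taking the marginalized group's enrollment distribution as the standard is an assumption made only for this example.

```python
def standardize_over_time(means_by_time, time_weights):
    """Weighted average of time-specific means under a common time distribution."""
    return sum(time_weights[t] * means_by_time[t] for t in time_weights)


def pooled_mean(counts_by_time, means_by_time):
    """Crude mean that pools all enrollees regardless of enrollment time."""
    total = sum(counts_by_time.values())
    return sum(counts_by_time[t] * means_by_time[t] for t in counts_by_time) / total


# Hypothetical enrollment counts and mean outcomes by calendar period and group R.
n = {1: {"t1": 10, "t2": 30}, 0: {"t1": 30, "t2": 10}}
mean_y = {1: {"t1": 0.4, "t2": 0.6}, 0: {"t1": 0.5, "t2": 0.7}}

# Standard distribution of enrollment time: here, that of group R = 1.
total_r1 = sum(n[1].values())
time_weights = {t: n[1][t] / total_r1 for t in n[1]}

time_specific = {t: mean_y[1][t] - mean_y[0][t] for t in time_weights}
crude = pooled_mean(n[1], mean_y[1]) - pooled_mean(n[0], mean_y[0])
summary = (standardize_over_time(mean_y[1], time_weights)
           - standardize_over_time(mean_y[0], time_weights))
```

Here the disparity is the same within each period, yet the crude pooled difference is approximately zero because the two groups enrolled at different times and outcomes drift over time. The summary estimate equals the weighted average of the time-specific differences and does not require the disparity to be constant over time.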

Our model has translational value for advancing public health and clinical medicine. First, it relies on minimal assumptions and is therefore a practical measure for advancing social justice. Second, its features (eligibility, time zero, follow-up, and outcome definition) accommodate specific populations during critical life stages, and these are aspects that actual interventions must consider in practice. Third, it maps to definitions of disparity that have strong moral foundations and have long guided public health action. Fourth, it can be extended to evaluate causal effects of (i) hypothetical interventions to inform future interventions (Jackson 2021) and (ii) actual interventions in (non)randomized trials (Jackson et al. 2024).

Our model also has conceptual value. Our Designs 1‒3 are grounded in the observed world. They pick up the realized effects of unjust mechanisms as they operate in this world. Mechanisms of injustice are exquisitely complex, inter-dependent, mutually constituted, and dynamically reinforcing (Reskin 2012). Causal approaches that leverage observational data assume that the way outcomes are conditionally distributed in the factual world will be unchanged in the counterfactual world created by hypothetical interventions, ignoring this complexity (Jackson and Arah 2020). Our Designs 1‒3 capture the impact of this complexity as observed without specifying how this complexity works or assuming it away. Our Design 4 invokes a consistency assumption, though, and is subject to this limitation.

Supplemental Material

Supplemental material, sj-pdf-1-smr-10.1177_00491241251314037 for The Target Study: A Conceptual Model and Framework for Measuring Disparity by John W. Jackson, Yea-Jen Hsu, Raquel C. Greer, Romsai T. Boonyasai and Chanelle J. Howe in Sociological Methods & Research

Footnotes

Author Contributions

Dr. Jackson conceived of the work, developed the formal results, carried out the data application, and drafted the initial and revised manuscripts. Dr. Hsu constructed the analytic cohort for the data application. Drs. Jackson, Hsu, Greer, and Boonyasai oversaw the construction of the analytic cohort and data application. Drs. Hsu, Greer, Boonyasai, and Howe critically edited the initial and revised manuscripts for scientific content.

Authors' Note

The data application code and sample code to implement all estimators are available at: https://osf.io/ta7vw/ (Open Science Framework) and (GitHub).

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: This work does not necessarily represent the views or opinions of the Agency for Healthcare Research and Quality. Dr. Howe has received funding via a grant from Sanofi Pasteur administered directly to Brown University (unrelated to the current work).

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dr. Jackson was supported by a grant from the National Heart, Lung, and Blood Institute (K01HL145320).

Data Availability Statement

This manuscript used data from electronic patient medical records of patients seen within a large healthcare system. To protect patient privacy and to comply with HIPAA, we are unable to share or post the data with third parties for re-analysis.

Supplemental Material

Supplemental material (i.e., data application and proofs) for this article is available online.

Notes

Author Biographies

Dr. John W. Jackson, Sc.D. is an Associate Professor in the Departments of Epidemiology, Biostatistics, and Mental Health at the Johns Hopkins Bloomberg School of Public Health, and core faculty in the Johns Hopkins Center for Health Equity, Center for Health Disparities Solutions, and Center for Drug Safety and Effectiveness. His research focuses on developing methods for translational health equity research, including methods to define and measure health disparities, to identify high-leverage targets and strategies for interventions that address health disparities, and to evaluate effects of interventions. His work is funded by the National Heart, Lung, and Blood Institute, the Robert E. Meyerhoff Foundation, as well as by pilot funding from Johns Hopkins University.

Dr. Yea-Jen Hsu, Ph.D., is an Associate Research Scientist in the Department of Health Policy and Management at the Johns Hopkins Bloomberg School of Public Health. She specializes in health services research, program evaluation, and the application of implementation science theories to healthcare improvement, with a current focus on patient safety and quality improvement with innovative care and evaluation models.

Dr. Raquel C. Greer, M.D., M.H.S. is an Adjunct Associate Professor in the Department of Medicine at the Johns Hopkins School of Medicine. Her research focuses on identifying and testing strategies to reduce and eliminate health disparities and advance health equity for people with kidney disease.

Dr. Romsai Tony Boonyasai, M.D., M.P.H. is a Physician Advisor in the Division of Quality Measurement and Improvement at the Agency for Healthcare Research and Quality, a branch of the U.S. Department of Health and Human Services, and an Associate Professor (part-time) in the Department of Medicine at the Johns Hopkins School of Medicine. His research interests include the development of quality-of-care measures and the implementation of systems-based interventions to improve population health and advance health equity.

Dr. Chanelle J. Howe, Ph.D. is an Associate Professor in the Department of Epidemiology within the Brown University School of Public Health. She has an appointment with Brown's Center for Epidemiologic Research, is a member of the Providence/Boston Center for AIDS Research, and is a faculty associate with Brown's Population Studies and Training Center. Her research interests include methods, infectious diseases, and health disparities. She also serves as an Editor for the American Journal of Epidemiology.

References

Andrews, R. M.; Didelez. 2021. “Insights into the Cross-World Independence Assumption of Causal Mediation Analysis.” Epidemiology 32(2):209-19. doi:https://doi.org/10.1097/ede.0000000000001313

Bareinboim; Tian, Jin; Pearl. 2014. “Recovering from Selection Bias in Causal and Statistical Inference.” Pp. 2410-16 in Twenty-Eighth AAAI Conference on Artificial Intelligence, Vol. 28.

Blinder, Alan S. 1973. “Wage Discrimination: Reduced Form and Structural Estimates.” The Journal of Human Resources 8(4):436-55. doi:https://doi.org/10.2307/144855

Braveman. 2006. “Health Disparities and Health Equity: Concepts and Measurement.” Annual Review of Public Health 27:167-94. doi:https://doi.org/10.1146/annurev.publhealth.27.021405.102103

Braveman, P. A.; Kumanyika; Fielding; Laveist; Borrell, L. N.; Manderscheid; Troutman. 2011. “Health Disparities and Health Equity: The Issue Is Justice.” American Journal of Public Health 101(Suppl. 1):S149-55. doi:https://doi.org/10.2105/ajph.2010.300062

Chen, Avin; Shpitser; Pearl. 2005. “Identifiability of Path-Specific Effects.” Paper presented at the Proceedings of the 19th International Joint Conference on Artificial Intelligence, Edinburgh, Scotland.

Chiappa, Silvia. 2019. “Path-Specific Counterfactual Fairness.” Proceedings of the AAAI Conference on Artificial Intelligence 33(1):7801-08. doi:https://doi.org/10.1609/aaai.v33i01.33017801

Civil Rights Act of 1964, Public Law 88-352, 78, (1964).

Civil Rights Act of 1991, Public Law 102-166, 105, (1991).

Cole, S. R.; Stuart, E. A. 2010. “Generalizing Evidence from Randomized Clinical Trials to Target Populations: The ACTG 320 Trial.” American Journal of Epidemiology 172(1):107-15. doi:https://doi.org/10.1093/aje/kwq084

Collins, Patricia Hill; Bilge, Sirma. 2020. Intersectionality. Medford, MA: Polity Press.

Cook, B. L.; McGuire, T. G.; Meara; Zaslavsky, A. M. 2009. “Adjusting for Health Status in Non-Linear Models of Health Care Disparities.” Health Services and Outcomes Research Methodology 9(1):1-21. doi:https://doi.org/10.1007/s10742-008-0039-6

Cooper, L. A.; Hill, M. N.; Powe, N. R. 2002. “Designing and Evaluating Interventions to Eliminate Racial and Ethnic Disparities in Health Care.” Journal of General Internal Medicine 17(6):477-86. doi:https://doi.org/10.1046/j.1525-1497.2002.10633.x

Dahabreh, I. J.; Haneuse, S. J. A.; Robins, J. M.; Robertson, S. E.; Buchanan, A. L.; Stuart, E. A.; Hernán, M. A. 2021. “Study Designs for Extending Causal Inferences from a Randomized Trial to a Target Population.” American Journal of Epidemiology 190(8):1632-42. doi:https://doi.org/10.1093/aje/kwaa270

Dahabreh, I. J.; Robertson, S. E.; Steingrimsson, J. A.; Stuart, E. A.; Hernán, M. A. 2020. “Extending Inferences from a Randomized Trial to a New Target Population.” Statistics in Medicine 39(14):1999-2014. doi:https://doi.org/10.1002/sim.8426

Dahabreh, I. J.; Robins, J. M.; Haneuse, S. J. P.; Hernán, M. A. 2019. “Generalizing Causal Inferences from Randomized Trials: Counterfactual and Graphical Identification.” Cornell University, arXiv.

Davison; Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge, UK: Cambridge University Press.

Degtiar, Irina; Rose, Sherri. 2023. “A Review of Generalizability and Transportability.” Annual Review of Statistics and Its Application 10:501-24.

Díaz, Iván. 2023. “Non-Agency Interventions for Causal Mediation in the Presence of Intermediate Confounding.” Journal of the Royal Statistical Society Series B: Statistical Methodology 86:435-60. doi:https://doi.org/10.1093/jrsssb/qkad130

Diderichsen; Evans; Whitehead. 2001. “The Social Basis of Disparities in Health.” Pp. 12-23 in Challenging Inequities in Health: From Ethics to Action, edited by Evans; Whitehead; Diderichsen; Bhuiya; Wirth. New York, NY: Oxford University Press.

Duan; Meng, X. L.; Lin, J. Y.; Chen, C. N.; Alegria. 2008. “Disparities in Defining Disparities: Statistical Conceptual Frameworks.” Statistics in Medicine 27(20):3941-56. doi:https://doi.org/10.1002/sim.3283

Duran, D. G.; Pérez-Stable, E. J. 2019. “Novel Approaches to Advance Minority Health and Health Disparities Research.” American Journal of Public Health 109(S1):S8-S10. doi:https://doi.org/10.2105/ajph.2018.304931

Elwert; Winship. 2014. “Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable.” Annual Review of Sociology 40:31-53. doi:https://doi.org/10.1146/annurev-soc-071913-043455

Field, C. A.; Welsh, A. H. 2007. “Bootstrapping Clustered Data.” Journal of the Royal Statistical Society Series B: Statistical Methodology 69(3):369-90. doi:https://doi.org/10.1111/j.1467-9868.2007.00593.x

Food and Drug Administration. 2024. “FDA Executive Summary: Performance Evaluation of Pulse Oximeters Taking into Consideration Skin Pigmentation, Race and Ethnicity. Prepared for the Anesthesiology and Respiratory Therapy Devices Panel of the Medical Devices Advisory Committee Center for Devices and Radiological Health (CDRH) United States Food and Drug Administration.” Retrieved March 7, 2024. https://www.fda.gov/media/175828/download.

Gaebler; Cai; Basse; Shroff; Goel; Hill. 2022. “A Causal Framework for Observational Studies of Discrimination.” Statistics and Public Policy 9(1):26-48.

Greenland. 1977. “Response and Follow-up Bias in Cohort Studies.” American Journal of Epidemiology 106(3):184-7. doi:https://doi.org/10.1093/oxfordjournals.aje.a112451

Greiner; Rubin, D. B. 2011. “Causal Effects of Perceived Immutable Characteristics.” Review of Economics and Statistics 93(3):775-85.

Hernán, M. A. 2017. “Invited Commentary: Selection Bias without Colliders.” American Journal of Epidemiology 185(11):1048-50. doi:https://doi.org/10.1093/aje/kwx077

Hernán, M. A.; Hernández-Díaz; Robins, J. M. 2004. “A Structural Approach to Selection Bias.” Epidemiology 15(5):615-25. doi:https://doi.org/10.1097/01.ede.0000135174.63482.43

Hernán, Miguel A.; Robins, James M. 2006. “Estimating Causal Effects from Epidemiological Data.” Journal of Epidemiology & Community Health 60(7):578-86. doi:https://doi.org/10.1136/jech.2004.029496

Hernán, M. A.; Robins, J. M. 2016. “Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available.” American Journal of Epidemiology 183(8):758-64. doi:https://doi.org/10.1093/aje/kwv254

Howe, C. J.; Robinson, W. R. 2018. “Survival-Related Selection Bias in Studies of Racial Health Disparities: The Importance of the Target Population and Study Design.” Epidemiology 29(4):521-24. doi:https://doi.org/10.1097/ede.0000000000000849

Huang, Francis L. 2018. “Using Cluster Bootstrapping to Analyze Nested Data with a Few Clusters.” Educational and Psychological Measurement 78(2):297-318. doi:https://doi.org/10.1177/0013164416678980

Hutler. 2022. “Causation and Injustice: Locating the Injustice of Racial and Ethnic Health Disparities.” Bioethics 36(3):260-66. doi:https://doi.org/10.1111/bioe.12994

Institute of Medicine Committee on Understanding and Eliminating Racial Ethnic Disparities in Healthcare. 2003. “Introduction and Literature Review.” In Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care, edited by Smedley, B. D.; Stith, A. Y.; Nelson, A. R. Washington, DC: National Academies Press, 3–4.

Jackson, J. W. 2021. “Meaningful Causal Decompositions in Health Equity Research: Definition, Identification, and Estimation through a Weighting Framework.” Epidemiology 32(2):282-90.

Jackson, J. W.; Arah, O. A. 2020. “Invited Commentary: Making Causal Inference More Social and (Social) Epidemiology More Causal.” American Journal of Epidemiology 189(3):179-82.

Jackson, J. W.; Hsu; Zalla, L. C.; Carson, K. A.; Marsteller, J. A.; Cooper, L. A. 2024. “Evaluating Effects of Multilevel Interventions on Disparities in Health and Healthcare.” Prevention Science 25(Suppl 3):407-20.

Jackson, J. W.; Williams, D. R.; VanderWeele, T. J. 2016. “Disparities at the Intersection of Marginalized Groups.” Social Psychiatry and Psychiatric Epidemiology 51(10):1349-59.

Kaufman, J. S. 2017. “Statistics, Adjusted Statistics, and Maladjusted Statistics.” American Journal of Law & Medicine 43(2-3):193-208.

Kilbertus; Rojas-Carulla; Parascandolo; Hardt; Janzing; Scholkopf. 2017. “Avoiding Discrimination through Causal Reasoning.” Pp. 2037-45 in Thirty-first Conference on Neural Information Processing Systems, Long Beach, CA.

Kilbourne, A. M.; Switzer; Hyman; Crowley-Matoka; Fine, M. J. 2006. “Advancing Health Disparities Research within the Health Care System: A Conceptual Framework.” American Journal of Public Health 96(12):2113-21.

Kindig, D.; Stoddart. 2003. “What Is Population Health?” American Journal of Public Health 93(3):380-3.

Kitagawa, Evelyn M. 1955. “Components of a Difference Between Two Rates.” Journal of the American Statistical Association 50(272):1168-94. doi:https://doi.org/10.2307/2281213

Knox; Lowe; Mummolo. 2020. “Administrative Records Mask Racially Biased Policing.” American Political Science Review 114(3):619-37.

Kohler-Hausmann. 2019. “Eddie Murphy and the Dangers of Counterfactual Causal Thinking About Detecting Racial Discrimination.” Northwestern University Law Review 113(5):1163-228.

Lesko, C. R.; Buchanan, A. L.; Westreich; Edwards, J. K.; Hudgens, M. G.; Cole, S. R. 2017. “Generalizing Study Results: A Potential Outcomes Perspective.” Epidemiology 28(4):553-61. doi:https://doi.org/10.1097/ede.0000000000000664

Fan. 2023. “Using Propensity Scores for Racial Disparities Analysis.” Observational Studies 9(1):59-68.

Lohr, Sharon L. 2022. Sampling: Design and Analysis. Boca Raton: CRC Press.

Cole, S. R.; Howe, C. J.; Westreich. 2022. “Toward a Clearer Definition of Selection Bias When Estimating Causal Effects.” Epidemiology 33(5):699-706. doi:https://doi.org/10.1097/ede.0000000000001516

Lundberg. 2022. “The Gap-Closing Estimand: A Causal Approach to Study Interventions That Close Disparities across Social Categories.” Sociological Methods & Research 53(2):507-70.

Malinsky; Shpitser; Richardson. 2019. “A Potential Outcomes Calculus for Identifying Conditional Path-Specific Effects.” In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS).

McGuire, T. G.; Alegria; Cook, B. L.; Wells, K. B.; Zaslavsky, A. M. 2006. “Implementing the Institute of Medicine Definition of Disparities: An Application to Mental Health Care.” Health Services Research 41(5):1979–2005. doi:https://doi.org/10.1111/j.1475-6773.2006.00583.x

Miettinen, O. S. 1972. “Standardization of Risk Ratios.” American Journal of Epidemiology 96(6):383-8. doi:https://doi.org/10.1093/oxfordjournals.aje.a121470

Moreno-Betancur. 2021. “The Target Trial: A Powerful Device Beyond Well-Defined Interventions.” Epidemiology 32(2):291-94. doi:https://doi.org/10.1097/ede.0000000000001318

Mueller; Purnell, T. S.; Mensah, G. A.; Cooper, L. A. 2015. “Reducing Racial and Ethnic Disparities in Hypertension Prevention and Control: What Will It Take to Translate Research into Practice and Policy?” American Journal of Hypertension 28(6):699-716. doi:https://doi.org/10.1093/ajh/hpu233

Muñoz, I. D.; van der Laan. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” Biometrics 68(2):541-9. doi:https://doi.org/10.1111/j.1541-0420.2011.01685.x

Nabi, R.; Shpitser. 2018. “Fair Inference on Outcomes.” Pp. 1931-40 in Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, LA: AAAI Press.

Nguyen, T. Q.; Dafoe; Ogburn, E. L. 2019. “The Magnitude and Direction of Collider Bias for Binary Variables.” Epidemiologic Methods 8(1):1-29. doi:https://doi.org/10.1515/em-2017-0013

Oaxaca, Ronald. 1973. “Male-Female Wage Differentials in Urban Labor Markets.” International Economic Review 14(3):693-709. doi:https://doi.org/10.2307/2525981

Pearl. 2001. “Direct and Indirect Effects.” Paper presented at the Proceedings of the 17th Conference on Uncertainty and Artificial Intelligence.

Powers, Madison; Faden, Ruth. 2019. Structural Injustice: Power, Advantage, and Human Rights. New York: Oxford University Press.

Ren, Jinma; Cislo, Paul; Cappelleri, Joseph C.; Hlavacek, Patrick; DiBonaventura, Marco. 2023. “Comparing G-Computation, Propensity Score-Based Weighting, and Targeted Maximum Likelihood Estimation for Analyzing Externally Controlled Trials with Both Measured and Unmeasured Confounders: A Simulation Study.” BMC Medical Research Methodology 23(1):18. doi:https://doi.org/10.1186/s12874-023-01835-6

Ren, Shiquan; Lai, Hong; Tong, Wenjing; Aminzadeh, Mostafa; Hou, Xuezhang; Lai, Shenghan. 2010. “Nonparametric Bootstrapping for Hierarchical Data.” Journal of Applied Statistics 37(9):1487-98. doi:https://doi.org/10.1080/02664760903046102

Reskin. 2012. “The Race Discrimination System.” Annual Review of Sociology 38:17-35.

Richardson, T. S.; Robins, J. M. 2013. “Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality.” University of Washington Center for Statistics and the Social Sciences Working Paper #128.

Robins, J. M.; Richardson, T. S.; Shpitser. 2021. “An Interventionist Approach to Mediation Analysis.” arXiv:2008.06019[v2].

Rojas-Saunero, L. P.; Glymour, M. M.; Mayeda, E. R. 2023. “Selection Bias in Health Research: Quantifying, Eliminating, or Exacerbating Health Disparities?” Current Epidemiology Reports 11(1):63-72. doi:https://doi.org/10.1007/s40471-023-00325

Shahar, D. J.; Shahar. 2017. “A Theorem at the Core of Colliding Bias.” International Journal of Biostatistics 13(1):1-11. doi:https://doi.org/10.1515/ijb-2016-0055

Shpitser. 2013. “Counterfactual Graphical Models for Longitudinal Mediation Analysis with Unobserved Confounding.” Cognitive Science 37(6):1011-35. doi:https://doi.org/10.1111/cogs.12058

Smith, Louisa. 2020. “Selection Mechanisms and Their Consequences: Understanding and Addressing Selection Bias.” Current Epidemiology Reports 7:179-89.

Snowden, Jonathan M.; Rose, Sherri; Mortimer, Kathleen M. 2011. “Implementation of G-Computation on a Simulated Data Set: Demonstration of a Causal Inference Technique.” American Journal of Epidemiology 173(7):731–38. doi:https://doi.org/10.1093/aje/kwq472

Stensrud, M. J.; Robins, J. M.; Sarvet; Tchetgen, E. J.; Young, J. G. 2023. “Conditional Separable Effects.” Journal of the American Statistical Association 118(554):2671-83.

Sunter, A. B. 1977. “List Sequential Sampling with Equal or Unequal Probabilities Without Replacement.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 26(3):261-68. doi:https://doi.org/10.2307/2346966

Thurber, K. A.; Thandrayen; Maddox; Barrett, E. M.; Walker; Priest; Korda, R. J.; Banks; Williams, D. R.; Lovett. 2022. “Reflection on Modern Methods: Statistical, Policy and Ethical Implications of Using Age-Standardized Health Indicators to Quantify Inequities.” International Journal of Epidemiology 51:324-33. doi:https://doi.org/10.1093/ije/dyab132

Tipton. 2013. “Stratified Sampling Using Cluster Analysis: A Sample Selection Strategy for Improved Generalizations from Experiments.” Evaluation Review 37(2):109-39. doi:https://doi.org/10.1177/0193841X13516324

VanderWeele, T. J. 2020. “Invited Commentary: Counterfactuals in Social Epidemiology-Thinking Outside of ‘the Box’.” American Journal of Epidemiology 189(3):175-78. doi:https://doi.org/10.1093/aje/kwz198

VanderWeele, T. J.; Robinson, W. R. 2014. “On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables.” Epidemiology 25(4):473-84. doi:https://doi.org/10.1097/ede.0000000000000105

VanderWeele, T. J.; Vansteelandt. 2009. “Conceptual Issues Concerning Mediation, Interventions and Composition.” Statistics and Its Interface 2(4):457-68.

Weinberger, Naftali. 2022. “Path-Specific Discrimination.” University of Pittsburgh, phil-sci archive.

Westreich; Edwards, J. K.; Lesko, C. R.; Stuart; Cole, S. R. 2017. “Transportability of Trial Results Using Inverse Odds of Sampling Weights.” American Journal of Epidemiology 186(8):1010-14. doi:https://doi.org/10.1093/aje/kwx164

Whitehead. 1992. “The Concepts and Principles of Equity and Health.” International Journal of Health Services 22(3):429-45. doi:https://doi.org/10.2190/986l-lhq6-2vte-yrrn

Zhang; Bareinboim. 2018. “Fairness in Decision-Making--the Causal Explanation Formula.” Pp. 2037-45 in The Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, LA: AAAI Press.

Zhang. 2016. “A Causal Framework for Discovering and Removing Direct and Indirect Discrimination.” Paper presented at the Proceedings of the 26th International Joint Conference on Artificial Intelligence, July 9-16, New York, New York, USA.