Abstract
We present a conceptual model to measure disparity—the target study—where social groups may be similarly situated (i.e., balanced) on allowable covariates. Our model, based on a sampling design, does not intervene to assign social group membership or alter allowable covariates. To address nonrandom sample selection, we extend our model to generalize or transport disparity or to assess disparity after an intervention on eligibility-related variables that eliminates forms of collider-stratification. To avoid bias from differential timing of enrollment, we aggregate time-specific study results by balancing calendar time of enrollment across social groups. To provide a framework for emulating our model, we discuss study designs, data structures, and G-computation and weighting estimators. We compare our sampling-based model to prominent decomposition-based models used in healthcare and algorithmic fairness. We provide R code for all estimators and apply our methods to measure health system disparities in hypertension control using electronic medical records.
Introduction
Measuring disparity is a key step in making progress toward health equity. Disparity measures underlie descriptive reports and trends and serve as benchmarks for evaluating the effects of interventions and policies (Cooper, Hill, and Powe 2002). Although the measurement of disparity is critical and there has been much discussion and debate about what constitutes a disparity (Institute of Medicine Committee on Understanding and Eliminating Racial Ethnic Disparities in Healthcare 2003; Braveman 2006; Duran and Pérez-Stable 2019), there has been limited discussion about best practices and principles for measurement of disparity, especially when using secondary data not collected for research purposes.
Conceptual models serve as important guides for the analysis and interpretation of secondary data. For example, consider the target trial framework (Hernán and Robins 2016), which lays out the hypothetical randomized controlled trial one would conduct if the goal were to estimate the effect of a treatment strategy to inform clinical decision-making. The elements of the trial (eligibility, treatment strategies, outcome follow-up) guide the design and analysis of a study based on secondary data and help ensure that the measure of association has a causal interpretation that applies to (a) the population of interest, (b) treatment strategies of interest, and (c) outcomes of interest, all of which are critical for informing treatment policy decisions.
The target trial framework cannot guide a descriptive measurement of disparity where there is no intervention. Still, without a conceptual guide, the population of interest and the follow-up period that pertain to unjust processes or outcomes may be unclear, which can impede appropriate policymaking. Without a conceptual model, it is difficult to justify and interpret covariate adjustment in health disparities research (Kaufman 2017). Causal models have been used to define disparities (Duan et al. 2008), but they have stringent assumptions and abstract away important realities. Meanwhile, there are intense discussions about nonrandom sample selection and its impact on related concepts such as discrimination (Knox, Lowe, and Mummolo 2020; Gaebler et al. 2022). Outlining the hypothetical study one could do in the real world to measure disparity will provide clarity on these issues.
We present a novel conceptual model—the target study—to address these issues and provide a framework for emulating it. The paper is organized as follows. We begin by introducing our motivating example. The section “Conceptual Issues in Measuring Disparity” reviews key issues in disparity measurement. The section “A Target Study Conceptual Model for Measuring Disparity” presents our model under the case where investigators wish to capture all effects of nonrandom sample selection. The section “Extension of the Target Study to Address Non-Random Sample Selection” expands the model to address nonrandom sample selection by generalizing to a broader population, transporting to a different population, or estimating disparity in a counterfactual population where certain consequences of non-random sample selection are absent. The section “Emulation of the Target Study with Secondary Data” proposes data structures and estimators to emulate the target study. The section “Contributions and Comparison to Existing Literature” outlines our contributions and compares our model to others widely used to study disparity in healthcare and algorithmic fairness. The section “Discussion” discusses strengths and limitations. To aid readability, we use modular sections with ample cross-referencing so readers may skip directly to sections of interest.
Motivating Example
Consider the measurement of racial disparities in hypertension outcomes of primary care patients diagnosed with hypertension who receive care at a large regional health system in the USA. The outcomes of interest are a health-related quantity Y (e.g., hypertension control) or a healthcare decision D made by a clinician (e.g., to intensify hypertension treatment). We are concerned with average outcomes across a categorical social grouping, such as race R, where a socially disadvantaged (henceforth referred to as marginalized) group (e.g., Black persons) is denoted as
Conceptual Issues in Measuring Disparity
Defining Disparity
In medicine and public health, the definition of disparity depends on whether the outcome is a health status (e.g., hypertension control differences in the quality of care that are not due to access-related factors or clinical needs, preferences, and appropriateness of intervention…[where] analysis is focused at two levels: 1) the operation of the health systems and the legal and regulatory climate…; 2) discrimination at the individual, patient-provider level. (emphasis added)
Disparity reflects society's failure to achieve equity in health, defined as “everyone having a fair and just opportunity to be as healthy as possible” (Whitehead 1992; Braveman et al. 2011). 1
Temporal Framing
To aid decision-makers, community members, and other stakeholders, disparity refers not to a universal, general phenomenon, but to outcomes among people nested in a particular context at a point in (or span of) calendar time. For example, we could describe the disparity in prevalent uncontrolled hypertension for primary care visits during each month during the peak of the COVID-19 pandemic in 2020–2022. If we include one care episode per person per month, we can meaningfully summarize disparity over the entire period by averaging over the month-specific estimates of disparity. Such a summary measure would be interpreted as an average disparity over populations indexed by calendar months. By accounting for calendar time when producing such summary estimates, we avoid confounding by time-specific trends in enrollment and the outcome. To obtain a summary estimate of disparity between social groups with the same person-time experience of the health system, the summary must properly account for calendar time.
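The aggregation over calendar months described above can be sketched in Python. This is a minimal illustration rather than the estimator developed later in the paper: the record layout `(month, group, outcome)`, the group labels, the equal weighting of months, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict

def month_specific_disparity(records):
    """Return {month: mean(marginalized) - mean(privileged)}.

    `records` holds (month, group, outcome) tuples, assumed to contain
    one care episode per person per month (hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (month, group) -> [sum, count]
    for month, group, outcome in records:
        cell = totals[(month, group)]
        cell[0] += outcome
        cell[1] += 1
    disparity = {}
    for month in sorted({m for m, _ in totals}):
        keys = [(month, "marginalized"), (month, "privileged")]
        if not all(k in totals for k in keys):
            continue  # both groups must be represented in a month
        means = [totals[k][0] / totals[k][1] for k in keys]
        disparity[month] = means[0] - means[1]
    return disparity

def summary_disparity(by_month):
    """Equal-weight average of month-specific disparities (an assumption)."""
    return sum(by_month.values()) / len(by_month)

def pooled_disparity(records):
    """Naive comparison that ignores calendar time."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, group, outcome in records:
        totals[group][0] += outcome
        totals[group][1] += 1
    return (totals["marginalized"][0] / totals["marginalized"][1]
            - totals["privileged"][0] / totals["privileged"][1])

# Toy data (invented): outcome 1 = uncontrolled hypertension.
# The marginalized group is seen mostly in the earlier, worse month.
records = (
    [("2020-03", "marginalized", 1)] * 3 + [("2020-03", "marginalized", 0)] * 1
    + [("2020-03", "privileged", 1)] * 1 + [("2020-03", "privileged", 0)] * 1
    + [("2020-04", "marginalized", 1)] * 1 + [("2020-04", "marginalized", 0)] * 1
    + [("2020-04", "privileged", 1)] * 1 + [("2020-04", "privileged", 0)] * 3
)
by_month = month_specific_disparity(records)
summary = summary_disparity(by_month)
pooled = pooled_disparity(records)
```

In this toy example the disparity is the same in both months, yet the pooled comparison is larger because the two groups differ in when their visits occur. Averaging the month-specific estimates removes that contribution of enrollment timing.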
Allowability
In the IOM definition, disparity compares groups who are similarly situated (i.e., balanced) on “allowable” covariates. Allowable covariates are those whose differential distribution does not lead to inequitable outcomes. For a distributed good outcome D (e.g., healthcare) they are factors that, on moral arguments, are appropriate for determining allocation (Jackson 2021). For example, disparities in healthcare treat clinical need as allowable based on clinical guidelines (McGuire et al. 2006; Cook et al. 2009). For a state outcome Y (e.g., health), the differential distribution of allowable covariates does not contribute to worse outcomes among the marginalized group (Jackson 2021). For example, if the marginalized group is younger and increased age predicts worse hypertension control, the younger age of the Black population does not contribute to the disparate distribution of hypertension control at the population-level. Not treating age as allowable could mask disparity from barriers to hypertension control that the Black population disproportionately faces (e.g., neighborhood disadvantage and limited options for healthy diet, physical activity, and pharmacies; Mueller et al. 2015).
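The age example can be made concrete with a small direct-standardization sketch in Python. It is illustrative only: the row layout, the two age strata, the invented counts, and the choice of the marginalized group's age distribution as the common standard are assumptions for this example and are not taken from the paper's design.

```python
from collections import defaultdict

def stratum_means(rows):
    """rows: (group, age_stratum, outcome). Return {(group, stratum): mean}."""
    totals = defaultdict(lambda: [0.0, 0])
    for group, stratum, outcome in rows:
        totals[(group, stratum)][0] += outcome
        totals[(group, stratum)][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

def stratum_shares(rows, group):
    """Share of `group` falling in each age stratum."""
    counts = defaultdict(int)
    for g, stratum, _ in rows:
        if g == group:
            counts[stratum] += 1
    total = sum(counts.values())
    return {stratum: n / total for stratum, n in counts.items()}

def crude_mean(rows, group):
    values = [y for g, _, y in rows if g == group]
    return sum(values) / len(values)

def standardized_mean(rows, group, standard_shares):
    """Mean outcome of `group` reweighted to the standard age distribution.

    Raises KeyError if `group` has nobody in a stratum of the standard,
    i.e. when overlap fails.
    """
    means = stratum_means(rows)
    return sum(share * means[(group, s)] for s, share in standard_shares.items())

# Invented data: outcome 1 = uncontrolled hypertension.
# The marginalized group is younger; older age predicts worse control.
rows = (
    [("marginalized", "young", 1)] * 3 + [("marginalized", "young", 0)] * 3
    + [("marginalized", "old", 1)] * 2
    + [("privileged", "young", 1)] * 1 + [("privileged", "young", 0)] * 3
    + [("privileged", "old", 1)] * 3 + [("privileged", "old", 0)] * 1
)
standard = stratum_shares(rows, "marginalized")  # assumed standard population
crude_gap = crude_mean(rows, "marginalized") - crude_mean(rows, "privileged")
balanced_gap = (standardized_mean(rows, "marginalized", standard)
                - standardized_mean(rows, "privileged", standard))
```

With these numbers the crude gap is half the size of the gap seen once both groups share the same age distribution, which is the masking that the paragraph above warns about when age is not treated as allowable.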
Is a Causal Framing of Disparity Necessary?
A fundamental question in conceiving of disparity is how the groups come to be similarly situated (i.e., balanced) on the allowables. Many authors conceive of disparity as comparing populations that are similarly situated through an intervention where an external actor makes existing groups similar by changing the value(s) of each person's allowable covariate(s) 2 (McGuire et al. 2006; Duan et al. 2008; Cook et al. 2009; Kaufman 2017). For example, disparate pulse oximeter performance is assessed in desaturation studies where hypoxia is induced among healthy volunteers (Food and Drug Administration 2024). Disparate healthcare utilization is assessed in statistical analyses that hypothetically modify individuals’ health and project utilization after this modification. This causal reading of the IOM definition is justified by its phrase “not due to,” interpreted as “not caused by,” where disparity compares social groups who are made similar (on the allowables) by intervention, to isolate the mediating role of inappropriate factors (e.g., SES) in producing differences in healthcare utilization (McGuire et al. 2006). But the phrase “not due to” also permits a noncausal framing where, by design, disparity compares social groups who are already alike on the allowables. Early work that applied the IOM definition of disparity was motivated by non-causal studies where patients of different social groups with the same underlying need for medical treatment are compared in their distribution of appropriate medical treatment received, and actually framed such studies as IOM concordant (Cook et al. 2009: 2–3). We argue that approaches that balance allowables by design (e.g., our model) align with the IOM definition. More broadly, arguments about the exact causes of disparity are not needed to view disparity with concern (Braveman 2006). Moral concern may arise from the impact that disparity has on the human rights of marginalized groups (Hutler 2022).
Non-Random Sample Selection
Some frameworks for health equity acknowledge that non-random sample selection may impact a disparity measure (Kilbourne et al. 2006). Consider the causal graph of Figure 1a, where variables

Causal directed acyclic graphs depicting causal relationships between historical processes H, race R, demographics age and sex
Lack of generalizability occurs when a disparity measure is unbiased for the study sample (e.g., those enrolled in the EPPP) but biased for the broader population of interest (e.g., the entire health system). Lack of transportability occurs when the disparity measure is unbiased for the study sample but biased for a different population of interest (e.g., not enrolled in EPPP) (Smith 2020). Either scenario arises when (i) a risk factor
Collider stratification creates an association between social group R and the outcome
Because collider stratification due to non-random sample selection can induce an association between social group and the outcome among the sample (e.g., those enrolled in the EPPP) that is not present among the broader population (e.g., irrespective of EPPP enrollment), it is often viewed as a bias (VanderWeele and Robinson 2014; Knox, Lowe, and Mummolo 2020; Rojas-Saunero, Glymour, and Mayeda 2023). There are reasons to include contributions of collider stratification to disparity. First, if disparity is measured in a meaningfully defined population of interest, 4 the contributions are substantively grounded as they reflect that population of interest (VanderWeele and Robinson 2014). Consider when eligibility is defined by a condition (e.g., hypertension) that gives meaning to the outcome (e.g., hypertension control). For example, persons without history of hypertension can have elevated blood pressure due to hypertension onset or due to exercise, but these reasons do not represent uncontrolled hypertension. The disparity is only defined among eligible persons. Second, for a meaningful population of interest, when collider stratification disadvantages the marginalized group on baseline covariates leading to a worse outcome distribution compared to the privileged group, this aligns with definitions of disparity (see the subsection “Defining Disparity”). Third, the contribution of collider stratification is amenable to intervention by changing how covariates affect eligibility or the outcome. 5 However, when collider stratification advantages the marginalized group, it may mask disparity from other sources and investigators may choose to exclude it from disparity.
A Target Study Conceptual Model for Measuring Disparity
Overview
We now describe the elements of our conceptual model for measuring disparity, the target study. In this heuristic, an eligible population [denoted as
Under this conceptual model, the target population (in which inference is made) operationally consists of the source population that, within the enrollment period, is eligible, sampled, and enrolled. That is, in real life, if one wanted to make inferences about disparity in a population that exists within a certain span of time, one would carry out the protocol of the target study. At any calendar time unit k, a person only enrolls once into a target study. (Each unit of calendar time is of equal length.) Results of studies
We discuss the choice of the weights
We begin with our default model (Design 1) where we choose to enroll all eligible persons (or a simple random sample of eligible persons) during the first stage of sampling. Recall that eligibility criteria cause persons to be non-randomly selected from the source population, so our default model includes all contributions of this non-random selection of persons to disparity. Adaptations to deal with such non-random sample selection (i.e., Designs 2, 3, or 4) are discussed in the section “Extension of the Target Study to Address Non-Random Sample Selection”. Design 1 has minimal structural constraints on the underlying causal relations between all relevant variables. 7
Enrollment Window(s)
To conduct a target study, we first choose a specific moment or narrow span in calendar time, denoted by k, to enroll persons. This requires choosing a level of granularity for calendar time (e.g., hours, days, months, years) and a specific moment k as the enrollment period (e.g., the month of January 2023). For each person, all eligibility-related variables
Enrollment Groups
The definitions of disparity in the subsection “Defining Disparity” compare groups with persistently different levels of social advantage, privilege, power, wealth, or prestige because of their position in society (Braveman 2006). Within the USA, the NIMHD's concept of a disparity specifies social groups such as racial and ethnic minoritized groups (versus majoritized groups), underserved rural residents (versus urban residents), lower socioeconomic status (versus higher socioeconomic status), and sexual and gender minorities (versus sexual and gender majorities) (Duran and Pérez-Stable 2019). The National Institute of Mental Health (NIMH) further specifies groups with serious mental illness (versus those without) who have experienced long-standing stigmatization, discrimination, social exclusion, and loss of agency in society. Reflecting an intersectional perspective that mechanisms of social injustice combine to uniquely shape experience (Collins and Bilge 2020), social groups may be defined by joint membership along multiple axes (e.g., Black women versus White men) (Jackson, Williams, and VanderWeele 2016). This list is not exhaustive, and our model accommodates categorical 8 and time-varying definitions 9 of social groups.
Eligibility Criteria
The eligibility criteria can define the population of interest, reflecting issues of scope, societal level, and timing. In terms of scope, the criteria can restrict to places (e.g., the Mid-Atlantic region), institutions (e.g., a particular health system), or shared experiences or conditions (e.g., diagnosis of hypertension) that define a meaningful population. In terms of societal level, the criteria can focus on persons under the purview of a specific decision-maker (e.g., a clinical provider), facility (e.g., a clinic), or institution (e.g., a health system). In terms of timing, the criteria can focus on critical life stages, such as birth or a milestone event (e.g., myocardial infarction) where outcomes (e.g., appropriate medical treatment) are given meaning by that event. From here, we will use the following criteria: prior hypertension, established care in the health system, EPPP enrollment before calendar time k, and a recent primary care visit within calendar time k.
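As a rough sketch of how the four example criteria might be applied when building person-month rows, consider the following Python fragment. Every field name is hypothetical, calendar months are represented as integers, and reading "prior hypertension" as a diagnosis strictly before month k is an assumption made only for this illustration.

```python
def is_eligible(person, k):
    """Apply the four example criteria at calendar month k.

    All field names are hypothetical; months are integers.
    """
    dx = person["hypertension_dx_month"]
    eppp = person["eppp_enroll_month"]
    return bool(
        dx is not None and dx < k            # prior hypertension (assumed: before k)
        and person["established_care"]       # established care in the health system
        and eppp is not None and eppp < k    # EPPP enrollment before month k
        and k in person["visit_months"]      # primary care visit within month k
    )

def eligible_rows(people, months):
    """One (person id, month) row per eligible person per month."""
    return [(p["id"], k) for k in months for p in people if is_eligible(p, k)]

# Invented records for illustration.
people = [
    {"id": "a", "hypertension_dx_month": 1, "established_care": True,
     "eppp_enroll_month": 2, "visit_months": {3, 4}},
    {"id": "b", "hypertension_dx_month": None, "established_care": True,
     "eppp_enroll_month": 1, "visit_months": {3}},
    {"id": "c", "hypertension_dx_month": 1, "established_care": True,
     "eppp_enroll_month": 3, "visit_months": {3, 4}},
    {"id": "d", "hypertension_dx_month": 2, "established_care": False,
     "eppp_enroll_month": 1, "visit_months": {4}},
]
rows = eligible_rows(people, months=[3, 4])
```

Person "c" illustrates why eligibility is evaluated separately at each month: enrollment in the EPPP during month 3 makes this person eligible at month 4 but not at month 3.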
Allowable Covariates
We choose the covariates that the social groups are to be similarly situated (i.e., balanced) on by the end of the enrollment process. These allowable covariates
Standard Distribution
During enrollment we sample individuals so that the distribution of allowables
To balance the allowables
The overlap assumption (3) requires that at each time k we look among the standard population (denoted by
Enrollment Process (for Design 1)
At any time k, each person enrolls (once) into one study through multiple stages of sampling. 10
Limiting participation to a single enrollment in a single study per unit of calendar time maps inference to well-defined populations at each unit of calendar time. There is a pre-stage where eligible individuals are selected, a first stage that addresses the contribution of selective mechanisms to disparity
We assume that the sampling process is innocuous with respect to the outcome:
In words, the conditional distribution of the outcome given that persons are eligible and have covariate values
The pre-stage
In the first stage
In the second stage
Time Zero
Time zero indicates the temporal anchor during calendar time for the start of follow-up for the outcomes
Follow-up and Outcome Ascertainment
We specify how outcomes are defined (e.g., incident or prevalent), what constructs are considered, how they are measured, and for how long they will be assessed. These details add precision that can aid future interventional work or policy actions to reduce disparity. For example, if our enrollment window is indexed around an incident diagnosis of hypertension, resolving disparities early on may require a focus on addressing patient knowledge, awareness, and structures that prevent adherence to a healthy diet and regular physical activity. Resolving disparities five years post-onset also involves supports to improve medication adherence, enable home-based blood-pressure monitoring, and resources and protocols to facilitate timely and appropriate treatment intensification by clinicians for patients with uncontrolled hypertension.
Statistical Analysis
Last, we need to specify how the data will be analyzed. We choose the scale (e.g., additive or ratio) and coding of the outcome (shortfall [e.g., uncontrolled hypertension] or gain [controlled hypertension]) for reporting disparity. For repeatedly measured outcomes or time-to-event outcomes, we also choose whether to present measures indexed at the end of follow-up (i.e.,
If there are multiple target studies across calendar times k, we can always present trends in disparity or group-specific outcomes across calendar time k. We may also provide a summary measure
A person may be eligible many times (e.g., they may have prior hypertension at multiple visits). Of course, under certain eligibility criteria (e.g., recent onset of hypertension) a person may only be eligible at one point in calendar time. When persons enroll in multiple studies over calendar time, this leads to correlated outcomes which can be addressed by using a stratified cluster bootstrap (Davison and Hinkley 1997; Field and Welsh 2007; Ren et al. 2010; Huang 2018) to obtain confidence intervals.
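A cluster bootstrap of this kind can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the procedure from the cited references: persons are treated as clusters, resampling is stratified by social group, each person's group is assumed fixed over time, the statistic is a crude difference in means, and the interval is a plain percentile interval. The row layout and data are invented.

```python
import random
from collections import defaultdict

def cluster_resample(rows, rng):
    """Resample persons with replacement within each social group.

    All rows of a drawn person are kept together, which preserves the
    within-person correlation across calendar time.
    """
    rows_of = defaultdict(list)
    persons_in = defaultdict(list)
    for row in rows:
        if row["person"] not in rows_of:
            persons_in[row["group"]].append(row["person"])
        rows_of[row["person"]].append(row)
    sample = []
    for group, persons in persons_in.items():
        for _ in range(len(persons)):
            sample.extend(rows_of[rng.choice(persons)])
    return sample

def mean_difference(rows):
    """Crude marginalized-minus-privileged difference in mean outcome."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        totals[row["group"]][0] += row["y"]
        totals[row["group"]][1] += 1
    return (totals["marginalized"][0] / totals["marginalized"][1]
            - totals["privileged"][0] / totals["privileged"][1])

def percentile_ci(rows, stat, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval from `n_boot` cluster-bootstrap replicates."""
    rng = random.Random(seed)
    reps = sorted(stat(cluster_resample(rows, rng)) for _ in range(n_boot))
    lower = reps[int((alpha / 2) * n_boot)]
    upper = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Invented data: six persons per group, each enrolled in months 1 and 2.
outcomes = {
    "marginalized": [(1, 1), (1, 0), (1, 1), (0, 1), (1, 0), (0, 0)],
    "privileged": [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0), (1, 0)],
}
rows = [
    {"person": f"{group[0]}{i}", "group": group, "month": month, "y": y}
    for group, pairs in outcomes.items()
    for i, pair in enumerate(pairs)
    for month, y in zip((1, 2), pair)
]
point = mean_difference(rows)
lower, upper = percentile_ci(rows, mean_difference)
```

In practice the statistic passed to `percentile_ci` would be the full estimator, refit on each resample, rather than the crude difference used here.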
Extension of the Target Study to Address Non-Random Sample Selection
Overview
When investigators wish to include all contributions of non-random sample selection to disparity, the target study described in the section “A Target Study Conceptual Model for Measuring Disparity” is sufficient. To address non-random sample selection, we introduce sampling strategies that allow data from the eligible population described in the previous section, denoted by
In addition to the innocuous sampling assumption (4) and variants of the overlap assumption (3), each modified sampling design relies on independence (or exchangeability) assumptions and positivity assumptions. In each design, these additional assumptions may partly depend on a set of non-allowable covariates
Designs 2 and 3 operate under the same minimal structural constraints as Design 1. 14 The sampling strategy for Design 4, invoking counterfactuals, has more constraints which we discuss later. Aside from the sampling plan and aggregation over calendar time, the other elements are unchanged from Design 1.
Design 2: Sampling as if from a Broader Population (Generalizability)
As in Figure 1b, suppose that the indicator of full eligibility

Venn diagrams depicting populations eligible and inferred to under Designs 2, 3, and 4 using partial eligibility indicators (labeled
The design permits inference to the broader population under an independence assumption:
For each social group
Design 3: Sampling as if from a Different Population (Transportability)
Suppose again that full eligibility
The design permits inference to the different population under the independence assumption (11) which, again, would hold in Figure 1b if
For each social group
Design 4: Sampling as if from a Counterfactual Population (Inference in a Selected Population)
As in Figure 1c, now we express full eligibility
Example Interventions to Eliminate Forms of Collider-Stratification Under Design 4.
As explained at the end of the subsection “Design 4: Sampling as if from a Counterfactual Population (Inference in a Selected Population)” and in Footnote 22, when the non-allowables
Table 1 specifies interventions for

Example data structure to emulate the target study
Modified Statistical Analysis
In the subsections “Overview” and “Statistical Analysis”, we discussed procedures to aggregate results over calendar time. Under the modified designs, these procedures use the distribution in the standard population implied by the design. This is the broader population under Design 2, the different population under Design 3, and the counterfactual population under Design 4. 23
Emulation of the Target Study with Secondary Data
Overview
In theory, the target study protocol could be implemented in real life to measure disparity. Often, a target study will have to be emulated through the design and analysis of secondary data. We outline data structures and estimators to emulate the target study under our motivating example of assessing racial disparity in hypertension control in a healthcare system among those with prior hypertension, established care, and a current visit who are (a) enrolled in the EPPP (Design 1); (b) either enrolled or not enrolled in the EPPP (Design 2); (c) not enrolled in the EPPP (Design 3); or (d) enrolled in the EPPP under a hypothetical allocation of EPPP (Design 4). These applications are plausible when we only have outcomes
Example Target Study Protocol Specification and its Emulation With Secondary Data.
Abbreviations: EMR = Electronic Medical Records; EPPP = Electronic Patient Portal Program; ESKD = End Stage Kidney Disease; SES = Socioeconomic Status; Hg = Mercury.
Sex as recorded in the EMR.
In the EMR, SES is approximated by health insurance type and categorized CDC Social Vulnerability Index.
We present two types of estimators that, given the appropriate data structure, are used to emulate the sampling-based enrollment and aggregation. G-computation (Snowden, Rose, and Mortimer 2011), akin to model-based standardization, sequentially regresses the outcome and predicted values. Weighting (Hernán and Robins 2006), which takes a weighted average of the outcome, models membership in the social group
Data Structure
To emulate a target study, we specify a unit of calendar time for enrollment windows (e.g., months), enrollment groups R (e.g., Black persons [marginalized
We form a ‘long’ dataset where every row is the vector
Identification and Estimation for Design 1 (Default Model)
Under overlap (3) and innocuous sampling (4), we can identify the aggregated mean
To estimate (20) by G-computation, in step 1 we fit a model
We may also estimate the aggregated mean outcome
The first term of the weight is the ratio of (i) the probability of belonging to the standard population
Identification and Estimation for Design 2 (Generalizability)
Under a version of overlap (3) (see footnote 16), innocuous sampling (4), independence (11), and positivity (12), we can identify the aggregated mean where and
To estimate (22) by G-computation, in step 1 we fit a model
We may also estimate the aggregated mean outcome
The second and third terms are similar to (21) but are defined by the broader
Identification and Estimation for Design 3 (Transportability)
Under a version of overlap (3) (see footnote 17), innocuous sampling (4), independence (11), and positivity (15), we identify the aggregated mean where and
To estimate (24) by G-computation, in step 1 we fit a model
We may also estimate the aggregated mean outcome
The fourth and fifth terms of the weight are similar to those used in (21), except that they are among the different population (
Identification and Estimation for Design 4 (Inference in a Counterfactual Selected Population)
Identification and estimation under Design 4 uses special weights where and
In (26) and (27)
The weights

Single World Intervention Graph (Richardson and Robins 2013) depicting Design 4a (a) without intervention (b) with intervention
Under a version of overlap (3) (see footnote 21), innocuous sampling (4), exchangeability (18), positivity (12), and consistency, we identify the aggregated mean
To estimate (28) by G-computation, it suffices to follow the same procedure as for Design 2 (see the subsection “Identification and Estimation for Design 2 (Generalizability)”) with a slight change, to weight the model in step 3 by
We may also estimate the aggregated mean outcome
Identification and Estimation of Design 4 Under Simplifying Conditions
Emulation of Design 4 simplifies greatly with no indicators of partial eligibility
Under Design 4b (i.e., randomly assign
The weighting estimator reduces to:
Then, Design 4b addresses non-random selection while retaining social group differences in eligibility. Note that if the chosen allowables
Under Design 4c (i.e., randomly assign
The weighting estimator reduces to:
Estimation under Design 4c is thus similar to that of Design 4b except with
Contributions and Comparison to Existing Literature
Target Trial Emulation
Our target study model is inspired by the trial emulation framework (Hernán and Robins 2016) used to evaluate causal effects of treatment strategies. Our work differs in that (i) there are no treatment groups, only observed social groups; (ii) we balance allowable covariates by design (sampling) rather than balance confounders by intervention (randomization); (iii) under multiple studies across calendar time, we enforce one person per unit of time, all social groups compared must be represented at each unit of time, and our aggregated estimators adjust for calendar time enrollment by balancing it across groups, without any assumption that disparity is homogeneous over time. Our summary estimate is interpretable as a weighted average of disparity measures for populations indexed at different points in calendar time, and it avoids potential bias due to differential timing of enrollment by social groups. Our model can incorporate treatment strategies and within-group randomization post sampling to evaluate causal effects of treatment strategies on disparity (Jackson et al. 2024).
Sampling as a Conceptual Model
Tipton (2013), Westreich et al. (2017), and Dahabreh et al. (2021) use sampling designs of target populations to generalize or transport results from randomized trials. VanderWeele (2020) proposes simple random sampling to define causal effects of observing social groups to measure inequality. Lundberg (2022) uses simple random sampling to critique inference about causal effects on inequality in a superpopulation. Moreno-Betancur (2021), in a commentary on Jackson (2021), mentions enrollment in a target trial with balanced allowables to motivate causal inference with somewhat vague interventions. Our model uses stratified sampling of an eligible population to construct a sample with desirable properties (i.e., distributions of covariates that reflect populations of interest to address non-random sample selection, balance of allowable covariates across social groups) to measure disparity. Our formal presentation discusses the sampling designs and emulation procedures in considerable detail.
Alternative Conceptual Models of Disparity
Other conceptual models that employ allowability are widely used to define disparity in healthcare or used to audit or improve algorithmic fairness. We outline these models and then compare them to our model. We ignore issues of non-random sample selection to focus on core differences between the models. Each of these models, including our own, presumes that the constructs measured by allowables have the same meaning for each social group (i.e., that there is no differential interpretation or measurement error).
Cook et al. (2009) frame disparity as the difference in healthcare utilization outcomes unexplained by differences in the allowable covariates, a generalization of the Oaxaca-Blinder Decomposition (Blinder 1973; Oaxaca 1973) originally used to measure labor discrimination. 26
Under this interpretation, they define an IOM concordant disparity that compares outcomes between an observed marginalized group
Duan et al. (2008) agree with the decomposition perspective but argue that Cook et al.’s (2009) criteria and ‘Rank and Replace’ procedure lead to implausible populations not relevant for policymaking. With a factual marginalized group, a decomposition involves an intervention to assign the privileged group the allowable

Directed acyclic graphs depicting causal relations between historical processes H, social group R, a set of allowable covariates
Many authors (Pearl 2001; Zhang, Wu, and Wu 2016; Kilbertus et al. 2017; Nabi and Shpitser 2018; Zhang and Bareinboim 2018; Chiappa 2019; Weinberger 2022) define discrimination as a direct effect of assigning social group membership R (or its perception) on an outcome D made by a decider. 27
The direct effect does not occur through allowable covariates
Our model defines disparity by comparing groups who are (distributionally) similarly situated (i.e., balanced) on the allowables
Our default model, Design 1, assumes overlap (3) of allowables
Selection Bias (Including Generalizability and Transportability)
Nonrandom sample selection can bias the causal effect of assigning perceived group membership (e.g., a measure of discrimination) (Greiner and Rubin 2011; Malinsky, Shpitser, and Richardson 2019; Knox, Lowe, and Mummolo 2020; Gaebler et al. 2022; Stensrud et al. 2022). It also impacts a descriptive measure of inequality (VanderWeele and Robinson 2014). We underscored that nonrandomly selected populations can be inherently meaningful and that collider stratification may affect baseline covariates to further disadvantage a marginalized group on outcomes, aligning with the Healthy People 2020 and NIMHD definitions of disparity (see the subsections “Defining Disparity” and “Non-Random Sample Selection”). We provided Design 1 for use in these settings.
We argued that non-random sample selection may be addressed whenever it (a) limits the ability to infer to a population of interest or (b) masks disparity through collider-stratification. We provided Designs 2 and 3 for use in setting (a), and Design 4 for use in setting (b). Design 4 may be contrasted with Design 1 to understand how much non-random sample selection impacts disparity. Designs 2 and 3 use assumptions similar to those used to generalize or transport descriptive measures and the results of randomized trials (Degtiar and Rose 2023). When no allowables are specified, the combined sample [collapsed over social group R] is the standard population, and there is only one study (i.e., indexed at a single moment of calendar time), several results reduce to known ones: the identifying expressions of Designs 2 and 3 reduce to those of Bareinboim, Tian, and Pearl (2014); the weighting estimators for Design 2 reduce to the inverse selection weights of Cole and Stuart (2010); the weighting estimators for Design 3 reduce to the inverse odds sampling weights of Westreich et al. (2017); and the G-computation estimators for Design 2 reduce to those of Lesko et al. (2017) and Dahabreh et al. (2020).
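As a rough illustration of the two weighting schemes named above in their simplest form (one population, no allowables, a single point in calendar time), the following Python sketch contrasts inverse selection weights, 1 / P(selected), with inverse odds of sampling weights, P(not selected) / P(selected). It is not the authors' R code. The strata, selection probabilities, and outcome means are hypothetical, and the selection probabilities are assumed known rather than estimated from data.

```python
# Hypothetical strata of a sampled cohort (all numbers are illustrative):
# (number sampled, P(selected | stratum), mean outcome among the sampled)
STRATA = [
    (100, 0.2, 0.3),
    (300, 0.6, 0.7),
]


def unweighted(p_select):
    """Weight of 1 for everyone: reproduces the crude sample mean."""
    return 1.0


def inverse_selection_weight(p_select):
    """1 / P(selected): reweights the sample to the full source population."""
    return 1.0 / p_select


def inverse_odds_weight(p_select):
    """P(not selected) / P(selected): reweights the sample to the non-sampled population."""
    return (1.0 - p_select) / p_select


def weighted_mean(strata, weight_fn):
    """Weighted mean outcome across strata, with weights depending on the selection probability."""
    numerator = sum(n * weight_fn(p) * y for n, p, y in strata)
    denominator = sum(n * weight_fn(p) for n, p, _ in strata)
    return numerator / denominator


crude_mean = weighted_mean(STRATA, unweighted)
generalized_mean = weighted_mean(STRATA, inverse_selection_weight)
transported_mean = weighted_mean(STRATA, inverse_odds_weight)

print(round(crude_mean, 3), round(generalized_mean, 3), round(transported_mean, 3))
```

With these illustrative inputs the crude sample mean is 0.6, the mean reweighted to the full population is 0.5, and the mean reweighted to the non-sampled population is about 0.433. In a disparity analysis, such weights would be applied within each social group before comparing outcomes.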
Design 4 is a novel approach to address non-random sample selection. It envisions an intervention to allocate certain (i.e., not necessarily all) eligibility-related variables, even when other eligibility-related variables (not intervened upon) are affected by the intervention. By comparison, Dahabreh et al. (2019) envision an intervention to scale up trial participation, the last step in study enrollment. The exchangeability assumption of Design 4, like that of Dahabreh et al. (2019), holds when eligibility-related variables or study participation affect the outcome, whereas the independence assumption that Designs 2 and 3 use to generalize or transport does not. Estimation under Design 4 simplifies greatly when the intervention point is the last step in enrollment (e.g., an intervention to allocate study participation).
We focused on non-random sample selection. Missing data, loss to follow-up, and competing risks during follow-up may also bias a disparity estimate (Howe and Robinson 2018). These may be addressed in the statistical analysis of the target study and its emulation. Events before enrollment (e.g., death, or hospital discharge in a study of inpatient disparity in prognosis) may affect who (non-randomly) selects into the study sample (Rojas-Saunero, Glymour, and Mayeda 2023). If survivors at enrollment (i.e., those without the event) are not considered to be a meaningful population, an extension of Design 4 with a sustained intervention may help if the event is manipulable, but such an extension is not developed here and is left for future work.
Discussion
We have proposed a conceptual model for measuring disparity and a framework to emulate it with secondary data. Through a sampling plan, the model similarly situates social groups on allowable covariates at baseline to map to meaningful definitions of disparity (Design 1). The model extends to address non-random sample selection in various ways to permit generalizability (Design 2), transportability (Design 3), or inference in a selected population without inducing undesirable forms of collider-stratification (Design 4). We motivated Design 4 for settings where collider stratification due to non-random sample selection attenuates disparity, but it is difficult to know when this will occur (Nguyen, Dafoe, and Ogburn 2019). Investigators may emulate both Designs 1 and 4 to report how selective mechanisms may contribute to or attenuate disparity. Unlike existing models used to measure disparity, our model involves no intervention to assign social group membership and no intervention to manipulate the allowable covariates. Only under Design 4 does the model intervene on variables used to establish eligibility. Under Designs 1‒3, the model can recover the crude expected outcomes of a social group (e.g., the marginalized group) by using it to determine the standard distribution of the sampling plan. This is also true of Designs 4a and 4c under simplifying conditions (see the subsection “Identification and Estimation of Design 4 Under Simplifying Conditions”). We have described data structures and provided weighting and G-computation estimators to emulate the model in complex data. Our model and emulation procedures avoid bias due to differential enrollment over calendar time without invoking any assumption that disparity is homogeneous over time. The summary estimates are weighted averages of estimates for populations at points in calendar time. In the Supplementary Material, we provide sample code, a data application to electronic medical records, and proofs of all results.
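To illustrate how a weighted average of time-specific estimates can avoid bias from differential timing of enrollment, the following Python sketch compares a crude pooled difference with a difference standardized to a common distribution of enrollment time. It is a minimal sketch, not the authors' estimator. The enrollment counts and outcome means are hypothetical, and using the combined sample's enrollment-time distribution as the standard is an assumption made here for illustration.

```python
# Hypothetical enrollment counts and mean outcomes at two calendar times.
# Group A enrolls mostly late, group B mostly early (illustrative numbers only).
COUNTS_A = [100, 300]
MEANS_A = [0.5, 0.7]
COUNTS_B = [300, 100]
MEANS_B = [0.6, 0.8]


def pooled_mean(counts, means):
    """Crude mean that pools all enrollment times within one group."""
    return sum(n * m for n, m in zip(counts, means)) / sum(counts)


def time_standardized_difference(counts_a, means_a, counts_b, means_b):
    """Average the time-specific differences (A minus B) over a common
    enrollment-time distribution, taken here from the combined sample."""
    totals = [na + nb for na, nb in zip(counts_a, counts_b)]
    grand_total = sum(totals)
    weights = [t / grand_total for t in totals]
    return sum(w * (ma - mb) for w, ma, mb in zip(weights, means_a, means_b))


crude_difference = pooled_mean(COUNTS_A, MEANS_A) - pooled_mean(COUNTS_B, MEANS_B)
standardized_difference = time_standardized_difference(
    COUNTS_A, MEANS_A, COUNTS_B, MEANS_B
)

print(round(crude_difference, 3), round(standardized_difference, 3))
```

In this toy example the time-specific difference is -0.1 at both calendar times, yet the crude pooled difference is approximately zero because the two groups enroll at different times. Balancing enrollment time across groups recovers the difference of about -0.1, and the weighted average does not require the time-specific differences to be equal.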
Our model has translational value for advancing public health and clinical medicine. First, it relies on minimal assumptions and is therefore a practical measure for advancing social justice. Second, its features (eligibility, time zero, follow-up, and outcome definition) accommodate specific populations during critical life stages and are aspects that actual interventions must consider in practice. Third, it maps to definitions of disparity that have strong moral foundations and have long guided public health action. Fourth, it can be extended to evaluate causal effects of (i) hypothetical interventions to inform future interventions (Jackson 2021) and (ii) actual interventions in (non)-randomized trials (Jackson et al. 2024).
Our model also has conceptual value. Our Designs 1‒3 are grounded in the observed world. They pick up the realized effects of unjust mechanisms as they operate in this world. Mechanisms of injustice are exquisitely complex, inter-dependent, mutually constituted, and dynamically reinforcing (Reskin 2012). Causal approaches that leverage observational data assume that the way outcomes are conditionally distributed in the factual world will be unchanged in the counterfactual world created by hypothetical interventions, ignoring this complexity (Jackson and Arah 2020). Our Designs 1‒3 capture the impact of this complexity as observed without specifying how this complexity works or assuming it away. Our Design 4 invokes a consistency assumption, though, and is subject to this limitation.
Supplemental Material
sj-pdf-1-smr-10.1177_00491241251314037 - Supplemental material for The Target Study: A Conceptual Model and Framework for Measuring Disparity
Supplemental material, sj-pdf-1-smr-10.1177_00491241251314037 for The Target Study: A Conceptual Model and Framework for Measuring Disparity by John W. Jackson, Yea-Jen Hsu, Raquel C. Greer, Romsai T. Boonyasai and Chanelle J. Howe in Sociological Methods & Research
Footnotes
Author Contributions
Dr. Jackson conceived of the work, developed the formal results, carried out the data application, and drafted the initial and revised manuscripts. Dr. Hsu constructed the analytic cohort for the data application. Drs. Jackson, Hsu, Greer, and Boonyasai oversaw the construction of the analytic cohort and data application. Drs. Hsu, Greer, Boonyasai, and Howe critically edited the initial and revised manuscripts for scientific content.
Authors' Note
The data application code and sample code to implement all estimators are available at: https://osf.io/ta7vw/ (Open Science Framework) and
(GitHub).
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: This work does not necessarily represent the views or opinions of the Agency for Healthcare Research and Quality. Dr. Howe has received funding via a grant from Sanofi Pasteur administered directly to Brown University (unrelated to the current work).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dr. Jackson was supported by a grant from the National Heart, Lung, and Blood Institute (K01HL145320).
Data Availability Statement
This manuscript used data from electronic patient medical records of patients seen within a large healthcare system. To protect patient privacy and to comply with HIPAA, we are unable to share or post the data with third parties for re-analysis.
Supplemental Material
Supplemental material (i.e., data application and proofs) for this article is available online.
