Abstract
Purpose
The data-generating mechanisms underlying health care data are infrequently considered, leading to inequitable equilibria being reinforced throughout the care continuum. As race-based criteria are reassessed, including in chronic kidney disease, the effect of those criteria on patterns of disease progression should also be reevaluated. To study this, we propose a microsimulation model for attenuating societal bias in primary care chronic kidney disease data.
Methods
We developed a continuous-time, discrete-event individual-level simulation model of kidney function decline, measured by estimated glomerular filtration rate (eGFR). The model simulates individual eGFR trajectories over time and enables generating counterfactual outcome distributions that would have been observed in the absence of race-based diagnosis and treatment criteria. eGFR decline is accelerated by hypertension, diabetes, and reaching chronic kidney disease stage 3a and can be delayed by interventions, which are applied based on eGFR level, measured with or without an adjustment for Black race. A Bayesian calibration procedure was applied to identify rates of eGFR decline corresponding to stage distributions in the cohort.
Results
Under the counterfactual scenario without a race adjustment, Black individuals qualify for diagnosis earlier, and non-Black individuals later, than under the reference scenario with race adjustment. The difference was largest for earlier stages and smaller at each consecutive stage. Differences in life expectancy between the 2 scenarios did not exceed 2 mo.
Limitations
Large variability in the prevalence of treatment and heterogeneity in treatment effectiveness may affect our results.
Conclusions
Beyond estimating the clinical consequences of the eGFR equation change, our work offers an alternative to previously proposed data-debiasing approaches. The simulated data can be used to inform future interventions and policy decisions.
Highlights
We developed a microsimulation model of chronic kidney disease progression with primary care data that reflect the effect of removing race-based diagnostic and treatment criteria.
The removal of race-based diagnostic criteria in our simulations shifted the timing of qualification for chronic kidney disease diagnosis by 0.6 y to 9.6 y, with opposite effects for Black and non-Black patients.
The simulated differences in expected survival after removing the race adjustment did not exceed 2 mo among individuals who developed chronic kidney disease.
The explicit representation of the data-generation process can help anticipate the effect that policy changes can have on clinical data distributions.
Primary care plays a central role in the management of chronic disease and has the potential to address factors early in disease progression that have downstream health effects. However, there are substantial disparities in the health system that contribute to persistently worse health outcomes for minoritized groups.1,2 New, large primary care datasets may better inform interventions that ameliorate health inequities as well as statistical tools that encourage earlier diagnosis and treatment for patients who have experienced delayed care. However, these data reflect the existing inequitable equilibrium of the health care system, as they are encoded with societal biases, including racism, race-based treatment criteria, access disparities, and unmeasured differential exposure to social factors.3–5 This can lead to lower-quality evidence for groups underrepresented in health records due to access barriers and to incorrect statistical inferences when confounders, such as social exposures and race-associated undertreatment patterns, are not appropriately accounted for. Hence, there is a risk that using these data as they are to build statistical tools will further perpetuate biases. 6 In addition, health care data rarely incorporate information on social drivers of health: social mechanisms and structures beyond the health care system that affect health and are among the most important contributors to health inequities. 7
Approaches for transforming data to mitigate societal bias, referred to elsewhere as data debiasing, have previously been proposed in the algorithmic fairness literature.8–10 These approaches assume that the data encode a form of societal bias, which arises from a socially biased data-generation process, measurement error, or unmeasured confounders. They include relabeling, resampling or reweighing data, or generating intermediate data representations where some of the information and correlation structure is removed. These methods typically try to change data as if the process generating the data was different but do not usually formally define or explicitly model the change. They also do not typically incorporate social drivers of health, which contribute to the data-generation process. 11
In this work, we develop a microsimulation model for chronic disease trajectories in primary care data. Our goal is to estimate downstream clinical consequences of a diagnostic algorithmic change as well as provide a debiased dataset that can be used for statistical analysis. Utilizing a microsimulation modeling approach allows us to mechanistically define the data-generating process and generate new values under clearly defined changes to the data-generating process.
We study chronic kidney disease (CKD), a heterogeneous, progressive condition affecting 1 in 7 Americans. 12 Early diagnosis and treatment are crucial for maintaining health and preventing irreversible damage. 13 The condition is classified into 6 stages corresponding to an increasing degree of kidney damage. Stages 1 and 2 are often asymptomatic, and diagnosis requires urine tests for albuminuria, while the remaining stages (3a, 3b, 4, and 5) can be identified using only the estimated glomerular filtration rate (eGFR), which corresponds to the percentage of remaining kidney function. 14 Appropriate management of CKD differs depending on disease etiology, comorbidities, and progression speed. 15 In stage 5, also referred to as end-stage renal disease (ESRD), the only treatment options are dialysis or kidney transplant. Timely primary and specialist care has been associated with reductions in the yearly rate of eGFR decline and reduced mortality.16–23 In the United States, CKD is more prevalent among racial and ethnic minorities than in White patients. 24 Black patients have higher rates of ESRD and faster progression through CKD stages as compared with White patients, despite similar rates of CKD diagnosis between the 2 groups. 25 A range of social and structural factors contributes to those inequities,26–28 including adverse environmental exposures and neighborhood conditions as well as suboptimal care patterns.29,30
Race-adjusted formulas for estimating eGFR have been used for diagnosis and treatment decisions for decades.31–37 This race adjustment for Black patients has faced significant criticism for lack of clear biological justification and perpetuating racial bias.7,39 The implementation of race adjustment in the eGFR formula likely contributed to delayed CKD diagnosis and treatment for Black patients as well as faster disease progression and higher mortality since it overestimated their eGFR, assigning them to less severe CKD categories.25,35,39–42 In 2021, a new eGFR equation without race adjustment was proposed, with uptake by the majority of US labs by 2023.40,43,44 It has been hypothesized that using the 2021 formula may reduce delays in the treatment of Black patients by encouraging earlier initiation of stage-specific treatment and care. 40 However, given the slowly progressing nature of CKD, the downstream clinical consequences of the equation change are not yet clear. 45
We use our microsimulation model to simulate CKD trajectories. These trajectories correspond to primary care data that may have been observed under a more equitable data-generation process if the current eGFR criteria (without race adjustment) had been in effect since 2017. We explicitly account for changes in the timing of diagnosis and stage assignment as well as changes in CKD progression and mortality resulting from replacing the 2009 formula with the 2021 formula, which leads to changes in stage-specific interventions. In addition, we explicitly model social drivers of health in the data-generating process.
Methods
Data
American Family Cohort
Our primary data source is the American Family Cohort (AFC), which is a research version of a Centers for Medicare & Medicaid Services–certified clinical registry and the largest primary care registry in the United States. 46 The AFC includes clinical, social, and demographic information for more than 7 million individuals and 1,300 practices, representing all 50 states. The dataset has a high representation of underserved (e.g., racial and ethnic minority, rural, and low-income) populations and includes individuals insured through Medicare and Medicaid as well as privately. We used these primary care registry data to characterize stages of CKD progression, including among undiagnosed patients.
Cohort definition
We defined a cohort of adult patients (i.e., age 18 y or older) for whom CKD progression can be observed in the AFC dataset between January 1, 2017, and December 31, 2017. Using standard codes, 47 we extracted variables corresponding to age, binary recorded sex, serum creatinine measurements, diagnoses of CKD, diabetes, hypertension, and acute events known to affect creatinine levels (e.g., acute kidney injury, volume depletion, critical illness). Exclusion criteria were applied to remove those observed for less than 1 y after the inclusion date and those missing binary sex information. Extreme creatinine measurements greater than 73.8 or less than 0 likely corresponded to other tests and were removed. 48 Creatinine measurements captured within 30 d of acute events were also excluded, as they may not have been indicative of overall kidney health. 47 We used the first available creatinine measurement to calculate eGFR values and subsequently classified individuals into CKD stages (eGFR ≥ 90: stage 1, 60–89: stage 2, 45–59: stage 3a, 30–44: stage 3b, 15–29: stage 4, <15: stage 5). 14 Given the underutilization of urine tests necessary for establishing albuminuria status, our analysis depends solely on eGFR-defined staging. eGFR values less than 5 were removed as they were unlikely to have been captured in a clinic. In addition, we extracted census tracts corresponding to patient home addresses as well as recorded race and ethnicity.
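The eGFR-defined staging rule can be written as a short function. The following Python sketch is illustrative only: the function name `ckd_stage` is hypothetical, and treating the stage boundaries as half-open intervals (so that a non-integer value such as 89.5 falls in stage 2 and a value of exactly 15 falls in stage 4) is an assumption rather than a detail stated in the text.

```python
def ckd_stage(egfr: float) -> str:
    """Map an eGFR value (mL/min/1.73 m2) to an eGFR-defined CKD stage.

    Assumption: boundaries are half-open, e.g. 60 <= eGFR < 90 is stage 2.
    Albuminuria is ignored, mirroring the eGFR-only staging described above.
    """
    if egfr >= 90:
        return "1"
    if egfr >= 60:
        return "2"
    if egfr >= 45:
        return "3a"
    if egfr >= 30:
        return "3b"
    if egfr >= 15:
        return "4"
    return "5"
```

Because stages 1 and 2 cannot be confirmed without albuminuria testing, values of 60 or above are better read as "no CKD or stages 1 and 2" when this rule is applied to eGFR alone.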
Social drivers of health
In addition to the AFC dataset, we considered 2 census tract–level indices of social deprivation and vulnerability: the Index of Concentration at the Extremes (ICE) and the Social Deprivation Index (SDI).49,50 These indices were generated using the 2020 American Community Survey data 51 and assigned to individual patients based on census tracts. Individuals missing census tract information were excluded from the calculation of ICE and SDI calibration targets. The indices were selected to capture relationships between social factors known to affect the progression of CKD, based on prior literature.11,26–28,52 The ICE is a metric expressing concentrated extremes of both privilege and deprivation. Three types of ICE are available: income inequality, racial composition, or combined income and race. We used the latter, which jointly measures economic and racial segregation. For a given geographic area and population, it compares the fraction of non-Hispanic Whites who are above the 80th percentile of income nationally with the fraction of non-White minorities whose income is below the 20th percentile. The SDI was developed to identify areas with unmet health care access needs for additional resource allocation. It is based on data regarding education, employment, family composition, housing quality, income, and transportation. We mapped the values of both indices into 3 quantiles based on their distribution in the AFC dataset.
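As a rough illustration of the index construction, ICE is commonly written as (A − P) / T, where A is the number of residents in the most privileged group, P the number in the most deprived group, and T the total population of the area. The Python sketch below assumes that form and reads "3 quantiles" as tertiles; the function names and the order-statistic cutoff rule are hypothetical choices, not the procedure used in the study.

```python
def ice_income_race(n_privileged: int, n_deprived: int, n_total: int) -> float:
    """ICE for one area: (A - P) / T, ranging from -1 (deprivation) to 1 (privilege)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_privileged - n_deprived) / n_total


def tertile_cutoffs(values):
    """Return the two cutoffs splitting the observed values into 3 groups.

    Assumption: simple order-statistic cutoffs; other quantile rules are possible.
    """
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def assign_tertile(value: float, cutoffs) -> int:
    """Return 1, 2, or 3 for the lowest, middle, or highest tertile."""
    low, high = cutoffs
    if value < low:
        return 1
    if value < high:
        return 2
    return 3
```

The same binning helpers could be applied to SDI values, keeping in mind that the two indices run in opposite directions (higher ICE indicates more privilege, whereas higher SDI indicates more deprivation).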
Model
There are 2 primary simulation modeling approaches for CKD.53,54 The first creates discrete-time transitions through CKD disease states, defined by transition probabilities or risk equations.55,56 The second considers continuous, linear eGFR decline, with decline rates typically sampled from predefined distributions.57,58 Both approaches allow for modeling changes in progression associated with time-dependent changes to diagnosed comorbidities, CKD diagnosis status, and interventions, such as particular treatments. However, because CKD disease states are defined by eGFR values in clinical practice, directly modeling eGFR decline is, in principle, clinically better motivated than discrete stage modeling.
We developed a continuous-time microsimulation model of eGFR decline to simulate a hypothetical cohort representing the AFC cohort, based on past studies, and data on social drivers. Model parameters were calibrated to reflect the CKD stage distributions in the AFC cohort conditional on sex, diabetes, hypertension, ICE quantiles, or SDI quantiles. The model allows for changes in progression associated with time-dependent changes to diagnosed comorbidities, CKD diagnosis status, and interventions, such as particular treatments. This process is represented by the conceptual flowchart in Figure 1. The model simulates individual eGFR trajectories over time, from initiation age of 30 y until death. eGFR decline is accelerated by hypertension, diabetes, and reaching CKD stage 3a. It can be delayed by interventions, which are applied according to a patient’s eGFR level, as measured by a particular eGFR formula.
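The core mechanic, a linear eGFR decline whose slope changes when an event occurs, can be sketched as a piecewise-linear trajectory. In the Python sketch below, the function name, the segment representation, and all numeric values in the usage note are hypothetical and are not the calibrated parameters of the model.

```python
def age_at_threshold(start_egfr, segments, threshold):
    """Age at which a piecewise-linear eGFR trajectory first reaches a threshold.

    segments: list of (age_from, decline_per_year) pairs sorted by age_from;
    the first pair starts at the initiation age and the last one extends
    indefinitely. Decline rates are non-negative (eGFR units lost per year).
    Returns None if the threshold is never reached.
    """
    egfr = start_egfr
    for i, (age_from, slope) in enumerate(segments):
        is_last = i + 1 == len(segments)
        age_to = float("inf") if is_last else segments[i + 1][0]
        if egfr <= threshold:
            return age_from
        if slope > 0:
            crossing_age = age_from + (egfr - threshold) / slope
            if crossing_age <= age_to:
                return crossing_age
        if is_last:
            return None
        # Carry the eGFR level forward to the start of the next segment.
        egfr -= slope * (age_to - age_from)
    return None
```

For example, an individual starting at age 30 with an eGFR of 100 who declines by 1.0 per year until a comorbidity onset at age 50 and by 2.0 per year afterward would reach the stage 3a threshold of 60 at age 60: `age_at_threshold(100, [(30, 1.0), (50, 2.0)], 60)`. An intervention that slows decline would enter as a further segment with a smaller slope.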

Figure 1. Conceptual flowchart representing the microsimulation model construction and the process of data simulation.
The model was then used to simulate the cohort under 2 scenarios: 1) reference, which corresponds to the setting under which the AFC data were collected, when a race-adjusted eGFR formula would have been used, and 2) counterfactual, which reflects changes in time of treatment initiation following the switch to the 2021 CKD-EPI creatinine-based eGFR equation (eGFR21) without race adjustment. While clinicians may have used 1 of several race-adjusted eGFR formulas under the reference scenario, we assumed the uniform use of the 2009 CKD-EPI creatinine equation (eGFR09) for simplicity. 35 All simulated individuals faced mortality risk specific to their age, sex, diabetes status, and eGFR level. The eGFR in the model corresponds to eGFR21, following current recommendations,40,59 and allows for ease of interpretation of model outputs by practitioners. The eGFR equations are included below. Additional details about parameter sources and modeling assumptions are included in the Parameters Supplement.
The eGFR21 40 and eGFR09 35 with serum creatinine (Scr) are:
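The equations themselves are not reproduced in this version of the text. As a stand-in, the following Python sketch implements the two CKD-EPI creatinine equations in their standard published form, with serum creatinine in mg/dL and age in years. The function and argument names are hypothetical, and the coefficients should be checked against the cited sources before any reuse.

```python
def egfr_2021(scr: float, age: float, female: bool) -> float:
    """2021 CKD-EPI creatinine equation (no race term), in mL/min/1.73 m2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr / kappa
    value = 142.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.200 * 0.9938 ** age
    return value * 1.012 if female else value


def egfr_2009(scr: float, age: float, female: bool, black: bool) -> float:
    """2009 CKD-EPI creatinine equation with the race coefficient, in mL/min/1.73 m2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr / kappa
    value = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        value *= 1.018
    if black:
        value *= 1.159  # multiplicative race adjustment removed in the 2021 equation
    return value
```

For the same creatinine, age, and sex, the 2009 equation returns a value 15.9% higher when the patient is recorded as Black, which is the mechanism by which the race adjustment assigned Black patients to less severe stages.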
Trajectory simulation occurs in 6 steps, as shown in Figure 2. Rates of eGFR decline are conditional on individual-level covariates: progression to moderate or advanced CKD (stage 3a or above), incidence of diabetes and hypertension (see Table S3), and treatment status. Prior mean values of the decline rates were derived from previous analyses using NHANES data57,58,60 and assumed the absence of albuminuria. Ages of diabetes and hypertension incidence were modeled using piecewise exponential frailty models, based on national incidence statistics grouped by age.61,62 The onset of hypertension additionally depended on sex. It was assumed that the timing of onset for both conditions was independent of one another.
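Sampling an onset age from a piecewise exponential model can be done by inverting the cumulative hazard. The Python sketch below is a minimal illustration under stated assumptions: the function names, the age breaks, the incidence rates, and the gamma-distributed frailty with mean 1 are all hypothetical and do not reproduce the specification in the supplement.

```python
import math
import random


def onset_age_from_cumulative_hazard(target, breaks, rates):
    """Age at which the cumulative hazard reaches `target`.

    breaks: ascending interval start ages, e.g. [30, 45, 65].
    rates: constant hazard (events per person-year) within each interval;
    the last interval extends indefinitely. Returns math.inf if never reached.
    """
    for i, rate in enumerate(rates):
        start = breaks[i]
        end = breaks[i + 1] if i + 1 < len(breaks) else math.inf
        if rate <= 0:
            continue  # no hazard accumulates in this interval
        if target <= rate * (end - start):
            return start + target / rate
        target -= rate * (end - start)
    return math.inf


def sample_onset_age(breaks, rates, frailty_variance=0.5, rng=random):
    """Draw one onset age, scaling the hazard by an individual frailty term.

    Assumption: gamma frailty with mean 1 and the given variance.
    """
    shape = 1.0 / frailty_variance
    frailty = rng.gammavariate(shape, 1.0 / shape)  # mean = shape * scale = 1
    target = rng.expovariate(1.0) / frailty  # higher frailty means earlier onset
    return onset_age_from_cumulative_hazard(target, breaks, rates)
```

Drawing diabetes and hypertension onset ages with separate calls and separate frailty terms corresponds to the independence assumption stated above.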

Figure 2. Estimated glomerular filtration rate (eGFR) trajectory construction flowchart. CKD, chronic kidney disease.
We considered 2 interventions following a CKD diagnosis: enhanced comorbidity management and nephrology management. The model assumed that interventions can be assigned only starting at CKD stage 3a, with assignment probabilities increasing in more advanced stages, and that each individual assigned an intervention experienced the same reduction in the eGFR progression rate (Supplementary Table S4). Interventions were applied the moment a patient’s eGFR crossed into a new stage and immediately reduced the speed of eGFR decline. The expected age of death was calculated from a piecewise exponential hazard function obtained from 2019 age- and sex-specific life tables. 63 These values were additionally adjusted with eGFR- and diabetes-specific hazard ratios. 64 Further details of the model are included in the Model Supplement. We additionally considered the sensitivity of model outputs to changes in intervention frequency and effectiveness through sensitivity analysis, described in detail in the Sensitivity Analysis Supplement.
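To make the mortality step concrete, expected remaining lifetime under a piecewise-constant hazard has a closed form within each age interval. The Python sketch below is a simplified illustration: the function name and inputs are hypothetical, and a single time-constant hazard ratio is assumed, whereas in the model the eGFR-specific ratio would change as eGFR declines.

```python
import math


def expected_remaining_years(breaks, rates, hazard_ratio=1.0):
    """Expected remaining lifetime from breaks[0] under a piecewise-constant hazard.

    breaks: ascending interval start ages; rates: baseline mortality hazard per
    year in each interval, with the last interval extending indefinitely.
    hazard_ratio: multiplier applied to every interval (simplifying assumption).
    """
    survival = 1.0  # probability of being alive at the start of the interval
    total = 0.0
    for i, rate in enumerate(rates):
        hazard = rate * hazard_ratio
        if i + 1 == len(rates):
            if hazard <= 0:
                raise ValueError("the final interval needs a positive hazard")
            total += survival / hazard  # integral of an exponential tail
        elif hazard > 0:
            width = breaks[i + 1] - breaks[i]
            decay = math.exp(-hazard * width)
            total += survival * (1.0 - decay) / hazard
            survival *= decay
        else:
            total += survival * (breaks[i + 1] - breaks[i])
    return total
```

Comparing the result with and without a hazard ratio greater than 1 gives a sense of how an eGFR- or diabetes-related increase in mortality hazard translates into lost life expectancy.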
Calibration
Rates of eGFR decline conditional on diabetes, hypertension, and CKD stage could not be directly estimated from the data. We instead obtained them through a Bayesian calibration procedure with calibration targets derived from the AFC dataset, as illustrated in Figure 3. The targets reflect age-specific distributions of CKD stages by sex, diabetes, hypertension status, ICE quantiles, and SDI quantiles (Parameters Supplement).

Figure 3. Calibration procedure. (A) A parameter set–level log likelihood is calculated by simulating disease trajectories for M·N individuals across M sampled cohorts and comparing their summaries to calibration targets using multinomial loss. (B) The posterior of decline parameters is calculated by sampling R parameter sets from the prior defined in Supplementary Table S3, calculating parameter set–level log likelihoods following simulation, and using the sample importance resampling (SIR) procedure to weigh the R parameter sets based on their log likelihoods.
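The two panels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calibration code used in the study: the function names are hypothetical, the multinomial log likelihood omits its constant term, and the parameter sets are assumed to have been drawn from the prior so that the importance weights are proportional to the likelihood.

```python
import math
import random


def multinomial_log_lik(observed_counts, simulated_props, eps=1e-12):
    """Multinomial log likelihood (up to a constant) of observed stage counts
    given simulated stage proportions for one calibration target."""
    return sum(
        n * math.log(max(p, eps))
        for n, p in zip(observed_counts, simulated_props)
    )


def sir_resample(param_sets, log_liks, n_draws, rng=random):
    """Resample parameter sets with probability proportional to their likelihood.

    Returns the resampled parameter sets and the normalized importance weights.
    """
    max_ll = max(log_liks)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - max_ll) for ll in log_liks]
    total = sum(weights)
    weights = [w / total for w in weights]
    draws = rng.choices(param_sets, weights=weights, k=n_draws)
    return draws, weights
```

In practice, the log likelihood for a parameter set would be the sum of `multinomial_log_lik` over all calibration targets, and the resampled sets would then be used to summarize the posterior.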
For all calibrated parameters, we defined truncated univariate normal prior distributions to exclude eGFR slopes indicating improvement over time, based on existing evidence, theory, and plausibility (Supplementary Table S3). We applied a standard deviation corresponding to the coefficient of variation of 0.308 for sampling parameters. This coefficient corresponds to a standard deviation of 0.20 in the rate of progression among healthy individuals and captures the range of yearly rates of progression among healthy individuals reported in past literature. 57 For combinations of covariates not previously reported (e.g., co-occurrence of diabetes and hypertension), we used the higher mean prior values corresponding to either one of the conditions occurring and applied a higher coefficient of variation (0.461) to indicate a lower level of confidence in the priors. We further adjusted truncated normal priors based on regression analysis to achieve coverage of calibration targets.
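The prior specification implies a simple piece of arithmetic: if a coefficient of variation of 0.308 corresponds to a standard deviation of 0.20, the prior mean for healthy individuals is about 0.20 / 0.308 ≈ 0.65. The Python sketch below reproduces that calculation and draws from a normal prior truncated at 0. The function name and the rejection-sampling approach are illustrative assumptions, as is the convention that decline rates are expressed as positive numbers, so that excluding improvement means excluding negative values.

```python
import random

PRIOR_CV = 0.308    # coefficient of variation reported in the text
HEALTHY_SD = 0.20   # standard deviation reported for healthy individuals
implied_healthy_mean = HEALTHY_SD / PRIOR_CV  # about 0.65


def sample_truncated_normal(mean, cv, lower=0.0, rng=random):
    """Draw from a normal prior with sd = cv * mean, rejecting values below `lower`.

    Rejection sampling is adequate here because most of the prior mass lies
    above the truncation point.
    """
    sd = cv * abs(mean)
    while True:
        draw = rng.gauss(mean, sd)
        if draw >= lower:
            return draw
```

The same helper with a coefficient of variation of 0.461 would give the wider priors described for covariate combinations that were not previously reported.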
We sampled
Model input parameter uncertainty for all outcome measures was accounted for by randomly sampling from the joint posterior distribution obtained from Bayesian calibration using the sample importance resampling algorithm. 66 The posterior distribution was represented by a subset of sampled parameter sets with importance weights. We used 1,000 parameter sets sampled from the posterior distribution to generate all primary outcomes for all scenarios and policies with 95% posterior model prediction intervals for each outcome from the 2.5th and 97.5th percentiles of the projected values. Once the posterior distribution was identified, we recalculated eGFR trajectories for all
Results
Data Summaries
We extracted a cohort of 733,337 individuals from the AFC dataset, described in Table 1. A cohort extraction flowchart also appears in the Figures and Tables Supplement. The cohort had a mean age of 60 y and was 44% male. At inclusion, 8% of individuals had a CKD diagnosis code. This is lower than the national age-adjusted prevalence of 21% but consistent with a high degree of underdiagnosis of CKD.12,67,68 In addition, 25% of individuals had a diabetes diagnosis and 60% had a hypertension diagnosis, similar to national prevalence values. Our cohort had 88% of individuals with an eGFR value at or above 60, corresponding to no CKD or stages 1 and 2.61,62 Only 6% of our cohort was Black or African American, with 79% White individuals. Of note, 12% of the cohort were missing race, 27% were missing ethnicity information, and 15% had missing census tract information. For the social indices, ICE and SDI, we observed a health gradient, in which indices indicating higher levels of deprivation were associated with a higher prevalence of diabetes, hypertension, and CKD. For instance, the prevalence of diabetes ranged from 19% to 31% in the least and most deprived ICE quantile, respectively.
Table 1. American Family Cohort Data Summary
Model Calibration
Our calibration procedure generated a single best-fitting parameter set, which we refer to as the mean posterior. The inclusion of ICE and SDI calibration targets did not affect the value of the mean posterior. Figure 4 shows the value of the mean posterior compared with the mean prior slope parameters as well as the distribution of sampled parameters. The mean baseline rate of decline among healthy individuals was 0.68 mL/min/1.73 m2 per year, 5% higher than that in the prior, and increased by 13% after reaching CKD stage 3a (compared with no change in the prior). Decline prior to CKD stage 3a was elevated 1% by comorbid diabetes, 15% by hypertension, and 159% by a combination of both (compared with 69%, 11%, and 69% increase in the prior). Decline after reaching CKD stage 3a was elevated 152% by comorbid diabetes, 24% by hypertension, and 163% by a combination of both (compared with 331%, 115%, and 331% increase in the prior).

Figure 4. Distribution of sampled parameters (blue), with mean prior56 (in black) and posterior (in red) values marked. Outlier values are not shown. Healthy corresponds to individuals in chronic kidney disease (CKD) stages 1 and 2 or without CKD, who also do not have diabetes or hypertension.
We examined the distribution of individuals across CKD stages and ages stratified by sex (Supplementary Figure S6), diabetes (Supplementary Figure S7), and hypertension (Supplementary Figure S8) for both simulation scenarios and compared it with the prevalence observed in our AFC cohort, which corresponds to the calibration targets. Both simulation scenarios generated highly similar, overlapping results. Prevalence was closely matched to that in the AFC cohort in CKD stages 1 and 2 for the sex strata as well as for individuals with diabetes or hypertension and less closely matched for those without diabetes or hypertension. Results were less precise at later ages and later stages, where group sizes were small. In particular, our simulations produced lower prevalence at later ages in stages 3a and 3b and higher prevalence in stages 4 and 5.
Simulation Results
We compared the mean life expectancy at age 30 y under our 2 simulated scenarios, separately considering groups stratified by sex, race, and CKD status, and included the results in Table 2. Under the model assumptions, Black individuals would be expected to survive longer and non-Black individuals shorter under the counterfactual scenario compared with the reference. The magnitude of differences was more pronounced for non-Black individuals, although differences did not surpass 2 mo for any group. The sensitivity analysis revealed that even among those with CKD, under immediate and uniform diagnosis starting at stage 3a and increased treatment effectiveness, differences in life expectancy would not exceed 4.2 mo.
Table 2. Mean Additional Life Expectancy (in Months) under the Counterfactual Scenario Compared with the Reference Scenario, for Individuals in the Simulated Population. CKD, chronic kidney disease
In our main results, we compared the earliest times at which simulated individuals would qualify for a diagnosis at each CKD stage for the 2 scenarios (Figure 5). Under the counterfactual scenario with eGFR21, Black individuals would be eligible for diagnosis earlier, and non-Black individuals later, compared with the reference eGFR09 scenario. The difference was largest for earlier stages and smaller at each consecutive CKD stage. For example, under the counterfactual, the earliest diagnosis into stage 2 would on average be 9.6 and 9.1 y earlier for Black women and Black men but 4.4 and 4.8 y later for non-Black women and non-Black men. However, the earliest diagnosis into stage 5 would, on average, be 0.7 and 0.6 y earlier for Black women and Black men but 1.1 y later for non-Black women and non-Black men. We also compared the difference in eGFR values that would qualify individuals into particular stages under the 2 scenarios. Under the counterfactual scenario, Black individuals would be eligible for diagnosis at higher values of eGFR and non-Black individuals at lower values than under the reference. Similar to the difference in diagnosis times, the differences in eGFR values between the 2 scenarios decreased at each consecutive stage.

Figure 5. Difference in time (years) and estimated glomerular filtration rate (eGFR; mL/min/1.73 m2) value at the earliest possible diagnosis to a given chronic kidney disease (CKD) stage, calculated as eGFR21 minus eGFR09. Negative values indicate earlier diagnosis (left) or lower value of eGFR at diagnosis (right) under eGFR21 compared with eGFR09. Outliers not shown.
Discussion
We studied the effect of the 2021 removal of race adjustment from the eGFR equation to understand its impact on clinical outcomes and to generate debiased data simulations that reflect the effect of the equation change on disease trajectories. This involved creating a microsimulation model of CKD progression based on eGFR decline over time, calibrated to a cohort of patients in a large primary care dataset. Our model was able to reproduce stage distributions observed in the cohort, which reflected patterns of CKD progression and care informed by the 2009 CKD-EPI equation.
The model allows for generating counterfactual eGFR trajectories that reflect the use of the 2021 CKD-EPI equation through adjusting the timing of interventions based on the counterfactual eGFR levels. The trajectories simulated under the counterfactual scenario reflected earlier diagnoses for Black patients and later diagnoses for non-Black patients than those observed in the data. However, these changes led to differences in life expectancy not exceeding 2 mo among those with CKD. While these results were sensitive to assumptions on the rate of diagnosis and intervention effectiveness, the simulated effect among those with CKD did not exceed 4.2 mo even under assumptions of universal diagnosis and treatment initiated at stage 3a.
The simulated data could be used directly as inputs into predictive algorithms for a number of outcomes, including timing of CKD incidence, CKD progression speed, time of diagnosis initiation and nephrology referral, time of reaching ESRD, 69 and mortality. This goes beyond reclassification and can include effects of treatment in the short and long term. These outcomes can be easily defined due to the continuous disease trajectories. The model also allows for a flexible adaptation to other counterfactual scenarios, such as changes in diagnosis frequency or regional differences in the frequency of nephrology referrals. It can additionally be used to up-sample underrepresented populations, such as those residing in areas with lower access to nephrology care.70,71 Future work could explore realistic subsampling of measurement times to reflect a real-world practice of collecting discrete measurements as well as adding patterns of missingness.
While our results suggest that the removal of the race adjustment from the eGFR equation is likely to lead to notable changes in diagnosis eligibility in earlier stages, those changes correspond to small changes to life expectancy. As such, the change of the eGFR equation by itself is unlikely to reduce the burden of CKD among Black Americans and reduce disparities in CKD outcomes in the United States. Our sensitivity analysis suggests that effects would remain modest (not exceeding 4.2 mo of additional life expectancy) even under perfect guideline concordance regarding early diagnosis and treatment of CKD.
The differences in earliest possible diagnosis time in the 2 simulation scenarios followed the hypothesized direction from the prior literature, 40 with Black patients qualifying for diagnosis earlier and non-Black patients later than in the reference scenario. However, the actual time of diagnosis is likely to lag behind the earliest possible diagnosis time because it depends on the clinician’s decision to initiate diagnosis and requires 2 blood tests separated by at least 90 d to establish chronicity. 15 The difference is much higher in earlier CKD stages, where diagnoses are less frequent. In stage 2, where differences were largest, additional urine testing is needed to establish a diagnosis. The observed effect on the timing of diagnoses may therefore be smaller than reported and modified by factors related to the health system and health access. This is also suggested by a recent study at Stanford Health Care that demonstrated that adopting the new eGFR equation without race adjustment did not affect the rates of nephrology referrals and visits after 2 y. 45 Our results additionally point to potential adverse consequences of the change in the eGFR equation among non-Black patients, who could experience delayed care and slightly elevated mortality compared with the 2009 criteria. This is consistent with the evidence that the formula change on average leads to eGFR overestimation for non-Black patients and eGFR underestimation for Black patients.25,72,73
Rates of progression identified through the Bayesian calibration procedure differed from those previously derived from NHANES data.57,58 In particular, rates of progression following CKD stage 3a, while higher than those in earlier stages, did not increase as notably in our model as in NHANES data. This could potentially reflect a higher quality of CKD and comorbidity management among the AFC population compared with the national sample. The rates did not differ across area-level social deprivation indices, which might be explained by similarity between index-specific calibration targets.
Our work has several limitations. The AFC dataset included short observation periods for individuals, high variability in the frequency of creatinine observations, and data-coding errors common in electronic health data. Given the limited data on albuminuria available in the AFC dataset, we also did not include albuminuria status in our model, as CKD models often do. 53 In addition, our choice of an eGFR-based CKD progression model, considered better clinically motivated than discrete stage modeling, 54 made it possible for us to identify the timing of eGFR-based interventions and their counterfactuals more directly. However, it prevented us from using the complete set of intermittently observed data in the AFC dataset to inform our model, in ways that a stage-based model may have allowed. We assumed a uniform stage-conditional probability of diagnosis and nephrology treatment, although those differ across states, race and ethnicity, age, socioeconomic status, and insurance status.70,71,74,75 Future analyses could consider differences in rates of diabetes and hypertension incidence as well as CKD diagnosis and nephrology referrals across social deprivation index quantiles. Further, we assumed that interventions would be triggered immediately after crossing an eGFR threshold value. In practice, interventions would typically be initiated with some delay, based on the timing of clinician visits, would likely not be effective immediately, and would be subject to discontinuation by some patients. The set of interventions available for CKD patients is vast, and their matching to patient profiles is complex. Our consideration of 2 interventions limited the range of effects observed. We also assumed treatment can incur only benefits, so potential harms resulting from overtreatment are not reflected in our results. Prior literature reports a wide range of treatment effectiveness values, and our assumption of uniform effectiveness may have affected our results. 
Finally, we note that equations without race adjustment that include cystatin C have shown smaller discrepancies between estimated and measured GFR for both groups than the equations considered here.40 Such equations have not seen broad uptake because of cost-effectiveness concerns.
To our knowledge, our analysis is the first to explicitly model the consequences of the eGFR equation change for CKD progression. Beyond estimating its clinical consequences, our work points to the importance of anticipating the effect that policy changes can have on clinical data distributions, and it offers an alternative to previously proposed data-debiasing approaches that do not explicitly follow the data-generating process.
Supplemental Material
Supplemental material, sj-docx-1-mdm-10.1177_0272989X261432162 for A Microsimulation-Based Approach for Mitigating Societal Bias in Chronic Kidney Disease Data by Agata Foryciarz, Fernando Alarid-Escudero, Gabriela Basel, Marika M. Cusick, Robert L. Phillips, Andrew Bazemore, Alyce S. Adams and Sherri Rose in Medical Decision Making
Footnotes
Acknowledgements
We thank Malcolm Barrett, Oana Enache, and Sara Khor for their valuable insights and contributions to code review. The following acknowledgment text is included as described by the Stanford Center for Population Health Sciences Data Core (phsdocs.stanford.edu/v1.0/need-help/citing-phs-data-core): “Data for this project were accessed using the Stanford Center for Population Health Sciences Data Core. The PHS Data Core is supported by a National Institutes of Health National Center for Advancing Translational Science Clinical and Translational Science Award (UL1TR003142) and from Internal Stanford funding. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.”
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Fernando Alarid-Escudero is a member of the Editorial Board of Medical Decision Making. The author did not take part in the peer review or decision-making process for this submission and has no further conflicts to declare. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided by the National Institutes of Health grant R01LM013989. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.
Ethical Considerations
This study obtained approval from the Institutional Review Board at Stanford University.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Data Availability
The Python code and summary data to reproduce our results are available at github.com/StanfordHPDS/data_transformation. All analyses described in the article can be reproduced, with the exception of the generation of data summaries and calibration targets, which require access to the AFC dataset. The AFC dataset contains protected health information and cannot be shared publicly.
