Abstract
Whenever treatment is scarce, the question of how to allocate resources arises. One option is to allocate based on conditional survival benefit, defined as the contrast between an individual’s expected survival with and without treatment. Estimating conditional survival benefit from observational data may present the following three challenges: (i) time-dependent confounding, arising when treatments are assigned based on longitudinal health markers which are in turn affected by past assignments; (ii) multiple time scales, namely time since becoming eligible for treatment, relevant for comparability of patients, and calendar time, representing when treatment decisions are made; and (iii) multiple versions of treatment, leading to multiple counterfactual outcomes with treatment. Building on previous work where cross-sections and inverse probability of treatment weighting were combined to estimate survival benefit on the treated population, we propose a strategy to dynamically estimate conditional survival benefit for all patients awaiting scarce treatment at any time treatment becomes available, accounting for multiple versions of treatment. After delineating identifiability assumptions, we show in a simulation study that our proposed method improves the estimation of conditional survival benefit compared to simpler methods. The proposed method is then applied to liver transplant data from the Eurotransplant region.
Keywords
Introduction
Whenever treatment resources are scarce, difficult choices have to be made on which patients get to receive treatment. The allocation of scarce intensive care beds during the COVID-19 pandemic offers a good example. A second example, which motivates the current work, is the allocation of livers from deceased organ donors. The scarcity of donor livers results in a waiting list of patients who are considered eligible for transplantation and in a need of a prioritization rule on who is offered a donor liver when one becomes available. Since 2006, the Model for End-stage Liver Disease (MELD) score has been used in the Eurotransplant region to prioritize patients. The score is calculated based on values of serum creatinine, bilirubin and international normalized ratio (INR) of the prothrombin time, which are routinely collected together with other markers to monitor the clinical status of the patients. The MELD score is a strong indicator of disease severity and waiting list mortality and helps keeping track, throughout time, of those who are at highest risk of pre-transplant death. However, this score does not take post-transplant mortality into account. Lately there has been increasing interest in basing organ allocation on the individual expected survival benefit, that is, the predicted number of life-years gained due to transplant for an individual, 1 with the United Kingdom recently implementing his approach to prioritize patients awaiting transplantation. 2 More precisely, survival benefit can be quantified as the difference between two conditional counterfactual survival outcomes: the restricted mean survival time when receiving the transplant minus the restricted mean survival time when not receiving it. Estimating the conditional expected survival benefit can help clinicians make informed decisions on prioritization of patients.
There are three main issues that arise while estimating survival benefit. The first issue is confounding. In the motivating example, patients begin their follow-up untreated and eventually receive treatment (transplantation) or die untreated. In this context, the survival benefit of interest becomes the contrast of: (i) patient’s survival if they remain untreated, conditional on having survived, untreated, up until the moment of potential treatment and conditional on the patient’s health up to that moment (never-treated survival); (ii) patient’s survival if they do receive the available treatment (with its specific characteristics), conditional on having survived, untreated, up until the moment of potential treatment and conditional on patient-specific covariates evaluated at time of treatment (post-treatment survival). To avoid confounding, these two components should ideally be estimated on data collected via conditionally randomized trials, that is, trials where randomization is guaranteed within subgroups of the population. 3 However, randomization is uncommon in the field of transplantation and we therefore need to rely on the observational data at hand, where the relation between transplantation and survival is heavily confounded by MELD score. In particular, MELD changes through time and consequently influences both transplantation and survival chances; in return, MELD is influenced by the treatment choice, as the patient’s health usually keeps deteriorating while on the waiting list. In order to avoid bias due to this time-dependent confounding, adjustment techniques (such as inverse probability weighting) are needed. 4
The second issue arises from the fact that treatment is a scarce resource which only becomes available irregularly at certain calendar times. Decisions on the prioritization of patients have to be made in calendar time, whenever a new treatment resource becomes available. Hence, in order to prioritize based on survival benefit, we need a model that can correctly estimate survival benefit based on the most recent information we have on the patients’ health at any given calendar date. In previous work, 5 the use of a partly conditional method was proposed for this setting where clinical decisions need to be made in calendar time. This method is closely related to other approaches such as the sequential trial approach6,7 and the sequential stratification approach,8–10 which all involve the creation of artificial time origins to analyze the data. In Gong and Schaubel, 5 a version of inverse probability of censoring weighting tailored to the sequential stratification methodology is proposed to estimate never-treated survival from moment of potential treatment onward. In a later work by the same authors, 11 these methods are further explored to estimate average survival benefit on the treated population, by contrasting the average counterfactual never-treated survival to the average factual post-transplant survival on the treated population.
The third issue concerns the presence of multiple versions of treatment in the estimation of the counterfactual post-transplant survival. Estimating counterfactual survival risks in presence of confounding requires causal inference methods, which rely on several assumptions. 3 A key assumption is the no-multiple-versions-of-treatment assumption,3,12 which assumes that a potential outcome under a given treatment is a single well-defined value. In practice, however, this assumption may be violated. For instance, in liver transplantation treatments (livers) can differ in quality and potentially lead to different outcomes. Accounting for multiple versions of treatment represents a challenge which is not present in the work by Gong and Schaubel.5,11 Their approach estimated survival benefit among the treated and therefore did not involve modeling counterfactual post-transplant survival—only the factual survival observed in treated patients. The only counterfactual quantity required for their analysis was the never-treated survival, used to estimate what survival would have been had treated patients not received a transplant. Unlike post-transplant survival, never-treated survival has only one version, as there are no variations in “non-treatment,” and thus does not require addressing multiple treatment versions.
In this work, we build upon the partly conditional method with adjustment for time-dependent confounding proposed by Gong and Schaubel5,11 to estimate conditional survival benefit in a way that applies to all patients awaiting treatment, rather than only those who have been treated, thereby providing a quantity that can inform clinical decision-making. Following the framework of Gong and Schaubel,5,11 we combine marginal structural models (MSMs) with multiple baselines, ensuring that the model can be used to make predictions not only at one fixed moment but repeatedly in calendar time—aligning with the type of predictions needed to support clinical decisions among patients on a waiting list. We extend the approach to (i) ensure applicability to the entire waiting list population, and (ii) account for multiple versions of treatment. These extensions introduce additional complexities. In this article, we clearly specify the assumptions required for identification of conditional survival benefit and propose an estimation procedure. Specifically, we introduce a reweighting strategy that enables the construction of an MSM capable of estimating both never-treated and post-treatment survival for the entire waiting list population, while explicitly accounting—under clearly stated assumptions—for multiple versions of treatment.
In Section 2, we introduce the notation, establish identifiability results, and describe the proposed estimation method. In Section 3, the practical validity of the proposed method is evaluated through a simulation study. In Section 4, we apply the proposed method to liver transplant data. In Section 5, we discuss our findings.
Methods
Notation
We start by establishing the set-up and the notation, which is similar to the one proposed by Gong and Schaubel.
5
Let
Now let us consider
In this work, we aim to contrast two treatment strategies for patients who are still eligible (meaning alive, untreated and available to initiate treatment) at time
We define
For these two quantities, the mean survival times, restricted to some pre-specified fixed time horizon

Graphical depiction of the notation used in the article. We consider one patient awaiting treatment. Time
Ideally, the quantities in equations (1) to (3) would be estimated from a sequentially randomized trial. In such a trial, at each cross-section
The three identifiability conditions are: consistency: the counterfactual outcome
the counterfactual outcome
positivity: the probability of being treated at time
the probability of being treated at
conditional exchangeability: for every
Under these conditions, the survival distributions in equation (1) become identifiable. A proof of this main result can be found in Section A.1 of the Supplemental Material. The proof shows that these survival distributions can be expressed in terms of hazard functions and weights. The hazard functions are defined as follows:
We choose the set of covariates
To construct the weights for creating the pseudo-population, we extend the stabilized weights suggested by Gong and Schaubel
5
to also include patients that do receive treatment. This requires two models for the denominator. The first one estimates the probability of remaining untreated based on a patient’s history, used to reweight untreated patients. The second one estimates the probability of receiving treatment type
We model the first probability, that is, the probability of remaining untreated given patient’s history, using Cox regression. Weights are often estimated using a pooled logistic regression, which, in our context where treatment is only administered once and that those who are treated stay in the treated condition from then on, is asymptotically equivalent to a Cox regression with time-dependent covariates.
13
We model the second probability, that is, the probability of receiving treatment at
We note that the denominator of the weights presented in the first line of equation (6) represents the simplest function to obtain a pseudo-population on which
For the weights presented in the second line of equation (6), we note that in a setting of scarce resources, it is likely that only a small number of patients are treated at each cross-section. This can lead to challenges in estimating
Simulation setup
In this section, we will follow the planning and reporting approach for simulation studies proposed by Morris et al. 14
Aim
The aim of the simulation is to evaluate if the proposed method can estimate accurately the quantities specified in equation (2), for all patients on the waiting list, at all cross-sections. The proposed method is compared to simpler modeling approaches.
Data-generating mechanism
Consistently with the previous section, we use
We start by generating all potential outcomes. For each patient, we generate a time of first eligibility
We then move on to generating the observational data that we use for estimation. For each patient, we generate
The aim of the simulation is to evaluate if the proposed method can correctly estimate (2), for all patients on the waiting list, at all cross-sections. Hence, in each replication, the goal is to estimate
In our simulation study, we compare the performance of three methods which are here summarized briefly; for more detail see Section B in the Supplemental Material. The three methods are (i) the naive method, where
Performance measures
The performances of the three models are evaluated by assessing the calibration. More specifically, we compare the estimated
Calibration is presented both graphically, by means of calibration curves, and numerically, by computing the root mean squared bias, as an overall summary measure of calibration. Root mean squared bias, which we will abbreviate to bias, has been indicated as a good summary of overall calibration into one number.17,18 It is calculated by taking the squared differences between the smoothed calibration curve and the diagonal line of perfect calibration,
18
for each patient at each cross-section for each simulation run, and then taking the square root of the average over these squared differences. The smoothing of the calibration curve serves to implicitly marginalize
Software
All analyses were conducted using the statistical software R (version 4.1.3)
19
with the packages
Simulation results
We run the simulation 200 times, each time with a population of 2500 patients. Figure 2 presents the calibration plots and the resulting biases.

Calibration curves for the three estimation methods and root mean squared bias, here referred to as bias. In the calibration curve, the estimated and true value of the estimands (
Overall, the proposed method yields lower bias than the two other methods. For the estimation of the restricted mean survival time without treatment, the proposed method outperforms the two other methods, as shown by both the root mean squared bias and the calibration plot. In particular, the proposed method appears to be able to better capture the
In the estimation of the restricted mean survival time with treatment, our proposed method remains closer to the diagonal line of perfect calibration while the other methods deviate from it. This behavior is expected, as data were generated with a non-linear effect of the predictor on the log-hazards, while all three methods assume a linear effect. In the presence of some model misspecification, using weights yields estimates that better represent the full target population. The estimates of the naive and the cross-section unweighted methods, which do not reweight the population, are mainly based on patients with higher
The results for estimation of benefit follows its two components, with the proposed method outperforming the two naive approaches.
We apply the proposed method to estimate the benefit of transplantation for patients with End-Stage Liver Disease in the Eurotransplant region. We use individual data on patients with chronic liver cirrhosis from the Eurotransplant registry, with patients being followed between 1 January 2007 and 31 December 2019. For our analysis, we use a subset of 9868 patients. A full description of the selection criteria is presented in Section C.1 in the Supplemental Material. Of these patients, 4998 were eventually transplanted, 3013 died before receiving a transplant and 1361 died within 3 years of being transplanted.
In this study, the goal is to build a model that, at any calendar date that a donor liver becomes available, can inform about the contrast between (i) survival probability over the next 3 years if the patient does not receive a transplant within 3 years, based on their individual current health status and (ii) survival probability over the next three years if the patient does get transplanted at that specific date, based on their individual current health and the quality of the donor liver being available for transplantation. The contrast is expressed as the difference of the RMSTs given by (ii) and (i), with a time horizon of 3 years. This contrast, namely the 3 years survival benefit, provides information about the potential gain in life-years (out of 3 life-years) obtained by allocating the available liver to this specific patient.
In the analysis, we make use of both recipient candidate and donor data. Data on the recipients consist of longitudinal measurements of serum bilirubin and creatinine, the international normalized ratio (INR), dialysis status (yes/no), and eligibility status as well as baseline information on age, blood group, gender, registration country, weight, and height. The longitudinal measurements are available from first eligibility to delisting, which could be due to death, transplantation, removal due to worsening of health conditions, removal due to improvement, cut-off date of the data extraction or loss to follow-up. The donor data consists of information on the donor livers: age of the donor at time of death, country of the donor, rescue liver offer (yes/no), donation after cardiac death (yes/no), and cause of death. All continuous variables were centered and standardized for the analysis (column “Overall” of Table 1 in Section C in the Supplemental Material presents their mean and standard deviation).
With slight abuse of terminology, we consider removal due to worsening of health conditions as death, as patients who are removed due to worsening will typically no longer be able to have access to a transplant as a source of treatment and their health will quickly deteriorate to death. Removal due to improvement, cut-off date of the data extraction or loss to follow-up are considered as a censoring mechanism that is independent conditional on baseline covariates, which here means “at cross-section” (see Section C.2 in the Supplemental Material for details). All times were rounded to the nearest 0.01 years.
Survival without treatment is estimated from
In order to estimate the stabilized IPTW, as specified in (6), we fit two Cox models for the weights of the untreated patients and a complementary log-log regression for the weights of the treated patients, as described in equation (7). In the two Cox models, we include the time-fixed covariates age, sex, blood group, weight, and height and the time-varying covariates MELD score and dialysis twice within the last week (yes/no). The baseline hazards of the two models are stratified by recipient’s country of listing. In the complementary log-log regression, we include the same patient covariates as the Cox models, as well as treatment specific covariates: donor age, rescue liver offer (yes/no), cause of death, donation after cardiac death (yes/no), and proximity between transplantation center and donor liver (in three categories, i.e. same center, same country, or abroad). Cross-sections with multiple treatments were treated as separate cross-sections, that is, different copies of the patients were made, one for every treatment available at that time. We pool together the cross-sections and estimate a single common intercept, under the assumption that the composition of the cross-sections is similar through time. Weights are capped to the 99.99 percentile to exclude extreme values. The coefficients of the models can the found in Tables 2 to 4 in Section C.3 in the Supplemental Material. To check whether the positivity assumption is met, we looked into the distribution of the variables used in the weights estimation, stratified by treatment status over time, to assess common support between treated and untreated. Further details can be found in Section C.4 in the Supplemental Material.
We fitted our MSM as described in equation (5). For patients with transplant status equal to 0 (i.e. not yet transplanted), we include as patient-specific covariates age at time of cross-section, sex, weight, height, blood group, latest serum bilirubin measurement, latest serum creatinine measurement, latest INR measurement, dialysis twice within the last week (yes/no), and time spent on the waiting list. For patients with transplant status equal to 1 (i.e. transplanted), we include as patient-specific covariates age at time of cross-section, sex, latest MELD score, and dialysis twice within the last week (yes/no). Donor age, rescue liver offer (yes/no), cause of death, donation after cardiac death (yes/no), and proximity between transplantation center and donor liver (same three categories, i.e. same center, same country, or abroad) are considered as treatment-specific covariates. We assume no interactions between variables and we pool all
The estimated MSM is then applied to the same data, in order to estimate the three-years survival benefit on the waiting list patients. Estimations are made at the same cross-section dates, based on the patients’ covariates at those dates. For the donor characteristics, we sample with replacement from the donor data a set of
In Figure 3, the MELD score of each patient at each time they are included in one of the 120 cross-section used to estimate survival without treatment is plotted against the respective estimated three-years survival benefit of that patient at that cross-section date. This figure clearly highlights the differences between the two prioritization rules. The current prioritization rules in the Eurotransplant region rely on the MELD score which was created to estimate 3 months survival while on the waiting-list (hence before treatment), while our proposed method estimates the life-years gained with transplantation in the next 3 years. A MELD-based prioritization favors patients with the highest MELD scores (those on the far right of Figure 3), while a prioritization based on benefit, as estimated by our MSM, would favor patients with the highest estimated survival benefit (those at the top of Figure 3). While on average an increase in MELD corresponds to an increase in benefit, it is also true that if patients were to be allocated to transplant according to the MELD score (as is currently implemented in the Eurotransplant region), we would not be necessarily selecting the patient with most benefit from that liver: there are patients with MELD score between 20 and 30 who seem to have larger benefit than patients with MELD score between 30 and 40 and, vice versa, some patients with high MELD score do not seem to have a great survival benefit.

The Model for End-stage Liver Disease (MELD) score of each patient at each cross-section date is plotted against the respective estimated three-years survival benefit of that patient at that cross-section date. This means that one dot represents one cross-section-patient combination. To improve readability, random uniform noise (
By selecting at each cross-section one patient who would be the top pick according to a benefit based allocation and one patient who would be the top pick according to a MELD-based allocation, we computed that, on average, 0.46 life-years (out of the next 3 life-years) could have been saved extra by using a benefit-based allocation system. Given that over the last 10 years almost 5000 patients in our dataset received a transplant, this means that up to 2300 life-years could have potentially been saved for these individuals.
In this work, we propose a sequential MSM approach for the prediction of conditional survival benefit for patients on a waiting list at the moment a scarce intervention becomes available. To showcase the value of our method, we apply it to estimate the (potential) conditional survival benefit of liver transplantation for patients with End-Stage Liver Disease via individual patient data from the Eurotransplant registry. The simulation study shows that the proposed method outperforms simpler methods which either do not account for confounding or ignore the need to combine two different time scales.
The proposed method borrows the idea of making sequential cross-sections on the calendar time scale from previous work,5,11 where cross-sections were combined together with inverse probability of censoring weighting to estimate survival benefit for the treated population. In our work, however, cross-sections are used to obtain predictions of survival benefit that are applicable to all patients awaiting treatment, and not just to the treated population as by Gong and Schaubel, 5 Schaubel et al.,8,9 and Gong and Schaubel 11 These predictions can be used to rank patients awaiting for treatment based on benefit and may support allocation choices. It is important to underline that the use of cross-sections is motivated by the question at hand, namely: which patients have the largest gain in life years if they receive treatment at this calendar date? This question, where the calendar time at which treatment is given is key, is answered by estimating the average treatment effect for each patient conditional on both characteristics of the patient and characteristics of the treatment that came available at that specific date. While previous work has estimated the effect of transplantation for wait-listed patients,23,24 the treatment effect was estimated using only one time scale, hence targeting a different question, namely: what is the average treatment effect for a wait-listed patient, conditional on some clinically relevant characteristics? The two questions are not equivalent, as the population composition of the cohort at time of first eligibility list is different from the composition at given calendar dates. Our proposed method combines the two different time scales that are at place, thus answering our question of interest.
Compared to previous approaches, our method offers several advantages. First of all, survival with and without treatment can either be estimated separately (by means of separate baseline hazards, as demonstrated in the data application) or in one single model (by means of one single baseline hazard and including treatment as a time-dependent variable in the MSM), granting flexibility. Arguments pro and against both choices can be made. On the one hand, if treatment is estimated non-parametrically, fewer assumptions are needed in the model regarding its effect. On the other hand, one might argue that the proposed method estimates benefit as a difference of two estimands and not directly as a self-standing quantity. 25 Flexibility in our method is also partially ensured by the use of inverse probability of treatment weighting, which allows for the separate handling of confounding and analysis in two different steps. 3 Moreover, the cross-sections, which are used to target the correct population of interest, allow for the specification of interactions between treatment and time-varying covariates, which simpler MSMs cannot achieve. 26
The cross-sections also ensure efficient use of longitudinal data. 6 In particular, this use of the longitudinal data at hand, which considers measurements taken at multiple calendar dates as baselines, leads to a larger sample size compared to the one provided by one single time origin. This is also useful in light of estimation of weights, which are known to lead to greater variance. Previous work showed that the use of sequential trials leads to less extreme weights compared to simpler MSMs with only one time origin. 27
In the real data application, we applied the estimated MSM to the same data at the same calendar dates, while doctors would want to use the estimated MSM to make predictions for future patients on the waiting list at different calendar dates. To do so, decisions must be made regarding the baseline hazards of the two treatment strategies. If the baseline hazards are estimated non-parametrically for each cross-section separately, they cannot be directly extrapolated to predictions at later dates, and assumptions are required. For example, one could assume that the latest baseline hazard best reflects recent mortality rates (for both treated and untreated patients) and use that for prediction. Alternatively, baseline hazards could be pooled across several cross-sections, perhaps assuming that cross-sections of the same calendar year share one common baseline hazard, and the latest pooled hazard could be used for future predictions. Another approach might involve modeling the calendar dates parametrically, rather than using them as strata, to capture mortality trends over time, allowing for prediction under the assumption of continuation of the identified trend. In our application, we used one common baseline hazard pooled across all cross-sections.
Our method also has some limitations, which should be acknowledged. Firstly, it is computationally intensive. The use of
Despite being based on a relatively small cohort of 231 patients from four medical centers within the United States between 1991 and 1995, 28 the MELD score continues to be utilized in the Eurotransplant area to prioritize patients for allocation of liver organs. In recent years, efforts have been made to update and improve the MELD score to better capture the characteristics of the Eurotransplant patients29,30 and, therefore, improving the prioritization rule for the Eurotransplant area. These improvements involve re-fitting the models and subsequently adjusting the coefficients of each MELD component. Our work further contributes to the aim of exploring and improving prioritization rules for the Eurotransplant area, by proposing a methodology to estimate survival benefit and therefore providing the tools needed to shift from a traditional “sickest first” allocation approach to a prioritization strategy that focuses on individuals with the highest potential benefit. Such an allocation system has already been in place in the UK since 2018, after the National Health Service gave a presentation over the potential of a benefit-based allocation, 31 which was studied by means of a simulated allocation model. It is important to note that while our method has the potential to guide allocation choices, the current analysis was conducted on a specific subset of patients within the registry data. For actual implementation purposes, it is essential to include all wait-listed patients. Some variables with known high predictive value, such as sodium, albumin and oncological parameters, are currently not routinely collected in the Eurotransplant area. The addition of these variables to the prediction model would allow for more discriminative predictions. Further considerations will also be needed for particular subsets of patients, for example, patients seeking improved quality of life rather than life-saving treatment, who may need to be scored differently (i.e. not based of survival benefit). A simulated allocation model similar to LSAM (Liver Simulated Allocation Model), which was developed by the Scientific Registry of Transplant Recipients 32 for the United Network for Organ Sharing region, is needed to more accurately evaluate how the waiting list composition would change by a change to benefit-based allocation and to get a more precise estimate of how many life-years could be saved with benefit-based allocation.
In conclusion, we propose the use of a sequential MSM approach for predicting conditional survival benefit for patients on a waiting list. By applying it to estimate the potential survival benefit of liver transplantation for patients with End-Stage Liver Disease, we demonstrate its value. Our method provides flexibility in the modeling of treatment effect and incorporates longitudinal data effectively. It offers insights for prioritizing patients based on benefit thus supporting allocation strategies. Further research is needed to explore model misspecification scenarios and to gain more insight on how benefit-based allocation would change the composition of the liver transplant waiting list.
Supplemental Material
sj-pdf-1-smm-10.1177_09622802261420699 - Supplemental material for Estimating conditional survival benefit for the allocation of scarce resources
Supplemental material, sj-pdf-1-smm-10.1177_09622802261420699 for Estimating conditional survival benefit for the allocation of scarce resources by Ilaria Prosepe, Nan van Geloven, Hans de Ferrante, Andries E Braat and Hein Putter in Statistical Methods in Medical Research
Footnotes
Acknowledgements
Eurotransplant is gratefully acknowledged for providing the data.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the NWA.1228.191.361 grant.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
