Abstract
Background:
Restricted mean survival time is a measure of average survival time up to a specified time point. There has been an increased interest in using restricted mean survival time to compare treatment arms in randomized clinical trials because such comparisons do not rely on proportional hazards or other assumptions about the nature of the relationship between survival curves.
Methods:
This article addresses the question of whether covariate adjustment in randomized clinical trials that compare restricted mean survival times improves precision of the estimated treatment effect (difference in restricted mean survival times between treatment arms). Although precision generally increases in linear models when prognostic covariates are added, this is not necessarily the case in non-linear models. For example, in logistic and Cox regression, the standard error of the estimated treatment effect does not decrease when prognostic covariates are added, although the situation is complicated in those settings because the estimand changes as well. Because estimation of restricted mean survival time in the manner described in this article is also based on a model that is non-linear in the covariates, we investigate whether the comparison of restricted mean survival times with adjustment for covariates leads to a reduction in the standard error of the estimated treatment effect relative to the unadjusted estimator or whether covariate adjustment provides no improvement in precision. Chen and Tsiatis suggest that precision will increase if covariates are chosen judiciously. We present results of simulation studies that compare unadjusted versus adjusted comparisons of restricted mean survival time between treatment arms in randomized clinical trials.
Results:
We find that for comparison of restricted means in a randomized clinical trial, adjusting for covariates that are associated with survival increases precision and therefore statistical power, relative to the unadjusted estimator. Omitting important covariates results in less precision but estimates remain unbiased.
Conclusion:
When comparing restricted means in a randomized clinical trial, adjusting for prognostic covariates can improve precision and increase power.
Introduction
The log-rank test and the Cox 1 proportional hazards regression model are two of the most popular procedures for comparing survival times in different treatment arms of a randomized clinical trial (RCT). The log-rank test is known to be most powerful under proportional hazards alternatives, and proportional hazards are usually assumed when fitting the Cox regression model, although the model can be extended to accommodate non-proportional hazards as described in Cox’s original manuscript and by others. 2 A variety of parametric models are also available for analyzing survival data, 3 and these models can be applied under proportional hazards, accelerated failure time, and other frameworks. 4 All of these methods allow for censoring, that is, observations in which the event of interest has not yet occurred, a common feature of survival data from RCTs.
Were it not for censoring, mean survival times could be compared in different groups using standard methods, such as two-sample t-tests or ordinary least-squares multiple regression if covariate effects or adjustment are of interest. Methods for least squares regression with censored data have been proposed, 5 but are rarely used because of computational issues and, until relatively recently, the lack of available software. 6 In addition to evaluating covariate effects on outcome, covariates are often included in an analysis for two other purposes: (1) to adjust the treatment effect for imbalances in prognostic factors between the treatment arms (although in an RCT this would be adjustment for random imbalances only) and (2) to improve the precision of the estimated treatment effect by accounting for other sources of variation.
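As censoring generally precludes estimation of the mean survival time, Irwin 7 proposed, as an alternative, to estimate the expectation of life limited (restricted) to a suitably chosen time τ,

$$\mu(\tau) = E[\min(T, \tau)], \qquad (1)$$

where T denotes the survival time. Thus,

$$\mu(\tau) = \int_0^{\tau} S(t)\,dt, \qquad (2)$$

where S(t) = P(T > t) is the survival function.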
Kaplan and Meier discussed estimation of the restricted mean by substituting the product-limit estimator into equation (2), and Meier 9 established its asymptotic normality.
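The plug-in estimator can be checked numerically. The sketch below (Python with NumPy; the function names are illustrative, not taken from the article) computes the product-limit estimate and integrates the resulting step function from 0 to τ. With no censoring, the result equals the sample mean of min(T, τ), as equations (1) and (2) imply.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate: distinct event times and S(t) just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = np.empty(len(event_times))
    s = 1.0
    for j, t in enumerate(event_times):
        at_risk = np.sum(time >= t)            # still under observation just before t
        deaths = np.sum((time == t) & event)   # events occurring at t
        s *= 1.0 - deaths / at_risk
        surv[j] = s
    return event_times, surv

def rmst_from_step(times, surv, tau):
    """Area under a right-continuous step survival curve on [0, tau].

    S(t) = 1 before the first jump and surv[j] on [times[j], times[j+1]).
    """
    keep = times < tau
    grid = np.concatenate(([0.0], times[keep], [tau]))
    levels = np.concatenate(([1.0], surv[keep]))
    return float(np.sum(levels * np.diff(grid)))

# Hypothetical data: follow-up times with event indicators (0 = censored).
t, s = kaplan_meier([0.5, 1.2, 1.2, 2.0, 3.1, 4.0], [1, 1, 0, 1, 0, 1])
print(rmst_from_step(t, s, tau=3.0))
```

Censored observations contribute only through the risk sets, so the same two functions cover the censored case used in practice.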
Why consider RMST? In addition to its simple interpretation as the mean “up to” time
However, one should not lose sight of the fact that the restricted mean is just that—a restricted mean: it ignores everything beyond the point of restriction and cuts off the distribution at
Karrison 15 and Royston and Parmar 10 provided sample size formulae for designing a clinical trial based on RMST and made recommendations for choosing the point of restriction.
Covariate adjustment in linear and non-linear models
In linear models, adjusting for covariates that are associated with outcome increases the precision of the treatment effect estimator. In the classic analysis of covariance (ANCOVA) model, for example, this is achieved through a reduction in the residual variance. In non-randomized studies, adjustment for covariates is almost always necessary in order to reduce confounding, whereas, as mentioned above, in RCTs it serves the dual purpose of adjusting for random imbalances in prognostic factors between treatment arms, as well as of potentially improving precision.
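A small simulation makes the ANCOVA point concrete. In the sketch below (ordinary least squares via NumPy, with hypothetical effect sizes), adding a covariate strongly associated with the outcome shrinks the residual variance and hence the standard error of the treatment coefficient.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients and model-based standard errors sqrt(diag(s^2 (X'X)^-1))."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    s2 = resid @ resid / (n - p)               # residual variance estimate
    return coef, np.sqrt(np.diag(s2 * xtx_inv))

rng = np.random.default_rng(2024)
n = 400
x = rng.integers(0, 2, size=n)                 # randomized treatment indicator
z = rng.standard_normal(n)                     # prognostic covariate (hypothetical)
y = 1.0 + 0.5 * x + 2.0 * z + rng.standard_normal(n)

ones = np.ones(n)
coef_u, se_u = ols_fit(np.column_stack([ones, x]), y)      # unadjusted
coef_a, se_a = ols_fit(np.column_stack([ones, x, z]), y)   # ANCOVA-adjusted
print(f"treatment SE: unadjusted {se_u[1]:.3f}, adjusted {se_a[1]:.3f}")
```

Because z is independent of x by randomization, both fits estimate the same treatment coefficient; only its standard error differs.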
A gain in precision, however, cannot be taken for granted in non-linear models. Robinson and Jewell 16 showed that in logistic regression, adjustment for covariates leads to a loss in precision (or at best no gain). Similarly, Ford et al. 17 demonstrated that in the Cox regression model, adjustment for prognostic covariates does not improve the precision of the estimated treatment effect. Further complicating the decision of whether to adjust for covariates is that omitting influential covariates in both logistic and Cox regression produces a treatment effect estimate that is “biased” toward the null. If the model is misspecified in this way, tests of the null hypothesis remain valid, but if the alternative hypothesis is true, the “bias” toward the null results in diminished power. Therefore, adjusting for prognostic covariates is still beneficial.18,19 Schoenfeld and Borenstein 20 provide an algorithm for calculating the power for logistic and proportional hazards models that incorporate covariates. As in Hauck et al., 19 we have placed the term “bias” above in quotation marks because in the case of logistic and Cox regression, the estimand changes as covariates are added, and the unadjusted and adjusted models actually estimate different measures of treatment effect. Heuristically, Hauck et al. describe this as moving from a “population-averaged” interpretation for unadjusted estimates toward a more “subject-specific” effect in covariate-adjusted models, where covariates can be thought of as representing the subject effect.
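The “bias” toward the null in logistic regression (non-collapsibility of the odds ratio) can be seen without any data. The sketch below assumes a true model logit P(Y = 1 | x, z) = α + βx + γz with a binary covariate z independent of treatment x, as randomization ensures, and computes the log odds ratio that the unadjusted comparison targets. The parameter values are hypothetical.

```python
import numpy as np

def expit(v):
    return 1.0 / (1.0 + np.exp(-v))

def marginal_log_or(alpha, beta, gamma, p_z=0.5):
    """Unadjusted (marginal) log odds ratio for treatment x when the true
    conditional model is logit P(Y=1 | x, z) = alpha + beta*x + gamma*z and
    z ~ Bernoulli(p_z) independently of x."""
    def marginal_risk(x):
        # Average the conditional risk over the distribution of z.
        return (1 - p_z) * expit(alpha + beta * x) + p_z * expit(alpha + beta * x + gamma)
    p1, p0 = marginal_risk(1), marginal_risk(0)
    return float(np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0)))

# Conditional log OR is 1.0; the marginal log OR is smaller when gamma != 0.
print(marginal_log_or(-1.0, 1.0, 3.0))
```

Under the null (β = 0) both quantities are zero, which is why unadjusted tests remain valid even though the two estimands differ under the alternative.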
As the model for estimating RMST as developed here is also non-linear in the covariates, we address the following questions in this article. How does comparison of restricted means between treatments in an RCT fare in regard to covariate adjustment: is it necessary to adjust for covariates to obtain an unbiased estimate of the treatment effect, and does adjustment for covariates improve precision and/or statistical power?
Methodology
Let
where
where the baseline hazard function for group g,
Zucker used the Breslow estimator for the cumulative underlying hazard function in group
Here,
The survival estimates in equation (4) can be integrated to provide estimates of RMST and the difference in RMST between treatment groups at a given value of
to obtain an overall adjusted treatment difference
Chen and Tsiatis 23 showed that
The large sample variance of
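As a rough sketch of this type of estimator, under stated assumptions: the code below takes the regression coefficients β of the stratified Cox model (3) as already estimated (fitting is not shown), computes the Breslow baseline cumulative hazard separately in each treatment arm, integrates S_g(t | z) = exp{−Λ_0g(t) exp(β′z)} from 0 to τ, and averages the arm difference over the covariate values of all n subjects. Averaging over the pooled empirical covariate distribution is our assumption about how the overall adjusted difference is formed, the function names are hypothetical, and no variance estimate (such as the one derived by Chen and Tsiatis) is computed.

```python
import numpy as np

def breslow_cumhaz(time, event, z, beta):
    """Breslow estimate of the baseline cumulative hazard within one stratum."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    risk = np.exp(np.asarray(z, float).reshape(len(time), -1) @ beta)
    ts = np.unique(time[event])
    # Increment at each event time: d(t) / sum of exp(beta'z) over the risk set.
    incr = np.array([np.sum((time == t) & event) / np.sum(risk[time >= t]) for t in ts])
    return ts, np.cumsum(incr)

def rmst_given_lp(ts, cumhaz, lin_pred, tau):
    """Integrate S(t | z) = exp(-Lambda_0(t) * exp(lin_pred)) over [0, tau]."""
    keep = ts < tau
    grid = np.concatenate(([0.0], ts[keep], [tau]))
    surv = np.exp(-np.concatenate(([0.0], cumhaz[keep])) * np.exp(lin_pred))
    return float(np.sum(surv * np.diff(grid)))

def adjusted_rmst_difference(time, event, arm, z, beta, tau):
    """Mean over subjects i of mu_1(tau | z_i) - mu_0(tau | z_i)."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    arm = np.asarray(arm)
    z = np.asarray(z, float).reshape(len(time), -1)
    beta = np.atleast_1d(np.asarray(beta, float))
    lin_pred = z @ beta                        # beta'z_i for every subject
    mu = {}
    for g in (0, 1):
        in_arm = arm == g
        ts, cumhaz = breslow_cumhaz(time[in_arm], event[in_arm], z[in_arm], beta)
        mu[g] = np.array([rmst_given_lp(ts, cumhaz, lp, tau) for lp in lin_pred])
    return float(np.mean(mu[1] - mu[0]))
```

With β = 0 this reduces to comparing integrated Breslow (Nelson–Aalen type) survival curves, which differ slightly from Kaplan–Meier curves in finite samples.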
Simulation study
We conducted a simulation study to evaluate the performance of unadjusted and adjusted RMST comparisons in randomized, two-arm clinical trials. In all simulations, data were generated from a Weibull model with a pre-specified treatment effect parametrized by
True survival times
In Scenarios 1–4, we investigated the effect of covariate adjustment when the true model has only one or two prognostic covariates. In Scenario 5, we examined the effect of covariate adjustment when the true data generating mechanism involves multiple correlated prognostic covariates with varying degrees of correlation and magnitude of the effect on survival. R = 3000 replications were performed for each scenario. Figure 1 shows the true survival curves for each of the five scenarios, with the covariate(s) set to their expected or representative value(s).
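The data-generating step can be sketched as follows. Survival times are drawn by inverting S(t | x, z) = exp{−(t/scale)^shape exp(η)}, which keeps the treatment and covariate effects proportional on the hazard scale. The parameter values and the uniform censoring distribution are hypothetical placeholders, not the settings used in the scenarios.

```python
import numpy as np

def simulate_weibull_trial(n, log_hr, beta_z, shape, scale, cens_max, rng):
    """Simulate a 1:1 randomized two-arm trial with proportional hazards.

    Hazard: h(t | x, z) = h0(t) * exp(log_hr * x + beta_z * z), where the
    baseline survival is Weibull, S0(t) = exp(-(t / scale) ** shape).
    Censoring times are Uniform(0, cens_max) (an assumption of this sketch).
    """
    x = rng.integers(0, 2, size=n)             # treatment arm, 0 or 1
    z = rng.standard_normal(n)                 # one prognostic covariate
    eta = log_hr * x + beta_z * z              # linear predictor
    u = rng.uniform(size=n)
    # Solve S(t | x, z) = exp(-(t / scale) ** shape * exp(eta)) = u for t.
    t_true = scale * (-np.log(u) * np.exp(-eta)) ** (1.0 / shape)
    c = rng.uniform(0.0, cens_max, size=n)
    time = np.minimum(t_true, c)
    event = t_true <= c                        # True = event observed
    return time, event, x, z

# Hypothetical settings: treatment HR = 0.7, covariate log-HR = 0.5.
rng = np.random.default_rng(2018)
time, event, arm, z = simulate_weibull_trial(400, np.log(0.7), 0.5, 1.2, 3.0, 8.0, rng)
print(f"censoring rate: {1 - event.mean():.1%}")
```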

Figure 1. True survival curves at expected values of the covariates (for Scenario 4, the binary covariate Z2 is set to 0).
Scenario 1
Survival times were generated under a proportional hazards treatment effect with a single prognostic covariate
Scenario 1: null hypothesis case, RMST
ESE: empirical standard error; ASE: average model-based standard error; Unadj: unadjusted; Adj: adjusted; Rej rate: rejection rate; RMST: restricted mean survival time.
Average censoring rate = 30.5%.
Scenario 1: proportional hazards treatment effect, RMST
ESE: empirical standard error; ASE: average model-based standard error; Unadj: unadjusted; Adj: adjusted; Eff: efficiency; RMST: restricted mean survival time.
Average censoring rate = 21.6%.
We found that in all models, the estimates of the treatment effect are essentially unbiased. The average model-based standard errors (ASE = mean over R replications of the estimated standard error of
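The operating characteristics reported in the tables can be computed from the R replicate estimates and their model-based standard errors along the lines of the sketch below. The rejection rate uses a two-sided Wald test at level 0.05, and relative efficiency is taken here as the ratio of empirical variances (unadjusted over adjusted); both choices are assumptions of this sketch.

```python
import numpy as np
from statistics import NormalDist

def summarize(est, se, true_value, alpha=0.05):
    """Bias, ESE, ASE and Wald rejection rate over R simulation replicates."""
    est = np.asarray(est, float)
    se = np.asarray(se, float)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "bias": float(est.mean() - true_value),
        "ESE": float(est.std(ddof=1)),                          # SD of estimates
        "ASE": float(se.mean()),                                 # mean model-based SE
        "rej_rate": float(np.mean(np.abs(est / se) > z_crit)),  # H0: difference = 0
    }

def relative_efficiency(est_unadj, est_adj):
    """Assumed definition: Var(unadjusted) / Var(adjusted) across replicates."""
    return float(np.var(est_unadj, ddof=1) / np.var(est_adj, ddof=1))
```

Close agreement between ASE and ESE indicates that the model-based variance is adequate; an efficiency above 1 favors the adjusted estimator.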
Scenario 2
Here, we generate survival times in each treatment group from Weibull distributions with different scale and shape parameters, so that the treatment effect does not satisfy proportional hazards. Setting
Scenario 2: non-proportional hazards treatment effect (early difference), RMST
ESE: empirical standard error; ASE: average model-based standard error; Unadj: unadjusted; Adj: adjusted; Eff: efficiency; RMST: restricted mean survival time.
Average censoring rate = 23.9%.
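Because each arm follows its own Weibull distribution in this scenario, the true RMST in each arm, and hence the true difference, can be obtained by integrating S(t) = exp{−(t/scale)^shape} numerically, and the hazard ratio changes with time whenever the shapes differ. A sketch with hypothetical parameter values (not those of the scenario):

```python
import numpy as np

def weibull_rmst(tau, shape, scale, n_grid=200_000):
    """Midpoint-rule integral of S(t) = exp(-(t/scale)**shape) over [0, tau]."""
    h = tau / n_grid
    t = (np.arange(n_grid) + 0.5) * h          # midpoints of equal subintervals
    return float(np.sum(np.exp(-(t / scale) ** shape)) * h)

def weibull_hazard(t, shape, scale):
    """Hazard function corresponding to S(t) = exp(-(t/scale)**shape)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical arms: control exponential (shape 1), treatment with shape 1.5.
tau = 3.0
delta = weibull_rmst(tau, 1.5, 2.5) - weibull_rmst(tau, 1.0, 2.0)
hr = [weibull_hazard(t, 1.5, 2.5) / weibull_hazard(t, 1.0, 2.0) for t in (0.5, 1.0, 2.0)]
print(delta, hr)
```

The true RMST difference is well defined whatever the shape of the hazard ratio, which is what makes it a convenient target under non-proportional hazards.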
Scenario 3
In this scenario, the hazards are again non-proportional: the survival curves are similar over the first year and then separate (increasing hazard ratio), with a true
Scenario 3: non-proportional hazards treatment effect (late difference), RMST
ESE: empirical standard error; ASE: average model-based standard error; Unadj: unadjusted; Adj: adjusted; Eff: efficiency; RMST: restricted mean survival time.
Average censoring rate = 19.4%.
Scenario 4
We return to considering a proportional hazards treatment effect but with two prognostic covariates and one non-prognostic covariate. We generated
Scenario 4: proportional hazards treatment effect, RMST
ESE: empirical standard error; ASE: average model-based standard error; Unadj: unadjusted; Adj: adjusted; Eff: efficiency; RMST: restricted mean survival time.
Average censoring rate = 37.0%.
Scenario 5
In the last scenario, we consider five correlated prognostic covariates

Scenario 5 RMST:
The covariate effects in Scenario 5 are somewhat large. For example,
Example
As an example, we analyze data from the DeCIDE trial, 24 an RCT of induction therapy plus chemoradiotherapy (I+CRT) versus chemoradiotherapy (CRT) alone in patients with locally advanced squamous cell carcinoma of the head and neck. Patients with non-metastatic N2 or N3 disease were randomized to receive either two cycles of induction therapy followed by five cycles of chemoradiotherapy or five cycles of chemoradiotherapy only. A total of 273 evaluable patients were enrolled and followed for up to 7 years. The Kaplan–Meier curves for recurrence-free survival, defined as the time from randomization until disease recurrence or death from any cause, are shown in Figure 3. Recurrence-free survival is higher in the induction therapy plus chemoradiotherapy arm after 1 year, but the difference is not statistically significant by the log-rank test (p = .16). Estimating RMST at
Unadjusted

Figure 3. Recurrence-free survival in the DeCIDE trial of induction therapy plus chemoradiotherapy (I+CRT) versus CRT alone in patients with head-and-neck cancer. Solid: I+CRT (n1 = 138); dashed: CRT (n2 = 135).
Thus, in the induction therapy plus chemoradiotherapy arm, RMST restricted to 5 years was estimated to be 3.64 years, so patients achieved 73% of potential recurrence-free life years over the 5-year horizon, compared with 3.32 years and 66% in the chemoradiotherapy-only arm. The absolute difference in restricted means is .32 years (ratio 1.10), which is not statistically significant (p = .19); this p-value is very close to that of the log-rank test (p = .16).
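The percentages and contrasts quoted for the unadjusted analysis follow directly from the two restricted means and the 5-year horizon:

$$\frac{3.64}{5} = 0.728 \approx 73\%, \qquad \frac{3.32}{5} = 0.664 \approx 66\%,$$

$$3.64 - 3.32 = 0.32 \ \text{years}, \qquad \frac{3.64}{3.32} \approx 1.096 \approx 1.10.$$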
Next, we obtain the RMST estimate adjusted for five prognostic covariates that were each significantly associated with recurrence-free survival in univariate analyses, namely, Karnofsky performance score, T-stage, N-stage, age, and smoking status.
Adjusted
Here, despite the increase in precision, adjustment has reduced the estimated difference in restricted means and increased the p-value. This is because the induction therapy plus chemoradiotherapy arm was slightly favored on these covariates. Of interest, similar conclusions are obtained from fitting a Cox proportional hazards regression model to these data: unadjusted
Discussion
Our simulation study suggests that analysis of restricted means based on the stratified Cox model (3) is similar to ANCOVA for linear models, in that adjusting for covariates associated with the outcome provides increased precision for the treatment effect contrast, whereas adjustment for non-prognostic covariates produces no improvement. Our findings suggest that incorporating covariates into the model can improve precision if they are appropriately chosen. A conservative approach to designing clinical trials that compare RMST could be to power the study based on the expected precision of the unadjusted estimator and then to incorporate covariates into the final analysis to narrow the confidence interval and increase power. However, there can be downsides to this strategy. As shown by Beach and Meier, 25 adjustment for covariates in RCTs affords the analyst the opportunity to select the model that provides the strongest evidence for a treatment effect, so-called “p-value shopping.” One solution to this problem is to pre-specify in the protocol the set of covariates that one will include in the model based on a priori knowledge about which factors are likely to affect survival, that is, known prognostic factors. Alternatively, Tsiatis et al. 26 have developed a strategy for covariate adjustment that avoids this pitfall and that could potentially be adapted to RMST.
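The design strategy above can be quantified with the usual normal approximation. The sketch computes the approximate power of a two-sided Wald test of no difference in RMST for a given true difference and standard error, so the gain from a smaller (adjusted) standard error can be read off directly. The input values in the example are hypothetical.

```python
from statistics import NormalDist

def wald_power(delta, se, alpha=0.05):
    """Approximate power of a two-sided level-alpha Wald test of H0: difference = 0."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se                    # standardized true difference
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Hypothetical: same true difference, unadjusted SE vs. a smaller adjusted SE.
print(wald_power(0.30, 0.15), wald_power(0.30, 0.12))
```

Planning with the unadjusted standard error gives the conservative sample size; any reduction achieved by adjustment then shows up as extra power.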
A nice feature of the analysis of restricted means, as suggested by our simulation studies, is that unadjusted estimates, as well as estimates from models that include only some of the true prognostic factors, show little or no bias. This implies that while some efficiency may be lost, the treatment effect estimator is centered at the same target even when the model is misspecified and influential covariates are omitted. In addition, the estimand can be interpreted as the average causal treatment effect, and its interpretation does not rely on proportional hazards or other parametric assumptions.
Finally, we reemphasize that RMST estimates require careful interpretation. If the survival estimates are at or near zero toward the end of the follow-up period, the restricted mean will be close in magnitude to the overall mean. But this is frequently not the case in clinical trials, where follow-up can be relatively limited, censoring rates are high, and survival estimates may remain above 25% or even above 50% at the end of follow-up, as, for example, in the DeCIDE trial. If survival rates differ at the end of the follow-up period and the true curves remain separated, the difference in restricted means will underestimate, potentially seriously, the difference in overall means. Thus, RMST informs us only about the survival experience up to the limit of observation. What else could it do? Only parametric assumptions or extrapolation beyond the observation period would give us estimates of the overall mean, and few would likely want to rely on such an approach. Nonetheless, with these caveats in mind, analysis of RMST can provide informative results about the effects of treatment on survival in clinical trials and be a useful complement to standard methods.
Supplemental Material
759281_supp_mat_final – Supplemental material for Restricted mean survival time: Does covariate adjustment improve precision in randomized clinical trials?
Supplemental material, 759281_supp_mat_final for Restricted mean survival time: Does covariate adjustment improve precision in randomized clinical trials? by Theodore Karrison and Masha Kocherginsky in Clinical Trials
Footnotes
Acknowledgements
The authors thank David Zucker for use of his SAS macros and the reviewers for their helpful comments.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This work was supported by National Cancer Institute (Grant/Award Number: “P30CA014599” and “P30CA060553”). The content is solely the responsibility of the authors and does not necessarily represent the views of the National Institutes of Health.
References