Abstract
Survival functions are a common visualization of predictions from the Cox model. However, neither Stata’s
Keywords
1 Overview
The Cox proportional hazards model is the predominant model in survival analysis. Aside from the commonly used hazard ratios, survival functions are an intuitive way to communicate the implications of duration analyses (Cleves, Gould, and Marchenko 2016; Putter et al. 2005; and Ruhe 2018). Nevertheless, because the Cox proportional hazards model does not directly estimate the baseline hazard, it is not straightforward to describe the uncertainty around the point estimate for the survival function. In this article, I discuss how to calculate bootstrap pointwise confidence intervals for survival functions in Stata and introduce the
The
In the next section, I discuss how the bootstrap can improve statistical inference and outline how bootstrap pointwise confidence intervals can be estimated. Thereafter, I introduce the new command
2 Bootstrap versus asymptotic confidence intervals
If the model contains only covariates with proportional hazards, several approaches for analytical, asymptotic pointwise confidence intervals exist. Cefalu (2011) provides a Stata command to calculate these types of confidence intervals using either the linear or the log-log approach. However, it has been shown that bootstrap methods can provide a better uncertainty estimation for survival functions (Burr 1994). The analytically calculated intervals are derived based on assumptions regarding the asymptotic distribution of the estimated statistic. In the case of survival functions from the semiparametric Cox model, in which the baseline hazard is not defined, usually either the normal distribution or a transformation is assumed (Klein and Moeschberger 2003). If this assumption is incorrect, asymptotic confidence intervals will provide invalid inferences. In these cases, the bootstrap provides alternative methods to construct confidence intervals by resampling from the data used in the analysis.
Invalid inferences based on asymptotic methods are often due to deviations from a confidence interval’s nominal coverage probability, that is, the probability with which the confidence interval contains the true parameter. For example, for a confidence interval with a nominal coverage probability of 90%, an accurate coverage probability would imply that in 90% of all samples a confidence interval calculated with this method will contain the true value (Carpenter and Bithell 2000).
For survival estimates from the Cox model, simulation studies have shown that bootstrap confidence intervals have better coverage probability and outperform asymptotic confidence intervals, depending on the bootstrap method used. Whereas the asymptotic method used in these simulations provided smaller than nominal coverage probabilities, bootstrap confidence intervals based on percentile methods had approximately nominal coverage probabilities. Other methods to form bootstrap confidence intervals performed less well (Burr 1994).
Bootstrap methods are thus a good alternative to asymptotic methods or a crosscheck to assess whether the inference might be driven by inaccurate distributional assumptions. This is particularly relevant for small samples in which the asymptotic distributional assumptions are questionable, but it also applies in larger samples and complex models. In an ideal case, these methods support the inference drawn from asymptotic methods. In the worst case, bootstrap confidence intervals cast the results from asymptotic intervals in doubt (Carpenter and Bithell 2000).
Because bootstrap confidence intervals are nonparametric, it might seem advisable to always use the bootstrap. However, the bootstrap is computationally intensive because it requires that the estimation be repeated many times based on alternative samples.
In contrast, asymptotic confidence intervals are often trivial and fast to compute. Especially for large sample sizes and complex models, it is therefore advisable to apply the bootstrap procedure discussed in this article as a validation step in the final stages of the analysis.
In some instances, methods for asymptotic intervals are not readily available in Stata, such as when the Cox model contains nonproportional hazards, for example, because of time-varying effects. Nonproportional hazards can be modeled in the Cox model by interacting the respective variable with some function of analysis time. In these cases, researchers can now rely on the command provided with this article to get uncertainty estimates for survival functions from Cox models with nonproportional hazards. 1
3 Bootstrapping survival functions
Several methods exist to generate bootstrap samples and construct confidence intervals for the desired statistic. The sampling method used in this article is simple resampling with replacement, which requires no knowledge of the censoring distribution and has generated good results in previous simulation studies (Burr 1994, 1296). Assuming data with covariate vector Xi , duration time Yi , and censoring time Ci , we observe time Ti = min(Yi, Ci ) and the censoring indicator δi = I(Yi < Ci ). The simple bootstrap method simply resamples the triples (Xi, Ti, δi ) (Burr 1994).
In Stata, this consists of using the bootstrapping command
For each bootstrap sample, the survival function is estimated and saved. The code below demonstrates how this is done for a simple example Cox proportional hazards model with a single covariate called
where
Note that this procedure therefore produces conditional estimates (also often called adjusted predictions) rather than (average) marginal (that is, population-averaged) estimates.
2
Because marginal estimates would average over survival predictions for each individual in the study, this would require a substantive number of additional calculations for each bootstrap replication and increase the estimation time dramatically. Consequently,
The estimation results of each bootstrap replication are then appended to the data to form a new dataset that contains a time and a survival-function variable. These data record the estimated survival function for each bootstrap replication.
This process is repeated for the desired number of replications:
Based on the bootstrap results, the confidence interval at each observed event time can be calculated using the percentile method, which has been found to produce good results for survival functions in simulation studies (Burr 1994, 1296). Let i = 1, 2,…, b be the bootstrap samples and
For 90% confidence intervals, this is implemented using the code below. The data containing the estimated upper and lower bound for each time point can be saved and used for visualizations or further calculations.
4 The bsurvci command
The
4.1 Syntax
4.2 Description
4.3 Options
5 Examples
5.1 Basic use
To use the command, we need to have an ID variable that uniquely identifies each entry in the data. For multiple-record data, the ID variable specified in
(output omitted)
The command can be executed without having fit a model. The command fits the model and provides the output for the fitted model. Subsequently, the bootstrap replications are performed, and the progress of these replications is documented.
The command adds three new variables to the existing data: the point estimate newvar and the estimated lower and upper bounds newvar_
These three variables can be used to create flexible customized plots of the estimated survival function. Let’s assume we want to plot the survival estimates at each observed failure time because we have information only about the survival estimate at these time points. The following code creates figure 1, which displays this information: 3

Survival estimate for a 58-year-old patient with the drug treatment
5.2 Comparing survival estimates
In many situations, it is useful to compare the estimates for different scenarios. In the example data, this could be a comparison of the drug treatment relative to the placebo. To generate such a comparison, we can simply rerun the command with the alternative covariate values. We can also specify up to four sets of covariate values in one command through the
To compare the previous estimate of a 58-year-old patient with drug treatment with a person of similar age without the treatment, we can rerun the command by simply adjusting the covariate value of the treatment variable to 0.
(output omitted)
Alternatively, if we had not already estimated the prediction for the treatment case in the previous section, we could have specified the
(output omitted)
Figure 2 shows the estimation results for the drug and the placebo treatment. The following code produces the graph:

Survival estimate for a 58-year-old patient depending on treatment
5.3 The graph and plotopts() options
The
(output omitted)
Without any further specification, the line plot uses the default line pattern and colors of the scheme. To produce a more refined graph, you can use the

Survival estimate and 95% confidence intervals produced with the
5.4 Time-varying coefficients

Survival estimate for 43-year-old female patient, assuming a time-varying gender difference
6 Conclusion
In this article, I have described how to estimate bootstrap pointwise confidence intervals for covariate-adjusted survival functions based on the Cox model. The new community-contributed
Supplemental Material
Supplemental Material, st0553 - Bootstrap pointwise confidence intervals for covariate-adjusted survivor functions in the Cox model
Supplemental Material, st0553 for Bootstrap pointwise confidence intervals for covariate-adjusted survivor functions in the Cox model by Constantin Ruhe in The Stata Journal
Footnotes
Notes
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
