Abstract
Time-to-event semi-competing risk endpoints may be correlated when both events occur on the same individual. These events and the association between them may also be influenced by individual characteristics. In this article, we propose copula survival models to estimate hazard ratios of covariates on the non-terminal and terminal events, along with the effects of covariates on the association between the two events. We use the Normal, Clayton, Frank and Gumbel copulas to provide a variety of association structures between the non-terminal and terminal events. We apply the proposed methods to model semi-competing risks of graft failure and death for kidney transplant patients. We find that copula survival models perform better than the Cox proportional hazards model when estimating the non-terminal event hazard ratio of covariates. We also find that the inclusion of covariates in the association parameter of the copula models improves the estimation of the hazard ratios.
Introduction
Often in medical studies, patients who are lost to follow-up, or do not experience the event of interest during the study period, leave a censored observation. However, with semi-competing risk endpoints, the non-terminal event has the possibility of also being censored by the terminal event. 1 One example of semi-competing risks, which will be analysed later in this article, is graft failure and death following kidney transplant, where the non-terminal event is graft failure and the terminal event is death.
The Cox proportional hazards model is widely used in practice to analyse time-to-event outcomes. The Cox model was originally developed to analyse all-cause mortality, 2 of which there is no competing risk. The hazard ratios estimated from the Cox model, assuming an independent censoring mechanism, are potentially biased when analysing a non-terminal event. 3 Some authors use copula regression models to jointly model non-terminal and terminal events. Peng and Fine, 4 Hsieh et al. 5 and Chen 6 proposed semiparametric copula models for the regression on the marginal distributions. The analysis on the terminal event can be conducted using common survival analysis methodology. For instance, Peng and Fine 4 model the terminal event with a proportional hazards model marginally within their copula model. Hsieh et al. 5 used a copula model, which allows for different dependence structures between covariate groups. The Bayesian normal induced copula estimation model is developed by Fu et al. 7
Copulas are multivariate distribution functions which can model the marginal distributions separately, along with the dependence structure between them. Sorrell et al. 8 introduced bivariate copula models to estimate the correlation between semi-competing risk endpoints, using the following four copula functions, the Normal, Clayton, Frank and Gumbel copulas. However, the hazard rates of the non-terminal and terminal events, along with the association parameter between them, may be influenced by covariates. Including covariates into the analysis of the correlation between survival endpoints can help understand how the association may be influenced by individual characteristics. Covariates may also be included in the analysis of marginal distributions, which allows the estimation of hazard ratios and subsequently the comparison of risks of survival endpoints between groups.
In this article, we develop copula survival regression models by using conditional copulas, 9 which allow both the association parameter between the survival endpoints and the hazard rates to depend on multiple binary covariates. This is an extension from the copula survival model introduced by Sorrell et al., 8 where no covariates are included. We estimate the hazard ratio using copula survival regression models with binary covariates. We also estimate the effects of these binary covariates on the association between semi-competing risks. By jointly modelling the non-terminal and terminal events and including the correlation between them, we hope to improve the inference about the non-terminalevent.
We focus on the inclusion of covariates in the analysis of survival data with a semi-competing risk using conditional copulas. In the semi-competing risk setting, Chen 6 developed a model allowing the association parameter of the copula function to vary with categorical covariates. A model allowing each covariate group to assume a separate Archimedean copula function is introduced by Hsieh et al. 5 This allows different dependence structures describing the association between the semi-competing risk events for each covariate group. The effect of a discrete covariate on the non-terminal and terminal event and the association parameter between the semi-competing risk events is investigated by Ghosh. 10 Peng and Fine 4 studied the effect of a covariate on the non-terminal event and the association parameter, by proposing a model with a time-dependent copula and applying the methods to survival endpoints following AIDS diagnosis. Similarly, Zhu et al. 11 proposed a copula regression approach to examine covariate effects on the non-terminal and terminal events using a two-stage approach. In stage 1, the covariate effects on the marginal events are estimated and in stage 2, the association parameter of the copula model is estimated. Zhu et al. 11 prefered the two-stage approach, compared to the one-stage approach by Peng and Fine, 4 as it allows for separate identification of misspecification of the marginal regression models and the copula model.
The rest of the article is organised as follows. The motivating data for this article are described in Section 2. We then introduce our proposed copula regression survival methods to estimate the effect of covariates on the semi-competing risk events and on the association parameter using a variety of conditional copula functions in Section 3. In Section 4, we apply our proposed methods to the motivating data, followed by a simulation study to compare the copula models in Section 5. We conclude with a discussion.
Motivating data
The motivating data are from the United Kingdom Transplant Registry (UKTR), held by the National Health Service (NHS) Blood and Transplant. The outcomes, time to graft failure and time to death since kidney transplantation, are semi-competing risks. Our study population is kidney transplant recipients, who had their single and first transplant between 1995 and 2016 in the UK, with known age, sex and donor type at the time of transplantation. We present a novel application by using our proposed copula survival regression models.
We included
Table 1 shows the baseline characteristics of interest. The covariates to be included in the application later in this article are donor type (deceased or living), recipient’s sex (male or female), and recipient’s age (
Baseline characteristics for single and first kidney transplant recipients from the UK Transplant Registry data set between 1995 and 2016. The number of recipients (
) with the particular characteristic is given alongside the percentage (
) in brackets.
Baseline characteristics for single and first kidney transplant recipients from the UK Transplant Registry data set between 1995 and 2016. The number of recipients (
In this section, we describe methods to include covariates in copula functions to examine the effects that they have on the semi-competing risk events. We let the association parameter of the copula function, along with the hazard rates from the marginal survival distributions be conditional on covariates. We use conditional copulas, first introduced by Patton 9 who extended Sklar’s theorem 17 to account for the conditioning of the random variables on covariates.
Let
We let the marginal hazard rates and the association parameter depend on covariates,
We estimate the association between the survival probabilities of individuals, using univariate survival functions in the copula function. This changes the interpretation of association between the endpoints,
18
compared to the use of the cumulative distribution function (CDF) by Patton.
9
We represent the joint survival function using the copula function,
Then, the copula density function is as follows
Let
Below, we give the formulae for the likelihood when the four types of copula, the Normal, Clayton, Frank and Gumbel, are used. Table 2 summarises the different link functions for each copula that are used to ensure the association parameters are within the permissible ranges within the respective copula functions.
Link functions for the association parameter
For the Normal copula, the association parameter
For the Clayton copula, the likelihood function is given by
We consider the Exponential, Weibull and Gompertz distributions for the marginal distributions of event times. For each model, we specify the link function for marginal parameters. An Exponential model assumes a constant hazard rate, while the Weibull and Gompertz models allow for increasing, constant or decreasing hazard rates. Our proposed methods can be easily extended to other parametric survival distributions such as a Log-normal or Log-logistic distributions.
Assume that the marginal distributions for both events follow the Weibull distributions with the PDFs
The survival function and hazard function are given as follows
We consider the case where the shape parameters
Therefore, the hazard ratios for a binary covariate
Assume that the marginal distributions for both events follow the Gompertz distributions with the PDFs
The survival function and hazard function are given as follows
We consider the case where the shape parameters
Therefore, the hazard ratios for a binary covariate
For all three marginal models, the variances of the hazard ratios can be approximated by
The regression coefficients for the marginal distributions and the association parameters are estimated by maximising the log-likelihood, that is, the logarithm of (4). This is achieved by using the function
Application
We apply the methods described in Section 3 to the UKTR data set introduced in Section 2. The data set contains time to graft failure and time to death of kidney transplant recipients. To illustrate the use of the methods, we include the following binary covariates, recipient age group (
We use Exponential, Weibull and Gompertz distributions to model the survival time. We apply four different copula functions to describe the association between the survival endpoints and we select the best fitting models using the Akaike information criterion (AIC).
20
The estimated hazard ratios for each covariate and the regression coefficients of the covariates for the association parameter are provided in Tables 3 to 5 for the Exponential, Gompertz and Weibull survival distributions, respectively. The results for the Cox model are presented in Table 3. Moreover, in Tables 3 to 5 the computational time for each model is reported for a computer with processor
Cox model and Exponential survival copula models: Results from analysing the UK Transplant Registry data set. The binary covariates are defined as follows, sex: female versus male (reference), donor type: living versus deceased (reference), and age group:
years versus
years (reference).
Cox model and Exponential survival copula models: Results from analysing the UK Transplant Registry data set. The binary covariates are defined as follows, sex: female versus male (reference), donor type: living versus deceased (reference), and age group:
Gompertz survival copula models: Results from analysing the UK Transplant Registry data set. The binary covariates are defined as follows, sex: female versus male (reference), donor type: living versus deceased (reference), and age group:
Weibull survival copula models: Results from analysing the UK Transplant Registry data set. The binary covariates are defined as follows, sex: female versus male (reference), donor type: living versus deceased (reference), and age group:
Graft failure following transplant
Across all considered models, sex is not found to be associated with the risk for graft failure. Compared to a deceased donor transplant, living donor transplant is associated with lower risk for graft failure. Older age (
In the fitted Cox model for graft failure, there was evidence for non-proportional hazards for all three covariates. Using Grambsch and Therneau’s approach 21 to diagnose non-proportionality, we obtained P-values of 0.091, 0.017 and <0.001 for sex, age and donor type, respectively.
Death following transplant
For all considered models, including the Cox model and all copula models, all three covariates: sex, age and donor type, are found to be associated with death after transplant. In particular, female sex, living donor transplant and younger age (
In the fitted Cox model for death, there was evidence for non-proportional hazards for donor type (
Association between graft failure and death
The results from all the copula survival models showed that the association between graft failure and death is stronger for individuals in the older age group compared to the younger age group.
In most of the fitted copula models, we observed no difference between female and male recipients for the association between graft failure and death. The exceptions are the Clayton copula models and Frank copula Gompertz model which show stronger association between these two end points for female recipients. Similarly, in most of the fitted copula models, we observed no difference between living and deceased donors for the association between graft failure and death. The exceptions are the Clayton copula models and Frank copula Exponential model which showed stronger association between these two end points for living donor recipients.
Results for the preferred model
In our real data analyses, we reported the AIC value for each model. However, we did not compare Cox model with the copula models using AIC. This is because the Cox model was fitted to graft failure and death, separately, and the Cox model has a partial likelihood instead of a full likelihood. In contrast, the copula survival model analysed graft failure and death jointly, and has a full likelihood. Hence, the AIC values for the Cox model and the copula-based parametric survival models are not comparable. However, we used the AIC to compare between the copula survival models. For each survival distribution, the Frank copula model is preferred according to the AIC (Tables 3 to 5). Moreover, for each copula model, the Weibull survival distribution provides the lowest AIC.
In the Frank copula Weibull survival model, the association between graft failure and death is affected by age, with individuals in the older age group having a stronger association compared to those in the younger age group. Living donor recipient group is at lower risk for graft failure and death, with respective hazard ratios 0.560 (95% CI: 0.532–0.588) and 0.519 (95% CI: 0.490–0.549). Female sex is associated with lower risk for death compared to men, with hazard ratio 0.920 (95% CI: 0.882–0.957). The hazard ratio of graft failure for the older age group is 1.202 (95% CI: 1.152–1.251) and the hazard ratio of death is 3.734 (95% CI: 3.565–3.903), indicating the older age group is at higher risk of graft failure and death.
The Frank copula Weibull survival model suggests that in general male patients above 50 years who received a transplant from a deceased donor are at highest risk for death following kidney transplant. At the other extreme, younger female patients who received a transplant from a living donor are at the lowest risk for death. The hazard functions estimated from the preferred model are presented in Figure 1 for these two subgroups, respectively. For both subgroups, the hazard for graft failure is greatest immediately following the transplant and gradually decreases within 4 years where it is stabilised (Figure 1(a)). The hazard for death is stable over time since transplant (Figure 1(b)). For male recipients aged

Estimated hazard functions for two subgroups of patients: male recipients above 50 years with deceased donor and female recipients aged below 50 years with living donor, using Frank copula with Weibull survival times: (a) graft failure; (b) death.
We conduct two simulation studies. In simulation study 1, we aim to assess the performance of the proposed bivariate copula regression models for semi-competing risk data. We simulate data to mimic the real example of renal transplant data. We compare the performance of three models: the Cox proportional hazards model; the bivariate copula regression models with predictors for the hazard rates (copula model 1); and the bivariate copula regression models with predictors for both hazard rates and the association parameters (copula model 2).
In simulation study 2, we aim to assess the effect of misspecification of the survival distributions on the estimation of the model parameters. We use the AIC value to select the best fitting model for the simulated data and calculate the percentage that the underlying survival model is correctly selected.
Design
We simulate data with known marginal hazard rates for the non-terminal and terminal events, and the correlation between the two event times. We assess the accuracy of the estimates for the hazard ratios of the non-terminal and terminal events, We generate the binary covariates, We generate Pearson’s correlation coefficient for the Normal copula, or association parameters for the Clayton, Frank and Gumbel copulas, respectively, as described in Table 2, with chosen Hazard rates for the non-terminal event, We use the conditional distribution method
22
to simulate time to the non-terminal and terminal events from a specified copula.
Generate Generate We obtain We simulate censoring time, Set the time to the non-terminal event, Set the time to the terminal event,
In simulation study 1, the simulated data are analysed using Cox model, and the models with underlying copula function. For example, when analysing the simulated data generated using the Clayton copula, we use the Clayton copula model. Sorrell et al.
8
conducted an extensive simulation study to evaluate the effect of misspecification of copula functions and we will not duplicate that simulation in this article.
Choice of parameters
We generate data sets with 3000 individuals with hazard rates and association parameters mimicking the real data analysis from the UKTR data set. The binary covariates are generated from a Bernoulli distribution with parameters mimicking the real data. Specifically, the age group
The regression coefficients are set to be equal to those in Supplemental Materials Tables S1 and S2 for simulation studies 1 and 2, respectively. The censoring time,
Performance measures
We maximise the log-likelihood described in Section 3 using the
Results of simulation study 1: Performance of the proposed bivariate copula regression models
For the non-terminal event, the hazard ratio of age group estimated from the Cox model has the following coverage probabilities,
For the hazard ratios of sex and donor type for the non-terminal event, both the Cox model and copula model 2 result in coverage probabilities close to the nominal level. However, the results from the copula models show slight reductions in bias and MSE. For example, when using the Cox model for data generated from the Normal copula, the bias and MSE of the hazard ratio are
Results of simulation study.
Results of simulation study.
Copula model 1 represents the copula model with the marginal hazard rates depending on the covariates. Copula model 2 represents the copula model with both the marginal hazard rates and association parameters depending on the covariates. Data are generated from the Normal, Clayton, Frank and Gumbel copulas and the true copula distribution is used for estimation. We generate 1000 data sets with 3000 individuals in each. The non-terminal hazard ratio for each covariate is given by
The results for the hazard ratios of the terminal event are similar between the Cox model, copula models 1 and 2 with coverage probability close to
We evaluate the use of alternative survival distributions and the effects of misspecification of the survival distribution. For the simulation studies investigating misidentification, we simulate data mimicking the real data set using four different copula distributions, with true values given in Supplemental Material Table S2. We also evaluate the use of AIC to select the survival distributions. We simulate data with the following combinations of survival distributions and copula functions, Exponential, Weibull and Gompertz survival distributions and Normal, Clayton, Gumbel and Frank copulas.
The results of simulation study 2 are given in Supplemental Material Table S6. Here we give a summary of findings from this simulation study. Our simulation study shows that AIC can be used to select the survival distributions. The survival distribution is chosen correctly in almost 100% cases for the Weibull distribution, roughly 91% or above for Gompertz distribution, and around 78% or above for Exponential distribution. Where the underlying Exponential distribution is not correctly identified, about 10% of the times the survival distribution is misspecified as Weibull and about 10% as Gompertz distribution. However, the estimation is in general robust to the misidentification of the Exponential distributions by Weibull or Gompertz distribution. The summary of performance of the models selected by the lowest AIC shows that the coverage probability is close to the nominal level.
Discussion
We propose a set of copula regression models to allow both the hazard rates and the association between times to non-terminal and terminal events to vary by covariates. We use conditional copulas 9 with predictors for the hazard rates of the semi-competing risk events and the association parameters. The advantage of our proposed method is the estimation of the hazard ratios by taking into account the correlation between the semi-competing risks and the flexibility of using a variety of copulas to describe different patterns in the relationship between the survival endpoints.
Our work demonstrates the importance of considering the correlation between the semi-competing risks. In the real data analysis, different conclusions were found for the effect of age group, where Cox model showed older age was associated with lower risk of graft failure (HR: 0.911, 95% CI: 0.872–0.952), while our proposed copula models showed older age was associated with higher risk of graft failure from all copula models. The estimated hazard ratio from Frank copula Weibull survival model, which has the lowest AIC, is 1.202 with 95% CI: 1.152–1.251 (Table 5). We conduct simulation studies to assess the performance of the copula regression models and to evaluate the use of AIC to choose survival distributions. The estimation of the hazard ratio for the non-terminal event is improved by using copula model 2, with coverage probability around the nominal level, compared to using the Cox model which has coverage probability 0% in some scenarios (Table 6). These results corroborate the findings from Leffondré et al. 23 For the non-terminal hazard ratio, we found a reduction in bias and MSE using the copula regression models compared to the Cox model. Our work highlights the importance of acknowledging the semi-competing risk when analysing the effect of a covariate on an endpoint, where the covariate has a strong effect on the competing risk.
We have considered the effect of misspecification of survival distributions. Our simulation studies show that the AIC can be used to select the survival distributions in the majority of cases. In the remaining cases, misspecification is more likely to occur when the underlying distribution is Exponential or Gompertz. However, in our simulation studies, miss-specifying these two distributions by a Weibull distribution still holds good properties in terms of the parameter estimates. Further research may also investigate other parametric survival distributions or the use of non-parametric methods to model the marginal survival functions.
We have used a full maximum likelihood approach for estimating the model parameters. We acknowledge the two-stage estimation procedures for the association parameter may be more efficient for copula survival models. 24 In the two-stage approach, the association parameter and the parameters in the marginal survival distributions are estimated separately. In stage 1, parameters in the marginal distributions are estimated assuming they are independent. In stage 2, the association parameter is estimated by fixing the two marginal distributions at the estimates from stage 1. This approach ignores the dependency structure in stage 1, however, it offers the advantage of being practically efficient, especially when the models become more complex. The application of the two-stage approach in our proposed models and the comparison with the one-stage full maximum likelihood approach is a topic for future research. Another possible extension could be to estimate the marginal survival function using non-parametric methods 25 or Cox proportional hazards model. 26 It would be also of interest to investigate the use of grid search methods to find starting values for optimising the likelihood function.
In this article, we have extended our recent work in bivariate semi-competing risk models by including covariates. Another topic of future research could be to extend the model to include frailty terms to account for unmeasured covariates. Since we have bivariate semi-competing risk data, this extension will require to model the frailty terms by using a bivariate distribution to account for the potential correlation between the two frailty terms.
We have used Exponential, Weibull and Gompertz distributions to illustrate the methods and applications of our proposed models. However, this can be readily extended to other parametric survival distributions. We have assessed the performance of our proposed copula survival models using simulation studies and compare between them using AIC. Assessing the goodness-of-fit of the copula survival models may be developed in future research.
As this article focuses on describing copula regression models with binary covariates, the dichotomous age groups (
Further research may investigate the inclusion of continuous and categorical covariates. We have considered a linear function to incorporate covariates into the hazard rates and association parameter, however, alternatives may be considered. Acar et al. 27 have developed methods to compare potential functions that describe the association parameter of the copula function’s relationship with the covariate by using a generalised likelihood ratio test. We have used available case analysis to illustrate the application of our methods. Future research can use multiple imputation to deal with missing data when using our proposed methods. This could be time consuming when the normal copula model is used.
As can be seen in Tables 3 to 5, it takes 2 to 4 h to optimise the likelihood for normal copula models. Future research may investigate how to speed up the optimisation process for the analysis of a single data set, and the use of parallel computing in R for conducting simulation studies.
Supplemental Material
sj-pdf-1-smm-10.1177_09622802231188516 - Supplemental material for Bivariate copula regression models for semi-competing risks
Supplemental material, sj-pdf-1-smm-10.1177_09622802231188516 for Bivariate copula regression models for semi-competing risks by Yinghui Wei, Małgorzata Wojtyś, Lexy Sorrell and Peter Rowe in Statistical Methods in Medical Research
Footnotes
Acknowledgements
The authors gratefully acknowledge the NHS Blood and Transplant for the access to the UK Transplant Registry data used in this project. We are grateful to all the transplant centres in the UK who contributed data on which this project is based. We are grateful to have access to the University of Plymouth Faculty of Science and Engineering High Performance Computing Clusters for conducting our simulation studies.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: LS received a PhD studentship from the University of Plymouth School of Engineering, Computing and Mathematics. YW was supported by an UKRI MRC Fellowship (MC/W021358/1). YW and MW received funding from UKRI EPSRC Impact Acceleration Account (EP/X525789/1).
Supplemental material
Supplemental materials for this article are available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
