
Abstract

We consider a parametric modelling approach for survival data where covariates are allowed to enter the model through multiple distributional parameters (i.e., scale and shape). This is in contrast with the standard convention of having a single covariate-dependent parameter, typically the scale. Taking what is referred to as a multi-parameter regression (MPR) approach to modelling has been shown to produce flexible and robust models with relatively low model complexity cost. However, it is very common to have clustered data arising from survival analysis studies, and this is something that is underdeveloped in the MPR context. The purpose of this article is to extend MPR models to handle multivariate survival data by introducing random effects in both the scale and the shape regression components. We consider a variety of possible dependence structures for these random effects (independent, shared and correlated), and estimation proceeds using an h-likelihood approach. The performance of our estimation procedure is investigated by way of an extensive simulation study, and the merits of our modelling approach are illustrated through applications to two real data examples, a lung cancer dataset and a bladder cancer dataset.

Keywords

survival analysis, correlated survival data, frailty model, parametric regression modelling, multi-parameter regression, h-likelihood

1 Introduction

Standard survival analysis models such as the proportional hazards (PH) model are known to have a single covariate-dependent parameter, the scale parameter. As a means to extend these models and afford more flexibility in the modelling of survival data, Burke and MacKenzie (2017) developed a multi-parameter regression (MPR) approach for a parametric hazard function. MPR is an approach whereby more than one distributional parameter is allowed to depend on covariates, and this is sometimes referred to as ‘distributional regression’ (Stasinopoulos et al., 2018; see also Rigby and Stasinopoulos, 2005, and Stasinopoulos et al., 2007). Using the Weibull MPR model as an example, Burke and MacKenzie’s (2017) paper demonstrates the advantages of allowing both the scale and the shape parameters to depend on covariates simultaneously. More recently, Burke et al. (2019) extended a semi-parametric accelerated failure time (AFT) model to multi-parameter regression and Burke et al. (2020) explored a parametric MPR survival modelling framework in which the baseline hazard function follows an adapted power generalized Weibull (APGW) distribution.

Standard MPR models rely on the assumption of independent event times, and such an assumption cannot be met in studies with clustered observations. This includes studies where successive or recurrent event times are recorded on each subject; multi-centre studies where survival times of individuals from the same centre may be dependent due to centre-specific conditions, clinical or otherwise; family studies; and matched pair studies. Various methods have been developed to model the lack of independence in clustered data, but perhaps the most standard approach is the one whereby a cluster-specific random effect is introduced into the model. Given these random effects, the data are assumed to be conditionally independent. In the context of survival analysis, such random effects models are commonly referred to as frailty models, and, since the random effect represents a common effect on all members in a cluster, these types of common risk models are known as shared frailty models (Clayton, 1978; Duchateau and Janssen, 2008; Hougaard, 2012).

Extensions of the MPR models to include random effects have been centred around the ‘classical’ multiplicative frailty model, whereby the frailty term is included in the linear predictor corresponding to the scale parameter. Peng et al. (2020) extend a Weibull MPR model for interval-censored data to include a gamma distributed multiplicative frailty term, but only for univariate frailty. In their article, Jones et al. (2020) extend the MPR PGW and MPR APGW models to handle bivariate data using multiplicative frailty, where the dependence is understood using links to the well-known power variance copula (Goethals et al., 2008; Duchateau and Janssen, 2008; Hougaard, 2012). The attraction of MPR modelling is the flexibility afforded by allowing the hazard shapes to differ, but just as the shape can vary according to covariates, so too could it have random variation, for example, resulting from differences across multiple centres. Hence, in this article, we go beyond the classical multiplicative model to allow for a wider range of model structures: specifically, models in which a frailty term is included in each distributional parameter. This, we believe, is a more natural way of modelling correlated data using MPR models than multiplicative frailty, which relates only to the scale of the hazard. Furthermore, our proposed model accounts for the potential correlation between the frailty terms themselves since scale and shape effects may be correlated.

Parameter estimation in the parametric shared frailty model is commonly achieved through integrating out the frailties from the conditional survival likelihood. The resulting equation is an explicit expression for the marginal likelihood, containing the fixed parameters and no longer the frailties. This marginal likelihood can then be maximized using numerical methods to yield estimates for the fixed effects. Inference about the random effects is not readily available and integrating out the frailties from the joint density typically involves the evaluation of analytically intractable integrals over the random-effect distributions. Numerical integration methods such as Gauss-Hermite (G-H) quadrature can be used to approximate the value of the integrals; however, when the dimensionality of the integral is high, the number of quadrature points grows exponentially with the number of random effects and the approximation is sub-optimal. While several methods, such as Monte Carlo Expectation-Maximisation, Markov Chain Monte Carlo and Gibbs sampling, have been used to overcome the issue of intractable integrals, these methods are notoriously computationally intensive. This is especially true when the number of random effects (clusters) is large or when the complex correlation structure among the clustered survival times requires the assumption of multiple frailties (Vaida and Xu, 2000; Ripatti and Palmgren, 2000; Abrahantes et al., 2007; Duchateau and Janssen, 2008). We adopt a so-called ‘hierarchical likelihood’ (h-likelihood) approach. This approach was originally proposed by Lee and Nelder (1996) for a generalized mixed-effects model but further studied and developed to provide a straightforward, unified framework for various random effect models including frailty models (Ha et al., 2001, 2002; Ha and Lee, 2003; Ha et al., 2017).

In contrast to the standard marginal likelihood approach, the h-likelihood framework treats the random effects or frailties as model parameters, which are then jointly estimated with the fixed parameters and the frailty dispersion parameter(s). Estimates of the fixed and random effects are found by maximizing the log-likelihood function conditional on the random effects plus a penalty term whose value depends on how dispersed the random parameters are: if the random effects have a large dispersion parameter, then the penalty term takes a small value. By treating the random effects as model parameters, the h-likelihood framework avoids the intractable integration needed to calculate the marginal likelihood and provides an efficient estimation procedure. Furthermore, classical analysis of random effects models focuses on the estimation of the fixed parameters and the frailty variance parameter(s), but in many recent applications, estimation of the random effects is also of interest. Such estimates allow the survivor function for individuals with given characteristics (the cluster-specific failure time distribution) to be estimated, and are especially useful in multi-centre studies when the frailty term represents the centres. Estimates of the random effects in such studies provide information about the merits of the different centres in terms of patient survival and are useful for investigating the potential heterogeneity in survival among clusters in order to better understand and interpret the variability in the data (Ha et al., 2016b).

The remainder of this article is organized as follows: Section 2 describes the MPR model, its extended version with random effects and the h-likelihood procedure for parameter estimation in the given model. Section 3 presents results of extensive simulation studies. The proposed methods are illustrated on two datasets arising from multi-centre studies in Section 4, followed by a discussion and conclusion in Section 5.

2 The MPR Frailty model

2.1 Model formulation

To formulate a shared frailty MPR model for survival data, we assume time-to-event data arising from a multi-centre study. In this study, we have q centres or clusters, with n_i individuals (patients) in the ith centre, i = 1, 2, 3, ..., q. The total sample size is the total number of individuals coming from all q centres, that is, $n = \sum_{i = 1}^{q} n_{i}$ . We define ${\tilde{T}}_{i j}$ as the survival time for the $j$ th individual, $j = 1,2,3, \dots, n_{i}$ , in the $i$ th cluster and $C_{i j}$ as the corresponding censoring time. The cumulative hazard function for a shared frailty MPR model takes the following parametric form:

Λ (t_{i j} | x_{i j}, v_{β i}, v_{α i}) = τ_{i j} Λ_{0} (t_{i j}^{γ_{i j}}),

(2.1)

where $Λ_{0} (.)$ is the underlying cumulative hazard function with scale parameter $τ_{i j} > 0$ and shape parameter $γ_{i j} > 0$ . The corresponding hazard function is given by

λ (t_{i j} | x_{i j}, v_{β i}, v_{α i}) = τ_{i j} γ_{i j} t_{i j}^{γ_{i j} - 1} λ_{0} (t_{i j}^{γ_{i j}}),

where $λ_{0} (.)$ is the baseline hazard function, that is, the derivative of $Λ_{0} (.)$ . $Λ_{0} (.; γ_{i j})$ can be taken from one of the commonly used survival distributions (Weibull, Gompertz, or log-logistic) (see Table 1).

Table 1

Possible distributions

$Λ_{0} (t)$	Distribution
t	Weibull
$\exp (t) - 1$	Gompertz
$\log (1 + t)$	log-logistic
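As a minimal sketch (not the authors' implementation), the three baseline choices in Table 1 and the resulting cumulative hazard and hazard functions can be coded as follows. The dictionary `BASELINES` and all function names are illustrative assumptions; each entry pairs $Λ_{0}$ with its derivative $λ_{0}$.

```python
import math

# Baseline cumulative hazards Lambda_0(s) from Table 1, paired with their
# derivatives lambda_0(s). Keys and function names are illustrative only.
BASELINES = {
    "weibull": (lambda s: s, lambda s: 1.0),
    "gompertz": (lambda s: math.exp(s) - 1.0, lambda s: math.exp(s)),
    "log-logistic": (lambda s: math.log(1.0 + s), lambda s: 1.0 / (1.0 + s)),
}


def cumulative_hazard(t, tau, gamma, dist="weibull"):
    """Lambda(t) = tau * Lambda_0(t ** gamma)."""
    Lambda0, _ = BASELINES[dist]
    return tau * Lambda0(t ** gamma)


def hazard(t, tau, gamma, dist="weibull"):
    """lambda(t) = tau * gamma * t ** (gamma - 1) * lambda_0(t ** gamma)."""
    _, lambda0 = BASELINES[dist]
    return tau * gamma * t ** (gamma - 1.0) * lambda0(t ** gamma)


def survival(t, tau, gamma, dist="weibull"):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(t, tau, gamma, dist))
```

A quick sanity check is that `hazard` agrees with a numerical derivative of `cumulative_hazard` for each baseline.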

The two distributional parameters, $τ_{i j}$ and $γ_{i j}$ , depend on covariates as follows:

\log (τ_{i j}) = x_{i j}^{T} β + v_{β i}, \log (γ_{i j}) = x_{i j}^{T} α + v_{α i},

where $x_{i j} = {(1, x_{β i j 1}, \dots, x_{β i j p})}^{T}$ is the covariate vector, $β = {(β_{0}, β_{1}, \dots, β_{p})}^{T}$ and $α = {(α_{0}, α_{1}, \dots, α_{p})}^{T}$ are the corresponding regression coefficient vectors, $v_{β i}$ and $v_{α i}$ denote the scale and shape random effects from the ith cluster respectively, and a log link is used to ensure positivity of the parameters. Although we allow the scale and the shape parameters to depend on the same set of covariates, the parameters may or may not have covariates in common depending on the value of the corresponding regression coefficients. The individual cluster-specific effects (scale and shape random effects) are assumed to be independently and identically distributed (i.i.d.) according to some distribution.

In their work, Burke and MacKenzie (2017) found the distributional parameters in an MPR model to be highly correlated. To account for the possibility of the propagation of this correlation to the corresponding frailty terms, we allow for correlation between the two random effects terms, $v_{β i}$ and $v_{α i}$ , by assuming that $v_{i} = {(v_{β i}, v_{α i})}^{T}$ follows a bivariate normal distribution such that

v_{i} = \begin{pmatrix} v_{β i} \\ v_{α i} \end{pmatrix} \sim \mathrm{BVN} \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, Σ = \begin{pmatrix} σ_{β}^{2} & ρ σ_{β} σ_{α} \\ ρ σ_{α} σ_{β} & σ_{α}^{2} \end{pmatrix} \right) .

Fitting the model that assumes the bivariate normal distribution for the frailty terms avoids the potentially restrictive assumption of independence. Although we only consider the normal distribution for the random effects here, estimates of the fixed effects $(β, α)$ are usually robust against violations of this assumption if the censoring rate or frailty variance parameter is not too high (Ha et al., 2001; Ha and Lee, 2003; Ha et al., 2011, 2016b).
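To make the bivariate normal assumption concrete, the following sketch draws cluster-level frailty pairs by applying the Cholesky factor of $Σ$ to independent standard normal variates. The function names `draw_frailties` and `sample_corr` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def draw_frailties(q, sigma_beta, sigma_alpha, rho, seed=0):
    """Draw q pairs (v_beta_i, v_alpha_i) from BVN(0, Sigma).

    Uses the Cholesky factor of Sigma applied to independent standard normals.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(q):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        v_beta = sigma_beta * z1
        v_alpha = sigma_alpha * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((v_beta, v_alpha))
    return pairs


def sample_corr(pairs):
    """Sample Pearson correlation between the two frailty components."""
    n = len(pairs)
    mb = sum(p[0] for p in pairs) / n
    ma = sum(p[1] for p in pairs) / n
    sbb = sum((p[0] - mb) ** 2 for p in pairs)
    saa = sum((p[1] - ma) ** 2 for p in pairs)
    sba = sum((p[0] - mb) * (p[1] - ma) for p in pairs)
    return sba / math.sqrt(sbb * saa)
```

Setting `rho=1.0` makes the second component an exact multiple of the first, which corresponds to the common frailty special case.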

Note that our proposed MPR frailty model generalizes a variety of well-known (and lesser-known) models summarized in Table 2. Moreover, Figure 1 provides a schematic displaying the paths from our most general model, through the various sub-models, to the simplest (classical) PH sub-model.

Table 2

Key sub-models via parameter constraints

	Constraint
Model	$α$	$σ_{β}^{2}$	$σ_{α}^{2}$	$ρ$
PH model (Cox, 1972)	0	0	0	n/a
Multiplicative frailty PH model (Duchateau and Janssen, 2008)	0	-	0	n/a
MPR model (Burke and MacKenzie, 2017; Burke et al., 2020)	-	0	0	n/a
Scale frailty MPR model (Peng et al., 2020; Jones et al., 2020)	-	-	0	n/a
Shape frailty MPR model	-	0	-	n/a
Independent frailty	-	-	-	0
Common frailty (Ha et al., 2017)	-	-	-	$\pm 1$

Figure 1

A schematic diagram of some of the possible models generalized by the proposed MPR frailty model. Note: When going from the Common Frailty model to the MPR Shape Frailty model, the interpretation is that $σ_{β}^{2} \to 0$ and $ϕ \to \infty$ but $σ_{α}^{2} = ϕ^{2} σ_{β}^{2}$ is a constant; $u = \exp (v)$ .

Here, ‘ $α = 0$ ’ means $α = {(α_{0}, 0, \dots, 0)}^{⊤}$ . ‘-’ means that the model parameter is unconstrained and ‘n/a’ is used to reflect the fact that ρ is not meaningful in models where $σ_{β}^{2}$ or $σ_{α}^{2}$ are equal to 0. In the common frailty model, $ρ = \pm 1$ is achieved by setting $v_{α i} = ϕ v_{β i}$ , where $ϕ > 0$ .

2.2 Construction of the h-likelihood

We denote the observed data by the pairs $(T_{i j}, δ_{i j})$ , where $T_{i j} = \min ({\tilde{T}}_{i j}, C_{i j})$ is the observed survival time for the jth individual in the ith cluster and $δ_{i j}$ is the censoring indicator, which takes the value 0 for a censored observation and 1 for an event. Under the standard assumptions that, given $v_{β}$ and $v_{α}$ , the censoring times ( $C_{i j}$ s) and the event times ( ${\tilde{T}}_{i j}$ s) are conditionally independent and the censoring is conditionally non-informative, the log-h-likelihood function of the proposed model (2.1) is given by

h = h (θ, Σ) = \sum_{i j} l_{1 i j} + \sum_{i} l_{2 i},

(2.2)

where

l_{1 i j} = l_{1 i j} (θ; t_{i j}, δ_{i j} ∣ v_{i}) = δ_{i j} \{\log τ_{i j} + \log γ_{i j} + (γ_{i j} - 1) \log t_{i j} + \log (λ_{0} (t_{i j}^{γ_{i j}}))\} - τ_{i j} Λ_{0} (t_{i j}^{γ_{i j}})

is the logarithm of the conditional joint density function for t_ij and δ_ij given the random effects $v_{i} = (v_{β i}, v_{α i})$ , where t_ij is the realization of T_ij, and

l_{2 i} = l_{2 i} (Σ; v_{i}) = - \log (2 π σ_{β} σ_{α} \sqrt{1 - ρ^{2}}) - \frac{1}{2 (1 - ρ^{2})} (\frac{v_{β i}^{2}}{σ_{β}^{2}} + \frac{v_{α i}^{2}}{σ_{α}^{2}} - 2 ρ \frac{v_{β i} v_{α i}}{σ_{β} σ_{α}})

is the logarithm of the density function of v_i with dispersion parameters (i.e., frailty parameters) $Σ = {(σ_{β}, σ_{α}, ρ)}^{⊤}$ , and $θ = {(β^{T}, α^{T})}^{T}$ is the vector of the fixed parameters. For meaningful inferences, it is important to define the h-likelihood on a particular scale of v , such that the random effects occur linearly in the linear predictor; hence the log links used for $τ_{i j}$ and $γ_{i j}$ in Section 2.1 (Lee and Nelder, 1996; Ha et al., 2001; Lee et al., 2017; Ha et al., 2017).
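The two components of the log-h-likelihood can be sketched in code for the Weibull baseline, where $Λ_{0} (s) = s$ and $λ_{0} (s) = 1$ . This is an illustrative sketch only; the data layout (tuples of cluster index, covariate list, time and event indicator) and the function names are assumptions made for the example.

```python
import math


def l1_weibull(t, delta, log_tau, log_gamma):
    """Conditional log-density contribution l_1ij under the Weibull baseline."""
    tau, gamma = math.exp(log_tau), math.exp(log_gamma)
    return delta * (log_tau + log_gamma + (gamma - 1.0) * math.log(t)) - tau * t ** gamma


def l2_bvn(v_beta, v_alpha, sigma_beta, sigma_alpha, rho):
    """Log-density l_2i of the bivariate normal frailty pair."""
    one_m = 1.0 - rho ** 2
    quad = (v_beta ** 2 / sigma_beta ** 2 + v_alpha ** 2 / sigma_alpha ** 2
            - 2.0 * rho * v_beta * v_alpha / (sigma_beta * sigma_alpha))
    norm = math.log(2.0 * math.pi * sigma_beta * sigma_alpha * math.sqrt(one_m))
    return -norm - quad / (2.0 * one_m)


def h_loglik(data, beta, alpha, v, sigma_beta, sigma_alpha, rho):
    """Log-h-likelihood: sum of l_1ij over observations plus sum of l_2i over clusters.

    data: list of (cluster index i, covariates x_ij incl. intercept, t_ij, delta_ij)
    v:    list of (v_beta_i, v_alpha_i), one pair per cluster
    """
    h = 0.0
    for i, x, t, delta in data:
        log_tau = sum(xk * bk for xk, bk in zip(x, beta)) + v[i][0]
        log_gamma = sum(xk * ak for xk, ak in zip(x, alpha)) + v[i][1]
        h += l1_weibull(t, delta, log_tau, log_gamma)
    for v_beta, v_alpha in v:
        h += l2_bvn(v_beta, v_alpha, sigma_beta, sigma_alpha, rho)
    return h
```

Because the random effects enter the linear predictors additively, the same function can be maximized jointly over the fixed effects and `v` for fixed dispersion parameters.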

2.3 Estimation and inference procedure

From the h-likelihood function given in (2.2), we can derive two likelihoods, namely the marginal likelihood and the restricted likelihood. The marginal likelihood eliminates the random effects, v , from h while the restricted likelihood eliminates the fixed effects θ from the marginal likelihood; thus, the latter eliminates both the fixed effects θ and the random effects v from h. In theory, the h-likelihood should be used for inference about v , the marginal likelihood should be used for inference about θ and the restricted likelihood for inference on the dispersion parameter(s) (Patterson and Thompson, 1971; Harville, 1977). However, when the marginal likelihood is intractable, Lee and Nelder (1996, 2001) propose using adjusted profile likelihoods, p_v(h) and p_θ,v(h) as approximations to the marginal likelihood and the restricted likelihood respectively. Given the log-h-likelihood h with nuisance parameters ω , the adjusted profile likelihood suggested by Lee and Nelder (1996) eliminates the nuisance parameters ω from h and is given by

p_{ω} (h) = \left[ h - \frac{1}{2} \log \det \{ H (h; ω) / (2 π) \} \right] \Big|_{ω = \hat{ω}},

where $H (h; ω) = - \partial^{2} h / \partial ω^{2}$ is the core of the adjustment term and ω is profiled by setting it to $\hat{ω}$ (the solution of $\partial h / \partial ω = 0$ ). Note here that the function $p_{ω} (.)$ produces an adjusted profile likelihood, profiling out (i.e., eliminating) the nuisance parameters ω, which can be the fixed effects $θ$ , the random effects $v$ or both. $p_{v} (h)$ is the first-order Laplace approximation to the marginal likelihood, $m$ (Lee and Nelder, 2001). Similarly, $p_{θ} (m)$ is the first-order Laplace approximation to the restricted likelihood. Moreover, $p_{θ, v} (h)$ approximates $p_{θ} (p_{v} (h))$ and therefore, $p_{θ} (m)$ .
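The link between the adjusted profile likelihood and the Laplace approximation can be checked on a toy example. The sketch below assumes a scalar nuisance parameter $v$ and a log-h-likelihood that is quadratic in $v$ , in which case the Laplace approximation is exact and $p_{v} (h)$ coincides with the log of the integral of $\exp (h)$ over $v$ . The function `h_toy` and its constants are purely hypothetical.

```python
import math


def adjusted_profile_1d(h, v_hat, eps=1e-4):
    """p_v(h) for a scalar nuisance parameter v, evaluated at its maximiser v_hat.

    H = -d^2 h / dv^2 is approximated by a central finite difference.
    """
    H = -(h(v_hat + eps) - 2.0 * h(v_hat) + h(v_hat - eps)) / eps ** 2
    return h(v_hat) - 0.5 * math.log(H / (2.0 * math.pi))


def log_marginal_1d(h, lo=-20.0, hi=20.0, n=20001):
    """log of the integral of exp(h(v)) dv by the trapezoidal rule (comparison only)."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(h(lo + k * step))
    return math.log(total * step)


def h_toy(v, a=2.5, m=0.3, c=-1.0):
    """Toy log-h-likelihood, quadratic in v with maximiser m and curvature a."""
    return c - 0.5 * a * (v - m) ** 2
```

For non-quadratic $h$ the two quantities differ, and the difference is the Laplace approximation error.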

With the exception of binary data with a small cluster size (i.e., $n_{i} = 2$ ), it has been found that $h$ and $p_{v} (h)$ give very similar results when used in the estimation of the fixed effects, $θ$ (Lee et al., 2017). Hence, Lee et al. (2017) recommend the joint maximization of $h$ over the fixed and random effects and refer to this as the uncorrected h-likelihood method. This method works well for various models (Ha et al., 2002, 2017), including the models we propose. To summarise, we use $h$ for the estimation of $θ$ and $v$ , and $p_{θ, v} (h)$ for the estimation of $Σ$ . Note that $p_{θ, v} (h)$ is a function of $Σ$ only because it eliminates $(θ, v)$ , and, as mentioned above, is an approximation of the restricted likelihood. For details on h-likelihood inference, see Lee et al. (2017, Section 6.3.3) and Ha and Lee (2021).

The h-likelihood function given in (2.2) is maximized to obtain the maximum h-likelihood estimators (MHLEs) of $θ = (β, α)$ and $v = (v_{β}, v_{α})$ . The score functions are given by

\begin{array}{l} \partial h / \partial β = X^{T} U_{β}, \\ \partial h / \partial α = X^{T} U_{α}, \\ \partial h / \partial v_{β} = Z^{T} U_{β} - U_{v_{β}} and \\ \partial h / \partial v_{α} = Z^{T} U_{α} - U_{v_{α}} . \end{array}

$X$ is an $n \times p$ matrix whose $i j$ th row is $x_{i j}$ , $Z$ is an $n \times q$ matrix whose $i j$ th row is $z_{i j}$ , a vector indicating the cluster effect; $U_{β}$ and $U_{α}$ are vectors of length $n$ such that

\begin{array}{l} U_{β i j} = δ_{i j} - τ_{i j} Λ_{0} (t_{i j}^{γ_{i j}}) \text{ and} \\ U_{α i j} = δ_{i j} + \left\{ δ_{i j} \left( 1 + \frac{t_{i j}^{γ_{i j}} λ_{0}^{'} (t_{i j}^{γ_{i j}})}{λ_{0} (t_{i j}^{γ_{i j}})} \right) - τ_{i j} t_{i j}^{γ_{i j}} λ_{0} (t_{i j}^{γ_{i j}}) \right\} γ_{i j} \log t_{i j}; \end{array}

and $U_{v_{β}}$ and $U_{v_{α}}$ are vectors of length $q$ , such that

\begin{array}{l} U_{v_{β i}} = - \frac{\partial l_{2}}{\partial v_{β i}} = \frac{1}{(1 - ρ^{2})} \left[ \frac{v_{β i}}{σ_{β}^{2}} - \frac{ρ v_{α i}}{σ_{β} σ_{α}} \right] \text{ and} \\ U_{v_{α i}} = - \frac{\partial l_{2}}{\partial v_{α i}} = \frac{1}{(1 - ρ^{2})} \left[ \frac{v_{α i}}{σ_{α}^{2}} - \frac{ρ v_{β i}}{σ_{β} σ_{α}} \right] . \end{array}
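For the Weibull baseline ( $λ_{0} = 1$ and $λ_{0}^{'} = 0$ ), the score components simplify considerably. The sketch below writes them out for a single observation so they can be checked against finite differences of $l_{1 i j}$ ; the function names are illustrative assumptions, and `eta_beta` and `eta_alpha` stand for $\log τ_{i j}$ and $\log γ_{i j}$ .

```python
import math


def l1_weibull(t, delta, eta_beta, eta_alpha):
    """l_1ij under the Weibull baseline; eta_beta = log(tau), eta_alpha = log(gamma)."""
    tau, gamma = math.exp(eta_beta), math.exp(eta_alpha)
    return delta * (eta_beta + eta_alpha + (gamma - 1.0) * math.log(t)) - tau * t ** gamma


def score_components(t, delta, eta_beta, eta_alpha):
    """U_beta_ij and U_alpha_ij for the Weibull case (lambda_0 = 1, lambda_0' = 0)."""
    tau, gamma = math.exp(eta_beta), math.exp(eta_alpha)
    s = t ** gamma
    u_beta = delta - tau * s
    u_alpha = delta + (delta - tau * s) * gamma * math.log(t)
    return u_beta, u_alpha


def prior_score(v_beta, v_alpha, sigma_beta, sigma_alpha, rho):
    """U_{v_beta i} and U_{v_alpha i}: the terms subtracted in the random-effect scores."""
    k = 1.0 / (1.0 - rho ** 2)
    u_vb = k * (v_beta / sigma_beta ** 2 - rho * v_alpha / (sigma_beta * sigma_alpha))
    u_va = k * (v_alpha / sigma_alpha ** 2 - rho * v_beta / (sigma_beta * sigma_alpha))
    return u_vb, u_va
```

Comparing `score_components` with central differences of `l1_weibull` in each linear predictor is a cheap way to catch algebra or coding slips before assembling the full score vector.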

The corresponding observed information matrix can be written explicitly as

H = - \frac{\partial^{2} h}{\partial {(θ, v)}^{2}} = (\begin{matrix} X^{⊤} W_{β} X & X^{⊤} W_{β α} X & X^{⊤} W_{β} Z & X^{⊤} W_{β α} Z \\ X^{⊤} W_{β α} X & X^{⊤} W_{α} X & X^{⊤} W_{β α} Z & X^{⊤} W_{α} Z \\ Z^{T} W_{β} X & Z^{T} W_{β α} X & Z^{T} W_{β} Z + Q_{β} & Z^{T} W_{β α} Z + Q_{β α} \\ Z^{T} W_{β α} X & Z^{T} W_{α} X & Z^{T} W_{β α} Z + Q_{β α} & Z^{T} W_{α} Z + Q_{α} \end{matrix}),

(2.3)

where $X$ and $Z$ are $n \times p$ and $n \times q$ model matrices for $β$ , $α$ and $v$ whose $i j$ th rows are $x_{i j}$ and $z_{i j}$ , respectively. $W_{β}$ , $W_{α}$ and $W_{β α}$ are $n \times n$ diagonal matrices whose $i j$ th diagonal elements are given by

\begin{array}{l} w_{β i j} = τ_{i j} Λ_{0} (t_{i j}^{γ_{i j}}), \\ w_{α i j} = - \left\{ δ_{i j} (1 + t_{i j}^{γ_{i j}} a_{i j}) + γ_{i j} t_{i j}^{γ_{i j}} \log t_{i j} \left[ δ_{i j} \left( \frac{t_{i j}^{γ_{i j}} λ_{0}^{''} (t_{i j}^{γ_{i j}})}{λ_{0} (t_{i j}^{γ_{i j}})} + a_{i j} - t_{i j}^{γ_{i j}} a_{i j}^{2} \right) - τ_{i j} \left( λ_{0} (t_{i j}^{γ_{i j}}) + t_{i j}^{γ_{i j}} λ_{0}^{'} (t_{i j}^{γ_{i j}}) \right) \right] - τ_{i j} t_{i j}^{γ_{i j}} λ_{0} (t_{i j}^{γ_{i j}}) \right\} γ_{i j} \log t_{i j}, \\ w_{β α i j} = τ_{i j} γ_{i j} t_{i j}^{γ_{i j}} \log (t_{i j}) λ_{0} (t_{i j}^{γ_{i j}}), \end{array}

where $a_{i j} = \frac{λ_{0}^{'} (t_{i j}^{γ_{i j}})}{λ_{0} (t_{i j}^{γ_{i j}})}$ . $Q_{β}, Q_{α}$ , and $Q_{β α}$ are $q \times q$ matrices whose $i j$ th elements arise from $- \partial^{2} l_{2} / \partial v \partial v^{T}$ such that

\begin{array}{l} Q_{β} = - \frac{\partial^{2} l_{2}}{\partial v_{β} \partial v_{β}^{T}} = I_{q} \times \frac{1}{(1 - ρ^{2}) σ_{β}^{2}}, \\ Q_{α} = - \frac{\partial^{2} l_{2}}{\partial v_{α} \partial v_{α}^{T}} = I_{q} \times \frac{1}{(1 - ρ^{2}) σ_{α}^{2}}, \\ Q_{β α} = - \frac{\partial^{2} l_{2}}{\partial v_{β} \partial v_{α}^{T}} = - \frac{\partial^{2} l_{2}}{\partial v_{α} \partial v_{β}^{T}} = I_{q} \times - \frac{ρ}{(1 - ρ^{2}) σ_{β} σ_{α}}, \end{array}

where $I_{q}$ is a $q \times q$ identity matrix. Given $Σ$ , the following system of Newton-Raphson equations can be solved iteratively for the MHLEs of $θ^{(m + 1)} = {({β^{(m + 1)}}^{T}, {α^{(m + 1)}}^{T})}^{T}$ and $v^{(m + 1)} = {({v_{β}^{(m + 1)}}^{T}, {v_{α}^{(m + 1)}}^{T})}^{T}$ :

(\begin{matrix} {\hat{θ}}^{(m + 1)} \\ {\hat{v}}^{(m + 1)} \end{matrix}) = (\begin{matrix} {\hat{θ}}^{(m)} \\ {\hat{v}}^{(m)} \end{matrix}) + H^{- 1} (\begin{matrix} \partial h / \partial θ \\ \partial h / \partial v \end{matrix}) |_{(θ, v) = ({\hat{θ}}^{(m)}, {\hat{v}}^{(m)})},

(2.4)

where $\partial h / \partial θ = (\partial h / \partial β, \partial h / \partial α)^{T}$ and $\partial h / \partial v = (\partial h / \partial v_{β}, \partial h / \partial v_{α})^{T}$ . For the estimation of the dispersion parameters, $Σ = (σ_{β}, σ_{α}, ρ)^{T}$ , the adjusted profile likelihood, $p_{θ, v} (h)$ , which eliminates $(θ, v)$ , is used. Given $\hat{θ} = (\hat{β} (Σ), \hat{α} (Σ))$ , $\hat{v} = ({\hat{v}}_{β} (Σ), {\hat{v}}_{α} (Σ))$ , $h$ given in (2.2) and $H$ defined in (2.3), $p_{θ, v} (h)$ is defined as

p_{θ, v} (h) = \left[ h - \frac{1}{2} \log \det \{ H / (2 π) \} \right] \Big|_{(θ, v) = (\hat{θ}, \hat{v})} .

(2.5)

Here, $H = H (h; θ, v) = - \partial^{2} h / \partial {(θ, v)}^{2}$ provides an adjustment when eliminating $(θ, v)$ to approximate the restricted maximum likelihood (REML) for $Σ$ . Thus, solving the equations $\partial p_{θ, v} (h) / \partial Σ = 0$ yields the (approximate) REML estimators of $Σ$ . We opt for a non-linear optimizer implemented in R using the function nlm. The procedure iterates between $(\hat{θ}, \hat{v})$ and $\hat{Σ}$ until all the estimates converge. The standard errors for $(\hat{θ}, \hat{v})$ and $\hat{Σ}$ can be estimated directly from the inverse of the observed information matrices, $H$ and $- \partial^{2} p_{θ, v} (h) / \partial {(Σ)}^{2}$ respectively (Ha et al., 2016b, 2017). Note that this general h-likelihood estimation and inference procedure has been used for bivariate normal random effects previously, albeit in contexts different from ours, for example, random coefficients proportional hazards (Ha et al., 2011) and correlated competing risks (Ha et al., 2016a). Moreover, an anonymous reviewer highlighted the recent related work of Chen and Wang (2020) who used a penalized likelihood approach for bivariate normal random effects in a joint longitudinal-survival model.
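The Newton-Raphson update in (2.4) can be illustrated on a deliberately reduced problem: an intercept-only Weibull MPR model with no frailty, so that the unknowns are just $β_{0} = \log τ$ and $α_{0} = \log γ$ . The sketch below is an assumption-laden toy, not the authors' code: the simulated data, the exponential-model starting value and the step-halving safeguard are all additions made for the example, and the score and information entries are the Weibull special cases of $U_{β}$ , $U_{α}$ , $w_{β}$ , $w_{α}$ and $w_{β α}$ .

```python
import math
import random


def simulate(n, tau, gamma, cens_time, seed=0):
    """Weibull times by inverting S(t) = exp(-tau * t**gamma), censored at cens_time."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = (rng.expovariate(1.0) / tau) ** (1.0 / gamma)
        data.append((min(t, cens_time), int(t <= cens_time)))
    return data


def loglik(data, b, a):
    tau, gamma = math.exp(b), math.exp(a)
    return sum(d * (b + a + (gamma - 1.0) * math.log(t)) - tau * t ** gamma for t, d in data)


def score_and_info(data, b, a):
    """Score vector and observed information matrix for (beta_0, alpha_0)."""
    tau, gamma = math.exp(b), math.exp(a)
    ub = ua = wb = wa = wba = 0.0
    for t, d in data:
        s, L = t ** gamma, math.log(t)
        ub += d - tau * s
        ua += d + (d - tau * s) * gamma * L
        wb += tau * s
        wba += tau * gamma * s * L
        wa += -(d - tau * s - tau * gamma * s * L) * gamma * L
    return (ub, ua), ((wb, wba), (wba, wa))


def newton_raphson(data, tol=1e-10, max_iter=100):
    # Starting values: exponential-model MLE for the scale, gamma = 1 for the shape.
    b = math.log(sum(d for _, d in data) / sum(t for t, _ in data))
    a = 0.0
    for _ in range(max_iter):
        (ub, ua), ((wb, wba), (_, wa)) = score_and_info(data, b, a)
        det = wb * wa - wba * wba
        # Solve the 2 x 2 system H * step = score.
        db = (wa * ub - wba * ua) / det
        da = (wb * ua - wba * ub) / det
        # Step-halving safeguard (added for robustness; not part of (2.4)).
        scale, base = 1.0, loglik(data, b, a)
        while loglik(data, b + scale * db, a + scale * da) < base and scale > 1e-8:
            scale *= 0.5
        b, a = b + scale * db, a + scale * da
        if max(abs(scale * db), abs(scale * da)) < tol:
            break
    return b, a
```

In the full model the same update is applied to the stacked vector $(θ, v)$ with the block matrix $H$ ; only the linear solve grows in size.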

2.4 Fitting algorithm

The model estimation algorithm described above can be summarized as follows:

Initialization:

Using 0.01 as the initial values for the scale and shape coefficients, we fit a fixed effects Weibull MPR model. Estimates from this model are used as the initial values for the fixed parameters, $θ$ , in the mixed effects Weibull MPR model. We use 0.01 as the initial value for each of the random effects, $v$ , and (0.1, 0.1, 0.1) for the dispersion parameters, $Σ = (σ_{β}, σ_{α}, ρ)^{T}$ .

Parameter estimation:

Step 1 Keeping the frailty variance parameters $\hat{Σ}$ fixed, maximize $h$ by iteratively re-solving the system of equations given in (2.4) to obtain the new estimates $(\hat{θ}, \hat{v})$ .

Step 2 Given the estimates $(\hat{θ}, \hat{v})$ from Step 1, a new estimate $\hat{Σ}$ is obtained by maximizing (2.5) using nlm.

Iterate between Step 1 and Step 2 until the convergence criterion is met, that is, until the maximum absolute difference between the previous and current estimates for $(θ, v)$ and $Σ$ is less than $10^{- 6}$ . After convergence is reached, the standard errors are estimated based on $H$ for $(\hat{θ}, \hat{v})$ and $- \partial^{2} p_{θ, v} (h) / \partial {(Σ)}^{2}$ for $\hat{Σ}$ .
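The control flow of this two-step iteration can be sketched generically. In the sketch below, `update_inner` and `update_outer` are hypothetical placeholders for Step 1 and Step 2; to keep the example runnable, they are replaced by exact coordinate-wise maximizers of a toy concave function rather than by the h-likelihood and adjusted profile likelihood themselves.

```python
def alternate(update_inner, update_outer, inner, outer, tol=1e-6, max_iter=500):
    """Alternate Step 1 (inner given outer) and Step 2 (outer given inner) until the
    maximum absolute change across all estimates falls below tol."""
    for iteration in range(1, max_iter + 1):
        new_inner = update_inner(inner, outer)      # Step 1: (theta, v) given Sigma
        new_outer = update_outer(new_inner, outer)  # Step 2: Sigma given (theta, v)
        change = max(abs(n - o) for n, o in zip(new_inner + new_outer, inner + outer))
        inner, outer = new_inner, new_outer
        if change < tol:
            return inner, outer, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Toy stand-in objective f(x, y) = x + y - x**2 - y**2 - x*y, maximised at (1/3, 1/3).
# x plays the role of (theta, v) and y the role of Sigma; both updates are exact here.
def step1(inner, outer):
    return [(1.0 - outer[0]) / 2.0]


def step2(inner, outer):
    return [(1.0 - inner[0]) / 2.0]
```

For example, `alternate(step1, step2, [0.01], [0.1])` returns both blocks close to 1/3 together with the number of sweeps used.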

From the simulations we have carried out, we have found the algorithm to be computationally efficient, and it has almost always converged except for a small number of cases when the censoring percentage is high and the sample size is small (e.g., 50% censoring under (q, n_i). After convergence, the validity of our estimates is confirmed by comparing with the true parameters in the simulation scenario, for example, the estimation bias decreases asymptotically.

3 Simulation studies

The performance of the proposed methods is evaluated through simulation studies. The Weibull distribution is one of the most commonly used distributions in survival analysis, and hence, we chose to generate the survival times from a Weibull MPR frailty model with the following regression parameters

\begin{array}{l} β = {(β_{0}, β_{1}, β_{2})}^{T} = {(1, - 0.5, 0.5)}^{⊤} and \\ α = {(α_{0}, α_{1}, α_{2})}^{T} = {(0.5, 0.5, - 0.5)}^{⊤} . \end{array}

The corresponding covariates, $x = {(1, x_{1}, x_{2})}^{T}$ , were generated from an AR(1) process with a correlation coefficient of 0.5, and each variable is marginally standard normal. The corresponding censoring times were generated from a uniform distribution so that the censoring rate was approximately 25% or 50%. Following the two real data structures in Section 4, three different cluster sizes, $n_{i} \in \{5, 20, 50\}$ , and two different cluster numbers, $q \in \{20, 100\}$ , are considered (but note that, in contrast to this scheme, the real data have varying cluster sizes, and this is something we have also investigated; see Supplementary Material). The frailty terms are generated from a bivariate normal distribution with combinations of the dispersion parameter values $σ_{β} \in \{0.5, 1, 2\}$ , $σ_{α} \in \{0.25, 0.5, 1\}$ and $ρ \in \{- 0.5, 0.5\}$ . Each simulation scenario was replicated 500 times.
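A data-generating scheme of this kind can be sketched as follows. This is a hedged illustration rather than the simulation code used in the study: the function name, the returned record layout and the argument `cens_upper` (the upper limit of the uniform censoring distribution, which would need tuning to reach a target censoring rate) are all assumptions. Event times are obtained by inverting the Weibull survivor function $S (t) = \exp (- τ t^{γ})$ .

```python
import math
import random


def simulate_weibull_mpr(q, n_i, beta, alpha, sigma_beta, sigma_alpha, rho,
                         cens_upper, seed=0):
    """Simulate clustered right-censored data from a Weibull MPR frailty model.

    Returns a list of dicts with cluster index, covariates, observed time and
    event indicator (1 = event, 0 = censored).
    """
    rng = random.Random(seed)
    rows = []
    for i in range(q):
        # Correlated frailties via the Cholesky factor of Sigma.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        v_beta = sigma_beta * z1
        v_alpha = sigma_alpha * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        for _ in range(n_i):
            # Two marginally standard normal covariates with correlation 0.5.
            x1 = rng.gauss(0.0, 1.0)
            x2 = 0.5 * x1 + math.sqrt(1.0 - 0.5 ** 2) * rng.gauss(0.0, 1.0)
            x = (1.0, x1, x2)
            tau = math.exp(sum(xk * bk for xk, bk in zip(x, beta)) + v_beta)
            gamma = math.exp(sum(xk * ak for xk, ak in zip(x, alpha)) + v_alpha)
            # Inversion of S(t) = exp(-tau * t**gamma) using a unit exponential draw.
            event_time = (rng.expovariate(1.0) / tau) ** (1.0 / gamma)
            cens_time = rng.uniform(0.0, cens_upper)
            rows.append({"cluster": i, "x": x,
                         "time": min(event_time, cens_time),
                         "delta": int(event_time <= cens_time)})
    return rows
```

The realised censoring rate is simply the proportion of rows with `delta == 0`, so `cens_upper` can be adjusted by trial until that proportion is near the target.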

To summarize the simulation results, we compute the mean and standard deviation (SD) over simulation replicates along with the average of the standard errors (ASE) computed as described in Section 2.3. The results for a censoring rate of 25%, $σ_{β} = 1$ , $σ_{α} = 0.5$ and $ρ = - 0.5$ are presented in Table 3. Similar tables of the results from other simulation setups can be found in the Supplementary Material. We also present the empirical coverage probabilities for nominal 95% confidence intervals constructed for all estimated parameters (fixed, random and frailty dispersion parameters) in Table 4.
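The summary measures can be computed per parameter from the replicate-level estimates and standard errors, as in the sketch below. The function name is hypothetical, and the use of Wald-type intervals (estimate plus or minus 1.96 standard errors) for the coverage calculation is an assumption made for illustration.

```python
import math


def summarise(estimates, std_errors, truth, z=1.96):
    """Mean, SD, ASE and Wald-interval coverage for one parameter across replicates."""
    n = len(estimates)
    mean = sum(estimates) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n - 1))
    ase = sum(std_errors) / n
    covered = sum(1 for e, s in zip(estimates, std_errors) if abs(e - truth) <= z * s)
    return {"mean": mean, "sd": sd, "ase": ase, "coverage": covered / n}
```

An ASE noticeably below the SD indicates that the model-based standard errors understate the true sampling variability, which in turn pulls the coverage below its nominal level.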

Table 3

Averaged coefficient estimates, standard deviations (SD) and the average standard errors (ASE) for the simulation scenario with dispersion parameters $σ_{β} = 1, σ_{α} = 0.5$ and $ρ = - 0.5$ and a censoring rate of 25%

	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{α}}_{0}$	${\hat{α}}_{1}$	${\hat{α}}_{2}$	${\hat{σ}}_{β}$	${\hat{σ}}_{α}$	$\hat{ρ}$
	Mean	Mean	Mean	Mean	Mean	Mean	Mean	Mean	Mean
	(SD)	(SD)	(SD)	(SD)	(SD)	(SD)	(SD)	(SD)	(SD)
$(q, n_{i})$	(ASE)	(ASE)	(ASE)	(ASE)	(ASE)	(ASE)	(ASE)	(ASE)	(ASE)
True	1	-0.5	0.5	0.5	0.5	-0.5	1	0.5	-0.5
(20, 5)	1.30	-0.58	0.59	0.64	0.50	-0.49	1.18	0.51	-0.41
	(0.35)	(0.27)	(0.25)	(0.18)	(0.11)	(0.11)	(0.33)	(0.15)	(0.39)
	(0.32)	(0.21)	(0.20)	(0.15)	(0.09)	(0.09)	(0.20)	(0.09)	(0.19)
(20, 20)	1.05	-0.52	0.52	0.54	0.50	-0.50	1.00	0.50	-0.50
	(0.24)	(0.08)	(0.08)	(0.12)	(0.04)	(0.04)	(0.20)	(0.09)	(0.23)
	(0.24)	(0.08)	(0.08)	(0.12)	(0.04)	(0.04)	(0.16)	(0.08)	(0.17)
(20, 50)	1.03	-0.51	0.50	0.52	0.50	-0.50	1.01	0.50	-0.49
	(0.23)	(0.05)	(0.05)	(0.12)	(0.02)	(0.02)	(0.17)	(0.08)	(0.19)
	(0.23)	(0.05)	(0.05)	(0.12)	(0.02)	(0.02)	(0.16)	(0.08)	(0.17)
(100, 5)	1.21	-0.56	0.57	0.59	0.49	-0.49	1.07	0.52	-0.46
	(0.15)	(0.10)	(0.10)	(0.08)	(0.04)	(0.04)	(0.13)	(0.06)	(0.16)
	(0.13)	(0.08)	(0.08)	(0.07)	(0.04)	(0.04)	(0.08)	(0.04)	(0.09)
(100, 20)	1.05	-0.51	0.51	0.53	0.50	-0.50	1.01	0.50	-0.50
	(0.11)	(0.04)	(0.04)	(0.05)	(0.02)	(0.02)	(0.09)	(0.04)	(0.10)
	(0.11)	(0.04)	(0.04)	(0.05)	(0.02)	(0.02)	(0.07)	(0.04)	(0.08)
(100, 50)	1.02	-0.50	0.50	0.51	0.50	-0.50	1.00	0.50	-0.50
	(0.11)	(0.02)	(0.02)	(0.05)	(0.01)	(0.01)	(0.08)	(0.04)	(0.08)
	(0.10)	(0.02)	(0.02)	(0.05)	(0.01)	(0.01)	(0.07)	(0.04)	(0.08)

Table 4

Coverage probabilities

$(q, n_{i})$	$β_{0}$	$β_{1}$	$β_{2}$	$α_{0}$	$α_{1}$	$α_{2}$	$σ_{β}$	$σ_{α}$	$ρ$	$v_{β}$	$v_{α}$
(20, 5)	0.85	0.88	0.88	0.81	0.90	0.87	0.76	0.77	0.64	0.93	0.86
(20, 20)	0.94	0.95	0.96	0.92	0.93	0.95	0.88	0.90	0.83	0.93	0.94
(20, 50)	0.95	0.95	0.95	0.94	0.94	0.94	0.93	0.92	0.87	0.95	0.95
(100, 5)	0.65	0.83	0.85	0.72	0.90	0.93	0.72	0.76	0.75	0.95	0.94
(100, 20)	0.90	0.92	0.94	0.91	0.94	0.95	0.88	0.88	0.88	0.95	0.94
(100, 50)	0.91	0.96	0.94	0.94	0.96	0.94	0.95	0.92	0.91	0.95	0.95

Overall, the h-likelihood estimates of both the fixed parameters and the frailty dispersion parameters perform quite well. The bias in the estimates is reduced as we increase both the cluster size and the number of clusters: this is observed in all the combinations of dispersion parameters and for both censoring percentages considered. The standard errors appear to be underestimated in the smaller sample sizes. This is especially true for the frailty variance parameters, and to a lesser extent, for the fixed effects. As we increase the cluster size and the number of clusters, however, we see that the standard errors reduce for all the parameters, and the ASE converges towards the SD. Similarly, we see from Table 4 that the empirical coverage of the nominal 95% confidence intervals improves with the cluster size, albeit this convergence is slower for the frailty parameters. These findings are in line with the fact that the adjusted profile likelihood $p_{θ, v} (h)$ approximates the restricted likelihood, becoming exact as $n^{*} = \min_{1 \leq i \leq q} n_{i} \to \infty$ (Ha et al., 2016b). Note that the increase of n* rather than q improves the approximation more effectively, and, for example, we see in Table 3 that the results are better for the larger cluster sizes of n_i = 20 and n_i = 50; see also Ha et al. (2017) and Ha et al. (2010). In any case, the use of the standard errors in hypothesis testing for the frailty variance parameters is generally recommended against (Maller and Zhou, 2003).

4 Data analysis

4.1 Modelling details

We illustrate the proposed models on two datasets. Though our main focus is the MPR model with BVN frailties, for the purpose of comparison and analysis, we also consider various simplifications of this model. More precisely, we fit Weibull MPR models with the following frailty structures to each of the datasets we consider.

BVNF: BVN frailty $(- 1 \leq ρ \leq 1)$

IF: independent frailty $(ρ = 0)$

CF: common frailty $(ρ = \pm 1)$

ScF: scale frailty only $(σ_{α}^{2} = 0)$ , which is the classical multiplicative frailty model

ShF: shape frailty only $(σ_{β}^{2} = 0)$

These models are fitted following the procedures described in Sections 2.3 and 2.4, and the standard errors for the estimated parameters are computed as described in Section 2.3. For the selection of the frailty structure that is best supported by the data, we use the Akaike information criterion (AIC). Various extended definitions of the AIC in random-effect models can be formulated based on different likelihood functions (Vaida and Blanchard, 2005; Xu et al., 2009; Ha et al., 2007, 2017). More specifically, we make use of the restricted AIC (rAIC) (Ha et al., 2007) and the conditional AIC (cAIC) (Vaida and Blanchard, 2005; Ha et al., 2017). The rAIC is based on the restricted likelihood approximation $p_{θ, v} (h)$, which eliminates $(θ, ν)$ from h and thus is a function of the frailty parameters only. In contrast, the cAIC is based on the conditional joint density function for $t_{ij}$ and $δ_{i j}$ given the random effects, $l_{1 i j}$. Definitions of the AICs are as follows:

\begin{array}{l} \mathrm{rAIC} = - 2 p_{θ, v} (h) + 2 \, df_{r}, \\ \mathrm{cAIC} = - 2 \sum_{i j} l_{1 i j} + 2 \, df_{c}, \end{array}

where $df_{r}$ is the number of dispersion parameters governing the frailty distribution, $df_{c} = \mathrm{trace} (H^{- 1} H^{*})$ is the effective degrees of freedom adjusted for the fixed and random effect estimates, and $H^{*} = - \partial^{2} \sum l_{1 i j} / \partial {(θ, v)}^{2}$ (Ha et al., 2017; Lee et al., 2017). The computation of $df_{c}$ involves the fixed effects, the random effects and the frailty distribution parameters, but note that, in a model with no frailty, $df_{c}$ is just the number of fixed effects in the model, and hence the cAIC becomes the classical AIC in this case (Ha et al., 2017). After fitting each of the aforementioned models as well as a model with no frailty (NF), we obtain the corresponding rAIC and cAIC values for the purpose of model comparison.

In the Weibull MPR models (or any other parametric MPR model with a scale and shape parameter), the scale coefficients describe the overall scale of the hazard and the shape coefficients describe its evolution over time. A positive scale coefficient indicates an increase in the hazard relative to some reference category and similarly, a positive shape coefficient indicates an increasing hazard over time relative to some reference category. While an examination of the β and α coefficients separately provides some initial understanding of the effect of a variable, it is important to look at the combined information from both coefficients when determining its overall effect; hence, it is important to look at the hazard ratios. For a binary covariate, x_k, Burke and MacKenzie (2017) show the hazard ratio under the Weibull MPR frailty model is given by

{HR}_{k} (t) = \frac{λ (t ∣ x_{k} = 1)}{λ (t ∣ x_{k} = 0)} = \exp (β_{k} + α_{k}) t^{\exp (x_{(- k)}^{⊤} α) \{\exp (α_{k}) - 1\}},

where $β_{k}$ and $α_{k}$ are, respectively, the scale and shape coefficients of $x_{k}$, and $x_{(- k)} = (1, x_{1}, \dots, x_{k - 1}, 0, x_{k + 1}, \dots, x_{p})$ is the covariate vector with $x_{k}$ set to 0. (Note here that we have dropped the subscripts ij for notational convenience.) Because ${HR}_{k}$ depends on the values of the other covariates in the model via the vector $x_{(- k)}$, we set them to their empirical modal values. In line with this, we also set the random effect $ν_{α}$ to its modal value of zero.
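The hazard ratio formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and argument names are our own, and the inputs are the rounded treatment coefficients from the ShF column of Table 5 together with the modal covariate values listed in the note to Figure 2, so the resulting curve is approximate and is not a reproduction of that figure.

```python
import math


def weibull_mpr_hazard_ratio(t, beta_k, alpha_k, eta_alpha_ref):
    """HR_k(t) for a binary covariate x_k under a Weibull MPR model.

    eta_alpha_ref is the shape linear predictor x_(-k)' alpha, i.e. evaluated
    with x_k = 0 and with the shape random effect set to zero.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    gamma_ref = math.exp(eta_alpha_ref)  # Weibull shape when x_k = 0
    exponent = gamma_ref * (math.exp(alpha_k) - 1.0)
    return math.exp(beta_k + alpha_k) * t ** exponent


# Assumed inputs: rounded ShF estimates for treatment (CAV-HEM vs CAV).
beta_k, alpha_k = -0.24, -0.24
# Shape intercept + ambulatory + weight loss (bone and liver metastases = no).
eta_alpha_ref = 0.22 + 0.38 + 0.03

hr_curve = {
    t: weibull_mpr_hazard_ratio(t, beta_k, alpha_k, eta_alpha_ref)
    for t in (0.25, 0.5, 1.0, 2.0, 4.0)
}
for t, hr in hr_curve.items():
    print(f"t = {t:4.2f} years: HR = {hr:.3f}")
```

Two properties are easy to read off: at $t = 1$ the hazard ratio reduces to $\exp (β_{k} + α_{k})$ regardless of the other covariates, and when $α_{k} = 0$ the time-dependent factor vanishes, so the ratio is constant as in a PH model.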

4.2 Extensive-stage small-cell lung cancer

This dataset was collected as part of a randomized, multi-centre study conducted by the Eastern Cooperative Oncology Group (ECOG). The main purpose of the trial was to determine if cyclic alternating combination chemotherapy was superior to cyclic standard chemotherapy in patients with extensive-stage small-cell lung cancer. Patients were randomly assigned to one of two treatment arms: standard chemotherapy (CAV; reference category) or an alternating regimen (CAV-HEM). The dataset includes 579 patients from 31 different institutions, with the number of patients per institution ranging from 1 to 56 and a median of 17 patients per institution. The outcome variable was time (in years) from randomization until death. The median survival time and maximum follow-up time were 0.86 years and 8.45 years, respectively, and of the 579 study participants, only 10 were censored yielding a censoring rate of approximately 1.7%. Besides the survival time, censoring status, institution code and treatment, four other dichotomous variables were included in this dataset, namely (reference category listed first): the presence of bone metastases (no, yes), the presence of liver metastases (no, yes), patient status on entry (confined to bed or chair, ambulatory), and whether there was weight loss prior to entry (no, yes). More details on the trial and its clinical results can be found in Ettinger et al. (1990). This dataset was also previously analysed in Gray (1994) using a fully Bayesian approach, in Vaida and Xu (2000) using a marginal likelihood approach, and in Ha et al. (2016b) using a correlated frailty model fitted using a h-likelihood approach.

We fit the models listed in Section 4.1 to this lung cancer dataset and the results are presented in Table 5. To explore the degree of dependence between the two random components, $ν_{β}$ and $ν_{α}$, we first fit the BVNF model (i.e., the model which assumes a bivariate normal distribution). The estimate of $ρ$ ($\hat{ρ} = 0.995$) indicates a very strong positive correlation between the predicted random components ${\hat{ν}}_{β}$ and ${\hat{ν}}_{α}$. This perhaps suggests that the model could be simplified and only one random component is needed. Note that the large $\hat{ϕ}$ value in the CF model is due to the very small ${\hat{ν}}_{β}$ values, and this may be pointing towards a shape frailty only model.

Table 5

The coefficient estimates, frailty dispersion parameter estimates and estimated standard errors (in brackets) from each model we fit to the lung cancer dataset

		NF	BVNF	IF	CF	ScF	ShF
Scale	Intercept	0.11	0.12	0.11	0.11	0.12	0.11
		(0.13)	(0.13)	(0.13)	(0.13)	(0.13)	(0.13)
	Treatment	−0.24	−0.23	−0.24	−0.24	−0.24	−0.24
	CAV-HEM	(0.09)	(0.09)	(0.09)	(0.09)	(0.09)	(0.09)
	Bone metastases	0.23	0.24	0.22	0.22	0.25	0.22
	Yes	(0.10)	(0.10)	(0.10)	(0.10)	(0.10)	(0.10)
	Liver metastases	0.33	0.38	0.39	0.39	0.32	0.39
	Yes	(0.10)	(0.10)	(0.10)	(0.10)	(0.10)	(0.10)
	Patient status	−0.58	−0.63	−0.62	−0.62	−0.60	−0.62
	Ambulatory	(0.11)	(0.11)	(0.11)	(0.11)	(0.11)	(0.11)
	Weight loss	0.19	0.21	0.21	0.21	0.19	0.21
	Yes	(0.10)	(0.10)	(0.10)	(0.10)	(0.10)	(0.10)
Shape	Intercept	0.14	0.23	0.22	0.22	0.16	0.22
		(0.08)	(0.12)	(0.12)	(0.12)	(0.08)	(0.12)
	Treatment	−0.28	−0.23	−0.24	−0.24	−0.27	−0.24
	CAV-HEM	(0.06)	(0.07)	(0.07)	(0.07)	(0.06)	(0.07)
	Bone metastases	0.04	0.02	0.03	0.03	0.03	0.03
	Yes	(0.07)	(0.08)	(0.08)	(0.08)	(0.07)	(0.08)
	Liver metastases	−0.14	−0.11	−0.11	−0.11	−0.13	−0.11
	Yes	(0.07)	(0.07)	(0.07)	(0.07)	(0.07)	(0.07)
	Patient status	0.33	0.38	0.38	0.38	0.33	0.38
	Ambulatory	(0.07)	(0.09)	(0.09)	(0.09)	(0.08)	(0.09)
	Weight loss	0.06	0.02	0.03	0.03	0.05	0.03
	Yes	(0.07)	(0.08)	(0.08)	(0.08)	(0.07)	(0.08)
Frailty	${\hat{σ}}_{β}$		0.08	0.00	0.00	0.16
			(0.01)	(0.02)	(0.00)	(0.03)
	${\hat{σ}}_{α}$		0.29	0.31			0.31
			(0.04)	(0.05)			(0.05)
	$\hat{ρ}$		1.00
			(0.04)
	$\hat{ϕ}$				3 666.42
					(439.09)
$- 2 p_{θ, ν} (h)$		1 123.40	1 077.74	1 079.52	1 079.52	1 121.35	1 079.52
$d f_{r}$		0	3	2	2	1	1
$r A I C - r A I C_{\min}$		41.88	2.22	2.00	2.00	41.83	0
$d f_{c}$		12.00	30.96	30.89	30.89	19.89	30.89
$c A I C - c A I C_{\min}$		65.19	0	2.08	2.06	61.09	2.08

Note: For the CF model, an estimate for $σ_{α}$ can be found by evaluating $\hat{ϕ} {\hat{σ}}_{β}$. Given the above values, ${\hat{σ}}_{α} = 0.31$.

With the exception of the ScF model, all the models containing at least one frailty component have quite similar rAIC values, all much smaller than the rAIC value from the no-frailty model; the model with a frailty term in the shape only, the ShF model, has the lowest value. The ScF model, which is the more standard multiplicative frailty model, has a similar rAIC value to the model with no frailty components; this suggests that the baseline risk is homogeneous across the centres and a frailty term in the scale is not needed. We observe a similar pattern in the cAIC values, in that all the models containing at least one frailty component (with the exception of the ScF model) have similar cAIC values, all much smaller than that of the NF model. Although the BVNF model has the lowest cAIC value, it is still close to those of the IF, CF, and ShF models. Given this, and also the rAIC values, we consider the ShF as the ‘best’ model.
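The rAIC comparison can be reproduced directly from the $- 2 p_{θ, ν} (h)$ and $d f_{r}$ rows of Table 5 using the definition in Section 4.1. The short Python sketch below does this; the function and variable names are our own, and the inputs are the rounded values printed in the table.

```python
# -2 p_{theta,v}(h) and df_r as reported in Table 5 (lung cancer dataset).
deviance = {"NF": 1123.40, "BVNF": 1077.74, "IF": 1079.52,
            "CF": 1079.52, "ScF": 1121.35, "ShF": 1079.52}
df_r = {"NF": 0, "BVNF": 3, "IF": 2, "CF": 2, "ScF": 1, "ShF": 1}


def raic_differences(deviance, df_r):
    """Return rAIC - min(rAIC) per model, with rAIC = -2 p(h) + 2 df_r."""
    raic = {model: deviance[model] + 2 * df_r[model] for model in deviance}
    best = min(raic.values())
    return {model: round(value - best, 2) for model, value in raic.items()}


raic_diff = raic_differences(deviance, df_r)
best_model = min(raic_diff, key=raic_diff.get)
print(raic_diff)
print("Model with the lowest rAIC:", best_model)
```

The BVNF model has the smallest value of $- 2 p_{θ, ν} (h)$, but its two additional dispersion parameters cost 4 units of rAIC relative to the ShF model, which is why the ShF model is ranked first.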

Although we report the standard errors of the frailty variance parameters, one should not use them for testing the hypothesis $H_{0} : σ = 0$ (Vaida and Xu, 2000). A likelihood ratio test can be carried out instead. Since the value of σ in the null hypothesis is at the boundary of the parameter space, the standard approximation of the log-likelihood ratio statistic by a $χ_{1}^{2}$ distribution often leads to overly conservative test results (Self and Liang, 1987; Stram and Lee, 1994; Duchateau and Janssen, 2008). To correct this, a 50:50 mixture of chi-square distributions with zero and one degrees of freedom, $(χ_{0}^{2} + χ_{1}^{2}) / 2$, should be used as the approximation to the distribution of the log-likelihood ratio statistic (Maller and Zhou, 2003). The critical value at the 5% significance level is thus 2.71. The difference in deviance, $- 2 p_{θ, v} (h)$, between the NF model and the ShF model is 43.878, and hence the shape frailty is significant, suggesting that $σ_{α} > 0$.
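To see where the 2.71 threshold quoted above comes from, the following Python sketch evaluates the tail probability of the $(χ_{0}^{2} + χ_{1}^{2}) / 2$ mixture and inverts it numerically. It uses only the standard library, relying on the identity $P (χ_{1}^{2} > x) = \mathrm{erfc} (\sqrt{x / 2})$; the function names and the bisection routine are our own illustrative choices, not part of the authors' software.

```python
import math


def mixture_sf(x):
    """P(LRT > x) under the 50:50 mixture of chi2_0 and chi2_1."""
    if x < 0:
        return 1.0
    # chi2_0 is a point mass at zero, so only the chi2_1 half contributes.
    return 0.5 * math.erfc(math.sqrt(x / 2.0))


def mixture_critical_value(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Solve mixture_sf(c) = alpha by bisection (valid for 0 < alpha < 0.5)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_sf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


critical_value = mixture_critical_value(0.05)
p_value = mixture_sf(43.878)  # NF versus ShF deviance difference

print(f"5% critical value: {critical_value:.3f}")
print(f"p-value for 43.878: {p_value:.2e}")
```

Equivalently, the 5% critical value of the mixture is the 90th percentile of the $χ_{1}^{2}$ distribution, which is smaller than the usual 3.84 and so makes the test less conservative.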

We now focus on the results from the model selected by the rAIC, the ShF model. Considering the scale parameter results first, all the variables included in the model have significant scale coefficients. The CAV-HEM treatment and the subject being ambulatory on entry have negative scale coefficients and so reduce the hazard of death relative to their respective reference categories, the CAV treatment and subject being confined to bed or chair on entry respectively. The presence of bone metastases, the presence of liver metastases and weight loss prior to study entry are all found to increase the hazard of death relative to their reference categories. Now, considering the shape parameter, only two variables, treatment and patient status on entry, have significant coefficients. The CAV-HEM treatment has a negative shape coefficient suggesting that the hazard further decreases over time, relative to the reference category. The positive shape coefficient for the subject being ambulatory on entry suggests that the hazard increases over time. Thus, although this variable has an initial effect of reducing the hazard of death, this effect wears off over time, relative to its reference category.

Figure 2 shows the hazard ratio corresponding to each of the variables in our model along with 95% confidence intervals estimated using a parametric bootstrap (Davison and Hinkley, 1997). The presence of bone metastases and weight loss prior to study entry appear to have more or less constant hazard ratios over time, which is expected since their corresponding shape effects are quite small and not significant. The presence of liver metastases has an adverse effect on the hazard but this effect wears off within the first 2 to 3 years. The hazard ratio for the patient being ambulatory on entry appears to be increasing over time. The effectiveness of the CAV-HEM treatment is only observed after 2 months or so from the treatment start date and the hazard continues to decrease over time relative to the CAV treatment.

Figure 2

Note: The modal values were used in the computation of these hazard ratios (Treatment = CAV, Bone Metastases = no, Liver Metastases = no, Patient Status = ambulatory on entry, Weight Loss = yes).

The random effects along with their 95% confidence intervals under the ShF model are shown in Figure 3. Note that as the cluster size increases, the confidence bounds around the cluster effect shrink. The biggest changes in centre-specific hazard over time can be seen in centres 12, 16 and 20. A positive frailty suggests the hazard is increasing over time relative to the baseline, while a negative one suggests it is decreasing relative to the baseline.

Figure 3

Note: Centres are sorted in increasing order based on the number of patients.

4.3 Bladder cancer

This multi-centre dataset was collected as part of the European Organisation for Research and Treatment of Cancer (EORTC) trial 30791 (Sylvester et al., 2006). A total of 410 patients with superficial bladder cancer were included in this dataset. The patients came from 21 different centres and the number of patients per centre varied between 3 and 78 patients with a median of 15 patients per centre. The outcome variable is relapse-free or disease-free interval after transurethral resection, that is, time from randomization until cancer relapse. Patients who did not experience a recurrence during the follow-up period were censored at their last date of follow-up. The maximum follow-up time was 10.16 years and 204 of the 410 patients (approximately 50% of the patients) were right-censored. The two covariates included in this dataset are (reference categories are listed first): a treatment indicator for chemotherapy (no, yes) and a variable representing the prior recurrence (no, yes). This dataset was also previously analysed in Ha et al. (2011) and can be found in the R package frailtyHL (Ha et al., 2012, 2017). As in the previous example, we fit the models listed in Section 4.1 to this dataset and compare the fit using the rAIC and cAIC. The results are presented in Table 6.

Table 6

The coefficient estimates, frailty dispersion parameter estimates and estimated standard errors (in brackets) from each model we fit to the bladder cancer dataset

		NF	BVNF	IF	CF	ScF	ShF
Scale	Intercept	−0.79	−0.71	−0.70	−0.70	−0.70	−0.79
		(0.18)	(0.19)	(0.20)	(0.20)	(0.20)	(0.18)
	Chemotherapy	−0.72	−0.74	−0.74	−0.74	−0.74	−0.72
	Yes	(0.19)	(0.19)	(0.19)	(0.19)	(0.19)	(0.19)
	Prior recurrence	0.55	0.57	0.57	0.57	0.57	0.55
	Yes	(0.17)	(0.17)	(0.17)	(0.17)	(0.17)	(0.17)
Shape	Intercept	−0.19	−0.17	−0.19	−0.19	−0.19	−0.19
		(0.13)	(0.13)	(0.13)	(0.13)	(0.13)	(0.13)
	Chemotherapy	0.03	0.02	0.03	0.03	0.03	0.03
	Yes	(0.13)	(0.13)	(0.13)	(0.13)	(0.13)	(0.13)
	Prior recurrence	−0.01	0.01	0.02	0.02	0.02	−0.01
	Yes	(0.12)	(0.12)	(0.12)	(0.12)	(0.12)	(0.12)
Frailty	${\hat{σ}}_{β}$		0.22	0.28	0.27	0.28
			(0.06)	(0.06)	(0.06)	(0.06)
	${\hat{σ}}_{α}$		0.06	0.00			0.03
			(0.02)	(0.03)			(0.03)
	$\hat{ρ}$		1.00
			(0.07)
	$\hat{ϕ}$				0.07
					(0.22)
$- 2 p_{θ, ν} (h)$		946.96	943.75	943.28	943.28	943.28	946.96
$d f_{r}$		0	3	2	2	1	1
$r A I C - r A I C_{\min}$		1.68	4.47	2.00	2.00	0	3.68
$d f_{c}$		6.00	12.76	13.09	13.11	13.09	6.35
$c A I C - c A I C_{\min}$		6.46	0.70	0.05	0	0.05	6.45

Note: For the CF model, an estimate for $σ_{α}$ can be found by evaluating $\hat{ϕ} {\hat{σ}}_{β}$. Given the above values, ${\hat{σ}}_{α} \approx 0.02$.

Similar to the previous example, the estimated correlation coefficient between the two random effects, $ν_{β}$ and $ν_{α}$, is approximately equal to one, suggesting that the model can be simplified. The model with the single frailty parameter in the scale has the lowest rAIC value and therefore is the preferred model, suggesting that there is substantial variation in the baseline risk across the centres and this variation is constant over time. All of the models containing a scale frailty term have similar cAIC values, which are markedly lower than the values for both the NF model and the ShF model, suggesting the need for a scale frailty term. Since the difference in cAIC between the models BVNF, IF, CF and ScF is small, we focus on the simplest model in that set, the ScF model.

The difference in deviance, $- 2 p_{θ, v} (h)$, between the NF model and the ScF model is 3.682 (> 2.71), and hence the scale centre effect is significant at the 5% significance level, suggesting that $σ_{β} > 0$. A caterpillar plot of the random effects from the ScF model is presented in Figure 4 and again the centres are sorted by the number of patients. Centre 19 appears to have the only significant effect; hence, the observed frailty effect is solely due to this one centre having a lower hazard than the other centres included in the study. This, perhaps, explains why the rAIC is only slightly lower than that of the NF model.
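As a quick arithmetic check against the rounded values in Table 6, the deviance difference and the corresponding rAIC gap between the NF and ScF models follow directly from the definitions in Section 4.1:

\begin{array}{l} \{- 2 p_{θ, v} (h)\}_{NF} - \{- 2 p_{θ, v} (h)\}_{ScF} = 946.96 - 943.28 = 3.68 > 2.71, \\ \mathrm{rAIC}_{NF} - \mathrm{rAIC}_{ScF} = (946.96 + 2 \times 0) - (943.28 + 2 \times 1) = 1.68. \end{array}

The single dispersion parameter of the ScF model costs 2 units of rAIC, so a deviance reduction of 3.68 translates into an rAIC advantage of only 1.68.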

Figure 4

Note: Centres are sorted in increasing order based on the number of patients.

Focusing on the results from the ScF model, both variables included in the model have significant scale coefficients but non-significant shape coefficients. The variable Chemotherapy has a negative scale coefficient and so the hazard is significantly reduced, that is, time until recurrence is prolonged for patients who received chemotherapy relative to those who did not. Prior recurrence has a positive scale coefficient and so having had a recurrence already significantly increases the risk of another recurrence, relative to it being a primary occurrence. The shape coefficients corresponding to the two variables, albeit non-significant, are both positive, suggesting that the effect of chemotherapy wears off over time, and also that the risk of another recurrence increases with time after a prior recurrence. Plots of the hazard ratios along with their bootstrapped 95% confidence intervals can be seen in Figure 5. As expected, the two variables have approximately constant hazard ratios over time and so perhaps the model can be reduced to a PH model.

Figure 5

Note: The modal values were used in the computation of these hazard ratios (Chemotherapy = yes, Prior Recurrence = no).

5 Discussion and conclusions

The MPR frailty modelling framework we have proposed not only includes frailty structures that have not been previously explored in the literature, but also generalizes a variety of existing sub-models. Existing literature on MPR frailty models has been limited to multiplicative frailty, and, to the best of our knowledge, a model with correlated frailty in each distributional parameter has not previously been considered. We believe that this is a natural structure in the context of MPR modelling, since it has been shown that estimates of the scale and shape can be quite correlated in practice (Burke and MacKenzie, 2017); hence, it is useful to allow for the possibility that correlation may propagate to the frailty terms.

Although the numerical studies were carried out on a Weibull MPR model, we have developed the model and the estimation procedure in a generic form; the underlying cumulative hazard function can be replaced with that of any other two-parameter distribution. In principle, the methods can also be extended to models with more than two distributional parameters, for example, the power generalized Weibull model of Burke et al. (2020), using a higher-order multivariate normal distribution, but this is beyond the scope of this article. The adopted h-likelihood framework provides a computationally inexpensive and straightforward two-step procedure to fit our frailty models, avoiding the often intractable integration of the random effects over the frailty distribution. Moreover, the readily available estimates of the frailties allow for the survivor function for individuals with specific characteristics to be estimated, and this is useful in providing information about the merits of the different centres in terms of patient survival in multi-centre studies.

While the proposed MPR framework provides a very flexible approach to modelling correlated survival data at a minimal computational cost, there are various ways in which we can extend it to handle more complicated frailty structures. All of the models that we have considered have a constant frailty variance; a natural next step is to allow the frailty variance to depend on covariates in a similar fashion to that of Peng et al. (2020). Another potential direction worth exploring would be multilevel or nested frailty structures. For example, we can have data on patients, nested within centres or hospitals, with recurrent event times; hence, we may need a frailty component for the patients and a frailty component for the centre or hospital. The procedures we present can be straightforwardly extended to include more than one random component for each distributional parameter.

Supplementary material

Supplementary material is available online.

Supplemental Material for Multi-parameter regression survival modelling with random effects by Fatima-Zahra Jaouimaa, Il Do Ha, Kevin Burke, in Statistical Modelling

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

This work was funded by the Irish Research Council and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2020R1F1A1A01056987). This work was also supported by the Confirm Smart Manufacturing Centre (https://confirm.ie/) funded by Science Foundation Ireland (grant number: 16/RC/3918).

References

Abrahantes, Legrand, Burzykowski, Janssen, Ducrocq and Duchateau (2007) Comparison of different estimation procedures for proportional hazards model with random effects. Computational Statistics & Data Analysis, 51, 3913–30.

Burke and MacKenzie (2017) Multi-parameter regression survival modeling: An alternative to proportional hazards. Biometrics, 73, 678–86.

Burke, Eriksson and Pipper (2019) Semi-parametric multiparameter regression survival modeling. Scandinavian Journal of Statistics, 47, 555–71.

Burke, Jones and Noufaily (2020) A flexible parametric modelling framework for survival analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 429–57.

Chen and Wang (2020) Joint modeling of binary response and survival for clustered data in clinical trials. Statistics in Medicine, 39, 326–39.

Clayton (1978) A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, 141–51.

Cox (1972) Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34, 187–202.

Davison and Hinkley (1997) Bootstrap methods and their application. Vol. 1. Cambridge University Press.

Duchateau and Janssen (2008) The frailty model. New York: Springer Science & Business Media.

Ettinger, Finkelstein, Abeloff, Ruckdeschel, Aisner and Eggleston (1990) A randomized comparison of standard chemotherapy versus alternating chemotherapy and maintenance versus no maintenance therapy for extensive-stage small-cell lung cancer: a phase III study of the Eastern Cooperative Oncology Group. Journal of Clinical Oncology, 8, 230–40.

Goethals, Janssen and Duchateau (2008) Frailty models and copulas: similarities and differences. Journal of Applied Statistics, 35, 1071–79.

Gray (1994) A Bayesian analysis of institutional effects in a multicenter cancer clinical trial. Biometrics, 244–53.

Ha and Lee (2003) Estimating frailty models via Poisson hierarchical generalized linear models. Journal of Computational and Graphical Statistics, 12, 663–81.

Ha and Lee (2021) A review of h-likelihood for survival analysis. Japanese Journal of Statistics and Data Science, 4, 1157–78.

Ha, Lee and Song (2001) Hierarchical likelihood approach for frailty models. Biometrika, 88, 233–3.

Ha, Lee and Song (2002) Hierarchical-likelihood approach for mixed linear models with censored data. Lifetime Data Analysis, 8, 163–76.

Ha, Lee and MacKenzie (2007) Model selection for multi-component frailty models. Statistics in Medicine, 26, 4790–807.

Ha, Noh and Lee (2010) Bias reduction of likelihood estimators in semiparametric frailty models. Scandinavian Journal of Statistics, 37, 307–20.

Ha, Sylvester, Legrand and MacKenzie (2011) Frailty modelling for survival data from multi-centre clinical trials. Statistics in Medicine, 30, 2144–59.

Ha, Noh and Lee (2012) frailtyHL: a package for fitting frailty models with h-likelihood. R Journal, 4, 28–36.

Ha, Christian, Jeong, Park and Lee (2016a) Analysis of clustered competing risks data using subdistribution hazard models with multivariate frailties. Statistical Methods in Medical Research, 25, 2488–505.

Ha, Vaida and Lee (2016b) Interval estimation of random effects in proportional hazards models with frailties. Statistical Methods in Medical Research, 25, 936–53.

Ha, Jeong and Lee (2017) Statistical Modelling of Survival Data with Random Effects. Springer.

Harville (1977) Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72, 320–38.

Hougaard (2012) Analysis of multivariate survival data. Springer Science & Business Media.

Jones, Noufaily and Burke (2020) A bivariate power generalized Weibull distribution: A flexible parametric model for survival analysis. Statistical Methods in Medical Research, 29, 2295–306.

Lee and Nelder (1996) Hierarchical generalized linear models. Journal of the Royal Statistical Society: Series B (Methodological), 58, 619–56.

Lee and Nelder (2001) Hierarchical generalised linear models: a synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika, 88, 987–1006.

Lee, Nelder and Pawitan (2017) Generalized linear models with random effects: unified analysis via H-likelihood. Vol. 153. CRC Press.

Maller and Zhou (2003) Testing for individual heterogeneity in parametric models for event history data. Mathematical Methods of Statistics, 12, 276–304.

Patterson and Thompson (1971) Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545–54.

Peng, MacKenzie and Burke (2020) A multiparameter regression model for interval-censored survival data. Statistics in Medicine, 39, 1903–18.

Rigby and Stasinopoulos (2005) Generalized additive models for location, scale and shape. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54, 507–54.

Ripatti and Palmgren (2000) Estimation of multivariate frailty models using penalized partial likelihood. Biometrics, 56, 1016–22.

Self and Liang (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82, 605–10.

Stasinopoulos and Rigby (2007) Generalized additive models for location scale and shape. Journal of Statistical Software, 23, 1–46.

Stasinopoulos, Rigby and Bastiani (2018) GAMLSS: a distributional regression approach. Statistical Modelling, 18, 248–73.

Stram and Lee (1994) Variance components testing in the longitudinal mixed effects model. Biometrics, 1171–77.

Sylvester, van der Meijden, Oosterlinck, Witjes, Bouffioux, Denis, Newling and Kurth (2006) Predicting recurrence and progression in individual patients with stage Ta T1 bladder cancer using EORTC risk tables: a combined analysis of 2596 patients from seven EORTC trials. European Urology, 49, 466–77.

Vaida and Blanchard (2005) Conditional Akaike information for mixed-effects models. Biometrika, 92, 351–70.

Vaida and Xu (2000) Proportional hazards model with random effects. Statistics in Medicine, 19, 3309–24.

Xu, Vaida and Harrington (2009) Using profile likelihood for semiparametric model selection with application to proportional hazards mixed models. Statistica Sinica, 19, 819–42.
