Marginal semiparametric accelerated failure time cure model for clustered survival data

Abstract

The semiparametric accelerated failure time mixture cure model is an appealing alternative to the proportional hazards mixture cure model for analyzing failure time data with long-term survivors. However, this model has been proposed only for independent survival data and has not been extended to clustered or correlated survival data, partly because of the complexity of its estimation method. In this paper, we consider a marginal semiparametric accelerated failure time mixture cure model for clustered right-censored failure time data with a potential cure fraction. We overcome the complexity of the existing semiparametric method by proposing a generalized estimating equations approach based on the expectation–maximization algorithm to estimate the regression parameters in the model. The correlation structures within clusters are modeled by working correlation matrices in the proposed generalized estimating equations. The large sample properties of the regression estimators are established. Numerical studies demonstrate that the proposed estimation method is easy to use and robust to misspecification of the working correlation matrices, and that higher efficiency is achieved when the working correlation structure is closer to the true correlation structure. We apply the proposed model and estimation method to a contralateral breast cancer study and reveal new insights when the potential correlation between patients is taken into account.

Keywords

Clustered survival data; mixture cure model; efficiency; accelerated failure time model; marginal method; generalized estimating equation

1. Introduction

Many studies involving time-to-event analysis have a fraction of subjects who will not experience the event of interest even after extended follow-up. These subjects are often deemed cured or long-term survivors who are immune or non-susceptible to the event. For instance, only 5% to 50% of patients with head and neck cancer¹ experienced local recurrences, and many patients were free of symptoms of cancer at the end of a sufficiently long observation period and can be considered cured. Because of the cured subjects, applying classical survival models such as Cox's proportional hazards (PH) model to such data can result in biased estimates and loss of information; cure models^2,3, which explicitly account for cured subjects, should be considered instead.

In addition to potentially cured subjects, clustered survival times are often observed in such studies. Clustering may occur when there are multiple events from one subject⁴ or multiple subjects from the same family or hospital,⁵ and the times to the event of interest from the same cluster tend to be correlated because of shared genetic factors or common environments.

To appropriately account for the correlation within a cluster, the two most studied approaches are marginal models and frailty models. Marginal models focus on population-averaged effects, that is, on the marginal distributions of the joint distribution of the data from one cluster; the correlation within a cluster is either estimated through a working correlation or treated as a nuisance parameter. Frailty models, in contrast, explicitly formulate the underlying dependence structure through random effects or frailties, and the failure times are assumed to be independent conditional on the unobservable frailty. Both approaches have been studied extensively in the literature. For example, Rubio and Drikvandi⁶ developed a parametric mixed-effects general hazard model for the analysis of clustered survival data, in which the heterogeneity between clusters is modeled by incorporating random effects into a hazard-based regression model. Chiou et al.⁷ considered a semiparametric accelerated failure time (AFT) model for clustered failure times from stratified random sampling and proposed weighted rank-based estimating equations for fitting the model with the induced smoothing approach. The generalized estimating equations (GEE) method⁸ has been adopted in both marginal AFT models^9,10 and marginal PH models.^11,12

To model clustered survival data with a cured fraction, a marginal mixture cure model is often assumed, with a robust variance estimator used for inference.^5,13 In these marginal models, the PH assumption was adopted for the survival times of uncured subjects. This approach was further generalized by using a transformation model for the survival times of uncured subjects^14,15 and by using GEE to allow more efficient estimation.^16–18 Other approaches for modeling clustered survival data have also been considered in cure models, including the random effects/frailty approach^19–23 and the copula approach.^24–27

The existing models for clustered or correlated survival times with a cured fraction are largely based on the PH assumption for the effects of covariates on the survival time of uncured subjects. Although the PH assumption is widely used in modeling censored survival data, it is not necessarily easier to satisfy in practice than other assumptions, and the estimated effects can be difficult to interpret. Researchers benefit from having more than one analytic technique at their disposal when the PH assumption is not appropriate.

One alternative to the PH assumption is the AFT assumption. An attractive feature of the AFT model is that covariates act directly on the logarithm of the survival time, so that each covariate effect describes an acceleration or deceleration of the time to the event; this makes the interpretation more intuitive and straightforward than that of effects under the PH assumption.²⁸ The AFT model also enjoys other desirable properties, such as collapsibility, that a PH model does not have. Collapsibility makes the AFT model particularly attractive for quantifying confounding effects²⁹ and mediation effects of covariates^30,31 in causal inference.

The AFT assumption has been considered in models for survival data with a cure fraction, including parametric AFT mixture cure models^32,33 and semiparametric AFT mixture cure models.^34–41 However, we are not aware of any existing work on modeling clustered survival data with a cured fraction under the AFT assumption, particularly with semiparametric models. This may be due to the challenges in extending the existing semiparametric estimation methods to clustered survival data. This gap motivates us to consider AFT-based cure models for clustered survival times with a cured fraction. Such models provide useful alternatives to the PH-based models for clustered survival data with a cured fraction: the interpretation of covariate effects is more straightforward and intuitive, and the models are better suited to future work in causal inference.

In this paper, we propose a marginal semiparametric AFT mixture cure model for clustered survival time data with a cured fraction. An estimating equation approach is employed to estimate the regression parameters in the model with flexible working correlation structures for cure statuses and for the survival times of uncured subjects within clusters. The paper is organized as follows. Section 2 introduces the marginal semiparametric AFT mixture cure model for clustered survival data with a cure fraction. A semiparametric estimation method of the model based on a set of GEE is presented and the asymptotic properties of the estimators are investigated in this section. A simulation study is conducted in Section 3 to evaluate the finite sample performance of the proposed estimation method. The proposed model and the estimation method are applied to contralateral breast cancer data in Section 4. Conclusions and discussions are presented in Section 5.

2. Marginal AFT mixture cure model

Let ${\tilde{T}}_{i j}$ and $C_{i j}$ be the failure time and censoring time of the $j$th subject in the $i$th cluster, where $i = 1, 2, \dots, K$, $K$ is the number of clusters, $j = 1, 2, \dots, n_{i}$, and $n_{i}$ is the number of subjects in the $i$th cluster. The total number of subjects in all clusters is $N = \sum_{i = 1}^{K} n_{i}$. Denote by $T_{i j} = \min ({\tilde{T}}_{i j}, C_{i j})$ the observed time and by $δ_{i j} = I \{{\tilde{T}}_{i j} \leq C_{i j}\}$ the censoring indicator. Let $Z_{i j}$ and $X_{i j}$ denote a $p_{Z} \times 1$ and a $p_{X} \times 1$ vector of covariates, respectively. Given $Z_{i j}$ and $X_{i j}$, the censoring time $C_{i j}$ is assumed to be independent of ${\tilde{T}}_{i j}$. Let $ω_{i j}$ be the cure status of the $j$th subject in the $i$th cluster, where $ω_{i j} = 0$ if the subject is cured and $ω_{i j} = 1$ otherwise. We further assume that, given the covariates, subjects in the same cluster may be correlated: for $j \neq j^{'}$, the cure statuses $ω_{i j}$ and $ω_{i j^{'}}$ are correlated, and so are the latency times ${\tilde{T}}_{i j} | (ω_{i j} = 1)$ and ${\tilde{T}}_{i j^{'}} | (ω_{i j^{'}} = 1)$. Subjects in different clusters ($i \neq i^{'}$) are independent: $ω_{i j}$ and $ω_{i^{'} j^{'}}$ are independent, and so are ${\tilde{T}}_{i j} | (ω_{i j} = 1)$ and ${\tilde{T}}_{i^{'} j^{'}} | (ω_{i^{'} j^{'}} = 1)$. For cured subjects with $ω_{i j} = 0$, we set ${\tilde{T}}_{i j} = \infty$ (a finite value is also possible as long as it lies beyond the support of the distribution of ${\tilde{T}}_{i j} | (ω_{i j} = 1)$). Clearly, $ω_{i j} = 1$ if $δ_{i j} = 1$, whereas $ω_{i j}$ is usually unknown if $δ_{i j} = 0$.

Let $S (t | X_{i j}, Z_{i j})$ denote the survival function of ${\tilde{T}}_{i j}$ . We propose the following semiparametric marginal AFT mixture cure model for the data above:

S (t | X_{i j}, Z_{i j}) = P ({\tilde{T}}_{i j} > t | X_{i j}, Z_{i j}) = 1 - π (Z_{i j}) + π (Z_{i j}) S_{u} (t | X_{i j})

(1)

where $π (Z_{i j})$ (referred to as the incidence part of the model) is the uncured probability, specified as

π (Z_{i j}) = P (ω_{i j} = 1 | Z_{i j}) = \frac{\exp (γ^{'} Z_{i j})}{1 + \exp (γ^{'} Z_{i j})}

(2)

$γ$ is a $(p_{Z} + 1) \times 1$ vector of unknown parameters (an intercept is included, so $Z_{i j}$ in (2) is understood to contain a leading 1), and $S_{u} (t | X_{i j}) = P ({\tilde{T}}_{i j} > t | ω_{i j} = 1, X_{i j})$ is the survival function of ${\tilde{T}}_{i j}$ among uncured subjects (referred to as the latency part of the model), which is assumed to follow the AFT model

\log {\tilde{T}}_{i j} = β^{'} X_{i j} + ε_{i j}, \quad ω_{i j} = 1

(3)

where $β$ is a $p_{X} \times 1$ vector of unknown parameters (the intercept is excluded) and $ε_{i j}$ is an error term with an unspecified survival function $S_{ε} (\cdot)$. It follows that

S_{u} (t | X_{i j}) = S_{ε} (\log t - β^{'} X_{i j})
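As a quick numerical check of how (1) to (3) fit together, the following Python sketch evaluates the marginal survival function for one subject. It assumes, purely for illustration, that the unspecified error survival function $S_{ε}$ is standard normal (normal errors are also used to generate data in Section 3); the function names and parameter values are hypothetical and are not part of the proposed estimation method.

```python
import math

def inv_logit(u):
    """pi(Z) in equation (2) as a function of the linear predictor gamma'Z."""
    return 1.0 / (1.0 + math.exp(-u))

def std_normal_sf(e):
    """Illustrative choice of S_eps: the N(0, 1) survival function."""
    return 0.5 * math.erfc(e / math.sqrt(2.0))

def marginal_survival(t, x, z, beta, gamma, s_eps=std_normal_sf):
    """S(t | X, Z) = 1 - pi(Z) + pi(Z) * S_eps(log t - beta'X), from (1)-(3).

    z must include a leading 1 so that gamma[0] acts as the intercept.
    """
    pi = inv_logit(sum(g * zk for g, zk in zip(gamma, z)))
    resid = math.log(t) - sum(b * xk for b, xk in zip(beta, x))
    return 1.0 - pi + pi * s_eps(resid)

beta = (0.6, 0.9)
gamma = (-0.3, 0.6, 0.9)
x = (1.0, 0.2)         # latency covariates X_ij
z = (1.0, 1.0, 0.2)    # incidence covariates Z_ij with a leading 1
for t in (0.5, 2.0, 10.0, 1e6):
    print(t, round(marginal_survival(t, x, z, beta, gamma), 4))
```

As $t$ grows, the value levels off at the cure fraction $1 - π(Z_{ij})$ instead of decaying to zero; this plateau is what distinguishes the mixture cure model (1) from an ordinary AFT model.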

If all $ω_{i j}$ were known and the correlation within clusters were ignored, the complete-data log-likelihood function for the marginal model (1) based on the data $\{(t_{i j}, δ_{i j}, X_{i j}, Z_{i j}, ω_{i j}), i = 1, 2, \dots, K; j = 1, 2, \dots, n_{i}\}$ would be

\begin{aligned} ℓ_{c} (β, γ, S_{ε} (\cdot)) & = \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} \{ω_{i j} \log π (Z_{i j}) + (1 - ω_{i j}) \log [1 - π (Z_{i j})]\} \\ & \quad + \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} ω_{i j} \{δ_{i j} \log f_{ε} (ε_{i j} (β)) + (1 - δ_{i j}) \log S_{ε} (ε_{i j} (β))\} \end{aligned}

where $ε_{i j} (β) = \log t_{i j} - β^{'} X_{i j}$ and $f_{ε} (\cdot)$ is the density function corresponding to $S_{ε} (\cdot)$. The first term involves only $γ$, whereas the second term involves only $β$ and $S_{ε}$. Because $ω_{i j}$ is unobserved for censored subjects, the expectation–maximization (EM) algorithm can be employed to maximize the likelihood. Let ${\hat{γ}}^{(m)}$, ${\hat{β}}^{(m)}$, and ${\hat{S}}_{ε}^{(m)} (\cdot)$ denote the estimates of $γ$, $β$, and $S_{ε} (\cdot)$, respectively, in the $m$th iteration of the algorithm. The E-step of the next iteration calculates the conditional expectation of $ℓ_{c}$ with respect to $ω_{i j}$. Since $ℓ_{c}$ is linear in $ω_{i j}$, this is equivalent to replacing $ω_{i j}$ in $ℓ_{c}$ with its posterior expectation given the observed data and the current estimates:

\begin{aligned} g_{i j}^{(m)} & = E (ω_{i j} | t_{i j}, δ_{i j}, X_{i j}, Z_{i j}, {\hat{γ}}^{(m)}, {\hat{β}}^{(m)}, {\hat{S}}_{ε}^{(m)} (\cdot)) \\ & = δ_{i j} + (1 - δ_{i j}) \left. \frac{π (Z_{i j}) S_{ε} (ε_{i j} (β))}{1 - π (Z_{i j}) + π (Z_{i j}) S_{ε} (ε_{i j} (β))} \right|_{(γ, β, S_{ε} (\cdot)) = ({\hat{γ}}^{(m)}, {\hat{β}}^{(m)}, {\hat{S}}_{ε}^{(m)} (\cdot))} \end{aligned}

(4)
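The E-step (4) is a closed-form, subject-by-subject update. A minimal NumPy sketch, assuming the current values of $π(Z_{ij})$ and of $\hat{S}_{ε}^{(m)}$ evaluated at each residual $ε_{ij}(\hat{β}^{(m)})$ have already been computed (the function and argument names are illustrative):

```python
import numpy as np

def e_step_weights(delta, pi, s_eps_resid):
    """Posterior uncured probabilities g_ij from equation (4).

    delta       : censoring indicators (1 = event observed)
    pi          : current pi(Z_ij) from the incidence model, in (0, 1)
    s_eps_resid : current S_eps evaluated at each residual eps_ij(beta)
    """
    delta = np.asarray(delta, dtype=float)
    pi = np.asarray(pi, dtype=float)
    s = np.asarray(s_eps_resid, dtype=float)
    censored_part = pi * s / (1.0 - pi + pi * s)
    # Subjects with an observed event are uncured for certain (g = 1);
    # censored subjects receive their posterior probability of being uncured.
    return delta + (1.0 - delta) * censored_part
```

A censored subject whose residual lies beyond the largest uncensored residual has $\hat{S}_{ε} = 0$ under the zero-tail convention used with (8), and therefore receives $g_{ij} = 0$.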

The M-step solves the following estimating equations separately with respect to $γ$ and $β$ to obtain ${\hat{γ}}^{(m + 1)}$ and ${\hat{β}}^{(m + 1)}$:

\begin{aligned} \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} {(\frac{\partial π (Z_{i j})}{\partial γ})}^{'} {[π (Z_{i j}) (1 - π (Z_{i j}))]}^{- 1} (g_{i j}^{(m)} - π (Z_{i j})) & = 0 \end{aligned}

(5)

\begin{aligned} \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} g_{i j}^{(m)} X_{i j} \{- δ_{i j} \frac{(d f_{ε} / d ε) (ε_{i j} (β))}{f_{ε} (ε_{i j} (β))} + (1 - δ_{i j}) \frac{f_{ε} (ε_{i j} (β))}{S_{ε} (ε_{i j} (β))}\} & = 0 \end{aligned}

(6)

Because the error distribution is unspecified, equation (6) cannot be solved directly. Following Ritov,⁴² we replace $- (d f_{ε} / d ε) (ε) / f_{ε} (ε)$ with the score function $η (ε) = ε$ and center $X_{i j}$ to account for the unknown intercept term in (6), which gives the estimating equation

\sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} g_{i j}^{(m)} (X_{i j} - \bar{X}) \{δ_{i j} ε_{i j} (β) + (1 - δ_{i j}) \frac{\int_{ε_{i j} (β)}^{+ \infty} u d F_{ε} (u)}{S_{ε} (ε_{i j} (β))}\} = 0

(7)

where $\bar{X} = N^{- 1} \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} X_{i j}$ and $F_{ε} (\cdot) = 1 - S_{ε} (\cdot)$ is the corresponding distribution function. The censored term is the conditional expectation of the score $η$ beyond the observed residual, $E \{η (ε) | ε > ε_{i j} (β)\}$. To obtain ${\hat{β}}^{(m + 1)}$ from this estimating equation, an estimate of $S_{ε} (y)$ is needed. Following Zhang and Peng,³⁵ it can be estimated nonparametrically by

{\hat{S}}_{ε}^{(m + 1)} (y) = \exp (- \sum_{s : τ_{s} < y} \frac{d_{s}}{\sum_{(i, j) \in R (τ_{s})} g_{i j}^{(m)}})

(8)

where $τ_{1} < \dots < τ_{k}$ are the distinct uncensored residuals $ε_{i j} ({\hat{β}}^{(m)})$, $d_{s} = \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} δ_{i j} I \{ε_{i j} ({\hat{β}}^{(m)}) = τ_{s}\}$ is the number of subjects with a failure at $τ_{s}$, and $R (τ_{s}) = \{(i, j) : ε_{i j} ({\hat{β}}^{(m)}) \geq τ_{s}, i = 1, 2, \dots, K, j = 1, 2, \dots, n_{i}\}$ is the risk set at $τ_{s}$. To enhance the identifiability of the parameter estimates, we also set ${\hat{S}}_{ε}^{(m)} (y) = 0$ for $y > τ_{k}$, as in other semiparametric cure model estimation methods.²
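Equations (7) and (8) need two numerical ingredients: the weighted estimate of $S_{ε}$ at the distinct uncensored residuals, and the conditional tail mean $\int_{e}^{\infty} u \, d F_{ε}(u) / S_{ε}(e)$ for each censored residual $e$. The sketch below computes both from the step function implied by (8), imposing $S_{ε}(y) = 0$ beyond $τ_{k}$ so that all remaining probability mass sits at $τ_{k}$. Computing the tail mean as a ratio of summed jump sizes, and returning the residual itself when no uncensored residual lies beyond it, are our own conventions rather than details given in the text; the function names are hypothetical.

```python
import numpy as np

def baseline_survival(resid, delta, g):
    """Weighted estimate (8) of S_eps at the distinct uncensored residuals.

    Returns (tau, surv_before, mass):
      tau         sorted distinct uncensored residuals tau_1 < ... < tau_k
      surv_before S_eps(tau_s) = exp(-sum_{tau_r < tau_s} d_r / R_r)
      mass        jump of F_eps at each tau_s (zero-tail: S_eps = 0 past tau_k)
    """
    resid = np.asarray(resid, dtype=float)
    delta = np.asarray(delta, dtype=int)
    g = np.asarray(g, dtype=float)
    tau = np.unique(resid[delta == 1])
    d = np.array([np.sum((resid == t) & (delta == 1)) for t in tau])
    at_risk = np.array([np.sum(g[resid >= t]) for t in tau])  # weighted risk set
    cumhaz = np.cumsum(d / at_risk)            # includes the jump at tau_s
    surv_before = np.exp(-np.concatenate(([0.0], cumhaz[:-1])))
    surv_after = np.exp(-cumhaz)
    surv_after[-1] = 0.0                       # zero-tail constraint beyond tau_k
    return tau, surv_before, surv_before - surv_after

def tail_mean(e, tau, mass):
    """E(eps | eps > e) = int_e^inf u dF(u) / S(e), the censored term in (7)."""
    keep = tau > e
    total = mass[keep].sum()
    if total <= 0.0:      # no uncensored residual beyond e; such subjects get g = 0
        return float(e)
    return float(np.sum(tau[keep] * mass[keep]) / total)
```

Adding $β' X_{ij}$ to `tail_mean` for a censored subject gives the imputed response $\hat{y}_{ij}(β)$ that appears in the GEE for $β$.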

To take the potential correlation within clusters into account in the estimation, we consider incorporating working correlation matrices into the estimating equations (5) and (7). Let $Q_{i}^{(1)}$ and $Q_{i}^{(2)}$ be the $n_{i} \times n_{i}$ working correlation matrices that approximate the correlation structure among $ω_{i j}$ s and the correlation structure among $\log {\tilde{T}}_{i j} | (ω_{i j} = 1)$ s, respectively, within the $i$ th cluster. Following Liang and Zeger,⁸ we propose the following GEE for $γ$ that includes the working correlation matrix $Q_{i}^{(1)}$ for $ω_{i j}$ :

U (γ) = \sum_{i = 1}^{K} (\frac{\partial π_{i}}{\partial γ})^{'} (A_{i}^{1 / 2} Q_{i}^{(1)} A_{i}^{1 / 2} ϕ_{1})^{- 1} (g_{i} - π_{i}) = 0

(9)

where $π_{i} = (π (Z_{i 1}), \dots, π (Z_{i n_{i}}))^{'}$, $A_{i}$ is a diagonal matrix with diagonal elements $π (Z_{i 1}) [1 - π (Z_{i 1})], \dots, π (Z_{i n_{i}}) [1 - π (Z_{i n_{i}})]$, $g_{i} = (g_{i 1}^{(m)}, g_{i 2}^{(m)}, \dots, g_{i n_{i}}^{(m)})^{'}$, and $ϕ_{1}$ is the scale parameter to accommodate potential over- or under-dispersion. When the working independence structure $Q_{i}^{(1)} = I$ is used (appropriate when there is no correlation within clusters) and $ϕ_{1} = 1$, equation (9) reduces to equation (5).
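For the logit link in (2), $\partial π_{i} / \partial γ^{'} = A_{i} Z_{i}$, where $Z_{i}$ stacks the $Z_{ij}^{'}$ (with the leading 1) as rows. A standard Fisher-scoring step for the GEE (9) then adds $\{\sum_{i} D_{i}^{'} V_{i}^{-1} D_{i}\}^{-1} \sum_{i} D_{i}^{'} V_{i}^{-1} (g_{i} - π_{i})$ to the current $γ$, with $D_{i} = A_{i} Z_{i}$ and $V_{i} = A_{i}^{1/2} Q_{i}^{(1)} A_{i}^{1/2} ϕ_{1}$. The NumPy sketch below implements this step; Fisher scoring is our choice of solver, since the text does not specify how (9) is solved, and the names are illustrative.

```python
import numpy as np

def gee_gamma_step(gamma, Z_list, g_list, Q_list):
    """One Fisher-scoring update for the incidence GEE (9) under the logit link.

    Z_list: per-cluster matrices Z_i (n_i x (p_Z + 1)), first column all 1s
    g_list: per-cluster E-step weights g_i from (4)
    Q_list: per-cluster working correlation matrices Q_i^(1)
    phi_1 scales both terms of the update equally, so it cancels and is omitted.
    """
    gamma = np.asarray(gamma, dtype=float)
    p = gamma.size
    info = np.zeros((p, p))
    score = np.zeros(p)
    for Z, g, Q in zip(Z_list, g_list, Q_list):
        pi = 1.0 / (1.0 + np.exp(-Z @ gamma))
        a = pi * (1.0 - pi)                        # diagonal of A_i
        D = a[:, None] * Z                         # d pi_i / d gamma' = A_i Z_i
        root = np.sqrt(a)
        V = root[:, None] * Q * root[None, :]      # A_i^{1/2} Q_i A_i^{1/2}
        Vinv_D = np.linalg.solve(V, D)
        info += D.T @ Vinv_D
        score += Vinv_D.T @ (g - pi)               # D_i' V_i^{-1} (g_i - pi_i)
    return gamma + np.linalg.solve(info, score)
```

With every $Q_{i}^{(1)}$ equal to the identity, iterating this step solves equation (5), which is the working-independence special case; with an exchangeable or AR(1) matrix it solves (9) for the current $ρ_{1}$.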

To develop the GEE for $β$, we rewrite the left-hand side of equation (7) as follows:

\begin{aligned} & \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} g_{i j}^{(m)} (X_{i j} - \bar{X}) \{δ_{i j} \log t_{i j} + (1 - δ_{i j}) [\frac{\int_{ε_{i j} (β)}^{+ \infty} u d F_{ε} (u)}{S_{ε} (ε_{i j} (β))} + β^{'} X_{i j}] - β^{'} X_{i j}\} \\ & \quad = \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} g_{i j}^{(m)} (X_{i j} - \bar{X}) \{{\hat{y}}_{i j} (β) - β^{'} X_{i j}\} \end{aligned}

where

{\hat{y}}_{i j} (β) = δ_{i j} \log t_{i j} + (1 - δ_{i j}) [\frac{\int_{ε_{i j} (β)}^{+ \infty} u d F_{ε} (u)}{S_{ε} (ε_{i j} (β))} + β^{'} X_{i j}]

equals $\log t_{i j}$ if $δ_{i j} = 1$ and equals $E (\log {\tilde{T}}_{i j} | {\tilde{T}}_{i j} > t_{i j}, ω_{i j} = 1)$, the conditional mean log failure time of an uncured subject beyond the censoring time, if $δ_{i j} = 0$. A similar estimating equation was proposed by Buckley and James.⁴³ We propose the following GEE for $β$ that includes the working correlation matrix $Q_{i}^{(2)}$:

U (β) = \sum_{i = 1}^{K} (X_{i} - 1_{i} {\bar{X}}^{'})^{'} (B_{i}^{1 / 2} Q_{i}^{(2)} B_{i}^{1 / 2} ϕ_{2})^{- 1} G_{i} ({\hat{Y}}_{i} (β) - X_{i} β) = 0

(10)

where $X_{i} = (X_{i 1}, X_{i 2}, \dots, X_{i n_{i}})^{'}$, $1_{i}$ is an $n_{i} \times 1$ vector of 1s, $B_{i}$ is the $n_{i} \times n_{i}$ diagonal variance matrix with diagonal elements $σ_{\hat{Y} (β)}^{2}$, $G_{i}$ is the diagonal matrix with the elements of $g_{i}$ on its diagonal, ${\hat{Y}}_{i} (β) = ({\hat{y}}_{i 1} (β), \dots, {\hat{y}}_{i n_{i}} (β))^{'}$, and $ϕ_{2}$ is the scale parameter to accommodate potential over- or under-dispersion. When the working independence structure $Q_{i}^{(2)} = I$ is used and $ϕ_{2} = 1$, the GEE (10) is proportional to the left-hand side of equation (7) and therefore has the same solution. As with (7), equation (10) depends on the unknown $S_{ε} (\cdot)$, which we suggest estimating by (8). This estimate is consistent but may not be efficient; the simulation study in Section 3 indicates that it still allows efficient estimation of $β$ from (10).
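Once the imputed responses $\hat{Y}_{i}(β)$ are held fixed, the GEE (10) is linear in $β$ and has the closed-form solution $β = \{\sum_{i} D_{i}^{'} W_{i} X_{i}\}^{-1} \sum_{i} D_{i}^{'} W_{i} \hat{Y}_{i}$ with $D_{i} = X_{i} - 1_{i} \bar{X}^{'}$ and $W_{i} = (B_{i}^{1/2} Q_{i}^{(2)} B_{i}^{1/2} ϕ_{2})^{-1} G_{i}$. In practice $\hat{Y}_{i}$ would be recomputed from the new $β$ and the solve repeated, in the spirit of Buckley and James; the sketch below shows a single solve. It assumes $B_{i} = σ^{2} I$, in which case $B_{i}$ and $ϕ_{2}$ rescale every $W_{i}$ by the same constant and drop out; the function name is hypothetical.

```python
import numpy as np

def gee_beta_update(X_list, y_hat_list, g_list, Q_list):
    """Solve the latency GEE (10) for beta with the imputed responses held fixed.

    With Y_hat_i fixed, (10) is linear in beta:
        beta = [sum_i D_i' W_i X_i]^{-1} sum_i D_i' W_i Y_hat_i,
    where D_i = X_i - 1_i Xbar' and W_i = Q_i^{-1} G_i.  Assuming B_i = sigma^2 I,
    B_i and phi_2 rescale every W_i by the same constant and cancel.
    """
    x_bar = np.vstack(X_list).mean(axis=0)       # Xbar over all N subjects
    p = x_bar.size
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, y_hat, g, Q in zip(X_list, y_hat_list, g_list, Q_list):
        D = X - x_bar                             # X_i - 1_i Xbar'
        W = np.linalg.solve(Q, np.diag(g))        # Q_i^{-1} G_i
        lhs += D.T @ W @ X
        rhs += D.T @ W @ y_hat
    return np.linalg.solve(lhs, rhs)
```

Centering the left factor by $\bar{X}$ is what lets the equation ignore the unknown intercept: with equal cluster sizes, an exchangeable $Q_{i}^{(2)}$, and all $g_{ij} = 1$, a constant shift in every $\hat{y}_{ij}$ leaves the solution unchanged.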

Using the generalized estimating equations (9) and (10) also requires a full specification of $Q_{i}^{(1)}$ and $Q_{i}^{(2)}$. Depending on the nature of the correlation, the working correlation matrices $Q_{i}^{(1)}$ and $Q_{i}^{(2)}$ can be given a structured form. Common working correlation structures are the independent structure, the exchangeable (equicorrelated, compound symmetric) structure, and the first-order autoregressive structure. Although the exchangeable structure is popular for clustered data, the first-order autoregressive structure or other more complex structures could be considered if a temporal or spatial distance among the subjects in a cluster affects the strength of the correlation. All of these working correlation matrices have diagonal elements equal to 1; the off-diagonal element in the $j$th row and $j^{'}$th column is 0 for the independent structure, $ρ$ for the exchangeable structure, and $ρ^{| j - j^{'} |}$ for the first-order autoregressive structure. Let $ρ_{1}$ and $ρ_{2}$ denote $ρ$ in $Q_{i}^{(1)}$ and $Q_{i}^{(2)}$, respectively, when these matrices have the exchangeable or the first-order autoregressive structure. Consistent estimates of $ρ_{1}$ and $ϕ_{1}$ can be obtained from the standardized Pearson residuals of the incidence model, while consistent estimates of $ρ_{2}$ and $ϕ_{2}$ can be obtained from the residuals of the log-linear model (3). Specifically, if $Q_{i}^{(1)}$ and $Q_{i}^{(2)}$ have the exchangeable structure, then

\begin{aligned} {\hat{ρ}}_{1} & = {\hat{ϕ}}_{1}^{- 1} \sum_{i = 1}^{K} \sum_{j > j^{'}} {\hat{r}}_{i j}^{(1)} {\hat{r}}_{i j^{'}}^{(1)} / (\sum_{i = 1}^{K} \frac{1}{2} n_{i} (n_{i} - 1) - p_{Z} - 1) \end{aligned}

(11)

\begin{aligned} {\hat{ρ}}_{2} & = {\hat{ϕ}}_{2}^{- 1} \sum_{i = 1}^{K} \sum_{j > j^{'}} {\hat{r}}_{i j}^{(2)} {\hat{r}}_{i j^{'}}^{(2)} / (\sum_{i = 1}^{K} \frac{1}{2} n_{i} (n_{i} - 1) - p_{X}) \end{aligned}

(12)

and if $Q_{i}^{(1)}$ and $Q_{i}^{(2)}$ have the first-order autoregressive structure, then

\begin{aligned} {\hat{ρ}}_{1} & = {\hat{ϕ}}_{1}^{- 1} \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i} - 1} {\hat{r}}_{i j}^{(1)} {\hat{r}}_{i, j + 1}^{(1)} / (\sum_{i = 1}^{K} (n_{i} - 1) - p_{Z} - 1) \end{aligned}

(13)

\begin{aligned} {\hat{ρ}}_{2} & = {\hat{ϕ}}_{2}^{- 1} \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i} - 1} {\hat{r}}_{i j}^{(2)} {\hat{r}}_{i, j + 1}^{(2)} / (\sum_{i = 1}^{K} (n_{i} - 1) - p_{X}) \end{aligned}

(14)

where

\begin{aligned} {\hat{ϕ}}_{1} & = \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} ({\hat{r}}_{i j}^{(1)})^{2} / (N - p_{Z} - 1), & {\hat{r}}_{i j}^{(1)} & = (g_{i j}^{(m)} - π (Z_{i j})) / [π (Z_{i j}) (1 - π (Z_{i j}))]^{1 / 2}, \\ {\hat{ϕ}}_{2} & = \sum_{i = 1}^{K} \sum_{j = 1}^{n_{i}} ({\hat{r}}_{i j}^{(2)})^{2} / (N - p_{X}), & {\hat{r}}_{i j}^{(2)} & = {\hat{y}}_{i j} ({\hat{β}}^{(m)}) - ({\hat{β}}^{(m)})^{'} X_{i j} \end{aligned}

(15)
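The working correlation matrices and the moment estimators (11) to (15) are straightforward to code. The sketch below takes per-cluster residual vectors (the standardized Pearson residuals $\hat{r}_{ij}^{(1)}$ for the incidence part, or $\hat{r}_{ij}^{(2)}$ for the latency part) and the number of regression parameters to subtract ($p_{Z} + 1$ or $p_{X}$). For the exchangeable case it uses the identity $\sum_{j > j'} r_{j} r_{j'} = \{(\sum_{j} r_{j})^{2} - \sum_{j} r_{j}^{2}\}/2$. The function names are illustrative.

```python
import numpy as np

def working_corr(n, rho, structure):
    """n x n working correlation matrix: 'ind', 'ex' (exchangeable) or 'ar1'."""
    if structure == "ind":
        return np.eye(n)
    if structure == "ex":
        return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))
    if structure == "ar1":
        lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
        return rho ** lag
    raise ValueError(f"unknown structure: {structure!r}")

def moment_estimates(resid_list, n_params, structure):
    """Moment estimates (rho_hat, phi_hat) following (11)-(15).

    resid_list : per-cluster residual vectors r_i
    n_params   : p_Z + 1 (incidence part) or p_X (latency part)
    """
    N = sum(r.size for r in resid_list)
    phi = sum(np.sum(r ** 2) for r in resid_list) / (N - n_params)
    if structure == "ex":
        # sum over pairs j > j' within a cluster: ((sum r)^2 - sum r^2) / 2
        num = sum((r.sum() ** 2 - np.sum(r ** 2)) / 2.0 for r in resid_list)
        den = sum(r.size * (r.size - 1) / 2.0 for r in resid_list) - n_params
    elif structure == "ar1":
        # adjacent pairs (j, j + 1) only
        num = sum(np.sum(r[:-1] * r[1:]) for r in resid_list)
        den = sum(r.size - 1 for r in resid_list) - n_params
    else:                       # working independence: no rho to estimate
        return 0.0, phi
    return num / den / phi, phi
```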

The proposed algorithm for the semiparametric marginal AFT mixture cure model can be summarized as follows:

(a) Set initial values $γ^{(1)}$, $β^{(1)}$, and $S_{ε}^{(1)} (\cdot)$.

(b) E-step: Calculate $g_{i j}^{(m)}$ via (4).

(c) M-step:
(i) Calculate the estimate of $S_{ε} (\cdot)$ using (8).
(ii) Given the current estimates of $γ$ and $β$, calculate the residuals and the estimates of $ρ_{1}$, $ρ_{2}$, $ϕ_{1}$, and $ϕ_{2}$ via (11), (12), and (15) or via (13) to (15), depending on the working correlation structure used.
(iii) Update the estimate of $γ$ from (9) and the estimate of $β$ from (10).
(iv) Repeat steps (ii) and (iii) until convergence, and denote the resulting estimates by $γ^{(m + 1)}$, $β^{(m + 1)}$, and $S_{ε}^{(m + 1)} (\cdot)$.

(d) Repeat steps (b) and (c) until the algorithm converges. The stopping criterion is that the sum of squared differences between the estimates of two successive iterations is below $10^{- 4}$.
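The outer loop of steps (b) to (d) and its stopping rule can be written as a generic driver. In the sketch below, `e_step` and `m_step` are hypothetical callables standing in for (4) and for step (c). To exercise the loop, it is paired with a toy parametric mixture cure model with no covariates and exponential latency $S_{u}(t) = e^{-λ t}$, for which both steps have closed forms; this toy is only a stand-in and is not the semiparametric model of this paper.

```python
import numpy as np

def run_em(theta0, e_step, m_step, tol=1e-4, max_iter=500):
    """Generic EM driver using the stopping rule of step (d).

    Stops when the sum of squared differences between the parameter vectors
    of two successive iterations falls below tol (1e-4 in the text).
    """
    theta = np.asarray(theta0, dtype=float)
    for it in range(1, max_iter + 1):
        weights = e_step(theta)                                # step (b)
        new_theta = np.asarray(m_step(weights), dtype=float)   # step (c)
        if np.sum((new_theta - theta) ** 2) < tol:             # step (d)
            return new_theta, it, True
        theta = new_theta
    return theta, max_iter, False

# Toy stand-in: no covariates, exponential latency with rate lam.
t = np.array([0.5, 1.2, 3.0, 10.0, 10.0, 0.8, 10.0, 2.1])
delta = np.array([1, 1, 1, 0, 0, 1, 0, 1])

def toy_e_step(theta):
    pi, lam = theta
    s = np.exp(-lam * t)
    return delta + (1 - delta) * pi * s / (1 - pi + pi * s)   # same form as (4)

def toy_m_step(g):
    # closed-form maximizers of the expected complete-data log-likelihood
    return [g.mean(), delta.sum() / np.sum(g * t)]

theta_hat, n_iter, converged = run_em([0.5, 0.5], toy_e_step, toy_m_step)
print(theta_hat, n_iter, converged)
```

In the actual algorithm the M-step callable would itself run the inner loop (ii) and (iii) of step (c) to convergence before returning.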

Let $\hat{θ} = ({\hat{γ}}^{'}, {\hat{β}}^{'})^{'}$ be the final estimator of $θ = (γ^{'}, β^{'})^{'}$. Under certain regularity conditions, we establish the asymptotic properties of $\hat{θ}$ in the following theorem.

Theorem 1

Let $θ_{0} = (γ_{0}^{'}, β_{0}^{'})^{'}$ be the true value of $θ$. Given the consistency of ${\hat{S}}_{ε} (\cdot)$ and under the regularity conditions provided in the Appendix, we have that

(a) the estimator $\hat{θ}$ is a consistent estimator of $θ_{0}$;

(b) as $K \to \infty$, $K^{1 / 2} (\hat{θ} - θ_{0}) \to N (0, Σ)$ in distribution, where $Σ = A^{- 1} (θ_{0}) V (θ_{0}) \{A^{- 1} (θ_{0})\}^{'}$, $A (θ_{0}) = E \{B (θ_{0})\} - E \{S (θ_{0}) S^{'} (θ_{0})\}$, $V (θ_{0}) = \sum_{i = 1}^{K} E \{S_{i} (θ_{0})\} E \{S_{i}^{'} (θ_{0})\}$, $B (θ) = - \partial S (θ) / \partial θ$, $S (θ) = (U^{'} (γ), U^{'} (β))^{'}$, $S_{i} (θ) = (U_{i}^{'} (γ), U_{i}^{'} (β))^{'}$, $U_{i} (γ) = (\partial π_{i} / \partial γ)^{'} (A_{i}^{1 / 2} Q_{i}^{(1)} A_{i}^{1 / 2} ϕ_{1})^{- 1} (g_{i} - π_{i})$, and $U_{i} (β) = (X_{i} - 1_{i} {\bar{X}}^{'})^{'} (B_{i}^{1 / 2} Q_{i}^{(2)} B_{i}^{1 / 2} ϕ_{2})^{- 1} G_{i} ({\hat{Y}}_{i} (β) - X_{i} β)$.

A proof of the theorem is provided in the Appendix. Note that the theorem gives the asymptotic properties of $\hat{γ}$ and $\hat{β}$ only when the other estimates in the model, such as ${\hat{S}}_{ε} (\cdot)$ and the estimated correlation and scale parameters, are treated as fixed; the asymptotic properties of these other estimates remain to be worked out. Consequently, variance estimates of $\hat{γ}$ and $\hat{β}$ based on $Σ$ in the theorem may be inaccurate because they ignore the variation in the other estimates. To obtain better variance estimates for $\hat{γ}$ and $\hat{β}$, and to obtain variance estimates for the other estimates in the model, we use the bootstrap method.⁴⁴ This method has been adopted by other researchers for variance estimation in models based on the AFT assumption.^9,35,45 Here, a bootstrap sample is obtained by sampling clusters with replacement, so that all observations from one cluster are either included in or excluded from a bootstrap sample together. Our numerical study in the next section shows that the bootstrap method works well in approximating the standard errors of the parameter estimates in the model.
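The cluster bootstrap described above resamples whole clusters rather than individual subjects. A minimal sketch, where `estimator` is a hypothetical callable that refits the model (for example, by running the full algorithm) on a list of clusters and returns the parameter vector; the default of 200 replicates is an arbitrary placeholder, not the number used in this paper.

```python
import numpy as np

def cluster_bootstrap_se(clusters, estimator, n_boot=200, seed=0):
    """Bootstrap standard errors by resampling whole clusters with replacement.

    clusters  : list of per-cluster data objects (all rows of one cluster)
    estimator : callable mapping a list of clusters to a parameter vector
    """
    rng = np.random.default_rng(seed)
    K = len(clusters)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, K, size=K)        # K clusters, with replacement
        sample = [clusters[k] for k in idx]     # a cluster enters whole or not at all
        draws.append(np.asarray(estimator(sample), dtype=float))
    return np.std(np.vstack(draws), axis=0, ddof=1)
```

Because each resampled unit is an entire cluster, the within-cluster correlation is carried into every bootstrap sample, which is what allows the resulting standard errors to reflect it.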

3. Simulation study

We conduct an extensive simulation study to investigate the finite sample performance of the proposed method. Simulated clustered survival data are generated from a mixture cure model whose marginal distribution is specified by (1) to (3). Each of $X_{i j}$ and $Z_{i j}$ contains two covariates: one generated from the Bernoulli distribution $B (1, 0.5)$ and the other from the uniform distribution $U (- 1, 1)$. Two combinations of the number of clusters and the cluster size are considered, $(K, n_{i}) = (100, 3)$ and $(60, 5)$.

To generate correlated $ω_{i j}$ s with the marginal model (2), we follow the method by Emrich and Piedmonte.⁴⁶ That is, given the marginal probabilities $π (Z_{i j})$ and $π (Z_{i j^{'}})$ within a cluster, we solve for ${\tilde{ρ}}_{i j j^{'}}$ in the following equation:

\frac{Φ \{[z_{π (Z_{i j})}, z_{π (Z_{i j^{'}})}], {\tilde{ρ}}_{i j j^{'}}\} - π (Z_{i j}) π (Z_{i j^{'}})}{\{π (Z_{i j}) π (Z_{i j^{'}}) [1 - π (Z_{i j})] [1 - π (Z_{i j^{'}})]\}^{1 / 2}} = ζ_{i j j^{'}}

where $Φ \{\cdot, {\tilde{ρ}}_{i j j^{'}}\}$ is the standard bivariate normal distribution function with correlation coefficient ${\tilde{ρ}}_{i j j^{'}}$, $z_{π (Z_{i j})}$ is the $π (Z_{i j})$th quantile of the standard normal distribution, and $ζ_{i j j^{'}}$ is the target correlation between $ω_{i j}$ and $ω_{i j^{'}}$: $ζ_{i j j^{'}} = 1$ for $j = j^{'}$, while for $j \neq j^{'}$, $ζ_{i j j^{'}} = ζ$ if the exchangeable correlation structure is assumed for the $ω_{i j}$s and $ζ_{i j j^{'}} = ζ^{| j - j^{'} |}$ if the first-order autoregressive correlation structure is assumed. We then generate $(z_{i 1}, z_{i 2}, \dots, z_{i n_{i}})$ from the multivariate normal distribution $N (0, Σ_{1 i})$, where $Σ_{1 i}$ has elements ${\tilde{ρ}}_{i j j^{'}}$, and obtain correlated $ω_{i j}$s by $ω_{i j} = I \{z_{i j} < z_{π (Z_{i j})}\}$. The value of $ζ$ measures the strength of the correlation among the generated $ω_{i j}$s within a cluster. The R package mvtBinaryEP implements this procedure for generating correlated binary data.
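The same Emrich and Piedmonte construction can be reproduced outside R. The Python sketch below is an illustrative stand-in, not a port of mvtBinaryEP: it evaluates the bivariate normal distribution function through the identity $Φ_{2}(a, b; r) = Φ(a)Φ(b) + \int_{0}^{r} φ_{2}(a, b; s)\, ds$ (the derivative of $Φ_{2}$ with respect to the correlation is the bivariate normal density), solves the equation above for $\tilde{ρ}_{ijj'}$ by bisection, and then thresholds correlated normals exactly as in $ω_{ij} = I\{z_{ij} < z_{π(Z_{ij})}\}$. It covers the exchangeable case only, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()

def bvn_cdf(a, b, r, n_grid=400):
    """Standard bivariate normal CDF Phi_2(a, b; r) for |r| < 1 (Simpson's rule)."""
    def density(s):
        q = (a * a - 2.0 * s * a * b + b * b) / (1.0 - s * s)
        return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - s * s))
    h = r / n_grid                      # n_grid must be even for Simpson's rule
    total = density(0.0) + density(r)
    for k in range(1, n_grid):
        total += (4.0 if k % 2 else 2.0) * density(k * h)
    return STD_NORMAL.cdf(a) * STD_NORMAL.cdf(b) + total * h / 3.0

def latent_corr(p1, p2, zeta, tol=1e-10):
    """Solve the Emrich-Piedmonte equation for the latent normal correlation."""
    a, b = STD_NORMAL.inv_cdf(p1), STD_NORMAL.inv_cdf(p2)
    scale = math.sqrt(p1 * p2 * (1.0 - p1) * (1.0 - p2))
    f = lambda r: (bvn_cdf(a, b, r) - p1 * p2) / scale - zeta
    lo, hi = -0.999, 0.999              # f increases in r, so bisection applies
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def correlated_cure_status(pi, zeta, rng):
    """One cluster of omega_ij with marginals pi and pairwise correlation zeta."""
    n = len(pi)
    sigma = np.eye(n)
    for j in range(n):
        for k in range(j):
            sigma[j, k] = sigma[k, j] = latent_corr(pi[j], pi[k], zeta)
    z = rng.multivariate_normal(np.zeros(n), sigma)
    cut = np.array([STD_NORMAL.inv_cdf(p) for p in pi])
    return (z < cut).astype(int)        # omega_ij = I{z_ij < z_pi(Z_ij)}

rng = np.random.default_rng(1)
print(correlated_cure_status([0.55, 0.62, 0.70], 0.4, rng))
```

When both marginal probabilities equal 0.5, the equation has the closed-form solution $\tilde{ρ} = \sin(\pi ζ / 2)$ (with $\pi \approx 3.14159$ the circle constant), which is a convenient check of the solver. Not every $ζ$ is attainable for given marginal probabilities; in that case the bisection simply stops at the boundary of its search interval.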

To generate correlated ${\tilde{T}}_{i j}$ s given $ω_{i j} = 1$ in the AFT model (3), we produce $ε_{i 1}, \dots, ε_{i n_{i}}$ from a multivariate normal distribution $N (0, Σ_{2 i})$ , where the covariance matrix $Σ_{2 i}$ has elements defined in a similar way as in $Σ_{1 i}$ and $τ$ is used to measure the strength of the correlation among $ε_{i j}$ s within a cluster. Then ${\tilde{T}}_{i j}$ can be obtained following (3). The R package mvtnorm can be used to generate the data.

We set $(ζ, τ) = (0.4, 0.8)$, $(0.2, 0.5)$, and $(0, 0)$ to obtain strong, weak, and no correlation within clusters. We set $β = (β_{1}, β_{2})^{'} = (0.6, 0.9)^{'}$ and $γ = (γ_{0}, γ_{1}, γ_{2})^{'} = (- 0.3, 0.6, 0.9)^{'}$ or $(0.3, 0.6, 0.9)^{'}$, so that the average cure rate is about 0.5 or 0.35, respectively. The censoring times are noninformative and generated from the uniform distribution $U (0, 50)$, which results in a censoring rate of 0.55 when the cure rate is 0.5 and a censoring rate of 0.40 when the cure rate is 0.35. Without loss of generality, we assume equal cluster sizes, $n_{i} = n$, and consider $(K, n) = (100, 3)$ and $(60, 5)$.
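Given the cure statuses, the latency times, censoring times, and observed data for one cluster follow directly from (3) and the censoring design above. The sketch below assumes exchangeable normal errors with unit variance and correlation $τ$ (our reading of "elements defined in a similar way as in $Σ_{1i}$") and takes the cure statuses as an input, so any generator of correlated $ω_{ij}$s can be plugged in. In the usage example the $ω_{ij}$s are drawn independently, which corresponds only to the $ζ = 0$ setting, and $X_{ij}$ and $Z_{ij}$ are assumed to share the same two covariates; all names are hypothetical.

```python
import numpy as np

def simulate_latency_cluster(X, omega, beta, tau, rng, c_max=50.0):
    """Observed (t, delta) for one cluster, given covariates and cure statuses.

    X     : (n, p_X) latency covariates X_ij
    omega : (n,) cure statuses, 1 = uncured, generated beforehand
    tau   : exchangeable correlation of the N(0, 1) errors eps_ij (assumed)
    """
    n = X.shape[0]
    cov = (1.0 - tau) * np.eye(n) + tau * np.ones((n, n))
    eps = rng.multivariate_normal(np.zeros(n), cov)
    t_true = np.where(omega == 1, np.exp(X @ beta + eps), np.inf)  # cured: T = inf
    cens = rng.uniform(0.0, c_max, size=n)       # noninformative U(0, 50) censoring
    t_obs = np.minimum(t_true, cens)
    delta = (t_true <= cens).astype(int)
    return t_obs, delta

# Usage: K = 100 clusters of size 3 with independent omega_ij (zeta = 0 only).
rng = np.random.default_rng(2024)
beta = np.array([0.6, 0.9])
gamma = np.array([0.3, 0.6, 0.9])
data = []
for i in range(100):
    X = np.column_stack([rng.binomial(1, 0.5, 3), rng.uniform(-1, 1, 3)])
    pi = 1.0 / (1.0 + np.exp(-(gamma[0] + X @ gamma[1:])))
    omega = rng.binomial(1, pi)
    t, delta = simulate_latency_cluster(X, omega, beta, 0.5, rng)
    data.append((X, omega, t, delta))
print("censoring rate:", 1 - np.mean([d.mean() for *_, d in data]))
```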

Table 1.
Bias, Var, Var*, and coverage probability (CP) of 95% confidence intervals of ${\hat{γ}}_{0}, {\hat{γ}}_{1}, {\hat{γ}}_{2}, {\hat{β}}_{1}, {\hat{β}}_{2}$ from the proposed method under the independent (IND), exchangeable (EX), and first-order autoregressive (AR(1)) working correlation structures and from the method of Zhang and Peng (ZP)³⁵, for data simulated with cure rate 0.35 under the EX structure.

		$(ζ, τ) = (0.4, 0.8)$				$(ζ, τ) = (0.2, 0.5)$				$(ζ, τ) = (0, 0)$
		ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX
$K = 100$, $n = 3$
$γ_{0}$	Bias	0.015	0.014	0.014	0.013	−0.001	−0.003	−0.003	−0.003	0.008	0.006	0.006	0.007
	Var	0.044	0.044	0.042	0.039	0.036	0.036	0.036	0.035	0.032	0.031	0.031	0.031
	Var*	0.047	0.047	0.046	0.044	0.041	0.040	0.040	0.040	0.035	0.034	0.035	0.035
	CP	0.953	0.953	0.950	0.953	0.959	0.959	0.956	0.960	0.960	0.959	0.960	0.960
$γ_{1}$	Bias	0.017	0.015	0.012	0.013	0.025	0.023	0.026	0.023	0.023	0.020	0.019	0.019
	Var	0.083	0.083	0.069	0.065	0.071	0.071	0.070	0.067	0.077	0.076	0.077	0.076
	Var*	0.080	0.080	0.070	0.065	0.078	0.078	0.076	0.074	0.081	0.080	0.081	0.081
	CP	0.943	0.939	0.942	0.951	0.951	0.951	0.955	0.953	0.957	0.956	0.953	0.952
$γ_{2}$	Bias	0.023	0.020	0.013	0.014	0.022	0.019	0.018	0.017	0.027	0.023	0.023	0.024
	Var	0.069	0.068	0.064	0.058	0.061	0.060	0.059	0.057	0.066	0.066	0.066	0.065
	Var*	0.072	0.073	0.066	0.061	0.068	0.068	0.067	0.066	0.068	0.068	0.069	0.069
	CP	0.953	0.956	0.946	0.946	0.969	0.969	0.967	0.971	0.951	0.949	0.950	0.948
$β_{1}$	Bias	0.007	0.006	0.007	0.008	−0.002	0.001	0.003	0.004	−0.003	−0.001	−0.001	−0.001
	Var	0.028	0.021	0.014	0.013	0.025	0.019	0.017	0.016	0.027	0.020	0.020	0.020
	Var*	0.027	0.020	0.014	0.013	0.026	0.020	0.018	0.017	0.026	0.020	0.020	0.020
	CP	0.938	0.934	0.944	0.941	0.948	0.944	0.947	0.945	0.940	0.944	0.946	0.948
$β_{2}$	Bias	0.008	0.011	0.009	0.007	0.006	0.011	0.010	0.011	0.002	0.005	0.004	0.005
	Var	0.022	0.020	0.014	0.013	0.021	0.021	0.019	0.018	0.021	0.021	0.021	0.021
	Var*	0.022	0.021	0.016	0.014	0.021	0.021	0.019	0.018	0.020	0.020	0.020	0.020
	CP	0.950	0.946	0.963	0.959	0.936	0.936	0.937	0.938	0.940	0.943	0.943	0.946
$K = 60$, $n = 5$
$γ_{0}$	Bias	0.023	0.022	0.022	0.022	0.021	0.019	0.019	0.018	0.010	0.008	0.008	0.008
	Var	0.058	0.058	0.057	0.052	0.044	0.044	0.044	0.042	0.031	0.031	0.031	0.031
	Var*	0.060	0.061	0.061	0.056	0.046	0.047	0.047	0.046	0.034	0.034	0.034	0.034
	CP	0.955	0.955	0.954	0.955	0.958	0.959	0.958	0.960	0.956	0.955	0.954	0.955
$γ_{1}$	Bias	0.014	0.013	0.011	0.012	0.004	0.002	0.004	0.006	0.007	0.005	0.005	0.005
	Var	0.083	0.083	0.069	0.060	0.068	0.068	0.068	0.063	0.074	0.073	0.074	0.074
	Var*	0.082	0.083	0.075	0.063	0.078	0.078	0.077	0.072	0.076	0.076	0.077	0.077
	CP	0.937	0.940	0.951	0.951	0.959	0.962	0.956	0.963	0.952	0.953	0.952	0.951
$γ_{2}$	Bias	0.020	0.019	0.024	0.022	0.027	0.024	0.028	0.027	0.018	0.016	0.015	0.016
	Var	0.071	0.071	0.060	0.052	0.064	0.063	0.061	0.059	0.060	0.060	0.060	0.060
	Var*	0.077	0.078	0.072	0.060	0.068	0.069	0.069	0.065	0.066	0.066	0.067	0.067
	CP	0.959	0.958	0.957	0.957	0.947	0.951	0.953	0.951	0.951	0.953	0.955	0.955
$β_{1}$	Bias	−0.006	−0.004	−0.002	−0.004	0.001	0.002	0.002	0.003	0.001	0.002	0.001	0.001
	Var	0.025	0.019	0.012	0.010	0.024	0.018	0.016	0.015	0.025	0.019	0.019	0.019
	Var*	0.027	0.021	0.014	0.012	0.026	0.020	0.018	0.016	0.026	0.020	0.020	0.020
	CP	0.954	0.959	0.963	0.960	0.954	0.949	0.956	0.945	0.941	0.947	0.944	0.941
$β_{2}$	Bias	0.000	0.004	0.009	0.007	0.003	0.005	0.005	0.005	−0.004	0.001	0.001	0.001
	Var	0.024	0.024	0.015	0.013	0.021	0.021	0.019	0.018	0.019	0.018	0.019	0.019
	Var*	0.025	0.024	0.017	0.014	0.021	0.021	0.019	0.018	0.020	0.020	0.021	0.021
	CP	0.950	0.939	0.958	0.960	0.927	0.938	0.937	0.943	0.951	0.946	0.950	0.945

Var: empirical variance; Var*: average of bootstrap variance; CP: coverage percentage; IND: independent; EX: exchangeable; AR(1): first-order autoregressive; ZP: Zhang and Peng.

Table 2.

Bias, Var, Var*, and CP of 95% confidence intervals of ${\hat{γ}}_{0}, {\hat{γ}}_{1}, {\hat{γ}}_{2}, {\hat{β}}_{1}, {\hat{β}}_{2}$ from the proposed method under the IND, EX, and AR(1) working correlation structures and from the method of ZP³⁵ for data simulated under the cure rate 0.5 and the EX structure.

		$(ζ, τ) = (0.4, 0.8)$				$(ζ, τ) = (0.2, 0.5)$				$(ζ, τ) = (0, 0)$
		ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX
$K = 100$ , $n = 3$
$γ_{0}$	Bias	−0.002	−0.003	0.000	−0.001	−0.013	−0.014	−0.016	−0.016	−0.002	−0.004	−0.004	−0.003
	Var	0.050	0.050	0.047	0.046	0.038	0.037	0.037	0.037	0.034	0.034	0.034	0.034
	Var*	0.045	0.045	0.044	0.042	0.039	0.039	0.039	0.038	0.033	0.033	0.033	0.033
	CP	0.939	0.940	0.931	0.936	0.943	0.945	0.947	0.950	0.937	0.936	0.939	0.937
$γ_{1}$	Bias	0.009	0.008	0.003	0.004	0.025	0.023	0.029	0.027	0.015	0.014	0.014	0.014
	Var	0.067	0.067	0.059	0.056	0.071	0.071	0.070	0.069	0.069	0.069	0.070	0.069
	Var*	0.072	0.071	0.063	0.058	0.070	0.069	0.068	0.066	0.069	0.069	0.070	0.070
	CP	0.951	0.951	0.951	0.949	0.944	0.943	0.944	0.947	0.950	0.949	0.950	0.948
$γ_{2}$	Bias	0.017	0.016	0.020	0.018	0.020	0.018	0.017	0.016	0.009	0.006	0.006	0.007
	Var	0.059	0.059	0.051	0.047	0.056	0.056	0.056	0.054	0.055	0.055	0.055	0.056
	Var*	0.063	0.064	0.058	0.053	0.059	0.060	0.059	0.057	0.059	0.059	0.060	0.060
	CP	0.958	0.960	0.955	0.955	0.961	0.962	0.954	0.955	0.955	0.955	0.950	0.953
$β_{1}$	Bias	−0.009	−0.006	−0.003	−0.002	0.007	0.006	0.006	0.006	0.007	0.008	0.008	0.008
	Var	0.032	0.022	0.017	0.016	0.032	0.023	0.022	0.021	0.032	0.022	0.023	0.023
	Var*	0.036	0.026	0.020	0.018	0.034	0.024	0.023	0.022	0.034	0.024	0.025	0.026
	CP	0.958	0.959	0.961	0.959	0.952	0.955	0.953	0.953	0.952	0.948	0.955	0.951
$β_{2}$	Bias	−0.001	0.005	0.010	0.009	−0.007	−0.004	−0.003	−0.003	−0.001	0.001	0.001	0.002
	Var	0.030	0.028	0.020	0.019	0.024	0.025	0.024	0.023	0.028	0.028	0.029	0.029
	Var*	0.030	0.031	0.027	0.024	0.027	0.028	0.027	0.026	0.028	0.028	0.029	0.030
	CP	0.932	0.949	0.960	0.960	0.950	0.949	0.950	0.954	0.941	0.943	0.951	0.950
$K = 60$ , $n = 5$
$γ_{0}$	Bias	0.006	0.005	0.005	0.004	0.004	0.003	0.003	0.001	0.003	0.002	0.001	0.002
	Var	0.052	0.052	0.053	0.049	0.044	0.044	0.043	0.043	0.030	0.030	0.030	0.030
	Var*	0.058	0.058	0.058	0.053	0.044	0.045	0.045	0.044	0.033	0.033	0.033	0.033
	CP	0.953	0.953	0.958	0.964	0.945	0.946	0.954	0.950	0.957	0.956	0.960	0.956
$γ_{1}$	Bias	0.013	0.012	0.009	0.013	0.016	0.015	0.016	0.019	0.017	0.014	0.015	0.014
	Var	0.069	0.069	0.060	0.050	0.069	0.069	0.066	0.064	0.063	0.062	0.062	0.062
	Var*	0.073	0.074	0.066	0.055	0.069	0.069	0.069	0.064	0.071	0.070	0.071	0.071
	CP	0.953	0.953	0.954	0.950	0.948	0.950	0.960	0.950	0.955	0.956	0.958	0.956
$γ_{2}$	Bias	0.024	0.024	0.020	0.021	0.032	0.031	0.029	0.030	0.029	0.026	0.026	0.026
	Var	0.062	0.062	0.055	0.047	0.051	0.051	0.051	0.047	0.055	0.055	0.055	0.055
	Var*	0.067	0.069	0.063	0.052	0.059	0.060	0.061	0.057	0.059	0.059	0.060	0.060
	CP	0.950	0.951	0.959	0.958	0.965	0.969	0.965	0.965	0.953	0.952	0.957	0.956
$β_{1}$	Bias	0.005	0.008	0.010	0.009	−0.003	0.001	0.000	0.000	0.009	0.007	0.005	0.006
	Var	0.034	0.026	0.019	0.016	0.033	0.024	0.022	0.020	0.033	0.023	0.023	0.023
	Var*	0.036	0.027	0.021	0.017	0.035	0.025	0.024	0.022	0.034	0.024	0.025	0.025
	CP	0.956	0.949	0.951	0.954	0.953	0.947	0.938	0.936	0.940	0.945	0.949	0.946
$β_{2}$	Bias	0.005	0.008	0.005	0.005	−0.004	0.001	0.001	0.003	0.013	0.015	0.014	0.013
	Var	0.033	0.031	0.023	0.020	0.027	0.029	0.027	0.025	0.027	0.028	0.028	0.028
	Var*	0.032	0.034	0.031	0.025	0.028	0.031	0.031	0.028	0.028	0.028	0.029	0.029
	CP	0.948	0.946	0.952	0.950	0.936	0.942	0.946	0.948	0.941	0.939	0.937	0.936

Var: empirical variance; Var*: average of bootstrap variance; CP: coverage percentage; IND: independent; EX: exchangeable; AR(1): first-order autoregressive; ZP: Zhang and Peng.

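Table 3.

Bias, Var, Var*, and CP of 95% confidence intervals of ${\hat{γ}}_{0}, {\hat{γ}}_{1}, {\hat{γ}}_{2}, {\hat{β}}_{1}, {\hat{β}}_{2}$ from the proposed method under the IND, EX, and AR(1) working correlation structures and from the method of ZP³⁵ for data simulated under the cure rate 0.35 and the AR(1) structure.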

		$(ζ, τ) = (0.4, 0.8)$				$(ζ, τ) = (0.2, 0.5)$				$(ζ, τ) = (0, 0)$
		ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX
$K = 100$ , $n = 3$
$γ_{0}$	Bias	0.014	0.013	0.010	0.011	0.008	0.007	0.007	0.007	−0.003	−0.005	−0.004	−0.005
	Var	0.042	0.042	0.039	0.040	0.037	0.037	0.036	0.037	0.031	0.031	0.031	0.031
	Var*	0.044	0.044	0.041	0.042	0.039	0.039	0.038	0.039	0.034	0.034	0.034	0.034
	CP	0.953	0.954	0.954	0.954	0.949	0.947	0.946	0.947	0.958	0.958	0.958	0.958
$γ_{1}$	Bias	0.009	0.007	0.011	0.008	0.022	0.020	0.016	0.019	0.020	0.017	0.016	0.016
	Var	0.075	0.075	0.063	0.064	0.080	0.079	0.077	0.078	0.073	0.073	0.073	0.073
	Var*	0.080	0.079	0.068	0.070	0.078	0.078	0.076	0.077	0.077	0.077	0.077	0.077
	CP	0.952	0.954	0.957	0.958	0.936	0.937	0.941	0.940	0.951	0.951	0.955	0.953
$γ_{2}$	Bias	0.021	0.020	0.017	0.016	0.027	0.024	0.023	0.025	0.015	0.011	0.010	0.010
	Var	0.064	0.065	0.057	0.058	0.060	0.060	0.058	0.059	0.060	0.060	0.060	0.059
	Var*	0.070	0.071	0.062	0.063	0.068	0.068	0.066	0.067	0.065	0.065	0.066	0.066
	CP	0.959	0.958	0.954	0.960	0.963	0.963	0.965	0.963	0.954	0.956	0.956	0.959
$β_{1}$	Bias	0.007	0.007	0.008	0.007	0.009	0.009	0.007	0.008	0.011	0.009	0.009	0.009
	Var	0.026	0.020	0.013	0.014	0.024	0.018	0.016	0.017	0.027	0.020	0.020	0.020
	Var*	0.026	0.020	0.014	0.014	0.026	0.020	0.018	0.018	0.026	0.020	0.020	0.020
	CP	0.951	0.940	0.949	0.950	0.965	0.960	0.961	0.960	0.941	0.936	0.936	0.934
$β_{2}$	Bias	0.008	0.011	0.008	0.008	0.004	0.006	0.004	0.005	−0.003	−0.001	−0.001	−0.002
	Var	0.021	0.021	0.014	0.014	0.021	0.021	0.019	0.020	0.019	0.019	0.020	0.020
	Var*	0.022	0.021	0.015	0.015	0.021	0.021	0.019	0.019	0.020	0.020	0.020	0.020
	CP	0.946	0.949	0.951	0.957	0.944	0.942	0.936	0.936	0.942	0.943	0.939	0.943
$K = 60$ , $n = 5$
$γ_{0}$	Bias	0.019	0.018	0.018	0.018	0.003	0.001	0.001	0.001	0.011	0.009	0.009	0.010
	Var	0.049	0.049	0.046	0.047	0.040	0.040	0.039	0.040	0.032	0.032	0.033	0.032
	Var*	0.048	0.048	0.044	0.046	0.039	0.039	0.039	0.039	0.034	0.034	0.034	0.034
	CP	0.944	0.943	0.940	0.949	0.946	0.945	0.950	0.946	0.947	0.946	0.947	0.945
$γ_{1}$	Bias	0.019	0.018	0.014	0.016	0.017	0.015	0.014	0.015	0.020	0.017	0.018	0.017
	Var	0.078	0.078	0.064	0.069	0.080	0.078	0.074	0.076	0.077	0.077	0.077	0.077
	Var*	0.079	0.080	0.066	0.073	0.081	0.080	0.076	0.078	0.078	0.078	0.079	0.078
	CP	0.952	0.954	0.941	0.951	0.951	0.950	0.953	0.951	0.949	0.950	0.951	0.947
$γ_{2}$	Bias	0.031	0.029	0.024	0.026	0.015	0.013	0.012	0.012	0.028	0.024	0.025	0.025
	Var	0.062	0.062	0.054	0.059	0.059	0.058	0.056	0.058	0.058	0.057	0.058	0.057
	Var*	0.071	0.072	0.061	0.066	0.067	0.067	0.065	0.067	0.066	0.067	0.067	0.067
	CP	0.957	0.957	0.956	0.958	0.964	0.962	0.966	0.967	0.965	0.965	0.968	0.964
$β_{1}$	Bias	0.001	0.005	0.004	0.007	−0.002	0.001	0.003	0.003	0.004	0.006	0.005	0.005
	Var	0.026	0.020	0.013	0.014	0.027	0.021	0.018	0.019	0.026	0.019	0.020	0.020
	Var*	0.026	0.020	0.013	0.015	0.026	0.020	0.017	0.018	0.026	0.020	0.020	0.020
	CP	0.937	0.943	0.955	0.954	0.930	0.929	0.938	0.945	0.939	0.935	0.942	0.938
$β_{2}$	Bias	−0.001	0.005	0.002	0.001	−0.005	0.001	0.001	0.000	−0.002	−0.002	−0.001	−0.002
	Var	0.021	0.021	0.014	0.015	0.021	0.022	0.018	0.019	0.019	0.018	0.018	0.019
	Var*	0.022	0.023	0.016	0.017	0.020	0.020	0.018	0.019	0.021	0.020	0.020	0.020
	CP	0.936	0.943	0.962	0.960	0.933	0.930	0.947	0.937	0.951	0.952	0.949	0.947

Var: empirical variance; Var*: average of bootstrap variance; CP: coverage percentage; IND: independent; EX: exchangeable; AR(1): first-order autoregressive; ZP: Zhang and Peng.

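Table 4.

Bias, Var, Var*, and CP of 95% confidence intervals of ${\hat{γ}}_{0}, {\hat{γ}}_{1}, {\hat{γ}}_{2}, {\hat{β}}_{1}, {\hat{β}}_{2}$ from the proposed method under the IND, EX, and AR(1) working correlation structures and from the method of ZP³⁵ for data simulated under the cure rate 0.5 and the AR(1) structure.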

		$(ζ, τ) = (0.4, 0.8)$				$(ζ, τ) = (0.2, 0.5)$				$(ζ, τ) = (0, 0)$
		ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX
$K = 100$ , $n = 3$
$γ_{0}$	Bias	0.011	0.010	0.011	0.010	0.007	0.006	0.008	0.007	0.003	0.002	0.002	0.001
	Var	0.040	0.039	0.037	0.038	0.038	0.038	0.037	0.037	0.034	0.034	0.035	0.035
	Var*	0.043	0.043	0.040	0.041	0.037	0.038	0.037	0.037	0.033	0.033	0.033	0.033
	CP	0.962	0.963	0.963	0.962	0.944	0.945	0.948	0.954	0.938	0.940	0.934	0.934
$γ_{1}$	Bias	0.008	0.006	0.002	0.004	0.010	0.008	0.006	0.006	0.015	0.013	0.014	0.014
	Var	0.068	0.067	0.057	0.058	0.069	0.069	0.068	0.068	0.071	0.071	0.072	0.072
	Var*	0.072	0.072	0.061	0.063	0.069	0.069	0.067	0.068	0.069	0.069	0.070	0.070
	CP	0.961	0.961	0.958	0.961	0.948	0.947	0.940	0.946	0.943	0.940	0.944	0.942
$γ_{2}$	Bias	0.026	0.024	0.015	0.015	0.013	0.012	0.010	0.011	0.038	0.036	0.036	0.037
	Var	0.061	0.060	0.051	0.052	0.051	0.051	0.049	0.049	0.061	0.061	0.062	0.062
	Var*	0.062	0.063	0.053	0.055	0.059	0.060	0.059	0.059	0.059	0.060	0.061	0.061
	CP	0.946	0.948	0.952	0.948	0.963	0.962	0.961	0.967	0.951	0.949	0.949	0.951
$β_{1}$	Bias	0.006	0.003	0.003	0.003	−0.005	−0.002	0.000	−0.001	−0.012	−0.010	−0.011	−0.011
	Var	0.034	0.025	0.018	0.019	0.036	0.025	0.023	0.023	0.034	0.024	0.024	0.024
	Var*	0.034	0.025	0.019	0.019	0.034	0.024	0.023	0.023	0.034	0.024	0.025	0.025
	CP	0.953	0.951	0.955	0.952	0.937	0.945	0.943	0.946	0.941	0.937	0.946	0.947
$β_{2}$	Bias	0.012	0.012	0.007	0.008	−0.004	0.001	0.001	0.002	0.005	0.013	0.011	0.011
	Var	0.029	0.026	0.019	0.020	0.025	0.026	0.024	0.025	0.026	0.026	0.026	0.026
	Var*	0.029	0.029	0.023	0.023	0.028	0.029	0.028	0.028	0.028	0.028	0.029	0.029
	CP	0.943	0.949	0.957	0.956	0.954	0.954	0.958	0.957	0.958	0.956	0.957	0.959
$K = 60$ , $n = 5$
$γ_{0}$	Bias	0.017	0.016	0.015	0.014	−0.004	−0.005	−0.001	−0.005	−0.013	−0.015	−0.015	−0.015
	Var	0.045	0.045	0.041	0.044	0.037	0.037	0.037	0.037	0.032	0.032	0.033	0.032
	Var*	0.046	0.047	0.043	0.045	0.038	0.039	0.038	0.039	0.033	0.033	0.033	0.033
	CP	0.946	0.946	0.956	0.948	0.954	0.957	0.953	0.952	0.948	0.950	0.948	0.948
$γ_{1}$	Bias	0.003	0.002	0.003	0.005	0.011	0.009	0.006	0.010	0.027	0.025	0.025	0.025
	Var	0.069	0.069	0.056	0.064	0.070	0.070	0.068	0.070	0.073	0.073	0.074	0.073
	Var*	0.071	0.072	0.059	0.065	0.069	0.069	0.067	0.068	0.070	0.070	0.071	0.071
	CP	0.946	0.948	0.950	0.955	0.951	0.949	0.939	0.940	0.939	0.939	0.940	0.939
$γ_{2}$	Bias	0.027	0.026	0.027	0.024	0.011	0.009	0.008	0.008	0.029	0.027	0.026	0.026
	Var	0.056	0.056	0.046	0.053	0.055	0.055	0.053	0.054	0.057	0.058	0.058	0.058
	Var*	0.063	0.065	0.054	0.060	0.058	0.059	0.058	0.059	0.059	0.059	0.060	0.060
	CP	0.957	0.961	0.964	0.960	0.948	0.952	0.952	0.950	0.958	0.957	0.954	0.956
$β_{1}$	Bias	0.007	0.008	0.008	0.009	0.002	0.002	0.002	0.001	−0.007	−0.004	−0.004	−0.005
	Var	0.037	0.027	0.019	0.021	0.032	0.023	0.022	0.023	0.032	0.023	0.023	0.023
	Var*	0.035	0.026	0.019	0.020	0.034	0.024	0.023	0.023	0.034	0.024	0.025	0.025
	CP	0.932	0.939	0.939	0.940	0.948	0.949	0.943	0.950	0.947	0.946	0.949	0.949
$β_{2}$	Bias	0.007	0.010	0.007	0.008	0.002	0.005	0.004	0.006	−0.004	0.000	0.000	−0.001
	Var	0.027	0.029	0.021	0.023	0.026	0.025	0.023	0.024	0.027	0.027	0.027	0.028
	Var*	0.029	0.031	0.026	0.026	0.027	0.029	0.029	0.028	0.028	0.028	0.029	0.029
	CP	0.944	0.948	0.955	0.952	0.941	0.958	0.949	0.951	0.933	0.944	0.939	0.941

Var: empirical variance; Var*: average of bootstrap variance; CP: coverage percentage; IND: independent; EX: exchangeable; AR(1): first-order autoregressive; ZP: Zhang and Peng.

For each setting above, we generate 1000 datasets and fit each dataset with the proposed marginal model and estimation method. For comparison, we also fit the data with the AFT mixture cure model of Zhang and Peng³⁵ (referred to as the ZP method), which does not account for the correlation within clusters. This method is available in the R package smcure.⁴⁷ The average biases, empirical variances, average bootstrap variances (based on 100 bootstrap samples), and empirical coverage percentages of 95% confidence intervals of the estimators are reported in Tables 1 to 4. The results show that the average estimated variances of the regression parameters from the proposed method are close to their empirical variances in all cases, and the coverage rates of the 95% confidence intervals are close to the nominal level. When the cure statuses and the failure times of uncured patients within a cluster are correlated, particularly for $(ζ, τ) = (0.4, 0.8)$, the empirical variances of the regression parameters from the proposed method are smaller than those from the ZP method. When the working correlation structure in the proposed method coincides with the true correlation structure, the empirical variances are in general smaller than those under a misspecified structure. For the same correlation strength, a higher cure rate generally implies smaller empirical variances in the incidence, especially for $γ_{1}$ and $γ_{2}$, but larger variances in the latency, which is intuitive because fewer subjects are uncured in this case. Given the same sample size, the empirical variances tend to decrease as the correlation decreases. When the correlation reduces to zero, the biases and empirical variances from the proposed method and the ZP method are comparable.
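The four summaries reported in Tables 1 to 4 can be computed from the replicate estimates as sketched below. This is a minimal sketch: it assumes each 95% confidence interval is a Wald-type interval, estimate ± 1.96 × bootstrap SE, which is an assumption since the interval construction is not restated in this section. The function name `summarize_replicates` is hypothetical.

```python
import math
from statistics import mean, variance

Z_975 = 1.959964  # 97.5% standard normal quantile for a two-sided 95% interval


def summarize_replicates(estimates, boot_variances, true_value):
    """Summarize simulation replicates for a single regression parameter.

    estimates      -- point estimate from each simulated dataset
    boot_variances -- bootstrap variance estimate from each simulated dataset
    true_value     -- parameter value used to generate the data
    Returns (Bias, Var, Var*, CP) in the layout of the simulation tables.
    """
    if len(estimates) != len(boot_variances) or len(estimates) < 2:
        raise ValueError("need at least two paired replicates")
    bias = mean(estimates) - true_value          # average bias
    emp_var = variance(estimates)                # empirical variance across replicates
    avg_boot_var = mean(boot_variances)          # Var*: average bootstrap variance
    covered = 0
    for est, v in zip(estimates, boot_variances):
        half_width = Z_975 * math.sqrt(v)        # assumed Wald-type interval est +/- z * SE
        if est - half_width <= true_value <= est + half_width:
            covered += 1
    cp = covered / len(estimates)                # coverage proportion
    return bias, emp_var, avg_boot_var, cp


# Toy input with four replicates (illustrative numbers, not taken from the tables).
bias, var, var_star, cp = summarize_replicates(
    [0.9, 1.1, 1.0, 1.2], [0.01, 0.01, 0.01, 0.01], true_value=1.0
)
```

In the toy input, the interval around the replicate 1.2 is roughly [1.004, 1.396] and misses the true value 1.0, so the coverage is 3 out of 4, or 0.75.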

To further evaluate the efficiency gain of the proposed method, we calculate the relative efficiency, defined as the ratio of the mean squared error (MSE) of the estimates from the ZP method, or from the proposed method with the working independent, exchangeable, or first-order autoregressive correlation structure, to the MSE of the estimates from the proposed method with the correctly specified working correlation structure. The results are summarized in Table 5, which shows that the MSEs from the proposed method with all three working correlation structures are generally smaller than those from the ZP method when $(ζ, τ)$ is nonzero. That is, the proposed method can improve estimation efficiency when within-cluster correlation exists. The MSEs from the proposed method with the correctly specified working correlation structure are generally smaller than those with a misspecified structure, which indicates that using the correct working correlation structure achieves higher efficiency, especially when the correlation within clusters is strong. The efficiency gain diminishes as the correlation within clusters becomes weak.
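The relative efficiency in Table 5 is a ratio of Monte Carlo MSEs, which can be sketched as follows. The helper names are hypothetical and the replicate values are illustrative only; the MSE here is the average squared deviation from the true value, which equals squared bias plus variance (with a divisor of the number of replicates).

```python
from statistics import mean


def mse(estimates, true_value):
    """Monte Carlo MSE: average squared deviation of the estimates from the true value."""
    return mean((e - true_value) ** 2 for e in estimates)


def relative_efficiency(estimates_other, estimates_reference, true_value):
    """MSE of a competing fit divided by the MSE of the reference fit.

    In Table 5 the reference is the proposed method with the correctly specified
    working correlation, so values above 1 mean the competitor is less efficient.
    """
    return mse(estimates_other, true_value) / mse(estimates_reference, true_value)


# Illustrative replicates: the competitor deviates twice as far from the truth.
reference = [0.9, 1.1, 0.95, 1.05]
competitor = [0.8, 1.2, 0.9, 1.1]
ratio = relative_efficiency(competitor, reference, true_value=1.0)  # about 4.0
```

Doubling every deviation quadruples the MSE, so the ratio is about 4.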

Table 5.

Relative efficiencies (ratios of MSEs) from the proposed method under the IND, EX, and AR(1) working correlation structures and from the method of ZP³⁵ against the proposed method under the correct correlation structure for data simulated under cure rates 0.35 and 0.5 and the EX and AR(1) correlation structures.

		$(ζ, τ) = (0.4, 0.8)$				$(ζ, τ) = (0.2, 0.5)$				$(ζ, τ) = (0, 0)$
$(K, n)$		ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX	ZP	IND	AR(1)	EX
Cure rate 0.35, EX correlation structure
(100, 3)	${\hat{γ}}_{0}$	1.13	1.12	1.06	1.00	1.04	1.03	1.02	1.00	1.02	1.00	1.00	1.00
	${\hat{γ}}_{1}$	1.29	1.28	1.06	1.00	1.05	1.05	1.04	1.00	1.01	0.99	1.00	1.00
	${\hat{γ}}_{2}$	1.18	1.18	1.10	1.00	1.07	1.05	1.03	1.00	1.02	1.00	1.01	1.00
	${\hat{β}}_{1}$	2.12	1.64	1.09	1.00	1.54	1.17	1.03	1.00	1.33	0.99	1.00	1.00
	${\hat{β}}_{2}$	1.69	1.59	1.08	1.00	1.17	1.16	1.06	1.00	0.99	0.99	1.00	1.00
(60, 5)	${\hat{γ}}_{0}$	1.11	1.10	1.08	1.00	1.04	1.04	1.03	1.00	1.00	1.00	1.00	1.00
	${\hat{γ}}_{1}$	1.38	1.37	1.16	1.00	1.08	1.08	1.08	1.00	0.99	0.99	1.00	1.00
	${\hat{γ}}_{2}$	1.34	1.36	1.15	1.00	1.08	1.07	1.04	1.00	1.00	1.01	1.00	1.00
	${\hat{β}}_{1}$	2.45	1.91	1.19	1.00	1.59	1.22	1.07	1.00	1.28	0.98	0.99	1.00
	${\hat{β}}_{2}$	1.89	1.86	1.19	1.00	1.18	1.18	1.06	1.00	1.02	0.97	1.00	1.00
Cure rate 0.5, EX correlation structure
(100, 3)	${\hat{γ}}_{0}$	1.09	1.09	1.03	1.00	1.02	1.02	1.01	1.00	1.00	1.00	1.00	1.00
	${\hat{γ}}_{1}$	1.21	1.21	1.06	1.00	1.03	1.02	1.02	1.00	1.00	1.00	1.00	1.00
	${\hat{γ}}_{2}$	1.26	1.26	1.09	1.00	1.03	1.04	1.03	1.00	0.99	0.99	1.00	1.00
	${\hat{β}}_{1}$	2.01	1.42	1.07	1.00	1.54	1.12	1.04	1.00	1.40	0.98	1.00	1.00
	${\hat{β}}_{2}$	1.57	1.43	1.07	1.00	1.05	1.09	1.02	1.00	0.96	0.98	1.00	1.00
(60, 5)	${\hat{γ}}_{0}$	1.07	1.07	1.09	1.00	1.02	1.02	1.00	1.00	1.00	1.00	0.99	1.00
	${\hat{γ}}_{1}$	1.37	1.38	1.18	1.00	1.08	1.09	1.03	1.00	1.01	0.99	0.99	1.00
	${\hat{γ}}_{2}$	1.32	1.34	1.17	1.00	1.08	1.09	1.08	1.00	0.99	0.99	0.99	1.00
	${\hat{β}}_{1}$	2.06	1.57	1.14	1.00	1.62	1.18	1.10	1.00	1.43	0.98	1.00	1.00
	${\hat{β}}_{2}$	1.61	1.53	1.12	1.00	1.07	1.17	1.09	1.00	0.97	0.99	1.01	1.00
Cure rate 0.35, AR(1) correlation structure
(100, 3)	${\hat{γ}}_{0}$	1.06	1.06	1.00	1.02	1.01	1.01	1.00	1.01	0.99	0.99	1.00	1.00
	${\hat{γ}}_{1}$	1.20	1.19	1.00	1.02	1.04	1.03	1.00	1.01	1.00	1.00	1.00	1.00
	${\hat{γ}}_{2}$	1.13	1.14	1.00	1.01	1.04	1.03	1.00	1.01	1.00	1.00	1.00	0.99
	${\hat{β}}_{1}$	1.97	1.53	1.00	1.04	1.48	1.14	1.00	1.03	1.31	0.99	1.00	1.00
	${\hat{β}}_{2}$	1.55	1.51	1.00	1.03	1.11	1.10	1.00	1.03	0.98	0.99	1.00	1.00
(60, 5)	${\hat{γ}}_{0}$	1.08	1.08	1.00	1.03	1.04	1.03	1.00	1.02	1.00	1.00	1.00	1.00
	${\hat{γ}}_{1}$	1.22	1.21	1.00	1.08	1.08	1.05	1.00	1.03	1.00	0.99	1.00	1.00
	${\hat{γ}}_{2}$	1.15	1.15	1.00	1.09	1.05	1.04	1.00	1.03	1.00	0.98	1.00	0.99
	${\hat{β}}_{1}$	2.06	1.61	1.00	1.10	1.54	1.19	1.00	1.05	1.35	0.99	1.00	1.01
	${\hat{β}}_{2}$	1.56	1.55	1.00	1.09	1.21	1.22	1.00	1.09	1.03	0.99	1.00	1.01
Cure rate 0.5, AR(1) correlation structure
(100, 3)	${\hat{γ}}_{0}$	1.07	1.06	1.00	1.02	1.02	1.02	1.00	1.01	0.99	0.99	1.00	1.01
	${\hat{γ}}_{1}$	1.20	1.18	1.00	1.02	1.02	1.02	1.00	1.01	0.99	0.99	1.00	1.01
	${\hat{γ}}_{2}$	1.19	1.17	1.00	1.02	1.04	1.04	1.00	1.00	0.99	0.99	1.00	1.00
	${\hat{β}}_{1}$	1.90	1.37	1.00	1.04	1.57	1.09	1.00	1.01	1.40	0.99	1.00	1.01
	${\hat{β}}_{2}$	1.51	1.35	1.00	1.02	1.03	1.06	1.00	1.02	1.00	1.00	1.00	1.01
(60, 5)	${\hat{γ}}_{0}$	1.10	1.10	1.00	1.07	1.02	1.01	1.00	1.01	0.99	0.99	1.00	1.00
	${\hat{γ}}_{1}$	1.24	1.24	1.00	1.15	1.03	1.03	1.00	1.02	0.99	0.99	1.00	1.00
	${\hat{γ}}_{2}$	1.22	1.23	1.00	1.15	1.04	1.04	1.00	1.02	0.99	0.99	1.00	1.00
	${\hat{β}}_{1}$	1.91	1.40	1.00	1.08	1.49	1.09	1.00	1.06	1.40	0.99	1.00	1.01
	${\hat{β}}_{2}$	1.29	1.37	1.00	1.10	1.11	1.08	1.00	1.02	0.99	0.99	1.00	1.01

MSE: mean squared error; IND: independent; EX: exchangeable; AR(1): first-order autoregressive; ZP: Zhang and Peng.

The proposed estimation method also produces estimates of $ρ_{1}$ and $ρ_{2}$, the correlation coefficients in the two working correlation matrices. Although they do not necessarily correspond to the correlation measures $ζ$ and $τ$ used in the data generation, Table 6 shows that the estimates of $ρ_{1}$ and $ρ_{2}$ track $ζ$ and $τ$ well, in the sense that when the latter decrease, the former tend to decrease too. When there is no correlation within clusters, the estimates of $ρ_{1}$ and $ρ_{2}$ are close to zero. In other words, the estimated values of $ρ_{1}$ and $ρ_{2}$ provide good measures of the strength of the correlation between the cure statuses and between the failure times of uncured subjects in a cluster. The variances of ${\hat{ρ}}_{1}$ and ${\hat{ρ}}_{2}$ are estimated by the bootstrap method. The empirical variances and the averages of the bootstrap variances are quite close, which indicates that the bootstrap variance estimator works well for ${\hat{ρ}}_{1}$ and ${\hat{ρ}}_{2}$.
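For intuition about what $ρ_{1}$ and $ρ_{2}$ measure, the sketch below builds the EX and AR(1) working correlation matrices and estimates their parameter with a generic GEE-style moment estimator: the average product of standardized residuals over within-cluster pairs (all pairs for EX, adjacent members for AR(1)). This is an assumption for illustration; the paper's own estimators of $ρ_{1}$ (cure statuses) and $ρ_{2}$ (failure times of uncured subjects) are embedded in its EM-based estimating equations and may weight residuals differently. Degrees-of-freedom and dispersion corrections are omitted, and the residual values are hypothetical.

```python
def exchangeable_matrix(n, rho):
    """n x n exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]


def ar1_matrix(n, rho):
    """n x n first-order autoregressive working correlation: rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]


def rho_exchangeable(residuals_by_cluster):
    """Average product of standardized residuals over all within-cluster pairs."""
    total, pairs = 0.0, 0
    for r in residuals_by_cluster:
        for j in range(len(r)):
            for k in range(j + 1, len(r)):
                total += r[j] * r[k]
                pairs += 1
    if pairs == 0:
        raise ValueError("no cluster has two or more members")
    return total / pairs


def rho_ar1(residuals_by_cluster):
    """Average product of standardized residuals of adjacent cluster members."""
    total, pairs = 0.0, 0
    for r in residuals_by_cluster:
        for j in range(len(r) - 1):
            total += r[j] * r[j + 1]
            pairs += 1
    if pairs == 0:
        raise ValueError("no cluster has two or more members")
    return total / pairs


clusters = [[1.0, 1.0, -1.0], [0.5, 0.5]]   # hypothetical standardized residuals
rho_ex = rho_exchangeable(clusters)          # (1 - 1 - 1 + 0.25) / 4 = -0.1875
rho_ar = rho_ar1(clusters)                   # (1 - 1 + 0.25) / 3
```

The AR(1) version only makes sense when members of a cluster have a natural ordering; the EX version treats all pairs symmetrically.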

Table 6.

Mean, Var, and Var* of $({\hat{ρ}}_{1}, {\hat{ρ}}_{2})$ based on the proposed method with the working EX or AR(1) correlation structure.

	Working		$(ζ, τ) = (0.4, 0.8)$			$(ζ, τ) = (0.2, 0.5)$			$(ζ, τ) = (0, 0)$
$(K, n)$	correlation		Mean	Var	Var*	Mean	Var	Var*	Mean	Var	Var*
Cure rate 0.35, EX correlation structure
(100, 3)	AR(1)	${\hat{ρ}}_{1}$	0.437	0.006	0.006	0.209	0.007	0.008	−0.006	0.005	0.005
		${\hat{ρ}}_{2}$	0.444	0.006	0.007	0.263	0.007	0.006	0.049	0.007	0.006
	EX	${\hat{ρ}}_{1}$	0.364	0.004	0.005	0.180	0.004	0.004	−0.005	0.003	0.003
		${\hat{ρ}}_{2}$	0.441	0.005	0.006	0.265	0.005	0.005	0.050	0.005	0.004
(60, 5)	AR(1)	${\hat{ρ}}_{1}$	0.540	0.009	0.009	0.237	0.011	0.010	−0.004	0.005	0.005
		${\hat{ρ}}_{2}$	0.445	0.006	0.007	0.261	0.006	0.006	0.050	0.006	0.005
	EX	${\hat{ρ}}_{1}$	0.363	0.004	0.004	0.176	0.004	0.003	−0.002	0.002	0.002
		${\hat{ρ}}_{2}$	0.443	0.005	0.005	0.260	0.004	0.004	0.049	0.003	0.003
Cure rate 0.5, EX correlation structure
(100, 3)	AR(1)	${\hat{ρ}}_{1}$	0.439	0.006	0.006	0.210	0.007	0.007	−0.004	0.005	0.005
		${\hat{ρ}}_{2}$	0.374	0.006	0.007	0.225	0.006	0.007	0.077	0.008	0.007
	EX	${\hat{ρ}}_{1}$	0.366	0.004	0.004	0.180	0.004	0.004	−0.003	0.003	0.003
		${\hat{ρ}}_{2}$	0.373	0.004	0.005	0.225	0.005	0.005	0.077	0.006	0.005
(60, 5)	AR(1)	${\hat{ρ}}_{1}$	0.548	0.007	0.008	0.241	0.012	0.010	−0.004	0.004	0.005
		${\hat{ρ}}_{2}$	0.371	0.007	0.007	0.233	0.006	0.006	0.076	0.008	0.006
	EX	${\hat{ρ}}_{1}$	0.369	0.004	0.004	0.179	0.004	0.003	−0.005	0.002	0.002
		${\hat{ρ}}_{2}$	0.371	0.005	0.005	0.229	0.004	0.004	0.076	0.005	0.004
Cure rate 0.35, AR(1) correlation structure
(100, 3)	AR(1)	${\hat{ρ}}_{1}$	0.370	0.005	0.005	0.180	0.006	0.005	0.001	0.005	0.005
		${\hat{ρ}}_{2}$	0.441	0.006	0.006	0.260	0.005	0.006	0.046	0.007	0.006
	EX	${\hat{ρ}}_{1}$	0.293	0.005	0.005	0.130	0.004	0.004	−0.002	0.004	0.003
		${\hat{ρ}}_{2}$	0.397	0.005	0.005	0.218	0.004	0.005	0.048	0.005	0.004
(60, 5)	AR(1)	${\hat{ρ}}_{1}$	0.367	0.005	0.005	0.179	0.005	0.005	−0.003	0.004	0.004
		${\hat{ρ}}_{2}$	0.442	0.006	0.006	0.265	0.005	0.005	0.045	0.006	0.005
	EX	${\hat{ρ}}_{1}$	0.200	0.004	0.004	0.082	0.003	0.002	−0.004	0.002	0.002
		${\hat{ρ}}_{2}$	0.336	0.004	0.005	0.174	0.004	0.003	0.046	0.004	0.003
Cure rate 0.5, AR(1) correlation structure
(100, 3)	AR(1)	${\hat{ρ}}_{1}$	0.376	0.005	0.005	0.185	0.005	0.005	−0.002	0.006	0.005
		${\hat{ρ}}_{2}$	0.374	0.006	0.006	0.234	0.006	0.006	0.078	0.008	0.007
	EX	${\hat{ρ}}_{1}$	0.298	0.004	0.004	0.133	0.004	0.004	−0.003	0.004	0.003
		${\hat{ρ}}_{2}$	0.335	0.005	0.005	0.201	0.005	0.005	0.077	0.007	0.005
(60, 5)	AR(1)	${\hat{ρ}}_{1}$	0.369	0.005	0.005	0.182	0.005	0.005	−0.004	0.004	0.004
		${\hat{ρ}}_{2}$	0.373	0.005	0.006	0.231	0.005	0.005	0.080	0.008	0.006
	EX	${\hat{ρ}}_{1}$	0.200	0.003	0.003	0.083	0.003	0.003	−0.003	0.002	0.002
		${\hat{ρ}}_{2}$	0.283	0.004	0.004	0.164	0.005	0.004	0.078	0.005	0.004

Var: empirical variance; Var*: average of bootstrap variance; EX: exchangeable; AR(1): first-order autoregressive.

4. Contralateral breast cancer analysis

We apply the proposed method to a dataset of contralateral breast cancer patients from the SEER database of the National Cancer Institute (https://seer.cancer.gov/data/). We consider patients diagnosed with unilateral breast cancer between 2005 and 2008 who were followed for contralateral breast cancer, with invasive ductal carcinoma of no special type (ICD-O-3: 8500/3), positive lymph node status, and positive histology. The baseline covariates are radiation therapy, age at diagnosis, estrogen receptor (ER) status, progesterone receptor (PR) status, and the lymph node ratio (LNR), defined as the ratio of the number of positive lymph nodes to the total number of examined lymph nodes.⁴⁸ Patients with these characteristics are extracted from the SEER Plus Data 17 Registries database using SEER*Stat 8.4.0.1 software.

There are 694 eligible patients in our study, with a censoring rate of 52.6%. The survival time of interest is defined as the time (in years) to relapse or death due to the cancer. The Kaplan–Meier survival curve in Figure 1(a) presents a high plateau: it levels off at a value substantially greater than 0 after 10 years of follow-up because of the large number of long-term survivors. This suggests that the long-term survivors may be considered cured, since they are unlikely to relapse or die of the cancer. We also conduct a nonparametric test⁴⁹ for the existence of long-term survivors; the resulting $p$-value (<0.05) provides significant evidence for the existence of cured or long-term survivors. Therefore, it is appropriate to consider a cure model for the data.
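The plateau in Figure 1(a) arises from the product-limit form of the Kaplan–Meier estimator: the curve drops only at observed failure times, so subjects censored after the last failure keep it flat. A minimal sketch follows; the function name and toy data are illustrative and are not the SEER data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct observed failure time.

    times  -- follow-up times
    events -- 1 if the failure was observed, 0 if censored
    Returns a list of (t, S(t)) pairs. Subjects censored at t stay in the risk set at t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = leaving = 0
        while i < len(data) and data[i][0] == t:    # group tied times
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            surv *= 1.0 - failures / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Toy data: the last two subjects are censored late, so the curve stays flat at 4/9.
curve = kaplan_meier([1, 2, 3, 4, 5, 6], [1, 1, 0, 1, 0, 0])
```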

Figure 1.

(a) The Kaplan–Meier survival curve and its pointwise 95% confidence interval and (b) the logarithm of estimated cumulative hazard functions of survival time distribution from some groups defined by the covariates in the cure model for the breast cancer data.

The 694 patients are from 44 clusters formed by SEER registries. Patients from the same cluster may share a similar lifestyle, socioeconomic status, and healthcare system, so their cure statuses, and the failure times of uncured patients, may be correlated because of the shared environment. Given the number of clusters in the data, treating cluster as a categorical covariate in the model would not be efficient. The proposed marginal mixture cure model is therefore suitable for these data because it accounts for the potential correlation within clusters.

Since we have no a priori subject-matter knowledge about which baseline covariates should enter which part of the cure model, we include all of them in both parts. The covariates are denoted and coded as follows: Radiation (a binary covariate with value 1 if a patient received radiation therapy and 0 otherwise), Age (a binary covariate with value 1 if a patient is 50 years or older and 0 otherwise), ER (a binary covariate with value 1 if a patient’s ER status is positive and 0 otherwise), PR (a binary covariate with value 1 if a patient’s PR status is positive and 0 otherwise), and LNR (a continuous covariate taking values between 0 and 1).
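The coding rules above can be written as a small function. The record fields (`received_radiation`, `age_years`, and so on) are hypothetical names chosen for illustration and do not correspond to actual SEER variable names; LNR follows the definition given earlier, positive nodes divided by examined nodes.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical patient record; field names are illustrative only."""
    received_radiation: bool
    age_years: float
    er_positive: bool
    pr_positive: bool
    positive_nodes: int
    examined_nodes: int


def code_covariates(p):
    """Return the covariates used in both the incidence and the latency parts."""
    if p.examined_nodes <= 0:
        raise ValueError("LNR is undefined without examined lymph nodes")
    return {
        "Radiation": 1 if p.received_radiation else 0,
        "Age": 1 if p.age_years >= 50 else 0,           # 1 if 50 years or older
        "ER": 1 if p.er_positive else 0,
        "PR": 1 if p.pr_positive else 0,
        "LNR": p.positive_nodes / p.examined_nodes,     # lies in [0, 1]
    }


older = code_covariates(PatientRecord(True, 50, False, True, 3, 20))
younger = code_covariates(PatientRecord(False, 49.9, True, False, 0, 12))
```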

To examine the suitability of the AFT and PH assumptions in the marginal mixture cure model for the data, we follow the idea of Zhang and Peng³⁵ and plot the logarithm of the cumulative hazard functions obtained from the weighted Kaplan–Meier survival estimators for the groups determined by the five covariates considered (LNR is dichotomized at 0.15⁵⁰). The weights are estimated by $g_{i j}$ in (4), so that the weighted Kaplan–Meier survival estimator can be viewed as an estimator of the survival distribution of the uncured subjects in each group. If the PH assumption holds for the covariates in the latency part, then the logarithms of the estimated cumulative hazard functions for the groups should be parallel. Figure 1(b) shows these curves for some groups. Many of them are clearly far from parallel, indicating that the PH assumption for the uncured patients may not be appropriate. Furthermore, following Peng and Taylor,⁵¹ we calculate the Cramér–von Mises criterion based on the modified Cox–Snell residuals with all the covariates for both the fitted semiparametric PH mixture cure model and the fitted semiparametric AFT mixture cure model, ignoring the clusters (the clusters should have no impact on the covariate effects in the marginal model), and obtain $3.68 \times 10^{-4}$ and $4.12 \times 10^{-6}$, respectively. Since a smaller value indicates a better fit to the data, the marginal semiparametric AFT mixture cure model appears to be a better choice than the PH-based model for these data.
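The graphical check above can be sketched as a weighted product-limit estimator followed by the transform $\log H(t) = \log\{-\log S(t)\}$. Here the weights are simply passed in as numbers; in the paper they are the estimates $g_{i j}$ from (4), which are not reproduced here, and the convention that observed failures get weight 1 while censored subjects get a weight below 1 is an assumption made for the toy data. With unit weights the estimator reduces to the ordinary Kaplan–Meier estimator. Under PH, the resulting curves for different groups should differ by a vertical shift.

```python
import math


def weighted_km(times, events, weights):
    """Weighted product-limit estimate of S(t) at each distinct failure time.

    Each subject contributes its weight to the failures at its own time (if it
    failed) and to the risk set at every time up to and including its own.
    """
    data = sorted(zip(times, events, weights))
    risk_weight = sum(w for _, _, w in data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        fail_weight = leaving_weight = 0.0
        while i < len(data) and data[i][0] == t:    # group tied times
            _, event, w = data[i]
            fail_weight += w * event
            leaving_weight += w
            i += 1
        if fail_weight > 0:
            surv *= 1.0 - fail_weight / risk_weight
            curve.append((t, surv))
        risk_weight -= leaving_weight
    return curve


def log_cumulative_hazard(curve):
    """(log t, log H(t)) with H(t) = -log S(t), for points with t > 0 and 0 < S(t) < 1."""
    return [(math.log(t), math.log(-math.log(s))) for t, s in curve if t > 0 and 0.0 < s < 1.0]


# Hypothetical weights: observed failures get 1, censored subjects get a value below 1.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 0, 0]
weights = [1.0, 1.0, 0.5, 1.0, 0.2, 0.2]
curve = weighted_km(times, events, weights)
points = log_cumulative_hazard(curve)
```

Down-weighting censored subjects shrinks the risk set toward the subjects likely to be uncured, which is why the weighted curve estimates the latency distribution rather than the population survival.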

We fit the proposed model to the data under two working correlation structures: the working independent and the working exchangeable correlation structures. For comparison, we also fit the data with the AFT mixture cure model using the ZP method. The standard errors of the estimates are obtained from 100 bootstrap samples. The results, summarized in Table 7, show some substantial differences between the methods. For example, in the latency part, Radiation is significant ($p$-value = 0.047) under the proposed method with the working exchangeable correlation structure, whereas it is only marginally significant under the working independent correlation structure ($p$-value = 0.052) and under the ZP method ($p$-value = 0.093). This difference may be due to the strong correlation (${\hat{ρ}}_{2} = 0.313$) among the survival times of susceptible subjects within clusters revealed by the proposed method with the working exchangeable correlation structure. LNR is significant in both the incidence and the latency for all methods, which indicates that the higher the LNR, the lower the probability of being cured and the earlier the events among uncured patients. Both ER and PR are significant in the latency for all methods, which implies that uncured patients with positive ER or positive PR tend to have later events. The results in the incidence part from the three methods are generally similar, which is consistent with the weak correlation (${\hat{ρ}}_{1} = -0.013$) among the cure statuses of subjects within clusters.
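Because patients within a registry may be correlated, a bootstrap for these data should keep clusters intact. The sketch below assumes that whole clusters are resampled with replacement, which is an assumption since the resampling scheme is not restated in this section; `pooled_mean` is a hypothetical stand-in for the actual model fit, and the default of 100 replicates mirrors the number of bootstrap samples used above.

```python
import random
from statistics import stdev


def cluster_bootstrap_se(clusters, estimator, n_boot=100, seed=1):
    """Bootstrap standard error that resamples whole clusters with replacement.

    clusters  -- list of per-cluster data (e.g., the records from one registry)
    estimator -- function mapping a list of clusters to a scalar estimate
    """
    rng = random.Random(seed)
    k = len(clusters)
    replicates = []
    for _ in range(n_boot):
        resample = [clusters[rng.randrange(k)] for _ in range(k)]  # keep clusters intact
        replicates.append(estimator(resample))
    return stdev(replicates)


def pooled_mean(clusters):
    """Toy estimator: mean of all observations pooled across clusters."""
    values = [x for c in clusters for x in c]
    return sum(values) / len(values)


data = [[1.0, 1.2], [3.0, 2.8], [0.5, 0.7], [2.0, 2.2]]   # hypothetical clustered outcomes
se = cluster_bootstrap_se(data, pooled_mean)
```

Resampling clusters rather than individual patients preserves the within-cluster dependence in every bootstrap sample, so the resulting SE reflects that dependence.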

Table 7.

Estimated parameters and their SEs from the ZP method and from the proposed marginal semiparametric AFT mixture cure model under the working IND correlation and the working EX correlation structures for the breast cancer data.

	ZP			IND			EX
	Estimate	SE	$p$ -value	Estimate	SE	$p$ -value	Estimate	SE	$p$ -value
Incidence
Intercept	−0.551	0.285	0.053	−0.584	0.256	0.023	−0.598	0.252	0.018
Radiation	−0.079	0.338	0.816	−0.065	0.304	0.831	−0.105	0.326	0.746
Age	0.162	0.222	0.465	0.169	0.206	0.411	0.154	0.207	0.457
ER	0.219	0.340	0.520	0.181	0.328	0.582	0.203	0.343	0.555
PR	−0.094	0.416	0.821	−0.183	0.370	0.621	−0.233	0.379	0.539
LNR	1.853	0.518	<0.001	1.872	0.490	<0.001	1.942	0.478	<0.001
${\hat{ρ}}_{1}$	–	–	–	–	–	–	−0.013	0.015	0.392
Latency
Radiation	0.321	0.191	0.093	0.367	0.189	0.052	0.314	0.158	0.047
Age	−0.147	0.130	0.258	−0.137	0.128	0.284	−0.168	0.125	0.179
ER	0.483	0.134	<0.001	0.468	0.126	<0.001	0.538	0.124	<0.001
PR	0.545	0.197	0.006	0.467	0.181	0.010	0.407	0.155	0.009
LNR	−0.944	0.183	<0.001	−0.853	0.179	<0.001	−0.753	0.170	<0.001
${\hat{ρ}}_{2}$	–	–	–	–	–	–	0.313	0.066	<0.001

SE: standard error; AFT: accelerated failure time; ZP: Zhang and Peng; EX: exchangeable; IND: independent; ER: estrogen receptor; PR: progesterone receptor; LNR: lymph node ratio.
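The entries of Table 7 can be read with a few standard formulas. Assuming the $p$-values are two-sided Wald tests that refer the ratio of each estimate to its bootstrap SE to the standard normal distribution, the rounded Radiation entries in the latency reproduce the reported values 0.047, 0.052, and 0.093. The sketch also assumes that the latency coefficients act on log failure time, so $\exp(β)$ is a time ratio, and that the incidence uses a logistic link for the probability of being uncured. Both assumptions agree with the signs interpreted in the text but are not restated in this section. The covariate profile in the last line is hypothetical.

```python
import math


def wald_p_value(estimate, se):
    """Two-sided p-value 2 * (1 - Phi(|estimate / se|)), computed via erfc."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def time_ratio(beta):
    """Assumed AFT latency: a one-unit covariate increase multiplies failure time by exp(beta)."""
    return math.exp(beta)


def uncured_probability(linear_predictor):
    """Assumed logistic incidence: P(uncured) = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


p_ex = wald_p_value(0.314, 0.158)    # Radiation, EX: about 0.047
p_ind = wald_p_value(0.367, 0.189)   # Radiation, IND: about 0.052
p_zp = wald_p_value(0.321, 0.191)    # Radiation, ZP: about 0.093
tr_er = time_ratio(0.538)            # ER, EX: event times about 1.71 times longer
# Hypothetical patient with all binary covariates 0 and LNR = 0.5 (EX incidence estimates).
p_uncured = uncured_probability(-0.598 + 1.942 * 0.5)   # about 0.59
```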

5. Conclusion and discussion

Marginal cure models have been widely used for analyzing multivariate survival data with a cure fraction. However, most work has focused on marginal PH mixture cure models, which may be inappropriate in applications where the PH assumption does not hold for the latency. In this paper, we considered a semiparametric marginal AFT mixture cure model for correlated survival data with a cure proportion. We developed a novel estimation approach for this model based on GEEs. We showed that the regression estimators are consistent and asymptotically normal, and employed a bootstrap method to estimate the variances of the estimated parameters. Our work relaxes the independence assumption of the AFT mixture cure model³⁵ for potentially correlated survival data by incorporating working correlation structures in the estimation procedure. The proposed method is also an extension of the marginal AFT model,⁹ which was proposed for correlated failure time data without a cure fraction.

Our numerical studies show that, compared with the method in Zhang and Peng,³⁵ the proposed method for clustered survival data improves the estimation efficiency of the regression estimators, especially when the correlation is strong and the cluster size is large. When the prespecified working correlation structure matches the true correlation structure, the empirical variances are generally smaller than those under other working correlation structures. When the correlation weakens or the cluster size decreases, the results from the proposed method are comparable with those from the method in Zhang and Peng.³⁵ Hence, the proposed semiparametric marginal AFT mixture cure model provides a new approach for the analysis of correlated lifetime data with a cure proportion, particularly when one is interested in characterizing covariate effects on the failure times of uncured patients directly and the dependence among the survival times of uncured patients and among the cure statuses can be described by a few unknown parameters. The proposed model and estimation method also facilitate future development of causal inference for clustered survival data with a cure fraction.

Random effects or frailty models are often considered for clustered data, with the random effects or frailties describing the underlying dependence within a cluster. For clustered survival data with a cure fraction, however, we are not aware of any existing random effects or frailty model based on the AFT assumption. Therefore, as pointed out by one reviewer, the AFT mixture cure model with random effects or frailties for clustered survival data is an important and interesting topic for future research. Note, however, that a random effects or frailty model generally cannot be written as a marginal model, except in special cases such as a linear model with normal random effects. Thus, the estimates from a marginal model are generally not comparable with those from a random effects or frailty model.⁵²
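Why marginal and conditional estimates are generally not comparable can be illustrated with a minimal sketch that is not a cure model: a hypothetical random-intercept model with linear predictor $β_0 + β_1 x + b$ and $b \sim N(0, σ_b^2)$, whose marginal mean is obtained by integrating $b$ out with Gauss–Hermite quadrature. Under a logistic link the implied marginal log-odds ratio is attenuated relative to the conditional $β_1$, whereas under the identity link (the linear model with normal random effects) the two slopes coincide. The parameter values and helper names are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def marginal_mean(eta, sigma_b, inv_link, n_nodes=40):
    """E_b[inv_link(eta + b)] for b ~ N(0, sigma_b^2), by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)       # rule for the weight exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # rescale so the weights sum to 1
    return inv_link(eta[:, None] + sigma_b * nodes[None, :]) @ weights

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

# Hypothetical conditional (cluster-specific) parameters.
beta0, beta1, sigma_b = -0.5, 1.0, 2.0
eta = beta0 + beta1 * np.array([0.0, 1.0])   # linear predictor at x = 0 and x = 1

# Logistic link: the population-averaged log-odds ratio is attenuated.
p_marg = marginal_mean(eta, sigma_b, expit)
logistic_marginal_slope = logit(p_marg[1]) - logit(p_marg[0])

# Identity link: integrating out b leaves the slope unchanged.
m_marg = marginal_mean(eta, sigma_b, lambda z: z)
linear_marginal_slope = m_marg[1] - m_marg[0]

print(f"conditional slope:             {beta1:.3f}")
print(f"marginal slope, logistic link: {logistic_marginal_slope:.3f}")
print(f"marginal slope, identity link: {linear_marginal_slope:.3f}")
```

Only the identity-link case returns the conditional coefficient as the marginal one; with a nonlinear link the two models describe different parameters.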

In all the numerical work reported in this paper, the variances of the estimators are estimated by the bootstrap method. This method is relatively straightforward to implement but can be computationally intensive, and a computationally efficient alternative would be preferable.⁵³ An interesting direction for future work is to explore ways to simplify variance estimation in the proposed method.
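A minimal sketch of a bootstrap for clustered data is given below, assuming the common cluster (block) bootstrap in which whole clusters are resampled with replacement so that within-cluster dependence is preserved. The estimator is a hypothetical placeholder (least squares on simulated data with a shared cluster effect) standing in for the proposed EM/GEE fit, and none of the function names come from smgeecure.

```python
import numpy as np

def ols(X, y):
    """Placeholder estimator: least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cluster_bootstrap_var(X, y, cluster, estimator, n_boot=200, seed=0):
    """Bootstrap covariance of `estimator`, resampling whole clusters with replacement."""
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    rows_by_cluster = {c: np.flatnonzero(cluster == c) for c in ids}
    reps = []
    for _ in range(n_boot):
        drawn = rng.choice(ids, size=ids.size, replace=True)
        rows = np.concatenate([rows_by_cluster[c] for c in drawn])
        reps.append(estimator(X[rows], y[rows]))   # refit on the resampled clusters
    return np.cov(np.asarray(reps), rowvar=False)

# Simulated clustered data: K clusters of size m sharing a random intercept.
rng = np.random.default_rng(1)
K, m = 100, 4
cluster = np.repeat(np.arange(K), m)
x = rng.normal(size=K * m)
y = 1.0 + 0.5 * x + np.repeat(rng.normal(size=K), m) + rng.normal(size=K * m)
X = np.column_stack([np.ones_like(x), x])

beta_hat = ols(X, y)
cov_boot = cluster_bootstrap_var(X, y, cluster, ols, n_boot=200)
se_boot = np.sqrt(np.diag(cov_boot))
print("estimates:", beta_hat, "bootstrap SEs:", se_boot)
```

Each replicate refits the full estimator, so the cost grows linearly with the number of bootstrap replicates.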

The proposed semiparametric estimation method for the marginal AFT mixture cure model was implemented in our R package smgeecure, which is publicly available at http://github.com/yiniu06/smgeecure.

Footnotes

Acknowledgements

The authors acknowledge the efforts of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute in the creation of the SEER database. The dataset that supports the findings in this paper is available from the website of the National Cancer Institute at . Restrictions apply to the availability of these data, which are used under license for this study.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work of Yingwei Peng is partially supported by a research grant from the Natural Sciences and Engineering Research Council of Canada. The work of Yi Niu is partially supported by grants from the National Natural Science Foundation of China (11401072), the Fundamental Research Funds for the Central Universities of China (DUT20LK24, DUT22LAB302), and the Dalian High-level Talent Innovation Project (2020RD09).

ORCID iDs

Jie Ding

Yingwei Peng

Appendix

Proof of Theorem 1

The proof of the theorem is based on Rosen et al.⁵⁴ and Liang and Zeger.⁸ We begin by listing the conditions required for the consistency and asymptotic normality of $\hat{θ}$.

(C1) The censoring mechanism is noninformative.

(C2) The covariate vectors $X_{ij}$ and $Z_{ij}$ have bounded support, and the design matrices formed by the column vectors of $X_{ij}$ and $Z_{ij}$ are of full rank.

(C3) With probability one, there exists a positive constant $ζ_0$ such that $P(C_{ij} e^{-β_0' X_{ij}} \geq e^{ε_{ij}} \geq τ_0 \mid Z_{ij}, X_{ij}) > ζ_0$ for all possible values of $Z_{ij}$ and $X_{ij}$, where $τ_0$ is a finite positive number.

(C4) For each fixed $θ \in Θ$, $Ψ(θ)$ is $F_t$-measurable, and $Ψ(θ)$ is separable.

(C5) The function $Ψ$ is a.s. continuous in $θ$: $\lim_{θ' \to θ} |Ψ(θ') - Ψ(θ)| = 0$ a.s.

(C6) The expected value $φ(θ) = E\{Ψ(θ)\}$ exists for all $θ \in Θ$ and has a unique zero at $θ = θ_0$.

(C7) There exists a continuous function $b(θ)$, bounded away from zero with $b(θ) \geq b_0 > 0$, such that (i) $\sup_θ |Ψ(θ)| / b(θ)$ is integrable, (ii) $\liminf_{θ \to \infty} |φ(θ)| / b(θ) \geq 1$, and (iii) $E\{\limsup_{θ \to \infty} |Ψ(θ) - φ(θ)| / b(θ)\} < 1$.

(C8) There are strictly positive numbers $a$, $b$, $c$, and $d_0$ such that (i) $|φ(θ)| > a \cdot |θ - θ_0|$ for $|θ - θ_0| \leq d_0$, (ii) $E\{\sup_{|τ - θ| \leq d} |Ψ(τ) - Ψ(θ)|\} < b \cdot d$ for $|θ - θ_0| + d \leq d_0$ and $d \geq 0$, and (iii) $E[\{\sup_{|τ - θ| \leq d} |Ψ(τ) - Ψ(θ)|\}^2] \leq c \cdot d$ for $|θ - θ_0| + d \leq d_0$ and $d \geq 0$.

(C9) The expectation $E\{|Ψ(θ_0)|^2\}$ is finite.

Let $(S_t, F_t, σ)$, $(S_c, F_c, μ)$, and $(S_ω, F_ω, ν)$ be $σ$-finite measure spaces, with the product measure space

(S_t \otimes S_c \otimes S_ω, F_t \otimes F_c \otimes F_ω, σ \otimes μ \otimes ν),

where $S_t \subset R^{d_t}$, $S_c \subset R^{d_c}$, and $S_ω \subset R^{d_ω}$. Here $d_t = \dim(t)$, $d_c = \dim(c)$, and $d_ω = \dim(ω)$ are the dimensions of the failure time $t$, the censoring time $c$, and the cure status $ω$, respectively. We assume a marginal probability model $p_{ij}(t, c, ω \mid θ)$ for $(t_{ij}, c_{ij}, ω_{ij}) \in S_t \otimes S_c \otimes S_ω$ with respect to the product measure $σ \otimes μ \otimes ν$, for $i = 1, \dots, K$ and $j = 1, \dots, n_i$; this model is strictly positive on $S_t \otimes S_c \otimes S_ω$ and may depend on $i$ and $j$ via covariates. Here $θ$ is a vector-valued parameter in a subset $Θ$ of $R^{d_θ}$ with $d_θ = \dim(θ)$, and $p_{ij}(t, c, ω \mid \cdot)$ is continuously differentiable on $Θ$ for each $(t_{ij}, c_{ij}, ω_{ij}) \in S_t \otimes S_c \otimes S_ω$. Let $p_{ij}(ω \mid t_{ij}, c_{ij}, θ)$ be the conditional probability model for all $ω_{ij} \in S_ω$. That is,

p_{ij}(ω \mid t_{ij}, c_{ij}, θ) = p_{ij}(t_{ij}, c_{ij}, ω \mid θ) \Big/ \int_{S_ω} p_{ij}(t_{ij}, c_{ij}, u \mid θ) \, dν(u).

Let $q_{ij}(\cdot, \cdot, \cdot; \cdot)$ be a $d_θ \times 1$ vector-valued function composed of $\{U_{ij}'(γ), U_{ij}'(β)\}'$, where $U_{ij}(γ)$ and $U_{ij}(β)$ are the $ij$th summands of $U(γ)$ and $U(β)$, respectively. We define $q_{ij}(\cdot, \cdot, \cdot; \cdot): S_t \otimes S_c \otimes S_ω \otimes Θ \to R^{d_θ}$ such that $q_{ij}(\cdot, \cdot, \cdot; φ): S_t \otimes S_c \otimes S_ω \to R^{d_θ}$ is measurable and integrable with respect to $p_{ij}(\cdot, \cdot, \cdot \mid θ)$ for each $φ \in Θ$, and $q_{ij}(t, c, ω; \cdot): Θ \to R^{d_θ}$ is continuously differentiable on $Θ$ for each $(t_{ij}, c_{ij}, ω_{ij}) \in S_t \otimes S_c \otimes S_ω$. We then define a bivariate function $H(\cdot \mid \cdot): Θ \otimes Θ \to R^{d_θ}$ by $H(φ \mid θ^{(m)}) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} \int_{S_ω} q_{ij}(t_{ij}, c_{ij}, ω; φ) \, p_{ij}(ω \mid t_{ij}, c_{ij}, θ^{(m)}) \, dν(ω)$. The E-step of the EM algorithm computes $H(φ \mid θ^{(m)})$, and the M-step obtains $θ^{(m+1)}$ as the solution $φ$ of $H(φ \mid θ^{(m)}) = 0$. As discussed by Niu and Peng,¹⁷ $H(\cdot \mid \cdot)$ is a bivariate continuous function on $Θ \otimes Θ$, where $Θ \subseteq R^{d_θ}$.
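The division of labour between the E-step and the M-step can be seen in a deliberately simplified setting. The sketch below is not the proposed semiparametric GEE procedure: it assumes a hypothetical parametric mixture cure model with independent subjects, no covariates, a constant uncure probability $π$, and an exponential latency distribution with rate $λ$. The E-step replaces each unobserved $ω$ by its conditional expectation given the observed time and censoring indicator, which for a censored subject is $π S_u(t) / \{1 - π + π S_u(t)\}$ with $S_u(t) = e^{-λ t}$; the M-step solves the resulting weighted score equations, whose roots are available in closed form in this toy model.

```python
import numpy as np

def em_exponential_cure(t, delta, pi=0.5, lam=0.5, tol=1e-10, max_iter=5000):
    """EM for a toy mixture cure model: with probability pi a subject is uncured
    with T ~ Exponential(rate lam); otherwise it is cured and never fails."""
    for _ in range(max_iter):
        # E-step: w_i = E(omega_i | t_i, delta_i) under the current (pi, lam).
        s_u = np.exp(-lam * t)                                   # S_u(t_i)
        w = np.where(delta == 1, 1.0, pi * s_u / (1.0 - pi + pi * s_u))
        # M-step: roots of the expected complete-data score equations.
        pi_new = w.mean()
        lam_new = delta.sum() / np.sum(w * t)
        converged = abs(pi_new - pi) < tol and abs(lam_new - lam) < tol
        pi, lam = pi_new, lam_new
        if converged:
            break
    return pi, lam, w

# Simulated data: 60% uncured, Exponential(1) latency, Uniform(0, 6) censoring.
rng = np.random.default_rng(2024)
n, true_pi, true_lam = 5000, 0.6, 1.0
uncured = rng.random(n) < true_pi
event_time = np.where(uncured, rng.exponential(1.0 / true_lam, n), np.inf)
censor_time = rng.uniform(0.0, 6.0, n)
t = np.minimum(event_time, censor_time)
delta = (event_time <= censor_time).astype(int)

pi_hat, lam_hat, weights = em_exponential_cure(t, delta)
print(f"pi_hat={pi_hat:.3f}, lam_hat={lam_hat:.3f}")
```

In the proposed method the M-step instead solves the estimating equations $H(φ \mid θ^{(m)}) = 0$ numerically, with the working correlation entering through the weight matrices.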

Next we prove that $q_{ij}(\cdot, \cdot, \cdot; \cdot)$ is an unbiased estimating function, that is, $E\{q_{ij}(t_{ij}, c_{ij}, ω_{ij}; θ) \mid θ\} = \iint \sum_{ω} p_{ij}(ω \mid t, c, θ) \, q_{ij}(t, c, ω; θ) \, dF_t(t \mid c, θ) \, dF_c(c \mid θ) = 0$ for all $θ \in Θ$, $j = 1, \dots, n_i$, and $i = 1, \dots, K$. Proving the unbiasedness of $q_{ij}(\cdot, \cdot, \cdot; \cdot)$ is equivalent to showing the unbiasedness of (9) and (10), that is, showing that the conditional expectations of $U(γ)$ and $U(β)$ with respect to $T$ are $0$. Since the unbiasedness of $U(γ)$ can be established in the same way as in Niu and Peng,¹⁷ we focus here on the unbiasedness of the estimating function $U(β)$ or, equivalently, of its summands $U_{ij}(β)$.

Let $\tilde{Y}_{ij} = \log \tilde{T}_{ij}$ denote the logarithm of the true underlying survival time. Following the idea of Buckley and James,⁴³ we replace $\tilde{Y}_{ij}$ in the equation with its conditional expectation $\hat{y}_{ij}(β)$ evaluated at the regression coefficients $β$. By the law of iterated expectations and Fubini's theorem, we have

\begin{aligned} E\{U_{ij}(β) \mid c_{ij}, θ\} & = \sum_{ω_{ij}} p_{ij}(ω_{ij} \mid c_{ij}, θ) \, E\{U_{ij}(β) \mid ω_{ij}, c_{ij}, θ\} \\ & = \sum_{ω_{ij}} p_{ij}(ω_{ij} \mid c_{ij}, θ) \, M_{ij} \int ω_{ij} (\hat{y}_{ij}(β) - β' X_{ij}) \, dF_t(t \mid ω_{ij}, c_{ij}, θ) \\ & = π_{ij} M_{ij} \int (\hat{y}_{ij}(β) - β' X_{ij}) \, dF_t(t \mid c_{ij}, θ) \\ & = π_{ij} M_{ij} \int \{E(\tilde{Y}_{ij} \mid t, c_{ij}, θ) - β' X_{ij}\} \, dF_t(t \mid c_{ij}, θ) \\ & = π_{ij} M_{ij} \left[\int E(\tilde{Y}_{ij} \mid t, c_{ij}, θ) \, dF_t(t \mid c_{ij}, θ) - β' X_{ij}\right] \\ & = π_{ij} M_{ij} \{E(\tilde{Y}_{ij} \mid c_{ij}, θ) - β' X_{ij}\} = π_{ij} M_{ij} (β' X_{ij} - β' X_{ij}) = 0, \end{aligned}

where $π_{ij} = π(Z_{ij})$,

M_{ij} = ϕ_2^{-1} \sum_{l=1}^{n_i} (X_{il} - \bar{X}) (σ_{\hat{Y}(β)}^{-1})_{il} (\tilde{Q}_i^{(2)})_{lj} (σ_{\hat{Y}(β)}^{-1})_{ij},

and $\tilde{Q}_i^{(2)} = (Q_i^{(2)})^{-1}$. Then

\sum_{i=1}^{K} \sum_{j=1}^{n_i} E\{U_{ij}(β) \mid θ\} = \sum_{i=1}^{K} \sum_{j=1}^{n_i} E[E\{U_{ij}(β) \mid c_{ij}, θ\}] = 0,

and we obtain

\sum_{i=1}^{K} \sum_{j=1}^{n_i} E\{q_{ij}(t_{ij}, c_{ij}, ω_{ij}; θ) \mid θ\} = 0.

Therefore, following the proposition in Rosen et al.,⁵⁴ if the algorithm converges, the final solution $\hat{θ}$ satisfies the unbiased estimating equations $Ψ(\hat{θ}) = H(\hat{θ} \mid \hat{θ}) = 0$. Furthermore, under the regularity conditions assumed at the beginning of the Appendix, the consistency and asymptotic normality of $\hat{θ}$ follow from the result of Huber⁵⁵ as the number of clusters $K \to \infty$, which completes the proof of Theorem 1.
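The Buckley–James step used in the proof replaces a censored log time by an estimate of its conditional expectation, $\hat{y}_{ij}(β) = δ_{ij} Y_{ij} + (1 - δ_{ij})[\int_{ε_{ij}(β)}^{\infty} u \, d\hat{F}_ε(u) / \{1 - \hat{F}_ε(ε_{ij}(β))\} + β' X_{ij}]$, where $ε_{ij}(β) = Y_{ij} - β' X_{ij}$ and $\hat{F}_ε$ is estimated from the residuals. The sketch below assumes the simplest setting (independent observations, no cure fraction, an unweighted Kaplan–Meier estimate of $\hat{F}_ε$, no tied residuals) rather than the estimator used in the proposed method, and, as a further assumption, places any Kaplan–Meier mass left beyond the largest residual at that residual. Function names are hypothetical.

```python
import numpy as np

def km_jumps(resid, delta):
    """Kaplan-Meier probability mass at each sorted residual (no ties assumed)."""
    order = np.argsort(resid, kind="stable")
    r, d = resid[order], delta[order]
    at_risk = r.size - np.arange(r.size)        # number with residual >= r[k]
    surv = np.cumprod(1.0 - d / at_risk)        # KM survival just after r[k]
    prev = np.concatenate(([1.0], surv[:-1]))   # KM survival just before r[k]
    jumps = prev - surv
    jumps[-1] += surv[-1]                       # assumption: leftover tail mass at the largest residual
    return r, jumps

def buckley_james_impute(y, X, delta, beta):
    """delta*y + (1 - delta)*[E_KM(eps | eps > e_i) + x_i'beta], with e_i = y_i - x_i'beta."""
    eps = y - X @ beta
    r, jumps = km_jumps(eps, delta)
    y_hat = y.astype(float)                     # astype returns a copy
    for i in np.flatnonzero(delta == 0):
        tail = r > eps[i]
        mass = jumps[tail].sum()
        if mass > 0:                            # otherwise keep the observed value
            y_hat[i] = np.sum(r[tail] * jumps[tail]) / mass + X[i] @ beta
    return y_hat

# Simulated log survival times with normal errors and independent censoring.
rng = np.random.default_rng(7)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
log_t = X @ beta_true + rng.normal(scale=0.5, size=n)
log_c = rng.normal(loc=1.5, scale=1.0, size=n)
y = np.minimum(log_t, log_c)
delta = (log_t <= log_c).astype(int)

y_hat = buckley_james_impute(y, X, delta, beta_true)
print(f"{np.sum(delta == 0)} censored values imputed; mean increase "
      f"{np.mean(y_hat[delta == 0] - y[delta == 0]):.3f}")
```

Uncensored values are left unchanged, and each censored value is moved upward to the Kaplan–Meier estimate of the conditional mean of its residual tail, shifted back by $β' X_{ij}$.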

Next we give the details of the two components of $Σ$ in the theorem, namely $A(θ)$ and $V(θ)$. Obtaining $V(θ)$ is straightforward: we find $E\{S_i(θ)\} = (E\{U_i'(γ)\}, E\{U_i'(β)\})'$ and calculate $V(θ) = \sum_{i=1}^{K} E\{S_i(θ)\} E\{S_i'(θ)\}$. The matrix $A(θ)$ consists of two components, $E\{B(θ)\}$ and $E\{S(θ) S'(θ)\}$. The first component $E\{B(θ)\}$ can be written as

E\{B(θ)\} = -E\left\{\frac{\partial}{\partial θ} S(θ)\right\} = -E\begin{pmatrix} U_{γγ} & U_{γβ} \\ U_{βγ} & U_{ββ} \end{pmatrix} = -E\begin{pmatrix} U_{γγ} & 0 \\ 0 & U_{ββ} \end{pmatrix}

The $(h, k)$th element ($h, k = 1, 2, \dots, \dim(γ)$) of the first diagonal block, the $\dim(γ) \times \dim(γ)$ matrix $U_{γγ}$, is

\sum_{i=1}^{K} A_i^{(γ)} [B_i^{(γ)} C_i^{(γ)} - D_i^{(γ)} E_i^{(γ)}]

where

\begin{aligned} A_i^{(γ)} & = (Z_{i1h}, Z_{i2h}, \dots, Z_{in_ih})_{1 \times n_i}, \\ B_i^{(γ)} & = \begin{pmatrix} B_{i11}^{(γ)} & \cdots & B_{i1n_i}^{(γ)} \\ \vdots & \ddots & \vdots \\ B_{in_i1}^{(γ)} & \cdots & B_{in_in_i}^{(γ)} \end{pmatrix}_{n_i \times n_i}, \end{aligned}

with

B_{imn}^{(γ)} = \frac{1}{2ϕ_1} \{Z_{imk}(1 - 2π_{im}) - Z_{ink}(1 - 2π_{in})\} [π_{im}(1 - π_{im})]^{1/2} [π_{in}(1 - π_{in})]^{-1/2} (\tilde{Q}_i^{(1)})_{mn}

for $m \neq n$ and $B_{imm}^{(γ)} = 0$ otherwise, $m, n = 1, 2, \dots, n_i$, and $\tilde{Q}_i^{(1)} = (Q_i^{(1)})^{-1}$.

\begin{aligned} C_i^{(γ)} & = (ω_{i1} - π_{i1}, ω_{i2} - π_{i2}, \dots, ω_{in_i} - π_{in_i})', \\ D_i^{(γ)} & = \begin{pmatrix} D_{i11}^{(γ)} & \cdots & D_{i1n_i}^{(γ)} \\ \vdots & \ddots & \vdots \\ D_{in_i1}^{(γ)} & \cdots & D_{in_in_i}^{(γ)} \end{pmatrix}_{n_i \times n_i}, \end{aligned}

where

D_{imn}^{(γ)} = ϕ_1^{-1} [π_{im}(1 - π_{im})]^{1/2} [π_{in}(1 - π_{in})]^{-1/2} (\tilde{Q}_i^{(1)})_{mn}

for $m \neq n$ and $D_{imm}^{(γ)} = (\tilde{Q}_i^{(1)})_{mm}$ otherwise, and

E_i^{(γ)} = \{(Z_{i1k} π_{i1}(1 - π_{i1}), Z_{i2k} π_{i2}(1 - π_{i2}), \dots, Z_{in_ik} π_{in_i}(1 - π_{in_i}))_{1 \times n_i}\}'.

The second diagonal block, the $\dim(β) \times \dim(β)$ matrix $U_{ββ}$, is

\begin{aligned} U_{ββ} & = \frac{\partial}{\partial β} \left\{\sum_{i=1}^{K} (X_i - 1_i \bar{X}')' (B_i^{1/2} Q_i^{(2)} B_i^{1/2} ϕ_2)^{-1} G_i (\hat{Y}_i(β) - X_i β)\right\} \\ & = \sum_{i=1}^{K} (X_i - 1_i \bar{X}')' (B_i^{1/2} Q_i^{(2)} B_i^{1/2} ϕ_2)^{-1} G_i \left\{\frac{\partial}{\partial β} \hat{Y}_i(β) - X_i\right\}, \end{aligned}

where $(\partial / \partial β) \hat{Y}_i(β) = ((\partial / \partial β) \hat{y}_{i1}(β), \dots, (\partial / \partial β) \hat{y}_{in_i}(β))'$ and $(\partial / \partial β) \hat{y}_{ij}(β)$ is the first derivative of $\hat{y}_{ij}(β)$ with respect to $β$ for $i = 1, \dots, K$ and $j = 1, \dots, n_i$. Specifically, writing $ε_{ij}(β) = Y_{ij} - β' X_{ij}$ and treating $\hat{F}_ε$ as fixed,

\begin{aligned} \frac{\partial}{\partial β} \hat{y}_{ij}(β) & = \frac{\partial}{\partial β} \left(δ_{ij} Y_{ij} + (1 - δ_{ij}) \left[\frac{\int_{ε_{ij}(β)}^{\infty} u \, d\hat{F}_ε(u)}{1 - \hat{F}_ε(ε_{ij}(β))} + β' X_{ij}\right]\right) \\ & = (1 - δ_{ij}) \left[\frac{ε_{ij}(β) \hat{f}_ε(ε_{ij}(β)) X_{ij} \hat{S}_ε(ε_{ij}(β))}{\hat{S}_ε^2(ε_{ij}(β))} - \frac{\hat{f}_ε(ε_{ij}(β)) X_{ij} \int_{ε_{ij}(β)}^{\infty} u \, d\hat{F}_ε(u)}{\hat{S}_ε^2(ε_{ij}(β))} + X_{ij}\right] \\ & = (1 - δ_{ij}) \left[\hat{f}_ε(ε_{ij}(β)) X_{ij} \frac{ε_{ij}(β) \hat{S}_ε(ε_{ij}(β)) - \int_{ε_{ij}(β)}^{\infty} u \, d\hat{F}_ε(u)}{\hat{S}_ε^2(ε_{ij}(β))} + X_{ij}\right] \\ & = (1 - δ_{ij}) \left[\hat{h}_ε(ε_{ij}(β)) X_{ij} \left(ε_{ij}(β) - \frac{\int_{ε_{ij}(β)}^{\infty} u \, d\hat{F}_ε(u)}{\hat{S}_ε(ε_{ij}(β))}\right) + X_{ij}\right] \\ & = (1 - δ_{ij}) \left[\hat{h}_ε(ε_{ij}(β)) X_{ij} (Y_{ij} - \hat{y}_{ij}(β)) + X_{ij}\right], \end{aligned}

where $\hat{h}_ε(t) = \hat{f}_ε(t) / \hat{S}_ε(t)$, $\hat{f}_ε(t) = (\partial / \partial t) \hat{F}_ε(t)$, and $\hat{S}_ε(t) = 1 - \hat{F}_ε(t)$, which can be obtained based on (8). The last equality uses $\int_{ε_{ij}(β)}^{\infty} u \, d\hat{F}_ε(u) / \hat{S}_ε(ε_{ij}(β)) = \hat{y}_{ij}(β) - β' X_{ij}$ for a censored observation, so that $ε_{ij}(β) - \{\hat{y}_{ij}(β) - β' X_{ij}\} = Y_{ij} - \hat{y}_{ij}(β)$.
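A quick numerical check of the closed form $(1 - δ_{ij})[\hat{h}_ε(ε_{ij}(β)) X_{ij} (Y_{ij} - \hat{y}_{ij}(β)) + X_{ij}]$ for the derivative of $\hat{y}_{ij}(β)$, with $ε_{ij}(β) = Y_{ij} - β' X_{ij}$, is sketched below. For illustration only, it assumes that $\hat{F}_ε$ is the standard normal distribution, so that $\int_e^{\infty} u \, d\hat{F}_ε(u)$ equals the standard normal density at $e$ and $\hat{y}_{ij}(β)$ for a censored observation reduces to an inverse Mills ratio plus $β' X_{ij}$; the closed form is then compared with a central finite difference. The values of $β$, $X_{ij}$, and $Y_{ij}$ are hypothetical.

```python
import math

def norm_pdf(e):
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)

def norm_sf(e):
    """Standard normal survival function 1 - Phi(e)."""
    return 0.5 * math.erfc(e / math.sqrt(2.0))

def lin_pred(beta, x):
    return sum(b * xk for b, xk in zip(beta, x))

def y_hat_censored(beta, x, y):
    """Imputed log time for a censored observation when the error law is N(0, 1):
    int_e^inf u dF(u) / S(e) + beta'x, where int_e^inf u phi(u) du = phi(e)."""
    e = y - lin_pred(beta, x)
    return norm_pdf(e) / norm_sf(e) + lin_pred(beta, x)

def closed_form_grad(beta, x, y):
    """(1 - delta)[h(e) x (y - y_hat) + x] with delta = 0 and h = pdf / sf."""
    e = y - lin_pred(beta, x)
    h = norm_pdf(e) / norm_sf(e)
    yh = y_hat_censored(beta, x, y)
    return [h * xk * (y - yh) + xk for xk in x]

def finite_diff_grad(beta, x, y, step=1e-6):
    """Central finite-difference gradient of y_hat_censored with respect to beta."""
    grad = []
    for k in range(len(beta)):
        up, dn = list(beta), list(beta)
        up[k] += step
        dn[k] -= step
        grad.append((y_hat_censored(up, x, y) - y_hat_censored(dn, x, y)) / (2.0 * step))
    return grad

beta, x, y = [0.3, -0.8], [1.0, 0.7], 0.4   # hypothetical values
print("closed form:      ", closed_form_grad(beta, x, y))
print("finite difference:", finite_diff_grad(beta, x, y))
```

Because $\hat{F}_ε$ is held fixed, the check verifies the differentiation step itself and says nothing about the dependence of $\hat{F}_ε$ on $β$.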

For the second component of $A(θ)$,

E\{S(θ) S'(θ)\} = E\begin{pmatrix} U_γ U_γ & U_γ U_β \\ U_β U_γ & U_β U_β \end{pmatrix},

where

\begin{aligned} E(U_γ U_γ) & = E\left\{\sum_{i=1}^{K} R_{1i}' V_{1i}^{-1} (ω_i - g_i)\right\}^2 = \sum_{i=1}^{K} R_{1i}' V_{1i}^{-1} E\{(ω_i - g_i)(ω_i - g_i)'\} (V_{1i}^{-1})' R_{1i}, \\ E(U_γ U_β) & = E\left\{\sum_{i=1}^{K} R_{1i}' V_{1i}^{-1} (ω_i - g_i) \sum_{l=1}^{K} R_{2l}' V_{2l}^{-1} (ω_l - g_l)(\hat{Y}_l(β) - X_l β)\right\} \\ & = \sum_{i=1}^{K} R_{1i}' V_{1i}^{-1} E\{(ω_i - g_i)(\hat{Y}_i(β) - X_i β)' (ω_i - g_i)'\} (V_{2i}^{-1})' R_{2i}, \end{aligned}

and

\begin{aligned} E(U_β U_β) & = E\left\{\sum_{i=1}^{K} R_{2i}' V_{2i}^{-1} (ω_i - g_i)(\hat{Y}_i(β) - X_i β)\right\}^2 \\ & = \sum_{i=1}^{K} R_{2i}' V_{2i}^{-1} E\{(ω_i - g_i)(\hat{Y}_i(β) - X_i β)(\hat{Y}_i(β) - X_i β)' (ω_i - g_i)'\} (V_{2i}^{-1})' R_{2i}. \end{aligned}

Here $ω_i = (ω_{i1}, \dots, ω_{in_i})$, $R_{1i} = \partial π(Z_i) / \partial γ$, $V_{1i} = A_i^{1/2} Q_i^{(1)} A_i^{1/2} ϕ_1$, $R_{2i} = X_i - 1_i \bar{X}'$, and $V_{2i} = B_i^{1/2} Q_i^{(2)} B_i^{1/2} ϕ_2$.
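To make the structure of the incidence estimating function $\sum_i R_{1i}' V_{1i}^{-1}(ω_i - g_i)$ concrete, the sketch below solves a logistic GEE by Fisher scoring with working covariance $V_{1i} = A_i^{1/2} Q_i^{(1)} A_i^{1/2} ϕ_1$ and an exchangeable working correlation $Q_i^{(1)}$. It rests on several assumptions made only for illustration: the binary indicators $ω_{ij}$ are treated as fully observed (in the proposed method they are latent and enter through their E-step conditional expectations), $g_i$ is taken to be the vector of logistic means $π_{ij}$, $A_i = \mathrm{diag}\{π_{ij}(1 - π_{ij})\}$, the working correlation parameter is fixed rather than estimated, and $ϕ_1$ is set to 1 because it cancels from the equation. All function names and data settings are hypothetical.

```python
import math
import numpy as np

def exchangeable(n, rho):
    """Exchangeable working correlation; rho = 0 gives the independence structure."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def logistic_gee(Z_list, w_list, rho, tol=1e-10, max_iter=100):
    """Fisher scoring for sum_i R_i' V_i^{-1} (w_i - pi_i) = 0, where
    R_i = A_i Z_i, A_i = diag{pi_ij (1 - pi_ij)} and V_i = A_i^{1/2} Q A_i^{1/2}."""
    gamma = np.zeros(Z_list[0].shape[1])
    for _ in range(max_iter):
        score = np.zeros_like(gamma)
        info = np.zeros((gamma.size, gamma.size))
        for Z, w in zip(Z_list, w_list):
            pi = 1.0 / (1.0 + np.exp(-Z @ gamma))
            a = pi * (1.0 - pi)                       # diagonal of A_i
            R = a[:, None] * Z                        # d pi_i / d gamma'
            sqrt_a = np.sqrt(a)
            V = sqrt_a[:, None] * exchangeable(len(w), rho) * sqrt_a[None, :]
            V_inv_R = np.linalg.solve(V, R)
            score += V_inv_R.T @ (w - pi)             # R_i' V_i^{-1} (w_i - pi_i)
            info += R.T @ V_inv_R                     # R_i' V_i^{-1} R_i
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

# Clustered binary data with logistic margins, generated by thresholding an
# exchangeable Gaussian latent variable (a Gaussian copula).
rng = np.random.default_rng(3)
K, m, r = 500, 4, 0.5
gamma_true = np.array([-0.3, 0.8])
erf = np.vectorize(math.erf)
Z_list, w_list = [], []
for _ in range(K):
    Z = np.column_stack([np.ones(m), rng.normal(size=m)])
    pi = 1.0 / (1.0 + np.exp(-Z @ gamma_true))
    latent = math.sqrt(r) * rng.normal() + math.sqrt(1.0 - r) * rng.normal(size=m)
    u = 0.5 * (1.0 + erf(latent / math.sqrt(2.0)))   # correlated Uniform(0, 1) margins
    Z_list.append(Z)
    w_list.append((u < pi).astype(float))

gamma_ex = logistic_gee(Z_list, w_list, rho=0.3)
gamma_ind = logistic_gee(Z_list, w_list, rho=0.0)
print("exchangeable:", gamma_ex, " independence:", gamma_ind)
```

Setting `rho=0.0` gives the independence working structure; both fits target the same marginal parameters, and the working correlation affects only their efficiency.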

References

1. Taylor JMG. Semi-parametric estimation in failure time mixture models. Biometrics 1995; 51: 899–907.

2. Peng. Cure models: Methods, applications, and implementation. Boca Raton, FL: CRC/Chapman & Hall, 2021.

3. Amico, Van Keilegom. Cure models in survival analysis. Annu Rev Stat Appl 2018; 5: 311–342.

4. McGilchrist, Aisbett. Regression with frailty in survival analysis. Biometrics 1991; 47: 461–466.

5. Peng, Taylor JMG. A marginal regression model for multivariate failure time data with a surviving fraction. Lifetime Data Anal 2007; 13: 351–369.

6. Rubio, Drikvandi. MEGH: A parametric class of general hazard models for clustered survival data. Stat Methods Med Res 2022; 31: 1603–1616.

7. Chiou, Kang, Yan. Semiparametric accelerated failure time modeling for clustered failure times from stratified sampling. J Am Stat Assoc 2015; 110: 621–629.

8. Liang K-Y, Zeger. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13–22.

9. Chiou, Kang, Kim, et al. Marginal semiparametric multivariate accelerated failure time model with generalized estimating equations. Lifetime Data Anal 2014; 20: 599–618.

10. Chiou, Kang, Yan. Fitting accelerated failure time models in routine survival analysis with R package aftgee. J Stat Softw 2014; 61: 1–23.

11. Gray. Optimal weight functions for marginal proportional hazards analysis of clustered failure time data. Lifetime Data Anal 2002; 8: 5–19.

12. Niu, Peng. A new estimating equation approach for marginal hazard ratio estimation. Comput Stat Data Anal 2015; 87: 46–56.

13. Peng. Mixture cure models for multivariate survival data. Comput Stat Data Anal 2008; 52: 1524–1532.

14. Chen C-M, T-FC. Marginal analysis of multivariate failure time data with a surviving fraction based on semiparametric transformation cure models. Comput Stat Data Anal 2012; 56: 645–655.

15. Chen C-M, C-Y. A two-stage estimation in the Clayton–Oakes model with marginal linear transformation models for multivariate failure time data. Lifetime Data Anal 2012; 18: 94–115.

16. Niu, Peng. A semiparametric marginal mixture cure model for clustered survival data. Stat Med 2013; 32: 2364–2373.

17. Niu, Peng. Marginal regression analysis of clustered failure time data with long-term survivors. J Multivar Anal 2014; 123: 129–142.

18. Niu, Song, Liu, et al. Modeling clustered long-term survivors using marginal mixture cure model. Biom J 2018; 60: 780–796.

19. Yau KKW, ASK. Long-term survivor mixture model with random effects: Application to a multi-centre clinical trial of carcinoma. Stat Med 2001; 20: 1591–1607.

20. Lai, Yau KKW. Long-term survivor model with bivariate random effects: Applications to bone marrow transplant and carcinoma study data. Stat Med 2008; 27: 5692–5708.

21. Peng, Taylor JMG. Mixture cure model with random effects for the analysis of a multi-centre tonsil cancer study. Stat Med 2011; 30: 211–223.

22. Xiang, Yau. Mixture cure model with random effects for clustered and interval-censored survival data. Stat Med 2011; 30: 995–1006.

23. Tawiah, Bondell. Multilevel joint frailty model for hierarchically clustered binary and survival data. Stat Med 2023; 42: 3745–3763.

24. Chatterjee, Shih. On use of bivariate survival models with cure fraction. Biometrics 2003; 59: 1184–1185.

25. Wienke, Lichtenstein, Yashin. A bivariate frailty model with a cure fraction for modeling familial correlation in disease. Biometrics 2003; 59: 1178–1183.

26. Lakhal-Chaieb, Duchesne. Association measures for bivariate failure times in the presence of a cure fraction. Lifetime Data Anal 2017; 23: 517–532.

27. C-L, Lin F-C. Analysis of clustered failure time data with cure fraction using copula. Stat Med 2019; 38: 3961–3973.

28. Wei. The accelerated failure time model: A useful alternative to the Cox regression model in survival analysis. Stat Med 1992; 11: 1871–1879.

29. Crowther, Royston, Clements. A flexible parametric accelerated failure time model and the extension to time-dependent acceleration factors. Biostatistics 2023; 24: 811–831.

30. Fulcher, Tchetgen, Williams. Mediation analysis for censored survival data under an accelerated failure time model. Epidemiology 2017; 28: 660–666.

31. VanderWeele. Causal mediation analysis with survival data. Epidemiology 2011; 22: 582–585.

32. Yamaguchi. Accelerated failure-time regression models with a regression model of surviving fraction: An application to the analysis of ‘permanent employment’ in Japan. J Am Stat Assoc 1992; 87: 284–292.

33. Peng, Dear KBG, Denham. A generalized F mixture model for cure rate estimation. Stat Med 1998; 17: 813–830.

34. C-S, Taylor JMG. A semi-parametric accelerated failure time cure model. Stat Med 2002; 21: 3235–3247.

35. Zhang, Peng. A new estimation method for the semiparametric accelerated failure time mixture cure model. Stat Med 2007; 26: 3157–3171.

36. Zhang, Peng. An alternative estimation method for the accelerated failure time frailty model. Comput Stat Data Anal 2007; 51: 4413–4423.

37. Zhang. Multiple imputation method for the semiparametric accelerated failure time mixture cure model. Comput Stat Data Anal 2010; 54: 1808–1816.

38. Efficient estimation for an accelerated failure time model with a cure fraction. Stat Sin 2010; 20: 661–674.

39. Zhang, Peng. A new semiparametric estimation method for accelerated hazards mixture cure model. Comput Stat Data Anal 2013; 59: 95–102.

40. Choi, Zhu, Huang. Semiparametric accelerated failure time cure rate mixture models with competing risks. Stat Med 2018; 37: 48–59.

41. Liu, Xiang. Generalized accelerated hazards mixture cure models with interval-censored data. Comput Stat Data Anal 2021; 161: 107248.

42. Ritov. Estimation in a linear regression model with censored data. Ann Stat 1990; 18: 303–328.

43. Buckley, James. Linear regression with censored data. Biometrika 1979; 66: 429–436.

44. Monaco, Cai, Grizzle. Bootstrap analysis of multivariate failure time data. Stat Med 2005; 24: 3387–3400.

45. Jin, Lin, Ying. On least-squares regression with censored data. Biometrika 2006; 93: 147–161.

46. Emrich, Piedmonte. A method for generating high-dimensional multivariate binary variates. Am Stat 1991; 45: 302–304.

47. Cai, Zou, Peng, et al. smcure: An R-package for estimating semiparametric mixture cure models. Comput Methods Programs Biomed 2012; 108: 1255–1260.

48. Huang, Luo, Liang, et al. Survival nomogram for young breast cancer patients based on the SEER database and an external validation cohort. Ann Surg Oncol 2022; 29: 5772–5781.

49. Maller, Zhou. Testing for the presence of immune or cured individuals in censored survival data. Biometrics 1995; 51: 1197–1205.

50. Kim, et al. Clinical significance of the lymph node ratio in N1 breast cancer. Radiat Oncol J 2017; 35: 227–232.

51. Peng, Taylor JMG. Residual-based model diagnosis methods for mixture cure models. Biometrics 2017; 73: 495–505.

52. Lee, Nelder. Conditional and marginal models: Another view. Stat Sci 2004; 19: 219–238.

53. Kim, Kang. Comparison of variance estimation methods in semiparametric accelerated failure time models for multivariate failure time data. Jpn J Stat Data Sci 2021; 4: 1179–1202.

54. Rosen, Jiang, Tanner. Mixtures of marginal models. Biometrika 2000; 87: 391–404.

55. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In: Le Cam LM and Neyman J (eds) Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 4. Berkeley, CA: University of California Press, 1967, pp.221–233.