
Abstract

Patient-centered outcomes, such as quality of life and length of hospital stay, are the focus in a wide array of clinical studies. However, participants in randomized trials for elderly or critically and severely ill patient populations may have truncated or undefined non-mortality outcomes if they do not survive through the measurement time point. To address truncation by death, the survivor average causal effect has been proposed as a causally interpretable subgroup treatment effect defined under the principal stratification framework. However, the majority of methods for estimating the survivor average causal effect have been developed in the context of individually randomized trials. Only limited discussions have been centered around cluster-randomized trials, where methods typically involve strong distributional assumptions for outcome modeling. In this article, we propose two weighting methods to estimate the survivor average causal effect in cluster-randomized trials that obviate the need for potentially complicated outcome distribution modeling. We establish the requisite assumptions that address latent clustering effects to enable point identification of the survivor average causal effect, and we provide computationally efficient asymptotic variance estimators for each weighting estimator. In simulations, we evaluate our weighting estimators, demonstrating their finite-sample operating characteristics and robustness to certain departures from the identification assumptions. We illustrate our methods using data from a cluster-randomized trial to assess the impact of a sedation protocol on mechanical ventilation among children with acute respiratory failure.

Keywords

Estimands, generalized linear mixed models, principal stratification, principal score, survival score, survivor average causal effect, cluster-randomized trials

1. Introduction

In cluster-randomized trials (CRTs), groups of individuals, such as schools, communities, or health facilities, are randomly assigned to treatments.¹ CRTs are a favorable alternative to individually randomized trials when the intervention must be administered to entire groups, there are concerns about contamination (i.e. individuals can gain access to treatment(s) to which they were not assigned), and/or individual random assignment is logistically challenging.² In CRTs, researchers may be interested in studying the causal effect of a binary treatment on a non-mortal outcome like quality of life. However, in some applications, individual participants may die before the time when their non-mortal outcome would be measured, and death acts as an intercurrent event. Mortality rates can be substantial in CRTs among vulnerable populations such as the critically ill or elderly patients.

Death prior to follow-up is a post-randomization event that precludes the complete measurement of a non-mortal outcome. However, estimation of treatment effects among observed survivors can introduce selection bias when survival status is impacted by treatment because such conditioning disrupts the balance of confounders, measured and unmeasured, afforded by randomization. Bias due to conditioning on post-randomization variables such as death is a well-documented issue in the causal inference literature.^3,4 To address this problem from a causal perspective, Frangakis and Rubin⁵ proposed the principal stratification framework which partitions populations with respect to cross-classified potential outcomes of a binary post-randomization variable. These latent strata are thus unaffected by treatment assignment and can be considered as pre-treatment covariates that define relevant subpopulations. In the context of truncation by death, a common causal estimand of interest is the mean difference in potential outcomes among individuals who would survive under either treatment received (referred to as the always-survivors), named the survivor average causal effect (SACE).^6,7 This is precisely the subpopulation where the pair of non-mortality potential outcomes are well-defined without truncation by death.

For the setting of individually randomized trials, several methods have been proposed to point identify and estimate SACE. For example, Zhang et al.⁸ introduced a mixture model approach for empirical identification of SACE, which relies on correctly specifying the joint distribution of principal strata and non-mortal outcomes conditional on covariates. While likelihood inference leads to efficient estimators under correct model assumptions, accurate specification of the conditional outcome models is usually more challenging than the principal strata model. In addition, empirical identification under the mixture model requires sufficient distance between the mixture components and may be numerically less stable otherwise.⁹ Alternatively, leveraging information from baseline covariates, Hayden et al.¹⁰ and Ding and Lu¹¹ considered different formulations of ignorability assumptions that allow for non-parametric point identification of SACE. These assumptions result in simple weighting methods for these estimands. Employing a similar set of assumptions to Ding and Lu,¹¹ Zehavi and Nevo¹² analogously developed matching estimators for SACE. Despite the simplicity of the weighting methods, they were designed for independent and identically distributed (iid) data and may not be directly applicable to the multilevel data structure present in CRTs. In CRTs, individuals within the same group tend to have positively correlated outcomes, even after conditioning on measured individual-level covariates. Failing to properly account for this correlation may result in underestimated variances and anti-conservative inference.¹³ Moreover, if there are unobserved cluster-level characteristics that simultaneously affect survival statuses and non-mortal outcomes of individuals, the existing SACE estimators based only on individual covariate information for non-parametric identification (as with iid data) would not suffice even under cluster randomization. 
The role of clustering effects on causal identification of the SACE, particularly from potentially unmeasured sources, has not been sufficiently explored for CRTs.

In this article, we survey the application of weighting estimators for addressing truncation by death in CRTs. We pursue the principal stratification framework and tackle the additional complications associated with the hierarchical data structure imposed by cluster randomization. While there are a number of likelihood-based methods that exist for CRTs with truncation by death,^14–16 the more intuitive weighting estimators for SACE have not been fully studied. To dispense with the need to model the outcome distributions, we provide two weighting estimators derived under different sets of identification assumptions inspired by Hayden et al.,¹⁰ Ding and Lu,¹¹ and Zehavi and Nevo.¹² While the estimators are distinct, they turn out to be functions of the exact same working models to predict survival status, and we provide a conceptual and numerical comparison between these estimators. Furthermore, for each estimator, we consider both conditional and marginal logistic regression paradigms for modeling survival status as they are commonly used in analyses of CRTs.¹ To accomplish this, we solve for the SACE estimators and construct their sandwich variance expressions using cluster-indexed estimating equations per the M-estimation method^17–19 that are specific to each modeling paradigm.

We have organized the remaining sections as follows. In Section 2, we define notation, provide two sets of identification assumptions, and present the corresponding weighting estimators for SACE in CRTs, along with their variance estimators. In Section 3, we conduct a simulation study to empirically assess the performance of our estimators under different data generating mechanisms which adhere to each set of assumptions. Moreover, we assess the necessity of explicitly modeling for latent group-level effects by comparing the inferential performance of the conditional to the marginal model for survival status in the SACE estimators. Although the proposed marginal model does not incorporate these effects directly, the cluster-indexed M-estimation results in a sandwich variance estimator that aims to correct for this in inference. We also directly compare the computing time and coverage of our proposed asymptotic confidence intervals against bootstrap confidence intervals. In Section 4, we apply our methods to SACE estimation in a CRT that evaluates the impact of a sedation protocol on mechanical ventilation duration among children with acute respiratory failure. In the data application, we consider the analysis of a duration outcome defined within a specific time frame and ascribe to it no parametric assumptions. However, we leave a full time-to-event analysis, which includes addressing censoring in causal assumptions as well as estimation, for future development of the survival framework for SACE in CRTs. Section 5 concludes with a discussion. To facilitate implementation of the proposed methods, we provide a companion R package, PtSaceCrts, for implementing our weighting estimators in CRTs, available at https://github.com/abcdane1/PtSaceCrts.

2. Methods

2.1. Notation, survivor average causal effect estimand, and assumptions

We use the following notation for CRTs. Let $n_{c}$ denote the number of clusters in a study. The subscript $i$ refers to the $i$-th cluster such that $i = 1, \dots, n_{c}$. Let $n_{i} < \infty$ be the size of cluster $i$, and $A_{i} \in \{0, 1\}$ be the binary treatment assigned to cluster $i$. The double subscript “$i j$” refers to the $j$-th individual, $j = 1, \dots, n_{i}$, in the $i$-th cluster. In the observed data, let $S_{i j} \in \{0, 1\}$ be individual survival status before the measurement of the non-mortal outcome, and let $Y_{i j}$ be the non-mortal outcome. Under the potential outcomes framework, we let $S_{i j} (a)$ and $Y_{i j} (a)$ be the potential values of the survival status and non-mortal outcome, respectively, of individual $j$ in cluster $i$ had cluster $i$ been assigned to $A_{i} = a$. Under truncation by death, potential outcomes are connected to observed outcomes under received treatment assignment through the cluster-level SUTVA (the stable unit treatment value assumption), where $S_{i j} = A_{i} S_{i j} (1) + (1 - A_{i}) S_{i j} (0)$ and if $S_{i j} (a) = 1$, then $Y_{i j} = A_{i} Y_{i j} (1) + (1 - A_{i}) Y_{i j} (0)$. If $S_{i j} (a) = 0$, then we denote $Y_{i j} (a) = *$ since it is not completely measured following established notation for SACE in the independent data setting.^7,8 In addition, we define $S_{i} (a) = (S_{i 1} (a), \dots, S_{i n_{i}} (a))^{T}$, and $Y_{i} (a) = (Y_{i 1} (a), \dots, Y_{i n_{i}} (a))^{T}$.

Under the principal stratification framework,⁵ we denote the principal strata in terms of the joint variables, $G_{i j} = (S_{i j} (1), S_{i j} (0)) \in \{0, 1\} \times \{0, 1\}$. These strata partition the population into subgroups defined by potential post-randomization events (sometimes referred to as intercurrent events), and since they are no longer functions of $A_{i} = a$, stratum membership can be viewed as a pre-treatment covariate. The strata are always-survivors when $G_{i j} = (1, 1)$, protected patients when $G_{i j} = (1, 0)$, harmed patients when $G_{i j} = (0, 1)$, and never-survivors when $G_{i j} = (0, 0)$. For studying the treatment effect on the non-mortality outcome, we define the estimand within the sub-population of always-survivors. We restrict the estimand to this stratum because it is the only subgroup for which $Y_{i j} (1)$ and $Y_{i j} (0)$ are both unambiguously defined (or equivalently without impact of the intercurrent event of death). These effects are represented as functions $g (μ (1), μ (0))$ where for $a = 0, 1$,

$$μ (a) = E \{Y_{i j} (a) | G_{i j} = (1, 1)\} = \frac{E \{Y_{i j} (a) S_{i j} (1) S_{i j} (0)\}}{E \{S_{i j} (1) S_{i j} (0)\}} \tag{1}$$

Possible functions $g (μ (1), μ (0))$ may be the mean difference or risk ratio. We primarily focus on the SACE expressed in mean difference (although the methods apply directly to other summary measures) defined as

$$τ = μ (1) - μ (0) = E \{Y_{i j} (1) - Y_{i j} (0) | G_{i j} = (1, 1)\} \tag{2}$$

This estimand represents the expected difference in response had an individual received the active treatment as compared to the control given their membership in the sub-population of always-survivors—that is, a subset of healthier patients for whom the definition of non-mortality potential outcomes is not affected by the intercurrent event of death.
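To make the estimand concrete, the following minimal sketch labels the four principal strata and averages $Y_{i j} (1) - Y_{i j} (0)$ over always-survivors. It assumes a hypothetical population in which both potential outcomes are known for every individual, which never happens in real data; the function names and toy values are illustrative assumptions and do not come from the article or its R package.

```python
from statistics import mean


def principal_stratum(s1, s0):
    """Map the pair (S(1), S(0)) to the stratum name used in the text."""
    return {
        (1, 1): "always-survivor",
        (1, 0): "protected",
        (0, 1): "harmed",
        (0, 0): "never-survivor",
    }[(s1, s0)]


def sace_from_potential_outcomes(units):
    """Mean of Y(1) - Y(0) among always-survivors, as in equation (2).

    `units` is a list of dicts with keys s1, s0, y1, y0. A potential
    outcome is None (the '*' of the text) when the matching survival
    indicator is 0, so it is never used in the average.
    """
    diffs = [
        u["y1"] - u["y0"]
        for u in units
        if (u["s1"], u["s0"]) == (1, 1)
    ]
    return mean(diffs)


# Hypothetical toy population; values are invented for illustration only.
toy_population = [
    {"s1": 1, "s0": 1, "y1": 5.0, "y0": 3.0},    # always-survivor
    {"s1": 1, "s0": 1, "y1": 4.0, "y0": 4.0},    # always-survivor
    {"s1": 1, "s0": 0, "y1": 6.0, "y0": None},   # protected
    {"s1": 0, "s0": 0, "y1": None, "y0": None},  # never-survivor
]
tau_toy = sace_from_potential_outcomes(toy_population)
```

Only the first two individuals contribute to `tau_toy`; the protected patient has a defined $Y_{i j} (1)$ but is excluded because $Y_{i j} (0)$ is truncated by death.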

Because stratum membership is only partially observed, the estimand $τ$ cannot be directly identified without additional assumptions. We consider assumptions that require leveraging information from observed baseline covariates. Let the variables $X_{i j}$ denote vectors of individual-level covariates such that $X_{i}$ is the covariate matrix for all individuals in cluster $i$, and let the variables $C_{i}$ denote vectors of cluster-level covariates. In addition to the cluster-level SUTVA, we require that, for all $i, j$ and $a = 0, 1$, $0 < P (S_{i j} (a) = 1 | X_{i}, C_{i}) < 1$. In other words, every individual within every cluster has some possibility of either survival or mortality under both treatments. The remaining assumptions are as follows:

Assumption 1 (Between-cluster independence)

The potential outcomes and cluster-level baseline variables $\{S_{i} (1), S_{i} (0), Y_{i} (1), Y_{i} (0), X_{i}, C_{i}\}$ are independent across clusters and drawn from a common distribution for a given cluster size.

Assumption 2 (Randomization)

Treatment assignment is independent of all cluster-level variables such that $A_{i} ⊥ \{S_{i} (1), S_{i} (0), Y_{i} (1), Y_{i} (0), X_{i}, C_{i}\}$, and $P (A_{i} = 1) = p \in (0, 1)$.

Assumption 3 (Non-informative cluster-size)

Within each cluster $i$, the vectors of individual-level potential outcomes $\{S_{i j} (1), S_{i j} (0), Y_{i j} (1), Y_{i j} (0)\}$ have the same marginal distribution, which is independent of the cluster size $n_{i}$.

Assumptions 1 and 2 are standard assumptions for CRTs. Assumption 3 prevents the size of each cluster from possibly affecting individuals’ survival status and non-mortal outcomes, a phenomenon referred to as informative cluster size.²⁰ This assumption thus allows us to define $μ (a)$ as a ratio of marginal expectations of individual-level potential outcomes without ambiguity. We provide a further discussion of informative cluster size in Section 5. In addition to the aforementioned assumptions, we need one of the following two additional sets of assumptions to identify SACE in CRTs. Set 1 extends the explainable non-random survival assumptions of Hayden et al.,¹⁰ and Set 2 refers to the Ding and Lu¹¹ and Zehavi and Nevo¹² assumptions based on the survival monotonicity and principal ignorability. Both Sets 1 and 2 rely on cross-world assumptions,²¹ which make claims about survival and/or non-mortal outcome distributions across different interventions. The Set 1 assumptions are defined as follows.

Set 1 Assumption 4 (S1A4): (Conditional Survival Independence). For each individual in a cluster, $S_{i j} (a) ⊥ S_{i j} (1 - a) | X_{i}, C_{i}$ .

Set 1 Assumption 5 (S1A5): (Strong Partial Principal Ignorability). For each individual in a cluster and for $a = 0, 1$, $Y_{i j} (a) ⊥ S_{i j} (1 - a) | X_{i}, C_{i}, \{S_{i j} (a) = 1\}$.

Assumption S1A4 means that an individual’s survival status under one treatment is independent of their survival status under the other treatment given observed information within each cluster. In other words, a patient’s survival under one treatment provides no additional knowledge about their survival under the other treatment once a sufficient set of individual-level covariates (such as age, and other baseline clinical characteristics) and cluster-level covariates (geographical location such as urban or non-urban) is controlled for. Similarly, the conditional independence statement of S1A5 signifies that an individual’s non-mortal outcome under one treatment, contingent on surviving to have this recorded, is not informed by their survival status under the other treatment given a set of measured baseline characteristics within a cluster. To parallel the Set 1 assumptions, the Set 2 assumptions are defined as follows.

Set 2 Assumption 4 (S2A4): (Survival Monotonicity). For each individual in a cluster, $S_{i j} (1) \geq S_{i j} (0)$ such that there is no harmed patients stratum in the study population.

Set 2 Assumption 5 (S2A5): (Partial Principal Ignorability). For each individual in a cluster, $Y_{i j} (1) ⊥ S_{i j} (0) | X_{i}, C_{i}, \{S_{i j} (1) = 1\}$.

Assumption S2A4 says that survival under the active treatment is no worse than survival under the control, leaving only three possible principal strata in the study population. S2A5 is a weaker version of S1A5 in that the relationship described need only hold under the active treatment. In their initial conception, Ding and Lu¹¹ provided the stronger assumption of generalized principal ignorability, whose equivalent here would be $Y_{i j} (a) ⊥ \{S_{i j} (1), S_{i j} (0)\} | X_{i}, C_{i}$ for $a = 0, 1$. The purpose in that more general context is to allow for the identification of estimands defined in additional principal strata, for example, in the presence of treatment noncompliance as an intercurrent event. Under monotonicity, however, it is sufficient to consider S2A5 for identification of the SACE, as pointed out previously.¹²

As we explain later, each set of assumptions motivates a distinct weighting estimator to identify SACE in CRTs. Given their seeming similarity to one another, it is important to conceptually distinguish between the Set 1 and Set 2 assumptions. We visualize their relationships in Figure 1, which is meant to demonstrate two salient features. First, S2A4 and S2A5 imply S1A5. For the case of $a = 1$, S2A5 directly implies S1A5, and for the case of $a = 0$, survival monotonicity, S2A4, means the event $S_{i j} (0) = 1$ implies $S_{i j} (1) = 1$ (i.e. a constant), which induces the stated conditional independence.¹² Second, the Set 1 and Set 2 assumptions are mutually exclusive due to the relationship between S1A4 and S2A4. If S1A4 and S2A4 are simultaneously true, we would have for each individual and $X_{i}, C_{i}$, $P (S_{i j} (1) = 0, S_{i j} (0) = 1 | X_{i}, C_{i}) = P (S_{i j} (1) = 0 | X_{i}, C_{i}) P (S_{i j} (0) = 1 | X_{i}, C_{i}) = 0$, which is prohibited by the assumption of non-trivial survival, $0 < P (S_{i j} (a) = 1 | X_{i}, C_{i}) < 1$, under $a = 0, 1$. In summary, these two sets of assumptions live disjointly within the condition S1A5.

Figure 1.

Diagram of relationship between Set 1 and Set 2 assumptions for identification of SACE. Each assumption set is satisfied within the intersection of their respective conditions (where they are both preceded by Assumptions 1 to 3, non-trivial survival, and SUTVA). S1A4 $=$ conditional survival independence, S1A5 $=$ strong partial principal ignorability, S2A4 $=$ survival monotonicity, and S2A5 $=$ partial principal ignorability. SACE: survivor average causal effect; SUTVA: stable unit treatment value assumption.

The above framework puts forth the necessary causal assumptions for non-parametric identification supposing that we have a sufficient set of observed individual-level and cluster-level characteristics. However, in reality, even after conditioning on a rich set of baseline covariates, individuals’ outcomes within the same cluster tend to be positively correlated. If the source of this correlation is not accounted for in the assumptions, then the conditional independence statements of Set 1 (S1A4 and S1A5) and of Set 2 (S2A5), which hold across all individuals, may not be sufficiently strong to enable identification. To accommodate this complexity, we extend the above assumptions by having $C_{i}$ (wherever it appears) be comprised of two components, $C_{i} = (C_{i, fe}^{T}, b_{i})^{T}$ where $C_{i, fe}$ denotes the component of $C_{i}$ whose values can be observed and $b_{i}$ , a parametric random intercept for cluster, which induces the within-cluster correlation. The caveat of this extension is that identification requires specification of the distributions of the latent variables $b_{i}$ and restricts the form of unmeasured confounding between survival and the final outcome. Similar latent cluster-level confounding assumptions have also been proposed in earlier work on clustered observational studies without truncation by death.^22,23 Specifically, as described in the next sections, we make a Gaussian specification for the $b_{i}$ enabling parametric survival status modeling according to conventional mixed effects logistic regression for binary survival status. Nevertheless, we also evaluate, using simulations, the consequence of ignoring the cluster-level random intercept for survival status modeling to generate practical recommendations.

2.2. Two identification formulas

We next present Identity 1 and Identity 2, which represent how SACE is point identified with estimable parameters with intuitive weighting under the Set 1 and Set 2 assumptions respectively. Importantly, for the two SACE quantities, any appearance of the non-mortal outcome will be predicated on survival (due to a product term). Interestingly, while different in form, they will both depend on the exact same models for survival status.

Identity 1. The parameters $μ (a)$ for $a = 0, 1$ under Set 1 assumptions are identified with:

$$μ (a) = \frac{E \{Y_{i j} I (A_{i} = a) S_{i j} p_{i j}^{1 - a} (X_{i}, C_{i})\}}{E \{I (A_{i} = a) S_{i j} p_{i j}^{1 - a} (X_{i}, C_{i})\}} \tag{3}$$

where $p_{i j}^{A_{i}} (X_{i}, C_{i}) := P (S_{i j} = 1 | A_{i}, X_{i}, C_{i})$. The weight $I (A_{i} = a) S_{i j} p_{i j}^{1 - a} (X_{i}, C_{i})$ means that the observed survivors within one treatment group are up-weighted by their probability of survival under the alternative treatment, which in turn signifies that individuals whose probability of being an always-survivor is higher are weighted more heavily. In the context of a CRT, the proof of Identity 1 is provided in Supplemental Material 1.1.
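The sample analogue of Identity 1 can be sketched as a weighted mean over observed survivors in arm $a$, with each survivor weighted by a predicted survival probability under the opposite arm. This is a minimal illustration, not the interface of the companion R package: the record layout (keys `A`, `S`, `Y`, `p0`, `p1`) and the toy numbers are assumptions made here, and the probabilities are supplied directly rather than fitted.

```python
def mu_ssw(a, records):
    """Sample version of equation (3) for arm a (0 or 1).

    Each record is a dict with keys:
      A      : cluster treatment assignment (0 or 1)
      S      : observed survival indicator (0 or 1)
      Y      : observed outcome (None when S == 0)
      p0, p1 : predicted survival probabilities under a = 0 and a = 1
    """
    num = 0.0
    den = 0.0
    for r in records:
        if r["A"] != a or r["S"] != 1:
            continue  # the factor I(A = a) * S is zero for this individual
        w = r["p1"] if a == 0 else r["p0"]  # survival probability under 1 - a
        num += w * r["Y"]
        den += w
    return num / den


# Hypothetical records; every value is invented for illustration.
toy_records = [
    {"A": 1, "S": 1, "Y": 10.0, "p0": 0.25, "p1": 0.50},
    {"A": 1, "S": 1, "Y": 20.0, "p0": 0.75, "p1": 0.90},
    {"A": 1, "S": 0, "Y": None, "p0": 0.10, "p1": 0.30},
    {"A": 0, "S": 1, "Y": 8.0, "p0": 0.50, "p1": 0.50},
    {"A": 0, "S": 1, "Y": 12.0, "p0": 0.60, "p1": 0.50},
]
tau_ssw_toy = mu_ssw(1, toy_records) - mu_ssw(0, toy_records)
```

In the toy data the treated survivor with the larger `p0` dominates the treated-arm mean, which reflects the statement that individuals more likely to be always-survivors receive more weight.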

Identity 2. The parameters $μ (a)$ under Set 2 assumptions are identified with:

$$μ (0) = \frac{E \{Y_{i j} (1 - A_{i}) S_{i j}\}}{E \{(1 - A_{i}) S_{i j}\}} \tag{4}$$

and

$$μ (1) = \frac{E \{Y_{i j} A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}) / p_{i j}^{1} (X_{i}, C_{i})\}}{E \{A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}) / p_{i j}^{1} (X_{i}, C_{i})\}} \tag{5}$$

By cluster-level SUTVA, S2A4, and $A_{i} \in \{0, 1\}$, we have $(1 - A_{i}) S_{i j} = (1 - A_{i})^{2} S_{i j} (0) = (1 - A_{i}) S_{i j} (0) S_{i j} (1)$. Therefore, the weights of $μ (0)$ are straightforward because those who are observed to have survived treatment 0 are always-survivors. The Set 2 assumptions also imply that $p_{i j}^{0} (X_{i}, C_{i}) = P (G_{i j} = (1, 1) | X_{i}, C_{i})$. Since $p_{i j}^{1} (X_{i}, C_{i}) = P (G_{i j} = (1, 1) | X_{i}, C_{i}) + P (G_{i j} = (1, 0) | X_{i}, C_{i})$, the terms $A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}) / p_{i j}^{1} (X_{i}, C_{i})$ in $μ (1)$ re-weight the observed survivors in the treated group, which are comprised of both always-survivors and protected patients, to reflect the population of always-survivors. In the context of a CRT, the proof of Identity 2 is presented in Supplemental Material 1.2.
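A corresponding sketch for Identity 2 replaces the expectations in equations (4) and (5) with sample sums: control survivors get unit weight, and treated survivors get the ratio of predicted survival probabilities $p_{i j}^{0} / p_{i j}^{1}$. As before, the record layout and the toy numbers are assumptions introduced for illustration, and the probabilities are taken as given.

```python
def mu_psw(a, records):
    """Sample version of equations (4) (a = 0) and (5) (a = 1).

    Each record is a dict with keys A, S, Y, p0, p1, where p0 and p1 are
    predicted survival probabilities under control and active treatment.
    """
    num = 0.0
    den = 0.0
    for r in records:
        if r["A"] != a or r["S"] != 1:
            continue  # only observed survivors in arm a contribute
        # Under monotonicity, control survivors are always-survivors (weight 1);
        # treated survivors are down-weighted by P(always-survivor | survive under 1).
        w = r["p0"] / r["p1"] if a == 1 else 1.0
        num += w * r["Y"]
        den += w
    return num / den


# Hypothetical records with p0 <= p1, in line with survival monotonicity.
toy_records_psw = [
    {"A": 1, "S": 1, "Y": 10.0, "p0": 0.4, "p1": 0.8},
    {"A": 1, "S": 1, "Y": 20.0, "p0": 0.6, "p1": 0.8},
    {"A": 1, "S": 0, "Y": None, "p0": 0.2, "p1": 0.5},
    {"A": 0, "S": 1, "Y": 8.0, "p0": 0.5, "p1": 0.7},
    {"A": 0, "S": 1, "Y": 12.0, "p0": 0.5, "p1": 0.9},
    {"A": 0, "S": 0, "Y": None, "p0": 0.3, "p1": 0.6},
]
tau_psw_toy = mu_psw(1, toy_records_psw) - mu_psw(0, toy_records_psw)
```

With these numbers the treated-arm weights are 0.5 and 0.75, so the treated mean is about 16 and the control mean is 10.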

It is critical to emphasize that the quantities $μ (a)$ are expectations conditioned on $G_{i j} = (1, 1)$, so at the outset it may seem as though the identified estimands would necessarily rely on models for $P (G_{i j} = g | X_{i}, C_{i})$ with $g \in \{0, 1\} \times \{0, 1\}$, referred to as “principal score models.”¹¹ However, both the Set 1 and Set 2 assumptions, which contain cross-world conditions, allow for identification of $μ (a)$ by separating the survival statuses and the non-mortal outcomes across treatments. As a result, we see that identification leads to products of survival status terms (as in equations (3) and (5) above): one term that is observed and one that needs to be predicted given covariates and random effects under a given treatment. In contrast, the full likelihood-based approaches, while avoiding these assumptions, require targeting the principal score models. From a practical point of view, modeling binary survival statuses is generally more straightforward than simultaneously modeling at least three principal strata.

2.3. Model specifications and weighting estimators

For our working survival modeling, we first consider that the data, $\{A_{i}, S_{i}, C_{i}, X_{i}\}$, arise from each cluster independently. Given our beliefs about the mechanism of clustering effects, we will suppose that within each cluster $i$, an individual’s survival status is drawn according to $S_{i j} | A_{i}, X_{i}, C_{i}; γ \sim \mathrm{Bern} (\mathrm{expit} \{D_{i j}^{T} β + b_{i}\})$ where $b_{i} \sim N (0, σ_{b}^{2})$ with $σ_{b}^{2} > 0$, and $D_{i j}$ is a finite vector of observed regressors involving $A_{i}, X_{i}, C_{i, fe}$. In practice, we often make a convenience choice that $D_{i j} = (1, A_{i}, X_{i j}^{T}, C_{i, fe}^{T})^{T}$ such that the survival of each individual only depends on the characteristics of that individual but not those from other individuals in the same cluster (apart from shared cluster-level information). Therefore, for $a = 0, 1$, each individual’s probability of survival is represented as follows:

$$p_{i j}^{a} (X_{i}, C_{i}; γ) = \frac{\exp \{D_{i j, a}^{T} β + b_{i}\}}{1 + \exp \{D_{i j, a}^{T} β + b_{i}\}} \tag{6}$$

where $D_{i j, a}$ represents setting $A_{i} = a$. The effect of treatment and observable baseline covariates on survival is captured in $β$, and the unobserved effects of clustering on survival are represented via the random intercept $b_{i}$ with variance $σ_{b}^{2}$, so that $γ = (β^{T}, σ_{b}^{2})^{T}$ is of finite dimension. Although we do not consider this possibility, $D_{i j}$ can also include treatment-by-covariate interactions as well as summary measures of individual-level covariates through $g (X_{i})$ (e.g. ${\bar{X}}_{i}$) as in a contextual-effects model.²⁴
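As a small illustration of equation (6), the sketch below evaluates the survival probability under a chosen arm by setting the treatment entry of $D_{i j}$ to $a$ and applying the inverse logit. The function names, the coefficient values, and the default `b_i = 0.0` are assumptions for illustration; how $β$ and $b_{i}$ are obtained from a fitted model is outside the scope of this sketch.

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def survival_prob(a, x_ij, c_fe, beta, b_i=0.0):
    """Equation (6) with the convenience choice D_ij = (1, A_i, X_ij, C_i,fe).

    a    : arm at which the probability is evaluated (0 or 1)
    x_ij : list of individual-level covariates
    c_fe : list of observed cluster-level covariates
    beta : coefficients ordered as (intercept, treatment, x_ij..., c_fe...)
    b_i  : cluster random intercept; 0.0 simply omits it
    """
    d = [1.0, float(a), *x_ij, *c_fe]
    if len(d) != len(beta):
        raise ValueError("beta must have one coefficient per entry of D_ij")
    eta = sum(dk * bk for dk, bk in zip(d, beta)) + b_i
    return expit(eta)


# Hypothetical coefficients and covariates, invented for illustration.
beta_toy = [0.0, 1.0, 0.5, -0.5]
p0_toy = survival_prob(0, [2.0], [1.0], beta_toy, b_i=0.3)
p1_toy = survival_prob(1, [2.0], [1.0], beta_toy, b_i=0.3)
```

Evaluating the same individual at both `a = 0` and `a = 1` yields the pair of predictions that enter the weights in Identities 1 and 2.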

Two common parametric approaches for analyzing survival status data from parallel-arm CRTs—that we will use for survival status modeling for our SACE estimators—are to (i) fit a GLMM, specifically a mixed effects logistic regression model with a random intercept for cluster membership and (ii) fit a GLM, specifically a logistic regression model (accounting for clustering only in the robust sandwich variance expression) where both models must include a treatment effect indicator.¹³ Since the GLMM relies on modeling the distribution of survival status conditional on the random effects (as well as any observed covariates), it is referred to as a conditional model. In contrast, the GLM, for which no random effects are modeled, is referred to as a marginal model under working independence.

We will denote the estimators derived from Set 1 and Set 2, respectively, as the survival score weighting (SSW) estimator, ${\hat{τ}}_{SSW} = {\hat{μ}}_{SSW} (1) - {\hat{μ}}_{SSW} (0)$ , and the principal score weighting (PSW) estimator, ${\hat{τ}}_{PSW} = {\hat{μ}}_{PSW} (1) - {\hat{μ}}_{PSW} (0)$ . Their forms are summarized in Table 1. Although neither estimator fits principal scores directly, Set 2 uses that under monotonicity, the always-survivors principal stratum is comprised of exactly those that survive under treatment 0, whereas Set 1 relies on S1A4 breaking up the principal strata distributions (hence the naming convention). As discussed in Supplemental Material 1.3 and 1.4, ${\hat{τ}}_{SSW}$ and ${\hat{τ}}_{PSW}$ are consistent for SACE if ${\hat{θ}}_{SSW} = ({\hat{γ}}^{T}, {\hat{μ}}_{SSW} (1), {\hat{μ}}_{SSW} (0))^{T}$ and ${\hat{θ}}_{PSW} = ({\hat{γ}}^{T}, {\hat{μ}}_{PSW} (1), {\hat{μ}}_{PSW} (0))^{T}$ are a solution to unbiased estimating equations indexed by cluster, and if under the regularity conditions for which these ${\hat{θ}}_{(\cdot)} \overset{p}{\to} θ_{0} = (γ_{0}^{T}, μ (1), μ (0))^{T}$ , we have correct specification of the working survival modeling such that $γ_{0}$ reflects the truth.¹⁹ With latent cluster-level confounding, to achieve consistency, we would in principle fit the GLMM model (i) as it directly controls for $b_{i}$ as a random intercept.

Table 1.

A side-by-side comparison of two weighting estimators for SACE ( $τ$ ).

Name (assumptions)	Expressions for estimator
${\hat{τ}}_{SSW}$ (Set 1)	$\frac{\sum_{i = 1}^{n_{c}} \sum_{j = 1}^{n_{i}} Y_{i j} A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}; \hat{γ})}{\sum_{i = 1}^{n_{c}} \sum_{j = 1}^{n_{i}} A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}; \hat{γ})} - \frac{\sum_{i = 1}^{n_{c}} \sum_{j = 1}^{n_{i}} Y_{i j} (1 - A_{i}) S_{i j} p_{i j}^{1} (X_{i}, C_{i}; \hat{γ})}{\sum_{i = 1}^{n_{c}} \sum_{j = 1}^{n_{i}} (1 - A_{i}) S_{i j} p_{i j}^{1} (X_{i}, C_{i}; \hat{γ})}$
${\hat{τ}}_{PSW}$ (Set 2)	$\frac{\sum_{i = 1}^{n_{c}} \sum_{j = 1}^{n_{i}} \frac{Y_{i j} A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}; \hat{γ})}{p_{i j}^{1} (X_{i}, C_{i}; \hat{γ})}}{\sum_{i = 1}^{n_{c}} \sum_{j = 1}^{n_{i}} \frac{A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}; \hat{γ})}{p_{i j}^{1} (X_{i}, C_{i}; \hat{γ})}} - \frac{\sum_{i = 1}^{n_{c}} \sum_{j = 1}^{n_{i}} Y_{i j} (1 - A_{i}) S_{i j}}{\sum_{i = 1}^{n_{c}} \sum_{j = 1}^{n_{i}} (1 - A_{i}) S_{i j}}$

SACE: survivor average causal effect; SSW: survival score weighting; PSW: principal score weighting.

According to the proposed identification assumptions, the marginal GLM model (ii), which assumes independence given observed covariates, is inherently mis-specified for predicting survival status. Only under the setting for which we are able to assume that there are absolutely no unobserved sources of within-cluster correlation of potential outcomes might this model be appropriate. Nonetheless, because cluster-indexed unbiased estimating equations are used to derive the GLM model parameters and SACE estimators, the asymptotic distribution of the SACE estimators has a sandwich variance, as demonstrated in Section 2.4 and in Supplemental Material 1.6, that can account for extra variation in the observed outcomes induced by clustering effects. Since the parameters of the survival status models themselves do not need to be interpreted, we juxtapose the small-sample performances of these two common modeling paradigms in the simulation study (Section 3) under data generation that explicitly includes a cluster-level random intercept as per the working model assumptions. Therefore, the marginal modeling strategy is considered in our simulations to elucidate the consequence of ignoring the latent cluster-level confounding under realistic levels of clustering observed in CRTs where intracluster correlation coefficients (ICCs) typically do not exceed 0.3.²⁵

2.4. Variance estimation

In this section, we approximate the variance of the weighting estimators, ${\hat{τ}}_{SSW}$ and ${\hat{τ}}_{PSW}$, by establishing their asymptotic distributions through the theory of M-estimation.^17–19 Our motivation for using asymptotic variance expressions stems from the limitations of the cluster bootstrap despite its conceptual simplicity. Specifically, cluster bootstrapping generally requires re-sampling entire clusters with replacement from the original data set to preserve the unknown within-cluster correlation structure, so it tends to work better for a large number of clusters.²⁶ However, for even a moderate number of clusters, the computational time for the bootstrap approach with an acceptable number of replicates can be prohibitive since it requires repeatedly fitting regression models (such as generalized linear mixed models (GLMMs)) to each bootstrapped data set. Our proposed variance estimators therefore serve as computationally efficient alternatives to re-sampling-based methods.

Recall we suppose that the available data $\{D_{i}, S_{i}, Y_{i}\}_{i = 1}^{n_{c}}$ are mutually independent by cluster randomization (where the observable elements of $Y_{i}$ are for observed survivors). Suppose that each $D_{i j}$ and thus $β$ have dimension $p$. Our focus is on the space of parameters of the form $θ = (γ^{T}, μ (1), μ (0))^{T} = (β^{T}, σ_{b}^{2}, μ (1), μ (0))^{T}$ of dimension $(p + 3) \times 1$ since our target estimand $τ$ is simply the linear combination $k^{T} θ$, where $k = (0^{T}, 0, 1, - 1)^{T}$. For each $i$, we define a function of observable data, $m (D_{i}, S_{i}, Y_{i}; θ) = m_{i} (θ)$, which matches the dimension of $θ$, such that $E_{θ} \{m_{i} (θ)\} = 0$ (for any $θ$). Then, for the included number of clusters in the study $n_{c}$, the solution to $\frac{1}{n_{c}} \sum_{i = 1}^{n_{c}} m_{i} (θ) = 0$ is the M-estimator, $\hat{θ}$. However, since $μ (1)$ and $μ (0)$ are identified differently under the Set 1 and Set 2 assumptions, the corresponding unbiased estimating equations must themselves be distinct, with functions $m_{SSW, i} (θ)$ and $m_{PSW, i} (θ)$ and solutions ${\hat{θ}}_{SSW}$ and ${\hat{θ}}_{PSW}$.

To obtain the unbiased estimating equations, we stack the score functions used to maximize the (marginalized) log-likelihood induced by the GLMM conditions, $l (S | D, β, σ_{b}^{2}) = \sum_{i = 1}^{n_{c}} l_{i} (S_{i} | D_{i}, β, σ_{b}^{2})$ (see Supplemental Material 1.4), which will be common to both the SSW and PSW estimators, and the functions that enable solving for ${\hat{μ}}_{(\cdot)} (1)$ and ${\hat{μ}}_{(\cdot)} (0)$ if the GLMM parameters, $γ$ , were known. The form of estimating functions corresponding to the Set 1 assumptions is

m_{SSW, i} (θ) = (\begin{matrix} \partial l_{i} (S_{i} | D_{i}, β, σ_{b}^{2}) / \partial β \\ \partial l_{i} (S_{i} | D_{i}, β, σ_{b}^{2}) / \partial σ_{b}^{2} \\ \sum_{j = 1}^{n_{i}} Y_{i j} A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}; γ) - μ (1) \sum_{j = 1}^{n_{i}} A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}; γ) \\ \sum_{j = 1}^{n_{i}} Y_{i j} (1 - A_{i}) S_{i j} p_{i j}^{1} (X_{i}, C_{i}; γ) - μ (0) \sum_{j = 1}^{n_{i}} (1 - A_{i}) S_{i j} p_{i j}^{1} (X_{i}, C_{i}; γ) \end{matrix})

(7)

and for the Set 2 assumptions

m_{PSW, i} (θ) = (\begin{matrix} \partial l_{i} (S_{i} | D_{i}, β, σ_{b}^{2}) / \partial β \\ \partial l_{i} (S_{i} | D_{i}, β, σ_{b}^{2}) / \partial σ_{b}^{2} \\ \sum_{j = 1}^{n_{i}} Y_{i j} A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}; γ) / p_{i j}^{1} (X_{i}, C_{i}; γ) - μ (1) \sum_{j = 1}^{n_{i}} A_{i} S_{i j} p_{i j}^{0} (X_{i}, C_{i}; γ) / p_{i j}^{1} (X_{i}, C_{i}; γ) \\ \sum_{j = 1}^{n_{i}} Y_{i j} (1 - A_{i}) S_{i j} - μ (0) \sum_{j = 1}^{n_{i}} (1 - A_{i}) S_{i j} \end{matrix})

(8)

Using first-order Taylor approximations coupled with certain regularity conditions for these independent data, applications of Slutsky’s and continuous mapping theorems give us that $\sqrt{n_{c}} ({\hat{τ}}_{SSW} - τ_{0})$ and $\sqrt{n_{c}} ({\hat{τ}}_{PSW} - τ_{0})$ converge in distribution to $N (0, k^{T} V_{θ_{0}} k)$.¹⁹ Under correct specification, $τ_{0}$ and $θ_{0}$ are the true parameters, and $V_{θ_{0}}$ is the true asymptotic variance as written out in Supplemental Material 1.3 (equation (20)). Note, we only subscript estimators with SSW and PSW and not parameters because parameters cannot be simultaneously identified under Set 1 and Set 2. The form of the estimator of the asymptotic variance, often termed a cluster-robust sandwich variance estimator, is

{\hat{V}}_{{\hat{θ}}_{(\cdot)}} = {[\frac{1}{n_{c}} \sum_{i = 1}^{n_{c}} {\frac{\partial m_{(\cdot), i} ({\hat{θ}}_{(\cdot)})}{\partial θ^{T}}}]}^{- 1} \frac{1}{n_{c}} \sum_{i = 1}^{n_{c}} m_{(\cdot), i} ({\hat{θ}}_{(\cdot)}) m_{(\cdot), i} ({\hat{θ}}_{(\cdot)})^{T} {[\frac{1}{n_{c}} \sum_{i = 1}^{n_{c}} {\frac{\partial m_{(\cdot), i} ({\hat{θ}}_{(\cdot)})}{\partial θ^{T}}}]}^{- T}

(9)

The regularity conditions under which ${\hat{V}}_{{\hat{θ}}_{(\cdot)}}$ is consistent for $V_{θ_{0}}$ are posited by Iverson and Randles.²⁷ The full expressions for the estimating functions (including the derivative of the log-likelihood) and a more thorough sketch of the asymptotic arguments are in Supplemental Material 1.3 and 1.4.
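To make the structure of the sandwich estimator in equation (9) concrete, the sketch below assembles it for a generic set of cluster-level estimating functions. It is only an illustration under stated assumptions: the function `sandwich_variance` and the callback `m_fn` are hypothetical names, and the outer ("bread") matrix is approximated by central finite differences rather than by the analytic derivatives given in the Supplemental Material.

```python
import numpy as np

def sandwich_variance(m_fn, theta_hat, n_clusters, eps=1e-6):
    """Cluster-robust sandwich estimate of the asymptotic variance.

    m_fn(i, theta) must return the q-vector m_i(theta) for cluster i
    (a hypothetical user-supplied callback).
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    q = theta_hat.size

    def m_bar(theta):
        # Average of the estimating functions over clusters.
        return np.mean([m_fn(i, theta) for i in range(n_clusters)], axis=0)

    # Bread: Jacobian of the averaged estimating function, here by
    # central differences (an assumption made for this sketch).
    bread = np.empty((q, q))
    for k in range(q):
        step = np.zeros(q)
        step[k] = eps
        bread[:, k] = (m_bar(theta_hat + step) - m_bar(theta_hat - step)) / (2 * eps)

    # Meat: average outer product of the cluster-level estimating functions.
    m_mat = np.array([m_fn(i, theta_hat) for i in range(n_clusters)])
    meat = m_mat.T @ m_mat / n_clusters

    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv.T
```

With the averaged matrices used here, the approximate variance of a contrast $k^{T} \hat{θ}$ would be `k @ V @ k / n_clusters`, where `V` is the returned matrix.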

Now, we must determine the components of the sandwich variance for each estimator. We denote the middle matrices with $M_{SSW} (θ) = \sum_{i = 1}^{n_{c}} m_{SSW, i} (θ) m_{SSW, i} (θ)^{T}$ and $M_{PSW} (θ) = \sum_{i = 1}^{n_{c}} m_{PSW, i} (θ) m_{PSW, i} (θ)^{T}$. The estimating functions that are required to find the middle matrices are already written above. We must also find the outer matrices, which we will denote by $B_{SSW} (θ) = \sum_{i = 1}^{n_{c}} \partial m_{SSW, i} (θ) / \partial θ^{T}$ and $B_{PSW} (θ) = \sum_{i = 1}^{n_{c}} \partial m_{PSW, i} (θ) / \partial θ^{T}$. The general form of the outer matrices, $B_{(\cdot)} (θ)$, is

\begin{aligned} (\begin{matrix} \frac{\partial^{2} l (S | D, β, σ_{b}^{2})}{\partial β^{T} \partial β} & \frac{\partial^{2} l (S | D, β, σ_{b}^{2})}{\partial σ_{b}^{2} \partial β} & \frac{\partial^{2} l (S | D, β, σ_{b}^{2})}{\partial μ (1) \partial β} & \frac{\partial^{2} l (S | D, β, σ_{b}^{2})}{\partial μ (0) \partial β} \\ \frac{\partial^{2} l (S | D, β, σ_{b}^{2})}{\partial β^{T} \partial σ_{b}^{2}} & \frac{\partial^{2} l (S | D, β, σ_{b}^{2})}{\partial (σ_{b}^{2})^{2}} & \frac{\partial^{2} l (S | D, β, σ_{b}^{2})}{\partial μ (1) \partial σ_{b}^{2}} & \frac{\partial^{2} l (S | D, β, σ_{b}^{2})}{\partial μ (0) \partial σ_{b}^{2}} \\ \sum_{i = 1}^{n_{c}} \frac{\partial m_{(\cdot), i, p + 2} (θ)}{\partial β^{T}} & \sum_{i = 1}^{n_{c}} \frac{\partial m_{(\cdot), i, p + 2} (θ)}{\partial σ_{b}^{2}} & \sum_{i = 1}^{n_{c}} \frac{\partial m_{(\cdot), i, p + 2} (θ)}{\partial μ (1)} & \sum_{i = 1}^{n_{c}} \frac{\partial m_{(\cdot), i, p + 2} (θ)}{\partial μ (0)} \\ \sum_{i = 1}^{n_{c}} \frac{\partial m_{(\cdot), i, p + 3} (θ)}{\partial β^{T}} & \sum_{i = 1}^{n_{c}} \frac{\partial m_{(\cdot), i, p + 3} (θ)}{\partial σ_{b}^{2}} & \sum_{i = 1}^{n_{c}} \frac{\partial m_{(\cdot), i, p + 3} (θ)}{\partial μ (1)} & \sum_{i = 1}^{n_{c}} \frac{\partial m_{(\cdot), i, p + 3} (θ)}{\partial μ (0)} \end{matrix}) \end{aligned}

(10)

where the third index of $m_{(\cdot), i, k} (θ)$ represents the $k$-th component of the $i$-th vector function. Due to the nature of the stacked estimating equations, for which the score equations are solved independently, we have matrix structures $B_{SSW} (θ)$ and $B_{PSW} (θ)$ that are relatively sparse. However, they have different forms since $μ (0)$ under the Set 2 assumptions is not identified with a parameter that involves a model for survival events. For the SSW estimator, we have outer matrix:

B_{SSW} (θ) = (\begin{matrix} B_{11}^{p \times p} & B_{12}^{p \times 1} & 0^{p \times 1} & 0^{p \times 1} \\ B_{21}^{1 \times p} & B_{22}^{1 \times 1} & 0^{1 \times 1} & 0^{1 \times 1} \\ B_{SSW, 31}^{1 \times p} & 0^{1 \times 1} & B_{SSW, 33}^{1 \times 1} & 0^{1 \times 1} \\ B_{SSW, 41}^{1 \times p} & 0^{1 \times 1} & 0^{1 \times 1} & B_{SSW, 44}^{1 \times 1} \end{matrix})

(11)

and for the PSW estimator,

\begin{aligned} B_{PSW} (θ) = (\begin{matrix} B_{11}^{p \times p} & B_{12}^{p \times 1} & 0^{p \times 1} & 0^{p \times 1} \\ B_{21}^{1 \times p} & B_{22}^{1 \times 1} & 0^{1 \times 1} & 0^{1 \times 1} \\ B_{PSW, 31}^{1 \times p} & 0^{1 \times 1} & B_{PSW, 33}^{1 \times 1} & 0^{1 \times 1} \\ 0^{1 \times p} & 0^{1 \times 1} & 0^{1 \times 1} & B_{PSW, 44}^{1 \times 1} \end{matrix}) \end{aligned}

(12)

where the superscripts denote the dimension of the submatrices ($θ$ on the RHS is suppressed for notation simplicity). Using the estimator of the asymptotic variance as in equation (9), we can approximate the variances for the SACE estimators, ${\hat{τ}}_{SSW}$ and ${\hat{τ}}_{PSW}$, with $k^{T} B_{SSW} ({\hat{θ}}_{SSW})^{- 1} M_{SSW} ({\hat{θ}}_{SSW}) B_{SSW} ({\hat{θ}}_{SSW})^{- T} k$ and $k^{T} B_{PSW} ({\hat{θ}}_{PSW})^{- 1} M_{PSW} ({\hat{θ}}_{PSW}) B_{PSW} ({\hat{θ}}_{PSW})^{- T} k$, respectively. These expressions assume that $B_{SSW} ({\hat{θ}}_{SSW})$ and $B_{PSW} ({\hat{θ}}_{PSW})$ are non-singular, which will generally be the case for a sufficiently large sample of clusters. Each non-zero entry of $B_{SSW} (θ)$ and $B_{PSW} (θ)$ is fully written out in Supplemental Material 1.4. Estimates of the GLMM parameters can be obtained from standard statistical software. However, both the outer and middle matrices involve integrals, as a byproduct of marginalizing over the random intercept in the likelihood, that are not solvable analytically. Computing these integrals numerically is not a trivial exercise as shown in prior work.²⁸ Since the entries in the matrices are complex (particularly in the outer matrix), just keeping track of the integral expressions and ensuring that they are numerically integrable is a cumbersome task. After accounting for these integrals, we employ a numerical integration technique that is well-suited to integrals representing Gaussian expectations. In the Supplemental Material 1.5, we present all integrals involved in variance estimation as well as outline our choice and usage of adaptive Gauss-Hermite quadrature (AGQ)²⁹ for the required numerical integration. Despite the necessity of said numerical methods, for the cases in our simulation study where the non-parametric cluster bootstrap is viable, the variance estimates using the asymptotic distributions are not meaningfully different than those using the bootstrap but the computation times are reduced substantially (for reference, see Table S4 of Supplemental Material 2).
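The kind of Gaussian integral that arises from marginalizing over the random intercept can be illustrated with a small sketch. The code below evaluates one cluster's marginal log-likelihood contribution under a random-intercept logistic model using ordinary (non-adaptive) Gauss-Hermite quadrature; it is a simplified stand-in for the adaptive version referenced above, and the function name `cluster_marginal_loglik` and its arguments are hypothetical. The argument `eta` is assumed to hold the fixed-effect linear predictors $D_{i j}^{T} β$ for the individuals in the cluster.

```python
import numpy as np

def cluster_marginal_loglik(s, eta, sigma2_b, n_nodes=30):
    """Marginal log-likelihood of one cluster's survival indicators.

    Approximates log of the integral over b ~ N(0, sigma2_b) of
    prod_j expit(eta_j + b)^{s_j} * (1 - expit(eta_j + b))^{1 - s_j}.
    """
    s = np.asarray(s, dtype=float)
    eta = np.asarray(eta, dtype=float)
    # Nodes and weights for integrals against exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables b = sqrt(2 * sigma2_b) * x maps to N(0, sigma2_b).
    b = np.sqrt(2.0 * sigma2_b) * x
    lin = eta[:, None] + b[None, :]
    # Bernoulli log-likelihood at each node: s * lin - log(1 + exp(lin)).
    loglik_at_nodes = np.sum(s[:, None] * lin - np.logaddexp(0.0, lin), axis=0)
    return np.log(np.sum(w * np.exp(loglik_at_nodes)) / np.sqrt(np.pi))
```

An adaptive scheme would additionally re-center and re-scale the nodes around the mode of each cluster's integrand, which tends to matter more for large clusters or large $σ_{b}^{2}$.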

The process for determining asymptotic distributions under the GLM for survival status requires a similar formulation to the above, but the random intercept terms and row for $σ_{b}^{2}$ are removed from the estimating functions as outlined in Supplemental Material 1.6; accordingly, no numerical integration is required for estimation. For clarity of notation, we will henceforth distinguish the SACE estimated using the GLM from the GLMM by using parameter estimators denoted with a tilde, which so far include $\tilde{β}$, ${\tilde{θ}}_{SSW}$, ${\tilde{μ}}_{SSW} (a)$, ${\tilde{τ}}_{SSW}$, ${\tilde{θ}}_{PSW}$, ${\tilde{μ}}_{PSW} (a)$, and ${\tilde{τ}}_{PSW}$ for $a = 0, 1$. We note that Table 1 omits this notational distinction for continuity and succinctness. Since the parameter estimates for SACE are conceived as solutions to estimating equations that sum with respect to the cluster index, we obtain a sandwich variance expression that accounts for clustering when using the GLM. More specifically, the analogous sub-matrix of $B_{(\cdot)} ({\tilde{θ}}_{(\cdot)})^{- 1} M_{(\cdot)} ({\tilde{θ}}_{(\cdot)}) B_{(\cdot)} ({\tilde{θ}}_{(\cdot)})^{- T}$ corresponding to the estimated variance matrix for the model coefficients is precisely the sandwich variance estimate that we would achieve by solving a GEE³⁰ with mean and variance models as in logistic regression with a working correlation matrix for independence (Supplemental Material 1.6). 
In terms of interpretation and estimation, solving GEEs (marginal) is often compared to and viewed as a standard alternative to fitting GLMMs (conditional).^13,31 Although in this setting the parameters of the survival status models are not of interest, we emphasize that marginal and conditional logistic regression models are separate conceptual frameworks, whose parameters do not necessarily coincide (due to a lack of collapsibility).³² As it pertains to inference, these two approaches may also perform differently with CRTs for a binary response depending on the study conditions, for example, the number of clusters and the strength of the within-cluster correlation,³³ and we compare the effect of these modeling choices for SACE estimation across various scenarios in the simulation study in Section 3.

There are well-documented issues with small-sample inference associated with sandwich variance estimators, either from marginal or conditional modeling, for the analysis of CRTs. These estimators tend to underestimate the true variance in small samples and lead to anticonservative inference.^34,35 As such, we suggest employing a bias-corrected version of the proposed sandwich variance estimators (whether or not conditioning on random effects). For our simulation study and data application, we use a simple degrees of freedom correction,³⁶ multiplying the variance terms by $n_{c} / (n_{c} - \#\,\text{params})$, where “params” refers to all survival status model parameters as well as $μ (1)$ and $μ (0)$; note, for the GLMM, we have one additional parameter for the random effect variance. While this proved sufficient for achieving near nominal coverage in our simulations and thus suitable to our motivating application, alternative corrections to the sandwich variance estimates may warrant future research.
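The degrees of freedom correction described above amounts to a single multiplicative factor, sketched below. The function name `df_corrected_variance` is hypothetical, and the parameter count in the usage note assumes a survival model with five coefficients, which is an illustrative choice rather than a figure taken from the text.

```python
def df_corrected_variance(var_hat, n_clusters, n_params):
    """Inflate a sandwich variance estimate by n_c / (n_c - #params)."""
    if n_clusters <= n_params:
        raise ValueError("need more clusters than estimated parameters")
    return var_hat * n_clusters / (n_clusters - n_params)
```

For instance, with 30 clusters and an assumed GLMM with five regression coefficients, the count would be 5 + 1 (random effect variance) + 2 (for $μ (1)$ and $μ (0)$) = 8, giving an inflation factor of $30 / 22 \approx 1.36$; the corresponding GLM would use 7 parameters and a factor of $30 / 23 \approx 1.30$.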

Lastly, it may be that for the fitted GLMM, $σ_{b}^{2}$ is estimated to be 0 when using statistical software. This estimate arises due to the fact that convergence to a maximum is restricted by the boundary condition on the parameter space, where the variance of the random intercept must be at least 0; for further treatment of this phenomenon including asymptotics, see Self and Liang.³⁷ Therefore, constrained optimization algorithms will permit estimates of 0 for the variance, which is the default for the prevailing GLMM fitting package lme4 in R.³⁸ When this phenomenon occurs, it produces (virtually) the same coefficient estimates as the GLM, and thus can be constructed as the solution to the corresponding estimating equations (i.e. those which drop the random intercept); nonetheless, we treat it as a separate case from the marginal model approach because we penalize for the extra variance parameter. There are options to explicitly avoid a zero variance estimate such as changing the optimizer or putting a prior on the variance parameter that pulls it away from 0,³⁹ but we adopt the most standard application of the existing R software allowing for a zero variance component estimate.

3. Simulation study

In our simulation study, we assess the performance of the SACE estimators under different data generating mechanisms. We follow the ADEMP (Aims, Data-generating mechanisms, Methods, Estimands, Performance measures) schema defined by Morris et al.⁴⁰ to describe our simulation study. There are two primary goals for these simulations. The first goal is to evaluate the two weighting SACE estimators when the data generation adheres to either the Set 1 assumptions or the Set 2 assumptions in terms of their empirical bias and coverage (see Section 3.2 for more details). We recall that these are technically mutually exclusive data generating mechanisms because S1A4, conditional survival independence, and S2A4, the deterministic requirement of survival monotonicity, cannot exist simultaneously. Because of S1A4, it is natural to define a data generating mechanism that adheres to the Set 1 assumptions but operates on the survival terms separately. For the Set 2 assumptions, we follow the approach of Jiang et al.⁴¹ by imposing stochastic monotonicity,⁴² where we make the probability of survival under treatment $a = 1$ greater than under treatment $a = 0$ conditional on covariates. For our simulation, we increase the magnitude of the treatment effect on survival substantially so as to drive the empirical incidence of harmed patients to near 0. Of note, it is also possible to simulate data which achieves deterministic survival monotonicity and maintains the target marginal survival status models by specifying models for $S_{i j} (1)$ given values of $S_{i j} (0)$ .⁴³ This approach is also equivalent to an ordinal representation of the principal strata model, where stratum membership is assigned by thresholding a latent continuous response variable with cut-points that represent the treatment effect on survival, but such ordering of stratum membership is not intrinsic. 
The results for this ordinal approach as presented in Table S3 (Supplemental Material 2) are consistent relative to the other data generating mechanisms, so we save this exploration for Supplemental Material 1.7.

The second aim is to evaluate how well our SACE estimators handle latent clustering effects, represented by unobserved cluster-level random intercepts in the data generating process for the mortality and non-mortality outcomes. To this end, we compare the performance of the estimators with GLMM survival status modeling, ${\hat{τ}}_{SSW}$ and ${\hat{τ}}_{PSW}$ , to those with GLM survival status modeling (with cluster-robust variance), ${\tilde{τ}}_{SSW}$ and ${\tilde{τ}}_{PSW}$ . Variance expressions and confidence $z$ -intervals are computed based on the asymptotic distributions discussed in Section 2 and in Supplemental Material 1.

3.1. Data generating process

We consider scenarios for which we choose the number of clusters $n_{c} \in \{30, 60, 90\}$. The number of individuals in each cluster is drawn from a discrete uniform distribution, $n_{i} \sim Unif \{25, 50\}$. We generate two individual-level covariates for each cluster, $X_{i, 1} \sim {MVN}_{n_{i}} (2, 0.5 I_{n_{i}})$ and $X_{i, 2} \sim {MVN}_{n_{i}} (0.5, 0.25 I_{n_{i}})$, and one cluster-level covariate, $C_{i, 1} \sim Bern (0.3)$. We incorporate latent cluster-level effects via the variables $b_{i}$ and $b_{i}^{*}$ for survival status and the non-mortal outcome respectively, where we suppose $b_{i}^{*} \sim N (0, 1 / 9)$ and $b_{i} = ξ b_{i}^{*}$; given $ξ > 0$, we see that this setup induces a source of unobservable correlation between individuals’ non-mortal and survival events within each cluster. These variables differ from the observed covariates because we only propose their distributional form and relationship to other variables, but their actual values are unknown to us. For $a = 0, 1$, we generate survival status and non-mortal outcomes according to:

\begin{aligned} S_{i j} (a) | X_{i j}, C_{i, 1}, b_{i} \sim Bern (expit \{0.75 + δ a + 0.1 X_{i j, 1} - 0.05 X_{i j, 2} + 0.1 C_{i, 1} + b_{i}\}) \end{aligned}

(13)

\begin{aligned} Y_{i j} (a) | S_{i j} (a) = 1, X_{i j}, C_{i, 1}, b_{i}^{*} \sim N ((a + 1) (1 + 0.25 X_{i j, 1} + 0.125 X_{i j, 2}) + b_{i}^{*}, 1) \end{aligned}

(14)

For treatment effect, we consider $δ \in \{0, \log (1.25), \log (5)\}$. The values of $0$ and $\log (1.25)$ represent Set 1 data generation and $\log (5)$ represents Set 2 data generation because the treatment effect is large enough to induce near empirical monotonicity.

We quantify the strength of the latent clustering effects for each data generating mechanism via the ICC. For the continuous non-mortal outcomes, the ICC or equivalently the correlation among individuals within the same cluster is given by $ρ = σ_{b^{*}}^{2} / (σ_{b^{*}}^{2} + σ_{ϵ}^{2})$, which is set equal to $0.1$. We define the ICC for survival as $λ = σ_{b}^{2} / (σ_{b}^{2} + π^{2} / 3)$. This standard representation for ICC for mixed effects logistic regression is motivated via a latent variable model, which creates a dichotomous outcome by thresholding an underlying continuous variable with a standard logistic distribution.^44,45 We briefly show how it is derived as a note in Supplemental Material 1.7. This definition of the ICC for GLMMs then allows for an analogous interpretation to the ICC for linear mixed models (LMMs), except that it is defined on the latent or log-odds scale. Since $b_{i} \sim N (0, ξ^{2} σ_{b^{*}}^{2})$, we choose values of $ξ$ such that $λ = ξ^{2} σ_{b^{*}}^{2} / (ξ^{2} σ_{b^{*}}^{2} + π^{2} / 3) \in \{0.3, 0.1\}$.
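Solving the survival ICC expression for $ξ$ gives $ξ = \sqrt{λ π^{2} / \{3 (1 - λ) σ_{b^{*}}^{2}\}}$. The short sketch below evaluates this inversion and checks the outcome ICC implied by $σ_{b^{*}}^{2} = 1 / 9$ and the unit residual variance in equation (14); the function names are hypothetical and the algebra is our own rearrangement of the definitions above.

```python
import math

def xi_for_survival_icc(lam, sigma2_bstar=1.0 / 9.0):
    """Scale factor xi such that b = xi * b_star yields survival ICC lam."""
    return math.sqrt(lam * (math.pi ** 2 / 3.0) / ((1.0 - lam) * sigma2_bstar))

def outcome_icc(sigma2_bstar=1.0 / 9.0, sigma2_eps=1.0):
    """ICC of the continuous non-mortal outcome."""
    return sigma2_bstar / (sigma2_bstar + sigma2_eps)

# Values of xi implied by the two survival ICC settings in the main text.
xi_values = {lam: xi_for_survival_icc(lam) for lam in (0.1, 0.3)}
```

With $σ_{b^{*}}^{2} = 1 / 9$ and $σ_{ϵ}^{2} = 1$, `outcome_icc()` evaluates to $(1 / 9) / (10 / 9) = 0.1$, matching the stated value of $ρ$.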

For finding the true parameters, we generate the data using a large number of clusters, $n_{c} = 1000$ , as if to represent the population of clusters. We define the true SACE estimand within each simulation as

\frac{\sum_{i = 1}^{1000} \sum_{j = 1}^{n_{i}} Y_{i j} (1) S_{i j} (0) S_{i j} (1)}{\sum_{i = 1}^{1000} \sum_{j = 1}^{n_{i}} S_{i j} (0) S_{i j} (1)} - \frac{\sum_{i = 1}^{1000} \sum_{j = 1}^{n_{i}} Y_{i j} (0) S_{i j} (0) S_{i j} (1)}{\sum_{i = 1}^{1000} \sum_{j = 1}^{n_{i}} S_{i j} (0) S_{i j} (1)}

(15)

For the results presented in the next section, the percentage of always-survivors under $n_{c} = 1000$ ranges from $51\%$ to $65\%$, where larger values of $δ$ and smaller $λ$ induce higher rates of this subgroup.

For each data generating mechanism, the observed data are obtained after random allocation of clusters to treatment where $A_{i} \sim Bern (0.5)$. For example, if cluster $i$ is assigned to treatment 1, the observed data are $\{Y_{i} (1), S_{i} (1), X_{i}, C_{i, 1}\}$ with cluster size $n_{i}$. In the next section, we report the performance of SACE estimates on these observed data under the different data generation assumptions and modeling techniques as described above. The results for the additional values of $δ = - \log (1.25)$ and $λ \in \{0.15, 0.05\}$ as well as the scenarios for deterministic monotonicity are presented in Supplemental Material 2.
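The data generating process of this section can be sketched in a few lines. The code below is a minimal illustration rather than the simulation code used for the reported results: the names `simulate_crt` and `true_sace` are hypothetical, the cluster size is assumed to be uniform on the integers 25 through 50, and $S_{i j} (0)$ and $S_{i j} (1)$ are drawn independently given covariates and $b_{i}$, as in the Set 1 style of generation.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_crt(n_clusters, delta, xi, rng):
    """Potential and observed outcomes following equations (13) and (14)."""
    keys = ("cluster", "a", "x1", "x2", "c1", "s0", "s1", "y0", "y1")
    cols = {k: [] for k in keys}
    for i in range(n_clusters):
        n_i = int(rng.integers(25, 51))            # assumed integer range 25..50
        x1 = rng.normal(2.0, np.sqrt(0.5), n_i)    # variance 0.5
        x2 = rng.normal(0.5, np.sqrt(0.25), n_i)   # variance 0.25
        c1 = int(rng.binomial(1, 0.3))
        b_star = rng.normal(0.0, 1.0 / 3.0)        # variance 1/9
        b = xi * b_star
        a = int(rng.binomial(1, 0.5))              # cluster-level assignment
        lin = 0.75 + 0.1 * x1 - 0.05 * x2 + 0.1 * c1 + b
        s0 = rng.binomial(1, expit(lin))
        s1 = rng.binomial(1, expit(lin + delta))
        base = 1.0 + 0.25 * x1 + 0.125 * x2
        y0 = rng.normal(base + b_star, 1.0)        # (a + 1) = 1 under control
        y1 = rng.normal(2.0 * base + b_star, 1.0)  # (a + 1) = 2 under treatment
        new = {"cluster": np.full(n_i, i), "a": np.full(n_i, a), "x1": x1,
               "x2": x2, "c1": np.full(n_i, c1), "s0": s0, "s1": s1,
               "y0": y0, "y1": y1}
        for k in keys:
            cols[k].append(new[k])
    pop = {k: np.concatenate(v) for k, v in cols.items()}
    # Observed survival and outcome; the outcome is undefined for non-survivors.
    pop["s"] = np.where(pop["a"] == 1, pop["s1"], pop["s0"])
    y_obs = np.where(pop["a"] == 1, pop["y1"], pop["y0"])
    pop["y"] = np.where(pop["s"] == 1, y_obs, np.nan)
    return pop

def true_sace(pop):
    """Equation (15): mean of Y(1) - Y(0) among always-survivors."""
    always = (pop["s0"] == 1) & (pop["s1"] == 1)
    return float(np.mean(pop["y1"][always] - pop["y0"][always]))
```

A call such as `simulate_crt(1000, 0.0, 1.8, np.random.default_rng(0))` followed by `true_sace(...)` mimics the large-population calculation of the estimand; the potential outcomes are generated for everyone here purely for convenience and are only used where the corresponding survival indicator equals 1.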

3.2. Simulation results

We include four measures to evaluate the performance of the estimators under the various simulation scenarios: empirical bias, empirical variance, average of the estimated variance, and coverage of the $95\%$ confidence interval. The empirical or Monte Carlo variance is included in parentheses as a benchmark for the average of the estimated variance. Results for 1000 simulations are presented in Table 2, where the bias and variance metrics are scaled by 100.
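The four performance measures can be computed from the per-replicate point estimates and variance estimates as in the sketch below. The function name `performance` is hypothetical, and a single fixed true value is assumed for simplicity.

```python
import numpy as np

def performance(estimates, var_estimates, truth, z=1.959964):
    """Bias, empirical variance, mean estimated variance, z-interval coverage."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(var_estimates, dtype=float)
    half_width = z * np.sqrt(var)
    covered = (est - half_width <= truth) & (truth <= est + half_width)
    return {
        "bias": float(est.mean() - truth),
        "empirical_variance": float(est.var(ddof=1)),   # Monte Carlo variance
        "mean_estimated_variance": float(var.mean()),
        "coverage": float(covered.mean()),
    }
```

Multiplying the bias and variance entries by 100 would put them on the scale reported in Table 2.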

Table 2.
Comparison of average point estimates and variance estimates as well as performance measures of bias and coverage (%) for nominally $95\%$ $z$-intervals of the SSW and PSW estimators.

Scenarios			Estimates		Model variance $^{b}$ (EV $^{b}$)		Bias $^{b}$		% Coverage	
$δ$	$n_{c}$	RE	SSW	PSW	SSW	PSW	SSW	PSW	SSW	PSW
$λ = 0.1$
0.0	30	T	1.567	1.564	2.7 (2.4)	2.6 (2.2)	−0.1	−0.4	95.1	95.8
		F	1.567	1.564	2.5 (2.1)	2.5 (2.1)	−0.1	−0.4	96.1	95.9
	60	T	1.564	1.562	1.2 (1.2)	1.1 (1.1)	−0.4	−0.6	93.1	93.9
		F	1.566	1.563	1.1 (1.1)	1.1 (1.1)	−0.2	−0.5	94.4	94.7
	90	T	1.566	1.563	0.7 (0.7)	0.7 (0.7)	−0.2	−0.5	94.9	95.2
		F	1.566	1.563	0.7 (0.6)	0.7 (0.6)	−0.2	−0.5	95.4	95.3
0.2	30	T	1.564	1.562	2.6 (2.3)	2.6 (2.2)	−0.4	−0.6	95.4	96.1
		F	1.560	1.558	2.5 (2.1)	2.5 (2.1)	−0.7	−1.0	95.8	95.9
	60	T	1.561	1.560	1.1 (1.2)	1.1 (1.1)	−0.7	−0.7	93.7	94.1
		F	1.558	1.555	1.1 (1.1)	1.1 (1.1)	−0.9	−1.2	94.7	94.8
	90	T	1.562	1.560	0.7 (0.7)	0.7 (0.7)	−0.5	−0.7	95.2	94.6
		F	1.558	1.555	0.7 (0.6)	0.7 (0.6)	−1.0	−1.2	94.7	94.8
1.6 $^{a}$	30	T	1.541	1.543	2.7 (2.1)	2.6 (2.1)	−2.5	−2.4	95.6	96.2
		F	1.528	1.528	2.4 (2.1)	2.4 (2.0)	−3.8	−3.8	95.0	95.2
	60	T	1.541	1.543	1.2 (1.1)	1.1 (1.1)	−2.5	−2.3	94.1	94.6
		F	1.526	1.525	1.1 (1.0)	1.1 (1.0)	−4.0	−4.1	93.1	92.8
	90	T	1.542	1.543	0.7 (0.7)	0.7 (0.7)	−2.4	−2.3	94.8	95.1
		F	1.526	1.525	0.7 (0.6)	0.7 (0.6)	−4.0	−4.1	93.0	93.0
$λ = 0.3$
0.0	30	T	1.567	1.565	2.5 (2.7)	2.5 (2.5)	0.0	−0.2	92.5	94.0
		F	1.568	1.565	2.4 (2.1)	2.3 (2.1)	0.1	−0.2	94.9	94.6
	60	T	1.563	1.562	1.1 (1.3)	1.1 (1.2)	−0.4	−0.5	91.2	92.5
		F	1.565	1.563	1.0 (1.0)	1.0 (1.0)	−0.1	−0.4	94.9	94.1
	90	T	1.565	1.562	0.7 (0.8)	0.7 (0.7)	−0.2	−0.5	92.7	93.7
		F	1.566	1.563	0.7 (0.6)	0.7 (0.6)	−0.1	−0.4	95.2	95.1
0.2	30	T	1.564	1.563	2.5 (2.7)	2.4 (2.4)	−0.3	−0.4	92.4	93.9
		F	1.557	1.555	2.3 (2.1)	2.3 (2.0)	−1.0	−1.2	95.0	94.9
	60	T	1.561	1.560	1.1 (1.3)	1.1 (1.2)	−0.6	−0.6	91.8	92.4
		F	1.555	1.552	1.0 (1.0)	1.0 (1.0)	−1.2	−1.5	94.4	94.4
	90	T	1.563	1.561	0.7 (0.8)	0.7 (0.7)	−0.4	−0.6	92.3	93.4
		F	1.555	1.552	0.7 (0.6)	0.7 (0.6)	−1.2	−1.5	95.2	94.9
1.6 $^{a}$	30	T	1.549	1.549	2.5 (2.3)	2.5 (2.2)	−1.7	−1.7	94.2	94.5
		F	1.508	1.507	2.3 (1.9)	2.3 (1.9)	−5.9	−5.9	93.6	93.6
	60	T	1.546	1.547	1.1 (1.2)	1.1 (1.1)	−2.0	−1.8	92.8	93.1
		F	1.503	1.502	1.0 (1.0)	1.0 (1.0)	−6.3	−6.3	89.5	89.5
	90	T	1.549	1.549	0.7 (0.7)	0.7 (0.7)	−1.7	−1.6	93.5	93.6
		F	1.504	1.503	0.7 (0.6)	0.6 (0.6)	−6.2	−6.3	88.2	87.9

Note: Changes to the number of clusters $n_{c}$, treatment effects $δ$, survival ICC $λ$, and modeling choices, GLMM (RE = T) versus GLM (RE = F), are considered. Model-based variances are computed using a degrees of freedom correction and are compared to the empirical variance (EV) in parentheses. 1000 simulations are run according to the proposed data generating mechanisms.

RE: random effect; SSW: survival score weighting; PSW: principal score weighting; GLMM: generalized linear mixed model; GLM: generalized linear model.

$^{a}$ indicates the setting of empirical monotonicity.

$^{b}$ indicates the value is scaled up by $100$ .

Overall, the simulation study reveals that both of the proposed SACE estimators, where the mortality outcome is modeled using mixed effects logistic regression, exhibit low bias across all data generating scenarios. The $95\%$ $z$-intervals constructed using the asymptotic variance expressions with a degrees of freedom adjustment in general get close to nominal rates. In summary, regardless of the assumptions underlying the data generating mechanism, ${\hat{τ}}_{SSW}$ and ${\hat{τ}}_{PSW}$ tend to perform similarly and well, where there may be a slight advantage to the latter estimator. This empirically shows that ${\hat{τ}}_{SSW}$ and ${\hat{τ}}_{PSW}$ are relatively robust to certain departures from their requisite identification assumptions in CRT settings, that is, when the identification assumptions underlying the other estimator hold and when the data generating process mimics realistic CRTs. Performance instead appears to be most affected by modeling choice, where we see analogous patterns for the SSW and PSW estimators.

The results demonstrate that there is usually a benefit in terms of bias from explicitly accounting for clustering effects in modeling the mortality event via a GLMM as opposed to a GLM. The differences in their bias are relatively negligible in most scenarios, but the bias of ${\tilde{τ}}_{SSW}$ and ${\tilde{τ}}_{PSW}$ becomes more pronounced when the treatment effect on survival status further deviates from zero. In general, the fixed effects parameters in marginal logistic models tend to be attenuated towards zero,⁴⁶ which accords with the opposite sign bias we see with the GLM when there are larger treatment effects. For the most part, when there are small treatment effects, the coverage rates are comparable across these modeling techniques, and they are all near the nominal $95\%$ rate. However, when the value of $λ$ is higher, such as 0.3 (and occasionally 0.15), the constructed confidence intervals for ${\hat{τ}}_{SSW}$ and ${\hat{τ}}_{PSW}$ tend to suffer from undercoverage as compared with ${\tilde{τ}}_{SSW}$ and ${\tilde{τ}}_{PSW}$ even though the variance estimates of the former are higher on average (regardless of small-sample corrections). Discrepancies here might be explained by the inferential trade-offs of using a marginal model in lieu of a conditional model in the presence of certain values of ICC.^13,33,47,48 Given our observations in the small treatment effect scenarios, opting for ${\tilde{τ}}_{SSW}$ and ${\tilde{τ}}_{PSW}$ may suffice even at some cost in terms of bias. However, the story for inference is different when the treatment effect on survival is large, particularly for higher values of $λ$. The interval estimates for ${\hat{τ}}_{SSW}$ and ${\hat{τ}}_{PSW}$ achieve closer to nominal coverage as opposed to ${\tilde{τ}}_{SSW}$ and ${\tilde{τ}}_{PSW}$, which demonstrate noticeable undercoverage. 
In our simulations, all of the average estimated variances are close to their empirical counterparts signifying that our variance estimators perform adequately in finite samples, and bias, particularly for the GLM, is mostly driving the coverage results. The results for additional treatment effects and survival ICC are shown in Tables S1 to S3 of Supplemental Material 2 and in general are consistent with those presented above.

In selected scenarios, we have also examined the performance of the cluster bootstrap for generating variance estimates and confidence intervals, presented in Table S4 of Supplemental Material 2. We observe that the results are similar (if not slightly worse in terms of coverage) to the proposed variance estimators. However, the computation time required by our variance estimators is on average reduced by more than 10-fold for GLMM modeling and more than 60-fold for GLM modeling relative to the corresponding non-parametric bootstrap methods with 250 replicates.

4. Application to the RESTORE trial

We apply the weighting methods to estimate SACE in the Randomized Evaluation of Sedation Titration for Respiratory Failure (RESTORE) trial, a CRT investigating the effect of protocolized sedation on mechanical ventilation for children with acute respiratory failure, first analyzed by Curley et al.⁴⁹ In this study, 31 pediatric intensive care units (PICUs) were randomized in parallel to one of two treatment arms pertaining to sedation practices. Following the necessary training, intervention PICUs (17 sites) managed sedation according to a goal-directed nurse-implemented comfort algorithm informed by illness trajectory, pain and arousal assessments, extubation readiness testing, sedation reevaluation every 8 hours, and weaning status. Control PICUs (14 sites) administered sedation per their usual care and in the absence of a protocol. The cluster sizes range from 12 to 272 with a median size of 57, and there are a total of 2449 children with 1225 in intervention and 1224 in control. There are 110 (4.5%) observed deaths of which 47 (3.8%) are in the intervention group and 63 (5.1%) are in the control group.

In our analysis of RESTORE, the non-mortal outcome of interest is duration in days on a mechanical ventilator for a 28-day period. Duration on a ventilator not only has potential implications for the length of convalescence and overall health and well-being of a patient but may also impact resource availability in healthcare systems.⁵⁰ Patients who do not reach their follow-up due to ICU-death within a 28-day period are considered to have their outcome truncated because their duration on mechanical ventilation, had they in fact survived on their assigned treatment, cannot be fully defined. Restricting only to those individuals observed to have complete duration measurements could introduce survivor bias, but the SACE is a valid causal estimand in this context. Moreover, because our weighting methods impose no distributional assumption on the non-mortal outcome, issues about model mis-specification for this length-of-stay type outcome⁵¹ over a specific time frame are avoided. This approach, however, does not explicitly allow for simultaneous SACE estimation over a continuous time domain (although it can be repeated at each restriction time), and we also assume, given strict oversight of care for critically-ill children, that if neither extubation occurs nor 28 days is reached, there are no sources of incomplete measurements outside of death, such as dropout or human error. Handling these additional considerations would necessitate a continuous-time principal stratification approach in CRTs as discussed briefly in Section 5.

We apply our weighting estimators of SACE to the RESTORE data, where the survival status models adjust for variables suggested in the initial analysis.⁴⁹ The covariates are age in years; Pediatric Risk of Mortality (PRISM) III-12 score (on the log scale), an aggregated measure of physiological status;⁵² and Pediatric Overall Performance Category (POPC) > 1, an assessment of functional morbidity and cognitive impairment.⁵³ For the GLMM, there is also a random intercept for PICU facility. The SACE estimates where survival status is modeled using a mixed effects logistic regression model are ${\hat{τ}}_{SSW} = -0.152$ $(-1.657, 1.353)$ and ${\hat{τ}}_{PSW} = -0.152$ $(-1.652, 1.349)$. The SACE estimates where survival status is modeled using a logistic regression model are similar: ${\tilde{τ}}_{SSW} = -0.148$ $(-1.621, 1.325)$ and ${\tilde{τ}}_{PSW} = -0.151$ $(-1.621, 1.319)$. All standard errors are computed using the proposed sandwich variance estimators with a degrees of freedom correction. As displayed in Table 3, differences between the (uncorrected) sandwich variance estimates are negligible, which is expected because the estimated ICC for survival is small, only 0.026. The point estimates suggest that the protocolized sedation reduces duration on mechanical ventilation among always-survivors on average by approximately 3.6 hours, but this effect is not statistically significant at the 0.05 level.
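The figure in hours follows from converting the point estimates, which are in days, at 24 hours per day. As a worked check using the largest and smallest of the four estimates:

$$
0.152 \times 24 = 3.648 \approx 3.6 \text{ hours}, \qquad 0.148 \times 24 = 3.552 \approx 3.6 \text{ hours}.
$$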

Table 3.
Treatment effect estimates of sedation protocol on mechanical ventilation duration at 28 days in RESTORE data.

(a) SACE estimators		(b) Coefficient estimators, ${\hat{β}}_{trt}$ , among observed survivors
Name	Estimate $(\hat{Var})$	Model name	Estimate $(\hat{Var})$
${\hat{τ}}_{SSW}$	−0.152 (0.437)	Unadjusted linear GEE	−0.149 (0.435)
${\hat{τ}}_{PSW}$	−0.152 (0.435)	Unadjusted LMM	−0.465 (0.422)
${\tilde{τ}}_{SSW}$	−0.148 (0.437)	Adjusted linear GEE	−0.103 (0.399)
${\tilde{τ}}_{PSW}$	−0.151 (0.435)	Adjusted LMM	−0.440 (0.428)

Note: The proposed SACE and variance estimates based on differing assumptions and modeling techniques are provided in subtable (a). Coefficients, ${\hat{β}}_{trt}$ , and variance estimates from standard multilevel models fit on patients who were observed to survive are included in subtable (b). No degrees of freedom adjustments are made.

RESTORE: Randomized Evaluation of Sedation Titration for Respiratory Failure; SACE: survivor average causal effect; SSW: survival score weighting; PSW: principal score weighting; GEE: generalized estimating equation; LMM: linear mixed model.
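To show how the variances in subtable (a) relate to interval estimates, the sketch below forms normal-approximation Wald intervals from the tabulated values. The 1.96 quantile and the helper name `wald_interval` are assumptions of this illustration. Because Table 3 reports variances without a degrees of freedom adjustment, these intervals come out narrower than the corrected intervals quoted in the text.

```python
import math

# Point estimates and uncorrected sandwich variances copied from Table 3(a).
table_3a = {
    "tau_hat_SSW": (-0.152, 0.437),
    "tau_hat_PSW": (-0.152, 0.435),
    "tau_tilde_SSW": (-0.148, 0.437),
    "tau_tilde_PSW": (-0.151, 0.435),
}


def wald_interval(estimate, variance, z=1.96):
    """Normal-approximation interval: estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return (estimate - half_width, estimate + half_width)


intervals = {name: wald_interval(est, var)
             for name, (est, var) in table_3a.items()}

for name, (lower, upper) in intervals.items():
    print(f"{name}: ({lower:.3f}, {upper:.3f})")
```

All four intervals cover zero, in line with the conclusion that the effect is not statistically significant.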

Because conditional survival independence (Set 1) and survival monotonicity (Set 2) are incompatible, the Set 1 and Set 2 identification assumptions are mutually exclusive, so from a causal perspective, it is instructive to consider which set is more suited to this data application (supposing correct representation of the randomization scheme). Given that sedation practices are not meant to directly treat the conditions associated with acute respiratory failure, the causal assertion that survival under the active treatment is no worse than under the control treatment may not be easily justified. On the other hand, the conditional independence assumptions for Set 1 may be plausible since the key covariates that potentially impact and differentiate patient survival events are identified in the original analysis plan. All that said, while the Set 1 assumptions seem reasonable for this particular CRT, neither they nor those for Set 2 are testable using the observed data alone. Nevertheless, the nearly identical results for the SSW and PSW estimators lend empirical credence to their robustness to whichever underlying identification assumptions hold.

As a follow-up illustration, we fit unadjusted and adjusted standard multilevel models to the RESTORE data after removing patients whose death was reported prior to 28 days. The models are a generalized estimating equation (GEE) with Gaussian-type mean and variance functions and an independence working correlation, also known as linear regression with cluster-robust standard errors, and an LMM with a random intercept for PICU. We consider these models because they are among the commonly used regression methods for analyzing CRTs but may not produce causal estimates in the presence of truncation by death. In Table 3, the non-mortal treatment effect estimate, which is the coefficient of treatment ${\hat{β}}_{trt}$ in the models, is reported alongside the variance estimate, $\hat{Var}({\hat{β}}_{trt})$. These values are juxtaposed with the estimates for SACE. We observe that the results for the unadjusted GEE are similar to those that we have with the SACE estimators. However, ${\hat{β}}_{trt}$ does not target a clearly interpretable causal estimand because conditioning on survival breaks the balance of confounders across treatment groups and can lead to comparisons defined on different subpopulations. The similarity in estimates alone does not support the general application of such methods, given the ambiguity in the target estimand, and is likely attributable to the relatively low mortality rates in the RESTORE trial. Moreover, we see that the estimates for the remaining models, in particular the LMMs, are not comparable to those that we obtained for our SACE estimators. These differences may in fact be related to the selection bias introduced by conditioning on survivors and any distributional assumptions about the observed outcome. In contrast, the weighting methods avoid the need to consider outcome models that are often difficult to correctly specify, and they only require selecting a binary regression model to capture the likelihood of survival until the time of outcome measurement.
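The equivalence between an independence-working-correlation linear GEE and linear regression with cluster-robust standard errors can be sketched for the unadjusted case as follows. This is an illustration only: the function name and toy data are hypothetical, no small-sample correction is applied, and it is not the code used for Table 3.

```python
def cluster_robust_ols(clusters):
    """Unadjusted linear model Y = b0 + b_trt * A with a cluster-robust
    (sandwich) variance for b_trt and no small-sample correction.

    clusters: list of (arm, outcomes) pairs, arm in {0, 1}, one per cluster.
    Returns (b_trt, variance of b_trt).
    """
    n = sum(len(ys) for _, ys in clusters)
    n1 = sum(len(ys) for a, ys in clusters if a == 1)
    n0 = n - n1
    mean1 = sum(sum(ys) for a, ys in clusters if a == 1) / n1
    mean0 = sum(sum(ys) for a, ys in clusters if a == 0) / n0
    # With a single binary regressor, the OLS slope is the difference in means.
    beta_trt = mean1 - mean0

    # Bread: inverse of X'X for design columns [1, A].
    det = n * n1 - n1 * n1
    bread = [[n1 / det, -n1 / det], [-n1 / det, n / det]]

    # Meat: sum over clusters of the outer product of cluster-level scores.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for a, ys in clusters:
        fitted = mean1 if a == 1 else mean0
        resid_sum = sum(y - fitted for y in ys)
        score = [resid_sum, a * resid_sum]
        for i in range(2):
            for j in range(2):
                meat[i][j] += score[i] * score[j]

    def matmul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    sandwich = matmul(matmul(bread, meat), bread)
    return beta_trt, sandwich[1][1]


# Hypothetical toy data: two clusters per arm, survivors only.
toy_survivors = [(1, [4.0, 6.0]), (1, [5.0, 7.0, 9.0]),
                 (0, [6.0, 8.0]), (0, [7.0, 9.0, 11.0])]
beta_trt, var_beta_trt = cluster_robust_ols(toy_survivors)
```

The coefficient is simply a difference in means among observed survivors, which is why it inherits the selection problem described above regardless of how its variance is computed.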

5. Discussion

For randomized trials targeting the effect of a binary treatment on a non-mortal outcome, participants who do not survive to their follow-up visit will have truncated outcome measurements. Estimating effects conditional on realizations of post-treatment variables such as survival status is a well-known potential source of selection bias that depends on underlying causal pathways. Moreover, when clusters are the randomization unit as in CRTs, there is an additional source of variation in patient outcomes due to within-group correlation. For this multifaceted problem, we target the SACE, a treatment effect conditional on individuals who would survive under either treatment received, and we discuss two sets of assumptions that allow for point identification of the SACE, notably accounting for unobservable sources of variation via a group-level latent variable. We develop two corresponding estimators for the SACE, denoted by SSW and PSW, based on the theory of M-estimation, where their asymptotic distributions have sandwich variance expressions enabling cluster-robust variance estimation.

Results of the simulation studies, which vary both the magnitudes of the within-cluster correlation and the treatment effect for survival status, suggest that the SACE estimators perform well in terms of empirical bias, and their confidence intervals generally achieve nominal coverage even for a small to modest number of clusters. There are occasional issues of undercoverage, but sources of this phenomenon have been documented in the past and may simply be remedied by other choices of small-sample corrections. Critically, both the SSW and PSW estimators prove to be robust to violations of one of their identifying assumptions: conditional survival independence for SSW and survival monotonicity for PSW. Furthermore, for these estimators, we consider models to predict survival status that do and do not explicitly address within-cluster correlation: mixed effects logistic regression (a type of GLMM) and logistic regression (a type of GLM), respectively. Performance in simulations across these modeling decisions is similar. Nevertheless, from the standpoint of addressing latent cluster-induced variation, the GLMM is a more principled approach compared to the GLM, and the simulation study shows that the empirical bias for the SACE estimator is in general smaller when the former is used. However, we demonstrate that in finite samples where the ICC for survival is also relatively low, the choice to use a GLMM may not be as stable (due in part to numerical methods), and one may prefer the GLM with a cluster-robust variance expression for inference, except if there is a previously known large effect of the treatment on survival. The above observations bear out in our application to the RESTORE data, where there are negligible differences in SACE interval estimates across the assumption sets and modeling choices. However, the SACE estimates may differ from estimates of treatment effects among observed survivors, which have no causal interpretation, and the magnitude of this difference may be amplified by (incorrect) assumptions when modeling the outcome distribution. To enable users to implement our methods, we provide an R package, PtSaceCrts, for computing SACE interval estimates based on choices of the set of assumptions and survival status model, available at https://github.com/abcdane1/PtSaceCrts. While we have a non-parametric bootstrap option in this package, the default estimation of asymptotic variances is computationally more efficient than bootstrapping and avoids complications of model fitting on bootstrapped samples when there is a small number of clusters.
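As a rough illustration of the weighting idea, and not of the PtSaceCrts implementation, the sketch below computes weighted contrasts among observed survivors. The weight forms are assumptions borrowed from the individually randomized survivor-comparison and principal-score literature: under Set 1, survivors are weighted by their fitted probability of surviving under the opposite arm; under Set 2, control survivors receive unit weight and treated survivors receive the ratio of fitted survival probabilities. The exact estimating equations are those defined earlier in the article and may differ. All names and data here are hypothetical, and the fitted probabilities `p1` and `p0` are taken as given rather than estimated from a GLM or GLMM.

```python
def weighted_mean(values, weights):
    """Hajek-type (normalized) weighted mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def sace_weighting(records, method):
    """Weighted survivor contrast under an assumed SSW or PSW weight form.

    records: dicts with keys
        a  : assigned arm (1 = treatment, 0 = control)
        s  : observed survival indicator
        y  : non-mortal outcome (None if truncated by death)
        p1 : fitted probability of survival under treatment
        p0 : fitted probability of survival under control
    """
    treated = [r for r in records if r["a"] == 1 and r["s"] == 1]
    control = [r for r in records if r["a"] == 0 and r["s"] == 1]
    if method == "SSW":
        # Assumed Set 1 form: weight by survival probability in the other arm.
        w_treated = [r["p0"] for r in treated]
        w_control = [r["p1"] for r in control]
    elif method == "PSW":
        # Assumed Set 2 form: under monotonicity, control survivors are
        # always-survivors; treated survivors are reweighted by p0 / p1.
        w_treated = [r["p0"] / r["p1"] for r in treated]
        w_control = [1.0 for _ in control]
    else:
        raise ValueError("method must be 'SSW' or 'PSW'")
    return (weighted_mean([r["y"] for r in treated], w_treated)
            - weighted_mean([r["y"] for r in control], w_control))


# Hypothetical records with made-up fitted survival probabilities.
toy_records = [
    {"a": 1, "s": 1, "y": 5.0, "p1": 0.95, "p0": 0.90},
    {"a": 1, "s": 1, "y": 9.0, "p1": 0.80, "p0": 0.60},
    {"a": 1, "s": 0, "y": None, "p1": 0.70, "p0": 0.50},
    {"a": 0, "s": 1, "y": 6.0, "p1": 0.95, "p0": 0.90},
    {"a": 0, "s": 1, "y": 10.0, "p1": 0.85, "p0": 0.65},
    {"a": 0, "s": 0, "y": None, "p1": 0.75, "p0": 0.55},
]
tau_ssw = sace_weighting(toy_records, "SSW")
tau_psw = sace_weighting(toy_records, "PSW")
```

When the fitted survival probabilities do not vary across individuals, both weight forms collapse to the unweighted difference in means among observed survivors, which is one way to see why low mortality can make the naive and weighted estimates look alike.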

The limitations and future considerations for our work relate to assumptions on and utilization of the data. For our analysis of the RESTORE study, we only consider the impact of the sedation protocol on mechanical ventilation duration over a fixed period of time, which does not make maximal use of the available data. Viewed as a time-to-extubation outcome, we could have performed a survival analysis examining treatment effect changes over time and accounting for censoring mechanisms. There has been recent literature on estimating SACE within the semi-competing risks framework for individually randomized data.⁵⁴⁻⁵⁶ Aside from requiring causal censoring assumptions to be estimable, these estimands can be viewed as continuous-time extensions of our target estimand but in the iid context. For identification of continuous-time SACE estimands, all the existing approaches also include some framing of assumptions that allow teasing apart non-mortal and survival times across treatments, which enable specifying a likelihood from observable data and thus a range of estimation strategies. For example, Xu et al.⁵⁶ employed a Set 1-like approach, replacing the conditional survival independence assumption with a slightly weaker Gaussian copula formulation with correlation parameter $ρ$ that allows defining the survival status models marginally and is identified for a fixed correlation (e.g. $ρ = 0$, as in our case). In this article, we do not pursue an extension of the aforementioned continuous-time methods to estimate SACE in the CRT context, which would require fully specifying a likelihood for estimation; instead, we focus on simpler weighting estimators that can be applied to a variety of non-mortal outcomes (continuous, binary, and count). An extension of our methods to estimate continuous-time SACE in CRTs with application to the RESTORE study is a direction for future research.

Additionally, the proposed estimation of the SACE estimand relies on cross-world identification assumptions, that is, assumptions across the worlds defined by intervening on $a = 1$ and $a = 0$. This is a debated framework in causal inference because there is no setting, even hypothetically, where simultaneous interventions could occur and assumptions across them could be verified.²¹ As an alternative framework, Stensrud et al.⁵⁷ proposed estimands based on the notion of separable effects, which avoid these cross-world assumptions and can be applied to the truncation by death problem. However, the separable effects framework requires a significant conceptual exercise itself, namely decomposing treatment disjointly into its effect on survival and its effect on the non-mortal outcome. Despite the inherent untestability of cross-world assumptions, we opt for them because we feel that these assumptions, as well as the SACE estimand that they identify, are more easily translated to language accessible to practitioners. We have also assumed that there is no (marginal) dependence between the individual-level survival and non-mortal outcomes and cluster size (i.e. no informative cluster size), which can impact the validity of the proposed estimators for targeting individual-level effects.²⁰ When informative cluster size is present, cluster-average and individual-average causal estimands may not overlap, necessitating a more principled approach to defining identification assumptions and compatible estimators, as others have attempted in recent literature in the absence of truncation by death.⁵⁸ Another fundamental assumption is related to the mechanism of randomization, where we suppose that our data come from parallel-arm CRTs. However, a comprehensive review of analyses of CRTs in critical care units shows that less than half of them are based on true parallel-arm trials, with the remainder being crossover or stepped wedge.⁵⁹ The emergence of these types of randomized trials, motivated by practical, ethical, and statistical reasons, warrants an exploration of estimands for SACE that can accommodate different randomization designs.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802241309348 - Supplemental material for Weighting methods for truncation by death in cluster-randomized trials

Supplemental material, sj-pdf-1-smm-10.1177_09622802241309348 for Weighting methods for truncation by death in cluster-randomized trials by Dane Isenberg, Michael O Harhay, Nandita Mitra and Fan Li in Statistical Methods in Medical Research

Footnotes

Acknowledgements

The data example was prepared using the RESTORE Research Materials obtained from the National Heart, Lung, and Blood Institute (NHLBI) Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) and does not necessarily reflect the opinions or views of the RESTORE study investigators or the NHLBI.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Research in this article was supported by the Patient-Centered Outcomes Research Institute® (PCORI® Award ME-2020C1-19220), and the United States National Institutes of Health (NIH), National Heart, Lung, and Blood Institute (grant number R01-HL168202). All statements in this report, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the NIH or PCORI® or its Board of Governor or Methodology Committee.

ORCID iDs

Dane Isenberg

Fan Li

Supplemental material

Supplemental material for this article is available online.

References

1. Turner, Gallis, et al. Review of recent methodological developments in group-randomized trials: part 1 – design. Am J Public Health 2017; 107: 907–915.

2. Hayes, Moulton. Cluster randomised trials. 2nd ed. Boca Raton, FL: Chapman and Hall/CRC Press, 2017.

3. Rosenbaum. The consequences of adjustment for a concomitant variable that has been affected by the treatment. J R Stat Soc Ser A: Stat Soc 1984; 147: 656–666.

4. Wang, Scharfstein, Colantuoni, et al. Inference in randomized trials with death and missingness. Biometrics 2017; 73: 431–440.

5. Frangakis, Rubin. Principal stratification in causal inference. Biometrics 2002; 58: 21–29.

6. Rubin. Causal inference without counterfactuals: comment. J Am Stat Assoc 2000; 95: 435–438.

7. Zhang, Rubin. Estimation of causal effects via principal stratification when some outcomes are truncated by “death”. J Educ Behav Stat 2003; 28: 353–368.

8. Zhang, Rubin, Mealli. Likelihood-based analysis of causal effects of job-training programs using principal stratification. J Am Stat Assoc 2009; 104: 166–176.

9. Mercatanti, Mealli. Improving inference of Gaussian mixtures using auxiliary variables. Stat Anal Data Mining: The ASA Data Sci J 2015; 8: 34–48.

10. Hayden, Pauler, Schoenfeld. An estimator for treatment comparisons among survivors in randomized trials. Biometrics 2005; 61: 305–310.

11. Ding. Principal stratification analysis using principal scores. J R Stat Soc Ser B (Stat Methodol) 2017; 79: 757–777.

12. Zehavi, Nevo. Matching methods for truncation by death problems. J R Stat Soc Ser A: Stat Soc 2023; 186: 659–681.

13. Turner, Prague, Gallis, et al. Review of recent methodological developments in group-randomized trials: part 2 – analysis. Am J Public Health 2017; 107: 1078–1086.

14. Tong, Chen, et al. A Bayesian approach for estimating the survivor average causal effect when outcomes are truncated by death in cluster-randomized trials. Am J Epidemiol 2023; 192: 1006–1015.

15. Wang, Bridges Jr, et al. Bayesian framework for causal inference with principal stratification and clusters. Stat Biosci 2023; 15: 114–140.

16. Wang, Tong, Hirani, et al. A mixed model approach to estimate the survivor average causal effect in cluster-randomized trials. Stat Med 2024; 43: 16–33.

17. Huber. Robust estimation of a location parameter. Ann Math Stat 1964; 35: 73–101.

18. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, University of California Press, 1967, pp.221–233.

19. Stefanski, Boos. The calculus of M-estimation. Am Stat 2002; 56: 29–38.

20. Kahan, Copas, et al. Estimands in cluster-randomized trials: choosing analyses that answer the right question. Int J Epidemiol 2023; 52: 107–118.

21. Richardson, Robins. Single world intervention graphs (SWIGs): a unification of the counterfactual and graphical approaches to causality. Center Stat Soc Sci Univer Washington Ser Work Paper 2013; 128: 2013.

22. Zaslavsky, Landrum. Propensity score weighting with multilevel data. Stat Med 2013; 32: 3373–3387.

23. Yang. Propensity score weighting for causal inference with clustered data. J Causal Inference 2018; 6: 20170027.

24. Raudenbush. Statistical analysis and optimal design for cluster randomized trials. Psychol Methods 1997; 2: 173.

25. Campbell, Fayers, Grimshaw. Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clin Trials 2005; 2: 99–107.

26. Rabideau, Wang. Multiply robust generalized estimating equations for cluster randomized trials with missing outcomes. Stat Med 2024; 43: 1458–1474.

27. Iverson, Randles. The effects on convergence of substituting parameter estimates into U-statistics and other families of statistics. Probab Theory Relat Fields 1989; 81: 453–471.

28. Yucel. Model-based inference on average causal effect in observational clustered data. Health Serv Outcomes Res Methodol 2019; 19: 36–60.

29. Liu, Pierce. A note on Gauss-Hermite quadrature. Biometrika 1994; 81: 624–629.

30. Liang, Zeger. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13–22.

31. Hubbard, Ahern, Fleischer, et al. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology 2010; 21: 467–474.

32. Neuhaus, Kalbfleisch, Hauck. A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. Int Stat Review/Revue Int de Stat 1991; 59: 25–35.

33. Thompson, Leyrat, Fielding, et al. Cluster randomised trials with a binary outcome and a small number of clusters: comparison of individual and cluster level analysis method. BMC Med Res Methodol 2022; 22: 1–15.

34. Redden. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Stat Med 2015; 34: 281–296.

35. Kahan, Forbes, Ali, et al. Increased risk of type I errors in cluster randomised trials with small or medium numbers of clusters: a review, reanalysis, and simulation study. Trials 2016; 17: 1–8.

36. MacKinnon, White. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. J Economet 1985; 29: 305–325.

37. Self, Liang. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 1987; 82: 605–610.

38. Bates, Maechler, Bolker, et al. Package ‘lme4’. URL http://lme4.r-forge.r-project.org, 2009.

39. Chung, Rabe-Hesketh, Dorie, et al. A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 2013; 78: 685–709.

40. Morris, White, Crowther. Using simulation studies to evaluate statistical methods. Stat Med 2019; 38: 2074–2102.

41. Jiang, Yang, Ding. Multiply robust estimation of causal effects under principal ignorability. J R Stat Soc Ser B: Stat Methodol 2022; 84: 1423–1445.

42. Small, Tan, Lorch, et al. Instrumental variable estimation when compliance is not deterministic: the stochastic monotonicity assumption. arXiv preprint arXiv:1407.7308, 2014.

43. Sasson, Wang, Ogino, et al. The subtype-free average causal effect for heterogeneous disease etiology. arXiv preprint arXiv:2206.00209, 2022.

44. Goldstein, Browne, Rasbash. Partitioning variation in multilevel models. Underst Stat: Stat Issu Psychol Educ Soc Sci 2002; 1: 223–231.

45. Eldridge, Ukoumunne, Carlin. The intra-cluster correlation coefficient in cluster randomized trials: a review of definitions. Int Stat Rev 2009; 77: 378–394.

46. Zeger, Liang, Albert. Models for longitudinal data: a generalized estimating equation approach. Biometrics 1988; 1049–1060.

47. Rodríguez, Goldman. An assessment of estimation procedures for multilevel models with binary responses. J R Stat Soc: Ser A (Stat Soc) 1995; 158: 73–89.

48. Huang. Using cluster-robust standard errors when analyzing group-randomized trials with few clusters. Behav Res Methods 2022; 54: 1–19.

49. Curley, Wypij, Watson, et al. Protocolized sedation vs usual care in pediatric patients mechanically ventilated for acute respiratory failure: a randomized clinical trial. JAMA 2015; 313: 379–389.

50. Hill, Fowler, Burns, et al. Long-term outcomes and health care utilization after prolonged mechanical ventilation. Ann Am Thoracic Soc 2017; 14: 355–362.

51. Faddy, Graves, Pettitt. Modeling length of stay in hospital and other right skewed data: comparison of phase-type, gamma, and log-normal distributions. Value Health 2009; 12: 309–314.

52. Pollack, Holubkov, Funai, et al. The pediatric risk of mortality score: update 2015. Pediat Crit Care Med: J Soc Crit Care Med World Feder Pediat Inten Crit Care Soc 2016; 17: 2.

53. Fiser. Assessing the outcome of pediatric intensive care. J Pediat 1992; 121: 68–74.

54. Comment, Mealli, Haneuse, et al. Survivor average causal effects for continuous time: a principal stratification approach to causal inference with semicompeting risks. arXiv preprint arXiv:1902.09304, 2019, pp.1–28.

55. Nevo, Gorfine. Causal inference for semi-competing risks data. Biostatistics 2022; 23: 1115–1132.

56. Scharfstein, Müller, et al. A Bayesian nonparametric approach for evaluating the causal effect of treatment in randomized trials with semi-competing risks. Biostatistics 2022; 23: 34–49.

57. Stensrud, Robins, Sarvet, et al. Conditional separable effects. J Am Stat Assoc 2022; 118: 1–13.

58. Wang, Park, Small, et al. Model-robust and efficient covariate adjustment for cluster-randomized experiments. J Am Stat Assoc 2024; 119: 1–13.

59. Cook, Rutherford, Scales, et al. Rationale, methodological quality, and reporting of cluster-randomized controlled trials in critical care medicine: a systematic review. Crit Care Med 2021; 49: 977–987.


Scenarios			Estimates		Model variance $^{b}$ (EV $^{b}$ )		Bias $^{b}$		$%$ Coverage
$δ$	$n_{c}$	RE	SSW	PSW	SSW	PSW	SSW	PSW	SSW	PSW
$λ = 0.1$
0.0	30	T	1.567	1.564	2.7 (2.4)	2.6 (2.2)	−0.1	−0.4	95.1	95.8
		F	1.567	1.564	2.5 (2.1)	2.5 (2.1)	−0.1	−0.4	96.1	95.9
	60	T	1.564	1.562	1.2 (1.2)	1.1 (1.1)	−0.4	−0.6	93.1	93.9
		F	1.566	1.563	1.1 (1.1)	1.1 (1.1)	−0.2	−0.5	94.4	94.7
	90	T	1.566	1.563	0.7 (0.7)	0.7 (0.7)	−0.2	−0.5	94.9	95.2
		F	1.566	1.563	0.7 (0.6)	0.7 (0.6)	−0.2	−0.5	95.4	95.3
0.2	30	T	1.564	1.562	2.6 (2.3)	2.6 (2.2)	−0.4	−0.6	95.4	96.1
		F	1.560	1.558	2.5 (2.1)	2.5 (2.1)	−0.7	−1.0	95.8	95.9
	60	T	1.561	1.560	1.1 (1.2)	1.1 (1.1)	−0.7	−0.7	93.7	94.1
		F	1.558	1.555	1.1 (1.1)	1.1 (1.1)	−0.9	−1.2	94.7	94.8
	90	T	1.562	1.560	0.7 (0.7)	0.7 (0.7)	−0.5	−0.7	95.2	94.6
		F	1.558	1.555	0.7 (0.6)	0.7 (0.6)	−1.0	−1.2	94.7	94.8
1.6 $^{a}$	30	T	1.541	1.543	2.7 (2.1)	2.6 (2.1)	−2.5	−2.4	95.6	96.2
		F	1.528	1.528	2.4 (2.1)	2.4 (2.0)	−3.8	−3.8	95.0	95.2
	60	T	1.541	1.543	1.2 (1.1)	1.1 (1.1)	−2.5	−2.3	94.1	94.6
		F	1.526	1.525	1.1 (1.0)	1.1 (1.0)	−4.0	−4.1	93.1	92.8
	90	T	1.542	1.543	0.7 (0.7)	0.7 (0.7)	−2.4	−2.3	94.8	95.1
		F	1.526	1.525	0.7 (0.6)	0.7 (0.6)	−4.0	−4.1	93.0	93.0
$λ = 0.3$
0.0	30	T	1.567	1.565	2.5 (2.7)	2.5 (2.5)	0.0	−0.2	92.5	94.0
		F	1.568	1.565	2.4 (2.1)	2.3 (2.1)	0.1	−0.2	94.9	94.6
	60	T	1.563	1.562	1.1 (1.3)	1.1 (1.2)	−0.4	−0.5	91.2	92.5
		F	1.565	1.563	1.0 (1.0)	1.0 (1.0)	−0.1	−0.4	94.9	94.1
	90	T	1.565	1.562	0.7 (0.8)	0.7 (0.7)	−0.2	−0.5	92.7	93.7
		F	1.566	1.563	0.7 (0.6)	0.7 (0.6)	−0.1	−0.4	95.2	95.1
0.2	30	T	1.564	1.563	2.5 (2.7)	2.4 (2.4)	−0.3	−0.4	92.4	93.9
		F	1.557	1.555	2.3 (2.1)	2.3 (2.0)	−1.0	−1.2	95.0	94.9
	60	T	1.561	1.560	1.1 (1.3)	1.1 (1.2)	−0.6	−0.6	91.8	92.4
		F	1.555	1.552	1.0 (1.0)	1.0 (1.0)	−1.2	−1.5	94.4	94.4
	90	T	1.563	1.561	0.7 (0.8)	0.7 (0.7)	−0.4	−0.6	92.3	93.4
		F	1.555	1.552	0.7 (0.6)	0.7 (0.6)	−1.2	−1.5	95.2	94.9
1.6 $^{a}$	30	T	1.549	1.549	2.5 (2.3)	2.5 (2.2)	−1.7	−1.7	94.2	94.5
		F	1.508	1.507	2.3 (1.9)	2.3 (1.9)	−5.9	−5.9	93.6	93.6
	60	T	1.546	1.547	1.1 (1.2)	1.1 (1.1)	−2.0	−1.8	92.8	93.1
		F	1.503	1.502	1.0 (1.0)	1.0 (1.0)	−6.3	−6.3	89.5	89.5
	90	T	1.549	1.549	0.7 (0.7)	0.7 (0.7)	−1.7	−1.6	93.5	93.6
		F	1.504	1.503	0.7 (0.6)	0.6 (0.6)	−6.2	−6.3	88.2	87.9
