Intervention treatment distributions that depend on the observed treatment process and model double robustness in causal survival analysis

Abstract

The generalized g-formula can be used to estimate the probability of survival under a sustained treatment strategy. When treatment strategies are deterministic, estimators derived from the so-called efficient influence function (EIF) for the g-formula will be doubly robust to model misspecification. In recent years, several practical applications have motivated estimation of the g-formula under non-deterministic treatment strategies where treatment assignment at each time point depends on the observed treatment process. In this case, EIF-based estimators may or may not be doubly robust. In this paper, we provide sufficient conditions to ensure the existence of doubly robust estimators for intervention treatment distributions that depend on the observed treatment process for point treatment interventions and give a class of intervention treatment distributions dependent on the observed treatment process that guarantee model doubly and multiply robust estimators in longitudinal settings. Motivated by an application to pre-exposure prophylaxis (PrEP) initiation studies, we propose a new treatment intervention dependent on the observed treatment process. We show there exist (1) estimators that are doubly and multiply robust to model misspecification and (2) estimators that when used with machine learning algorithms can attain fast convergence rates for our proposed intervention. Finally, we explore the finite sample performance of our estimators via simulation studies.

Keywords

Causal inference, double robustness, estimating equations, observational study, stochastic treatment strategies

1 Introduction

The goal of many observational analyses is to estimate the causal effect on survival of different time-fixed or time-varying treatment strategies, interventions, or rules in a study population. These causal effects can be formally defined by a contrast (e.g. difference or ratio) in the distributions of counterfactual outcomes had interventions been implemented to ensure those strategies are followed in that population. Robins¹ showed that under assumptions that allow complex longitudinal data structures such that measured time-varying confounders may themselves be affected by past treatment, the g-formula indexed by a particular treatment strategy identifies the average counterfactual outcome under that strategy. Therefore, estimators of the g-formula and associated contrasts indexed by different strategies may be used to estimate causal effects.

In practice, the g-formula typically depends on high-dimensional nuisance parameters. In this case, many estimators of the g-formula and associated contrasts have been proposed, including the density-based parametric g-formula,¹ iterated conditional expectation (ICE) estimators,^2,3 inverse probability weighted (IPW) estimators,^4,5 and estimators derived from the efficient influence function (EIF).^6,7 EIF-based estimators (i.e. estimators constructed to evaluate the EIF from an empirical sample) have several theoretical advantages over the other approaches, including that they may be $\sqrt{n}$-consistent even if the nuisance functions are estimated at slower rates through flexible nonparametric or machine learning methods.^8–10

EIF-based estimators may also have a model double-robustness property in that when nuisance functions are estimated via parametric models, these estimators may remain consistent and asymptotically normal if models for only one of two (sets of) nuisance functions are correctly specified, not necessarily both. This model double-robustness property always holds for EIF estimators when the g-formula is indexed by a deterministic treatment strategy at most dependent on past treatment and confounders.^6,11,12 However, the identification results of Robins¹ were not limited to such deterministic strategies but generalized to allow identification of stochastic treatment strategies at most dependent on the measured past. The latter identifying functional or generalized g-formula depends on the intervention treatment distribution, that is, the distribution of treatment under an intervention that ensures the strategy of interest is followed conditional only on the measured past in the observational study. The generalized g-formula coincides with the more familiar g-formula indexed by a deterministic strategy when the intervention treatment distribution is chosen as degenerate conditional on any level of the measured past.

Recently, several practical applications have motivated estimation of the generalized g-formula indexed by intervention treatment distributions that depend on the observed treatment process, that is, the observed treatment distribution conditional on the measured past.^13–17 The generalized g-formula indexed by an intervention treatment distribution dependent on the observed treatment process has the particular advantage of relying on relatively weak positivity conditions^15,18 even, for example, in observational studies where the propensity score is equal or close to zero for certain measured confounder histories.¹⁶ When the (degenerate or non-degenerate) intervention treatment distribution does not depend on the observed treatment process, EIF-derived estimators of the generalized g-formula will be model doubly robust. However, when the intervention treatment distribution does depend on the observed treatment process, such estimators may or may not be doubly robust.^14–16,19

In this paper, we exploit particular representations of the generalized g-formula to give sufficient conditions for the existence of doubly robust estimators for point treatment interventions when the chosen intervention treatment distribution depends on the observed treatment process. We also provide a general form of EIFs for a class of intervention treatment distributions that may depend on the observed treatment process in longitudinal settings that guarantee model doubly (and multiply) robust estimators. Motivated by observational studies of the effects of realistic HIV PrEP initiation interventions, we consider a new class of intervention treatment distributions dependent on the observed treatment process that is a variation on the incremental propensity score interventions proposed by Kennedy.¹⁶ We show that estimators based on the EIF for our proposed intervention treatment distribution are model doubly/multiply robust, and can attain fast convergence rates even when used in combination with machine learning algorithms, where modeling assumptions are relaxed. We illustrate both EIF-based, as well as simpler singly robust, estimators of the g-formula indexed by this class of intervention treatment distribution in simulated data and in an illustrative data application.

2 Observed data structure

Consider a longitudinal study with $j = 0, 1, 2, \dots, J$ denoting a follow-up time interval (e.g. week or month), where $J$ is the end of the follow-up of interest. Assume the following random variables are measured in this study on each of $n$ individuals meeting some eligibility criteria at baseline. For each $j = 0, 1, 2, \dots, J - 1$, let $L_{j}$ denote a vector of time-varying covariates measured at the beginning of time interval $j$, $A_{j}$ a binary or discrete treatment variable measured during time interval $j$, and $Y_{j + 1}$ an indicator of surviving an event of interest by time interval $j + 1$ (e.g. the first diagnosis of bacterial sexually transmitted infection (STI)). For notational simplicity, we will assume throughout that all covariates are discrete in that they have distributions that are absolutely continuous with respect to a counting measure, but arguments naturally extend to settings with continuous covariates and Lebesgue measures. By definition, $Y_{0} = 1$ (all individuals are at risk of failure at baseline) and by convention we define ${\bar{L}}_{- 1} = \emptyset$ and ${\bar{A}}_{- 1} = 0$. For a random variable $X$, we let ${\bar{X}}_{j} = (X_{0}, \dots, X_{j})$ denote history through time $j$. We assume the ordering $O = (L_{0}, A_{0}, Y_{1}, \dots, L_{J - 1}, A_{J - 1}, Y_{J})$. Without loss of generality, we will assume no individual is lost to follow-up until Section 7.3.

3 Intervention treatment distribution

Let $g$ denote a treatment rule that specifies how a treatment should be assigned at each $j = 0, 1, 2, \dots, J - 1$ . Following Richardson and Robins,²⁰ denote $(L_{j}^{g}, Y_{j}^{g})$ and $A_{j}^{g +}$ as the natural values of covariates and survival status and the intervention value of treatment at $j$ under $g$ , respectively. In turn, the distribution of $A_{j}^{g +}$ evaluated at some treatment level $a_{j}$ conditional on the “measured past” under $g$ $(Y_{j}^{g} = 1, {\bar{L}}_{j}^{g} = {\bar{l}}_{j}, {\bar{A}}_{j - 1}^{g +} = {\bar{a}}_{j - 1})$ is specified by $q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) \equiv \Pr (A_{j}^{g +} = a_{j} ∣ Y_{j}^{g} = 1, {\bar{L}}_{j}^{g} = {\bar{l}}_{j}, {\bar{A}}_{j - 1}^{g +} = {\bar{a}}_{j - 1})$ which we refer to as the intervention treatment distribution at $j$ associated with $g$ .

When treatment assignment at any time under a selected rule $g$ deterministically depends on the measured past, there is only one value $a_{j}^{+} \in supp (A_{j}^{g +})$ given any history $({\bar{l}}_{j}, {\bar{a}}_{j - 1}) \in supp ({\bar{L}}_{j}^{g}, {\bar{A}}_{j - 1}^{g +})$ for those with $Y_{j}^{g} = 1$ . In this case, $q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) = 1$ when $a_{j} = a_{j}^{+}$ and 0 otherwise. Examples include static deterministic rules that assign the same level of treatment to all surviving individuals at all follow-up times and dynamic deterministic rules that assign treatment based on the measured past.

By contrast, when a selected rule $g$ assigns treatment stochastically at some $j$ (as a random draw from a distribution), at most dependent on the measured past, then there will be multiple values $a_{j}^{+} \in supp (A_{j}^{g +})$ such that we may have $0 < q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) < 1$ when $a_{j} = a_{j}^{+}$ . We focus here on the problem of estimating $E [Y_{J}^{g}] = \Pr [Y_{J}^{g} = 1]$ , the cumulative survival probability by end of follow-up under a choice of $g$ , when the intervention treatment distribution associated with $g$ has this non-degenerate property of a stochastic rule, in particular, through its dependence on the observed treatment distribution conditional on the measured past as specified by $f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) \equiv \Pr (A_{j} = a_{j} | Y_{j} = 1, {\bar{L}}_{j} = {\bar{l}}_{j}, {\bar{A}}_{j - 1} = {\bar{a}}_{j - 1}) .$ We refer to $f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})$ as the observed treatment process conditional on the measured past. The observed treatment process evaluated at $a_{j} = 1$ coincides with the so-called propensity score²¹ at $j$ when treatment $A_{j}$ is binary. In Supplemental Appendix A, we consider several examples of such intervention distributions.

4 Motivating example and incremental propensity score interventions

Kennedy¹⁶ proposed incremental propensity score interventions that at each $j$ assign a binary treatment according to a strategy $g$ based on an intervention treatment distribution defined by a shifted version (on the odds scale) of the propensity score. Specifically, for a particular $δ \in (0, \infty)$, treatment is assigned as a random draw from

q^{g} (1 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) = \frac{δ f (1 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})}{δ f (1 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) + f (0 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})}

(1)

or equivalently, from

q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) = \{a_{j} δ f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) + (1 - a_{j}) f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})\} \{δ f (1 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) + f (0 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})\}^{- 1}.

Here we consider a modification of (1) motivated by the PrEP context. Randomized trials have demonstrated that antiretroviral PrEP is highly effective in preventing HIV infection among men who have sex with men (MSM).^22–24 Some MSM are motivated to use PrEP not only to reduce HIV risk but also to improve sexual well-being,^25,26 including increased intimacy and pleasure facilitated by condomless sex without fear of HIV acquisition.²⁷ Reduced condom use among PrEP users may increase the risk of STIs, making STI screening and treatment an important part of comprehensive PrEP care.²⁸ In our motivating example, we investigated the effect of increases in PrEP uptake on the incidence of bacterial STIs among MSM.

Specifically, for $δ \in [0, 1]$ and $L_{j}^{*} \in L_{j}$ , a measured marker of risk for HIV acquisition (e.g. receiving a bacterial sexually transmitted infection or STI test at $j$ and no prior HIV diagnosis), we consider interventions indexed by the alternative intervention treatment distribution

q^{g} (0 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) = \begin{cases} f (0 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}), & \text{if } l_{j}^{*} = 0 \\ δ f (0 ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}), & \text{if } l_{j}^{*} = 1 \end{cases}

(2)

or equivalently, after some algebra,

q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) = (1 - δ) l_{j}^{*} a_{j} + (l_{j}^{*} δ + 1 - l_{j}^{*}) f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}).

In words, the probability of initiating PrEP conditional on the past under $g$ at each $j$ will be larger than the observed propensity score at $j$ by decreasing its complement by a factor of $δ$ for those with an indication ( $L_{j}^{*} = 1$ ). Choosing $δ = 0$ corresponds to selecting an intervention where individuals with $L_{j}^{*} = 1$ are always treated. Choosing $δ = 1$ corresponds to an intervention where the intervention treatment distribution coincides with the observed treatment distribution. We will refer to interventions indexed by either (1) or (2) as incremental propensity score interventions, distinguishing them by the classifier odds shift or multiplicative shift, respectively.
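The two shifts can be compared numerically with a minimal Python sketch. The function names and the propensity score value 0.2 are illustrative assumptions, not part of the original analysis; the sketch only checks that the piecewise definition (2) and its closed form agree and that each distribution sums to one.

```python
def odds_shift_q1(f1: float, delta: float) -> float:
    """Equation (1): P(treated) after multiplying the odds of treatment by delta."""
    return delta * f1 / (delta * f1 + (1.0 - f1))


def mult_shift_q0(f1: float, l_star: int, delta: float) -> float:
    """Equation (2): P(untreated), shrunk by delta only when l_star == 1."""
    f0 = 1.0 - f1
    return delta * f0 if l_star == 1 else f0


def mult_shift_q(a: int, f1: float, l_star: int, delta: float) -> float:
    """Closed form of q(a | past) implied by (2), for a binary treatment a."""
    f_a = f1 if a == 1 else 1.0 - f1
    return (1.0 - delta) * l_star * a + (l_star * delta + 1.0 - l_star) * f_a


if __name__ == "__main__":
    f1 = 0.2  # hypothetical observed propensity score for one measured past
    print("odds shift, delta = 2:", odds_shift_q1(f1, delta=2.0))
    for l_star in (0, 1):
        q0 = mult_shift_q(0, f1, l_star, delta=0.5)
        q1 = mult_shift_q(1, f1, l_star, delta=0.5)
        print("multiplicative shift, l* =", l_star, "q(0), q(1) =", q0, q1)
```

Under the multiplicative shift, setting `delta=1.0` returns the observed propensity score and setting `delta=0.0` with `l_star=1` returns a treatment probability of one, matching the description in the text.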

5 Identification by the generalized g-formula

Consider a treatment assignment rule $g$ at most dependent on past covariates. Further, let $D_{g}$ denote the set of all deterministic strategies at most dependent on this past that individuals could be observed to follow under the selected rule $g$, with $d$ any element of $D_{g}$. In the special case when $g$ is initially selected to be a deterministic rule, the only element of $D_{g}$ is $g$. Otherwise, $D_{g}$ may contain many elements. Let $Y_{j}^{d}$, $L_{j}^{d}$ and $A_{j}^{d +}$ denote the natural values of survival status and covariates and the intervention value of treatment at $j$, respectively, under a deterministic $d \in D_{g}$ for $j = 0, \dots, J$, and consider the following assumptions:

Exchangeability: $(Y_{j + 1}^{d}, \dots, Y_{J}^{d}) \perp\!\!\!\perp A_{j} ∣ {\bar{L}}_{j} = {\bar{l}}_{j}, {\bar{A}}_{j - 1} = {\bar{a}}_{j - 1}^{+}, Y_{j} = 1$.

Consistency: If ${\bar{A}}_{j} = {\bar{A}}_{j}^{d +}$ then ${\bar{Y}}_{j + 1} = {\bar{Y}}_{j + 1}^{d}$ and ${\bar{L}}_{j} = {\bar{L}}_{j}^{d}$ .

Positivity: $f_{{\bar{L}}_{j}, {\bar{A}}_{j - 1}, Y_{j}} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}^{+}, 1) > 0 ⟹ f_{A_{j} ∣ Y_{j}, {\bar{L}}_{j}, {\bar{A}}_{j - 1}} (a_{j}^{+} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}^{+}) > 0$ .

Robins¹ showed that, given that these exchangeability, consistency and positivity conditions hold for all deterministic $d \in D_{g}$, then $E [Y_{J}^{g}]$ had all subjects been assigned treatment according to a random draw from $q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})$ is equivalent to the g-formula:

\begin{aligned} ψ^{g} = {} & \sum_{\forall {\bar{a}}_{J - 1}} \sum_{\forall {\bar{l}}_{J - 1}} P (Y_{J} = 1 ∣ Y_{J - 1} = 1, {\bar{L}}_{J - 1} = {\bar{l}}_{J - 1}, {\bar{A}}_{J - 1} = {\bar{a}}_{J - 1}) \\ & \times \prod_{j = 0}^{J - 1} P (Y_{j} = 1 ∣ Y_{j - 1} = 1, {\bar{L}}_{j - 1} = {\bar{l}}_{j - 1}, {\bar{A}}_{j - 1} = {\bar{a}}_{j - 1}) f (l_{j} ∣ Y_{j} = 1, {\bar{l}}_{j - 1}, {\bar{a}}_{j - 1}) q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) \end{aligned}

(3)

The function $ψ^{g}$ is referred to as the generalized g-formula indexed by the intervention treatment distribution $q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})$. Note that, under different identifying conditions, the generalized g-formula may identify the outcome mean under a rule $g$ that depends on more than the measured past.^a,20,18

5.1 Generalized positivity

Note the assumption that the positivity condition above holds for all deterministic $d \in D_{g}$ can be alternatively stated as follows:

q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) > 0 ⟹ f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) > 0

(4)

for all $({\bar{l}}_{j}, {\bar{a}}_{j}) \in supp ({\bar{L}}_{j}^{g}, {\bar{A}}_{j}^{g +})$. The positivity condition (4) generalizes the more familiar definition of positivity often relied on in the literature that there may be treated and untreated individuals within any level of the measured past; that is, the assumption that the propensity score and its complement are positive for all possible measured histories and all $j$. It is straightforward to see that the more general condition (4) reduces to this typical definition of positivity only for the special case of a static deterministic intervention $g$ on a binary treatment. By contrast, the more general condition (4) only requires that, for any level of the past possible in the observational study and also plausible under $g$, if an intervention level of treatment can occur under $g$ it must also possibly occur in the observational study. Depending on the choice of $g$, this condition may hold when traditional definitions requiring positive propensity scores fail. Intervention treatment distributions that depend on the observed treatment process may help avoid positivity violations by this more general definition and, in some instances, may guarantee that positivity violations cannot occur regardless of the observed treatment process. We discuss this further in the next section.

Similar to arguments given in Kennedy,¹⁶ the odds shift (1) has the particular advantage that, by construction, the generalized positivity condition (4) is guaranteed to hold, no matter the nature of the observed treatment process. By contrast, the multiplicative shift (2) only enjoys this guarantee for measured pasts consistent with $L_{j}^{*} = 0$. However, compared to (1), which is indexed by a shift $δ$ with no upper bound that quantifies an odds ratio, (2) may be easier to communicate to subject matter collaborators as it constrains the choice of $δ \in [0, 1]$ and quantifies a risk ratio. Notably, the performance of weighted estimators of $ψ^{g}$ indexed by both (1) and (2) is relatively resilient to so-called “near positivity violations” (such that (4) holds but $f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})$ is still close to zero for some $({\bar{l}}_{j}, {\bar{a}}_{j - 1})$), particularly when $δ$ is chosen to coincide with relatively small increases in treatment uptake under $g$ (see Section 8).
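The contrast between the two shifts with respect to condition (4) can be made concrete with a small Python sketch. The helper names and the chosen values (a measured past in which no one is treated, so the propensity score is exactly zero) are hypothetical and serve only as an illustration.

```python
def odds_shift_q1(f1: float, delta: float) -> float:
    """Equation (1): treatment probability under the odds shift."""
    return delta * f1 / (delta * f1 + (1.0 - f1))


def mult_shift_q1(f1: float, l_star: int, delta: float) -> float:
    """Treatment probability implied by the multiplicative shift (2)."""
    return (1.0 - delta) * l_star + (l_star * delta + 1.0 - l_star) * f1


def as_dist(p1: float) -> dict:
    """Binary distribution {a: P(a)} from the probability of a = 1."""
    return {1: p1, 0: 1.0 - p1}


def violates_positivity(q: dict, f: dict) -> bool:
    """True when some level a has q(a) > 0 but f(a) == 0, so that (4) fails."""
    return any(q[a] > 0.0 and f[a] == 0.0 for a in q)


if __name__ == "__main__":
    f = as_dist(0.0)  # hypothetical past where treatment is never observed
    print("odds shift:", violates_positivity(as_dist(odds_shift_q1(0.0, 2.0)), f))
    print("mult. shift, l* = 0:", violates_positivity(as_dist(mult_shift_q1(0.0, 0, 0.5)), f))
    print("mult. shift, l* = 1:", violates_positivity(as_dist(mult_shift_q1(0.0, 1, 0.5)), f))
```

In this sketch the odds shift leaves a zero propensity score at zero, so (4) cannot fail, whereas the multiplicative shift with $l_{j}^{*} = 1$ and $δ < 1$ assigns positive probability to a treatment level never observed for that measured past.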

In the PrEP context, the multiplicative shift incremental propensity score interventions are particularly useful because analyses of observational data on the effects of deterministic interventions, such as “always treat” versus “never treat” with PrEP, will result in near or true positivity violations, with propensity scores close to (or equal to) zero for individuals with certain levels of the measured confounders. In conjunction with these challenges, such deterministic treatment effects are not of greatest interest for treatments such as PrEP, where biological benefits are established but population disease burden may be impacted with even small increases in treatment uptake. In the following sections, we will describe efficient estimators for the multiplicative shift incremental propensity score interventions.

6 Model double robustness when the intervention treatment distribution depends on the observed treatment process

Suppose that the observed data $O$ defined in Section 2 follows a law $P$, which is known to belong to $M = \{P_{θ} : θ \in Θ\}$, where $Θ$ is the parameter space. The EIF $U_{ψ^{g}} (O)$ for the causal parameter $ψ^{g} \equiv ψ^{g} (θ)$ in a non-parametric model that imposes no restrictions on the law of $O$ other than positivity is given by $d ψ^{g} (θ_{t}) / d t ∣_{t = 0} = E \{U_{ψ^{g}} (O) S (O)\}$, where $d ψ^{g} (θ_{t}) / d t ∣_{t = 0}$ is known as the pathwise derivative of the parameter $ψ^{g}$ along any parametric submodel of the observed data distribution indexed by $t$, and $S (O)$ is the score function of the parametric submodel evaluated at $t = 0$.^29,30 In this section, we provide results that aid the intuition on the existence of doubly robust estimators of $ψ^{g}$ when the intervention treatment distribution depends on the observed treatment process through understanding properties of the EIF for the parameter $ψ^{g}$.

6.1 Point treatment

We begin with the special case of a point treatment where $J = 1$ and $O = (L_{0}, A_{0}, Y_{1}) \equiv (L, A, Y)$. In this case, (3) reduces to $ψ^{g} = \sum_{\forall a} \sum_{\forall l} E (Y ∣ A = a, L = l) q^{g} (a ∣ l) f (l)$.

Theorem 1.
Suppose $ψ^{g}$ can be written as a linear combination of the form:
$ψ^{g} = c_{1} \underbrace{E \{h_{1} (O)\}}_{ν_{1}} + c_{2} \underbrace{E [E \{h_{2} (O) ∣ A = a^{*}, L\}]}_{ν_{2}}$
(5)
where $a^{*}$, $c_{1}$, and $c_{2}$ are constants, and $h_{1} (O)$ and $h_{2} (O)$ are known measurable functions of $O$ (i.e. they do not depend on $θ$). Then the EIF for $ψ^{g}$ is given by
$U_{ψ^{g}} (O) = c_{1} h_{1} (O) + c_{2} \left[\frac{I (A = a^{*})}{f (A ∣ L)} [h_{2} (O) - E \{h_{2} (O) ∣ A, L\}] + E \{h_{2} (O) ∣ A = a^{*}, L\}\right] - ψ^{g}$
(6)

See Supplemental Appendix C for proof. Clearly, $ψ^{g}$ under a static deterministic strategy that sets treatment to level $a^{*}$ for all individuals trivially meets the conditions of Theorem 1 by selecting $h_{1} (O) = 0$, $c_{2} = 1$, $h_{2} (O) = Y$. In this case, the EIF for the g-formula indexed by $g$, or $E \{E (Y ∣ A = a^{*}, L)\}$, equals:
$U_{ψ^{g}} (O) = \frac{I (A = a^{*})}{f (A ∣ L)} \{Y - m (A, L)\} + m (a^{*}, L) - ψ^{a^{*}}$
(7)
where $m (A, L) \equiv E (Y ∣ A, L)$ and $m (a^{*}, L) \equiv E (Y ∣ A = a^{*}, L)$.^6,31,7 A heuristic justification for Theorem 1 follows from the fact that the EIF of $ν_{1} \equiv ν_{1} (θ)$ is simply $h_{1} (O)$ up to centering, and the EIF of $ν_{2} \equiv ν_{2} (θ)$ can be realized by replacing $Y$ with $h_{2} (O)$ in Expression (7), as the function $h_{2} (\cdot)$ does not depend on $θ$ and therefore its pathwise derivative is zero. Furthermore, it is established that an estimator derived from the influence function (7) (e.g. an estimator solving $\sum_{i = 1}^{n} U_{ψ^{g}} (O_{i}) = 0$ for $ψ^{g}$) is model doubly robust in that it remains consistent if estimated under correctly specified parametric models for either one of two (sets of) nuisance functions, specifically $E (Y ∣ A, L)$ or $f (A ∣ L)$. The following Corollary gives a sufficient condition for the existence of doubly robust estimators of $ψ^{g}$ when the intervention treatment distribution depends on the observed treatment process, provided the conditions of Theorem 1 hold.

Corollary 1.1.
Suppose the conditions of Theorem 1 hold. If $h_{2} (O) = Y {\tilde{h}}_{2} (A, L)$, where ${\tilde{h}}_{2} (A, L)$ is a known measurable function of $(A, L)$, then an estimator of $ψ^{g}$ derived from an EIF of the form (6) is model doubly robust.

A proof of Corollary 1.1 is given in Supplemental Appendix C. A similar heuristic reasoning for Corollary 1.1 is that an estimator of $ν_{1}$, the mean of a known function of $O$, does not rely on any models, and doubly robust estimators exist for $ν_{2}$ because we have simply replaced $Y$ with $h_{2} (O)$ in equation (7), which does not depend on $θ$. We now consider an application of Theorem 1 and Corollary 1.1 to our multiplicative shift incremental propensity score interventions. Additional examples are provided in Supplemental Appendix C.

Example 1.
Multiplicative shift incremental propensity score interventions for $J = 1$. The intervention treatment distribution is given by $q^{g} (a ∣ l) = (1 - δ) a l^{*} + f (a ∣ l) (l^{*} δ + 1 - l^{*})$.

In this case, for a choice of $δ \in [0, 1]$ we have
$\begin{aligned} ψ^{g} (δ) & = E_{L} \left\{\sum_{a = 0}^{1} E (Y ∣ a, L) q^{g} (a ∣ L)\right\} \\ & = E_{L} \left[\sum_{a = 0}^{1} E (Y ∣ a, L) \{f (a ∣ L) (L^{*} δ + 1 - L^{*}) + L^{*} a (1 - δ)\}\right] \\ & = E_{L, A} [E \{Y (L^{*} δ + 1 - L^{*}) ∣ A, L\}] + E_{L} [E \{Y L^{*} (1 - δ) ∣ A = 1, L\}] \end{aligned}$
Selecting $a^{*} = 1$, $c_{1} = 1$, $c_{2} = (1 - δ)$, $h_{1} (O) = Y (L^{*} δ + 1 - L^{*})$, $h_{2} (O) = Y L^{*}$, we have
$\begin{aligned} ψ^{g} (δ) & = c_{1} E \{h_{1} (O)\} + c_{2} E [E \{h_{2} (O) ∣ A = a^{*}, L\}] \\ & = \underbrace{E \{Y (L^{*} δ + 1 - L^{*})\}}_{ν_{1}} + (1 - δ) \underbrace{E [E \{Y L^{*} ∣ A = 1, L\}]}_{ν_{2}} \end{aligned}$
and, by Theorem 1, the EIF for $ψ^{g} (δ)$ is given by
$U_{ψ^{g} (δ)} (O) = Y (L^{*} δ + 1 - L^{*}) + (1 - δ) \left[\frac{L^{*} A}{f (A ∣ L)} \{Y - m (A, L)\} + m (1, L) L^{*}\right] - ψ^{g} (δ)$
This can be re-expressed as
$U_{ψ^{g} (δ)} (O) = \frac{q^{g} (A ∣ L)}{f (A ∣ L)} \{Y - m (A, L)\} + m (A, L) (L^{*} δ + 1 - L^{*}) + m (1, L) L^{*} (1 - δ) - ψ^{g} (δ)$
which is useful for deriving doubly robust estimators. By Corollary 1.1, the estimators based on the EIF for $ψ^{g} (δ)$ will be model doubly robust. As Kennedy¹⁶ noted, estimators based on the EIF for $ψ^{g}$ indexed by an odds shift (1) are not model doubly robust and, therefore, (1) does not meet the conditions of Corollary 1.1. In fact, it is not hard to see that both IPW estimators and ICE estimators depend explicitly on (1). Thus, the consistency of any estimator that combines IPW and ICE will necessarily depend on the correct estimation of (1), and intuitively model doubly robust estimators will not exist for the odds shift intervention.
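The double robustness claimed for Example 1 can be checked exactly in a toy setting. The following Python sketch assumes a hypothetical law with binary $L = L^{*}$, $A$ and $Y$ (all numerical values, names and the choice $δ = 0.5$ are illustrative assumptions). It computes the exact mean of the uncentered EIF under the true law, which is the large-sample limit of the estimator solving the EIF estimating equation, when either working model is deliberately wrong.

```python
from itertools import product

DELTA = 0.5                      # hypothetical shift
P_L = {0: 0.6, 1: 0.4}           # f(l), with L* = L binary
F1 = {0: 0.3, 1: 0.2}            # true propensity score f(1 | l)
M = {(0, 0): 0.9, (1, 0): 0.8,   # true m(a, l) = E(Y | A = a, L = l), keyed (a, l)
     (0, 1): 0.7, (1, 1): 0.5}
M_WRONG = {key: 0.3 for key in M}   # deliberately misspecified outcome model
F1_WRONG = {0: 0.5, 1: 0.5}         # deliberately misspecified treatment model


def f(a, l, f1):
    """Treatment probability f(a | l) from a (working) propensity score."""
    return f1[l] if a == 1 else 1.0 - f1[l]


def q(a, l, f1):
    """Multiplicative shift q^g(a | l) built from a (working) propensity score."""
    return (1.0 - DELTA) * a * l + (l * DELTA + 1.0 - l) * f(a, l, f1)


def psi_true():
    """Point-treatment g-formula: sum over a, l of m(a, l) q(a | l) f(l)."""
    return sum(P_L[l] * M[(a, l)] * q(a, l, F1) for a, l in product((0, 1), repeat=2))


def phi(l, a, y, m, f1):
    """Uncentered EIF in its re-expressed form, at working models m and f1."""
    weight = q(a, l, f1) / f(a, l, f1)
    return (weight * (y - m[(a, l)])
            + m[(a, l)] * (l * DELTA + 1.0 - l)
            + m[(1, l)] * l * (1.0 - DELTA))


def phi_first_form(l, a, y, m, f1):
    """The same quantity written as in the first EIF display of Example 1."""
    aug = l * a / f(a, l, f1) * (y - m[(a, l)]) + m[(1, l)] * l
    return y * (l * DELTA + 1.0 - l) + (1.0 - DELTA) * aug


def estimator_limit(m, f1):
    """Exact mean of phi under the true law for given working models."""
    total = 0.0
    for l, a, y in product((0, 1), repeat=3):
        p_y = M[(a, l)] if y == 1 else 1.0 - M[(a, l)]
        total += P_L[l] * f(a, l, F1) * p_y * phi(l, a, y, m, f1)
    return total


if __name__ == "__main__":
    print("true psi:", psi_true())
    print("outcome model right, treatment model wrong:", estimator_limit(M, F1_WRONG))
    print("outcome model wrong, treatment model right:", estimator_limit(M_WRONG, F1))
    print("both wrong:", estimator_limit(M_WRONG, F1_WRONG))
```

In this toy law the limit equals the true $ψ^{g} (δ)$ whenever at least one working model is correct and differs from it when both are wrong, which is the behavior Corollary 1.1 describes. The sketch works with population quantities only and says nothing about finite-sample performance.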
6.2 Time-varying treatments

Recently, Molina³² showed that, in time-varying treatment settings, estimators derived from the EIF for a $ψ^{g}$ indexed by any intervention treatment distribution that does not depend on the observed treatment process^33,6,7 confer more protection against model misspecification than model double robustness. Specifically, they showed that these estimators are $J + 1$ model multiply robust, which implies model double robustness. The following theorem gives a sufficient condition for the existence of $J + 1$ model multiply robust estimators of $ψ^{g}$ when the intervention treatment distribution may depend on the observed treatment process and a simple approach to deriving the EIFs for a particular class of such intervention treatment distributions.

Theorem 2.
Suppose an intervention treatment distribution can be written as the following:
$\begin{aligned} q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) = {} & c_{1} h_{1} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) I (a_{j} = a_{j}^{*}) + c_{2} h_{2} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) \\ & + c_{3} h_{3} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) p^{*} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) \end{aligned}$
(8)
where $a_{j}^{*}$, $c_{1}$, $c_{2}$, and $c_{3}$ are constants; $h_{1} ({\bar{L}}_{j}, {\bar{A}}_{j - 1})$, $h_{2} ({\bar{L}}_{j}, {\bar{A}}_{j - 1})$, and $h_{3} ({\bar{L}}_{j}, {\bar{A}}_{j - 1})$ are known measurable functions of $({\bar{L}}_{j}, {\bar{A}}_{j - 1})$; and $p^{*} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})$ is a non-degenerate known probability distribution for $A_{j}$. Then the EIF for $ψ^{g}$ indexed by this intervention treatment distribution is
$U_{ψ^{g}} (O) = \sum_{j = 1}^{J} (T_{j} - Q_{j - 1}) \prod_{k = 0}^{j - 1} \frac{q^{g} (A_{k} ∣ Y_{k} = 1, {\bar{L}}_{k}, {\bar{A}}_{k - 1})}{f (A_{k} ∣ Y_{k} = 1, {\bar{L}}_{k}, {\bar{A}}_{k - 1})} + T_{0} - ψ^{g}$
(9)
where $Q_{j} \equiv Q_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}, {\bar{Y}}_{j})$ and $T_{j} \equiv T_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}, {\bar{Y}}_{j})$ are iteratively defined from $j = J - 1, \dots, 0$ such that for $T_{J} \equiv Y_{J}$, we have $Q_{j} \equiv E (T_{j + 1} ∣ {\bar{L}}_{j}, {\bar{A}}_{j}, {\bar{Y}}_{j})$ and
$T_{j} = c_{1} Q_{j}^{A_{j} = a_{j}^{*}} h_{1} ({\bar{L}}_{j}, {\bar{A}}_{j - 1}) + c_{2} Q_{j} h_{2} ({\bar{L}}_{j}, {\bar{A}}_{j - 1}) + c_{3} \left\{\sum_{a_{j}} p^{*} (a_{j} ∣ 1, {\bar{L}}_{j}, {\bar{A}}_{j - 1}) Q_{j}^{A_{j} = a_{j}}\right\} h_{3} ({\bar{L}}_{j}, {\bar{A}}_{j - 1})$
with $Q_{j}^{A_{j} = a_{j}^{*}} \equiv Q_{j} ({\bar{L}}_{j}, A_{j} = a_{j}^{*}, {\bar{A}}_{j - 1}, {\bar{Y}}_{j})$. Estimators based on this EIF are $J + 1$ model multiply robust in that they are consistent if models for $Q_{j}$ are correctly specified for $j = k, \dots, J - 1$ and the observed treatment models are correctly specified from $j = 0, \dots, k - 1$ (for $k = 0, \dots, J$), where the index range $j = s, \dots, s - 1$ is taken to be empty ($\emptyset$) for all $s$.

Theorem 2 makes the derivation of the EIF and the corresponding estimators far more straightforward and accessible when intervention distributions are in the form given by (8). In Supplemental Appendix D, we prove that expression (9) is the EIF under a nonparametric model that imposes no restriction on the observed data law for $ψ^{g}$ indexed by (8). In Supplemental Appendix E, we prove that estimators based on this EIF are $J + 1$ model multiply robust. Note that, by the monotonicity of the survival indicators, we have $Y_{j + 1} = Y_{j + 1} Y_{j}$ . This implies that $Q_{j} = Y_{j} Q_{j} = Y_{j} Q_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1)$ , where $Q_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1) = E (T_{j + 1} ∣ {\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1)$ . We now apply Theorem 2 to our multiplicative shift incremental propensity score intervention.
Example 2.
Consider the multiplicative shift incremental propensity score interventions from Section 4, recalling the intervention distribution is $q^{g} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1}) = (1 - δ) l_{j}^{*} a_{j} + (l_{j}^{*} δ + 1 - l_{j}^{*}) f (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})$.

This intervention distribution can be written in the form of equation (8) by selecting $a_{j}^{*} = 1$, $c_{1} = 1 - δ$, $c_{2} = 1$, $h_{1} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) = l_{j}^{*}$, $h_{2} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) = l_{j}^{*} δ + 1 - l_{j}^{*}$, $h_{3} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) = 0$. By Theorem 2, the EIF for this intervention distribution is then given by
$\begin{aligned} U_{ψ^{g} (δ)} (O) = {} & (Y_{J} - Q_{J - 1}) \prod_{j = 0}^{J - 1} \frac{q^{g} (A_{j} ∣ Y_{j} = 1, {\bar{L}}_{j}, {\bar{A}}_{j - 1})}{f (A_{j} ∣ Y_{j} = 1, {\bar{L}}_{j}, {\bar{A}}_{j - 1})} \\ & + \sum_{j = 1}^{J - 1} \left\{\underbrace{(1 - δ) Q_{j}^{A_{j} = 1} L_{j}^{*} + Q_{j} (L_{j}^{*} δ + 1 - L_{j}^{*})}_{T_{j}} - Q_{j - 1}\right\} \prod_{k = 0}^{j - 1} \frac{q^{g} (A_{k} ∣ Y_{k} = 1, {\bar{L}}_{k}, {\bar{A}}_{k - 1})}{f (A_{k} ∣ Y_{k} = 1, {\bar{L}}_{k}, {\bar{A}}_{k - 1})} \\ & + \underbrace{(1 - δ) Q_{0}^{A_{0} = 1} L_{0}^{*} + Q_{0} (L_{0}^{*} δ + 1 - L_{0}^{*})}_{T_{0}} - ψ^{g} \end{aligned}$
(10)

It is also straightforward to see that any $g$ corresponding to a deterministic static treatment rule meets the conditions of Theorem 2 by selecting $h_{2} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) = h_{3} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) = 0$, $h_{1} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) = 1$ and $c_{1} = 1$. In Supplemental Appendix E, we further illustrate the application of Theorem 2 to deterministic dynamic treatment rules, as well as other examples of intervention distributions that depend on the observed treatment process for the time-varying case including representative interventions and dynamic treatment initiation strategies with a grace period. Note that, in these examples and the incremental propensity score intervention example above, Theorem 2 holds by selecting $h_{3} ({\bar{l}}_{j}, {\bar{a}}_{j - 1}) = 0$ such that $p^{*} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})$ need not be specified. More generally, the applicability of Theorem 2 may require specification of $p^{*} (a_{j} ∣ 1, {\bar{l}}_{j}, {\bar{a}}_{j - 1})$. For example, this applies to an alternative grace period strategy where initiation within the grace period is assigned such that there is a uniform probability of initiating at each $j$.⁴
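The recursion defining $T_{j}$ and $Q_{j}$ for the multiplicative shift can be traced in a small exact calculation. The Python sketch below assumes a hypothetical law for $J = 2$ with binary $L_{j} = L_{j}^{*}$ and $A_{j}$ (every probability and function name is an illustrative assumption). It evaluates the generalized g-formula (3) by direct enumeration and compares it with the mean of $T_{0}$ obtained from the backward recursion $T_{2} = Y_{2}$, $Q_{1}$, $T_{1}$, $Q_{0}$, $T_{0}$.

```python
from itertools import product


def bern(p1, x):
    """P(X = x) for a binary X with P(X = 1) = p1."""
    return p1 if x == 1 else 1.0 - p1


# Hypothetical observed-data law for J = 2 (conditionals are among survivors).
def p_l0(l0):
    return bern(0.4, l0)

def f_a0(a0, l0):
    return bern(0.1 + 0.2 * l0, a0)

def s1(l0, a0):
    return 0.95 - 0.10 * l0 + 0.03 * a0          # P(Y1 = 1 | l0, a0)

def p_l1(l1, l0, a0):
    return bern(0.3 + 0.3 * l0 - 0.1 * a0, l1)

def f_a1(a1, l0, a0, l1):
    return bern(0.05 + 0.6 * a0 + 0.2 * l1, a1)

def s2(l0, a0, l1, a1):
    return 0.90 - 0.02 * l0 - 0.10 * l1 + 0.05 * a1   # P(Y2 = 1 | Y1 = 1, past)


def q(a, l_star, f_a, delta):
    """Multiplicative shift intervention distribution, given f(a | past)."""
    return (1.0 - delta) * l_star * a + (l_star * delta + 1.0 - l_star) * f_a


def g_formula(delta):
    """Generalized g-formula (3) for J = 2 by enumerating all histories."""
    total = 0.0
    for l0, a0, l1, a1 in product((0, 1), repeat=4):
        total += (p_l0(l0) * q(a0, l0, f_a0(a0, l0), delta) * s1(l0, a0)
                  * p_l1(l1, l0, a0) * q(a1, l1, f_a1(a1, l0, a0, l1), delta)
                  * s2(l0, a0, l1, a1))
    return total


def t1(l0, a0, l1, a1, delta):
    """T_1 among those with Y_1 = 1, where Q_1 = P(Y_2 = 1 | past)."""
    return ((1.0 - delta) * l1 * s2(l0, a0, l1, 1)
            + (l1 * delta + 1.0 - l1) * s2(l0, a0, l1, a1))


def big_q0(l0, a0, delta):
    """Q_0 = E(T_1 | L_0, A_0); T_1 is zero for those with Y_1 = 0."""
    inner = sum(p_l1(l1, l0, a0) * f_a1(a1, l0, a0, l1) * t1(l0, a0, l1, a1, delta)
                for l1, a1 in product((0, 1), repeat=2))
    return s1(l0, a0) * inner


def t0(l0, a0, delta):
    """T_0 built from Q_0 at the observed a0 and at a0 = 1."""
    return ((1.0 - delta) * l0 * big_q0(l0, 1, delta)
            + (l0 * delta + 1.0 - l0) * big_q0(l0, a0, delta))


def psi_by_recursion(delta):
    """Mean of T_0 under the observed law of (L_0, A_0)."""
    return sum(p_l0(l0) * f_a0(a0, l0) * t0(l0, a0, delta)
               for l0, a0 in product((0, 1), repeat=2))


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0):
        print(delta, g_formula(delta), psi_by_recursion(delta))
```

In this sketch the two computations agree for every $δ$, reflecting that each centered term $(T_{j} - Q_{j - 1})$ in (9) has conditional mean zero, so the mean of $T_{0}$ recovers $ψ^{g}$. Estimating $Q_{j}$ from data, as an ICE-type estimator would, is a separate step that the sketch does not attempt.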

Note that the EIF given in Diaz et al.¹⁹ covers a comprehensive class of interventions that also guarantees estimators with model double robustness, including interventions that depend on the natural value of treatment,^14,15 also known as modified treatment policies. Following the results of Theorem 2 in Diaz et al.,¹⁹ the EIF of the modified treatment policy implied by our proposed intervention necessarily involves randomizer terms,^b but their derivation of the corresponding EIF assumes that the distributions of the randomizers are not known, when they certainly will be. Our Theorem 2 provides the EIF for the g-formula indexed by a class of stochastic interventions that may depend on the observed treatment process. It can be shown that nearly all functionals in this class are captured by the g-formula functionals for which Diaz et al.¹⁹ provide the EIF. Diaz et al.’s results would capture all of the functionals in this class, including those indexed by our proposed multiplicative incremental propensity score interventions, provided they projected their EIF onto a tangent space corresponding to smaller models whereby the distributions of some conceptualized randomizers are known. We do not take this approach; instead, we allow one to derive the EIF directly from a stochastic treatment distribution without first having to define an implied modified treatment policy.^c This alternative derivation of the EIF may be more intuitive for treatment distributions that do not depend on the natural value of treatment.

Finally, in Supplemental Appendix C, we use a similar line of reasoning to Theorem 1 and Corollary 1.1 to derive the EIF for $ψ^{g}$ and to assess the existence of doubly robust estimators for $ψ^{g}$ indexed by an intervention distribution that depends on the observed treatment process when $J = 2$ with examples. However, this approach to deriving the EIF is cumbersome for large $J$ , providing no simplification over Theorem 2.

In the next section, we consider various estimators of $ψ^{g}$ under the multiplicative shift incremental propensity score interventions defined by (2).
7 Estimators of $ψ^{g}$ indexed by multiplicative shift incremental propensity score interventions

7.1 EIF-based estimators

Several EIF-based estimators for $ψ^{g}$ have been proposed for deterministic treatment interventions, including the standard one-step augmented IPW (AIPW) estimator, Bang and Robins' (2005) estimator,^6,34,33 the weighted ICE estimator^35,12 and the targeted maximum likelihood estimator (TMLE).^7,36,37 Unlike the other estimators, the one-step AIPW estimator that solves the empirical EIF does not guarantee sample-boundedness. Weighted ICE and TMLE are variations of Bang and Robins.⁶ Compared with the one-step AIPW estimator and Bang and Robins,⁶ weighted ICE can give better performance.² Unlike Bang and Robins⁶ and weighted ICE, the one-step AIPW estimator and TMLE can incorporate any machine learning algorithms² for both sets of nuisance functions. In the absence of machine learning algorithms, weighted ICE and TMLE perform similarly,² but weighted ICE is easier to implement. In this section, we will consider two estimators: (1) a weighted ICE estimator that uses parametric models to estimate the nuisance functions, thereby allowing for $J + 1$ model multiple robustness, and (2) a TMLE that also uses sample-splitting and cross-fitting^30,38,10 to allow one to incorporate machine learning algorithms to estimate the nuisance functions.

7.1.1 Weighted ICE estimator

Let $π_{j} \equiv f (A_{j} ∣ Y_{j} = 1, {\bar{A}}_{j - 1}, {\bar{L}}_{j})$ and let $π_{j} (α_{j}) = f (A_{j} ∣ Y_{j} = 1, {\bar{A}}_{j - 1}, {\bar{L}}_{j}; α_{j})$ be a working parametric model for $π_{j}$ with $α = (α_{0}, \dots, α_{J - 1})$ . Denote estimates ${\hat{π}}_{j} \equiv π_{j} ({\hat{α}}_{j})$ of $π_{j}$ with ${\hat{α}}_{j}$ the maximum likelihood estimate (MLE) of $α_{j}$ computed from the observed data. Subsequently, let ${\hat{q}}_{j}^{g} \equiv q_{j}^{g} ({\hat{π}}_{j})$ be an estimate of $q^{g} (A_{j} ∣ Y_{j} = 1, {\bar{A}}_{j - 1}, {\bar{L}}_{j})$ as defined in (2) for a choice of $δ \in [0, 1]$ , replacing the observed treatment process with the estimate ${\hat{π}}_{j}$ . Let ${\hat{Q}}_{j}$ be a working parametric model for $Q_{j}$ defined in Theorem 2. In the following algorithm, each ${\hat{T}}_{j}$ is calculated by replacing $Q_{j}$ in formula (10) with the estimate ${\hat{Q}}_{j}$ . The weighted ICE algorithm is specifically implemented as follows:

Algorithm 1:
Algorithm for Weighted ICE.

1. Compute the MLEs $\hat{α}$ of $α$ from the observed data. Set ${\hat{T}}_{J} = Y_{J}$ .

2. Recursively from $j = J - 1, \dots, 0$ :
A. Fit a regression model $Q_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1; θ_{j}) = expit \{θ_{j}^{T} ϕ_{j} ({\bar{L}}_{j}, {\bar{A}}_{j})\}$ for $E ({\hat{T}}_{j + 1} ∣ {\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1)$ where the score function for each observation is weighted by $\prod_{k = 0}^{j} ({\hat{q}}_{k}^{g} / {\hat{π}}_{k})$ in those who survive by time $j$ . Here, $ϕ_{j} ({\bar{L}}_{j}, {\bar{A}}_{j})$ is a known function of ${\bar{L}}_{j}$ and ${\bar{A}}_{j}$ . More specifically, we solve for $θ_{j}$ in the following estimating equation: $P_{n} \left[ Y_{j} \prod_{k = 0}^{j} \frac{{\hat{q}}_{k}^{g}}{{\hat{π}}_{k}} ϕ_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}) \{{\hat{T}}_{j + 1} - Q_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1; θ_{j})\} \right] = 0$

B. Compute ${\hat{T}}_{j}$ from ${\hat{Q}}_{j} \equiv Q_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}, {\bar{Y}}_{j}; {\hat{θ}}_{j})$ ensuring ${\hat{T}}_{j} = 0$ when $Y_{j} = 0$ .

3. Estimate ${\hat{ψ}}^{g} (δ)_{WICE} = P_{n} ({\hat{T}}_{0})$

where $P_{n} \{f (X)\} = n^{- 1} \sum_{i = 1}^{n} f (X_{i})$ . Following arguments in Section 6.2, this estimator is $J + 1$ model multiply robust. Note that we fit a weighted generalized linear model on ${\hat{T}}_{j + 1}$ by specifying a quasibinomial family with a logit link function (Papke and Wooldridge³⁹) and weights given by $\prod_{k = 0}^{j} ({\hat{q}}_{k}^{g} / {\hat{π}}_{k})$ in Step 2A. This type of regression is known as fractional logistic regression.
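The steps of the weighted ICE algorithm can be sketched in Python as follows. This is a minimal illustration under strong simplifying assumptions, not the implementation used for the results in this paper: there is a single binary covariate $L_{j}^{*}$ , no censoring, the working models at time $j$ use only current covariates, and the weighted fractional logistic regression is solved by a hand-written Newton-Raphson routine. All function names and the toy data-generating values are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logit(X, y, w=None, iters=50):
    """Weighted (fractional) logistic regression solved by Newton-Raphson."""
    w = np.ones(len(y)) if w is None else w
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = expit(X @ beta)
        grad = X.T @ (w * (y - mu))                       # weighted score
        hess = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess + 1e-10 * np.eye(len(beta)), grad)
    return beta

def weighted_ice(Lstar, A, Y, delta):
    """Lstar, A: (n, J) arrays; Y: (n, J + 1) survival indicators with Y[:, 0] = 1."""
    n, J = A.shape
    ones = np.ones(n)
    W = np.ones(n)                                        # running product of q_k^g / pi_k
    weights = []
    for j in range(J):                                    # Step 1: treatment models
        alive = Y[:, j] == 1
        X = np.column_stack([ones, Lstar[:, j]])
        alpha = fit_logit(X[alive], A[alive, j])
        p1 = expit(X @ alpha)
        pi = np.where(A[:, j] == 1, p1, 1.0 - p1)         # f(A_j | past)
        q = (1 - delta) * A[:, j] * Lstar[:, j] + pi * (Lstar[:, j] * delta + 1 - Lstar[:, j])
        W = W * np.where(alive, q / pi, 1.0)
        weights.append(W.copy())
    T = Y[:, J].astype(float)
    for j in range(J - 1, -1, -1):                        # Step 2: backward recursion
        alive = Y[:, j] == 1
        phi = np.column_stack([ones, Lstar[:, j], A[:, j]])
        theta = fit_logit(phi[alive], T[alive], weights[j][alive])      # Step 2A
        Q_obs = expit(phi @ theta)
        Q_trt = expit(np.column_stack([ones, Lstar[:, j], ones]) @ theta)
        T = (1 - delta) * Q_trt * Lstar[:, j] + Q_obs * (Lstar[:, j] * delta + 1 - Lstar[:, j])
        T = np.where(alive, T, 0.0)                       # Step 2B: T_j = 0 if Y_j = 0
    return T.mean()                                       # Step 3

# Toy data (hypothetical coefficients, three time points).
rng = np.random.default_rng(0)
n, J = 2000, 3
Lstar = np.zeros((n, J)); A = np.zeros((n, J)); Y = np.ones((n, J + 1))
for j in range(J):
    alive = Y[:, j] == 1
    Lstar[:, j] = alive * rng.binomial(1, 0.4, n)
    A[:, j] = alive * rng.binomial(1, expit(-1 + 0.5 * Lstar[:, j]))
    Y[:, j + 1] = alive * rng.binomial(1, expit(1 + A[:, j] - Lstar[:, j]))

print(weighted_ice(Lstar, A, Y, delta=0.5))
```

In this sketch, setting $δ = 1$ makes all weights equal to 1 and, because each outcome model contains an intercept, the estimate coincides with the empirical survival proportion at $J$ , which provides a simple sanity check.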

7.1.2 TMLE with sample-splitting and cross-fitting

This algorithm utilizes sample-splitting and cross-fitting to allow flexible machine learning algorithms for estimating nuisance functions while circumventing Donsker class conditions.^40,30 In Supplemental Appendix F, we prove the asymptotic normality of this estimator under the condition that the nuisance functions are estimated consistently at rates faster than $n^{- 1 / 4}$ when $ψ^{g}$ is indexed by the interventions (2).

Suppose that a sample of size $n$ is split into $M$ disjoint subsets. Let $S_{m}$ denote the subset of individuals in split $m = 1, \dots, M$ and let $S_{- m}$ denote individuals not in split $m$ (i.e. $S_{- m} = \{i : i \notin S_{m}\}$ ). Moreover, let ${\hat{π}}_{j}^{(- m)}$ , ${\hat{q}}_{j}^{g^{(- m)}}$ and ${\hat{Q}}_{j}^{(- m)}$ denote estimates of $π_{j}, q_{j}^{g}$ and $Q_{j}$ obtained by applying machine learning algorithms to individuals in $S_{- m}$ .
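The sample split described above can be sketched as follows. The helper `make_folds` is a hypothetical name, and random assignment of individuals to splits is an assumption of this sketch; it returns, for each $m$ , the index sets $S_{m}$ and $S_{- m}$ .

```python
import numpy as np

def make_folds(n, M, seed=0):
    """Randomly partition indices 0..n-1 into M disjoint splits.

    Returns a list of (S_m, S_minus_m) pairs of sorted index arrays.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    folds = np.array_split(perm, M)
    return [(np.sort(S_m), np.setdiff1d(perm, S_m)) for S_m in folds]

folds = make_folds(n=10, M=2)
for S_m, S_minus_m in folds:
    print(S_m, S_minus_m)
```

Nuisance functions would then be fitted on `S_minus_m` and evaluated on `S_m` , so that no individual contributes to both the fit and the evaluation within a split.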

Algorithm 2:
Algorithm for TMLE with sample-splitting and cross-fitting.

1. For each $m = 1, \dots, M$ :
A. For individuals in $S_{- m}$ : compute ${\hat{π}}_{j}^{(- m)}$ , $\forall j$ . Set ${\hat{T}}_{J} = Y_{J}$ .

B. Recursively from $j = J - 1, \dots, 0$ for individuals in $S_{- m}$ :
(a) Compute ${\hat{Q}}_{j}^{(- m)} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1)$ by regressing ${\hat{T}}_{j + 1}$ on $({\bar{L}}_{j}, {\bar{A}}_{j})$ in those alive at time $j$

(b) Compute ${\hat{T}}_{j}$ from ${\hat{Q}}_{j}^{(- m)} \equiv {\hat{Q}}_{j}^{(- m)} ({\bar{L}}_{j}, {\bar{A}}_{j}, {\bar{Y}}_{j})$ by formula (10), setting ${\hat{T}}_{j} = 0$ if $Y_{j} = 0$

C. For individuals in $S_{m}$ , set ${\hat{T}}_{J} = Y_{J}$ . Then recursively from $j = J - 1, \dots, 0$ :
(a) Solve for $γ_{j}$ in the following set of estimating equations: $P_{n}^{m} \left( Y_{j} \prod_{k = 0}^{j} \frac{{\hat{q}}_{k}^{g^{(- m)}}}{{\hat{π}}_{k}^{(- m)}} \left[ {\hat{T}}_{j + 1} - expit \{logit ({\hat{Q}}_{j}^{(- m)} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1)) + γ_{j}\} \right] \right) = 0$

(b) Compute ${\hat{T}}_{j}$ from ${\hat{Q}}_{j}^{Δ} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1) \equiv expit \{logit ({\hat{Q}}_{j}^{(- m)} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1)) + {\hat{γ}}_{j}\}$ if $Y_{j} = 1$ , otherwise set ${\hat{T}}_{j} = 0$ if $Y_{j} = 0$

2. Calculate ${\hat{ψ}}^{g} (δ)_{TMLE} = \frac{1}{M} \sum_{m = 1}^{M} P_{n}^{m} ({\hat{T}}_{0})$

Here $P_{n}^{m} \{f (X)\} = \frac{1}{∣ S_{m} ∣} \sum_{i \in S_{m}} f (X_{i})$ where $∣ S_{m} ∣ = n / M$ is the cardinality of $S_{m}$ . Note that in Step 1C(a), we fit a weighted generalized linear model on ${\hat{T}}_{j + 1}$ with weights given by $\prod_{k = 0}^{j} [{\hat{q}}_{k}^{g^{(- m)}} / {\hat{π}}_{k}^{(- m)}]$ and an offset given by $logit ({\hat{Q}}_{j}^{(- m)} ({\bar{L}}_{j}, {\bar{A}}_{j}, Y_{j} = 1))$ among survivors in $S_{m}$ at time $j$ .
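The targeting step, which solves for $γ_{j}$ in a weighted intercept-only logistic regression with an offset, can be sketched as a one-dimensional root search. The sketch below uses bisection (the estimating function is monotone in $γ_{j}$ ) rather than a generalized linear model routine; the function names, the clipping constant and the toy inputs are hypothetical choices made for illustration.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_gamma(T_next, Q_init, w, lo=-30.0, hi=30.0, iters=200):
    """Solve sum_i w_i [T_i - expit{logit(Q_i) + gamma}] = 0 for gamma by bisection.

    T_next, Q_init and w are arrays restricted to survivors at time j in S_m.
    """
    offset = logit(np.clip(Q_init, 1e-6, 1.0 - 1e-6))

    def score(gamma):
        return np.sum(w * (T_next - expit(offset + gamma)))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:      # the score decreases in gamma, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def update_Q(Q_init, gamma):
    """Fluctuated outcome regression expit{logit(Q_init) + gamma}."""
    return expit(logit(np.clip(Q_init, 1e-6, 1.0 - 1e-6)) + gamma)

# Toy inputs: initial predictions of 0.5, unit weights, three of four pseudo-outcomes equal to 1.
T_next = np.array([1.0, 0.0, 1.0, 1.0])
Q_init = np.full(4, 0.5)
w = np.ones(4)
gamma_hat = fit_gamma(T_next, Q_init, w)
Q_star = update_Q(Q_init, gamma_hat)
print(gamma_hat, Q_star)
```

After the update, the weighted mean of the fluctuated predictions matches the weighted mean of the pseudo-outcomes, which is the property the estimating equation in Step 1C(a) enforces.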

7.2 Singly robust estimators

We also consider less optimal but computationally simple singly robust estimators of $ψ^{g}$ indexed by (2). An IPW estimator ${\hat{ψ}}_{IPW}^{g} (δ)$ can be obtained by the product ${\hat{ψ}}_{IPW}^{g} (δ) = \prod_{j = 0}^{J - 1} {\hat{Υ}}_{IPW, j}^{g} (δ)$ , where ${\hat{Υ}}_{IPW, j}^{g} (δ)$ can be interpreted as an estimate of one minus the discrete hazard at $j$ (i.e. the conditional probability of surviving through $j + 1$ given survival through $j$ ) under a stochastic strategy $g$ where treatment assignment is a draw from (2), given the identifying conditions of Section 5. Each ${\hat{Υ}}_{IPW, j}^{g} (δ)$ can be obtained by solving for $Υ_{IPW, j}^{g} (δ)$ in the following estimating equations:

$P_{n} \left[ Y_{j} \prod_{k = 0}^{j} \frac{{\hat{q}}_{k}^{g}}{{\hat{π}}_{k}} \{Y_{j + 1} - Υ_{IPW, j}^{g} (δ)\} \right] = 0$

which can be estimated using a weighted generalized linear model on $Y_{j + 1}$ with weights given by $\prod_{k = 0}^{j} ({\hat{q}}_{k}^{g} / {\hat{π}}_{k})$ . Interestingly, it can also be shown that ${\hat{ψ}}_{IPW}^{g} (δ)$ can be obtained as a special case of the algorithm for weighted ICE above where $ϕ_{j} ({\bar{L}}_{j}, {\bar{A}}_{j}) = 1$ for all $j$ .⁴¹ Alternatively, the singly robust ICE estimator, which we will denote ${\hat{ψ}}_{ICE}^{g} (δ)$ , can be obtained as a special case of the algorithm for weighted ICE above where the observational weights are set to 1.
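The IPW estimator has a closed form, since each estimating equation is solved by a weighted proportion. The following Python sketch computes the product of these weighted proportions; it takes the cumulative weights as given, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def ipw_survival(W, Y):
    """Product over j of the weighted proportion surviving to j + 1 among those alive at j.

    W: (n, J) array whose column j holds prod_{k <= j} q_k^g / pi_k.
    Y: (n, J + 1) array of survival indicators with Y[:, 0] = 1.
    """
    n, J = W.shape
    psi = 1.0
    for j in range(J):
        num = np.sum(Y[:, j] * W[:, j] * Y[:, j + 1])
        den = np.sum(Y[:, j] * W[:, j])
        psi *= num / den          # solves the estimating equation at time j
    return psi

# Toy check with unit weights: the product telescopes to the survival proportion at J.
Y = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]], dtype=float)
W = np.ones((4, 2))
psi_hat = ipw_survival(W, Y)
print(psi_hat)
```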

7.3 Censoring

In studies with censoring, the identification arguments in Section 5 extend straightforwardly by implicitly including in $g$ a hypothetical intervention that eliminates censoring throughout follow-up,⁴² with corresponding extensions of the g-formula $ψ^{g}$ , the properties of its EIF and the associated estimation procedures. Briefly, denote $C_{j}$ as the indicator of censoring by time $j$ and adopt the order $(L_{j}, A_{j}, C_{j + 1}, Y_{j + 1})$ . Accommodating censoring in the singly robust weighted estimators and the various EIF-based estimators considered requires, in addition to estimating $α_{j}$ in $f (A_{j} ∣ Y_{j} = 1, {\bar{A}}_{j - 1}, {\bar{L}}_{j}, C_{j} = 0; α_{j})$ , also estimating $α_{j}^{c}$ in $P (C_{j + 1} = 1 ∣ C_{j} = 0, {\bar{A}}_{j}, {\bar{L}}_{j}, Y_{j} = 1; α_{j}^{c})$ for $j = 0, \dots, J - 1$ with $α^{c} = (α_{0}^{c}, \dots, α_{J - 1}^{c})$ . Further details of modifications to the weighted ICE and TMLE to accommodate censoring are provided in Supplemental Appendix G.

8 Simulation studies

We conducted two different simulation studies. The first simulation study aims to compare the performance of the weighted ICE, IPW, and ICE estimators when the nuisance functions are estimated through parametric models under various model misspecification scenarios. The second simulation study aims to compare the performance of TMLE with sample-splitting and cross-fitting, IPW and ICE when the nuisance functions are estimated through machine learning algorithms.

8.1 Simulation study 1: Using parametric models

In this simulation study, we compare the performance of the weighted ICE estimator with the singly robust estimators (IPW and ICE estimators) for $ψ^{g}$ indexed by the intervention distribution (2) which, under identifying conditions discussed in Section 5, equals the cumulative probability of survival at $J$ under an intervention that increases the probability of treatment initiation in those with $L_{j}^{*} = 1$ as a function of $δ$ . Recall that this increase is defined such that decreasing values of $δ$ correspond to an increasing probability of treatment initiation (with $δ = 1$ coinciding with no treatment intervention).

We simulated 1000 samples of $n = (500, 1000, 2500)$ individuals selecting $J = 5$ and $δ = (0.75, 0.50, 0.25)$ . We simulated the following variables: $(L_{0}, A_{0}, C_{1}, Y_{1}, L_{1}, A_{1}, \dots, C_{5}, Y_{5})$ , where $L_{j} = (L_{j}^{*}, L_{1 j}, L_{2 j})$ is the vector of measured confounders. Specifically, we generated $L_{0}^{*}$ and $L_{10} \sim Ber \{expit (- 1)\}$ , and $L_{20} \sim Ber \{expit (1 + L_{0}^{*})\}$ . The censoring indicator at each time $j$ ( $j = 1, \dots, 5$ ) was simulated from $C_{j} \sim Ber \{expit (- 2 + L_{1 j} - L_{2 j})\}$ if $C_{j - 1} = 0$ and $Y_{j} = 1$ . The outcome at each time $j$ ( $j = 1, \dots, 5$ ) was simulated from $Y_{j} \sim Ber \{expit (1 + 3 A_{j - 1} - 2 L_{j - 1}^{*} + L_{1, j - 1} - L_{2, j - 1})\}$ if $Y_{j - 1} = 1$ and $C_{j} = 0$ . The time-varying confounders at time $j$ ( $j = 1, \dots, 4$ ) were simulated from $L_{j}^{*} \sim Ber \{expit (- 1 - A_{j - 1} + L_{j - 1}^{*} - L_{1, j - 1} + L_{2, j - 1})\}$ , $L_{1 j} \sim Ber \{expit (- 1 + A_{j - 1} + L_{1, j - 1} - L_{2, j - 1})\}$ and $L_{2 j} \sim Ber \{expit (1 + A_{j - 1} + L_{j}^{*} + L_{2, j - 1})\}$ if $Y_{j} = 1$ . Treatment at time $j$ ( $j = 0, \dots, 4$ ) was simulated from $A_{j} \sim Ber \{expit (- 1 - 2 L_{j}^{*} - L_{1 j} + L_{2 j} + 2 A_{j - 1})\}$ if $Y_{j} = 1$ . In addition $(Y_{j}, L_{j}, A_{j}, \dots) = (\emptyset, \emptyset, \emptyset, \dots)$ if $C_{j} = 1$ , and $(L_{j}, A_{j}, C_{j + 1}, \dots) = (\emptyset, \emptyset, \emptyset, \dots)$ if $Y_{j} = 0$ .

The true cumulative probabilities of survival were calculated by using the true parametric models to generate a Monte Carlo sample of size $10^{7}$ under all interventions of interest. Our selection of parameters resulted in a scenario where selecting smaller $δ$ (i.e. interventions with larger increases in the probability of treatment initiation at each $j$ ) improves survival.
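A Monte Carlo calculation of this kind can be sketched in Python as follows. The sketch is only illustrative and makes several assumptions that are not stated in the text: treatment before baseline is taken to be zero ( $A_{- 1} = 0$ ), the lagged term in the model for $L_{j}^{*}$ is read as $L_{j - 1}^{*}$ , censoring is eliminated by simply not generating it, treatment among those with $L_{j}^{*} = 1$ is drawn with non-initiation probability equal to $δ$ times the observed one, and a much smaller Monte Carlo sample than $10^{7}$ is used. The function name is hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def survival_under_intervention(delta, n=200_000, J=5, seed=1):
    """Monte Carlo survival probability at J under the multiplicative shift with a given delta."""
    rng = np.random.default_rng(seed)
    Ls = rng.binomial(1, expit(-1.0), n)            # L_0^*
    L1 = rng.binomial(1, expit(-1.0), n)            # L_{10}
    L2 = rng.binomial(1, expit(1.0 + Ls))           # L_{20}
    A_prev = np.zeros(n)                            # assumed A_{-1} = 0
    alive = np.ones(n, dtype=bool)
    for j in range(J):
        pi1 = expit(-1 - 2 * Ls - L1 + L2 + 2 * A_prev)      # observed P(A_j = 1 | past)
        # Intervention: shrink the non-initiation probability by delta when L_j^* = 1.
        p_treat = np.where(Ls == 1, 1 - delta * (1 - pi1), pi1)
        A = rng.binomial(1, p_treat)
        Y_next = rng.binomial(1, expit(1 + 3 * A - 2 * Ls + L1 - L2))
        alive &= Y_next == 1                        # survival is cumulative
        if j < J - 1:
            Ls_new = rng.binomial(1, expit(-1 - A + Ls - L1 + L2))
            L1_new = rng.binomial(1, expit(-1 + A + L1 - L2))
            L2_new = rng.binomial(1, expit(1 + A + Ls_new + L2))
            Ls, L1, L2, A_prev = Ls_new, L1_new, L2_new, A
    return alive.mean()

for d in (1.0, 0.75, 0.50, 0.25):
    print(d, survival_under_intervention(d))
```

Covariates are generated for all simulated individuals regardless of survival status, which does not affect the result because only those who survive every interval contribute to the final proportion.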

We considered three estimation scenarios for each choice of $δ$ and sample size such that (1) all models are correctly specified, (2) only the outcome regression models are correctly specified, and (3) only the treatment (propensity score) and censoring models are correctly specified. The true functional forms of the treatment and censoring models are known under our simulation because treatment and censoring were generated to only depend on past measured variables. Similarly, the functional form of the outcome regression model for $Q_{J - 1} = E (Y_{J} ∣ Y_{J - 1} = 1, C_{J} = 0, {\bar{A}}_{J - 1}, {\bar{L}}_{J - 1})$ is known due to the absence of unmeasured common causes. However, the true functional forms of the outcome regressions $Q_{j}$ for $0 \leq j < J - 1$ are not known under our simulation. To ensure correctly specified models for $Q_{j}$ , $0 \leq j < J - 1$ , saturated models were fit, that is, all main terms and interaction terms for $(A_{j}, L_{j}^{*}, L_{1 j}, L_{2 j})$ . In scenarios with misspecified models, at each time $j$ , the misspecified treatment model ignores the censoring process and excludes $A_{j - 1}$ in the model, and the misspecified outcome regression model excludes any pairwise interactions between the covariates and treatment.

Figure 1 compares performance of the three estimators of $ψ^{g}$ indexed by (2) for $δ = (0.75, 0.50, 0.25)$ . Complementary results are given in Tables 3 to 5 in Supplemental Appendix H. As expected, all estimators were nearly unbiased under correctly specified models. Under our model misspecification scenarios, ${\hat{ψ}}^{g} (δ)_{WICE}$ is nearly unbiased, but the IPW estimator is biased when the treatment models are misspecified, and the ICE estimator is biased when the outcome models are misspecified. In addition, under correctly specified models ${\hat{ψ}}^{g} (δ)_{ICE}$ is the most efficient, and ${\hat{ψ}}^{g} (δ)_{IPW}$ is the least efficient estimator. Interestingly, the simulation results show that ${\hat{ψ}}^{g} (δ)_{WICE}$ has smaller mean squared error (MSE) than the IPW estimator in all scenarios.

Figure 1

Results for the simulation study 1.

The simulation results also show that as $δ$ decreases, the standard error (and MSE) in all three estimators increases. This is due to an increase in the effect of near positivity violations as $δ$ nears zero. In fact, we would expect all three estimators to have the largest standard errors when $δ = 0$ , which is equivalent to a strategy that treats all individuals with $L_{j}^{*} = 1$ at all times. In Supplemental Appendix H, we also show $J + 1$ model robustness of our weighted ICE estimator in a model misspecification scenario that requires more than model double robustness.

8.2 Simulation study 2: Using machine learning methods

In the second simulation, we compare the performance of algorithms that use machine learning to estimate the nuisance functions for $ψ^{g}$ indexed by (2) with $J = 5$ . Specifically, we compare TMLE with sample-splitting and cross-fitting, IPW and ICE. Given much longer computation times, we limited consideration to one choice of $δ = 0.5$ . Unlike Simulation 1, we add model complexity to the data-generating mechanism by considering continuous covariates, which might mimic real-life data more closely. We simulated 1000 hypothetical cohorts of $n = (250, 500, 1000, 2500)$ comprising the following variables: $(L_{0}, A_{0}, C_{1}, Y_{1}, L_{1}, A_{1}, \dots, C_{5}, Y_{5})$ , where $L_{j} = (L_{1 j}, L_{j}^{*})$ . In addition, $L_{0} = (L_{0}^{1}, L_{0}^{2}, L_{10}, L_{0}^{*})$ , where $L_{0}^{1}$ and $L_{0}^{2}$ are baseline covariates. In particular, $L_{0}^{1} \sim Ber (0.5)$ , $L_{0}^{2} \sim N (0, 1)$ , $L_{10} \sim N (2 + L_{0}^{1}, 1)$ and $L_{0}^{*} \sim Ber \{expit (1.5 - 0.5 L_{10} + L_{0}^{1} + 0.25 L_{0}^{2})\}$ . For $j \geq 1$ , $L_{1 j} \sim N (2 + A_{j - 1} - L_{j - 1}^{*} + 0.5 L_{1, j - 1} + L_{0}^{1}, 1)$ and $L_{j}^{*} \sim Ber \{expit (1.5 - A_{j - 1} - 0.5 L_{1 j} + L_{j - 1}^{*} + L_{0}^{1} + 0.25 L_{0}^{2})\}$ if $Y_{j} = 1$ . The censoring indicator at each time $j$ ( $j = 1, \dots, 5$ ) is simulated from $C_{j} \sim Ber [expit \{- 4 - A_{j - 1} - L_{j - 1}^{*} - 0.5 \sqrt{∣ L_{1, j - 1} L_{0}^{2} ∣} + 1.5 ∣ L_{1, j - 1} ∣ / (1 + \exp (L_{0}^{2}))\}]$ if $C_{j - 1} = 0$ and $Y_{j} = 1$ . The outcome at each time $j$ ( $j = 1, \dots, 5$ ) is simulated from $Y_{j} \sim Ber [expit \{- 1 + 2 A_{j - 1} - 2 L_{j - 1}^{*} + 0.25 L_{j - 1}^{*} L_{1, j - 1} + 0.5 L_{0}^{1} + 0.75 ∣ L_{1, j - 1} + L_{0}^{2} ∣^{1.5}\}]$ if $Y_{j - 1} = 1$ and $C_{j} = 0$ . Treatment at time $j$ ( $j = 0, \dots, 4$ ) is simulated from $A_{j} \sim Ber \{expit (- 3 + L_{j}^{*} - 0.5 L_{1 j} + 0.25 L_{j}^{*} L_{1 j} + 0.5 L_{0}^{1} + 0.25 L_{0}^{2} + 0.5 ∣ L_{0}^{2} ∣)\}$ if $Y_{j} = 1$ and $A_{j - 1} = 0$ , and is set to $1$ if $Y_{j} = 1$ and $A_{j - 1} = 1$ .

Nuisance functions were estimated using the Super Learner ensemble, which uses cross-validation to select the best convex combination of predictions from a pool of prediction algorithms.⁴³ The library of potential candidates used here consisted of generalized linear models and their variants (SL.glm, SL.glm.interaction), Bayesian generalized linear models (SL.bayesglm), generalized additive models with smoothing splines (SL.gam), multivariate adaptive regression splines (SL.earth), neural networks (SL.nnet), and random forests (SL.ranger).

Table 1 compares the performance of the three estimators. The ICE and IPW estimators show bias as they are not expected to converge at $\sqrt{n}$ rates when machine learning is used for nuisance parameter estimation. TMLE, on the other hand, shows little to no bias in all instances. This agrees with the theory as TMLE allows the nuisance functions to converge at slower nonparametric rates. The results show that even though some of the learners in the Super Learner ensemble (e.g. neural networks and random forest) may not converge at the required $n^{- 1 / 4}$ rate, other learners in the Super Learner ensemble converged to the truth at sufficiently fast rates. Moreover, the estimated coverage probability of the confidence intervals (CIs) for TMLE based on the asymptotic variance (see Supplemental Appendix F) is very close to the nominal 95%: $(94.7, 96.2, 95.2, 94.4)$ for $n = (250, 500, 1000, 2500)$ , respectively.

Table 1
Simulation study 2 for proposed treatment intervention distribution and incorporating machine learning algorithms ( $M = 2$ ). The true probability of survival at time 5 is $0.629$ . All values are multiplied by 100.

	$n = 250$			$n = 500$			$n = 1000$			$n = 2500$
Estimator	BIAS	SE	RMSE	BIAS	SE	RMSE	BIAS	SE	RMSE	BIAS	SE	RMSE
${\hat{ψ}}^{g} (δ)_{ICE}$	$-1.50$	4.35	4.61	$-0.82$	2.83	2.95	$-0.47$	2.14	2.19	$-0.16$	1.38	1.39
${\hat{ψ}}^{g} (δ)_{IPW}$	$-1.50$	4.91	5.13	$-1.50$	3.32	3.64	$-1.35$	2.60	2.93	$-1.10$	1.71	2.03
${\hat{ψ}}^{g} (δ)_{TMLE}$	$-0.19$	5.79	5.79	$-0.09$	3.61	3.62	$-0.07$	2.59	2.59	0.03	1.65	1.65

SE: standard error; RMSE: root mean square error; ICE: iterated conditional expectation; IPW: inverse probability weighted; TMLE: targeted maximum likelihood estimator.
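The three columns reported for each sample size are linked by the usual decomposition $RMSE = \sqrt{BIAS^{2} + SE^{2}}$ . As a quick consistency check, the Python sketch below recomputes the RMSE from the reported bias and SE for every cell of Table 1 and reports the largest discrepancy, which should be no larger than what rounding to two decimals can produce. The dictionary simply transcribes the table; the variable names are hypothetical.

```python
import math

# (BIAS, SE, RMSE) for n = 250, 500, 1000, 2500, transcribed from Table 1 (values x 100).
table1 = {
    "ICE": [(-1.50, 4.35, 4.61), (-0.82, 2.83, 2.95), (-0.47, 2.14, 2.19), (-0.16, 1.38, 1.39)],
    "IPW": [(-1.50, 4.91, 5.13), (-1.50, 3.32, 3.64), (-1.35, 2.60, 2.93), (-1.10, 1.71, 2.03)],
    "TMLE": [(-0.19, 5.79, 5.79), (-0.09, 3.61, 3.62), (-0.07, 2.59, 2.59), (0.03, 1.65, 1.65)],
}

def rmse(bias, se):
    """Root mean square error implied by a bias and a standard error."""
    return math.sqrt(bias ** 2 + se ** 2)

# Largest absolute gap between the recomputed and the reported RMSE.
max_gap = max(abs(rmse(b, s) - r) for rows in table1.values() for b, s, r in rows)
print(max_gap)
```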

9 Application

We illustrate the application of the estimators discussed in Section 7 using electronic health record data from the Cambridge Health Alliance, a large community healthcare system in Eastern Massachusetts, to estimate the effects of increasing PrEP uptake on bacterial STI diagnosis by time $J$ . In the analysis, baseline covariates $L_{0}$ included age and calendar year at baseline and race/ethnicity, and time-varying covariates $L_{j}$ included indicators of any ambulatory encounter, HIV, any HIV testing, and any STI testing. $A_{j}$ is the indicator of PrEP initiation during time $j$ , and $Y_{j}$ is the indicator of not receiving an STI diagnosis by time $j$ .

Specifically, we consider multiplicative shift interventions that, beginning at the time of an HIV-negative test, successfully increase the proportion initiating PrEP in each follow-up week $j$ only among those receiving an STI test and having no prior diagnosis of HIV at time $j$ . Thus, $L_{j}^{*}$ is the indicator of receiving an STI test and having no prior diagnosis of HIV at time $j$ . We intervene on individuals with $L_{j}^{*} = 1$ (being tested for STIs and being HIV-free at time $j$ ) because STI testing suggests recent condomless sex and because PrEP would not be used after an HIV diagnosis. No intervention is made for the remainder of the population at time $j$ ( $L_{j}^{*} = 0$ ). Increases in treatment uptake under these interventions are quantified by a specified $δ \in [0, 1]$ as defined in (2), which quantifies the factor by which the probability of treatment non-initiation is decreased (relative to no intervention) at $j$ . We consider $J = 26$ weeks and, as in simulation study 1, consider three values of $δ$ , here $δ = (0.95, 0.85, 0.75)$ , representing realistic interventions that result in “low,” “medium,” and “high” success in PrEP uptake relative to no intervention. We use $δ = 1$ (corresponding to no intervention) as the reference in defining causal effects.

Our analytic dataset was restricted to patients who met all the following inclusion criteria at some point during 2012 to 2017: (1) cis male with a report of male sex partner(s); (2) 15 years of age or older; (3) an HIV-negative test; (4) no PrEP prescription in the 3 months prior to baseline; and (5) no STI diagnosis in the 12 months prior to baseline. The baseline (week $j = 0$ ) for an individual was defined as the first week in which all of these inclusion criteria were met. For simplicity, we excluded one individual who met these criteria but died without a bacterial STI diagnosis during the 26-week follow-up period. Our final analytic dataset consisted of $n = 1103$ individuals. As expected, few initiated PrEP over the follow-up (cumulatively 5.1% over the 26 weeks). The cumulative proportion of those receiving an STI test while being HIV-free over the 26 weeks was 70.7%. Note that no individual was treated as censored in this analysis, which requires the additional assumption that no individual included at baseline sought medical care outside of the Cambridge Health Alliance over the 26-week follow-up.

We used the Super Learner ensemble (with the same potential candidates as in the simulation) to estimate all nuisance functions for TMLE with $M = 5$ . We compared these results with the IPW, ICE, and weighted ICE estimators described in this paper where the nuisance functions are specified by parametric models. CIs for each of the methods are obtained from 1000 bootstrap samples by taking the 2.5th and 97.5th percentiles of the resulting estimates.
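The percentile bootstrap used for the CIs can be sketched generically in Python as follows. Here `estimator` stands for any of the four estimation procedures applied to a resampled dataset; the function name, its arguments and the toy example based on a sample mean are hypothetical and are used only to keep the sketch self-contained.

```python
import numpy as np

def percentile_ci(data, estimator, B=1000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample individuals (rows) with replacement B times."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)      # bootstrap sample of individuals
        stats[b] = estimator(data[idx])
    a = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [a, 1.0 - a]) # 2.5th and 97.5th percentiles for level 0.95
    return lo, hi

# Toy example: CI for the mean of a simulated binary outcome.
toy = np.random.default_rng(42).binomial(1, 0.9, size=500).astype(float)
lo, hi = percentile_ci(toy, np.mean)
print(lo, hi)
```

In a longitudinal analysis, resampling should be done at the level of individuals so that each person's full follow-up record stays together, and every nuisance model should be refitted within each bootstrap sample.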

Our estimate of the probability of not receiving an STI diagnosis under no intervention by 26-week follow-up ( $δ = 1$ ) was 93.7%. Table 2 shows results from the four methods for $δ < 1$ . In this case, point estimates from all of the methods are similar. The results do not provide sufficient evidence that increasing PrEP uptake increases the risk of STI diagnosis. For instance, compared with no intervention ( $δ = 1$ ), the relative survival estimates under low, medium, and high increases in PrEP uptake were 0.99 (95% CI: 0.96, 1.01), 0.97 (95% CI: 0.91, 1.01) and 0.96 (95% CI: 0.87, 1.02), respectively, under ${\hat{ψ}}^{g} (δ)_{TMLE}$ . The relative survival estimates using other estimators were very similar (see Supplemental Appendix I). We also note that due to an increase in the presence of near positivity violations as $δ$ nears zero, observational weights calculated under smaller $δ$ were more variable than those under larger $δ$ (see Supplemental Appendix I). We would expect standard errors from all of the estimators to be the largest for $δ = 0$ .

Table 2
Point estimates and 95% CIs from analysis of MSM from the Cambridge Health Alliance on the effect of incremental PrEP initiation on incident STI diagnosis. All values are multiplied by 100.

	${\hat{ψ}}^{g} (δ)_{TMLE}$ (with ML)		${\hat{ψ}}^{g} (δ)_{WICE}$		${\hat{ψ}}^{g} (δ)_{ICE}$		${\hat{ψ}}^{g} (δ)_{IPW}$
$↑$ in PrEP	Est.	95% CI	Est.	95% CI	Est.	95% CI	Est.	95% CI
Low	92.9	(89.3, 95.2)	93.0	(90.9, 94.9)	92.9	(91.1, 94.6)	92.9	(90.5, 94.9)
Medium	91.4	(84.8, 95.5)	91.6	(87.1, 95.1)	91.4	(88.5, 94.1)	91.3	(85.8, 95.0)
High	90.9	(80.9, 95.8)	90.3	(83.2, 95.4)	90.0	(85.8, 93.8)	89.9	(80.9, 95.3)

CI: confidence interval; MSM: men who have sex with men; PrEP: pre-exposure prophylaxis; STI: sexually transmitted infection; ML: machine learning; TMLE: targeted maximum likelihood estimator; ICE: iterated conditional expectation; WICE: weighted ICE; IPW: inverse probability weighted.

10 Discussion

Many methods have been proposed for estimating causal estimands in time-varying treatment settings for survival analysis, and among these methods are estimators that offer protection against model misspecification and can also attain the semiparametric efficiency bound. However, most of these doubly robust estimators have been in the setting of deterministic treatment interventions. In this paper, we provided some sufficient conditions for the existence of doubly robust estimators when a treatment intervention distribution can depend on the observed treatment process for point treatment processes. We also discussed a class of intervention distributions that are always guaranteed to give doubly/multiply robust estimators and gave a general form of the EIFs that are associated with these intervention distributions. Among these intervention distributions is our multiplicative shift incremental propensity score intervention distribution, which aims to increase treatment uptake in a group of individuals who are at high risk of the outcome but have low exposure to treatment. We provided various estimators that can be used for our proposed treatment intervention for both parametric and machine learning algorithms.

We conducted two simulation studies for our proposed multiplicative shift intervention distribution. Our first study shows that even in finite sample settings, the weighted ICE is more robust to model misspecification than IPW and ICE when the nuisance functions are estimated using parametric models. Our second study shows that TMLE with sample-splitting and cross-fitting can outperform singly robust estimators when machine learning algorithms are used. Indeed, the TMLE with sample-splitting and cross-fitting is consistent as long as the nuisance functions are estimated consistently at fast enough rates using machine learning methods, which may not necessarily be $n^{- 1 / 2}$ . We also illustrated an application of our estimators to a real-world dataset in the PrEP context.

Note that our proposed intervention treatment distribution (2) is guaranteed under a stochastic intervention such that treatment initiation status at time $j$ for an individual is a random draw from (2). The identifying conditions reviewed in Section 5 are sufficient to give our effect estimates this interpretation. A more policy-relevant interpretation might, for example, be an intervention where individuals with $L_{j}^{*} = 1$ are always offered PrEP counseling. In these individuals, the intervention distribution (2) quantifies the hypothesized “success” of such an intervention, where $g$ in this case really refers to a deterministic strategy relative to the unmeasured treatment “offered PrEP counseling.” Additional assumptions are needed to give our effect estimates this interpretation, following similar arguments to those given in Richardson and Robins²⁰ and Young et al.¹⁸

Finally, we note that while machine learning algorithms are more robust to model form misspecification, they are also computationally complex and may be practically infeasible for very large datasets without powerful computing systems. Even though the Super Learner ensemble performed well in the simulation study as it included learners that converged to the truth at fast enough rates, in real-world applications there is no guarantee that such learners exist. When more flexible learners such as neural networks or random forests are required, it is unclear if these machine learning methods will exhibit more or less bias compared with parametric models. Moreover, issues related to data privacy make access to advanced computational resources impossible in many cases. Therefore, estimators that offer model double/multiple robustness are useful in practice as they offer protection against model misspecification and can be easily computed using standard off-the-shelf regression software in R.

Supplemental Material

Supplemental material, sj-pdf-1-smm-10.1177_09622802221146311 for Intervention treatment distributions that depend on the observed treatment process and model double robustness in causal survival analysis by Lan Wen, Julia L. Marcus and Jessica G. Young in Statistical Methods in Medical Research

Footnotes

Acknowledgement

The author(s) thank Dr Gerard Coste at Cambridge Health Alliance for assistance with collection of clinical data.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by NIAID grant R21AI143386-01A1.

Supplemental material

Supplemental material for this article is available online.

References

1. Robins. A new approach to causal inference in mortality studies with a sustained exposure period: Application to control of the healthy worker survivor effect. Math Model 1986; 7: 1393–1512.

2. Tran, Yiannoutsos, Wools-Kaloustian, et al. Double robust efficient estimators of longitudinal treatment effects: Comparative performance in simulations and a case study. Int J Biostat 2019; 15.

3. Wen, Young, Robins, et al. Parametric g-formula implementations for causal survival analyses. Biometrics 2021; 77.

4. Cain, Robins, Lanoy, et al. When to start treatment? A systematic approach to the comparison of dynamic regimes using observational data. Int J Biostat 2010; 6: 1–24.

5. Neugebauer, Schmittdiel, Van Der Laan. Targeted learning in real-world comparative effectiveness research with time-varying interventions. Stat Med 2014; 33: 2480–2520.

6. Bang, Robins. Doubly robust estimation in missing data and causal inference models. Biometrics 2005; 61: 962–973.

7. Van Der Laan, Rose. Targeted learning: Causal inference for observational and experimental data. New York: Springer Science & Business Media, 2011.

8. Robins, Tchetgen, et al. Quadratic semiparametric von Mises calculus. Metrika 2009; 69: 227–247.

9. Robins, Tchetgen, et al. Asymptotic normality of quadratic estimators. Stoch Process Appl 2016; 126: 3733–3759.

10. Chernozhukov, Chetverikov, Demirer, et al. Double/debiased machine learning for treatment and structural parameters, 2018.

11. Stitelman, Van Der Laan. A general implementation of TMLE for longitudinal data applied to causal inference in survival analysis. Int J Biostat 2012; 8: 1–37.

12. Rotnitzky, Robins, Babino. On the multiply robust estimation of the mean of the g-functional. arXiv preprint arXiv:1705.08582, 2017.

13. Taubman, Robins, Mittleman, et al. Intervening on risk factors for coronary heart disease: An application of the parametric g-formula. Int J Epidemiol 2009; 38: 1599–1611.

14. Muñoz, Van Der Laan. Population intervention causal effects based on stochastic interventions. Biometrics 2012; 68: 541–549.

15. Haneuse, Rotnitzky. Estimation of the effect of interventions that modify the received treatment. Stat Med 2013; 32: 5260–5277.

16. Kennedy. Nonparametric causal effects based on incremental propensity score interventions. J Am Stat Assoc 2019; 114: 645–656.

17. Young, Logan, Robins, et al. Inverse probability weighted estimation of risk under representative interventions in observational studies. J Am Stat Assoc 2019; 114: 938–947.

18. Young, Hernán, Robins. Identification, estimation and approximation of risk under interventions that depend on the natural value of treatment using observational data. Epidemiol Method 2014; 3: 1–19.

19. Díaz, Williams, Hoffman, et al. Non-parametric causal effects based on longitudinal modified treatment policies. arXiv preprint arXiv:2006.01366, 2020.

20. Richardson, Robins. Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Center for the Statistics and the Social Sciences, University of Washington Series Working Paper, 2013; 128.

21. Rosenbaum, Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.

22. Grant, Lama, Anderson, et al. Preexposure chemoprophylaxis for HIV prevention in men who have sex with men. N Engl J Med 2010; 363: 2587–2599.

23. Molina, Capitant, Spire, et al. On-demand preexposure prophylaxis in men at high risk for HIV-1 infection. N Engl J Med 2015; 373: 2237–2246.

24. McCormack, Dunn, Desai, et al. Pre-exposure prophylaxis to prevent the acquisition of HIV-1 infection (PROUD): Effectiveness results from the pilot phase of a pragmatic open-label randomised trial. Lancet 2016; 387: 53–60.

25. Quinn, Christenson, Sawkin, et al. The unanticipated benefits of PrEP for young black gay, bisexual, and other men who have sex with men. AIDS Behav 2020; 24: 1376–1388.

26. Gamarel, Golub. Intimacy motivations and pre-exposure prophylaxis (PrEP) adoption intentions among HIV-negative men who have sex with men (MSM) in romantic relationships. Ann Behav Med 2015; 49: 177–186.

27. Marcus, Katz, Krakower, et al. Risk compensation and clinical decision making: the case of HIV preexposure prophylaxis. N Engl J Med 2019; 380: 510.

28. Jenness, Weiss, Goodreau, et al. Incidence of gonorrhea and chlamydia following human immunodeficiency virus preexposure prophylaxis among men who have sex with men: A modeling study. Clin Infect Dis 2017; 65: 712–718.

29. Newey. The asymptotic variance of semiparametric estimators. Econ: J Econ Soc 1994; 62: 1349–1382.

30. Van Der Vaart. Asymptotic statistics, vol. 3. Cambridge: Cambridge University Press, 2000.

31. Tsiatis. Semiparametric theory and missing data. 1st ed. New York, NY: Springer, 2006.

32. Molina, Rotnitzky, Sued, et al. Multiple robustness in factorized likelihood models. Biometrika 2017; 104: 561–581.

33. Robins. Robust estimation in sequentially ignorable missing data and causal inference models. In: Proceedings of the American Statistical Association Section on Bayesian Statistical Science (pp. 6–10, Annual meeting of the American Statistical Association). The Association, 2000.

34. Scharfstein, Rotnitzky, Robins. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Stat Assoc 1999; 94: 1096–1120.

35. Robins, Sued, Lei-Gomez, et al. Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable. Stat Sci 2007; 22: 544–559.

36. Petersen, Schwab, Gruber, et al. Targeted maximum likelihood estimation for dynamic and static longitudinal marginal structural working models. J Causal Inference 2014; 2: 147–185.

37. Lendle, Schwab, Petersen, et al. ltmle: An R package implementing targeted minimum loss-based estimation for longitudinal data. J Stat Softw 2017; 81: 1–21.

38. Zheng, Van Der Laan. Asymptotic theory for cross-validated targeted maximum likelihood estimation, 2010.

39. Papke, Wooldridge. Panel data methods for fractional response variables with an application to test pass rates. J Econom 2008; 145: 121–133.

40. Van Der Vaart, Wellner. Weak convergence. In: Martin Gilchrist (eds) Weak convergence and empirical processes. New York: Springer, 1996, pp. 16–28.

41. Wen, Hernán, Robins. Multiply robust estimators of causal effects for survival outcomes. Scand J Stat 2022; 49: 1304–1328.

42. Young, Cain, Robins, et al. Comparative effectiveness of dynamic treatment regimes: An application of the parametric g-formula. Stat Biosci 2011; 3: 119–143.

43. Van Der Laan, Polley, Hubbard. Super learner. Stat Appl Genet Mol Biol 2007; 6: 1–21.

	$n = 250$			$n = 500$			$n = 1000$			$n = 2500$
Estimator	BIAS	SE	RMSE	BIAS	SE	RMSE	BIAS	SE	RMSE	BIAS	SE	RMSE
${\hat{ψ}}^{g} (δ)_{ICE}$	$-1.50$	4.35	4.61	$-0.82$	2.83	2.95	$-0.47$	2.14	2.19	$-0.16$	1.38	1.39
${\hat{ψ}}^{g} (δ)_{IPW}$	$-1.50$	4.91	5.13	$-1.50$	3.32	3.64	$-1.35$	2.60	2.93	$-1.10$	1.71	2.03
${\hat{ψ}}^{g} (δ)_{TMLE}$	$-0.19$	5.79	5.79	$-0.09$	3.61	3.62	$-0.07$	2.59	2.59	0.03	1.65	1.65
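
The three columns reported for each sample size in the simulation table are linked by the usual decomposition $\text{RMSE}^2 = \text{BIAS}^2 + \text{SE}^2$. The Python sketch below is an illustrative check, not part of the original analysis. It assumes that SE denotes the empirical standard error of the estimates across simulation replicates, and the names `REPORTED`, `rmse` and `max_discrepancy` are hypothetical helpers introduced here.

```python
import math

# (BIAS, SE, reported RMSE) copied from the simulation table, keyed by
# estimator and sample size n. Values are on the scale used in the table.
REPORTED = {
    "ICE": {250: (-1.50, 4.35, 4.61), 500: (-0.82, 2.83, 2.95),
            1000: (-0.47, 2.14, 2.19), 2500: (-0.16, 1.38, 1.39)},
    "IPW": {250: (-1.50, 4.91, 5.13), 500: (-1.50, 3.32, 3.64),
            1000: (-1.35, 2.60, 2.93), 2500: (-1.10, 1.71, 2.03)},
    "TMLE": {250: (-0.19, 5.79, 5.79), 500: (-0.09, 3.61, 3.62),
             1000: (-0.07, 2.59, 2.59), 2500: (0.03, 1.65, 1.65)},
}


def rmse(bias: float, se: float) -> float:
    """Root mean squared error implied by a bias and a standard error."""
    return math.sqrt(bias ** 2 + se ** 2)


def max_discrepancy(table: dict) -> float:
    """Largest absolute gap between implied and reported RMSE in the table."""
    return max(
        abs(rmse(bias, se) - reported)
        for by_n in table.values()
        for (bias, se, reported) in by_n.values()
    )


if __name__ == "__main__":
    for estimator, by_n in REPORTED.items():
        for n, (bias, se, reported) in sorted(by_n.items()):
            implied = rmse(bias, se)
            print(f"{estimator:5s} n={n:5d} implied={implied:.2f} reported={reported:.2f}")
```

Because BIAS and SE are themselves rounded to two decimals, the implied and reported RMSE values are expected to agree only up to rounding. The decomposition also makes the pattern in the table easier to read: the TMLE estimator has the smallest bias at every sample size, so its RMSE is driven almost entirely by its standard error, whereas the bias of the IPW estimator remains a visible part of its RMSE even at $n = 2500$.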

	${\hat{ψ}}^{g} (δ)_{TMLE}$ (with ML)		${\hat{ψ}}^{g} (δ)_{WICE}$		${\hat{ψ}}^{g} (δ)_{ICE}$		${\hat{ψ}}^{g} (δ)_{IPW}$
$↑$ in PrEP	Est.	95% CI	Est.	95% CI	Est.	95% CI	Est.	95% CI
Low	92.9	(89.3, 95.2)	93.0	(90.9, 94.9)	92.9	(91.1, 94.6)	92.9	(90.5, 94.9)
Medium	91.4	(84.8, 95.5)	91.6	(87.1, 95.1)	91.4	(88.5, 94.1)	91.3	(85.8, 95.0)
High	90.9	(80.9, 95.8)	90.3	(83.2, 95.4)	90.0	(85.8, 93.8)	89.9	(80.9, 95.3)