Sage Journals: Discover world-class research

Abstract

The sequential treatment decisions made by physicians to treat chronic diseases are formalized in the statistical literature as dynamic treatment regimes. To date, methods for dynamic treatment regimes have been developed under the assumption that observation times, that is, treatment and outcome monitoring times, are determined by study investigators. That assumption is often not satisfied in electronic health records data in which the outcome, the observation times, and the treatment mechanism are associated with patients’ characteristics. The treatment and observation processes can lead to spurious associations between the treatment of interest and the outcome to be optimized under the dynamic treatment regime if not adequately considered in the analysis. We address these associations by incorporating two inverse weights that are functions of a patient’s covariates into dynamic weighted ordinary least squares to develop optimal single stage dynamic treatment regimes, known as individualized treatment rules. We show empirically that our methodology yields consistent, multiply robust estimators. In a cohort of new users of antidepressant drugs from the United Kingdom’s Clinical Practice Research Datalink, the proposed method is used to develop an optimal treatment rule that chooses between two antidepressants to optimize a utility function related to the change in body mass index.

Keywords

One-stage dynamic treatment regime individualized treatment rule repeated measures covariate-driven observation times,confounding

1. Introduction

In recent years, significant effort has been put towards developing statistical methods that can leverage observational data to make valid causal inference about treatment (or exposure) effects (see e.g. Bang and Robins,¹ Stuart², Moodie et al.,³ and Schuler and Rose.⁴) Much of the literature on causal inference has focused on assessing marginal effects of treatments in the whole population, that is, how the outcomes in the study population would differ, on average, had we given everyone one treatment versus another. Though such marginal effects are often interesting from a policy standpoint, they are not always the most relevant in clinical practice, where patients hope to receive a treatment that is tailored to their unique characteristics. Individualized treatments (often falling under the umbrella term of precision medicine⁵) may be especially interesting in settings where it is known that the treatment effect for some individuals differs considerably from the marginal effect. In this work, we focus on the estimation of dynamic treatment regimes (DTRs), which formalize individualized, possibly sequential, treatment decisions taken as functions of patient’s characteristics. We limit the DTRs under consideration to those that optimize an expected outcome, the so-called ‘optimal’ DTRs.

Three common methods for developing optimal DTRs are g-estimation,⁶ Q-learning (see Laber et al.⁷ for a review), and dynamic weighted ordinary least squares (dWOLS).⁸ These methods are implemented in standard software (see e.g. Wallace et al.,^9,10 Simoneau et al.,¹¹ Tsiatis et al.,¹² Linn et al.,¹³ and McGrath et al.¹⁴). The latter method, dWOLS, optimizes the expected outcome by estimating treatment effects within strata of patients’ characteristics; these effects are sometimes termed effect modifications by patients’ characteristics. The dWOLS method provides intuitive estimators for optimal DTRs while combining the advantages of Q-learning and propensity-score based weighting.^15,16 It also achieves similar properties to the estimators derived from g-estimation but under a more familiar framework. In particular, under conditions stated in Section 2, the estimators derived using dWOLS are doubly robust in the sense that they are consistent if either the treatment model or the outcome model is correctly specified.

Electronic health records (EHRs) data are often used to develop DTRs.^11,17,18 These data are recorded irregularly across all patients, with the observation of patients’ outcome and treatment likely depending on their unique characteristics (such as symptoms, comorbidites, age, sex, etc.). That is, observation indicators, which represent whether or not a patient was observed at a given time, are associated with covariates that could be associated with the longitudinal outcome. In such situations, conditioning on observed data without making any adjustments for the treatment and observation processes may lead to DTRs with spurious associations caused by collider-stratification bias.¹⁹ The observational nature of studies that are based on EHR data also means that patients were not randomized to treatment but prescribed treatment based on their individual characteristics (a feature that is commonly called confounding when the same covariates affect the longitudinal outcome). Given the spurious associations mentioned above, using EHR data to assess treatment effects (or treatment effect modifications) must be done carefully. Drawing a causal diagram that represents the assumed data generating mechanism can help in determining whether these associations are problematic.²⁰

Though the statistical literature has extensively discussed confounding in the context of developing optimal DTRs, it has paid little attention to covariate-driven observation times. Robins et al.²¹ discussed the identification of causal effects when jointly modeling DTRs and monitoring (observation) schedules as well as some issues related to the extrapolation of optimal treatment and testing strategies. They introduced a no direct effect assumption for observation, which requires that observation decisions have no effect on patient characteristics (including outcomes) after conditioning on the treatment decision. Neugebauer et al.²² extended their work to settings with survival outcomes and differentiated five classes of counterfactual variables that may be of interest in such context. Kreif et al.²³ tackled another important issue related to the varying observation schedules in EHR data and the development of DTRs by proposing a way to analyze irregularly measured time-varying confounders.²³ Bayesian approaches have been proposed to estimate optimal two-stage strategies in settings with interval censoring²⁴ and to estimate causal effects via g-computation under irregular observation schedules.²⁵ Most recently, Yang²⁶ proposed a methodology to estimate the parameters of a continuous structural nested mean model when data are irregularly observed and the observation schedule may confound the treatment effects. Yang built on the work of Lok,²⁷ who had proposed estimating equations for the same parameters. An important development of Yang is the use of semiparametric theory of influence functions to construct an efficient estimator for continuous-time structural nested mean models. The method does not model explicitly the observation times and relies on a no unmeasured confounding assumption based on a martingale condition, which does not allow for mediators of the treatment–outcome relationship to drive observation times.

While few methods have been proposed in the literature on DTRs that simultaneously account for covariate-driven mechanisms for treatment and observation times, the issues mentioned above have been covered in the literature on the estimation of the marginal effect of covariates (e.g. treatment) on a longitudinal outcome (see, e.g. Goldstein et al.²⁸ who demonstrated the strength of bias due to the association of a risk factor with the outcome and the observation processes, and McCulloch et al.²⁹ who also discussed the estimation of such marginal parameters when the corresponding covariates are associated with random effects). To account for the observation process, authors have proposed the use of inverse intensity of visit (IIV) weights,^30–32 random or latent effects,^33–35 or fully parametric inference by specifying the full joint likelihood of the outcome and observation processes.³⁶ Within the causal inference framework, the bias due to the spurious associations mentioned above in the estimation of the marginal effect of a binary treatment on a longitudinal outcome using EHR data was demonstrated in Coulombe et al.³⁷ In that work, two semiparametric estimators were proposed for the causal marginal effect of a binary treatment on a continuous longitudinal outcome that accounted for the covariate-driven treatment and observation mechanisms.³⁷ Here, we extend one of these estimators to the case of DTRs.

In this work, we focus on a single-stage rule, known as an individualized treatment rule (ITR), which is a special case of a DTR. We consider repeated measurements of the treatment and outcome of each individual. Patients can, therefore, contribute multiple measurements in the estimation of the ITR, which we term a repeated measures ITR. The more general case of DTRs comprising multiple sequential rules corresponding to multiple treatment decisions is a topic of future work. We show here that by extending one of the estimators proposed by Coulombe et al.³⁷ to an ITR setting, we can consistently estimate the conditional effect of treatment within strata of patients’ variables rather than a single marginal effect. For that, we use dWOLS and a new weighting mechanism that incorporates independently the informative observation times and the treatment process under assumptions that are commonly postulated in the causal inference literature. To our knowledge, we propose the first estimator for optimal repeated measures ITR for binary treatment and continuous longitudinal outcome that applies to data subject to covariate-driven observation and treatment processes.

This article is divided as follows. We introduce the proposed methodology and the required assumptions in Section 2. We test the method through extensive simulation studies, the details and results of which we describe in Section 3. In Section 4, we apply the methodology to develop a repeated measures ITR that chooses between two commonly prescribed antidepressants, citalopram and fluoxetine, to maximize a utility function related to changes in body mass index (BMI). The optimal treatment rule is estimated using a cohort of patients with depression taken from the United Kingdom’s (UK) Clinical Practice Research Datalink (CPRD).³⁸ Finally, a discussion follows in Section 5.

2. Methods

2.1. Notation

We suppose that we have a random sample of $n$ individuals, taken from a larger population, that is indexed by $i = 1, \dots, n$ . We use the bold notation to refer to both vectors and matrices. For each individual in the population, we are interested in the estimation of a repeated measures ITR that optimizes a continuous outcome denoted by $Y_{i} (t)$ . By ITR, we mean a one-stage treatment rule that does not require optimization over several time points simultaneously but rather searches for the “cross-sectional” treatment rule that, at a given point, optimizes the outcome. By repeated measures, we mean that patients can contribute multiple observations in the estimation of the ITR (where each observation is a vector containing a treatment value, an outcome value, etc). We discuss in the next paragraphs the assumptions required about the treatment effect to ensure that such repeated measures ITR can be estimated consistently. The treatment rule is based on the estimated effect modification by some covariates (patients features) that we call tailoring variables and that we denote by $Q_{i} (t)$ . The treatment $A_{i} (t)$ is binary and takes values in ${0, 1}$ , and both the treatment and the continuous outcome $Y_{i} (t)$ can vary over time $t$ .

Assume that a larger $Y_{i} (t)$ is better, such that we aim for a treatment rule that maximizes $Y_{i} (t)$ . The outcome is assumed to be measured irregularly across patients, at individual-specific times $T_{i 1}, \dots, T_{i F_{i}}$ contained in $[0, τ]$ with $τ$ the maximum follow-up time across the study cohort. Patients are, therefore, allowed to have a different number of visits $F_{i}$ and varying gap times between their visits. The observation (or monitoring) indicators $d N_{i} (t)$ are equal to 1 if individual $i$ was observed at time $t$ , and 0 otherwise. These indicators can be seen as part of a counting process $N_{i} (t)$ that counts the number of observation times (visits) by time $t$ , with $N_{i} (t) = \int_{s = 0}^{t} d N_{i} (s) = \int_{s = 0}^{t} \sum_{j = 1}^{F_{i}} I (s = T_{i j})$ . While the outcome is assumed to be observed at those individual-specific times, the treatment and all covariates that will be used in nuisance models (observation and treatment models) are assumed to be measured continuously in time and the tailoring variables $Q_{i} (t)$ are assumed to be measured at least at the same times as the outcome, and possibly at other times too. That assumption is not unrealistic, especially for covariates related to prescription drugs, which are generally recorded in EHR automatically at the time of prescription. An example of such an observation setting is a study using EHR data in which the study treatment is defined using drug prescriptions, always available to the data analyst, in which the outcome is measured sporadically (e.g. a weight or a blood pressure outcome) and in which we are interested in effect modification by sex, our tailoring variable. We denote by $K_{i} (t)$ the set of potential confounders for the causal effect of $A_{i} (t)$ on $Y_{i} (t)$ and by $V_{i} (t)$ a second set of covariates that can affect or be associated with the observation indicator $d N_{i} (t)$ . The set $V_{i} (t)$ can contain confounders from the set $K_{i} (t)$ but also any other variables affecting the observation of the outcome $Y_{i} (t)$ . We broadly assume that covariates in $V_{i} (t)$ are related to the treatment $A_{i} (t)$ , the outcome $Y_{i} (t)$ , or both (possibly inducing biasing associations for the causal effect of interest). We denote by $Z_{i} (t)$ the variables in $V_{i} (t)$ that mediate the effect of treatment $A_{i} (t)$ on $Y_{i} (t)$ . Tailoring variables $Q_{i} (t)$ are allowed to contain confounders or pure predictors of the outcome, and they may share variables with $V_{i} (t)$ if these variables affect the observation times (although $Q_{i} (t)$ should not contain mediators of the treatment effect on the outcome, to avoid bias in the estimation of effect modifications). The assumed data-generating mechanism is depicted in Figure 1.

Figure 1.

Data generating mechanism considered in our simulation studies (the causal diagram is presented only for time $t$ and the individual index is removed). The set of covariates $V (t)$ corresponds to ${A (t), Z (t), K (t)}$ all affecting the observation indicator. Interactions are not depicted in the diagram.

We present in Supplemental Material A a summary of the notation introduced in the whole Methods Section and the abbreviations used in this manuscript (Supplemental Table 1) along with the potential overlaps between the sets of covariates $K$ , $V$ , $Z$ , and $Q$ in a Venn diagram (Supplemental Figure 1).

2.2. Assumptions

Some assumptions on the acuteness of the treatment effect are required for using repeated measures of ITR. Omitting the patient index for ease of notation, we broadly assume that $A (t)$ is the treatment associated with the outcome $Y (t)$ over what we call a treatment interval for the outcome $Y (t)$ . The treatment $A (t)$ may have been measured (observed to be prescribed) previous to time $t$ , and it is assumed that the corresponding outcome $Y (t)$ , if $d N (t) = 1$ , is observed at a time when treatment $A (t)$ had the time to be effective. It is, therefore, necessary to have a treatment with an effect that is acute enough (and not too much delayed) such that any observed outcome corresponding to a particular treatment interval is only affected by the treatment corresponding to that treatment interval. That is, there should be no treatment effects overlap across treatment intervals, nor delayed effects of treatment, nor treatment interactions across different treatment intervals. In studies where the gap times between observation times for the outcome are very large, treatment intervals may span longer periods of time, and the acuteness assumption may need to be relaxed. While relaxing the acuteness assumption is of interest, it is out of the scope of the current article, and, going forward, we assume that the acuteness assumption holds.

We use the potential outcome framework^39,40 and introduce two potential outcomes for inference, denoted by $Y_{i}^{1} (t)$ and $Y_{i}^{0} (t)$ , where $Y_{i}^{1} (t)$ represents the outcome we would observe for individual $i$ at time $t$ , had they received treatment $1$ over the corresponding treatment interval as defined above, and $Y_{i}^{0} (t)$ , had they received treatment $0$ . Only one of these two potential outcomes can actually be observed at a given time, as the binary treatment assumption implies an individual can only receive one of the two treatment options.

For simplicity, denote by $X^{β} (t)$ the matrix of risk factors for the outcome (excluding the treatment, which coefficient is considered via the blip function, defined later), which in our case is composed of columns $K (t)$ , $Q (t)$ , and a first column of $1$ s for modeling the intercept ( $X^{β} (t)$ should not contain any mediator of the treatment’s effect on the outcome). Sets $K (t)$ and $Q (t)$ could be entirely different or exactly the same, as would be the case if the tailoring variables are the confounders, and shared columns between the two sets should not be included twice in building matrix $X^{β} (t)$ . Further denote by $X^{ψ} (t)$ the matrix comprising the tailoring variables (that we also denoted by $Q (t)$ ). Suppose that the full matrix $X (t)$ is such that

X (t) = [X^{β} (t) X^{ψ} (t)],

which, under this notation, may contain duplicated columns (like any variable that is considered in the treatment-free model and that is also a tailoring variable).

As in Coulombe et al.,³⁷ we make the following assumptions (P1) to (P3):

\begin{aligned} A_{i} (t) ⊥ {Y_{i}^{0} (t), Y_{i}^{1} (t)} | K_{i} (t), V_{i} (t), d N_{i} (t) (conditional exchangeability), \end{aligned}

(P1)

\begin{aligned} 0 < P (A_{i} (t) = 0 | K_{i} (t)), P (A_{i} (t) = 1 | K_{i} (t)) < 1 (positivity of treatment), and \end{aligned}

(P2)

\begin{aligned} Y_{i}^{a} (t) = Y_{i} (t) if A_{i} (t) = a (consistency of the outcome) . \end{aligned}

(P3)

Assumptions (P2) and (P3) are standard in all causal inference settings while (P1) is specific to the setting with covariate-driven observation times. Assumption (P1) means that upon conditioning on the observation indicator

d N_{i} (t)

(i.e. keeping only the observed outcomes in the analysis), the sets of covariates

K_{i} (t)

and

V_{i} (t)

are together sufficient to break any potential biasing associations for the causal effect of interest.

To account for the observation process, we assume that observation at time $t$ depends on set $V_{i} (t)$ and that the observation intensity can be modeled by a proportional rate model as follows

E [d N_{i} (t) | V_{i} (t)] = ξ_{i} (t) \exp {γ^{'} V_{i} (t)} d Λ_{0} (t)

(V1)

where

ξ_{i} (t) = I (C_{i} \geq t)

is an indicator of being at risk at time

t

, with

C_{i}

the censoring time of individual

i

, that is, the time when an individual is lost to follow-up. The function

Λ_{0} (t)

is any non-decreasing function.^41,42 The proportional rate model in (V1) allows for the observation times to occur at any time (continuously) and irregularly across patients, as a function of their characteristics. In this work, we assume that censoring is uninformative after conditioning on the treatment, the tailoring and the confounder variables, which can be expressed as

E [Y_{i}^{a} (t) | X_{i} (t), C_{i} \geq t] = E [Y_{i}^{a} (t) | X_{i} (t)] .

(C1)

This assumption can be extended to the setting where censoring is informative by additionally using inverse probability of censoring weights (see e.g. Robins et al.,²¹ Section 3). Finally, we also assume positivity of observation, an assumption that is denoted by

0 < E [d N_{i} (t) | V_{i} (t)] < 1 \forall t .

(V2)

Times at which the probability of observing an individual is zero should not be included in the analysis set (e.g. times when a patient is not yet enrolled in the health system). These times would not only preclude treatment positivity (as the treatment would not be allowed to change), but they could also lead to bias in the causal estimation due to extrapolations in regions of the domain of time when a patient had no chance of being observed.

The interactions between the treatment and tailoring variables and their effects on the mean outcome can inform the best treatment decision to maximize an expected outcome $\hat{Y_{i}} (t)$ . We, therefore, base the ITR on those interactions. Optimizing (in this case, maximizing) the expected outcome is our ultimate goal in tailoring the treatment to the individual. While the more commonly estimated causal marginal effect is a population-average effect, the ITR we aim to estimate is based on a conditional outcome mean model that is used to estimate the treatment effects within the strata of the tailoring variables.

The following outcome mean model is further postulated, conditional on the risk factors and tailoring variables

E [Y_{i} (t) | A_{i} (t), X_{i} (t)] = f {X_{i}^{β} (t); β} + A_{i} (t) ψ^{'} X_{i}^{ψ} (t) .

(O2)

The first term in (O2),

f {X_{i}^{β} (t); β}

, is called the treatment-free model and is a function of the risk factors for the outcome. In this work, we use a linear combination of the risk factors for the function

f {\cdot}

, that is, we assume that

f {X_{i}^{β} (t); β} = β^{'} X_{i}^{β} (t)

is an appropriate treatment-free model. The second term comprises the treatment indicator,

A_{i} (t)

, and the blip function,

ψ^{'} X_{i}^{ψ} (t)

. As briefly mentioned earlier, we must assume that treatment effects are acute enough not to overlap across treatment intervals, and that there are no synergistic or antagonistic effects between any subsequent treatments of a patient (i.e. treatments

A_{i} (s)

and

A_{i} (t)

for

s < t

).⁴³ These conditions ensure that the ITR can be estimated consistently using repeated measurements of the same individual, without any carryover treatment effect that could bias the ITR. Under all conditions stated above and if the model in (O2) represents the true outcome generating mechanism, the blip function indicates how the outcome varies when going from treatment 0 to treatment 1 (i.e. the difference between the two potential outcomes). In particular, the outcome mean is larger under

A_{i} (t) = 0

ψ^{'} X_{i}^{ψ} (t) < 0

, and conversely, larger under

A_{i} (t) = 1

ψ^{'} X_{i}^{ψ} (t) \geq 0

. Therefore, by estimating the blip function, one can determine which treatment should be prescribed to optimize the expected outcome. Note that in situations like our motivating example, where there are two active treatments, this model is not an expected outcome “in the absence of treatment” but rather at the reference level of treatment.

2.3. Proposed methodology

To estimate an optimal ITR under our postulated assumptions, it suffices to estimate the coefficients $ψ$ in (O2). This can be done using dWOLS⁸ which, in our setting, corresponds to a weighted least squares regression because no optimization over time points is required.⁴³ The dWOLS method leads to estimators $\hat{ψ}$ for optimal treatment rules of the form

``Treat with A_{i} (t) = 1 if {\hat{ψ}}^{'} X_{i}^{ψ} (t) \geq 0, and with A_{i} (t) = 0 otherwise''{.}

(1)

If one or both of the treatment model and the outcome model are correctly specified, the blip function is correctly specified, and we use an inverse probability weight that meets the balancing condition introduced in Wallace and Moodie,⁸ the dWOLS method leads to consistent estimators. The estimator is, therefore, called doubly robust. Note, a correct specification for one of the treatment or treatment-free models requires that (i) the correct set of confounders be included in that model and (ii) the confounders be incorporated as predictors in the model using the appropriate functional form (e.g. a variable that is deemed to affect quadratically the outcome must be included as a quadratic term in the outcome mean model). As mentioned in their introductory paper,⁸ the well known inverse probability of treatment (IPT) weights^15,16 meet the balancing condition.

Wallace and Moodie⁸ assumed that the observation times were not driven by covariates. In the current work, the data generating mechanism assumed for each time $t$ and for each individual $i$ is depicted in Figure 1. In that causal diagram, the set of covariates $V_{i} (t) = {K_{i} (t), A_{i} (t), Z_{i} (t)}$ affects the observation indicator at each time $t$ . We use an IIV weight proposed by Lin et al.³⁰ to create a pseudo-population⁴⁴ in which covariates are unassociated with the observation process. Under assumption (V1), an IIV weight of the form

ρ_{i} (t; γ) = {[ξ_{i} (t) \exp {γ^{'} V_{i} (t)} d Λ_{0} (t)]}^{- 1}

can be used to break the association between covariates in the set

V_{i} (t)

and the observation indicator. The causal diagram depicted in Figure 2 is assumed to represent the updated data generating mechanism after IIV-weighting. If the time axis used in the recurrent events model for observation indicators is the time since cohort entry,

d Λ_{0} (t)

cancels out across individuals at time

t

and it need not to be estimated.³¹ Therefore, our estimated IIV weight is given by

ρ_{i} (t; \hat{γ}) = {[ξ_{i} (t) \exp {{\hat{γ}}^{'} V_{i} (t)}]}_{,}^{- 1}

(2)

where

\hat{γ}

are obtained from the Andersen and Gill model,⁴⁵ a model that can be fit using standard software, for example, coxph of the survival package in R.⁴⁶

Figure 2.

Data generating mechanism after re-weighting data by the correctly specified IIV weight. Additional reweighting by IPT weights further removes the arrow from $K (t)$ to $A (t)$ . Interactions are not depicted in the diagram. IIV: inverse intensity of visit; IPT: inverse probability of treatment

We fit the propensity score⁴⁷ using a logistic regression model and use the resulting IPT weight as our balancing weight in the dWOLS. In the logistic regression model, we include all potential confounders as predictors of the treatment $A_{i} (t)$ . The IPT weight is given by

w_{i} (t; \hat{κ}) = \frac{I (A_{i} (t) = 1)}{P (A_{i} (t) = 1 | K_{i} (t); \hat{κ})} + \frac{I (A_{i} (t) = 0)}{P (A_{i} (t) = 0 | K_{i} (t); \hat{κ})}

at time

t

for individual

i

(with

\hat{κ}

the fitted parameters from the logistic regression model, and where

I (\cdot)

is the indicator function). If the predictors of the logistic regression model are not of the correct functional form, confounding may remain. However, one advantage of using dWOLS is that we may have a second chance at unbiasedness if we can correctly specify the functional form of the confounders in the outcome mean model thanks to double robustness.

A weighted least squares (dWOLS in a one-stage treatment setting) that incorporates both weights is then fit to the data to estimate the blip function in (O2). The proposed methodology corresponds to solving the following estimating equation for the coefficients of the mean outcome model in (O2) :

\begin{aligned} U (β, ψ; \hat{γ}, \hat{κ}) & = \sum_{i = 1}^{n} \int_{0}^{τ} w_{i} (t; \hat{κ}) ρ_{i} (t; \hat{γ}) \\ \times [\begin{matrix} \frac{\partial f {X_{i}^{β} (t); β}}{\partial β} \\ A_{i} (t) X_{i}^{ψ} (t) \end{matrix}] [Y_{i} (t) - f {X_{i}^{β} (t); β} - A_{i} (t) ψ^{'} X_{i}^{ψ} (t)] d N_{i} (t) = 0 \end{aligned}

(3)

In contrast to the estimator introduced in Coulombe et al.³⁷ for estimating marginal effects, the design matrix here includes not only interaction terms between the tailoring variables and treatment

A_{i} (t)

but also the terms corresponding to potential confounders

K_{i} (t)

(leading to the double-robustness property, which was not addressed previously.³⁷) The proposed estimators for the parameters in the blip function are denoted by

{\hat{ψ}}_{D W}

(for Double Weights) and those for the treatment-free model, by

{\hat{β}}_{D W}

. In simulation studies and in our application, we demonstrate empirically that our proposed methodology leads to a new type of robustness for the estimators in the blip function, which we term partially doubly robust. That is, in settings with confounding, only one of the propensity score models or outcome models must be correctly specified, the blip function must be correctly specified, and the observation model must only be correctly specified with respect to predictors that create an association between the treatment and the outcome. This mimics the coarseness of the propensity score, in that the observation model need not be the data-generating model, but simply the coarsest function to provide balance with respect to patients’ characteristics that also affect the longitudinal outcome, between instances with and without visits.

The asymptotic variance of the estimators ${\hat{β}}_{D W}$ and ${\hat{ψ}}_{D W}$ can be derived using theory on two-step estimators⁴⁸ treating the parameters in the treatment and the observation models as nuisance parameters. Effectively, the asymptotic theory for the estimator for the optimal repeated measures ITR in equation (1) is obtained by reproducing the developments in Web Appendix C of Coulombe et al.³⁷ where the design matrix used to estimate the effect modifications is modified to include not only the treatment but also the tailoring variables, confounders, and pertinent interaction terms. In our application, we use nonparametric bootstrap with 500 samples to obtain variance estimates.

3. Simulation study

We conducted several simulation studies, with different strengths of dependance of the observation times on covariates, to assess the performance of our estimators ${\hat{ψ}}_{D W}$ . We used a data-generating mechanism that was very similar to that presented in Figure 1. We compared our proposed estimator for the repeated measures ITR to three other types of estimators. The first, ${\hat{ψ}}_{O L S}$ (for Ordinary Least Squares), did not account for the observation process nor the confounders. This estimator consisted of estimating the blip function using dWOLS with no weights at all (but correctly modeling for the confounders in the treatment-free model). The second estimator, ${\hat{ψ}}_{I P T}$ , did not consider the covariate-driven observation process but incorporated an IPT weight as a function of a correctly specified propensity score. For the third strategy, we assessed our proposed estimator ${\hat{ψ}}_{D W}$ under four different model misspecification scenarios. The first scenario was based on all models specified correctly. The corresponding estimator is referred to as ${\hat{ψ}}_{D W 1}$ throughout the rest of the paper. We compared that scenario to (i) a partially misspecified observation model and misspecified outcome model which lacked an adjustment for the second confounder $K_{2}$ ( ${\hat{ψ}}_{D W 2}$ ); (ii) a misspecified treatment model (that adjusted for the squared terms of $K_{1}$ and $K_{3}$ instead of their linear terms) and partially misspecified observation model ( ${\hat{ψ}}_{D W 3}$ ); or (iii) a (fully) misspecified observation model ( ${\hat{ψ}}_{D W 4}$ ). The partially misspecified observation model was such that the covariates required to appropriately block the biasing paths were correctly specified, but all other covariates were misspecified (more details follow in the next paragraph). Because of the partially double robustness of our estimator, the first three estimators are unbiased but may vary in their finite sample performance. In contrast, because of the misspecified observation model, the estimator ${\hat{ψ}}_{D W 4}$ has no guarantee of unbiasedness. We present in Supplemental Figure 2 (Supplemental Material B) the causal diagram corresponding to the data generating mechanism described below. Each estimator listed above incorporated different single or double weights which led to different pseudo-populations on which the mean outcome model was fitted. For each estimator, we present in Supplemental Figure 2 (panels b to g) the updated causal diagrams after the observations were reweighted by the corresponding weights and, based on the diagrams, provide there a justification for which estimators could be biased.

Figure 3.

Data generating mechanism at time $t$ considered in the application to Clinical Practice Research Datalink (CPRD). C/F, respectively, refer to citalopram and fluoxetine.

Our simulation studies were similar to those of Coulombe et al.³⁷ except that new tailoring variables were simulated and interaction terms were added to the outcome mean model. We tested sample sizes of either 250 or 500 patients and conducted 1000 simulations for each sample size. A sensitivity analysis was also conducted in which the sample size was increased to 50,000 for a few of the studied scenarios. In the description that follows, we removed the patient index for ease of notation. First, three baseline confounders ${K_{1}, K_{2}, K_{3}}$ were generated with $K_{1} \sim N (1, 1), K_{2} \sim Bernoulli (0.55)$ , and $K_{3} \sim N (0, 1)$ . The treatment $A (t)$ was binary and time-varying and was simulated at each time $t$ as $A (t) \sim Bernoulli (p_{A})$ with $p_{A} = expit (0.5 + 0.55 K_{1} - 0.2 K_{2} - 1 K_{3})$ , where $expit (x) = exp {x} / {1 + exp (x)}$ . A time-varying mediator of the relationship between $A (t)$ and the outcome $Y (t)$ was simulated as a function of the treatment, as $Z (t) | A (t) = 1 \sim N (2, 1)$ and $Z (t) | A (t) = 0 \sim N (4, 2)$ . A tailoring variable that did not depend on the treatment, the confounders, or the mediator was simulated as $Q (t) \sim N (0.5, 0.5)$ . The outcome, before being set to missing on times when the observation indicator equaled 0, was simulated for each time point as $Y (t) = α (t) - 2 A (t) + 2.5 {Z (t) - E [Z (t) | A (t)]} + 0.4 K_{1} + 0.05 K_{2} - 0.6 K_{3} + 0.5 {A (t) \times Q (t)} - 1 {A (t) \times K_{1}} + ϵ (t)$ with $ϵ (t) \sim N (ϕ, 0.01)$ , $ϕ \sim N (0, 0.04)$ an individual-specific random effect, and where the intercept function $α (t) = \sqrt{t / 100}$ was the same across individuals. Under this setup, the true values of the $ψ$ parameters are ( $- 2, 0.5, - 1$ ) and correspond to the coefficients on the intercept, $Q (t)$ and $K_{1}$ in the blip function, respectively. Note, the treatment effect is immediate in our simulation setting, as $A (t)$ is allowed to vary at any time and it affects $Y (t)$ , an outcome measured roughly at the same time as $A (t)$ if we ignore the granularity of time discretization.

As discussed earlier, a challenge in the estimation of the optimal ITR is that the outcome is observed irregularly, and its observation depends on patients’ covariates. To reproduce this behavior, the quantities above were first simulated in continuous time, with time discretized over a grid of 0.01 units, from 0 to $τ = 1$ . Then, observation times (i.e. when the outcome was observed) were simulated according to a nonhomogeneous Poisson process, with intensity at time $t$ equal to $λ {t | A (t), Z (t), K_{2}, K_{3}} = 0.1 \exp {γ_{1} A (t) + γ_{2} Z (t) + γ_{3} K_{2} + γ_{4} K_{3}}$ . Bernoulli draws with probabilities proportional to these intensities were used at each time point to assign observation times (i.e. to determine whether the outcome is observed at that time). Observation times were drawn until the maximum follow-up time $τ$ . We tested different combinations of $γ$ parameters, which encoded the strength of the dependance of observation indicators on covariates; these are shown in Table 1. These parameters led to a different number of observation times of the outcome across individuals, which correspond to the repeated measures used for ITR estimation. Given that we simulated an immediate treatment effect, each observation (“repeated measure”) used for the ITR estimation comprised an observed outcome (e.g. $Y (s)$ at time $s$ , for $d N (s) = 1$ ) and its corresponding treatment $A (s)$ measured at the exact same time, along with confounders, tailoring variables, and mediators. To assess the performance of the proposed estimator when the observation model was misspecified, we estimated the observation intensity model using (a) only $A (t)$ and $Z (t)$ as predictors (partial misspecification) or (b) only $A (t)$ and $K_{2}$ as predictors (important misspecification). These strategies corresponded to estimators ${\hat{ψ}}_{D W 3}$ and ${\hat{ψ}}_{D W 4}$ . Note, the latter model did not include the mediator $Z (t)$ , a variable deemed to be important in the adjustment for the observation process because conditioning on $d N (t)$ opens a biasing path from $A (t)$ to $Y (t)$ that goes through $Z (t)$ and that is not due to the actual causal treatment effect. Furthermore, even if that observation model did contain an adjustment for $A (t)$ and $K_{2}$ , the estimated coefficient for $A (t)$ in the observation model was likely biased, as $A (t)$ and $Z (t)$ were strongly dependent in the pseudo-population created after reweighting the observations by the IPT weights and the misspecified IIV weight in ${\hat{ψ}}_{D W 4}$ . Hence, the observation model used in the estimator ${\hat{ψ}}_{D W 4}$ was likely misspecified with respect to both $A (t)$ and $Z (t)$ , and the path going from $A (t)$ to $Y (t)$ via the mediator $Z (t)$ likely biased the causal effect of interest (see Supplemental Figure 2, panel (e) for a depiction of the corresponding causal diagram). While the observation model in ${\hat{ψ}}_{D W 3}$ is misspecified, the estimator remains unbiased because the covariates that are not accounted for in the observation model are included in the IPT weights, thus blocking paths from the confounders into A(t). We therefore anticipated ${\hat{ψ}}_{D W 3}$ to be unbiased and possibly less variable than ${\hat{ψ}}_{D W 1}$ as the observation weights were expected to be more stable (as their models were more parsimonious).

Table 1.

Simulation study results ( $M = 1000$ simulations) for the comparison of MSEs of the blip values obtained with six alternative models: DW1 the proposed doubly weighted estimator which accounts for both processes correctly, DW2 for which the observation process was partially misspecified and the outcome model was misspecified, DW3 for which the treatment process was misspecified and the observation process was partially misspecified, DW4 for which the observation process was misspecified, OLS which does not adjust for confounding or observation process, and IPT which accounts only for confounding. Empirical MSEs are computed as the squared empirical bias of the estimated blip function evaluated at the patients’ characteristics plus its empirical variance. The observation process varies but the confounding mechanism and the parameters of the true blip function remain the same in all four scenarios of varying $γ$ below.

Sample	$γ^{1}$	No. obs. times	MSE
size	parameters	mean (IQR)	${\hat{ψ}}_{D W 1}$	${\hat{ψ}}_{D W 2}$	${\hat{ψ}}_{D W 3}$	${\hat{ψ}}_{D W 4}$	${\hat{ψ}}_{O L S}$	${\hat{ψ}}_{I P T}$
250	1	3 (1–3)	0.60	0.46	0.32	0.84	0.75	0.83
	2	3 (2–5)	1.69	1.86	1.60	3.24	2.88	3.26
	3	6 (3–9)	1.99	1.25	1.14	4.54	4.42	4.52
	4	10 (8–12)	0.11	0.11	0.08	0.11	0.07	0.11
500	1	3 (1–3)	0.34	0.24	0.16	0.64	0.63	0.63
	2	3 (1–5)	0.92	1.06	0.84	2.82	2.61	2.83
	3	6 (3–9)	1.27	0.72	0.66	4.36	4.30	4.36
	4	10 (8–12)	0.05	0.05	0.04	0.05	0.03	0.05

MSE: mean squared error; IQR: interquartile range; IPT: inverse probability of treatment; OLS: ordinary least squares.

$^{1}$ 1. $(- 2, - 0.3, 0.2, - 1.2); 2. (0.3, - 0.6, - 0.4, - 0.3); 3. (0.4, - 0.8, 1, 0.6); 4. (0, 0, 0, 0)$ , that is, uninformative observation.

In our simulation setting, the true value of the blip function (or “gold standard”) at time $t$ was given by $b (Q (t), K_{1}; ψ = {- 2, 0.5, - 1}) = - 2 + 0.5 Q (t) - 1 K_{1}$ . We evaluated the performance of the estimators in three different ways. First, we computed the empirical mean squared error (MSE) of the blip values (i.e. the blip function evaluated at the covariates). For a given estimator $\hat{ψ} = ({\hat{ψ}}_{0}, {\hat{ψ}}_{1}, {\hat{ψ}}_{2})$ , that MSE was given by the mean of $[- 2 + 0.5 q (t) - 1 k_{1} - ({\hat{ψ}}_{0} + {\hat{ψ}}_{1} q (t) - {\hat{ψ}}_{2} k_{1})]^{2}$ (results averaged over 1000 simulations in Table 1). Next, we calculated the error rate in optimal treatment decisions (i.e. the proportion of estimated optimal treatment decisions that do not agree with the true optimal treatment decisions); see Supplemental Table 3 in Supplemental Material C. Our third performance criterion was the estimated value function, evaluated in a new population of size 25,000. The value function is the expected outcome under a given treatment strategy.⁴⁹ We compare in Supplemental Material C (Supplemental Table 6) the value function under the actual treatment received and the optimal treatment strategies obtained with our proposed estimator or other comparators. In Supplemental Material C, we show the empirical bias of the six estimators for the blip values (i.e. the blip function evaluated at the observed covariates) (Supplemental Table 4), the absolute bias of each blip coefficient separately across all estimators compared and for each scenario for the observation process (Supplemental Table 5), and the performance in terms of value function (Supplemental Table 6). In Supplemental Material D, we show the results of the sensitivity analysis with a larger sample size. Finally, there is in Supplemental Material E a sample of R code to reproduce the simulation studies and to obtain the proposed estimator along with some code and details on the running time of the procedure.

3.1. Results of the simulation study

The results of the comparison of MSE of the blip values across all six estimators for $ψ$ (Table 1) are as expected. First, the MSE is larger (and similar) for ${\hat{ψ}}_{D W 4}$ , ${\hat{ψ}}_{O L S}$ , and ${\hat{ψ}}_{I P T}$ (three last columns in Table 1) as compared with the three other estimators. Although ${\hat{ψ}}_{D W 4}$ , ${\hat{ψ}}_{O L S}$ , and ${\hat{ψ}}_{I P T}$ make adjustments in the outcome mean model for confounders ${K_{1}, K_{2}, K_{3}}$ , neither estimator accounts (adequately) for the observation process. The difference in MSE across the set of estimators ${\hat{ψ}}_{D W 4}$ , ${\hat{ψ}}_{O L S}$ and ${\hat{ψ}}_{I P T}$ and the set of estimators ${\hat{ψ}}_{D W 1}$ , ${\hat{ψ}}_{D W 2}$ , and ${\hat{ψ}}_{D W 3}$ increases with increasing sample size (Table 1). We obtain similar results when comparing the bias of the blip values, rather than the MSE (Supplemental Table 4). There is a clear decrease in the bias of the blip values for the estimators ${\hat{ψ}}_{D W 1}, {\hat{ψ}}_{D W 2}$ , and ${\hat{ψ}}_{D W 3}$ when increasing the sample size to 1000 or 2500, a result not observed with the three other estimators (Supplemental Table 4).

We observe similar patterns of results when comparing the treatment decisions from each estimator. Some scenarios for the parameters $γ$ , such as 2 and 3, lead to an important empirical bias in estimators ${\hat{ψ}}_{D W 4}$ , ${\hat{ψ}}_{O L S}$ , and ${\hat{ψ}}_{I P T}$ which is reflected both in the MSE and empirical bias of the blip values and in the MSE of the optimal treatment decisions. Overall, the error rate of optimal treatment decisions varies from 0% to 6% with the correctly specified estimators ${\hat{ψ}}_{D W 1}, {\hat{ψ}}_{D W 2}$ , and ${\hat{ψ}}_{D W 3}$ , while it reaches 25% with the three other estimators. For scenario 1, the error rate is small across all six estimators compared. This is explained by the fact that even when the blip function is biased, part of the treatment decisions is correct (unbiased) if the estimated blip value falls on the right side of the zero threshold (i.e. the threshold for the treatment rule).

In scenarios 2 and 3 for the $γ$ parameters, the bias of the three ITR estimators using misspecified models is, therefore, due to biased blip values falling possibly near the threshold, yet on the wrong side of the threshold as compared to the true blip values.

The results above on the MSE and empirical bias of the blip values and error rates on the treatment decisions also agree with the results on the absolute bias of each blip coefficient found in Supplemental Table 5 (Supplemental Material C). In general, the bias in the estimation of the intercept coefficient of the blip function is small for the three preferred estimators ${\hat{ψ}}_{D W 1}, {\hat{ψ}}_{D W 2}$ , and ${\hat{ψ}}_{D W 3}$ and the bias decreases with the sample size. On the other hand, the intercept coefficient is consistently biased when estimated with ${\hat{ψ}}_{D W 4}$ , ${\hat{ψ}}_{O L S}$ , or ${\hat{ψ}}_{I P T}$ , even with increasing the sample size (Supplemental Table 5). The results from the sensitivity analysis with a sample size of 50,000 patients show a similar pattern (Supplemental Table 7 in Supplemental Material D). The bias of each blip coefficient tends toward 0 when using one of the three estimators ${\hat{ψ}}_{D W 1}, {\hat{ψ}}_{D W 2}$ , or ${\hat{ψ}}_{D W 3}$ , with the maximum bias being 0.18, as opposed to 0.50 with the sample size of 250 patients. The three other estimators remain as biased as with a smaller sample size.

The performance as measured by the value function is also consistent with the other results above, showing a more important difference across the estimators under scenarios 2 and 3 for the observation process (Supplemental Table 6 in Supplemental Material C). The results for scenarios 2 and 3 convey the differences in optimal treatment decisions found across the correctly specified and the other estimators. Again, the results for the average outcome are similar across the three estimators ${\hat{ψ}}_{D W 1}$ , ${\hat{ψ}}_{D W 2}$ and ${\hat{ψ}}_{D W 3}$ , always leading to larger (or equal) average outcomes when compared to ${\hat{ψ}}_{D W 4}$ , ${\hat{ψ}}_{O L S}$ , or ${\hat{ψ}}_{I P T}$ .

In most results, we find that the estimator ${\hat{ψ}}_{D W 3}$ performs as well or better than the proposed estimator ${\hat{ψ}}_{D W 1}$ , both in terms of bias and variance. This result seems to be due to the higher efficiency of that estimator when compared with ${\hat{ψ}}_{D W 1}$ . The latter estimator incorporates inverse weights that are based on correctly specified models which contain more covariates than those used with ${\hat{ψ}}_{D W 3}$ (for an approach that can be used to select an efficient adjustment set to make causal inference, see e.g.⁵⁰). Even if under-adjusted models like those used with ${\hat{ψ}}_{D W 3}$ may lead to unbiased estimators in certain settings (e.g. when the IPT weight accounts for variables that we did not incorporate in the observation model but that predict observation), we do not suggest purposely using under-adjusted models for the weights or for the outcome model. It can be risky not to include in one model the covariates that act as predictors in several models, as we may lose the benefit of the multiple robustness of the proposed method. In practice, we generally do not know the true functional form of a covariate in a given model.

The results in this section and in Supplemental Material C demonstrate empirically that ${\hat{ψ}}_{D W 1}$ , ${\hat{ψ}}_{D W 2}$ , and ${\hat{ψ}}_{D W 3}$ lead to comparable MSEs of the blip values and error rates of the optimal treatment decisions, which MSEs and error rates are smaller when compared to more naive estimators that do not account appropriately for both the treatment and the observation processes. The proposed estimator ${\hat{ψ}}_{D W}$ is unbiased for the optimal ITR under certain assumptions for the treatment, the observation, and the outcome models. We introduced the term partially doubly-robust for our proposed methodology, where our treatment, observation, and outcome models have different opportunities to yield consistent estimators. If the observation model and the blip model are correctly specified, then only one of the treatment or the treatment-free models must be correctly specified. This is also true when the observation model is partially misspecified, that is, when it is misspecified only with respect to covariates that are not linking the treatment and the outcome in the causal diagram (assuming that the diagram also accounts for the paths blocked by the IPT weighting, if an IPT weight is used simultaneously). When the observation model is misspecified with respect to covariates linking the treatment and the outcome in the causal diagram after conditioning on the observation indicator $d N (t)$ and after re-weighting observations by a correctly specified IPT weight, then our proposed methodology generally fails to be consistent (depending on the strength of dependence between covariates and observation indicators).

4. Illustration with the CPRD

We applied the proposed methodology for the estimation of a repeated measures ITR to data from the UK’s CPRD. Our aim was to develop an optimal ITR that chooses between two commonly prescribed antidepressant drugs, citalopram and fluoxetine, to optimize a BMI utility function. The BMI utility function was repeatedly and irregularly measured in time. We assumed that confounding and covariate-driven observation times were potential concerns and could cause bias in the estimation of the ITR. The causal diagram we assumed at each time is depicted in Figure 3. The causal diagram encapsulates the general assumptions we make about the variables affecting the treatment, the visit, or the outcome models, but whether some of these variables come before or after the treatment is, in some cases, unclear (especially since some of these will be adjusted as time-varying variables in the visit model). Thus, the unblocked paths due to the visit process could be due to confounders linking the treatment and the visit indicator to the outcome, or to mediators of the treatment-outcome relationship linking the treatment and the visit indicator to the outcome. These precise assumptions are left unspecified and we aim, with our approach, to merely remove or block any path going into the treatment, the visit, and the outcome, separately, via the propensity score, the proportional rate model for the visit, and the treatment-free model for the outcome, respectively. Furthermore, given that our proposed method cannot accommodate settings in which the outcome itself affects its observation process, we did not draw an arrow from BMI to the indicator $d N (t)$ in that diagram. However, it is possible that the observation process of BMI indeed depends on BMI values. If one believes that this is the case, then one could either relax the assumption on conditional independence between the visit and the outcome processes and incorporate the most recent BMI measurement (taken at the previous visit) in the adjustment set, or new methods could be developed that allow for predictors of observation to be measured sporadically. For instance, Sun et al.⁵¹ have proposed a method based on kernel-smoothing for semiparametric proportional rate models when covariates are measured sporadically in the analysis of recurrent events. That method could be extended for informative observation times. We do not assume that pure predictors of the outcome (i.e., variables that are not contained in the set of confounders but that cause the outcome) are tailoring variables for the antidepressant choice. Therefore, the diagram in Figure 3 does not include an independent set of tailoring variables like in Figure 1 (the set $Q_{i} (t)$ ). In this illustration, it is rather postulated that the tailoring variables are contained in the set of confounders.

4.1. Data source

CPRD is one of the largest primary care databases of anonymized health records and comes from a network of more than 700 general practictioner practices in the UK. The data contain demographics, lifestyle factors, prescription drugs, medical diagnoses, and referrals to specialists and hospitals for more than 13 million patients. Information on prescription drugs comes from written prescriptions (as opposed to filled medications). The data we used were linked with the Hospital Episode Statistics, which contains information on hospital diagnoses, and the Office for National Statistics mortality database, which provides details on dates and causes of death. The study protocol was approved by the Independent Scientific Advisory Committee of the United Kingdom Clinical Practice Research Datalink (CPRD) (protocol number 19_017R) and the Research Ethics Committee of the Jewish General Hospital (Montréal, Québec, Canada).

We defined a cohort of new users of citalopram or fluoxetine with a recent diagnosis of depression. That cohort was previously defined and described¹⁷ and a flow chart for the cohort creation is shown in Supplemental Material F. Briefly, patient follow-up started at the time of initiation of either citalopram or fluoxetine, and follow-up was stopped (censored) when a patient discontinued their treatment, switched to another antidepressant drug, became pregnant, died, reached the end of registration with the practice, or the end of the study period (December 2017), whichever occurred first. Although we censored patients’ follow-up time when they discontinued or switched treatment, we were interested in a repeated measures ITR, in which treatment decisions are taken not only at cohort entry, but also at anytime during follow-up when a treatment decision must be made to reduce the potential weight change. We defined the end of a prescription as the prescription date plus its duration in days and considered that treatment discontinuation occurred whenever 30 days had passed since the end of the last prescription for the corresponding drug without any new prescription for that drug being issued. Note, we did not consider informative censoring in this work but it is possible that, for instance, censoring at treatment discontinuation could be informative. In such case, inverse probability of censoring weights could be used to account for informative censoring.^52,53 We only kept in the study cohort patients who had at least one BMI measurement before or at cohort entry and used the most recent BMI measurement to define their baseline BMI. Then, any BMI measurement recorded during patient follow-up was kept as an outcome for analysis (i.e. an outcome for which we aim to optimize the expectation under the ITR). If a BMI value was smaller than 15 or larger than 50, it was replaced by a missing value and not used in the analysis.

4.2. Outcome definition

For the repeated outcome, we defined a continuous utility function that conveyed the negative impacts of weight gain or weight loss while being treated with antidepressant drugs. That outcome was defined every time when BMI was measured as

U (t) = 100 - [| BMI (t) - 22 | - | BMI (0) - 22 |] / BMI (0) \times 100 .

Briefly, that function varies according to the baseline BMI of a patient and, for any baseline BMI value, it is maximized around the normal BMI range (18.5–24.9) for a BMI measurement taken at time $t$ . It peaks at 22, which is roughly in the middle of that range. The utility function’s highest value corresponds to the best weight change outcome. As an individual’s change in BMI varies in a detrimental fashion, the utility decreases towards some minimum across the study cohort. The minimum utility observed in our analysis dataset is reported in the Results section. Theoretically, assuming that a patient may not have a percent change in BMI greater than 50% since cohort entry (whether it is a detrimental or a beneficial change), the minimum value the utility can achieve is 50%, and the maximum is 150%. Given our outcome definition, a larger value for the utility function is better.

4.3. Confounder, tailoring variables, and predictors of observation

We defined the potential confounders of the relationship between the prescribed antidepressant and the utility function at baseline (cohort entry) and included the continuous-valued age, sex, continuous-valued BMI at baseline, current smoking status (smoker or non-smoker), alcohol abuse, calendar year of cohort entry (1998–2005, 2006–2011, 2012–2017), psychiatric disease history (which included autism spectrum disorder, obsessive-compulsive disorder, bipolar disorder, and schizophrenia), anxiety, or generalized anxiety disorder (further referred to as anxiety), antipsychotics use, any other psychotropic medication use (benzodiazepine drugs, anxiolytics, barbiturates, and hypnotics), lipid-lowering drugs, the number of psychiatric admissions, or hospitalizations for self-harm in the 6 months prior to cohort entry and the Index of Multiple Deprivation⁵⁴ as a proxy for the socioeconomic status. For the tailoring variables used to construct the optimal repeated measures ITR, similar to previous work¹⁷, we included in the models the interaction terms between the treatment and age, sex, smoking status, a composite indicator of psychiatric disease history (a diagnosis for either autism spectrum disorder, obsessive-compulsive disorder, bipolar disorder, or schizophrenia), a diagnosis for anxiety, and the number of psychiatric admissions or hospitalizations for self-harm in the previous 6 months. Note, we could also have defined time-varying confounders and tailoring variables, which our methodology allows for, but we used simpler definitions for our illustration. Comorbidities were defined using any diagnostic codes recorded by cohort entry, and medication use, using any prescriptions in the year prior to cohort entry.

In the observation intensity model for the outcome, we included the same set of covariates as confounders but these were defined in a time-varying manner, except for BMI at baseline. Our rationale for this is that we wish to capture any effect of these variables on the observation intensity, and time-varying variables may, therefore, provide more sensitivity to any such effect. For those time-varying covariates, we used different definitions. We considered a patient exposed to a medication for the duration of the corresponding prescription (medication considered were the lipid-lowering drugs, antipsychotics, and other psychotropic drugs). Then, after any first diagnosis for a chronic disease (including alcohol abuse, anxiety, and other psychiatric diseases), a patient was considered to have the condition for the remainder of the follow-up. The smoking status was updated at any time a new code related to smoking was recorded during follow-up (this included codes for smoking status and smoking cessation therapy). At any other time, it was defined using the most recent code for smoking.

As with the rest of the manuscript, we assumed in the illustration that the treatment was known (observed) at all times, which is realistic given that we had access to any prescriptions given by general practitioners. We also assumed that covariates in the two nuisance models (observation and treatment models) were always measurable. However, the smoking status at baseline and the Index of Multiple Deprivation were missing for some individuals. We assumed that individuals with no smoking status recorded (at anytime before cohort entry) were non-smokers, and we removed from the study the few individuals (< 1%) with a missing Index of Multiple Deprivation. For all other covariates (confounders, tailoring variables or time-dependent features in the observation intensity model), we assumed that any existing condition was recorded in the database and that any drug prescribed was also recorded and available for defining the medication variables (and therefore, had no missing values).

In the final outcome model, the design matrix incorporated the potential confounders, the treatment (citalopram or fluoxetine), and the interaction terms corresponding to all tailoring variables. The model, which predicted the utility function defined above, also incorporated both the IPT weight as a function of the propensity score and the IIV weight computed using the Andersen and Gill model. All predictors were included as linear terms in both nuisance models. Only the times when the utility function was available (i.e. when BMI was available) were included in the analysis and accounted for in the model fit. Those times corresponded to the repeated measures in the repeated measures ITR. It is likely that the value of $U (t)$ depends on $t$ , as patients who spend more time using one of the two study drugs might see a greater effect of the drug on their weight. One way to account for that effect of time would be to include a time-related variable in the outcome model used to develop the ITR. We did not consider the effect of time in this data analysis. We compared four different estimators for the optimal repeated measures ITR, with three of which were defined in Section 3. The other estimator, ${\hat{ψ}}_{I I V}$ , is an IIV-weighted one-stage dWOLS estimator that incorporates an IIV weight. For each estimator and corresponding coefficients in the rule, we computed 95% bootstrap confidence intervals (CIs) using 500 bootstrap samples. We used a two-stage sampling: We first sampled patients with replacement (using the same sample size as the original dataset) and, within each patient, sampled the same number of measurements as the original dataset with replacement. The same procedure was used to obtain 95% CIs for the observation rate ratios. The bootstrap procedure, therefore, considered the within-patient correlation due to similar within-patient observations (but it did not necessarily preserve time trends in the BMI outcome). Our approach might, therefore, lead to an underestimation of the variance of the blip coefficients, but it is unlikely given the small number of within-patient repetitions of the outcome in our application to CPRD. In other applications, the procedure could be adapted to account for a more complex correlation structure using, for instance, block bootstrap.⁵⁵

4.4. Results

After applying the exclusion criteria, the final cohort comprised 109,756 citalopram initiators and 85,606 fluoxetine initiators with a baseline BMI measurement who contributed to the treatment and the monitoring models. Of those, a total of 31,120 patients (60% citalopram initiators) had at least one follow-up BMI measure and contributed to the outcome model. A total of 47,938 records for BMI during follow-up were used for estimating the repeated measured ITR (60% of which were measured in citalopram initiators). Both the patients initiating citalopram and those initiating fluoxetine had an average of 1.6 BMI measures during follow-up. We present a comparison of patients’ covariates at cohort entry, stratified by the study antidepressant, for both the cohort of 195,362 patients with a baseline BMI and for the cohort of 31,120 patients who contributed to the outcome model fit in Supplemental Table 8 (Supplemental Material G). Some differences were found across the two groups, especially for the distribution of calendar year at cohort entry and the proportion of patients diagnosed with anxiety. These variables may act as confounders for the relationship between the antidepressants and the weight utility function. In the bottom of Supplemental Table 8, we also present for the latter cohort the time to a first BMI measurement during follow-up and the mean of that first BMI value.

The estimated rate ratios obtained from the observation model are presented in Supplemental Table 9 (Supplemental Material G) along with their 95% bootstrap CIs. A few variables were found to be associated with the observation intensity. Prescription for citalopram, male sex, and later calendar year of cohort entry were statistically significantly associated with lower chances for the weight to be recorded. On the other hand, a higher Index of Multiple Deprivation quintile, being an ever smoker, or the use of antipsychotics, other psychotropic drugs, or lipid-lowering drugs, were all statistically significantly associated with it being more likely for the outcome to be recorded.

When taking the average of all the observed BMI records, we found a crude mean utility function of 99.1 (SD 8.2) in patients who initiated citalopram and of 99.3 (SD 8.1) in those who initiated fluoxetine. The minimum utility value across the cohort was 50.3. and the maximum utility was 144.2. We present in Supplemental Table 10 (Supplemental Material G) the fitted coefficients for the tailoring variables in the mean outcome model, along with the corresponding 95% bootstrap CIs. The fitted optimal ITR under our proposed methodology is given by:

Treat with citalopram if

\begin{aligned} - 1.26 + 0.01 \times [Age] - 0.02 \times I [Malesex] + 0.06 \times [Index of MultipleDeprivation] \\ - 0.04 \times I [Eversmoker] + 0.46 \times I [Alcoholabuse] + 1.36 \times I [Psychiatricdiagnosis] \\ + 0.46 \times I [Anxiety] - 0.66 \times I [Antipsychotics drug use] \\ + 0.24 \times I [Other psychotropic druguse] + 0.06 \times I [Lipidlowering drug use] > 0 \end{aligned}

where

I [\cdot]

is the indicator function. To provide an idea of the blip values one could obtain with this rule, we evaluated the rule under some of the 1280 possible profiles of patient characteristics; these results are shown in Supplemental Material H. There, we also present the proportion of patients who would be recommended citalopram under the different estimated ITRs. Finally, a comparison of the average fitted outcome under all ITR estimators is presented in Table 2. Each outcome is fitted using the corresponding model applied to the actual treatment received (first row) or to the optimal treatment decision based on the rule (second row or Table 2). Using optimal treatment rules to make the optimal treatment decision consistently leads to greater average fitted outcomes than the actual treatment received, and all four treatment rules lead to similar results in terms of optimization, in this case. This might be explained by the fact that the observation process was not very strongly linked to covariates in this study (as per the rate ratios found in Supplemental Table 9) and that there were relatively few imbalances between treatment groups at cohort entry, such that confounding is relatively minor (Supplemental Table 8).

With all four estimators, we have found that the interaction term between age and the treatment was statistically significant at the 0.05 level (Supplemental Table 10). It is a signal that patients’ age may generally be useful in tailoring the antidepressant drug, after accounting for the covariate-driven treatment and observation processes. To generalize these results, however, the study should be reproduced in other (possibly larger) study cohorts.

Table 2.
Comparison of the fitted outcome (i.e. a BMI-related utility function) under each estimated optimal ITR and compared to the actual treatment received, comparison for each of the four estimators: OLS which does not adjust for confounding or observation process, IPT which accounts only for confounding, IIV which accounts only for the observation process, and the proposed doubly weighted estimator which accounts for both processes, CPRD, UK, 1998–2017, $n =$ 31,120 individuals.

Average fitted outcome (SE $^{1}$ )

Treatment ${\hat{ψ}}_{O L S}$ ${\hat{ψ}}_{I P T}$ ${\hat{ψ}}_{I I V}$ ${\hat{ψ}}_{D W 1}$

Actual treatment received 99.11 (0.001) 99.09 (0.001) 99.08 (0.001) 99.06 (0.001)

Optimal treatment 99.11 (0.001) 99.12 (0.001) 99.08 (0.001) 99.11 (0.001)

	Average fitted outcome (SE $^{1}$ )
Actual treatment received	99.11 (0.001)	99.09 (0.001)	99.08 (0.001)	99.06 (0.001)
Optimal treatment	99.11 (0.001)	99.12 (0.001)	99.08 (0.001)	99.11 (0.001)

SE: standard error; BMI: body mass index; ITR: individualized treatment rule; IPT: inverse probability of treatment; OLS: Ordinary Least Squares; IIV: inverse intensity of visit; CPRD: Clinical Practice Research Datalink.

$^{1}$ Based on prediction SEs obtained from the model that were further summed and normalized to obtain the variance of the mean of all predicted outcomes, rather than the variance of individual predicted outcome values.

5. Discussion

In observational studies using longitudinal data extracted from EHR, patients are often observed at irregular times that may depend on their own characteristics. When these same characteristics are associated with the treatment and/or the outcome, causal inference on treatment effects can be affected. In developing optimal ITRs that rely directly on those treatment effects, it is important to determine how observation times can impact the inference. Drawing causal diagrams^56,57 can help in finding potential biasing paths between the treatment and the outcome that should be blocked via, for example, IPT weighting or IIV weighting.

In this work, we proposed a novel methodology to account for covariate-driven treatment and observation mechanisms simultaneously in the estimation of optimal repeated measured ITRs. In extensive simulation studies, we demonstrated the consistency of the methodology. The proposed method is a straightforward extension of previous work,³⁷ and the same asymptotic theory can be used to develop the asymptotic variance of our proposed estimators. Our method is easy to implement and more easily understood than methods such as g-estimation. We applied the method to data from the UK’s CPRD and proposed an optimal ITR for choosing between citalopram and fluoxetine (two commonly prescribed antidepressants) to treat depression while reducing BMI changes that could be detrimental for one’s health.

The proposed methodology relies on assumptions that must be met for consistent estimation of an ITR. We now discuss these in the context of the CPRD application, starting with the causal assumptions. Our propensity score model must contain all potential confounders and it should be specified correctly as a function of the confounders (correct functional formats). In the CPRD application, depression severity is one potential confounder that was not measured. We have used the number of psychiatric hospitalizations as a proxy for severity and think it is unlikely that our application would be highly affected by unmeasured confounding. We do not expect important differences at baseline across the two treatment groups because the two study drugs are prescribed rather interchangeably. We also have assumed treatment and observation positivity, which can be unrealistic in certain settings with EHR. In the CPRD application, because both drugs are prescribed interchangeably, it is unlikely that treatment positivity is violated. However, coarsening of the data in time may be used to circumvent non-positivity issues. The work of Robins et al.²¹ (Section 6) and Neugebauer et al.²² also allowed causal inference under a weaker positivity assumption for the monitoring and could possibly be extended to our setting. Moreover, given that our work focuses on one-stage ITRs (as opposed to multiple stages DTRs), the positivity assumption for treatment is weaker (i.e. easier to meet) than the one typically made when building multiple decisions rules where long sequences of treatment must have a non-zero chance of occurring. In this work, we did not consider a sequence of treatments but rather the cross-sectional impact of a binary treatment and we allowed each patient to contribute multiple measurements, one for each outcome observed. Regarding observation positivity, some segments of person-time might have been included over which patients had no chance of having their weight observed, which could affect our observation model. We suspect that the effect of violating observation positivity would be relatively modest in this study (and not differential across the two treatment groups). The third causal assumption, outcome consistency, encompasses that the treatment definition be clear and that there be no interaction between individuals (no spillover in treatment effects). In our setting, the latter assumption is realistic as the antidepressant drug taken by one patient is unlikely to affect another patient’s weight, and the CPRD data are collected over a large geographic area (such that patients are unlikely to interact with most other patients in the study cohort). The treatment definition is also clear and simple in our application to the CPRD, such that it is unlikely that there are treatment variations that might affect the consistency of the potential outcome. We also have assumed that covariates in the observation intensity model were always measured (in continuous time). Given that we used electronic health records from the CPRD, we had all the information on the treatments prescribed by general practitioners to patients, however, we could not verify whether these drugs were dispensed or used and covariates in the observation model may be misclassified (i.e. the covariates might have been measured with error). We have assumed that all other predictors in the observation model were always measured however we suspect that variables like smoking status, alcohol abuse, or anxiety were measured informatively, which could affect our inference in a way that is hard to assess. Finally, since our method uses repeated measures of the outcome, it requires the standard assumptions for one-stage repeated measures ITRs, that is, treatment effects should be acute and there should be no antagonistic or synergistic effect due to previous treatments affecting the current one.⁴³ Although measurements of the outcome were taken far apart in time in our illustration using CPRD data, we cannot exclude the possibility of a cumulative effect of the initiating medication on the BMI outcomes.

In future work, we aim to extend the proposed methodology to the more complex setting of the more traditional, multiple stage DTRs using dWOLS. In that setting, dWOLS has great advantages as it can incorporate weights that are cumulated over time (similarly to marginal structural models). As such, it is a method of choice for treating complex covariate-driven observation processes (such as those that depend on an endogenous covariate process²⁰) and time-dependent confounding, in which there can be a biasing feedback between the covariates and the processes.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802231158733 - Supplemental material for Estimating individualized treatment rules in longitudinal studies with covariate-driven observation times

Supplemental material, sj-pdf-1-smm-10.1177_09622802231158733 for Estimating individualized treatment rules in longitudinal studies with covariate-driven observation times by Janie Coulombe, Erica EM Moodie, Susan M Shortreed and Christel Renoux in Statistical Methods in Medical Research

Footnotes

Acknowledgements

The authors would like to thank Dr. Shannon Holloway for reading the manuscript and providing significant help with editing. The authors are also indebted to two anonymous reviewers, the comments of whom greatly improved this manuscript.

This research was enabled in part by support provided by Compute Canada (). Computations were performed on the Niagara supercomputer at the SciNet HPC Consortium. SciNet is funded by the Canada Foundation for Innovation; the Government of Ontario; Ontario Research Fund – Research Excellence; and the University of Toronto.

The R code to reproduce the simulation studies is available in Supplemental Material E and at . The CPRD data used in our illustration are confidential and cannot be shared. Access to CPRD data requires protocol approval.

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Shortreed has worked on grants awarded to Kaiser Permanente Washington Health Research Institute (KPWHRI) by Bristol Meyers Squibb and by Pfizer. She was also a co-Investigator on grants awarded to KPWHRI from Syneos Health, who represented a consortium of pharmaceutical companies carrying out FDA-mandated studies regarding the safety of extended-release opioids. Authors JC, EEMM and CR have no conflict of interest to declare. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Part of the research was funded by an Innovative Ideas Award from Healthy Brains for Healthy Lives [HBHL 1c-II-11]. Dr. Coulombe and Dr. Shortreed are supported by the National Institute of Mental Health of the National Institutes of Health [R01 MH114873]. Dr. Moodie holds a Canada Research Chair (Tier 1) in Statistical Methods for Precision Medicine. Dr. Moodie further acknowledges support from a Discovery Grant from the Natural Sciences and Engineering Research Council (NSERC) and a chercheur de mérite career award from the Fonds de recherche du Quèbec–Santè. Dr. Renoux is supported by a chercheur-boursier salary award from the Fonds de recherche du Québec-Santé.

ORCID iD

Janie Coulombe

Supplemental material

Supplemental material for this article is available online.

References

Bang

Robins

. Doubly robust estimation in missing data and causal inference models. Biometrics 2005; 61: 962–973.

Stuart

. Matching methods for causal inference: a review and a look forward. Stat Sci 2010; 25: 1–21.

Moodie

EEM

Chakraborty

Kramer

. Q-learning for estimating optimal dynamic treatment rules from observational data. Can J Stat 2012; 40: 629–645.

Schuler

Rose

. Targeted maximum likelihood estimation for causal inference in observational studies. Am J Epidemiol 2017; 185: 65–73.

Kosorok

Laber

. Precision medicine. Annu Rev Stat App 2019; 6: 263–286.

Robins

Blevins

Ritter

, et al. G-estimation of the effect of prophylaxis therapy for pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology 1992; 3: 319–336.

Laber

Rose

Davidian

, et al. Q-Learning. Wiley StatsRef: Statistics Reference Online 2014: 1–10.

Wallace

Moodie

EEM

. Doubly-robust dynamic treatment regimen estimation via weighted least squares. Biometrics 2015; 71: 636–644.

Wallace

Moodie

EEM

Stephens

. Dynamic treatment regimen estimation via regression-based techniques: introducing R package DTRreg. J Stat Softw 2017; 80: 1–20.

10.

Wallace

Moodie

EEM

Stephens

. Package ‘DTRreg’. R package version 1.7, 2016. URL: https://cran.r-project.org/web/packages/DTRreg/index.html.

11.

Simoneau

Moodie

EEM

Azoulay

, et al. Adaptive treatment strategies with survival outcomes: an application to the treatment of type 2 diabetes using a large observational database. Am J Epidemiol 2020; 189: 461–469.

12.

Tsiatis

Davidian

Holloway

, et al. Dynamic treatment regimes: statistical methods for precision medicine. Boca Raton: CRC Press, 2019.

13.

Linn

Laber

Stefanski

. iqLearn: interactive Q-learning in R. J Stat Softw 2015; 64: 1–32.

14.

McGrath

Lin

Zhang

, et al. gfoRmula: an R package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns 2020; 1: 1–12.

15.

Horvitz

Thompson

. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 1952; 47: 663–685.

16.

Rosenbaum

. Model-based direct adjustment. J Am Stat Assoc 1987; 82: 387–394.

17.

Coulombe

Moodie

EEM

Shortreed

, et al. Can the risk of severe depression-related outcomes be reduced by tailoring the antidepressant therapy to patient characteristics? Am J Epidemiol 2021; 190: 1210–1219.

18.

Johnson

Glicksberg

Hodos

, et al. Causal inference on electronic health records to assess blood pressure treatment targets: an application of the parametric g formula. Pacific Symposium on Biocomputing 2018: Proceedings of the Pacific Symposium 2018; 2018: 180–191.

19.

Greenland

. Quantifying biases in causal models: classical confounding vs. collider-stratification bias. Epidemiology 2003; 14: 300–306.

20.

Coulombe

Moodie

EEM

Platt

, et al. Estimation of the marginal effect of antidepressants on body mass index under confounding and endogenous covariate-driven monitoring times. Ann Appl Stat 2021; 16: 1868–1890.

21.

Robins

Orellana

Rotnitzky

. Estimation and extrapolation of optimal treatment and testing strategies. Stat Med 2008; 27: 4678–4721.

22.

Neugebauer

Schmittdiel

Adams

, et al. Identification of the joint effect of a dynamic treatment intervention and a stochastic monitoring intervention under the no direct effect assumption. J Causal Inference 2017; 5: 1–66.

23.

Kreif

Sofrygin

Schmittdiel

, et al. Evaluation of adaptive treatment strategies in an observational study where time-varying covariates are not monitored systematically. arXiv preprint 2018; arXiv:1806.11153.

24.

Thall

Wooten

Logothetis

, et al. Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring. Stat Med 2007; 26: 4687–4702.

25.

Shahn

Sun

, et al. G-computation and hierarchical models for estimating multiple causal effects from observational disease registries with irregular visits. AMIA Jt Summits Transl Sci Proc 2019; 2019: 789–798.

26.

Yang

. Semiparametric estimation of structural nested mean models with irregularly spaced longitudinal observations. Biometrics in press. URL: https://onlinelibrary.wiley.com/doi/10.1111/biom.13471.

27.

Lok

. Mimicking counterfactual outcomes to estimate causal effects. Ann Stat 2017; 45: 461–499.

28.

Goldstein

Phelan

Pagidipati

, et al. How and when informative visit processes can bias inference when using electronic health records data for clinical research. J Am Med Inform Assoc 2019; 26: 1609–1617.

29.

McCulloch

Neuhaus

Olin

. Biased and unbiased estimation in longitudinal studies with informative visit processes. Biometrics 2016; 72: 1315–1324.

30.

Lin

Scharfstein

Rosenheck

. Analysis of longitudinal data with irregular, outcome-dependent follow-up. J Roy Stat Soc B 2004; 66: 791–813.

31.

Bůžkovà

Lumley

. Semiparametric modeling of repeated measurements under outcome-dependent follow-up. Stat Med 2009; 28: 987–1003.

32.

Zhu

Lawless

Cotton

. Estimation of parametric failure time distributions based on interval-censored data with irregular dependent follow-up. Stat Med 2017; 36: 1548–1567.

33.

Liang

Ying

. Joint modeling and analysis of longitudinal data with informative observation times. Biometrics 2009; 65: 377–384.

34.

Cai

Zhang

. Time-varying latent effect model for longitudinal data with informative observation times. Biometrics 2012; 68: 1093–1102.

35.

Dai

Pan

. Joint modelling of survival and longitudinal data with informative observation times. Scand J Stat 2018; 45: 571–589.

36.

Lipsitz

Fitzmaurice

Ibrahim

, et al. Parameter estimation in longitudinal studies with outcome-dependent follow-up. Biometrics 2002; 58: 621–630.

37.

Coulombe

Moodie

EEM

Platt

. Weighted regression analysis to correct for informative monitoring times and confounders in longitudinal studies. Biometrics 2021; 77: 162–174.

38.

Herrett

Gallagher

Bhaskaran

, et al. Data resource profile: clinical practice research datalink (CPRD). Int J Epidemiol 2015; 44: 827–836.

39.

Neyman

. On the application of probability theory to agricultural experiments. Essay on principles. Section 9 (translation published in 1990). Stat Sci 1990; 5: 472–480.

40.

Rubin

. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974; 66: 688–701.

41.

Lawless

Nadeau

. Some simple robust methods for the analysis of recurrent events. Technometrics 1995; 37: 158–168.

42.

Lin

Ying

. Semiparametric and nonparametric regression analysis of longitudinal data. J Am Stat Assoc 2001; 96: 103–126.

43.

Dong

Moodie

EEM

Villain

, et al. Evaluating the use of generalized dynamic weighted ordinary least squares for individualized HIV treatment strategies. arXiv preprint 2021; arXiv:2109.01218.

44.

Hernán

Robins

. Estimating causal effects from epidemiological data. J Epidemiol Commun H 2006; 60: 578–586.

45.

Andersen

Gill

. Cox’s regression model for counting processes: a large sample study. Ann Stat 1982; 10: 1100–1120.

46.

Therneau

. A package for survival analysis in R. R package version 3.1-12, 2018. URL: https://CRAN.R-project.org/package=survival.

47.

Rosenbaum

Rubin

. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.

48.

Newey

McFadden

. Large sample estimation and hypothesis in WK Newey and D McFadden (eds) Handbook of Econometrics 4th ed. Amsterdam: Elsevier, 1994, pp. 2112–2245.

49.

Chakraborty

Moodie

EEM

. Statistical methods for dynamic treatment regimes. New York: Springer, 2013. pp. 36–37.

50.

Henckel

Perković

Maathuis

. Graphical criteria for efficient total effect estimation via adjustment in causal linear models. J R Stat Soc B 2021; 84: 579–599.

51.

Sun

McCulloch

Marr

, et al. Recurrent events analysis with data collected at informative clinical visits in electronic health records. J Am Stat Assoc 2021; 116: 594–604.

52.

Robins

Hernán

Brumback

. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 1044–3983.

53.

Robins

Rotnitzky

. Recovery of information and adjustment for dependent censoring using surrogate markers. AIDS epidemiology. Boston: Birkhauser, 1992: 297–331.

54.

Deas

Robson

Wong

, et al. Measuring neighbourhood deprivation: a critique of the index of multiple deprivation. Environ Plann C 2003; 21: 883–903.

55.

Lahiri

. Theoretical comparisons of block bootstrap methods. Ann Stat 1999; 27: 386–404.

56.

Pearl

. Graphs, causality, and structural equation models. Sociol Method Res 1998; 27: 226–284.

57.

Hernán

McAdams

McGrath

, et al. Observation plans in longitudinal studies with time-varying treatments. Stat Methods Med Res 2009; 18: 27–52.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.70 MB

	Average fitted outcome (SE $^{1}$ )
Treatment	${\hat{ψ}}_{O L S}$	${\hat{ψ}}_{I P T}$	${\hat{ψ}}_{I I V}$	${\hat{ψ}}_{D W 1}$
Actual treatment received	99.11 (0.001)	99.09 (0.001)	99.08 (0.001)	99.06 (0.001)
Optimal treatment	99.11 (0.001)	99.12 (0.001)	99.08 (0.001)	99.11 (0.001)