
Abstract

As routinely collected longitudinal data becomes more available in many settings, policy makers are increasingly interested in the effect of time-varying treatments (sustained treatment strategies). In settings such as this, many commonly used statistical approaches for estimating treatment effects, such as g-methods, often adopt the ‘no unmeasured confounding’ assumption. Instrumental variable (IV) methods aim to reduce biases due to unmeasured confounding, but have received limited attention in settings with time-varying treatments. This paper extends and critically evaluates a commonly used IV estimating approach, Two Stage Least Squares (2SLS), for evaluating time-varying treatments. Using a simulation study, we found that, unlike standard 2SLS, the extended 2SLS performs relatively well across a wide range of circumstances, including certain model misspecifications. We illustrate the methods in an evaluation of treatment intensification for Type-2 Diabetes Mellitus, exploring the exogeneity in prescribing preferences to operationalise a time-varying instrument.

Keywords

Instrumental variable, time-varying, two stage least squares, physician preference, diabetes

1. Introduction

As routinely collected data has become more available, there has been an increasing interest in long term causal effects in studies with time-varying treatments. For example, decision makers are interested in the effect of glycemic control strategies over a sustained period of time, for which evidence from randomised controlled studies is often limited. Time-varying confounding, where health-related factors impact both treatment and outcome after the first recorded time period, is a major challenge. Since time-varying confounders often act as mediators of the effect of previous treatment, standard regression adjustment for these variables blocks indirect effects of the treatment. To handle this, more sophisticated methods such as the g-methods¹ are needed. Such methods often make the assumption of no unmeasured confounding. In practice, this assumption is likely to be implausible. For example, the prescription of anti-diabetic medication at each time point may depend on a wide range of factors, such as co-morbidities and disease severity, not all of which may be measured.

A popular approach to deal with unmeasured confounding is to use Instrumental Variable (IV) analysis, which has been widely used across many disciplines such as genetics, economics and clinical research.^2–4 IV exploits sources of exogenous variation that are strongly associated with treatment assignment, and affect the outcome only through the treatment. IV estimating approaches such as the Wald estimator and Two Stage Least Squares (2SLS) for evaluating time-fixed point treatments are well established.⁵ However, these approaches have received little attention in the evaluation of time-varying treatments.

Recent research has sought to extend IV methodologies to time-varying settings.^6,7 In this setting, a major challenge is to identify an instrument that remains valid and strong across different time periods. For example, genes have been considered for the evaluation of time-varying treatments in Mendelian Randomisation (MR).^8–11 MR studies only considered time-fixed baseline IVs given the nature of genetic markers. An alternative strategy is to identify an IV that varies over time to instrument a sequential treatment assignment, which is then operationalised via $g$ -estimation.^12–16

Two promising methods have recently been documented that can apply a time-varying IV to longitudinal data. The first is a novel inverse probability weighting procedure,¹⁷ and the second is an application of $g$ -estimation methods in time-varying settings.^11,16 Our recent work¹⁶ contrasted these two methods in full time-varying settings and found that both can obtain unbiased causal estimators using a time-varying IV with strong association with treatment. Unlike the IPW-based approach, estimates via $g$ -estimation remain unbiased and precise in settings with weaker IV strengths.

However, $g$ -estimation methods tend to be relatively complex to implement, and are less familiar in many statistical fields. As such, there is substantial interest in investigating related, but more accessible IV approaches such as 2SLS for incorporating time-varying instruments. However, little attention has been paid to this area of research.

The paper’s primary aim is to investigate the application of time-varying IVs using 2SLS, for which there is little established work. We then draw on multivariate 2SLS methods in MR settings^4,18,19 and robust 2SLS methods in time-fixed settings,^20–22 and detail a residual instrument 2SLS (RI2SLS) approach that accommodates a time-varying IV. We show this method can be related to recent advances in $g$ -estimation in time varying IV settings, and compare its relative performance with standard 2SLS methods. The methods are illustrated in a case study evaluating second line treatment for type 2 diabetes (T2D), using GP prescribing preferences for a newer class of inhibitors as the time-varying IV. Preference is widely applied as an IV in clinical studies,³ and is naturally time-varying in nature. However, preference has not, to our knowledge, been used to construct multiple IVs, and we highlight the challenges of operationalising prescription preferences in this way.

2. Motivating example

Our paper is motivated by an analysis of the effectiveness of second line therapy for T2D on blood glucose levels. T2D is a progressive disease characterised by an impaired ability of pancreatic $β$ -cells to release insulin, leading to elevated glycated haemoglobin (HbA1c), or blood glucose levels. More than 4.4 million people are estimated to be living with T2D in the UK, with more than 3.2 million at an increased risk in the future,²³ and it contributes to increased risks of cardiovascular disease, chronic kidney disease and vascular difficulties.

Treatment involves prescribed medication to control and lower HbA1c levels. NICE guidelines in the UK recommend Metformin monotherapy as first line. However, about 30%–50% of patients either fail to respond to first line treatment, or monotherapy becomes less effective over time, and second line intensification is often necessary. Second line therapy supplements Metformin with a second oral anti-diabetic. Owing to insufficient evidence of a preferred second line therapy, NICE (2022) guidelines²⁴ leave the choice of treatment to clinicians and primary care practices. For this reason, second line therapy preferences can differ greatly between practices and GPs, and are subject to change over time.²⁵ Patients without high risk of cardio-vascular disease are most commonly assigned Sulfonylureas (SU), or DPP4-inhibitors (DPP4) at second line therapy. As SU was one of the first intensifications for T2D, GPs have a strong historical preference for it. However, recent studies may have shifted preference towards DPP4.^26,27

2.1. Study population and eligibility criteria

Our motivating example includes data from routinely collected primary care records in three East London clinical commissioning groups (CCGs) based in Tower Hamlets, Newham and City, and Hackney. Data on treatment and health related information was collected and recorded in intervals of 6 months from 2012 to 2018. This period coincided with a shift in medical preferences about DPP4 versus SU.²⁶

The median follow up for patients on second line T2D treatment is two and a half years (i.e. 5 time periods). We follow up patients for up to 2 years, or 4 periods. Time $t = 1$ is taken as the first period (baseline) in which the patient was recorded taking second line therapy, indicating they had begun this treatment within the last 6 months. Times $t = 2$ and $t = 3$ represent 6 and 12 months of follow up respectively, with the outcome recorded 6 months later, approximately 2 years (or 24 months) after initiation.

Eligible patients were between 18 and 89 years of age, registered with a primary care practice, and initiated second line therapy after first line monotherapy failed. Patients were required to be on either SU or DPP4 at initiation of second line therapy, with complete relevant data available for the full follow up period. Patients who do not start on one of these two treatment regimes, leave the study before 3 follow up times, pause treatment on SU or DPP4, or begin a further intensification by taking both or another diabetic treatment, are censored from the study. Our initial data subset includes $n = 7342$ patients who were recorded as initiating second line therapy for diabetes for the first time, after monotherapy failed, between October 2012 and October 2017. Of these, there are $n = 3640$ patients who meet the above criteria and $n = 2561$ with complete data on all relevant variables.

2.2. Study design

Our treatment is a contrast of one of two first intensification second line treatments.

Treatment: Treatment intensification with DPP4.

Control: Treatment intensification with SU.

Treatment is recorded at each 6 month interval, with the Treatment group denoted 1 and the Control group denoted 0.

Outcome is the recorded measure of HbA1c levels in mmol/mol, 2 years (4 time periods) after initiation of second line therapy. We are interested in the Average Treatment Effect (ATE) of sustained treatment with DPP4, compared to SU, over 18 months.

Because sustained treatment may depend upon time-varying prognostic measures, such as the trajectory of glycemic control, or upon unmeasured patient characteristics, we perform the analysis using time-varying IV methodologies. As the IV we use a measure of physicians' prescription preference taken over time, denoted as tendency to prescribe (TTP). Full details are in the methods section.

Baseline characteristics are presented in the Appendix A.4 (Table A.3). Data is available at baseline on age, gender and ethnicity, and over time on HbA1c levels, Body Mass Index, systolic blood pressure, blood lipid profiles, kidney function, and history of stroke and hypoglycaemic events. Co-prescription history of statins and beta blockers was also available. Patients on DPP4 have lower HbA1c levels at baseline and prior to second line therapy, with higher levels of Body Mass Index over 34. Notably, patients were majority non-White, with around 75% recorded as Black, South Asian, or Other Ethnicity.

Table 1.

Simulation results for the simple data setup, targeting the ATE with 95% coverage based on $b$ = 1000 bootstrapped samples.

Simple data set-up
			2SLS				2SLS-L
n	$Δ_{t}$	$F$ -stat	Bias	RMSE	MCE	Coverage	Bias	RMSE	MCE	Coverage
5000	0.5	1542	0.003	0.42	0.005	93.5	0.000	0.36	0.004	94.3
5000	0.3	506	0.007	0.52	0.009	93.6	0.002	0.45	0.006	94.8
5000	0.1	68	0.049	0.85	0.022	94.2	0.016	0.72	0.016	95.8
5000	0.5–0.1*	191	0.013	0.570	0.010	93.5	0.008	0.495	0.008	94.9
1000	0.5	305	0.006	0.63	0.012	94.5	0.010	0.55	0.010	94.9
1000	0.3	98	0.023	0.79	0.020	95.6	0.011	0.682	0.014	94.9
1000	0.1	11	0.482	2.03	0.130	98.9	0.065	1.27	0.050	98.9
1000	0.5–0.1*	41	0.072	0.904	0.026	95.6	0.022	0.770	0.019	96.2
			RI2SLS				RI2SLS-L
n	$Δ_{t}$	$F$ -stat	Bias	RMSE	MCE	Coverage	Bias	RMSE	MCE	Coverage
5000	0.5	1542	0.003	0.42	0.005	93.5	0.000	0.36	0.004	94.3
5000	0.3	506	0.007	0.52	0.009	93.6	0.002	0.45	0.006	94.8
5000	0.1	68	0.049	0.85	0.022	94.2	0.016	0.72	0.016	95.8
5000	0.5–0.1*	191	0.013	0.570	0.010	93.5	0.008	0.495	0.008	94.9
1000	0.5	305	0.006	0.63	0.012	94.5	0.010	0.55	0.010	94.9
1000	0.3	98	0.023	0.79	0.020	95.6	0.011	0.682	0.014	94.9
1000	0.1	11	0.482	2.03	0.130	98.9	0.065	1.27	0.050	98.9
1000	0.5–0.1*	41	0.072	0.904	0.026	95.6	0.022	0.770	0.019	96.2

$F$ -stat: The average value of the conditional $F$ -statistic for all $A_{t}$ over the simulations. *0.5–0.1 denotes a situation in which the strength of $Δ_{t}$ decreases over time with $Δ_{t} = (0.5, 0.3, 0.1)$ . ATE: average treatment effect; 2SLS: two stage least squares; RI2SLS: residual instrument 2SLS; RMSE: root mean square error; MCE: Monte Carlo error.

Table 2.

Simulation results for the complex data setup, targeting the ATE with 95% coverage based on $b$ = 1000 bootstrapped samples.

Complex data set-up
			2SLS				2SLS-L
n	$Δ_{t}$	$F$ -stat	Bias	RMSE	MCE	Coverage	Bias	RMSE	MCE	Coverage
5000	0.5	1234	1.900	1.379	0.003	00.0	0.607	0.784	0.003	00.0
5000	0.3	283	2.677	6.67	1.400	76.2	0.732	5.69	1.024	99.3
5000	0.1	6	3.371	1.837	0.006	2.7	0.151	0.762	0.018	94.3
5000	0.5–0.1*	88	1.536	2.550	0.200	60.7	1.185	3.336	0.350	99.7
1000	0.5	233	1.899	1.382	0.007	00.0	0.621	0.809	0.007	18.6
1000	0.3	56	1.587	5.109	0.825	78.3	0.265	3.460	0.379	99.6
1000	0.1	2	3.381	1.885	0.034	40.9	0.102	5.02	0.798	97.0
1000	0.5–0.1*	20	2.107	3.753	0.441	81.3	1.072	5.291	0.885	99.1
			RI2SLS				RI2SLS-L
n	$Δ_{t}$	$F$ -stat	Bias	RMSE	MCE	Coverage	Bias	RMSE	MCE	Coverage
5000	0.5	1234	0.010	0.328	0.003	96.1	0.009	0.329	0.003	95.9
5000	0.3	283	0.014	0.42	0.006	95.2	0.013	0.42	0.006	94.7
5000	0.1	6	0.240	1.475	0.068	96.2	0.002	1.645	0.085	96.6
5000	0.5–0.1*	88	0.011	0.389	0.004	95.1	0.008	0.399	0.005	94.9
1000	0.5	233	0.003	0.499	0.008	95.8	0.003	0.499	0.008	95.9
1000	0.3	56	0.016	0.639	0.013	95.4	0.006	0.641	0.013	96.0
1000	0.1	2	1.477	6.811	1.468	98.5	1.687	4.945	0.773	99.5
1000	0.5–0.1*	20	0.012	0.688	0.015	96.4	0.025	0.693	0.015	96.6

$F$ -stat: The average value of the conditional $F$ -statistic for all $A_{t}$ over the simulations. *0.5–0.1 denotes a situation in which the strength of $Δ_{t}$ decreases over time with $Δ_{t} = (0.5, 0.3, 0.1)$ . ATE: average treatment effect; 2SLS: two stage least squares; RI2SLS: residual instrument 2SLS; RMSE: root mean square error; MCE: Monte Carlo error.

Table 3.

Simulation results targeting the ATE, using RI2SLS for complex data setups with misspecified models for $Z$ , $A$ and $Y$ .

			RI2SLS		RI2SLS-L
$σ_{Z}$ , $σ_{A}$ , $σ_{Y}$	n	$Δ_{t}$	Bias	RMSE	Bias	RMSE
1, 1, 1	5000	0.3	0.097	0.837	0.124	0.783
1, 1, 1	1000	0.3	0.122	1.281	0.135	1.182
3, 3, 3	5000	0.3	0.485	1.10	0.564	0.937
3, 3, 3	1000	0.3	0.398	1.654	0.594	1.300
1, 0, 0	5000	0.3	0.005	0.569	0.015	0.578
1, 0, 0	1000	0.3	0.009	0.879	0.032	0.886
3, 0, 0	5000	0.3	0.199	0.614	0.182	0.612
3, 0, 0	1000	0.3	0.231	0.876	0.251	0.885
0, 1, 0	5000	0.3	0.006	0.543	0.004	0.566
0, 1, 0	1000	0.3	0.042	0.837	0.040	0.850
0, 3, 0	5000	0.3	0.015	0.585	0.019	0.610
0, 3, 0	1000	0.3	0.069	0.896	0.051	0.914
0, 0, 1	5000	0.3	0.013	0.426	0.012	0.425
0, 0, 1	1000	0.3	0.022	0.647	0.007	0.642
0, 0, 3	5000	0.3	0.011	0.441	0.010	0.430
0, 0, 3	1000	0.3	0.013	0.426	0.013	0.426

ATE: average treatment effect; RI2SLS: residual instrument two stage least squares; RMSE: root mean square error.

3. Methods

3.1. Overview

Suppose there are $T$ time periods over which we observe a time-varying treatment $A_{t}$ . For the rest of the paper we will refer to treated $(A_{t} = 1)$ and control $(A_{t} = 0)$ groups. A variable written without a subscript denotes the set of all observations of that variable over time, that is $A = (A_{1}, \dots, A_{T})$ etc. We observe a continuous end-of-study outcome $Y$ at time $T + 1$ , and observed and unobserved time-varying confounders $L_{t}$ and $U_{t}$ respectively, which confound the effect of $A_{t}$ on $Y$ . We define $Z_{t}$ as the time-varying instrument.

We describe the general time-varying data setup in the Directed Acyclic Graph (DAG) shown in Figure 1. We first followed previous works with time dependent variables^15,17 which considered data setups without the dashed directional arrows. We refer to this as the ‘Simple’ data setup. Informed by our case study and prior works¹⁶ we also considered a more ‘Complex’ data setup which includes these dashed arrows, which may pose additional challenges to time-varying IV analysis. In particular the treatment and instrument are allowed to depend on past instrument and treatment history respectively, as well as on time-varying confounders.

Figure 1.

Directed acyclic graph (DAG) of data setup with $T = 3$ treatment periods. All lines represent associations between variables. The dotted lines are used for visual clarity. The dashed arrows between the $A_{t}$ and $Z_{t}$ and between $L_{t}$ and $Z_{t}$ represent more complex relationships introduced for the ‘Complex’ data setup. In the ‘Simple’ data setup, these associations are removed.

Taking our motivating example, $Z_{t}$ is the General Practitioner’s (GP) preference for prescribing DPP4 over the period $t$ . A GP may continue to prescribe one treatment over another if they have historically preferred it (dependence on $Z_{t - 1}$ ). Preference might also be affected by a familiarity with one treatment over another based on prescribing history (dependence on $A_{t - 1}$ ). Furthermore, it is not unlikely that a GP’s belief in one treatment over another may develop over time, based on health-related indicators. A GP with a preference for DPP4 may, if a number of their patients display poor glycemic control or worsening health outcomes, waver in this belief, and adjust their preference accordingly (dependence on $L_{t}$ ).

Define $Y (a)$ as the counterfactual outcome that would have been observed under some treatment regime $a = (a_{1}, \dots, a_{T})$ . The ATE is derived by contrasting counterfactual outcomes under alternative regimens, for example, one in which individuals always take treatment at the different time periods versus another where they always take the control. As such, the ATE is defined as

$E [Y (1, 1, 1) - Y (0, 0, 0)] .$

In multivariate IV settings, identifying causal estimands that represent the ATE is challenging and requires certain assumptions about the data and the substantive model of interest.²⁸

3.2. IV Assumptions and substantive model

As with most causal methods, we make the assumption of counterfactual consistency (observed outcome $Y$ under observed treatment regime $a$ is equal to counterfactual outcome $Y (a)$ ).¹¹ We will also assume that, controlling for some set $M_{t}$ , $Z_{t}$ satisfies the following three assumptions.

IV1 (IV relevance): There exists a measurement of $Z$ and $A$ at each time point, and there exists an association between $Z_{t}$ and $A_{t}$ , conditional on $M_{t}$ , at each time point. This association must be sufficiently strong.

IV2 (Conditional exchangeability): $Z_{t}$ does not share any common causes with $Y$ , conditional on the set of variables $M_{t}$ . The conditional exchangeability assumption essentially expresses that $Z_{t}$ should not depend on unmeasured common causes $U_{t}$ of treatment and outcome.

IV3 (Exclusion restriction): $Z_{t}$ cannot have any direct effect on $Y$ other than through $A$ , nor does it affect future measured or unmeasured confounding ( $Z_{t}$ may affect future values of $Z$ ).

IV2 and IV3 are often formalised by the single assumption $Y (a) \perp\!\!\!\perp Z_{t} | M_{t}$ for all $t$ and $a$ .

We also make the following assumption about the nature of the causal relationship between $A$ and $Y$ .

IV4 (No current treatment interaction): The average causal effect of treatment at time $t$ on the outcome is the same in the treated $A_{t} = 1$ , and the control $A_{t} = 0$ groups, that is it does not depend on observed treatment, within levels of $Z_{t}$ and $M_{t}$ for all $t$ and ${\bar{a}}_{t}$ :
$\begin{aligned} E [Y ({\bar{a}}_{t}, 0) - Y ({\bar{a}}_{t - 1}, 0) | {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}, A_{t} = 1, {\bar{Z}}_{t}, M_{t}] \\ = E [Y ({\bar{a}}_{t}, 0) - Y ({\bar{a}}_{t - 1}, 0) | {\bar{A}}_{t - 1} = {\bar{a}}_{t - 1}, A_{t} = 0, {\bar{Z}}_{t}, M_{t}] \end{aligned}$

where $Y ({\bar{a}}_{t}, 0)$ is the counterfactual outcome under treatment history ${\bar{a}}_{t}$ up to time $t$ , and 0 afterwards. We let $M_{t}$ be the set of variables that are sufficient to adjust for any confounding of the effect of $Z_{t}$ on $Y$ . In general, we expect that $M_{t}$ would include any variable that affects $Z_{t}$ and either $A_{t}$ , $Y$ , or both. This would typically include history of $A$ and $Z$ , and any confounding variable $L_{t}$ that affects $A_{t}$ or $Y$ . For example, in our simple setup $M_{t} = Z_{t - 1}$ and in the complex setup $M_{t} = (Z_{t - 1}, A_{t - 1}, L_{t})$ .

Assumptions IV1 to IV3 are multivariate extensions of the standard IV assumptions, whilst IV4 is necessary for the interpretation of the 2SLS estimators as the ATE.²⁹ Of these assumptions only IV1 is empirically testable from the available data, with IV2-4 requiring careful consideration by practitioners. In a time-fixed setting, the Cragg-Donald $F$ -statistic is often used to evaluate IV1.^30–32 An $F$ -statistic of 10 is historically cited as a sign of a sufficiently strong instrument, however some work suggests a value of 100 may be required for good inference.³³ For multiple IVs the Sanderson-Windmeijer conditional $F$ -test^18,34 has been proposed. This test is an adaptation of the univariate $F$ -test which can assess the overall strength of the IVs as a whole, and is one of the most reliable single tests available in this setting. Conditional $F$ -statistics of over 10 are an indicator of sufficient strength across $Z$ .
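
As a rough illustration of the strength check in the time-fixed case, the sketch below computes an ordinary first-stage partial $F$ -statistic for a single treatment and a single instrument. It is not the Sanderson-Windmeijer conditional $F$ -test for multiple IVs; the function name `first_stage_f` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def first_stage_f(A, Z, C=None):
    """Partial F-statistic for instrument(s) Z in an OLS regression of A on (1, C, Z)."""
    n = len(A)
    # Restricted design: intercept plus any auxiliary covariates C
    base = np.ones((n, 1)) if C is None else np.column_stack([np.ones(n), C])
    # Unrestricted design: restricted design plus the instrument(s)
    full = np.column_stack([base, Z])

    def rss(X):
        coef = np.linalg.lstsq(X, A, rcond=None)[0]
        return np.sum((A - X @ coef) ** 2)

    q = full.shape[1] - base.shape[1]  # number of instruments being tested
    return ((rss(base) - rss(full)) / q) / (rss(full) / (n - full.shape[1]))

# Toy data (assumed, for illustration only)
rng = np.random.default_rng(1)
n = 1000
Z = rng.normal(size=n)
A_strong = 0.5 * Z + rng.normal(size=n)   # instrument strongly associated with treatment
A_weak = 0.02 * Z + rng.normal(size=n)    # instrument barely associated with treatment

f_strong = first_stage_f(A_strong, Z)
f_weak = first_stage_f(A_weak, Z)
print(f"F (strong IV): {f_strong:.1f}, F (weak IV): {f_weak:.1f}")
```

With several treatments and instruments, a conditional $F$ -statistic would be needed instead, since a separate unconditional $F$ for each $A_{t}$ can overstate the joint strength of $Z$ .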

For IV2, checking the balance of measured confounders within IV groups can indicate whether measured variables need to be included within $M_{t}$ . However, violations of IV2 or IV3 due to relationships with unmeasured confounders cannot be tested for, and IV4 cannot be tested from the available data. Nevertheless, expert advice, logical argument, and using variables that are known to be reliable candidates for an IV in previous works can provide reassurance that the IV assumptions hold.

3.2.1. Substantive model

In time-fixed 2SLS literature the substantive model of interest is usually expressed as a Linear Structural Mean Model.²⁰ In time-varying settings, the substantive model is expressed via a Structural Nested Mean Model (SNMM).³⁵ We assume in this paper a SNMM of the form

$E [Y ({\bar{a}}_{t}, 0) - Y ({\bar{a}}_{t - 1}, 0) | {\bar{A}}_{t} = {\bar{a}}_{t}, {\bar{Z}}_{t}, M_{t}] = β_{t} a_{t}, \quad t = 1, \dots, T \qquad (1)$

Here $β_{t}$ is expressed as a ‘blip’, the effect of being treated versus not treated at time $t$ on the counterfactual outcome in which a patient follows their observed history up to time $t - 1$ ( ${\bar{a}}_{t - 1}$ ), and is given control after time $t$ .

The parameters $β_{t}$ have a causal interpretation that is conditional on ${\bar{Z}}_{t}$ and $M_{t}$ . To be able to estimate the ATE from equation (1), we require that they have an interpretation marginal on ${\bar{Z}}_{t}$ . For this, we require assumptions IV1–IV4 in the prior section. If equation (1) correctly specifies the causal effect, and IV2 and IV3 hold, then IV4 allows the parameters of our SNMM to be related to those of a simpler Marginal Structural Model (MSM)^16,29

$E [Y (a)] = τ + \sum_{t = 1}^{T} β_{t} a_{t} .$

It is then immediately clear that the sum of the $β_{t}$ is the ATE.
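
As a short worked check, take $T = 3$ as in the motivating example and substitute the always-treated and never-treated regimes into the MSM. The display below is only a restatement of the model under that assumption, written in standard LaTeX notation:

```latex
\begin{aligned}
E[Y(1,1,1)] &= \tau + \beta_{1} + \beta_{2} + \beta_{3}, \\
E[Y(0,0,0)] &= \tau, \\
E[Y(1,1,1) - Y(0,0,0)] &= \beta_{1} + \beta_{2} + \beta_{3}.
\end{aligned}
```

The intercept $τ$ cancels in the contrast, so only the blip parameters contribute to the ATE.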

3.3. Two stage least squares methods

3.4. Standard 2SLS

The standard 2SLS methodology in time-fixed situations¹ was generalised to the case of multiple treatments via Multivariate Mendelian Randomisation.^11,18 In this paper, we have a just-identified situation, where we have as many instruments as time periods. Standard 2SLS can be generalised to the case of time-varying instruments as follows.

First stage models: Postulate and fit a series of first stage models for each $A_{t}$ , $f_{A_{t}} (Z, C; α_{t}) = E (A_{t} | Z, C; α_{t})$ , by fitting a main effects linear regression model using Ordinary Least Squares (OLS) for $A_{t}$ on all instruments $Z_{t}$ and a set of auxiliary variables $C$ (which are discussed below):

$E (A_{t} | Z, C; α_{t}) = \sum_{j = 1}^{T} α_{t j} Z_{j} + α_{C t} C$

for $t = 1, \dots, T$ and $α_{t} = (α_{t 1}, \dots, α_{t T}, α_{C t})$ . This yields estimates of the first stage parameters $α_{t}^{*}$ .

Obtain predicted values for $A_{t}$ from the fitted models, defined as ${\hat{A}}_{t} = E (A_{t} | Z, C; α_{t}^{*})$ .

Second stage model: Postulate and fit a main effects regression model for $Y$ against the predicted values ${\hat{A}}_{t}$ and $C$ , fitted using OLS:

$f_{Y} (Z, C; β) = E (Y | {\hat{A}}_{1}, \dots, {\hat{A}}_{T}, C; β) = β_{0} + \sum_{t = 1}^{T} β_{t} {\hat{A}}_{t} + β_{C} C$

The estimated coefficients $β_{t}^{*}$ of the fitted model are the causal effects of interest of equation (1), from which the ATE can be derived. Note that it is sufficient to condition the first stage models on $Z$ up to treatment time $t$ .

Provided that 2SLS is fit using OLS in both stages, it can be shown that this method amounts to solving the following estimating equations

$0 = \sum_{i = 1}^{n} \left[ (1, {\hat{A}}_{1 i}, \dots, {\hat{A}}_{T i}, C_{i})^{'} \left\{ Y_{i} - β_{C} C_{i} - β_{0} - \sum_{t = 1}^{T} β_{t} A_{t i} \right\} \right] \qquad (2)$

where $β_{0} = E (Y (0) | C = 0)$ and $β_{t}$ estimate the parameters of the SNMM. Full details are in the Appendix. Note that in cases of a binary treatment, the first stage model can be fitted as a Linear Probability Model (LPM) using OLS, in which case the above results hold in the same way.
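
The two stages above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions, not the implementation used in the paper: the treatments are continuous rather than binary, $C$ is the empty set, the instruments are mutually independent, and the helper names (`ols`, `two_stage_least_squares`) and the toy data generating values are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(Z, A, Y):
    """Just-identified 2SLS with T instruments Z (n x T) and T treatments A (n x T)."""
    n = len(Y)
    X1 = np.column_stack([np.ones(n), Z])      # first stage design: intercept + all instruments
    A_hat = X1 @ ols(X1, A)                    # predicted values for every A_t
    X2 = np.column_stack([np.ones(n), A_hat])  # second stage design: intercept + predictions
    return ols(X2, Y)[1:]                      # estimates of beta_1, ..., beta_T

# Toy data (assumed values, not the paper's data generating mechanism)
rng = np.random.default_rng(0)
n, T = 20000, 3
U = rng.normal(size=n)                               # unmeasured confounder
Z = rng.normal(size=(n, T))                          # one instrument per time period
A = 0.8 * Z + U[:, None] + rng.normal(size=(n, T))   # treatments confounded by U
true_beta = np.array([1.0, 0.5, 0.25])
Y = A @ true_beta + 2.0 * U + rng.normal(size=n)

beta_2sls = two_stage_least_squares(Z, A, Y)
ate_2sls = beta_2sls.sum()                           # ATE is the sum of the beta_t
beta_ols = ols(np.column_stack([np.ones(n), A]), Y)[1:]
print("2SLS ATE:", round(ate_2sls, 3), "| naive OLS sum:", round(beta_ols.sum(), 3))
```

In this toy setting the naive OLS sum is pulled away from the true value of 1.75 by the unmeasured confounder, while the 2SLS sum should lie close to it.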

Consistent estimation using 2SLS requires two conditions on the set of auxiliary variables $C$ . First and foremost $C$ must at least include all (non-instrument) variables necessary for the IVs to satisfy assumptions IV1-IV3. Second, both stages must be fit controlling for the same set $C$ . This presents a problem in time-varying settings as in practice $C$ can only include baseline confounders, which we explain in the following paragraph. In the simple data setup we have $M = (Z_{1}, Z_{2})$ and as such $C$ is the empty set. In this setting consistent estimation of $β$ is possible with 2SLS.

In the complex setup, however, $M = (Z_{1}, Z_{2}, A_{1}, A_{2}, L_{1}, L_{2}, L_{3})$ and thus consistent estimation would require that we set $C = (A_{1}, A_{2}, L_{1}, L_{2}, L_{3})$ in both stages. Controlling for $L_{2}$ or $L_{3}$ in the first stage model for $A_{1}$ will collide the $A_{1} - L_{1}$ association, and create a path between $A_{1}$ and $U_{2}$ . In the second stage model, conditioning on $L_{2}$ or $L_{3}$ blocks causal pathways from $A_{t}$ to $Y$ , leading to inconsistent estimates of $β$ . Additionally, $C$ cannot control for history of $A_{t}$ , as each first stage model must control for the same set $C$ , but the model for $A_{t}$ cannot control for itself. This is detailed further in the Appendix. The consequence is that 2SLS is not a viable estimation method in settings where the time-varying IV is dependent on past history of other variables.

3.5. Residual instrument 2SLS for time varying instruments

To remedy this problem we turn to robust 2SLS methodologies.^20–22 To handle the complex data structure of Figure 1 using 2SLS, we consider the following modification.

Postulate a model for each $Z_{t}$ given $M_{t}$ with coefficients $γ$ ,

$f_{Z_{t}} = E (Z_{t} | M_{t}; γ)$

using an appropriate regression model. For example, a logistic regression model may be used for a binary $Z_{t}$ :

$\log \left( \frac{P (Z_{t} = 1 | M_{t})}{1 - P (Z_{t} = 1 | M_{t})} \right) = γ_{0} + γ_{M_{t}} M_{t} .$

Fit this model and calculate predictions for $Z_{t}$ as

${\hat{Z}}_{t} = E (Z_{t} | M_{t}; γ^{*}) .$

From this, define residuals $Z_{t}^{r e s} = Z_{t} - {\hat{Z}}_{t}$ .

Now perform 2SLS, replacing $Z_{t}$ with $Z_{t}^{r e s}$ .

This method may be viewed as an application of 2SLS, taking $Z_{t}^{r e s}$ as the time-varying IVs of the method. These are sometimes known as the Frisch–Waugh–Lovell residualised instruments, designed to allow the fitting of partial regression models.
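
The residualisation steps can be sketched as follows. This is an illustrative toy under stated assumptions rather than the paper's implementation: there are $T = 2$ periods, the instrument and treatment are continuous, $E (Z_{t} | M_{t})$ is fitted by a linear OLS model (instead of the logistic model suggested above for a binary $Z_{t}$ ), $M_{1}$ is empty and $M_{2} = (Z_{1}, A_{1})$ . All function names and numeric values are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_sls(Z, A, Y):
    """Just-identified 2SLS of Y on treatments A using instruments Z (no covariates C)."""
    n = len(Y)
    X1 = np.column_stack([np.ones(n), Z])
    A_hat = X1 @ ols(X1, A)
    return ols(np.column_stack([np.ones(n), A_hat]), Y)[1:]

def residualise(z, M=None):
    """Return z - E(z | M), with E(z | M) fitted by OLS (intercept only if M is None)."""
    n = len(z)
    X = np.ones((n, 1)) if M is None else np.column_stack([np.ones(n), M])
    return z - X @ ols(X, z)

# Toy data in which the second instrument depends on past instrument and treatment
rng = np.random.default_rng(0)
n = 20000
U = rng.normal(size=n)                               # unmeasured confounder
Z1 = rng.normal(size=n)
A1 = 0.8 * Z1 + U + rng.normal(size=n)
Z2 = 0.5 * Z1 + 0.5 * A1 + rng.normal(size=n)        # Z_2 depends on M_2 = (Z_1, A_1)
A2 = 0.8 * Z2 + U + rng.normal(size=n)
Y = 1.0 * A1 + 0.5 * A2 + 2.0 * U + rng.normal(size=n)

Z = np.column_stack([Z1, Z2])
A = np.column_stack([A1, A2])

# Standard 2SLS on the raw instruments: Z_2 is associated with U through A_1
beta_naive = two_sls(Z, A, Y)

# RI2SLS: replace each Z_t with its residual given M_t, then run the same 2SLS
Z_res = np.column_stack([
    residualise(Z1),                                 # M_1 is empty in this toy
    residualise(Z2, np.column_stack([Z1, A1])),      # M_2 = (Z_1, A_1)
])
beta_ri = two_sls(Z_res, A, Y)
print("standard 2SLS:", beta_naive.round(2), "| RI2SLS:", beta_ri.round(2))
```

In this toy example the raw $Z_{2}$ is associated with the unmeasured confounder through $A_{1}$ , so standard 2SLS on the raw instruments misestimates the individual $β_{t}$ , whereas the residualised instruments should recover values close to the assumed $(1.0, 0.5)$ .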

Crucially, provided that $f_{Z_{t}}$ is correctly specified, $Z_{t}^{r e s}$ is mean independent of $M_{t}$ , that is to say that $E (Z_{t}^{r e s} | M_{t}) = E (Z_{t}^{r e s})$ , and the residual instruments are uncorrelated with $M_{t}$ .

It follows that, provided that $Z_{t}^{r e s}$ remain suitably strongly associated with $A_{t}$ to satisfy assumption IV1, the set $C$ no longer needs to include the variables contained in the $M_{t}$ . In fact RI2SLS can obtain a consistent estimate of $β$ without a need to control for $C$ at all. This solves the difficulty identified in 2SLS.

Intuitively, $Z_{t}^{r e s}$ removes the dependence on $M_{t}$ by modelling this dependence and subtracting it. What remains is an IV that, at least in expectation, depends on $Y$ only through $A_{t}$ . If $Z_{t}$ depends on $M_{t}$ only through its expectation, then IV2 and IV3 hold for $Z_{t}^{r e s}$ marginal on $M_{t}$ . If not, IV2 and IV3 hold for $Z_{t}^{r e s}$ at least in expectation, marginal on $M_{t}$ , which is sufficient for consistent estimation.

When fitted using OLS, RI2SLS amounts to solving the following estimating equations (see Appendix)

$0 = \sum_{i = 1}^{n} \left[ α^{*} (1, Z_{1 i} - E (Z_{1 i} | M_{1 i}), \dots, Z_{T i} - E (Z_{T i} | M_{T i}), C_{i})^{'} \left\{ Y_{i} - β_{C} C_{i} - β_{0} - \sum_{t = 1}^{T} β_{t} A_{t i} \right\} \right] \qquad (3)$

with $α^{*}$ the matrix of estimates of the first stage model coefficients. Note that under the SNMM of equation (1) $E (Y (0)) = E (Y - \sum_{t = 1}^{T} β_{t} A_{t})$ . IV2 and IV3 imply conditional mean independence between $Y (0)$ and $Z_{t}$ , given $M_{t}$ , which implies they are uncorrelated. Equation (2) finds the $β_{t}$ which sets the covariances between $Y (0)$ and $Z_{t}$ , given $M_{t}$ , to zero.

Provided that both the first and second stage models are fit using OLS, RI2SLS is equivalent to the $g$ -estimation methods considered in Hogan⁶ and Robins⁷ and more recently in Tompsett et al.¹⁶ (see Appendix). This bridge between $g$ -estimation methods and 2SLS was first considered for time-fixed scenarios.^6,22 It was not fully explored however in time varying settings. G-estimators are less popular as they tend to be perceived as more complex to implement. However RI2SLS can be easily performed with standard 2SLS software. As such, this equivalence may open up the use of $g$ -estimation in time varying scenarios to a wider audience.

Calculation of the sample variance and its asymptotic properties for the estimators of $β_{t}$ or the ATE is challenging, due to the complex nature of the time-varying confounding. We therefore used the non-parametric bootstrap for deriving standard errors of the ATE, as this is common practice in this area.^20,22 We adopted a block bootstrap to preserve dependence over time, i.e. within each bootstrap, data is resampled at the individual level (not person-time). In addition, we found that existing robust variance methods for residualised 2SLS overestimate confidence interval (CI) coverage.
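
A minimal sketch of individual-level resampling is given below, assuming the data are held in wide format with one row per patient so that each resampled row carries that person's whole history. The column layout, the helper names (`ate_2sls`, `bootstrap_ate`), the toy data and the choice of 200 replicates (to keep the example fast) are assumptions made for illustration only.

```python
import numpy as np

def ate_2sls(data, T):
    """ATE estimate (sum of beta_t) from just-identified 2SLS on wide data [Z | A | Y]."""
    n = data.shape[0]
    Z, A, Y = data[:, :T], data[:, T:2 * T], data[:, 2 * T]
    X1 = np.column_stack([np.ones(n), Z])
    A_hat = X1 @ np.linalg.lstsq(X1, A, rcond=None)[0]
    X2 = np.column_stack([np.ones(n), A_hat])
    return np.linalg.lstsq(X2, Y, rcond=None)[0][1:].sum()

def bootstrap_ate(data, T, n_boot=200, seed=0):
    """Non-parametric bootstrap over individuals: standard error and percentile 95% CI."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # draw individuals (rows) with replacement
        draws[b] = ate_2sls(data[idx], T)    # re-run the whole estimator on each resample
    return draws.std(ddof=1), np.percentile(draws, [2.5, 97.5])

# Toy wide-format data: columns are Z_1, Z_2, A_1, A_2, Y (assumed layout)
rng = np.random.default_rng(42)
n, T = 2000, 2
U = rng.normal(size=n)
Z = rng.normal(size=(n, T))
A = 0.8 * Z + U[:, None] + rng.normal(size=(n, T))
Y = A @ np.array([1.0, 0.5]) + 2.0 * U + rng.normal(size=n)
data = np.column_stack([Z, A, Y])

point = ate_2sls(data, T)
se, ci = bootstrap_ate(data, T)
print(f"ATE = {point:.3f}, bootstrap SE = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the data were instead stored in long (person-time) format, the same idea would apply by resampling patient identifiers and keeping all rows belonging to each sampled patient. For RI2SLS, the instrument models would also need to be refitted inside each bootstrap replicate.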

In RI2SLS, the only necessary conditions on the set $C$ are that it does not include variables beyond baseline, and that it is included in both the first and second stage models. In these settings, we can choose whether to control for no set $C$ at all, or to control for a set $C$ that includes baseline variables. This is also true of 2SLS in simple cases where $M_{t}$ is the empty set.

Recent work²⁰ demonstrated in the time-fixed case that controlling for baseline confounders between $A$ and $Y$ can improve the standard error of estimates. We will therefore also investigate 2SLS and RI2SLS, controlling for baseline confounders of $A$ and $Y$ in the first and second stage models. Based on our data generating mechanism in Figure 1, this means setting $C = L_{1}$ . We refer to these methods as 2SLS-L and RI2SLS-L, respectively.

3.6. Robustness of RI2SLS

RI2SLS in time-fixed settings was noted to be doubly robust, that is, unbiased provided that either $f_{Z_{1}}$ or $f_{Y}$ correctly specifies the relationship between $Z_{1}$ and $M_{1}$, or $Y$ and $M_{1}$, respectively.^20,21 This property can extend to time-varying settings, with consistent estimation possible if either $f_{Z_{t}}$ correctly specifies the $Z_{t} - M_{t}$ relationship, or $f_{Y}$ correctly specifies the $Y - M$ relationship. However, because we cannot include post-baseline variables in $C$, the $Y - M$ relationship cannot be correctly specified if $M$ contains covariates after baseline.

There is one exception. If $M_{t}$ includes only baseline confounding $L_{1}$ , we can perform RI2SLS-L with $C = L_{1}$ , and if the relationship between $Y$ and $L_{1}$ is correctly specified, consistent estimation is possible even if $E (Z_{t} | M_{t})$ is incorrectly modelled (see Appendix).

This means we are typically reliant on the IV models $f_{Z_{t}}$ being correctly specified to ensure consistency. How vulnerable RI2SLS is to misspecification of $f_{Z_{t}}$ is therefore a point of interest, as is whether controlling for $L_{1}$ could offer partial protection.

4. Simulations

4.1. Data generating mechanism

We simulate data according to Figure 1 with $T = 3$ treatment periods. The main objectives are (1) to determine the consistency and efficiency of standard 2SLS and RI2SLS under the data setups of Figure 1 across varying IV strengths and sample sizes, and (2) to assess the robustness of RI2SLS in complex situations in which $Z_{t}$, $A_{t}$ and $Y$ depend on non-linear terms.

Unmeasured confounding $U_{t}$ follows a standard normal distribution, and $L_{t}$ is simulated using a normal distribution with mean $L_{t - 1} + A_{t - 1}$ and standard deviation $\frac{U_{t}}{3}$ .

The time-varying instrument $Z_{t}$ is simulated by

$$\text{logit} (P (Z_{t} = 1 | Z_{t - 1}, A_{t - 1}, L_{t})) = μ_{Z_{t}} - E (μ_{Z_{t}})$$

where in the simple data setup

$$μ_{Z_{t}} = Z_{t - 1}$$

and in the complex data setup

$$μ_{Z_{t}} = Z_{t - 1} + A_{t - 1} + L_{t} - σ_{Z} A_{t - 1} Z_{t - 1} - I (σ_{Z}) L_{t}^{2} .$$

Here $σ_{Z}$ introduces terms that may trigger misspecification of $f_{Z_{t}}$ and $I (σ_{Z})$ is an indicator function that is 0 when $σ_{Z} = 0$ and 1 otherwise.
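As an illustration of the instrument model above, the following Python sketch draws $Z_{t}$ for given values of $Z_{t - 1}$, $A_{t - 1}$ and $L_{t}$. It is a sketch under stated assumptions rather than the authors' simulation code: the function name `simulate_instrument` is hypothetical, the inputs are arbitrary toy draws, and the expectation $E (μ_{Z_{t}})$ is approximated by the sample mean.

```python
import numpy as np


def simulate_instrument(z_prev, a_prev, l_t, sigma_z=0.0,
                        complex_setup=True, rng=None):
    """Draw binary Z_t with logit P(Z_t = 1 | .) = mu - mean(mu)."""
    rng = np.random.default_rng() if rng is None else rng
    if complex_setup:
        indicator = 0.0 if sigma_z == 0 else 1.0       # I(sigma_Z)
        mu = (z_prev + a_prev + l_t
              - sigma_z * a_prev * z_prev
              - indicator * l_t ** 2)
    else:
        mu = z_prev.astype(float)                      # simple setup
    centred = mu - mu.mean()   # assumption: sample mean stands in for E(mu)
    prob = 1.0 / (1.0 + np.exp(-centred))
    return rng.binomial(1, prob), prob


# Hypothetical inputs for one time period.
rng = np.random.default_rng(2)
n = 1000
z_prev = rng.binomial(1, 0.5, size=n)
a_prev = rng.binomial(1, 0.5, size=n)
l_t = rng.normal(size=n)
z_t, p_t = simulate_instrument(z_prev, a_prev, l_t, sigma_z=1.0, rng=rng)
```

Setting `sigma_z=0.0` in the complex setup removes both the interaction and the quadratic term, so a main effects model for $Z_{t}$ is then correctly specified.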

$A_{t}$ is a binary variable, with

$$\text{logit} (P (A_{t} = 1 | Z_{t - 1}, Z_{t}, A_{t - 1}, L_{t}, U_{t})) = Φ (μ_{A_{t}} - E (μ_{A_{t}})) (1 - Δ_{t}) + Z_{t} Δ_{t}$$

where

$$μ_{A_{t}} = A_{t - 1} + L_{t} + U_{t}$$

in the simple case and

$$μ_{A_{t}} = Z_{t - 1} + A_{t - 1} + L_{t} + U_{t} - σ_{A} A_{t - 1} Z_{t - 1} - I (σ_{A}) L_{t}^{2}$$

in the complex case, with $Φ$ denoting the standard normal cdf. We simulate this way based on work in Michael¹⁷ and Tompsett et al.¹⁶ as $Δ_{t}$ is approximately the correlation strength between $Z_{t}$ and $A_{t}$, allowing for easier control of IV strength. It also allows fair comparison to results of $g$-estimation and inverse probability weighting in our prior work.¹⁶

Lastly $Y$ follows a normal distribution with mean

$$E (Y | A, L, U) = \sum_{t = 1}^{T} (U_{t} + A_{t} + L_{t}) + σ_{Y} L_{1}^{2}$$

and standard deviation 1.

The true values for $β_{t}$ are $(3, 2, 1)$, and subsequently the ATE’s true value is 6.

4.2. Implementation

For each simulated scenario, we generate 1000 datasets. CIs are obtained via a percentile bootstrap method using 1000 bootstrapped datasets. 2SLS and RI2SLS are performed as in Section 3, with $E (Z_{t} | M_{t})$ estimated using a main effects logistic regression model and $C$ the empty set. We will also use 2SLS-L and RI2SLS-L, by setting $C = L_{1}$ in the first and second stage models.

In Appendix A.3 (Table A.2), we also report the performance of 2SLS and RI2SLS when the first stage model is a log linear (Probit) model.

To test our first objective, we set $σ_{Z} = σ_{Y} = σ_{A} = 0$ (no non-linear terms) and vary the sample size $n$ and parameter $Δ_{t}$ , which can be set to values between 0 and 1 to set the strength of association between $Z_{t}$ and $A_{t}$ . This is set to $0.1$ (weak), $0.3$ (moderate) and $0.5$ (strong).

Secondly, we set $σ_{Z}$, $σ_{Y}$ and $σ_{A}$ to non-zero values of $1$ or $3$ to test robustness to non-linear terms. As recommended by Morris et al.,³⁶ we report for the ATE the average bias, Root Mean Square Error (RMSE) and coverage, defined as the proportion of datasets in which the bootstrap 95% CI included the true value. Additionally, we report the average values of the conditional $F$-statistic in each case.
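The three performance measures can be computed from the per-dataset estimates and interval bounds as follows. This is a minimal Python sketch; the function name `performance` and the four toy estimates are hypothetical and are not results from the study.

```python
import numpy as np


def performance(estimates, ci_lower, ci_upper, truth):
    """Average bias, RMSE and CI coverage across simulated datasets."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - truth
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))
    covered = (np.asarray(ci_lower) <= truth) & (truth <= np.asarray(ci_upper))
    return {"bias": bias, "rmse": rmse, "coverage": covered.mean()}


# Hypothetical ATE estimates from four simulated datasets (true ATE = 6).
result = performance(
    estimates=[5.8, 6.1, 6.4, 5.9],
    ci_lower=[5.0, 5.5, 6.1, 5.2],
    ci_upper=[6.5, 6.8, 7.0, 6.3],
    truth=6.0,
)
print(result)
```

In this toy input the third interval, (6.1, 7.0), misses the true value, so three of the four intervals cover it.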

4.3. Results

Tables 1 and 2 present the results of the simulations for the scenarios with no misspecification. For the simpler data set-up, 2SLS and RI2SLS perform identically. In this scenario, values of the conditional $F$-statistic ranged from 1500 with $Δ = 0.5$ and $n = 5000$ to around 10 for $Δ = 0.1$ and $n = 1000$. Unbiased estimates of the ATE are attained with coverage close to nominal levels in almost all scenarios, except when the instrument strength is at its weakest ($Δ_{t} = 0.1$) and the sample size is 1000, where results break down due to weak instrument bias.

As IV strength and sample size decrease, the RMSE increases as expected. The performance of the methods deteriorates slightly with weaker IV strength, although biases are still minimal (below 5%) and CI coverage remains near nominal levels (except in the scenario highlighted above).

In the complex scenario, 2SLS shows poor performance and unstable results, with coverage tending towards 0 in cases of heavy bias, or towards 100% in cases where estimates over the $m$ datasets become highly variable and unstable. This is due to violations of IV2 and IV3 and the resulting use of invalid IVs, which leads to unpredictable results as the IV strength weakens. The performance of RI2SLS remains good, subject to sufficiently strong IVs. The conditional $F$-statistics were lower on average in the complex set-up, ranging from 1200 with $Δ = 0.5$ and $n = 5000$ to less than 2 for $Δ = 0.1$ and $n = 1000$. RI2SLS performs well outside of the scenarios where the $F$-statistics fall below 10.

In line with the proof shown in the Appendix, the simulation results confirmed the equivalence between $g$ -estimation and RI2SLS. Adjusting for baseline confounding using 2SLS-L led to a moderate improvement in results in all cases, offering some protection against bias and lowering the RMSE. Results remained poor in the complex case, as IV2 and IV3 remain unsatisfied. Baseline adjustment with RI2SLS-L, however, made little difference, with any improvement seemingly already built into the IV models.

Table 3 shows the results using RI2SLS for the complex data setup when we add non-linear terms to the models. As expected, when all 3 models are misspecified we encounter biased and less efficient results in all cases, though this bias appears to be partly mitigated by a stronger instrument.

Results remain biased when the model for $E (Z_{t} | M_{t})$ is incorrect, as we are unable to model the association between $Y$ and $L_{2}$ and $L_{3}$. Interestingly, partially modelling the $Y - L_{t}$ association by controlling for $L_{1}$ did not mitigate this bias at all. Results remained unbiased provided the models for $Z_{t}$ were correct.

We considered 2SLS and RI2SLS using Probit-based first stage models in Appendix A.3 (Table A.2), which performed poorly and unpredictably in all cases, similarly to 2SLS in the complex case. The implication is that the relation to $g$-estimation, which applies only when fitting a linear probability model (LPM) using OLS at the first stage, is crucial to consistent estimation. Also considered in Appendix A.2 (Table A.1) was allowing $Z_{t}$ to have heteroskedastic errors, dependent on $L_{t}$, which did not lead to inconsistent results.

5. Case study

5.1. Instrument definition: GP prescribing preferences

Our instrument is a measure of GP prescribing preferences (TTP) for DPP4 over SU over time. There are 139 GPs in our data, with an average of 27 patients each, ranging from 1 to 98 patients. A recent paper³ summarised well the various measures to approximate GP preference via proportion of prescriptions issued. Our data does not include specific dates, and hence, we are unable to derive subject-specific measures of TTP. We instead consider GP specific measures of preference at each 6 month period based on the definitions considered in prior works.³

$T T P^{c}$ : A measure of GP preference at each 6 monthly calendar period is taken as the proportion of all prescriptions of DPP4 as second line treatment within that 6 monthly calendar period. An individual’s value of $T T P^{c}$ at some follow up time $t$ is then GP preference during the 6 months calendar period when $t$ occurred.

$T T P^{t}$ : Alternatively, a measure of TTP over follow up time $t$, rather than calendar time, can be considered. The proportion of a GP’s prescriptions at initiation that are for DPP4 is taken as $T T P_{1}^{t}$. This is repeated for follow up times 2 and 3. This represents a measure of how a GP may change preference based on how long a patient has been taking second line treatment.
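The calendar-time definition, $T T P^{c}$, amounts to a grouped proportion. The sketch below shows one way to compute it in plain Python; the record layout `(gp_id, period, drug)`, the period labels and the function name `ttp_by_calendar_period` are hypothetical and do not reflect the structure of the study data.

```python
from collections import defaultdict


def ttp_by_calendar_period(prescriptions):
    """Proportion of DPP4 among a GP's second-line prescriptions per period.

    prescriptions: iterable of (gp_id, period, drug), drug in {"DPP4", "SU"}.
    Returns {(gp_id, period): proportion of DPP4 prescriptions}.
    """
    counts = defaultdict(lambda: [0, 0])           # [DPP4 count, total count]
    for gp_id, period, drug in prescriptions:
        counts[(gp_id, period)][0] += drug == "DPP4"
        counts[(gp_id, period)][1] += 1
    return {key: dpp4 / total for key, (dpp4, total) in counts.items()}


# Hypothetical prescription records.
records = [
    ("gp1", "2013H1", "SU"), ("gp1", "2013H1", "SU"),
    ("gp1", "2013H1", "SU"), ("gp1", "2013H1", "DPP4"),
    ("gp1", "2013H2", "DPP4"), ("gp1", "2013H2", "SU"),
    ("gp2", "2013H1", "DPP4"),
]
ttp = ttp_by_calendar_period(records)

# A patient's instrument value at follow-up time t is the preference of
# their GP in the calendar period in which t falls.
patient_ttp = ttp[("gp1", "2013H2")]
```

The follow-up-time version, $T T P^{t}$, could be obtained with the same function by passing the follow-up index (1, 2 or 3) in place of the calendar period.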

5.1.1. IV1: IV relevance

It is important in a practical context to investigate whether $Z_{t}$ satisfies the three main IV assumptions. Unfortunately, only IV1 (IV relevance) can be tested from the available data. Treatment switching occurs in fewer than 5% of patients. We therefore need to investigate whether there is sufficient strength of association between TTP and treatment at initiation, and then assess the change in this association over time for multiple instruments.

The first stage $F$-statistics for TTP at $t = 1$ were 1422 and 927 for $T T P^{c}$ and $T T P^{t}$, respectively. Similar strengths were found in a recent paper on TTP in the assessment of T2D second line treatment.³⁷ The conditional $F$-statistics for $T T P^{c}$ and $T T P^{t}$ ranged between 30 and 70. The measures of TTP for $t > 1$ were highly correlated with TTP at initiation, with correlations ranging from 0.85 to near 1.
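For reference, an ordinary first stage $F$-statistic for a single treatment period can be computed by comparing OLS fits with and without the instrument. The Python sketch below uses hypothetical toy data and a hypothetical function name, `first_stage_f`. It does not implement the conditional $F$-statistic for multiple endogenous treatments,³⁴ which requires additional steps.

```python
import numpy as np


def first_stage_f(a, z, covariates=None):
    """F-statistic for excluding instrument(s) z from an OLS model for a."""
    n = len(a)
    z = np.asarray(z, dtype=float).reshape(n, -1)
    if covariates is None:
        base = np.ones((n, 1))
    else:
        base = np.column_stack([np.ones(n), covariates])
    full = np.column_stack([base, z])

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, a, rcond=None)
        resid = a - X @ coef
        return resid @ resid

    q = z.shape[1]                          # number of excluded instruments
    df_resid = n - full.shape[1]            # residual degrees of freedom
    return ((rss(base) - rss(full)) / q) / (rss(full) / df_resid)


# Hypothetical data: z is a GP-level preference in [0, 1], a is treatment.
rng = np.random.default_rng(3)
z = rng.uniform(size=500)
a = rng.binomial(1, 0.2 + 0.6 * z)
f_stat = first_stage_f(a, z)
print(f_stat)
```

With a single instrument and no covariates this reduces to $(n - 2) r^{2} / (1 - r^{2})$, where $r$ is the sample correlation between treatment and instrument.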

5.1.2. IV2: Exclusion restriction

It cannot be determined from the available data if there is a direct association between GP preference and HbA1c levels at any future time. We anticipate that any effect of TTP on future HbA1c levels is likely via its effect on assigned treatment, though this may depend on how well treatment is measured. A possible pathway through which IV2 could be violated would be if the GPs with a preference for DPP4 provided better quality of care, or possessed greater clinical capacity, in a way that might have led to greater improvement in HbA1c levels. Our discussions with clinical experts suggested that this is unlikely to be the case, as SU and DPP4 are both well regarded and easily available treatments.

5.1.3. IV3: Exchangeability

GP preference may be affected by past confounders and past measurements of the outcome. Dependence on past preference and treatment history is likely, but can be easily controlled. However, observing poor past performance of patients on one drug may change preference as a result. This may mean that the drift in prescribing preferences over time towards DPP4 depends on other health-related measures such as HbA1c history.

It is unlikely for TTP to be independent of confounding at baseline, only to be dependent at later follow up times. As such, testing the balance of observed confounders at baseline offers an insight into what needs to be conditioned on to meet IV3. Table A.4 in Appendix A.5 shows correlations between $T T P^{c}$ and confounders at baseline, and indicates that TTP is somewhat correlated with HbA1c levels at initiation, ethnicity, calendar period and smoking status. Figure 4 shows the difference in density/proportions for HbA1c levels and ethnicity at different quantiles of TTP: patients whose GP had a high preference for DPP4 generally had a lower initial HbA1c level and a lower proportion of South Asian ethnicity. To minimise bias due to these imbalances, we adjust for these confounders in the analysis.

5.2. Estimating approaches

We perform 2SLS and RI2SLS to estimate the ATE of DPP4 versus SU on HbA1c levels at 18 months. We repeat these analyses using both $T T P^{c}$ and $T T P^{t}$ . CIs are obtained by non-parametric bootstrap with $b = 1000$ resampled datasets.

Firstly we use 2SLS with no adjustment for confounders as an illustration. We then perform 2SLS-L, adjusting for Median HbA1c levels prior to initiation, smoking status, calendar period and ethnicity at baseline in the first and second stage models. RI2SLS and RI2SLS-L are then performed, using no adjustment and adjustment for the above confounders respectively. For RI2SLS $E (T T P_{t} | M_{t})$ is estimated using a main effects linear regression model with $M_{t}$ involving the above confounders, along with history of treatment, instrument and history of HbA1c levels. In all cases, the first and second stage models are main effects models fit using OLS.

5.3. Results

We present results for the ATE in Figure 2 (and Appendix A.6, Table A.5). All methods suggest a reduction in HbA1c levels with sustained treatment with DPP4 compared to SU over the 2 year period. The 95% CI in each case suggests that this effect is significant. Results with $T T P^{c}$ were slightly more precise than with $T T P^{t}$ , but gave similar conclusions. Standard 2SLS seems to overestimate the effect of DPP4 versus SU by reporting an ATE around 5 mmol/mol, when compared to RI2SLS and 2SLS-L which reported much smaller reductions (2.5-3 points) in HbA1c. ATEs reported by RI2SLS and 2SLS-L were similar.

Figure 2.

Forest plot of the ATE of DPP4 versus SU on HbA1c levels at 18 months. The intervals represent the bounds of the 95% confidence interval. ATE: average treatment effect; SU: sulfonylureas.

RI2SLS reported slightly narrower 95% CIs compared to those suggested by 2SLS (with or without adjustment). The RI2SLS-L method gave similar results to RI2SLS (Table A.5, Appendix A.6). This indicates that TTP depended predominantly on history of TTP and baseline blood glucose levels, as regressions of TTP on confounding history also indicate. This implies that both 2SLS-L and RI2SLS were effective estimating approaches.

A visual representation of the definitions of $T T P$ is shown in Figures 3 and 4. Preference measured by 6-month calendar period shows a clear shift over time towards DPP4, from a proportion of around 15% in 2013 to 40% in 2017. However, $T T P^{t}$ shows minimal change between follow up periods.

Figure 3.

Plot of trends of $T T P^{c}$ and $T T P^{t}$ over time. TTP: tendency to prescribe.

Figure 4.

Plots showing characteristics of baseline HbA1c levels and ethnicity, for data within specific percentiles of $T T P^{c}$ at baseline. These quantiles correspond to 0 to 33rd percentile ‘Lower’, 33rd to 66th percentile ‘Middle’ and 66th to 100th percentile ‘Upper’. TTP: tendency to prescribe.

6. Discussion

This study explores the practical implementation of 2SLS methods in a fully time-varying setting, with confounders, instruments, and treatments all varying over time. We propose a 2SLS approach that uses residualised IVs to handle challenges in complex time-varying data structures, and assess its relative performance in a simulation study. We clarify how 2SLS methods relate to $g$-estimation with time-varying IVs, with a view to implementing methods using standard 2SLS software. We consider the operationalisation of medical prescribing preferences as a time-varying IV in an evaluation of second line treatments for patients with type-2 diabetes.

This is the first study to investigate the statistical properties of 2SLS methods for incorporating time-varying IVs. We showed that the standard 2SLS can accommodate a time-varying IV, but only provided consistent estimates in simple time-varying settings where the IV depends only on its past history. We proposed a residualised 2SLS and showed, using theory and simulations, that this approach can attain the same performance as $g$ -estimation in both simple and complex time-varying settings. An added novelty of the RI2SLS is that it can be implemented using standard software, which provides a practical advantage compared to the $g$ -estimation approach.

This paper adds to the emerging literature exploring the use of time-varying IVs in several ways. Firstly, unlike the recently proposed inverse probability weighting method,¹⁷ we showed that the RI2SLS provides an appropriate estimating approach for time-varying IV analysis in settings with weaker IV strengths (e.g. $F$-statistics of 10 and 30). The SNMM fitted by the RI2SLS is arguably more difficult to interpret than the MSM fitted by the weighting approach, and requires the IV4 assumption to interpret the causal parameters as the ATE. The weighting method does not require IV4, but requires that both treatment and IV are binary and makes a different untestable assumption, that the compliance to treatment (taking Z as randomisation) is not affected by unmeasured confounders.^16,17

Secondly, we found that the double robust property of 2SLS extends to time-varying scenarios if the instrument depends only on baseline confounding, and consistent estimation is possible even when treatment and outcome depend on non-linear relationships. Thirdly, we illustrate the applicability of the conditional $F$-test to assess the strength of a time-varying IV. Our simulation study demonstrated that consistent estimation can typically be achieved provided the conditional $F$-statistic is at least 10, which is in line with other studies that employ multiple IVs.⁴

Fourthly, we showed that when the first and second stage models were not fit with OLS, the equivalence between RI2SLS and $g$ -estimation did not hold, and the approach performed poorly. As such, this currently limits the data types and causal relationships RI2SLS can be applied to, i.e. continuous endpoints. Extending RI2SLS to settings with binary or ordinal outcomes provides an interesting avenue for future research.

Lastly, we illustrated the modelling and operationalisation of medical prescribing preferences as a time-varying instrument. We find that sustained treatment with DPP4 over a two year period led to a significant reduction in HbA1c levels of around 3 mmol/mol, which is in line with previous studies.^38–40 Inferences did not differ according to estimating approach, but the standard 2SLS overestimated the ATE and led to somewhat wider 95% CIs. A notable aspect of the study cohort was that it included 75% non-White patients, a cohort for which prevalence of diabetes is estimated to be two to four times higher in the UK.⁴¹ Our findings provide, therefore, valuable evidence for the external validity of these studies to non-White populations.

Our work has some limitations. Firstly, while the correct specification of $E (Z_{t} | M_{t})$ is important to the consistency of RI2SLS, simulated scenarios showed that the approach was relatively robust to some degree of model misspecification, for example, when the model for $E (Z_{t} | M_{t})$ omitted certain non-linear and interaction effects. One could use graphical methods to identify non-linear or interaction terms with measured confounders that could be included in that model. Alternatively, flexible modelling techniques such as spline regression methods^42,43 could be used to model complex non-linear regression relationships in cases of IVs with more complex distributions. These methods could make the assumption of a correctly specified model for $Z_{t}$ on $M_{t}$ more plausible, and practical to apply in real world studies.
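The value of adding non-linear and interaction terms to the instrument model can be illustrated with a small numerical check. The Python sketch below is an assumption-laden illustration: it uses a hypothetical continuous instrument, simple polynomial and interaction terms rather than splines, and toy coefficients. It compares the residualised instrument from a main effects model with that from a more flexible model.

```python
import numpy as np


def residualise(z, design):
    """Residual of z after an OLS regression on [1, design]."""
    X = np.column_stack([np.ones(len(z)), design])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    return z - X @ coef


# Hypothetical data: the instrument depends on L_t through a quadratic term.
rng = np.random.default_rng(4)
n = 2000
l_t = rng.normal(size=n)
a_prev = rng.binomial(1, 0.5, size=n)
z_t = 0.5 * l_t - 0.4 * l_t**2 + 0.3 * a_prev + rng.normal(size=n)

main_effects = np.column_stack([l_t, a_prev])
flexible = np.column_stack([l_t, a_prev, l_t**2, l_t * a_prev])

r_main = residualise(z_t, main_effects)
r_flex = residualise(z_t, flexible)

# Remaining association between each residual and the term L_t^2.
leftover_main = abs(np.corrcoef(r_main, l_t**2)[0, 1])
leftover_flex = abs(np.corrcoef(r_flex, l_t**2)[0, 1])
print(leftover_main, leftover_flex)
```

In this toy setting the main effects residual remains correlated with $L_{t}^{2}$, whereas the residual from the flexible model is orthogonal to it by construction, which is the property that residualisation relies on.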

However, as with most IV methods, if a main effect of a variable of $M_{t}$ is missed entirely, or there is dependence on an unmeasured confounder, this will heavily affect the consistency of estimates. It remains crucial therefore to identify well defined IVs that have either minimal, or well understood dependencies on other variables.

Secondly, our case study was limited by missing data and the nature of the treatment. We required patients to have at least three recorded phases and complete data on HbA1c levels, treatment, and confounders. This risks selection bias, as patients with more recorded follow-ups and complete data could be in worse health than those with less complete data.

We also excluded patients who took insulin or other diabetic treatments, preselecting patients who took either SU or DPP4. Recent relevant work⁴⁴ highlighted potential selection biases when using IV methods to compare two treatments (DPP4 and SU), when more than two treatments are available (e.g. SGLT2 inhibitors or insulin), if the propensity to give these alternative treatments differs between preference groups. An investigation into the effects of SU versus DPP4 on BMI in time-fixed settings suggested this could be handled by sensitivity analyses.^44,45 Extending such sensitivity analyses to time-varying treatments would likely involve a sensitivity analysis of large dimensions, and is beyond the scope of this paper. However, this would be a very worthy area of further work.

Patients who initiated second-line therapy with DPP4 or SU rarely switched between treatments, which meant that the IV was weakly associated with treatment assignment over time, and hence limited our ability to estimate time-specific treatment effects. As such, the case study did not enable us to demonstrate the wider strengths of the RI2SLS approach. An interesting extension to our work would be to more closely examine the relationship between treatment switching rate, and IV strength.

Thirdly, the IV strength of TTP after initiation has room for improvement. For example, with richer datasets, a GP’s last prescribed treatment may better capture TTP over time. We may also consider model-based TTP methods, such as the Abrahamowicz method (Bidulka et al.³⁷), to identify the period in which a GP switches preference. Near-far matching methodologies could identify pairs of GPs with the furthest possible difference in preference at initiation⁴⁶ to boost IV strength.

To conclude, RI2SLS provides a promising approach to perform time-varying IV analysis. It has good theoretical properties and can be performed using standard regression techniques. Identifying a strong time-varying IV remains a major barrier to its wider adoption.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802251404064 - Supplemental material for Two stage least squares with time-varying instruments: An application to an evaluation of treatment intensification for type-2 diabetes

Supplemental material, sj-pdf-1-smm-10.1177_09622802251404064 for Two stage least squares with time-varying instruments: An application to an evaluation of treatment intensification for type-2 diabetes by Daniel Tompsett, Stijn Vansteelandt, Richard Grieve, John Robson and Manuel Gomes in Statistical Methods in Medical Research

Supplemental Material

sj-R-2-smm-10.1177_09622802251404064 - Supplemental material for Two stage least squares with time-varying instruments: An application to an evaluation of treatment intensification for type-2 diabetes

Supplemental material, sj-R-2-smm-10.1177_09622802251404064 for Two stage least squares with time-varying instruments: An application to an evaluation of treatment intensification for type-2 diabetes by Daniel Tompsett, Stijn Vansteelandt, Richard Grieve, John Robson and Manuel Gomes in Statistical Methods in Medical Research

Footnotes

Acknowledgments

The authors thank the Queen Mary University of London, Clinical Effectiveness Group and Barts Charity for access to the deidentified data and GPs in North East London for sharing deidentified patient data for research for patient benefit.

ORCID iD

Daniel Tompsett

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the Medical Research Council, grant number MR/V020935/1.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article

Data availability

The data that support the findings of this study are available from the QMUL Clinical Effectiveness Group but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of QMUL CEG.

Supplemental material

Supplemental material for this article is available online.

Appendix

References

Hernán

Robins

. Causal Inference:What If. Boca Raton: Chapman and Hall/CRC, 2020.

Angrist

Imbens

. Average causal response with variable treatment intensity. J Am Stat Assoc 1995; 90: 431–432.

Gudemann

Shields

Dennis

, et al. Just what the doctor ordered: An evaluation of provider preference-based instrumental variable methods in observational studies, with application for comparative effectiveness of type 2 diabetes therapy, 2023.

Sanderson

Smith

Windmeijer

, et al. An examination of multivariable mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol 2020; 49. DOI: 10.1093/ije/dyaa101.

Baiocchi

Cheng

Small

. Instrumental variable methods for causal inference. Stat Med 2014; 33: 4859–4860.

Hogan

. Longitudinal data analysis edited by g. fitzmaurice, m. davidian, g. verbeke, and g. molenberghs. Biometrics 2010; 66. DOI: 10.1111/j.1541-0420.2010.01478.x.

Robins

. Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology, the environment, and clinical trials. New York: Springer-Verlag, 2000.

Burgess

Foley

Allara

, et al. A robust and efficient method for mendelian randomization with hundreds of genetic variants. Nat Commun 2020; 11: 376. DOI: 10.1038/s41467-019-14156-4.

Tian

Burgess

. Estimation of time-varying causal effects with multivariable mendelian randomization some cautionary notes. Int J Epidemiol 2023; 52: 846–857.

10.

Tian

Patel

Burgess

. Estimating time-varying exposure effects through continuous-time modelling in mendelian randomization. 2024. 2403.05336.

11.

Shi

Swanson

Kraft

, et al. Mendelian randomization with repeated measures of a time-varying exposure: An application of structural mean models. Epidemiology 2022; 33: 84–94.

12.

Bowden

Madsen

Goldman

, et al. Instrumental variable methods to target hypothetical estimands with longitudinal repeated measures data: Application to the step 1 trial. 2024. DOI:10.48550/arXiv.2407.02902.

13.

Yende-Zuma

Mwambi

Vansteelandt

. Adjusting the effect of integrating antiretroviral therapy and tuberculosis treatment on mortality for noncompliance: A time-varying instrumental variables analysis. Epidemiology (Cambridge, Mass) 2019; 30. DOI: 10.1097/ede.0000000000000923.

14.

Michiels

Vandebosch

Vansteelandt

. Adjusting for time-varying treatment switches in randomized clinical trials: the danger of extrapolation and how to avoid it. 2023. DOI: 10.48550/arXiv.2303.06099.

15.

Chen

Zhang

. Estimating and improving dynamic treatment regimes with a time-varying instrumental variable. 2021. 2104.07822.

16.

Tompsett

Vansteelandt

Grieve

, et al. Instrumental variable approaches for estimating time-varying treatments in comparative effectiveness research, In Preparation.

17.

Michael

Cui

Lorch

, et al. Instrumental variable estimation of marginal structural mean models for time-varying treatment. J Am Stat Assoc 2023; 1–23. DOI: 10.1080/01621459.2023.2183131.

18.

Sanderson

Richardson

Morris

, et al. Estimation of causal effects of a time-varying exposure at multiple time points through multivariable mendelian randomization. PLoS Genet 2022; 18: e1010290.

19.

Richardson

Sanderson

Elsworth

, et al. Use of genetic variation to separate the effects of early and later life adiposity on disease risk: Mendelian randomisation study. BMJ 2020; 369. DOI: 10.1136/bmj.m1203.

20.

Vansteelandt

Didelez

. Robustness and efficiency of covariate adjusted linear instrumental variable estimators. Scandinavian Journal of Statistics 2015; 45. DOI: 10.1111/sjos.12329.

21.

Diaz-Ordaz

Daniel

Kreif

. Data-adaptive doubly robust instrumental variable methods for treatment effect heterogeneity. Journal de la Societe Francaise de Statistique 2018; 161: 135–163.

22.

Okui

Small

Tan

, et al. Doubly robust instrumental variable regression. Stat Sin 2012; 22. DOI: 10.5705/ss.2009.265.

23.

Diabetes.org.uk. https://www.diabetes.org.uk/about-us/about-the-charity/our-strategy/statistics (accessed: 2024-29-05).

24.

National institute for health and care excellence. type 2 diabetes in adults: management. nice guideline ng28. nice, 2022. https://www. nice.org.uk/guidance/ng28/chapter/Recommendations#reviewingdrug-treatments.

25.

Curtis

Dennis

Shields

, et al. Time trends and geographical variation in prescribing of drugs for diabetes in england 1998-2017. Diabetes, Obesity and Metabolism 2018; 20. DOI: 10.1111/dom.13346.

26.

Eriksson

Bodegard

Nathanson

, et al. Sulphonylurea compared to dpp-4 inhibitors in combination with metformin carries increased risk of severe hypoglycemia, cardiovascular events, and all-cause mortality. Diabetes Res Clin Pract 2016; 117. DOI: 10.1016/j.diabres.2016.04.055.

27.

Ling

Cheng

, et al. The efficacy and safety of dipeptidyl peptidase-4 inhibitors for type 2 diabetes: a bayesian network meta-analysis of 58 randomized controlled trials. Acta Diabetol 2018; 56: 1–24.

28.

Mogstad

Torgovitsky

Walters

. The causal interpretation of two-stage least squares with multiple instrumental variables. Am Econ Rev 2021; 111: 3663–3698.

29.

Hernán

Robins

. Instruments for causal inference: An epidemiologist’s dream? Epidemiology (Cambridge, Mass) 2006; 17: 360–372.

30.

Cragg

Donald

. Testing identifiability and specification in instrumental variable models. Econ Theory 1993; 9: 222–240.

31.

Stock

Yogo

. Testing for weak instruments in linear iv regression. SSRN eLibrary 2002; 11. DOI: 10.1017/CBO9780511614491.006.

32.

Staiger

Stock

. Instrumental variables regression with weak instruments. Econometrica 1997; 65: 557–586.

33.

Moler Zapata

Grieve

Basu

, et al. How does a local instrumental variable method perform across settings with instruments of differing strengths? A simulation study and an evaluation of emergency surgery. Health Econ 2023; 32: 2113–2126.

34.

Sanderson

Windmeijer

. A weak instrument f -test in linear iv models with multiple endogenous variables. J Econom 2015; 190. DOI: 10.1016/j.jeconom.2015.06.004.

35.

Vansteelandt

Joffe

. Structural nested models and g-estimation: The partially realized promise. Stat Sci 2015; 29. DOI: 10.1214/14-STS493.

36.

Morris

White

Crowther

. Using simulation studies to evaluate statistical methods. Stat Med 2019; 38: 2074–2102.

37.

Bidulka

Lugo-Palacios

Carroll

, et al. Comparative effectiveness of second line oral antidiabetic treatments among people with type 2 diabetes mellitus: emulation of a target trial using routinely collected health data. BMJ 2024; 385. DOI: 10.1136/bmj-2023-077097.

38.

Ferrannini

Fonseca

Zinman

, et al. Fifty-two-week efficacy and safety of vildagliptin vs. glimepiride in patients with type 2 diabetes mellitus inadequately controlled on metformin monotherapy. Diabetes, obesity and metabolism 2009; 11: 157–166.

39. Nauck, Meininger, Sheng, et al. Efficacy and safety of the dipeptidyl peptidase-4 inhibitor, sitagliptin, compared with the sulfonylurea, glipizide, in patients with type 2 diabetes inadequately controlled on metformin alone: a randomized, double-blind, non-inferiority trial. Diabetes Obes Metab 2007; 9: 194–205.

40. Fadini, Bottigliengo, D’Angelo, et al. Comparative effectiveness of DPP-4 inhibitors versus sulfonylurea for the treatment of type 2 diabetes in routine clinical practice: A retrospective multicenter real-world study. Diabetes Ther 2018; 9. DOI: 10.1007/s13300-018-0452-y.

41. Nagar, Napoles, Jordan, et al. Socioeconomic deprivation and genetic ancestry interact to modify type 2 diabetes ethnic disparities in the United Kingdom. EClinicalMedicine 2021; 37: 100960. DOI: 10.1016/j.eclinm.2021.100960.

42. Marsh, Cormier. Spline regression models. California: Sage Publications, 2001, p. 137.

43. Perperoglou, Sauerbrei, Abrahamowicz, et al. A review of spline function procedures in R. BMC Med Res Methodol 2019; 19: 1–16.

44. Ertefaie, Small, Flory, et al. Selection bias when using instrumental variable methods to compare two treatments but more than two treatments are available. Int J Biostat 2016; 12: 219–232.

45. Ertefaie, Small, Flory, et al. A sensitivity analysis to assess bias due to selecting subjects based on treatment received. Epidemiology 2016; 27: e5–e7.

46. Baiocchi, Small, Yang, et al. Near/far matching: A study design approach to instrumental variables. Health Serv Outcomes Res Methodol 2012; 12. DOI: 10.1007/s10742-012-0091-0.
