Estimating dynamic treatment regimes for ordinal outcomes with household interference: Application in household smoking cessation

Abstract

The focus of precision medicine is on decision support, often in the form of dynamic treatment regimes, which are sequences of decision rules. At each decision point, the decision rules determine the next treatment according to the patient’s baseline characteristics, the information on treatments and responses accrued by that point, and the patient’s current health status, including symptom severity and other measures. However, dynamic treatment regime estimation with ordinal outcomes is rarely studied, and rarer still in the context of interference, where one patient’s treatment may affect another’s outcome. In this paper, we introduce the weighted proportional odds model: a regression-based, approximately doubly robust approach to single-stage dynamic treatment regime estimation for ordinal outcomes. This method also accounts for the possibility of interference between individuals sharing a household through the use of covariate balancing weights derived from joint propensity scores. Examining different types of balancing weights, we verify the approximate double robustness of the weighted proportional odds model with our adjusted weights via simulation studies. We further extend the weighted proportional odds model to multi-stage dynamic treatment regime estimation with household interference, yielding the dynamic weighted proportional odds model. Lastly, we demonstrate our proposed methodology in the analysis of longitudinal survey data from the Population Assessment of Tobacco and Health study, which motivates this work. Furthermore, considering interference, we provide optimal treatment strategies for households to achieve smoking cessation of the pair in the household.

Keywords

Dynamic treatment regimes, ordinal outcomes, household interference, weighted proportional odds models, double robustness

1. Introduction

Precision medicine, also known as personalized medicine, refers to treating patients according to their unique characteristics. Dynamic treatment regimes (DTRs), as a statistical framework for precision medicine, provide individualized treatment recommendations based on patients’ individual information. Recently, the consideration of interference, where one individual’s outcome is possibly affected by others’ treatment, has gained importance in estimating optimal DTRs,^1–3 which are sequences of treatment rules that yield the best-expected health outcome across a population.

Lately, some researchers, such as Su et al.,² Jiang et al.¹ and Park et al.,⁴ have focused on optimal DTR estimation in the presence of interference. In such cases, treatment-decision rules should involve others’ information such as treatments and covariates. To conduct robust optimal DTR estimation with interference for continuous outcomes in the regression-based estimation framework, Jiang et al.¹ developed network balancing weights to extend the method of dynamic weighted ordinary least squares (dWOLS, Wallace and Moodie⁵). This method focused on a decision framework in cases where there is an ego (i.e., an individual of primary interest in a social network) and alters (i.e., those to whom the ego is linked). The covariates or treatments of the alters could affect the treatment or outcome of the ego, and the goal is to optimize the mean of the outcome of egos in the network. These recently developed interference-aware DTR estimation methods, and even many of the interference-unaware DTR estimation methods, however, focus primarily on continuous outcomes. Few publications have considered optimal DTR estimation for discrete outcomes, such as binary and ordinal outcomes, in the presence of interference.

To estimate optimal DTRs with discrete outcomes, some methods have been developed for settings without interference. Moodie et al.⁶ first enabled more flexible modelling by adapting $Q$-learning to discrete utilities, such as Bernoulli and Poisson utilities. Investigating discrete outcomes, Wallace et al.⁷ introduced an extension of G-estimation to the case of non-additive treatment effects. Simoneau et al.⁸ extended dWOLS to time-to-event data and developed DWSurv to determine the optimal DTR with right-censored survival outcomes. Further, focusing on discrete outcomes and particularly on binary outcomes, Jiang et al.⁹ proposed a dynamic weighted generalized linear model in a multi-stage treatment decision analysis, employing two-step weighted logistic regression at each stage for binary outcomes.

The methods developed in this paper are motivated by data from the Population Assessment of Tobacco and Health (PATH) study, a longitudinal study of smoking behaviors and cessation. Some studies have focused on the idea that a desire to quit smoking may not be sufficient motivation in itself.^10,11 Meanwhile, a growing body of literature suggests that e-cigarettes (i.e., vaping) can be useful as a cessation aid.^12,13 Few studies have explored smoking cessation within couples or households, where interference may be present, and even fewer have examined the impact of participants’ e-cigarette usage. The PATH study, however, provides a unique chance to investigate these contexts. Motivated by this, and in contrast to Jiang et al.’s approaches to optimizing individual outcomes, our proposed framework for modelling household interference focuses on optimizing household utilities by making decisions for the household as a whole. In particular, we explore optimizing an ordinal utility across a household, combining a couple’s two binary outcomes of quitting (or attempting to quit) smoking into a single ordinal outcome.

This paper is organized as follows. In Section 2, we introduce the proposed approximately doubly robust regression-based DTR estimation framework for ordinal household utilities under household interference. Then, to achieve approximate double robustness in the face of model misspecification, we propose the estimation process of the joint propensity scores and construct the corresponding balancing weights. Through simulations of both single- and multi-stage treatment decisions, Section 3 demonstrates that our method is approximately doubly robust against misspecification of either the treatment-free or the joint propensity score model. Section 4 illustrates the implementation of our methods on PATH data. Section 5 concludes with a discussion of future research.

2. Methodology

2.1. Household interference modeling framework with ordinal utilities

In the presence of household interference, we aim to estimate treatment decisions for both individuals in the same household, so the outcomes of interest should be related to both individuals of a couple in the same household¹⁴; thus, both covariates and treatments of individuals in the same household need to be considered in a household outcome model. First, we define the household utility function as a combination of the individuals’ outcomes in the same household. For example, for a pair $(s, r)$ , we may have the utility that $U (Y^{s}, Y^{r})$ is equivalent to $ω_{s} Y^{s} + ω_{r} Y^{r}$ , where the combination weights ( $ω_{s}$ and $ω_{r}$ ) can be set based on the specific analytical goals.

For instance, we might set $ω_{s} = ω_{r}$ if outcomes of both $s$ and $r$ are considered equally of interest. Alternatively, we may instead consider a case where one individual of primary interest – the ego – is the sole focus of our optimization, but may be influenced by the treatments of their neighbour (the alter(s)). In this case, we would therefore set a weight of $1$ to the ego and $0$ to the alters.

For the binary outcome pairs $(Y^{s}, Y^{r})$, where $(Y^{s}, Y^{r}) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\}$, we specify, for simplicity and in keeping with our goal of studying DTRs with ordinal outcomes, that all the combination weights are equal to one ($ω_{s} = ω_{r} = 1$). Adding 1 to each sum gives three possible values, 1, 2, or 3, of $U (Y^{s}, Y^{r})$ for a pair in the same household, and these can be considered ordinal outcomes for the household. That is, in this setting, the household utilities $U (Y^{s}, Y^{r}) = 1, 2, 3$ can be interpreted in an ordered way: for a pair in a household, (1) neither, (2) one, or (3) both of them incur a benefit such as smoking cessation, and the largest value (i.e., 3) is preferred. We consider that the model for such a household utility can be captured in the form of a function of

$$f (x^{β}) + d_{ξ} (a^{s}, x^{ξ}) + d_{ψ} (a^{r}, x^{ψ}) + d_{int} (a^{s} a^{r}, x^{ϕ}), \tag{1}$$

where $x^{β}$, often termed predictive variables, function to increase the precision of estimates, and $x^{ξ}$, $x^{ψ}$, $x^{ϕ}$, the so-called prescriptive or tailoring variables, are used to adapt treatment decisions to pairs in a household. That is, the model has a treatment-free function $f (x^{β})$ and some decision functions $d_{ξ} (a^{s}, x^{ξ})$, $d_{ψ} (a^{r}, x^{ψ})$, and $d_{int} (a^{s} a^{r}, x^{ϕ})$.

In practice, in the household-level model (1), covariates $x^{ξ}$ and $x^{ψ}$ can be individual-level covariates from each individual in the same household. These two covariates that contain individuals’ characteristics indicate the ‘personalized’ side of the model (1). Covariates $x^{ϕ}$ can be household-level covariates, and thus they represent households’ characteristics. As such $x^{ϕ}$ are special tailoring variables for our household-interference treatment decisions case. Given the above utility model, the goal is to identify an optimal household treatment decision rule $d^{*} (x_{s}, x_{r})$ that maximizes the utility $U (Y^{s}, Y^{r})$ , for binary outcomes $Y^{s}$ and $Y^{r}$ . In our household case, the treatment decision rule $d (x_{s}, x_{r})$ takes as input both individuals’ covariates and outputs a treatment configuration for a couple in the same household.
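As a small illustration of the household utility defined in this subsection, the sketch below combines a pair's binary outcomes into the ordinal utility with levels 1, 2, 3. The function name `household_utility` and its argument names are hypothetical and chosen only for this example; the default weights correspond to the choice $ω_{s} = ω_{r} = 1$ made in the text.

```python
def household_utility(y_s, y_r, w_s=1, w_r=1):
    """Combine two binary outcomes into an ordinal household utility.

    With w_s = w_r = 1, the weighted sum y_s + y_r lies in {0, 1, 2};
    adding 1 gives the ordinal levels 1 (neither), 2 (one), 3 (both).
    """
    if y_s not in (0, 1) or y_r not in (0, 1):
        raise ValueError("outcomes must be binary (0 or 1)")
    return w_s * y_s + w_r * y_r + 1


# Enumerate the four possible outcome pairs for a household.
for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, "->", household_utility(*pair))
```

Setting `w_s=1, w_r=0` would instead recover the ego-only utility (shifted by one), in which the alter's outcome does not enter the objective.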

2.2. Proportional odds model and target decision parameters

Let $U$ be an ordinal outcome with $C = 3$ categories. Then $P (U \leq c)$ is the cumulative probability that $U$ is less than or equal to a specific category $c$ . The log-odds of being less than or equal to a particular $c$ category can be defined as

$$\log \frac{P (U \leq c)}{P (U > c)} = \mathrm{logit} [P (U \leq c)], \quad \text{for } c = 1, \dots, C - 1,$$

where the logit link function is defined as $\mathrm{logit} (p) = \log [p / (1 - p)]$. Note that the denominator $P (U > c)$ of the above equation will be zero if $c = C$; thus, $c = 1, \dots, C - 1$. The proportional odds model (POM), which specifies the cumulative log-odds for a particular category, assumes that each explanatory variable exerts the same effect on each cumulative logit regardless of the cutoff $c$, and was proposed by McCullagh¹⁵ as $\mathrm{logit} [P (U_{h} \leq c ∣ x_{h})] = ζ_{c} - θ^{⊤} x_{h}$, where the coefficients $ζ_{c}$ are category-specific intercepts and $θ$ are the coefficients of the covariates $x_{h}$. The intercepts $ζ_{c}$ are the only part that varies across the equations, and the effects of the covariates $x_{h}$ are assumed to be constant for all $c$, i.e., $θ_{c} = θ$. Building on the typical POM and our treatment decision set-up, we propose a POM for household ordinal utilities as follows, for $c = 1, 2$; $h = 1, 2, \dots, H$,

$$\mathrm{logit} [P (U_{h} \leq c ∣ a_{h}^{s}, a_{h}^{r}, x_{h})] = ζ_{c} - β^{⊤} x_{h}^{β} - a_{h}^{s} ξ^{⊤} x_{h}^{ξ} - a_{h}^{r} ψ^{⊤} x_{h}^{ψ} - a_{h}^{s} a_{h}^{r} ϕ^{⊤} x_{h}^{ϕ} . \tag{2}$$

According to the general utility model (1), we note that the treatment-free function is identified in the above POM as the linear form $f (x^{β}) = ζ_{c} - β^{⊤} x^{β}$, and the decision functions are specified as $d_{ξ} (a^{s}, x^{ξ}) = - a^{s} ξ^{⊤} x^{ξ}$, $d_{ψ} (a^{r}, x^{ψ}) = - a^{r} ψ^{⊤} x^{ψ}$, and $d_{int} (a^{s} a^{r}, x^{ϕ}) = - a^{s} a^{r} ϕ^{⊤} x^{ϕ}$, respectively.
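To make the household POM (2) concrete, the following minimal sketch converts the cumulative logits into the three category probabilities for a given treatment configuration. All numerical values (cutpoints, treatment-free term, and the three linear blip terms) are hypothetical and serve only as an illustration; `pom_category_probs` and `blip` are names introduced here, not part of any library.

```python
import math


def expit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def blip(a_s, a_r, blip_s, blip_r, blip_int):
    """Treatment part of model (2).

    blip_s, blip_r, blip_int stand for xi'x^xi, psi'x^psi and phi'x^phi.
    """
    return a_s * blip_s + a_r * blip_r + a_s * a_r * blip_int


def pom_category_probs(cutpoints, treatment_free, blip_value):
    """Return [P(U=1), ..., P(U=C)] when logit P(U <= c) = zeta_c - eta,
    with eta = treatment_free + blip_value (treatment_free = beta'x^beta)."""
    eta = treatment_free + blip_value
    cumulative = [expit(zeta - eta) for zeta in cutpoints] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[c] - cumulative[c - 1] for c in range(1, len(cumulative))]
    return probs


# Hypothetical cutpoints zeta_1 < zeta_2 and linear terms for one household.
cutpoints = (-0.5, 1.0)
for config in [(0, 0), (1, 1)]:
    g = blip(*config, blip_s=0.4, blip_r=0.3, blip_int=0.2)
    print(config, [round(p, 3) for p in pom_category_probs(cutpoints, 0.1, g)])
```

Because the linear predictor enters with a negative sign, a larger blip value moves probability mass from category 1 toward category 3, which is the direction the decision rules aim for.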

Focussing on the household ordinal outcome (2), we define the household blip function as $γ [(A^{s}, A^{r}), x_{h}; ξ, ψ, ϕ] = A^{s} ξ^{⊤} x_{h}^{ξ} + A^{r} ψ^{⊤} x_{h}^{ψ} + A^{s} A^{r} ϕ^{⊤} x_{h}^{ϕ}$, which represents the effects of the treatment configuration $(A^{s}, A^{r})$ for a household compared with the null treatment configuration $(0, 0)$. The estimation goal is to estimate the target decision parameters, i.e., the blip parameters $ξ$, $ψ$, $ϕ$. From these blip-parameter estimates and given the household tailoring variables, the optimal treatment decisions for a pair in the household can be made. Given the four choices $(A^{s}, A^{r}) = (1, 1)$, $(1, 0)$, $(0, 1)$, or $(0, 0)$, the corresponding blip value $γ [(A^{s}, A^{r}), x_{h}; ξ, ψ, ϕ]$ is $ξ^{⊤} x_{h}^{ξ} + ψ^{⊤} x_{h}^{ψ} + ϕ^{⊤} x_{h}^{ϕ}$, $ξ^{⊤} x_{h}^{ξ}$, $ψ^{⊤} x_{h}^{ψ}$, and $0$, respectively. The decision goal is to maximize the outcome across a couple. Because the blip enters model (2) with a negative sign, a larger blip lowers every cumulative probability $P (U_{h} \leq c)$ and thus shifts the utility toward higher categories, so this goal is equivalent to maximizing the blip function. Taking into account the blip values of all possible treatment configurations, an optimal treatment rule must choose the configuration that corresponds to the maximum blip value. Therefore, we have the following treatment decision rules for a household:

Decision 1

The optimal household decision rules:

Rule 1: $d^{*} (x^{ξ}, x^{ψ}, x^{ϕ}) = (1, 1),$ if $ξ^{⊤} x_{h}^{ξ} + ψ^{⊤} x_{h}^{ψ} + ϕ^{⊤} x_{h}^{ϕ} > 0$ and $ψ^{⊤} x_{h}^{ψ} + ϕ^{⊤} x_{h}^{ϕ} > 0$ , and $ξ^{⊤} x_{h}^{ξ} + ϕ^{⊤} x_{h}^{ϕ} > 0$ .

Rule 2: $d^{*} (x^{ξ}, x^{ψ}, x^{ϕ}) = (1, 0),$ if $ψ^{⊤} x_{h}^{ψ} + ϕ^{⊤} x_{h}^{ϕ} < 0$ and $ξ^{⊤} x_{h}^{ξ} > ψ^{⊤} x_{h}^{ψ}$ and $ξ^{⊤} x_{h}^{ξ} > 0$ .

Rule 3: $d^{*} (x^{ξ}, x^{ψ}, x^{ϕ}) = (0, 1),$ if $ξ^{⊤} x_{h}^{ξ} + ϕ^{⊤} x_{h}^{ϕ} < 0$ and $ψ^{⊤} x_{h}^{ψ} > ξ^{⊤} x_{h}^{ξ}$ and $ψ^{⊤} x_{h}^{ψ} > 0$ .

Rule 4: $d^{*} (x^{ξ}, x^{ψ}, x^{ϕ}) = (0, 0),$ if $ξ^{⊤} x_{h}^{ξ} + ψ^{⊤} x_{h}^{ψ} + ϕ^{⊤} x_{h}^{ϕ} < 0$ and $ξ^{⊤} x_{h}^{ξ} < 0$ and $ψ^{⊤} x_{h}^{ψ} < 0$ .
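The four rules above amount to choosing the treatment configuration with the largest blip value. A minimal sketch of that selection is given below; the function name and the argument names `blip_s`, `blip_r`, `blip_int` (standing for $ξ^{⊤} x_{h}^{ξ}$, $ψ^{⊤} x_{h}^{ψ}$, and $ϕ^{⊤} x_{h}^{ϕ}$) are hypothetical. The rules use strict inequalities and do not say how ties are handled, so breaking ties in favour of the configuration listed first is an assumption of this sketch.

```python
def optimal_household_decision(blip_s, blip_r, blip_int):
    """Return the configuration (a_s, a_r) with the largest blip value.

    blip_s   = xi'  x^xi   (effect of treating s alone)
    blip_r   = psi' x^psi  (effect of treating r alone)
    blip_int = phi' x^phi  (extra effect when both are treated)
    """
    blip_values = {
        (0, 0): 0.0,
        (1, 0): blip_s,
        (0, 1): blip_r,
        (1, 1): blip_s + blip_r + blip_int,
    }
    # max() keeps the first maximiser in dict order, so ties favour
    # (0, 0), then (1, 0), then (0, 1); this tie-break is an assumption.
    best = max(blip_values, key=blip_values.get)
    return best, blip_values[best]


# Each individual effect is negative, yet a strong positive interaction
# makes treating both members optimal (Rule 1).
print(optimal_household_decision(-1.0, -1.0, 3.0))
# A negative interaction makes treating only s optimal (Rule 2).
print(optimal_household_decision(1.0, 0.5, -2.0))
```

The first call illustrates why interference matters: judged individually, neither member would be treated, whereas the household-level rule treats both.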

Further, if we know the blip parameters $ξ, ψ$ , and $ϕ$ , then we have $γ^{*} [d^{*} (x^{ξ}, x^{ψ}, x^{ϕ}); ξ, ψ, ϕ] = A^{s *} ξ^{⊤} x^{ξ} + A^{r *} ψ^{⊤} x^{ψ} + A^{s *} A^{r *} ϕ^{⊤} x^{ϕ}$ , where $γ^{*}$ means the arguments $(A^{s}, A^{r})$ in the $γ$ function follow the optimal household decision rules Decision 1, and $d^{*} (x^{ξ}, x^{ψ}, x^{ϕ})$ and $(A^{s *}, A^{r *})$ are the corresponding optimal treatments for the pair $(s, r)$ . Therefore, to make decisions for the household, the estimates of blip parameters $ξ, ψ$ , and $ϕ$ are necessary, and we present our approximately doubly robust methods in the following section.

2.3. Proposed method and approximate double robustness

Under household interference, in a single-stage decision setting, we assume that the true ordinal-outcome model is, for $c = 1, 2$ ,

$$\mathrm{logit} [P (U \leq c ∣ a^{s}, a^{r}, x)] = ζ_{c} - f (x^{β}; β) - γ [(A^{s}, A^{r}), x; ξ, ψ, ϕ] .$$

Our proposed method, the weighted proportional odds model (WPOM), for a single-stage decision is applied by specifying three models: (1) Treatment-free model: $f (x^{β}; β)$; (2) Blip model: $γ [(A^{s}, A^{r}), x_{h}; ξ, ψ, ϕ] = a^{s} ξ^{⊤} x^{ξ} + a^{r} ψ^{⊤} x^{ψ} + a^{s} a^{r} ϕ^{⊤} x^{ϕ}$; (3) Joint propensity score model¹: $π^{a^{s} a^{r}} (x_{s}, x_{r}) = P (A^{s} = a^{s}, A^{r} = a^{r} ∣ x_{s}, x_{r})$.

Further, let $w^{std}$ denote ‘standard’ interference-aware balancing weights,¹ which satisfy:

$$π^{00} w^{std} (0, 0, x) = π^{01} w^{std} (0, 1, x) = π^{10} w^{std} (1, 0, x) = π^{11} w^{std} (1, 1, x) . \tag{3}$$

In this context, we employ the term ‘interference-aware balancing weights’ to differentiate them from those in non-interference settings, where the weights that are not ‘interference-aware’ only involve the common propensity score (see the balancing weights in Wallace and Moodie⁵). The work by Jiang et al.,¹ where the focus is primarily on balancing weights considering network interference, outlined the criteria for balancing weights in network settings. Equation (3) mentioned above represents a special case of their balancing weights criteria specifically adapted for household settings. For example, the inverse probability-based interference-aware balancing weight for the correlated treatments in a household is given by:

$$w^{std} (a^{s}, a^{r}) \propto \frac{1}{π^{a^{s} a^{r}}} \times \frac{1}{\sum_{a^{s}, a^{r}} 1 / π^{a^{s} a^{r}}}, \quad \text{for } a^{s} = 0, 1;\ a^{r} = 0, 1 . \tag{4}$$

The weight is proportional to the inverse of the joint propensities and divided by a ‘normalization’ factor $\sum_{a^{s}, a^{r}} 1 / π^{a^{s} a^{r}}$. In particular, considering the balancing weights criterion in the form $π^{00} w (0, 0, x) = π^{01} w (0, 1, x) = π^{10} w (1, 0, x) = π^{11} w (1, 1, x) = π^{00} π^{10} π^{01} π^{11}$, we propose overlap-type balancing weights^16,17:

$$w^{std} (a^{s}, a^{r}) \propto \frac{π^{00} π^{10} π^{01} π^{11}}{π^{a^{s} a^{r}}}, \quad \text{for } a^{s} = 0, 1;\ a^{r} = 0, 1 . \tag{5}$$

The overlap-type weight for one treatment pair realization is proportional to the product of the joint propensities for the other possible realizations. Then, the WPOM for robust estimation of $ξ$, $ψ$, and $ϕ$ is applied:

As stated in the following theorem, Step 2 in WPOM is key to ensuring approximate double robustness in consistently estimating the blip parameters, even when one of the treatment-free or joint propensity models is not correctly specified.

Theorem 1

Approximate Double Robustness of WPOM: Under the identifiability assumptions of (1) consistency¹⁸; (2) no unmeasured confounders; and (3) positivity,¹⁹ suppose that the true ordinal-outcome model satisfies

$$\mathrm{logit} [P (U \leq c ∣ a^{s}, a^{r}, x)] = ζ_{c} - f (x^{β}; β) - a^{s} ξ^{⊤} x^{ξ} - a^{r} ψ^{⊤} x^{ψ} - a^{s} a^{r} ϕ^{⊤} x^{ϕ},$$

for $c = 1, 2$, and any treatment-free function $f (x^{β}; β)$. Suppose we use weights that satisfy

$$π^{00} w (0, 0) κ (0, 0) = π^{01} w (0, 1) κ (0, 1) = π^{10} w (1, 0) κ (1, 0) = π^{11} w (1, 1) κ (1, 1), \tag{7}$$

where

$$κ (a^{s}, a^{r}) = \mathrm{expit} (η_{2}) [1 - \mathrm{expit} (η_{1})] [1 - \mathrm{expit} (η_{2}) + \mathrm{expit} (η_{1})] . \tag{8}$$

Then, a WPOM based on the corresponding linear model will yield approximately consistent estimators of $ξ$, $ψ$, and $ϕ$ if at least one of the joint propensity score and treatment-free models is correctly specified.

Proof: See Appendix A of the Supplemental materials. Note that the definitions of $η_{1}, η_{2}$ in (8) are

$$\begin{cases} η_{1} (a^{s}, a^{r}, x) := ζ_{1}^{*} + {β^{*}}^{⊤} x^{β} + {ξ^{*}}^{⊤} a^{s} x^{ξ} + {ψ^{*}}^{⊤} a^{r} x^{ψ} + {ϕ^{*}}^{⊤} a^{s} a^{r} x^{ϕ}, \\ η_{2} (a^{s}, a^{r}, x) := ζ_{2}^{*} + {β^{*}}^{⊤} x^{β} + {ξ^{*}}^{⊤} a^{s} x^{ξ} + {ψ^{*}}^{⊤} a^{r} x^{ψ} + {ϕ^{*}}^{⊤} a^{s} a^{r} x^{ϕ}, \end{cases}$$

where $ξ^{*}$, $ψ^{*}$, and $ϕ^{*}$ are the solutions of the estimating functions of the POM (2) with ‘standard’ interference-aware balancing weights that satisfy (3). As a result, like dWOLS, a family of weights can be used if the criterion (7) is satisfied. Equation (6) in Step 2 provides an example of such weights because it is derived from equating (7) to $π^{00} π^{10} π^{01} π^{11} \times κ (0, 0, x) κ (1, 0, x) κ (0, 1, x) κ (1, 1, x)$.

Remarks

Similar to the balancing properties of dWOLS, the balancing properties of the POM rely on the propensity score; however, in the presence of household interference, they depend on the joint propensity functions, whose estimation and use in constructing the balancing weights are discussed in subsection 2.4. The key factor of the balancing criterion (7) is $κ$, called the ‘adjustment factor’.⁹ It adjusts for the nonlinearity of the link function and is specific to the POM. Based on (8), the adjustment factor is the product of three terms: $\mathrm{expit} (η_{2})$, $1 - \mathrm{expit} (η_{1})$, and $1 - \mathrm{expit} (η_{2}) + \mathrm{expit} (η_{1})$, where the first term represents the estimated probability that the utility falls in category $1$ or $2$, the second the estimated probability of category $2$ or $3$, and the third the estimated probability of category $1$ or $3$. Equivalently, the three terms of $κ$ in (8) can be expressed as $\mathrm{expit} (η_{2}) = 1 - P (U = 3)$, $1 - \mathrm{expit} (η_{1}) = 1 - P (U = 1)$, and $1 - \mathrm{expit} (η_{2}) + \mathrm{expit} (η_{1}) = 1 - P (U = 2)$. The ‘adjustment factor’ in (7) can then be written as $κ (a^{s}, a^{r}, x) = \prod_{c = 1}^{3} [1 - P (U = c)]$.
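The two expressions for the adjustment factor can be checked numerically. The sketch below evaluates $κ$ once through (8) and once as the product $\prod_{c} [1 - P (U = c)]$, taking $\mathrm{expit} (η_{1})$ and $\mathrm{expit} (η_{2})$ as the cumulative probabilities $P (U \leq 1)$ and $P (U \leq 2)$, as the identities stated in the Remarks imply. The values of $η_{1}$ and $η_{2}$ are hypothetical, and the function names are introduced only for this illustration.

```python
import math


def expit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def adjustment_factor(eta1, eta2):
    """kappa as written in equation (8)."""
    return expit(eta2) * (1.0 - expit(eta1)) * (1.0 - expit(eta2) + expit(eta1))


def adjustment_factor_from_probs(eta1, eta2):
    """kappa as the product over categories of 1 - P(U = c)."""
    p1 = expit(eta1)                # P(U = 1)
    p2 = expit(eta2) - expit(eta1)  # P(U = 2), requires eta1 < eta2
    p3 = 1.0 - expit(eta2)          # P(U = 3)
    return (1.0 - p1) * (1.0 - p2) * (1.0 - p3)


# Hypothetical linear predictors with eta1 < eta2.
eta1, eta2 = -0.3, 0.9
print(adjustment_factor(eta1, eta2))
print(adjustment_factor_from_probs(eta1, eta2))
```

Both calls return the same value up to floating-point error, which is the identity used to rewrite (8) in product form.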

Regarding the approximate double robustness of WPOM, the ‘approximate’ corresponds to ‘approximately consistent’, which refers to a case where the estimators are derived from the estimating functions which are approximately unbiased with a small quantifiable bias (see proof of Theorem 1 in Appendix A). We also use terms such as ‘nearly unbiased’ or ‘approximately unbiased’, and this quantifiable bias will be small when a linear predictor tends to vary in an interval where the $expit$ function is approximately linear.²⁰

2.4. Estimating joint propensity score

To maximize the assurance of approximate double robustness, we further present methods of estimating the joint propensity score that takes account of the correlations between treatments of individuals in the same household.³ In a case where the treatments are correlated, the joint propensity functions are not equal to the product of the marginal propensities. To build accurate balancing weights and thus make robust estimations of optimal DTRs, we take into account the dependence among treatments observed in the same household.

To estimate the joint propensity score, letting $A_{h} = (A_{h}^{s}, A_{h}^{r})^{⊤}$ be the treatment vector for the $h^{t h}$ household, we define $p_{h s} (α) := P (A_{h}^{s} = 1 ∣ x_{h s}, α)$ and $p_{h r} (α) := P (A_{h}^{r} = 1 ∣ x_{h r}, α)$ and $A_{h s r} := I (A_{h}^{s} = 1, A_{h}^{r} = 1) = A_{h}^{s} A_{h}^{r}$ , where $I (x)$ is an indicator function. Also, we define $p_{h s r} := P (A_{h s r} = 1) = P (A_{h}^{s} = 1, A_{h}^{r} = 1)$ . Then, we provide a three-step estimation algorithm (i.e., Algorithm 2).

For the first step, where we estimate the marginal propensity score models ($p_{h t} (α)$), we employ Liang and Zeger’s²² first-order generalized estimating equation method for estimating the parameter $α$. In the second step, to model the association between the pair’s treatments ($τ_{h s r}$), we use Lipsitz et al.’s²¹ pairwise odds ratio model, such that $\log τ_{h s r} (o) = o^{⊤} x_{h s r}$, where $x_{s r}$ (suppressing the index $h$) denotes pair-level covariates that may influence the odds ratio between $A^{s}$ and $A^{r}$, and $o$ represents the corresponding coefficients. Finally, in the third step, we calculate the joint propensity score based on Lipsitz et al.’s²¹ formula; the detailed instructions and techniques are outlined in Appendix B of the Supplemental Materials. Building on the estimators of both the marginal probabilities ($p_{h s} (\hat{α})$ and $p_{h r} (\hat{α})$) and the odds ratios (${\hat{τ}}_{h s r}$), we can construct the estimator of the joint propensity $π^{11} (x_{h s}, x_{h r}) = p_{h s r}$ by equation (9). Further, we have the other estimators: ${\hat{π}}^{10} (x_{h s}, x_{h r}) = p_{h s} (\hat{α}) - {\hat{π}}^{11} (x_{h s}, x_{h r})$, ${\hat{π}}^{01} (x_{h s}, x_{h r}) = p_{h r} (\hat{α}) - {\hat{π}}^{11} (x_{h s}, x_{h r})$, and ${\hat{π}}^{00} (x_{h s}, x_{h r}) = 1 - p_{h s} (\hat{α}) - p_{h r} (\hat{α}) + {\hat{π}}^{11} (x_{h s}, x_{h r})$. Therefore, using equation (5), we have overlap-type estimators of the weights $\hat{w} (a^{s}, a^{r}) = \frac{{\hat{π}}^{00} {\hat{π}}^{10} {\hat{π}}^{01} {\hat{π}}^{11}}{{\hat{π}}^{a^{s} a^{r}}}$, for $a^{s} = 0, 1$; $a^{r} = 0, 1$.
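The third step and the weight construction can be sketched as follows. Equation (9) is not reproduced in this section, so the sketch assumes the standard closed-form solution for the joint probability of two binary variables given their marginals and odds ratio (the root of the quadratic implied by $τ = π^{11} π^{00} / (π^{10} π^{01})$); whether this matches (9) term by term is an assumption. The marginal probabilities and odds ratio below are hypothetical inputs standing in for the fitted values, and all function names are illustrative.

```python
import math


def joint_propensities(p_s, p_r, tau):
    """Joint propensities {(a_s, a_r): pi} from the marginals
    p_s = P(A^s = 1), p_r = P(A^r = 1) and the odds ratio tau."""
    if abs(tau - 1.0) < 1e-12:
        p11 = p_s * p_r  # independent treatments
    else:
        a = 1.0 + (p_s + p_r) * (tau - 1.0)
        disc = a * a - 4.0 * tau * (tau - 1.0) * p_s * p_r
        p11 = (a - math.sqrt(disc)) / (2.0 * (tau - 1.0))
    return {
        (1, 1): p11,
        (1, 0): p_s - p11,
        (0, 1): p_r - p11,
        (0, 0): 1.0 - p_s - p_r + p11,
    }


def overlap_weights(pi):
    """Overlap-type weights as in equation (5)."""
    product = math.prod(pi.values())
    return {config: product / p for config, p in pi.items()}


def normalized_ipw_weights(pi):
    """Normalized inverse-probability weights as in equation (4)."""
    norm = sum(1.0 / p for p in pi.values())
    return {config: (1.0 / p) / norm for config, p in pi.items()}


# Hypothetical fitted values for one household.
pi = joint_propensities(p_s=0.5, p_r=0.5, tau=9.0)
w = overlap_weights(pi)
for config in pi:
    # pi * w is the same for every configuration: balancing criterion (3).
    print(config, round(pi[config], 4), pi[config] * w[config])
```

With positively associated treatments ($τ > 1$), the joint probability of both members being treated exceeds the product of the marginals, which is why the product of marginal propensities is not used here.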

2.5. Multiple-stage decisions with household ordinal utilities

For the multi-stage treatment decision setting, backward induction is utilized in most methods for sequential decision problems. Therefore, multi-stage treatment decision problems can be broken down into a group of single-stage decision problems. Then, for each stage, we employ a WPOM to consistently estimate the blip parameters, i.e., $ξ$ , $ψ$ , and $ϕ$ . Accordingly, we name our novel approach for DTR estimation with ordinal outcomes the dynamic weighted proportional odds model (dWPOM).

If we acquire parameter estimates $\hat{ξ}$, $\hat{ψ}$, and $\hat{ϕ}$, then we have the estimated optimal treatment blip

$$\hat{γ} [{\hat{d}}^{*} (x^{ξ}, x^{ψ}, x^{ϕ}); \hat{ξ}, \hat{ψ}, \hat{ϕ}] = {\hat{A}}^{s *} {\hat{ξ}}^{⊤} x^{ξ} + {\hat{A}}^{r *} {\hat{ψ}}^{⊤} x^{ψ} + {\hat{A}}^{s *} {\hat{A}}^{r *} {\hat{ϕ}}^{⊤} x^{ϕ},$$

where the estimated optimal decisions ${\hat{d}}^{*} (x^{ξ}, x^{ψ}, x^{ϕ}) = ({\hat{A}}^{s *}, {\hat{A}}^{r *})$ also depend on the estimates $\hat{ξ}$, $\hat{ψ}$, and $\hat{ϕ}$, and can be calculated by the decision rules in Decision 1. Further, we can generate a household-level ordinal pseudo-utility based on the ordinal pseudo-utility probabilities:

$$\begin{aligned} P ({\tilde{U}}_{h} = 1 ∣ {\hat{d}}_{h}^{*}, x_{h}) & = \mathrm{expit} ({\hat{ζ}}_{1} - {\hat{β}}^{⊤} x_{h}^{β} - {\hat{γ}}_{h} [{\hat{d}}_{h}^{*}; \hat{ξ}, \hat{ψ}, \hat{ϕ}]), \\ P ({\tilde{U}}_{h} = 2 ∣ {\hat{d}}_{h}^{*}, x_{h}) & = \mathrm{expit} ({\hat{ζ}}_{2} - {\hat{β}}^{⊤} x_{h}^{β} - {\hat{γ}}_{h} [{\hat{d}}_{h}^{*}; \hat{ξ}, \hat{ψ}, \hat{ϕ}]) \\ & \quad - \mathrm{expit} ({\hat{ζ}}_{1} - {\hat{β}}^{⊤} x_{h}^{β} - {\hat{γ}}_{h} [{\hat{d}}_{h}^{*}; \hat{ξ}, \hat{ψ}, \hat{ϕ}]), \\ P ({\tilde{U}}_{h} = 3 ∣ {\hat{d}}_{h}^{*}, x_{h}) & = 1 - \mathrm{expit} ({\hat{ζ}}_{2} - {\hat{β}}^{⊤} x_{h}^{β} - {\hat{γ}}_{h} [{\hat{d}}_{h}^{*}; \hat{ξ}, \hat{ψ}, \hat{ϕ}]) . \end{aligned} \tag{10}$$

Thus, building on equation (10) and the estimates, we can compute the ordinal pseudo-utility probability, which is employed in the multiple-stage treatment decision settings. This ordinal pseudo-utility probability represents the probability of the potential outcome that a household with the given history would have if they went on to receive the optimal treatment configuration in the current stage. We incorporate the Brant-Wald test²³ into the algorithm to evaluate whether the conditional expectations of the proposed ordinal pseudo-utility outcomes, given the covariate values at the earlier stage, conform to a POM. The Brant-Wald test involves approximating a generalized ordinal logistic regression model and comparing it to the fitted POM; a Wald test is then applied to assess the significance of the difference in model coefficients, generating a chi-square statistic. A low $p$-value (e.g., less than 0.05) in the Brant-Wald test suggests that the coefficients in the generalized model do not satisfy the proportional odds assumption. The brant package in R facilitates the implementation of the Brant-Wald test, and in our simulations it supports the conclusion that the proportional odds assumption holds. In addition, in the multiple-stage decisions, to increase the estimation efficiency, we generate the ordinal pseudo-utility from this probability $R$ times and estimate the parameters of interest $R$ times. The final parameter estimates are then the averages of these $R$ estimates. The detailed algorithm for a multiple-stage decision problem is outlined in Supplemental Materials Appendix C. As an illustrative demonstration, we present a two-stage setup in Algorithm 3.
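The pseudo-utility step can be sketched as follows: compute the three probabilities in (10) for a household, draw a pseudo-utility from them, and average the $R$ resulting parameter estimates. The earlier-stage WPOM fit itself is not shown, since it requires a weighted ordinal regression routine; the parameter vectors passed to `average_estimates` are therefore placeholders, and all numerical inputs and function names are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def pseudo_utility_probs(zeta1, zeta2, treatment_free, optimal_blip):
    """Probabilities of pseudo-utility levels 1, 2, 3 as in equation (10).

    treatment_free = beta_hat' x^beta; optimal_blip = estimated blip under
    the estimated optimal configuration for this household."""
    eta = treatment_free + optimal_blip
    p_le1 = expit(zeta1 - eta)
    p_le2 = expit(zeta2 - eta)
    return [p_le1, p_le2 - p_le1, 1.0 - p_le2]


def draw_pseudo_utility(probs, rng):
    """Sample one ordinal pseudo-utility in {1, 2, 3}."""
    return rng.choices([1, 2, 3], weights=probs, k=1)[0]


def average_estimates(estimates):
    """Component-wise average of R parameter vectors."""
    r = len(estimates)
    return [sum(column) / r for column in zip(*estimates)]


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
probs = pseudo_utility_probs(zeta1=-0.5, zeta2=1.0,
                             treatment_free=0.1, optimal_blip=0.9)
draws = [draw_pseudo_utility(probs, rng) for _ in range(5)]  # R = 5 draws
print([round(p, 3) for p in probs], draws)

# Placeholder vectors standing in for R = 2 earlier-stage WPOM fits.
print(average_estimates([[0.40, 0.30], [0.44, 0.26]]))
```

Averaging over repeated draws reduces the extra Monte Carlo variability introduced by sampling the pseudo-utility rather than observing it.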

3. Simulation studies

In this section, we provide two simulation studies (Study 1 and 2) to illustrate our proposed methods for estimating optimal DTRs with ordinal outcomes under household interference. In each study, we first verify the approximate double robustness of our estimation method and then check that the corresponding estimated optimal DTR outperforms those corresponding to other estimation methods. In Study 1, we consider single-stage treatment decision problems, and in Study 2, we investigate a multi-stage decision problem in a two-stage case.

To assess the performance of the methods, we construct three measures: (1) optimal treatment rate (OTR), (2) mean regret value (MRV), and (3) value functions for ordinal outcomes. First, based on the data-generating parameters, we can calculate the truly optimal treatments for each household. Then, we can construct the recommended treatments from the estimated rules based on the estimated decision parameters. The OTR is then the percentage of the estimated recommended treatments that agree with the truly optimal treatments. Second, the MRV measures the difference between the blip value under the true optimal regime and under the estimated regime and therefore measures the ‘loss’ incurred by using the estimated regime instead of the truly optimal one. Detailed definitions of the OTR and MRV are provided in Appendix D of the Supplemental Materials (subsection 5.1). Finally, we construct value functions for ordinal outcomes, which mainly compare the estimated optimal treatments with the observed treatments. We will give the formal definition of the value functions for ordinal outcomes building on the concept of the odds ratio. It is important to note that, because of the specific nature of ordinal outcomes, for the single-stage settings in Study 1, we compare the WPOM with methods that ignore interference (Study 1a), and focus on consistent estimation of the WPOM (Study 1b). For the multi-stage settings in Study 2, we primarily concentrate on the long-term treatment effects of the estimated DTRs. In that case, we examine value functions for ordinal outcomes to compare different methods.
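As a rough illustration of the first two measures, the sketch below computes a household-level OTR, an individual-level OTR, and the MRV from lists of truly optimal and estimated treatment configurations. The exact definitions are given in Appendix D and are not reproduced here, so this is one natural reading of the verbal description above rather than the authors' formulas; the function names and toy inputs are hypothetical.

```python
def optimal_treatment_rates(true_opt, est_opt):
    """Return (household OTR, individual OTR).

    true_opt, est_opt: lists of (a_s, a_r) pairs, one per household.
    Household OTR: share of households whose whole configuration matches.
    Individual OTR: share of individuals whose own treatment matches.
    """
    n = len(true_opt)
    otr_h = sum(t == e for t, e in zip(true_opt, est_opt)) / n
    otr_i = sum((t[0] == e[0]) + (t[1] == e[1])
                for t, e in zip(true_opt, est_opt)) / (2 * n)
    return otr_h, otr_i


def mean_regret(blip_at_true_opt, blip_at_est_opt):
    """Average gap between the true blip under the truly optimal
    configuration and under the estimated configuration."""
    gaps = [t - e for t, e in zip(blip_at_true_opt, blip_at_est_opt)]
    return sum(gaps) / len(gaps)


# Toy example with four households.
true_opt = [(1, 1), (1, 0), (0, 0), (0, 1)]
est_opt = [(1, 1), (0, 0), (0, 0), (1, 0)]
print(optimal_treatment_rates(true_opt, est_opt))
print(mean_regret([1.0, 0.5, 0.0, 0.3], [1.0, 0.0, 0.0, -0.2]))
```

Under this reading the individual OTR can be high while the household OTR is low, since a household counts as correct only when both members receive their optimal treatment.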

3.1. Single-stage treatment decision for a couples case

3.1.1. Single-stage treatment decision for a couples case

In Study 1a, to evaluate the performance of the proposed WPOM, we compare the proposed approach with a simpler alternative that neglects interference. Specifically, we fit a standard logistic regression model separately for husbands and wives, each with their respective treatment, covariates, and household-level covariates. An example would be the model in Theorem H.2 in Jiang,²⁰ which is developed without considering the treatment of the spouse. We refer to this method as an interference-unaware approach. We derive optimal treatment regimes by maximizing the conditional logistic probabilities associated with each individual and compare their performance with that of the interference-aware method. Furthermore, we have developed a cross-validation variant of WPOM by partitioning the data into $K$ folds and have investigated the performance of $K$ -fold cross-validated WPOM.

In the generation of ordinal outcomes for the households, based on the mixed cumulative logit model (2), the ordinal outcome is a random function of household treatment assignments and covariates $x^{β}$ , $x^{ξ}$ , $x^{ψ}$ and $x^{ϕ}$ . For simulation settings, in this part of the study, we take $B = 500$ Monte Carlo replicates, generating the covariates, treatments, and outcomes for each replicate. A comprehensive explanation of the steps involved in generating covariates and functions can be found in Section 5.2 of Appendix D of the Supplemental Materials.

Table 1 presents three performance metrics for the proposed WPOM method with the weights computed based on Algorithm 2, comparing it with the interference-unaware approach and the $20$-fold cross-validated WPOM. These performance metrics are (1) household OTR, (2) individual OTR, and (3) the MRV. Compared to the interference-unaware approach, both the proposed WPOM and the 20-fold cross-validated WPOM yielded higher values for household OTR and individual OTR, as well as a smaller value for the mean regret. We can conclude that in the context of household interference, decisions may be substantially compromised if the interference is disregarded. The standard WPOM and the cross-validated WPOM perform similarly on all three measures. It is worth noting that $K$-fold cross-validation is typically employed in a prediction context, whereas our approach focuses on estimation; here, it serves as a means to assess the stability of the estimate and provides an error estimate that accounts for some of the model uncertainty.

Table 1.
Methods’ performance measure estimates and their standard errors (in parentheses) in Study 1a.

		Method
$H$	Performance	Interference-unaware	WPOM	Cross-validated WPOM
$500$	OTR-H	0.063 (0.005)	0.395 (0.018)	0.393 (0.019)
	OTR-I	0.751 (0.006)	0.829 (0.016)	0.827 (0.016)
	MRV	0.375 (0.008)	0.055 (0.018)	0.060 (0.018)
$1500$	OTR-H	0.062 (0.003)	0.530 (0.018)	0.522 (0.018)
	OTR-I	0.750 (0.005)	0.872 (0.014)	0.873 (0.014)
	MRV	0.375 (0.006)	0.032 (0.017)	0.037 (0.018)
$3000$	OTR-H	0.063 (0.003)	0.622 (0.017)	0.618 (0.018)
	OTR-I	0.750 (0.004)	0.901 (0.012)	0.900 (0.013)
	MRV	0.374 (0.005)	0.024 (0.016)	0.028 (0.016)

$H$ denotes the number of households. OTR-H: household optimal treatment rate; OTR-I: individual optimal treatment rate; MRV: mean regret value; WPOM: weighted proportional odds model.

3.1.2. Approximate double robustness of WPOM

In Study 1b, to examine the approximate double robustness of the proposed method, we examine four scenarios. Scenario 1: Neither the treatment-free model nor the treatment model is correctly specified. Scenario 2: The treatment-free model is correctly specified but the treatment model is misspecified. Scenario 3: The treatment model is correctly specified but the treatment-free model is misspecified. Scenario 4: Both treatment-free model and treatment model are correctly specified.

Scenario 1 fails to specify a correct model, so consistent estimation of the blip parameters cannot be guaranteed. However, Scenarios 2, 3, and 4 correctly specify at least one of the treatment-free and treatment models, so the estimator of blip parameters should be close to consistent. In addition, note that we have only linear terms in our POM, while the true models can contain non-linear terms. If the true models contain non-linear terms, then we have misspecified the model. Moreover, in a real application, it is typically more challenging to correctly specify the treatment-free model than the treatment model, so we particularly highlight the results of Scenario 3.

In each scenario, five different methods are investigated. Method 0 (M0) employs the proposed POM (2) without any balancing weights. Method 1 (M1) considers the same POM and uses the standard balancing weights, but assumes independence between the treatments, like the weights in Jiang et al.¹ That is, $w = |A^{s} - P(A^{s} = 1 ∣ x_{s})| \cdot |A^{r} - P(A^{r} = 1 ∣ x_{r})|$. Methods 2 and 3 (M2 and M3) consider the same POM but use the proposed interference balancing weights, which allow dependence between the treatments within the same household. In particular, M2 employs the inverse probability-based weights (4) and M3 uses the overlap-type weights (5). Furthermore, to contrast the performance of the weights in M2 and M3 with weights that include the adjustment factor (7), we also consider Method 4 (M4), which uses the same POM (2) with the adjusted overlap weights (6) computed by the proposed Algorithm 2.
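As a minimal sketch of the M1 weights defined above, the following computes $w = |A^{s} - P(A^{s} = 1 ∣ x_{s})| \cdot |A^{r} - P(A^{r} = 1 ∣ x_{r})|$ from treatment indicators and marginal propensity scores. The function name and the example propensities are illustrative assumptions, not part of the original analysis; the association-aware weights (4) to (6) are not reproduced here.

```python
import numpy as np

def no_association_weights(a_s, a_r, p_s, p_r):
    """M1-type weights: product of the two members' absolute residuals.

    a_s, a_r : binary treatment indicators for members s and r
    p_s, p_r : marginal propensities P(A = 1 | x) for s and r (assumed given)
    """
    a_s, a_r, p_s, p_r = (np.asarray(v, dtype=float) for v in (a_s, a_r, p_s, p_r))
    return np.abs(a_s - p_s) * np.abs(a_r - p_r)

# Two hypothetical households with identical propensities but different treatments.
w = no_association_weights(a_s=[1, 0], a_r=[0, 0], p_s=[0.7, 0.7], p_r=[0.4, 0.4])
print(w)
```

Because the weight is a product of marginal terms, it ignores any dependence between the two treatments, which is the feature that M2 to M4 relax.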

Note that M0 is $Q$ -learning in a single-stage decision setting, and M1, M2, M3, and M4 belong to our proposed WPOM yet with different balancing weights. M1 uses a no-treatment-association WPOM, but M2, M3, and M4 use treatment-association-aware WPOMs. Methods M2 and M3 employ inverse probability type and overlap type weights, respectively. However, M4 utilizes adjusted overlap type weights. The adjusted weights (6) satisfy the weight criterion in Theorem 1, hence M4 is expected to provide close to consistent blip parameter estimators in Scenarios 2, 3, and 4.

The treatment decision rules in Decision 1 rely on the estimates of blip parameters, that is, $\hat{ξ}$, $\hat{ψ}$, and $\hat{ϕ}$. Figure 1 presents the distribution of blip parameter estimates from Methods 0, 1, 2, 3, and 4 in Scenario 3, where the treatment model is correctly specified but the treatment-free model is misspecified. The distributions of the blip parameter estimates in Scenarios 1, 2, and 4 are presented in Supplemental Materials Appendix D. In these figures, and in Figure 1 in particular, M4 estimates the blip parameters ($ξ$, $ψ$, and $ϕ$) approximately consistently, as expected. That is, the estimates of Method 4 from Scenarios 2, 3, and 4 appear consistent, which verifies the approximate double robustness of our proposed adjusted weights (6) in the simulation setting. In Scenario 3, however, M0, M1, M2, and M3 offer biased blip parameter estimators. Although the M1, M2, and M3 estimators are biased, their bias is smaller than that of the M0 estimator, which does not employ any balancing weights. Moreover, in Scenario 3, compared with M1, where independence of the treatments is assumed, M2 and M3, which address the association between treatments, provide less biased estimators. This result confirms that if treatments in the same household are in truth correlated, failing to take that into account will lead to biased estimation. Furthermore, as Figures 4 and 5 in Appendix D indicate, in both Scenarios 2 and 4, where the treatment-free models are correctly specified, all the methods, even M1, provide unbiased estimators of the blip parameters. Thus, when the treatment-free model is correctly specified, there is little distinction among the methods.

Figure 1.

Blip function parameter estimates, $\hat{ξ}$ (top row), $\hat{ψ}$ (middle row), and $\hat{ϕ}$ (bottom row) via Method 0 (M0, $Q$ -learning), Method 1 (M1, no treatment-association weighted proportional odds model (WPOM)), Method 2 (M2, treatment-association aware WPOM with IPW-type weights), Method 3 (M3, treatment-association aware WPOM with overlap-type weights) and Method 4 (M4, treatment-association aware WPOM with adjusted overlap-type weights), when the treatment model is correctly specified but the treatment-free model is misspecified (Scenario 3).

In the second part of Study 1b, focusing on Scenario 3, the settings are the same as those introduced above, except that we vary the number of households $H$. Table 5 (subsection 5.3 of Appendix D) presents the three performance measures for all methods, i.e., household and individual OTR, and the MRV. As expected, in all cases the WPOM methods (M1–M4) provide higher household and individual OTR, and lower MRV, than $Q$-learning (M0). From Table 5, in the larger household sample cases, that is, $H \geq 3000$, M4 (WPOM with adjusted weights) provides the highest household and individual OTRs and the lowest MRV among all methods, whereas M0, which does not use any balancing weights, yields the lowest OTRs and the highest MRV. These results verify that the estimated treatment configuration from M4 is the closest to the optimal treatment configuration. Thus, in these large household sample cases, M4 performs best among all these methods, and M0 performs the worst.

Simulation Study 2, a second simulation in a two-stage decision setting that demonstrates the proposed dWPOM, is presented in Supplemental Materials Appendix E. These simulation results further demonstrate the robust estimation of the blip parameters resulting from the proposed method.

4. Population Assessment of Tobacco and Health study

4.1. Data source and definition of treatments and outcome

Investigating household ordinal outcomes, we now apply our approach, dWPOM with household interference, to longitudinal survey data in the PATH study. We aim to estimate the optimal DTR for a pair in the same household, based on a sequence of rules of e-cigarette use or non-use, for achieving the smoking cessation of the pair in the household. Building on the PATH analysis in Jiang et al.,¹ we consider the subset of participant pairs both of whom smoke at the beginning of the study. In the PATH study, data were gathered in waves, starting from 2011, with each subsequent wave beginning approximately one year after the previous one. Studying the first four waves, we formulate the PATH analysis as a three-stage decision problem by defining the $j$th stage, for $j = 1, 2, 3$, as the time from Wave $j$ up to but not including Wave $j + 1$.

In this analysis, the treatment variable is the use of e-cigarettes by cigarette smokers. Because waves were separated by approximately one year, we take e-cigarette use reported at the wave of the measured outcome as indicative of the pre-wave treatment. The e-cigarette usage variable is determined by the question ‘Do you now use e-cigarettes (a) Every day (b) Some days (c) Not at all’. Answers of either ‘Every day’ or ‘Some days’ are coded as $A = 1$, and answers of ‘Not at all’ as $A = 0$.

Further, our household ordinal utility is constructed by a combination of binary outcomes of individuals sharing a household, where the binary outcome variable is an indicator of whether participants have either given up smoking (traditional cigarettes) or have tried to quit smoking or using tobacco product(s). That is, the household utility is the sum of the final binary outcomes of a pair in the same household, which is interpreted, for a pair in a household, as (a) neither, (b) one, or (c) both of them incur a benefit such as smoking cessation. Jiang²⁰ provides a comprehensive discussion on the precise construction of binary outcomes using questionnaires.

4.2. Household covariates choice and model settings

For the household covariates in our POM at the $j$th stage, we first select the individual-level Wave $j$ variables: age (‘less than 35’ or ‘35+’), education, non-Hispanic, race and ‘plan to quit’. Then, building on these individual-level covariates, we construct household or joint covariates for our household POM. For instance, the individual-level age variable is an indicator of ‘less than 35’; for the household-level age variable, we thus have three possibilities for a pair in the same household: both, one of them, or neither of them is less than 35. Therefore, we construct a single age variable with three categories for the household model. Similarly, with three possible values for each variable, we construct the non-Hispanic, race, and ‘plan to quit’ variables at the household level.
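The construction of a three-category household covariate from two individual indicators can be sketched as follows. Coding the categories as the count 0, 1, or 2 of members with the indicator equal to 1 is an assumption made here for illustration (it matches the way the ‘plan to quit’ count is used later), and the function name and example data are hypothetical.

```python
import numpy as np

def household_level(x_s, x_r):
    """Number of members (0, 1, or 2) in each pair whose binary indicator is 1.

    x_s, x_r : individual-level 0/1 indicators for members s and r.
    The 0/1/2 coding is an illustrative assumption, not the paper's stated coding.
    """
    x_s = np.asarray(x_s)
    x_r = np.asarray(x_r)
    if not (np.isin(x_s, (0, 1)).all() and np.isin(x_r, (0, 1)).all()):
        raise ValueError("indicators must be coded 0/1")
    return x_s + x_r

# Hypothetical 'less than 35' indicators for four household pairs.
under35_s = [1, 0, 1, 0]
under35_r = [1, 0, 0, 1]
age_hh = household_level(under35_s, under35_r)  # 2 = both, 1 = one, 0 = neither
print(age_hh)
```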

We denote the household covariates age, education, non-Hispanic, race, and ‘plan to quit’ by $x_{j}^{β} = (x_{j1}, x_{j2}, x_{3}, x_{4}, x_{j5})^{⊤}$ in the Stage $j$ treatment-free model. Compared with the PATH analysis in Jiang et al.,¹ we omit the sex variable from the household covariates, because it was not significant in the household-level model. Individuals $s$ and $r$ are respectively the first and second listed members of the household pair in the data set. In addition, building on previous work studying moderators in the relationships of prior wave predictors of quitting smoking (e.g., Le Grande et al.²⁴), we select at each stage the variables age and ‘plan to quit’ as tailoring variables, that is, $x_{j}^{ξ} = (1, x_{j1}^{s})^{⊤}$, $x_{j}^{ψ} = (1, x_{j1}^{r})^{⊤}$, and $x_{j}^{ϕ} = (1, x_{j5}^{s} + x_{j5}^{r})^{⊤}$. Therefore, in estimation, the blip model is set up as $γ[(a_{j+1}^{s}, a_{j+1}^{r}), x_{j}; ξ_{j}, ψ_{j}, ϕ_{j}] = a_{j+1}^{s} ξ_{j}^{⊤} x_{j}^{ξ} + a_{j+1}^{r} ψ_{j}^{⊤} x_{j}^{ψ} + a_{j+1}^{s} a_{j+1}^{r} ϕ_{j}^{⊤} x_{j}^{ϕ}$, and the treatment-free model as $f(x_{j}^{β}; β_{j}) = β_{j}^{⊤} x_{j}^{β}$. Accordingly, solving the sequential decision problem by backward induction, for Stages $j = 3, 2, 1$ and $c = 1, 2$, we have the POM $\operatorname{logit}[P(\tilde{U_{j}^{r}} \leq c ∣ a_{j}^{s}, a_{j}^{r}, x_{j}; ξ_{j}, ψ_{j}, ϕ_{j})] = ζ_{cj} - β_{j}^{⊤} x_{j}^{β} - a_{j}^{s} ξ_{j}^{⊤} x_{j}^{ξ} - a_{j}^{r} ψ_{j}^{⊤} x_{j}^{ψ} - a_{j}^{s} a_{j}^{r} ϕ_{j}^{⊤} x_{j}^{ϕ}$.
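To illustrate how a cumulative logit model of this form maps a linear predictor to probabilities of the three ordered utility levels, the sketch below evaluates $P(U \leq c) = \operatorname{expit}(ζ_{c} - η)$ for $c = 1, 2$, where $η$ stands for the sum of the treatment-free and blip terms. The cutpoints and predictor values are made-up numbers, and the function names are hypothetical; this is not the fitted PATH model.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def pom_category_probs(zeta, eta):
    """Category probabilities under logit P(U <= c) = zeta_c - eta.

    zeta : increasing cutpoints (two cutpoints give three ordered categories)
    eta  : linear predictor, treatment-free part plus blip (scalar)
    """
    zeta = np.asarray(zeta, dtype=float)
    if np.any(np.diff(zeta) <= 0):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = np.concatenate(([0.0], expit(zeta - eta), [1.0]))
    return np.diff(cumulative)  # successive differences of the cumulative probabilities

# Illustrative cutpoints; a larger eta shifts mass toward the highest category.
probs_low = pom_category_probs([-0.5, 1.0], eta=0.0)
probs_high = pom_category_probs([-0.5, 1.0], eta=1.0)
print(probs_low, probs_high)
```

Because $η$ enters with a negative sign, a treatment configuration with a larger blip value lowers every cumulative probability and so favours the higher utility levels, which is why the decision rule maximizes the blip.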

To construct the balancing weights for the proposed POM, as introduced in Section 2.4, we estimate the marginal propensity scores, the pairwise odds ratios ($τ_{sr}$), and the joint propensity scores. We first choose covariates at the individual level for the marginal treatment propensity models, based on the previous PATH studies of Benmarhnia et al.¹² and Jiang et al.,¹ as $x_{j}^{α} = (x_{j1}, x_{j2}, x_{3}, x_{4}, x_{j5}, x_{6})^{⊤}$, namely, age, education, non-Hispanic, race, ‘plan to quit’, and sex. Then, we employ logistic regression to acquire marginal treatment propensity scores. The pairwise odds ratios are modelled through a generalized linear model with the log link and covariates $x_{sr} = [1, (x_{j5}^{s} + x_{j5}^{r})]^{⊤}$, whose second component is the number of individuals in the same household who have a plan to quit. That is, with parameter $o$, $\log τ_{sr}(o) = o^{⊤} x_{sr}$.
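One standard way to turn two marginal propensities and a pairwise odds ratio into a joint distribution for $(A^{s}, A^{r})$ is to solve the quadratic implied by $τ_{sr} = p_{11} p_{00} / (p_{10} p_{01})$. The sketch below uses that textbook identity; it is offered as an illustration under the assumption that it is compatible with the procedure of Section 2.4, which is not reproduced here. The parameter vector `o`, the marginal propensities, and the function name are hypothetical values chosen for the example.

```python
import math

def joint_propensity(p_s, p_r, tau):
    """Joint probabilities of (A^s, A^r) from marginals p_s, p_r and odds ratio tau."""
    if not (0.0 < p_s < 1.0 and 0.0 < p_r < 1.0 and tau > 0.0):
        raise ValueError("need 0 < p_s, p_r < 1 and tau > 0")
    if math.isclose(tau, 1.0):
        p11 = p_s * p_r  # independence: joint is the product of marginals
    else:
        a = 1.0 + (tau - 1.0) * (p_s + p_r)
        disc = a * a - 4.0 * tau * (tau - 1.0) * p_s * p_r
        p11 = (a - math.sqrt(disc)) / (2.0 * (tau - 1.0))
    return {
        (1, 1): p11,
        (1, 0): p_s - p11,
        (0, 1): p_r - p11,
        (0, 0): 1.0 - p_s - p_r + p11,
    }

# Hypothetical association model: log tau = o0 + o1 * (number planning to quit).
o = (0.2, 0.3)
n_plan_to_quit = 1
tau = math.exp(o[0] + o[1] * n_plan_to_quit)
joint = joint_propensity(p_s=0.3, p_r=0.4, tau=tau)
print(joint)
```

Setting `tau = 1` recovers the product of marginal propensities, which corresponds to the no-association case.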

Following the methods introduced in Section 2, we then estimate the joint propensity scores and the corresponding weights. In particular, we compare four different weights: (I) no balancing weights (M0), (II) no-association overlap weights (M1), where the joint propensity functions are equal to the product of marginal propensities, (III) association-aware overlap weights (5) (M3), and (IV) adjusted association-aware overlap weights (6) (M4). In this PATH analysis, we call them Methods I ($Q$-learning), II, III, and IV. It is important to note that Method IV employs the adjusted balancing weights and is our desired treatment-association-aware dWPOM, which in theory guarantees approximately consistent estimators of the blip parameters.

4.3. PATH analysis results

Table 2 summarizes the blip estimates and their replication standard errors (in parentheses) from Methods I, II, III, and IV in this PATH analysis. It is important to note that, for both members of a couple to either quit or attempt to quit smoking, the optimal DTRs for the household are functions of blip parameter estimates and the couple’s tailoring variables, that is, the decision rules in Decision 1. Method IV, which employs the adjusted balancing weights, is expected to provide approximately consistent estimates of these blip parameters. Thus, we particularly focus on the results from Method IV, while also reporting those from the other methods.

Table 2.
Blip estimates and their replication standard errors (in parentheses) from the analysis of PATH data.

		Methods
Wave	Est.	I ($Q$-learning)	II	III	IV
$1 \sim 2$	${\hat{ξ}}_{0}$	−0.100 (0.037)	−0.124 (0.044)	0.218 (0.072)	0.107 (0.047)
	${\hat{ξ}}_{1}$	0.180 (0.043)	0.130 (0.055)	−0.188 (0.052)	−0.150 (0.068)
	${\hat{ψ}}_{0}$	−0.167 (0.040)	−0.321 (0.049)	−0.140 (0.047)	−0.070 (0.050)
	${\hat{ψ}}_{1}$	0.251 (0.044)	0.257 (0.058)	0.017 (0.052)	−0.273 (0.059)
	${\hat{ϕ}}_{0}$	0.102 (0.079)	0.376 (0.078)	0.009 (0.081)	−0.001 (0.099)
	${\hat{ϕ}}_{1}$	0.001 (0.047)	0.106 (0.048)	0.087 (0.050)	0.045 (0.058)
$2 \sim 3$	${\hat{ξ}}_{0}$	0.341 (0.036)	0.116 (0.046)	0.081 (0.045)	−0.031 (0.045)
	${\hat{ξ}}_{1}$	−0.276 (0.043)	0.188 (0.062)	−0.044 (0.056)	0.200 (0.058)
	${\hat{ψ}}_{0}$	−0.078 (0.037)	0.052 (0.052)	0.205 (0.048)	0.004 (0.048)
	${\hat{ψ}}_{1}$	0.068 (0.045)	0.101 (0.067)	0.054 (0.058)	0.064 (0.062)
	${\hat{ϕ}}_{0}$	0.377 (0.097)	0.663 (0.110)	0.360 (0.108)	0.358 (0.126)
	${\hat{ϕ}}_{1}$	−0.317 (0.060)	−0.568 (0.065)	−0.414 (0.067)	−0.507 (0.076)
$3 \sim 4$	${\hat{ξ}}_{0}$	0.966 (0.040)	1.067 (0.041)	1.233 (0.040)	0.785 (0.047)
	${\hat{ξ}}_{1}$	−0.419 (0.055)	−0.268 (0.055)	−0.391 (0.054)	0.304 (0.060)
	${\hat{ψ}}_{0}$	0.808 (0.038)	1.527 (0.047)	1.217 (0.043)	0.690 (0.044)
	${\hat{ψ}}_{1}$	0.448 (0.044)	−0.507 (0.054)	0.138 (0.049)	0.987 (0.052)
	${\hat{ϕ}}_{0}$	−1.612 (0.137)	−0.150 (0.150)	−1.479 (0.147)	−1.151 (0.228)
	${\hat{ϕ}}_{1}$	0.331 (0.069)	0.085 (0.076)	0.004 (0.076)	−0.091 (0.115)

Optimal DTRs are functions of blip parameter estimates based on decision rules in Decision 1. Est. stands for the blip parameters’ estimates. DTR: dynamic treatment regime; PATH: Population Assessment of Tobacco and Health.

Because the household case with four treatment configurations is more complicated than the previous individual-level analysis, we give several examples, based on Rule 1, of how the results may be interpreted. For Method IV in Stage 3 (Wave $3 \sim 4$), for example, the blip estimate is $A^{s} (0.785 + 0.304 * \mathrm{age}_{s}) + A^{r} (0.690 + 0.987 * \mathrm{age}_{r}) + A^{s} * A^{r} (-1.151 - 0.091 * \mathrm{PQ})$, where $\mathrm{PQ}$ is the number of members of the couple who plan to quit, and $\mathrm{age}_{s}$ and $\mathrm{age}_{r}$ are the indicators that $s$ and $r$, respectively, are less than 35. When we plug in the four possibilities $(A^{s}, A^{r}) = (1, 1), (1, 0), (0, 1), (0, 0)$, the blip estimates are $0.785 + 0.304 * \mathrm{age}_{s} + 0.690 + 0.987 * \mathrm{age}_{r} - 1.151 - 0.091 * \mathrm{PQ}$, $0.785 + 0.304 * \mathrm{age}_{s}$, $0.690 + 0.987 * \mathrm{age}_{r}$, and $0$, respectively. Table 3 summarizes the blip estimates for different treatment configurations $(A_{h}^{s}, A_{h}^{r})$ from Method IV (Stage 3). To interpret these results, we provide the following examples.

Table 3.

Blip estimates for different treatment configurations $(A_{h}^{s}, A_{h}^{r})$ from Method IV (Stage 3).

( $A_{h}^{s}, A_{h}^{r}$ )	Blip estimate from Method IV (Stage 3)
$(1, 1)$	$0.785 + 0.304 * \mathrm{age}_{s} + 0.690 + 0.987 * \mathrm{age}_{r} - 1.151 - 0.091 * \mathrm{PQ}$
$(1, 0)$	$0.785 + 0.304 * \mathrm{age}_{s}$
$(0, 1)$	$0.690 + 0.987 * \mathrm{age}_{r}$
$(0, 0)$	0

Example 1

If we have household tailoring variables such that $\mathrm{age}_{s} = 1$, $\mathrm{age}_{r} = 0$, and $\mathrm{PQ} = 0$, the blip estimates for $(A^{s}, A^{r}) = (1, 1), (1, 0), (0, 1), (0, 0)$ are $0.785 + 0.304 + 0.690 - 1.151 = 0.628$, $0.785 + 0.304 = 1.089$, $0.690$, and $0$, respectively. The largest blip estimate is $1.089$, which corresponds to the treatment configuration $(A^{s} = 1, A^{r} = 0)$. Therefore, in Stage 3 (Wave $3 \sim 4$), if individual $s$ is less than $35$ but $r$ is not, and neither of them has a plan to quit, then the treatment recommendation for this household is $(A^{s} = 1, A^{r} = 0)$.

Example 2

If we have household tailoring variables such that $\mathrm{age}_{s} = 0$, $\mathrm{age}_{r} = 1$, and $\mathrm{PQ} = 2$ (individual $s$ is $35$ or over but $r$ is not, and both of them have plans to quit), the blip estimates are $0.785 + 0.690 + 0.987 - 1.151 - 0.091 * 2 = 1.129$, $0.785$, $0.690 + 0.987 = 1.677$, and $0$, respectively; the largest is $1.677$, so the treatment recommendation is $(A^{s} = 0, A^{r} = 1)$.
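The two examples can be checked with a short script that evaluates the Stage 3 blip for each of the four treatment configurations and returns the configuration with the largest value. The coefficients are the Method IV, Wave $3 \sim 4$ estimates from Table 2; the function and variable names are illustrative, and the sketch ignores estimation uncertainty.

```python
from itertools import product

# Method IV, Stage 3 blip estimates from Table 2: (intercept, slope).
XI = (0.785, 0.304)     # member s, slope on age_s
PSI = (0.690, 0.987)    # member r, slope on age_r
PHI = (-1.151, -0.091)  # interaction, slope on PQ

def blip(a_s, a_r, age_s, age_r, pq):
    """Estimated blip for treatment configuration (a_s, a_r)."""
    return (a_s * (XI[0] + XI[1] * age_s)
            + a_r * (PSI[0] + PSI[1] * age_r)
            + a_s * a_r * (PHI[0] + PHI[1] * pq))

def recommend(age_s, age_r, pq):
    """Return the configuration with the largest blip, plus all four values."""
    values = {cfg: blip(*cfg, age_s, age_r, pq) for cfg in product((1, 0), repeat=2)}
    best = max(values, key=values.get)
    return best, values

best1, vals1 = recommend(age_s=1, age_r=0, pq=0)  # tailoring values of Example 1
best2, vals2 = recommend(age_s=0, age_r=1, pq=2)  # tailoring values of Example 2
print(best1, best2)
```

Running it on the tailoring values of Examples 1 and 2 yields the recommendations $(A^{s} = 1, A^{r} = 0)$ and $(A^{s} = 0, A^{r} = 1)$, respectively.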

Finally, we note that an important aspect of rigorous real data analysis is implementing cross-validation to evaluate proposed methods properly. While the complexity of the PATH design may pose challenges, McConville,²⁵ Opsomer and Miller²⁶ and You²⁷ suggest that there are alternative cross-validation strategies and literature available to address these challenges, specifically in the context of complex survey data. Users should carefully choose and adapt cross-validation methods to suit their specific data and research needs, ensuring the reliability and validity of their results. Consequently, a further research direction for our methods entails an examination of suitable cross-validation strategies for analyzing data from complex longitudinal surveys.

5. Conclusion and discussion

In this paper, considering household interference and household utility, we proposed a robust DTR estimation method for ordinal outcomes to consistently estimate optimal DTRs. This method, namely dWPOM, uses sequential WPOM with adjusted balancing weights. We theoretically and empirically demonstrated the approximate double robustness property of our WPOM approach, which utilizes the proposed adjusted balancing weights. In the presence of household interference, our WPOM addresses household ordinal utility problems and provides optimal treatment recommendations for both individuals in the household. To address the ordinal outcomes challenge, we consider a POM because of its easy estimation and interpretation and note that any POM-related tools or techniques, such as those for variable selection or model diagnosis of POMs, can be employed in our method. Regarding inference, a single-stage decision can use standard errors from WPOM to create confidence intervals for blip parameters directly. However, for multi-stage decisions, non-regularity issues arise.²⁸ Hence, future research is needed to develop methods like adaptive bootstrap and m-out-of-n bootstrap²⁸ for constructing multi-stage decision confidence intervals.

We have also made a methodological contribution to the study of interference. In addition to considering the effects of neighbours’ treatments on an individual’s outcome, we considered a possible association between their treatments. Building on this, we presented the estimation process for joint propensity scores in the case where there exists an association between treatments of individuals in the same household, then estimated the corresponding balancing weights that satisfy the balancing criterion. Our simulation studies have revealed that if there exists an association between treatments but we fail to consider it, then the DTR estimation will be biased. It would be straightforward to extend our household interference case to cases of partial interference, where treatments of individuals blocked by clusters can affect outcomes of the individuals in the same cluster, while also accounting for the association between these treatments of individuals in the same cluster. However, the association-aware estimation has an extra cost: modelling the association between pairs of binary treatments, such as through a pairwise odds ratio model in our household case. For the cluster partial-interference case, we suggest considering the log-linear model to extend our work to estimate the ‘higher-order’ odds ratio association.²⁹ Note that in cases of association, the final goal is to estimate the joint propensity scores; therefore, we recommend employing machine learning methods, such as random forests or deep neural networks, to directly train the model for the joint propensity scores.

We acknowledge that any misspecification in the association model (Step 2 of the Algorithm 2) could impact the accuracy of our joint propensity score estimation, and thus affect the approximate double robustness of the proposed method. This step could be refined by using flexible data-adaptive approaches that accommodate correlated data. Examples of such approaches include mixed-effect machine learning³⁰ and smoothed kernel regression designed for dependent data.³¹ Further investigation is needed to assess the robustness of the proposed methods in terms of association models and employing these data-driven approaches. Our formulation through the household utility function also allows for some association among the responses of the household members, conditional on their treatments. In particular, when we define household utility as a function of the sum of individual utilities, which happens to be the sum of their response indicators, the utility distribution will imply an association between the responses of paired members.

We also note that individuals within the same household may experience varying magnitudes of interference effects. When the objective is to optimize the household’s utility function, the fourth term in our model (Equation 1) captures these interference effects, and distinguishing between the effects on individuals may not be necessary. However, if the goal is to optimize individual outcomes or understand the role of interference effects for each person (e.g., husband or wife), distinguishing between these interference effects becomes essential. To achieve this, we can examine individual outcome models, which involve modelling the outcomes of the husband and wife separately, such as using distinct logistic regression models. For a consistent estimation approach in logistic regression, we refer to Appendix H.1.2 in Jiang,²⁰ where the balancing property with household interference for binary outcomes was proposed for generalized linear models that consider interference. It would be of interest to explore further the estimation and development of optimal decision rules for this approach, in order to compare it with the approach based on household outcomes.

Extending the proposed method to households with varying numbers of individuals, including those with more than two individuals, presents a challenging yet important endeavour. This extension can be likened to addressing a partial interference³²,⁴ problem, where treatments of individuals blocked by clusters can affect outcomes of the individuals in the same cluster, and households can be viewed as distinct clusters. To address the partial interference problem, two modelling tasks are necessary: (1) Modelling the outcomes, which entails developing regression models for the outcomes. Dealing with high dimensionality (e.g., the treatment indicators of all study units in the cluster and cluster-level pre-treatment covariates) may necessitate additional assumptions, such as conditional stratified interference, to mitigate the curse of dimensionality; see Section 2.2 of Park et al.⁴ for additional discussion of these assumptions. (2) Modelling the joint propensity scores. Here it is important to consider removing the treatment-independence assumption, as we investigated in Section 2.4. This can be achieved by accounting for the associations between treatments of individuals within the same cluster.

Through our analysis of the PATH study, we have demonstrated the practical applicability of our proposed methods. We estimated a treatment decision function for household pairs to maximize the probability of achieving smoking cessation under the assumptions of a model for their treatment and success. In particular, we modelled the potential association of e-cigarette usage between members of a household pair and estimated the joint propensity scores that play a crucial role in approximately doubly robust estimation with interference. We acknowledge some limitations of our analysis: the PATH data points are a year apart, which is not an ideal spacing for treatment decisions, and the size and period of the PATH subsample provide insufficient data on some of the possible treatment sequences to make the findings easily interpretable or conclusive. In addition, we have implicitly assumed that there is meaning in having been the first of the two household members to be interviewed. While this could well be the case in practice (e.g., the household head is interviewed first), it is important to recognize the semi-arbitrary nature of this labelling, and to assess its impact in future research. Due to these limitations, it is essential to note that the results of our PATH analysis are not intended as authentic treatment recommendations for smoking cessation. They nonetheless serve to demonstrate the underlying principles of household interference in such a context and the methodology we propose in this analysis.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802241242313 - Supplemental material for Estimating dynamic treatment regimes for ordinal outcomes with household interference: Application in household smoking cessation

Supplemental material, sj-pdf-1-smm-10.1177_09622802241242313 for Estimating dynamic treatment regimes for ordinal outcomes with household interference: Application in household smoking cessation by Cong Jiang, Mary Thompson and Michael Wallace in Statistical Methods in Medical Research

Supplemental Material

sj-zip-2-smm-10.1177_09622802241242313 - Supplemental material for Estimating dynamic treatment regimes for ordinal outcomes with household interference: Application in household smoking cessation

Supplemental material, sj-zip-2-smm-10.1177_09622802241242313 for Estimating dynamic treatment regimes for ordinal outcomes with household interference: Application in household smoking cessation by Cong Jiang, Mary Thompson and Michael Wallace in Statistical Methods in Medical Research

Footnotes

Data availability

The data that support the findings of this study come from the PATH Study. Restrictions apply to the availability of these data, which were used under license for this study. Data collected in the PATH Study are available with the permission of the Population Assessment of Tobacco and Health Study Restricted-Use Files (ICPSR 36231).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been supported by the Ontario Institute for Cancer Research (OICR) Biostatistics Training Initiative (BTI) Studentship Award through funding provided by the Government of Ontario, by funds from a CIHR Project Grant to M. P. Wallace, and by a Discovery Grant to M. E. Thompson (RGPIN-2016-03688) from NSERC.

References

1. Jiang, Wallace, Thompson. Dynamic treatment regimes with interference. Can J Stat 2023; 51: 469–502.

2. Song. Modelling and estimation for optimal treatment decision with interference. Stat 2019; 8: e219.

3. Sherman, Arbour, Shpitser. General identification of dynamic treatment regimes under interference. Proc Mach Learn Res 2020; 108: 3917–3927.

4. Park, Chen, et al. Optimal allocation of water and sanitation facilities to prevent communicable diarrheal diseases in Senegal under partial interference. arXiv preprint arXiv:2111.09932, 2021.

5. Wallace, Moodie EEM. Doubly-robust dynamic treatment regimen estimation via weighted least squares. Biometrics 2015; 71: 636–644.

6. Moodie, Dean, Sun. Q-learning: Flexible learning about useful utilities. Stat Biosci 2014; 6: 223–243.

7. Wallace, Moodie, Stephens. Model selection for g-estimation of dynamic treatment regimes. Biometrics 2019; 75: 1205–1215.

8. Simoneau, Moodie, Nijjar, et al. Estimating optimal dynamic treatment regimes with survival outcomes. J Am Stat Assoc 2020; 115: 1531–1539.

9. Jiang, Wallace, Thompson. Doubly-robust dynamic treatment regimen estimation for binary outcomes. arXiv preprint arXiv:2203.08269, 2022.

10. Hubbard, Gorely, Ozakinci, et al. A systematic review and narrative summary of family-based smoking cessation interventions to help adults quit smoking. BMC Fam Pract 2016; 17: 73.

11. Foulstone, Kelly, Kifle. Partner influences on smoking cessation: A longitudinal study of couple relationships. J Subst Use 2017; 22: 501–506.

12. Benmarhnia, Pierce, Leas, et al. Can e-cigarettes and pharmaceutical aids increase smoking cessation and reduce cigarette consumption? Findings from a nationally representative cohort of American smokers. Am J Epidemiol 2018; 187: 2397–2404.

13. Hajek, Phillips-Waller, Przulj, et al. A randomized trial of e-cigarettes versus nicotine-replacement therapy. N Engl J Med 2019; 380: 629–637.

14. Lewis, McBride, Pollak, et al. Understanding health behavior change among couples: An interdependence and communal coping approach. Soc Sci Med 2006; 62: 1369–1380.

15. McCullagh. Regression models for ordinal data. J R Stat Soc: Ser B (Methodological) 1980; 42: 109–127.

16. Morgan, Zaslavsky. Balancing covariates via propensity score weighting. J Am Stat Assoc 2018; 113: 390–400.

17. Thomas. Addressing extreme propensity scores via the overlap weights. Am J Epidemiol 2019; 188: 250–257.

18. Rubin. Comment: Randomization analysis of experimental data: The Fisher randomization test. J Am Stat Assoc 1980; 75: 591–593.

19. Robins. Optimal structural nested models for optimal sequential decisions. In Proceedings of the Second Seattle Symposium in Biostatistics. Springer, 189–326.

20. Jiang. Dynamic Treatment Regimes with Interference. PhD dissertation, University of Waterloo, https://uwspace.uwaterloo.ca/handle/10012/18565, 2022.

21. Lipsitz, Laird, Harrington. Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 1991; 78: 153–160.

22. Liang, Zeger. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13–22.

23. Brant. Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics 1990; 46: 1171–1178.

24. Le Grande, Borland, Yong, et al. Predictive power of dependence measures for quitting smoking. Findings from the 2016 to 2018 ITC four country smoking and vaping surveys. Nicotine Tob Res 2021; 23: 276–285.

25. McConville. Improved estimation for complex surveys using modern regression techniques. PhD Thesis, Colorado State University, 2011.

26. Opsomer, Miller. Selecting the amount of smoothing in nonparametric regression estimation for complex surveys. Nonparametr Stat 2005; 17: 593–611.

27. You. Cross-validation in model-assisted estimation. Ames, Iowa: Iowa State University, Digital Repository, 2009.

28. Chakraborty, Murphy, Strecher. Inference for non-regular parameters in optimal dynamic treatment regimes. Stat Methods Med Res 2010; 19: 317–343.

29. Thompson. Marginal and association regression models for longitudinal binary data with drop-outs: A likelihood-based approach. Can J Stat 2005; 33: 3–20.

30. Ngufor, Van Houten, Caffo, et al. Mixed effect machine learning: A framework for predicting longitudinal change in hemoglobin A1c. J Biomed Inform 2019; 89: 56–67.

31. Park, Kang. Efficient semiparametric estimation of network treatment effects under partial interference. Biometrika 2022; 109: 1015–1031.

32. Sobel. What do randomized studies of housing mobility demonstrate? Causal inference in the face of interference. J Am Stat Assoc 2006; 101: 1398–1407.

| $H$ | Performance | Method: Interference-unaware | Method: WPOM | Method: Cross-validated WPOM |
|---|---|---|---|---|
| 500 | OTR-H | 0.063 (0.005) | 0.395 (0.018) | 0.393 (0.019) |
| | OTR-I | 0.751 (0.006) | 0.829 (0.016) | 0.827 (0.016) |
| | MRV | 0.375 (0.008) | 0.055 (0.018) | 0.060 (0.018) |
| 1500 | OTR-H | 0.062 (0.003) | 0.530 (0.018) | 0.522 (0.018) |
| | OTR-I | 0.750 (0.005) | 0.872 (0.014) | 0.873 (0.014) |
| | MRV | 0.375 (0.006) | 0.032 (0.017) | 0.037 (0.018) |
| 3000 | OTR-H | 0.063 (0.003) | 0.622 (0.017) | 0.618 (0.018) |
| | OTR-I | 0.750 (0.004) | 0.901 (0.012) | 0.900 (0.013) |
| | MRV | 0.374 (0.005) | 0.024 (0.016) | 0.028 (0.016) |

| Wave | Estimate | Method I ($Q$-learning) | Method II | Method III | Method IV |
|---|---|---|---|---|---|
| $1 \sim 2$ | $\hat{\xi}_{0}$ | −0.100 (0.037) | −0.124 (0.044) | 0.218 (0.072) | 0.107 (0.047) |
| | $\hat{\xi}_{1}$ | 0.180 (0.043) | 0.130 (0.055) | −0.188 (0.052) | −0.150 (0.068) |
| | $\hat{\psi}_{0}$ | −0.167 (0.040) | −0.321 (0.049) | −0.140 (0.047) | −0.070 (0.050) |
| | $\hat{\psi}_{1}$ | 0.251 (0.044) | 0.257 (0.058) | 0.017 (0.052) | −0.273 (0.059) |
| | $\hat{\phi}_{0}$ | 0.102 (0.079) | 0.376 (0.078) | 0.009 (0.081) | −0.001 (0.099) |
| | $\hat{\phi}_{1}$ | 0.001 (0.047) | 0.106 (0.048) | 0.087 (0.050) | 0.045 (0.058) |
| $2 \sim 3$ | $\hat{\xi}_{0}$ | 0.341 (0.036) | 0.116 (0.046) | 0.081 (0.045) | −0.031 (0.045) |
| | $\hat{\xi}_{1}$ | −0.276 (0.043) | 0.188 (0.062) | −0.044 (0.056) | 0.200 (0.058) |
| | $\hat{\psi}_{0}$ | −0.078 (0.037) | 0.052 (0.052) | 0.205 (0.048) | 0.004 (0.048) |
| | $\hat{\psi}_{1}$ | 0.068 (0.045) | 0.101 (0.067) | 0.054 (0.058) | 0.064 (0.062) |
| | $\hat{\phi}_{0}$ | 0.377 (0.097) | 0.663 (0.110) | 0.360 (0.108) | 0.358 (0.126) |
| | $\hat{\phi}_{1}$ | −0.317 (0.060) | −0.568 (0.065) | −0.414 (0.067) | −0.507 (0.076) |
| $3 \sim 4$ | $\hat{\xi}_{0}$ | 0.966 (0.040) | 1.067 (0.041) | 1.233 (0.040) | 0.785 (0.047) |
| | $\hat{\xi}_{1}$ | −0.419 (0.055) | −0.268 (0.055) | −0.391 (0.054) | 0.304 (0.060) |
| | $\hat{\psi}_{0}$ | 0.808 (0.038) | 1.527 (0.047) | 1.217 (0.043) | 0.690 (0.044) |
| | $\hat{\psi}_{1}$ | 0.448 (0.044) | −0.507 (0.054) | 0.138 (0.049) | 0.987 (0.052) |
| | $\hat{\phi}_{0}$ | −1.612 (0.137) | −0.150 (0.150) | −1.479 (0.147) | −1.151 (0.228) |
| | $\hat{\phi}_{1}$ | 0.331 (0.069) | 0.085 (0.076) | 0.004 (0.076) | −0.091 (0.115) |