
Abstract

When using the propensity score method to estimate treatment effects, it is important to select the covariates to be included in the propensity score model. Including covariates unrelated to the outcome in the propensity score model can lead to bias and large variance in the estimator of treatment effects. Many data-driven covariate selection methods have been proposed for selecting covariates related to the outcome. However, most of them assume average treatment effect estimation and may not be suited to estimating quantile treatment effects (QTEs), which are the effects of treatment on the quantiles of the outcome distribution. In QTE estimation, a covariate can be related to the outcome in two ways: through its expected value or through its quantiles. We therefore propose a data-driven covariate selection method for propensity score models that allows for the selection of covariates related to the expected value and quantile of the outcome for QTE estimation. Assuming a quantile regression model as the outcome regression model, covariate selection is performed using a regularization method with the partial regression coefficients of the quantile regression model as weights. The proposed method was applied to artificial data and a dataset of mothers and children born in King County, Washington, to compare its performance with that of existing methods and QTE estimators. The results show that the proposed method performs well in the presence of covariates related to both the expected value and quantile of the outcome.

Keywords

Causal inference, inverse probability weighting (IPW) estimator, observational study, propensity score, quantile regression

1. Introduction

The investigation of causal effects is a topic of interest in many research fields. To investigate causal effects, the effect of a treatment on individual outcomes is evaluated. One may be interested not only in the average treatment effect across targets but also in the effect of treatment on the upper or lower side of the outcome distribution, which can be evaluated using quantile treatment effects (QTEs). Studies using QTEs include an evaluation of the effect of maternal smoking during pregnancy on birth weight¹ and the effect of job training on annual income.²

When estimating QTEs from observed data, the distribution of covariates affecting the outcome may differ between the group receiving the treatment (treatment group) and the group not receiving the treatment (control group). It is difficult to distinguish the effect of treatment from the effect of confounding variables when the distribution of confounding variables differs between the two groups. In such cases, an adjustment (covariate adjustment) is necessary to ensure that the distribution of covariates affecting the outcome is equal between the treatment groups. Several methods have been proposed to estimate QTEs from observed data with covariate adjustment. Well-known examples are an estimator weighted by the propensity score^3,4 and a method for estimating QTEs on panel data.⁵ In this study, we consider estimating QTEs by performing covariate adjustment using weighting by propensity scores, as this method is easy to apply to real data.

Selecting the covariates to be included in the propensity score model is an important issue when estimating treatment effects using propensity scores. The choice of covariates to include in the propensity score model has been widely discussed.^6–8 In fact, it has been reported that including covariates unrelated to both the outcome and treatment in the propensity score model can bias the estimator of the treatment effect and that including covariates related to the treatment reduces the efficiency of the estimator. Therefore, the inclusion of covariates in the propensity score model should be carefully considered. In many cases, it is difficult to select the covariates to be included in a propensity score model a priori. To address this issue, several data-driven covariate selection methods have been proposed in the context of average treatment effect estimation using propensity scores.^9–11

Although covariate selection methods for estimating average treatment effects have been well discussed, covariate selection methods for QTE estimation have not been sufficiently discussed. In QTE estimation, a covariate can be related to the outcome in two ways: through the expected value or through the quantiles. In a linear model, it is easy to imagine that a covariate related to the expected value is also related to the quantiles. However, some covariates are related only to the quantiles. For example, under a normal distribution, a covariate associated with the variance is associated with the conditional quantiles. Thus, existing covariate selection methods for average treatment effects are unsuitable for selecting covariates related to the outcome quantile.
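As a worked illustration of the normal-distribution example (the location-scale form below is an assumption made for this illustration, not a model taken from the later sections), suppose $Y ∣ X = x \sim N (μ, σ (x)^{2})$ with $σ (x) > 0$, so that $x$ affects only the variance. The conditional mean is $E [Y ∣ X = x] = μ$, which does not depend on $x$, whereas the conditional $100 τ \%$ quantile is

Q_{τ} (Y ∣ X = x) = μ + σ (x) Φ^{- 1} (τ)

where $Φ^{- 1}$ is the standard normal quantile function. Because $Φ^{- 1} (τ) \neq 0$ for $τ \neq 0.5$, the covariate shifts every quantile except the median even though it leaves the mean unchanged.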

We propose a covariate selection method called quantile outcome adaptive lasso (QOAL) for QTE estimation using the propensity score method. Specifically, we modified the weights of regularization terms in the outcome adaptive lasso (OAL)¹¹ covariate selection method, which assumes an average treatment effect. In OAL, the weights of regularization terms are partial regression coefficients of the multiple regression model, with the outcome as the objective variable. In QOAL, the weights of the regularization terms are partial regression coefficients of the quantile regression model,¹² with the outcome as the objective variable. Using the partial regression coefficients of quantile regression as weighting terms, the proposed method is expected to select covariates associated with outcome quantiles.

This study proposes a covariate selection method for QTE estimation. Section 2 describes the details of QOAL, a covariate selection method for QTE estimation; Section 3 reports the results of numerical experiments to evaluate the performance of QOAL; Section 4 reports the results of applying the proposed method to real data; and Section 5 concludes the paper.

2. Quantile outcome adaptive lasso (QOAL)

2.1. Estimand

To define the QTE, let $Y (j) (j = 0, 1)$ be the potential outcome¹³ and assume that the relationship between the observed and potential outcomes is $Y = Z Y (1) + (1 - Z) Y (0)$ using the binary treatment variable $Z \in \{0, 1\}$ .

We first define a quantile. For $j = 0, 1$ , let $F_{Y (j)} (y)$ be the distribution function of the potential outcome $Y (j)$ . The $100 τ \%$ quantile $q_{j, τ}$ of $Y (j)$ is defined as³

\begin{aligned} q_{j, τ} & \equiv \inf \{q : F_{Y (j)} (q) \geq τ\} \\ & = \inf \{q : \Pr [Y (j) \leq q] \geq τ\}, \quad j = 0, 1 \end{aligned}

(1)

For $τ \in (0, 1)$ , the QTE is the difference between the $100 τ \%$ quantiles of the distributions of the potential outcomes with and without treatment. Using equation (1), the QTE is defined as follows¹⁴

QTE = q_{1, τ} - q_{0, τ}
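The definition can be checked numerically in a simulation setting where both potential outcomes are available. The following is a minimal sketch under that assumption; the function names and the toy numbers are hypothetical and are not part of the original study.

```python
import math


def empirical_quantile(values, tau):
    """Smallest sample value q whose empirical CDF satisfies F(q) >= tau (equation (1))."""
    ordered = sorted(values)
    # The k-th order statistic has empirical CDF k / n, so we need k >= tau * n.
    # The small tolerance guards against floating-point error in tau * n.
    k = max(1, math.ceil(tau * len(ordered) - 1e-12))
    return ordered[k - 1]


def qte_from_potential_outcomes(y1, y0, tau):
    """QTE = q_{1,tau} - q_{0,tau}, computed from samples of Y(1) and Y(0)."""
    return empirical_quantile(y1, tau) - empirical_quantile(y0, tau)


# Toy potential outcomes (hypothetical numbers): treatment shifts every unit by 2.
y0 = [1.0, 2.0, 3.0, 4.0]
y1 = [v + 2.0 for v in y0]
median_effect = qte_from_potential_outcomes(y1, y0, tau=0.5)
```

In observed data only one of the two potential outcomes is available for each target, which is why the adjustment described in the next section is needed.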

2.2. Propensity score

Let $n$ be the sample size, $Y$ be a random variable representing the observed outcome, and $y_{i}$ be the realization of the outcome for target $i$ . Let $Z$ be a binary (0: no treatment, 1: treatment) random variable (treatment variable) representing the treatment and $z_{i}$ be the realization of the treatment variable for target $i$ . Let $X = (X_{1}, X_{2}, \dots, X_{p})^{'} \in \mathbb{R}^{p}$ be the covariate vector, where $X^{'}$ denotes the transposition of $X$ . Let $x_{i} = (x_{i 1}, x_{i 2}, \dots, x_{i p})^{'}$ be the realization of the covariate vector of target $i$ . The tuples of observed random variables ( $Y_{i}, Z_{i}, X_{i}^{'}$ ) are assumed to be independent and identically distributed. To estimate QTEs, we make two assumptions: strongly ignorable treatment assignment and uniqueness of the quantiles.

Assumption 2.1 (Strong ignorability¹³)

The following conditions hold for all $x \in \mathcal{X}$ , where $\mathcal{X}$ denotes the support of $X$ .

(i)
$(Y (1), Y (0))$ and the treatment $Z$ are conditionally independent under the condition of the covariate $X$ .
$(Y (1), Y (0)) \perp\!\!\!\perp Z ∣ X$

(ii)
Conditional on the covariate $X = x$ , the probability of assignment to the treatment is greater than 0 and less than 1.
$0 < \Pr (Z = 1 ∣ X = x) < 1$

Consider the two quantiles $q_{1, τ}$ and $q_{0, τ}$ defined by equation (1). We assume that these two quantiles are unique for any $τ \in (0, 1)$ .
Assumption 2.2 (Uniqueness of quantiles³)

For $j = 0, 1$ , $Y (j)$ is a continuous-valued random variable with $\mathbb{R}$ as its support and the following holds:

There exist non-empty sets $\mathcal{Y}_{1}$ and $\mathcal{Y}_{0}$ such that $\mathcal{Y}_{j} = \{τ \in (0, 1) : \Pr (Y (j) \leq q_{j, τ} - c) < \Pr (Y (j) \leq q_{j, τ} + c), \forall c \in \mathbb{R}, c > 0\}$ .

Next, we define the propensity score. The probability that a subject is assigned to the treatment conditional on the covariate $X = x$ is called the propensity score and is defined by the following equation:

π (x) = \Pr (Z = 1 ∣ X = x)

The propensity score is generally unknown. We therefore model it with a logistic regression of the treatment variable on the covariates (propensity score model).

\begin{aligned} logit \{π (x)\} & = logit \{\Pr (Z = 1 ∣ X = x)\} \\ & = x^{'} α \end{aligned}

(2)

where $logit (p) = \log (p / (1 - p))$ , $α = (α_{1}, α_{2}, \dots, α_{p})^{'}$ is the partial regression coefficient vector, and $α_{p + 1}$ is the intercept. Under Assumptions 2.1 and 2.2, several methods have been proposed to estimate the QTE.^3–5 We estimate the QTE using the method proposed by Firpo,³ because it estimates QTEs by weighted quantile regression, which can be easily applied to real data. Firpo shows that under Assumptions 2.1 and 2.2, the quantile of the distribution of a potential outcome can be expressed as a function of the observable variables using propensity scores. See Firpo’s study for further details.³
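The weighting idea can be sketched as follows, assuming the propensity scores are already available. Each arm's quantile is computed as a weighted quantile with weights $z_{i} / π (x_{i})$ for the treated and $(1 - z_{i}) / (1 - π (x_{i}))$ for the controls; a weighted quantile minimizes the correspondingly weighted check loss. The function names are hypothetical, and the weights are normalized within each arm, which is a choice made for this sketch.

```python
def weighted_quantile(y, w, tau):
    """Smallest y-value whose normalized cumulative weight reaches tau."""
    pairs = sorted((yi, wi) for yi, wi in zip(y, w) if wi > 0)
    total = sum(wi for _, wi in pairs)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        # The tolerance guards against floating-point error in the running sum.
        if cum >= tau * total - 1e-12:
            return yi
    return pairs[-1][0]


def ipw_qte(y, z, ps, tau):
    """Difference of inverse-probability-weighted quantiles of the two arms."""
    w1 = [zi / pi for zi, pi in zip(z, ps)]              # weights for treated units
    w0 = [(1 - zi) / (1 - pi) for zi, pi in zip(z, ps)]  # weights for control units
    return weighted_quantile(y, w1, tau) - weighted_quantile(y, w0, tau)


# Toy data (hypothetical): constant propensity score, so the weights are equal.
y = [1, 2, 3, 4, 5, 6, 7, 8]
z = [0, 0, 0, 0, 1, 1, 1, 1]
ps = [0.5] * 8
effect = ipw_qte(y, z, ps, tau=0.5)
```

With a constant propensity score the estimate reduces to the difference of the unweighted sample quantiles of the two groups.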

This study proposes a covariate selection method for the propensity score model in equation (2) assuming QTE estimation. Let $C$ be an index set of confounding variables, $P_{E}$ be an index set of covariates related to the expected value of the outcome, $P_{Q}$ be an index set of covariates related to the quantile of the outcome, $F$ be an index set of covariates related to the treatment, and $I$ be an index set of covariates that are neither treatment nor outcome-related. We assume that the number of covariates is $p = | C | + | P_{E} | + | P_{Q} | + | F | + | I |$ , where $| \cdot |$ denotes the cardinality of a set. In the estimation of the average treatment effect, the bias is reduced when the covariates indexed by $C$ are included in the propensity score model, and the statistical efficiency is improved when the covariates indexed by $P_{E}$ are included in the propensity score model.

The proposed covariate selection method estimates the following propensity score model:

logit \{\tilde{π} (x_{i})\} = \sum_{j \in C} {\tilde{α}}_{j} x_{i j} + \sum_{j \in P_{E} \cup P_{Q}} {\tilde{α}}_{j} x_{i j}

where $\tilde{π} (x_{i})$ denotes the propensity score model. Since $\tilde{π} (x_{i})$ differs from $π (x_{i})$ , the bias in the estimator of the propensity score based on $\tilde{π} (x_{i})$ tends to be large when $| F |$ is large or the distribution of covariates is skewed. This is because the proposed model sets the regression coefficients that are nonzero in the true propensity score model to ${\tilde{α}}_{j} = 0 (j \in F)$ .

2.3. Model and parameter estimation of QOAL

In this study, we assume that the $100 τ %$ quantile of the outcome at the quantile level $τ (0 < τ < 1)$ can be expressed using covariates and treatment variables (outcome regression model) as follows:

Q_{τ} (Y ∣ X = x, Z = z) = x^{'} β + z β_{p + 1} + β_{p + 2}

(3)

where $Q_{τ} (Y ∣ X = x, Z = z)$ is the conditional $100 τ \%$ quantile of $Y$ given $X = x$ and $Z = z$ , $β = (β_{1}, β_{2}, \dots, β_{p})^{'}$ is the partial regression coefficient vector corresponding to the covariates, $β_{p + 1}$ is the partial regression coefficient corresponding to the treatment variable, and $β_{p + 2}$ is the intercept term. The partial regression coefficients are estimated by minimizing the sum over observations of the check function applied to the residuals, where the check function is defined by

ρ_{τ} (u) = u (τ - 1 (u \leq 0))
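To make the role of the check function concrete, the sketch below evaluates it and confirms, for an intercept-only model, that the value minimizing the total check loss is a sample quantile. This is an illustration only; the helper names are hypothetical, and a real quantile regression would minimize the same loss over all coefficients of equation (3).

```python
def check_loss(u, tau):
    """rho_tau(u) = u * (tau - 1(u <= 0))."""
    return u * (tau - (1.0 if u <= 0 else 0.0))


def total_check_loss(y, q, tau):
    """Sum of the check function over the residuals y_i - q."""
    return sum(check_loss(yi - q, tau) for yi in y)


def argmin_over_sample(y, tau):
    """Search the sample points, which always contain a minimizer of the total loss."""
    return min(sorted(y), key=lambda q: total_check_loss(y, q, tau))


# Toy outcomes (hypothetical): the outlier 10 does not move the median.
y = [1, 2, 3, 4, 10]
median_fit = argmin_over_sample(y, tau=0.5)
lower_fit = argmin_over_sample(y, tau=0.25)
```

Positive residuals are penalized with slope $τ$ and negative residuals with slope $1 - τ$, which is what tilts the minimizer toward the requested quantile.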

QOAL uses the estimated regression coefficients of the outcome regression model in equation (3) to estimate the partial regression coefficient vector of the propensity score model.

\hat{α} = \underset{α}{argmin} [\sum_{i = 1}^{n} \{- z_{i} (x_{i}^{'} α) + \log (1 + e^{x_{i}^{'} α})\} + λ \sum_{j = 1}^{p} {\hat{ω}}_{j} | α_{j} |]

(4)

where $λ \geq 0$ is the regularization parameter. The weight ${\hat{ω}}_{j}$ is defined as ${\hat{ω}}_{j} = | {\hat{β}}_{j} |^{- η}$ for any $η > 1$ , where ${\hat{β}}_{j} (j = 1, 2, \dots, p)$ is the estimated partial regression coefficient of the $j$th covariate in equation (3). For each candidate regularization parameter $λ_{r}$ (Section 2.4), we define $η_{r}$ to satisfy

λ_{r} n^{η_{r} / 2 - 1} = n^{2}
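The minimization in equation (4) can be sketched with proximal gradient descent (soft-thresholding after each gradient step). The solver choice is an assumption made for this illustration, since the text does not state which optimizer was used, and all function names are hypothetical.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def qoal_weights(beta_hat, eta):
    """omega_j = |beta_j|^(-eta); a zero coefficient gets an infinite weight."""
    return [abs(b) ** (-eta) if b != 0 else math.inf for b in beta_hat]


def fit_weighted_lasso_logit(x, z, omega, lam, n_iter=2000):
    """Minimize sum_i {-z_i x_i'a + log(1 + exp(x_i'a))} + lam * sum_j omega_j |a_j|."""
    p = len(x[0])
    # Step size 1/L, where L = 0.25 * sum_i ||x_i||^2 bounds the curvature of the loss.
    step = 4.0 / sum(v * v for row in x for v in row)
    alpha = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for row, zi in zip(x, z):
            resid = sigmoid(sum(a * v for a, v in zip(alpha, row))) - zi
            for j in range(p):
                grad[j] += resid * row[j]
        alpha = [soft_threshold(a - step * g, step * lam * w)
                 for a, g, w in zip(alpha, grad, omega)]
    return alpha


# Toy data (hypothetical): the second covariate has a zero outcome coefficient.
x = [[1, 1], [1, -1], [-1, 1], [-1, -1], [2, 0.5], [-2, -0.5]]
z = [1, 1, 0, 0, 1, 0]
omega = qoal_weights([1.0, 0.0], eta=2)
alpha_hat = fit_weighted_lasso_logit(x, z, omega, lam=0.5)
```

A covariate whose estimated outcome coefficient is close to zero receives a very large weight, so its propensity score coefficient is shrunk to exactly zero, which is the mechanism by which the outcome model drives the selection.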

The QOAL estimate ${\hat{α}}_{j} (QOAL)$ satisfies the following properties:

Theorem 2.1 (Consistency)

Suppose that $η > 1$ , $λ / \sqrt{n} \to 0$ , and $λ n^{η / 2 - 1} \to \infty$ . Under the regularity conditions, the QOAL estimate $\hat{α} (QOAL)$ satisfies

\lim_{n \to \infty} \Pr \{{\hat{α}}_{j} (QOAL) = 0 ∣ j \in A^{c}\} = 1

where $A^{c} = F \cup I$ is the set of the indices of the partial regression coefficients corresponding to the covariates that are unrelated to the outcome. Additionally, $\hat{β}$ is assumed to be a root-$n$-consistent estimator. Refer to the Appendix for the regularity conditions and proof.

The above theorem guarantees that, as the sample size increases, QOAL estimates the partial regression coefficients of covariates that are not related to the quantile of the outcome to be zero. However, it does not guarantee that the partial regression coefficients of covariates associated with the quantiles of the outcome are estimated as nonzero.

2.4. Selection criteria for regularization parameters

Let $Λ = \{λ_{r} ∣ r = 1, 2, \dots, R\}$ be the set of candidate regularization parameters predefined by the analyst, where $R$ is the number of candidates. The regularization parameter $λ_{r}$ that minimizes the weighted absolute mean difference (wAMD) among the candidates is used in the analysis. The $λ$ selected using wAMD minimizes the absolute difference between the two groups in the weighted covariate means, where wAMD is defined as

wAMD (λ_{r}) = \sum_{j = 1}^{p} | {\hat{β}}_{j} | | \frac{\sum_{i = 1}^{n} {\hat{ϕ}}_{i}^{(λ_{r})} x_{i j} z_{i}}{\sum_{i = 1}^{n} {\hat{ϕ}}_{i}^{(λ_{r})} z_{i}} - \frac{\sum_{i = 1}^{n} {\hat{ϕ}}_{i}^{(λ_{r})} x_{i j} (1 - z_{i})}{\sum_{i = 1}^{n} {\hat{ϕ}}_{i}^{(λ_{r})} (1 - z_{i})} |

(5)

Note that the weight ${\hat{ϕ}}_{i}^{(λ_{r})}$ is defined using the fitted propensity score ${\hat{π}}_{i}^{(λ_{r})} (x_{i})$ estimated with $λ_{r}$ as follows:

{\hat{ϕ}}_{i}^{(λ_{r})} = \frac{z_{i}}{{\hat{π}}_{i}^{(λ_{r})} (x_{i})} + \frac{1 - z_{i}}{1 - {\hat{π}}_{i}^{(λ_{r})} (x_{i})}
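Equation (5) translates directly into code. The following sketch computes wAMD for one set of fitted propensity scores; the function name and the toy inputs are hypothetical.

```python
def wamd(x, z, ps, beta_hat):
    """Weighted absolute mean difference of equation (5) for one fitted propensity score."""
    # Inverse probability weights phi_i for every target.
    phi = [zi / pi + (1 - zi) / (1 - pi) for zi, pi in zip(z, ps)]
    sum_t = sum(f * zi for f, zi in zip(phi, z))        # total weight, treated
    sum_c = sum(f * (1 - zi) for f, zi in zip(phi, z))  # total weight, control
    total = 0.0
    for j, b in enumerate(beta_hat):
        mean_t = sum(f * row[j] * zi for f, row, zi in zip(phi, x, z)) / sum_t
        mean_c = sum(f * row[j] * (1 - zi) for f, row, zi in zip(phi, x, z)) / sum_c
        # Imbalance in covariate j counts in proportion to |beta_j|.
        total += abs(b) * abs(mean_t - mean_c)
    return total


# Toy data (hypothetical): one covariate, constant propensity score.
x = [[1.0], [3.0], [0.0], [2.0]]
z = [1, 1, 0, 0]
ps = [0.5, 0.5, 0.5, 0.5]
score = wamd(x, z, ps, beta_hat=[2.0])
```

Because each covariate's imbalance is multiplied by the absolute value of its outcome coefficient, imbalance in covariates unrelated to the outcome does not affect the choice of $λ_{r}$.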

2.5. Algorithm of QOAL

We describe the procedure for estimating the parameters of QOAL. First, we estimate the partial regression coefficients of the outcome regression model. Next, for each candidate $λ_{r}$ , we estimate the partial regression coefficients of the propensity score model and the propensity score by penalized logistic regression, using the partial regression coefficients of the outcome regression model as weights. The $\hat{α}$ calculated using the $λ_{r}$ that minimizes wAMD is the estimated partial regression coefficient vector of the propensity score model. Finally, the estimated propensity score is used to calculate the QTE by Firpo’s method (Algorithm 1).
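The selection loop over the candidate regularization parameters can be sketched as below. The two callables stand in for the penalized logistic regression fit and the wAMD computation; their names and signatures are hypothetical and chosen only to keep the sketch self-contained.

```python
def select_by_balance(candidates, fit_ps, balance_score):
    """Return (lambda, fitted propensity scores) with the smallest balance score.

    fit_ps(lam) is assumed to return fitted propensity scores for one candidate,
    and balance_score(ps) is assumed to return the wAMD of those scores.
    """
    best = None
    for lam in candidates:
        ps = fit_ps(lam)
        score = balance_score(ps)
        if best is None or score < best[0]:
            best = (score, lam, ps)
    return best[1], best[2]


# Toy stand-ins (hypothetical) so that the loop can be run on its own.
toy_fit = lambda lam: [lam, 1 - lam]
toy_score = lambda ps: abs(ps[0] - 0.3)
best_lam, best_ps = select_by_balance([0.1, 0.3, 0.5], toy_fit, toy_score)
```

The selected propensity scores would then be passed to the weighted quantile estimator to obtain the QTE.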

3. Simulation

Numerical simulations are conducted to evaluate the performance of the proposed method. We generated the datasets for the numerical experiments with reference to the OAL article.¹¹ The 20 covariates of each target $i (= 1, 2, \dots, n)$ are independently and identically distributed standard normal random variables. We denote the covariate vector by $X_{i} = (X_{i 1}, X_{i 2}, \dots, X_{i 20})^{'}$ . The binary treatment variable $Z_{i}$ is generated from the Bernoulli distribution with $logit \{\Pr (Z_{i} = 1 ∣ X = X_{i})\} = \sum_{j = 1}^{20} α_{j} X_{i j}$ , where $α_{j}$ is the true value of the $j$th $(j = 1, 2, \dots, 20)$ partial regression coefficient in the propensity score model. A continuous-valued outcome $Y$ is generated under the following four scenarios. Our source code is available in the online supplemental material.

Scenario 1
Simple regression model
$Y_{i} = 2 Z_{i} + \sum_{j = 1}^{20} β_{j} X_{i j} + ε_{i}$

Scenario 2
Models with heterogeneous error variance
$Y_{i} = 2 Z_{i} + \sum_{j = 1}^{20} β_{j} X_{i j} + (1 + 0.75 (X_{i 2} + X_{i 10})) ε_{i}$

Scenario 3
Model with interaction between treatment variables and covariates
$Y_{i} = 2 Z_{i} (1 + X_{i 2}) + \sum_{j = 1}^{20} β_{j} X_{i j} + (1 + 0.75 (X_{i 2} + X_{i 10})) ε_{i}$

Scenario 4
Outcome regression models that differed by treatment
$\begin{aligned} Y_{i} (1) = \sum_{j = 1}^{20} β_{j}^{(1)} X_{i j} + (1 + 0.75 (X_{i 2} + X_{i 10})) ε_{i} \\ Y_{i} (0) = \sum_{j = 1}^{20} β_{j}^{(0)} X_{i j} + (1 + 0.75 (X_{i 2} + X_{i 10})) ε_{i} \end{aligned}$

where $ε_{i} \overset{i . i . d .}{\sim} N (0, 1)$ . The true values of partial regression coefficients for the outcome regression and propensity score models were as follows:
$\begin{aligned} α & = (1, 1, 0.4, 0, 0, 0, 1, 1.8, 1.8, \overbrace{0, \dots, 0}^{11})^{'}, \quad β = (0.6, 0.6, 0.2, 0.6, 0.6, 0.6, \overbrace{0, \dots, 0}^{14})^{'} \\ β^{(1)} & = (0.6, 0.6, 0.2, 0.6, 0.6, 0.6, \overbrace{0, \dots, 0}^{14})^{'}, \quad β^{(0)} = (- 0.6, 0.6, 0.2, 0.6, 0.6, - 0.6, \overbrace{0, \dots, 0}^{14})^{'} \end{aligned}$
Let $X_{1}, X_{2}, X_{3}$ be the confounding variables; $X_{4}, X_{5}, X_{6}$ be covariates related only to the outcome; $X_{7}, X_{8}, X_{9}$ be covariates related only to the treatment variable; and the remaining are irrelevant covariates related to neither the treatment variable nor the outcome. In Scenarios 2, 3, and 4, $X_{10}$ is a covariate that does not affect the expected value of the outcome but does affect the variance of the outcome.
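The data-generating process of Scenario 2 can be sketched as follows, using the coefficient values given above. This is an independent illustration and not the authors' supplemental code; the function name and the seed handling are assumptions.

```python
import math
import random

# True coefficients of the propensity score and outcome models (20 covariates each).
ALPHA = [1, 1, 0.4, 0, 0, 0, 1, 1.8, 1.8] + [0] * 11
BETA = [0.6, 0.6, 0.2, 0.6, 0.6, 0.6] + [0] * 14


def generate_scenario2(n, seed=0):
    """Draw n targets (x, z, y) from the heterogeneous-variance model of Scenario 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(20)]
        # Treatment: Bernoulli with logit equal to the linear predictor.
        lin = sum(a * v for a, v in zip(ALPHA, x))
        prob = 1.0 / (1.0 + math.exp(-lin))
        z = 1 if rng.random() < prob else 0
        # Error scale depends on X_2 and X_10 (0-based indices 1 and 9).
        scale = 1 + 0.75 * (x[1] + x[9])
        y = 2 * z + sum(b * v for b, v in zip(BETA, x)) + scale * rng.gauss(0, 1)
        data.append((x, z, y))
    return data


sample = generate_scenario2(50, seed=1)
```

Here $X_{10}$ enters only through the error scale, so it changes the spread of the outcome without changing its expected value.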

To examine the effect of sample size on performance, $n = 500$ and $1000$ were used. The number of iterations $K$ in the numerical experiments was set to 1000. The relative bias (rBias), relative root mean square error (rRMSE), and standard deviation (SD) of the treatment effect estimator were used as evaluation indices. rBias, rRMSE, and SD are defined as follows:
$\begin{aligned} rBias & = \frac{1}{K} \sum_{k = 1}^{K} \frac{t - f_{k}}{t} \\ rRMSE & = \sqrt{\frac{1}{K} \sum_{k = 1}^{K} {(\frac{t - f_{k}}{t})}^{2}} \\ SD & = \sqrt{\frac{1}{K - 1} \sum_{k = 1}^{K} (f_{k} - \bar{f})^{2}} \end{aligned}$
where $f_{k}$ is the estimated QTE for the $100 τ \%$ quantile in the $k$th $(1 \leq k \leq K)$ iteration, $\bar{f}$ is the mean of $f_{1}, \dots, f_{K}$ , and $t$ is the true value of the QTE for the $100 τ \%$ quantile. The true value $t$ of the QTE at the $100 τ \%$ quantile is the difference between the $100 τ \%$ quantiles of the potential outcomes obtained by generating 200,000 data samples from the true outcome regression model.
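The three evaluation indices can be computed from the $K$ estimates as in the sketch below. The function name and the toy inputs are hypothetical.

```python
import math


def evaluate(estimates, truth):
    """Return (rBias, rRMSE, SD) for a list of QTE estimates and the true QTE."""
    k = len(estimates)
    rel = [(truth - f) / truth for f in estimates]  # relative errors (t - f_k) / t
    rbias = sum(rel) / k
    rrmse = math.sqrt(sum(r * r for r in rel) / k)
    mean_f = sum(estimates) / k
    sd = math.sqrt(sum((f - mean_f) ** 2 for f in estimates) / (k - 1))
    return rbias, rrmse, sd


# Toy estimates (hypothetical): symmetric around the truth, so rBias is zero.
rbias, rrmse, sd = evaluate([1.0, 3.0], truth=2.0)
```

Note that rBias and rRMSE are scaled by the true QTE, whereas SD is on the scale of the outcome.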

Eight methods were used for comparison. The four methods with covariate selection are Lasso,¹⁵ adaptive lasso (Adl),¹⁶ outcome adaptive lasso (OAL),¹¹ which assumes average treatment effect estimation, and the proposed QOAL. Lasso and Adl are expected to select the covariates associated with the treatment. OAL selects the covariates associated with the expected value of the outcome. For further details on Lasso, Adl, and OAL, please refer to their respective studies.

The four methods without covariate selection fix the covariates to be included in the propensity score model in advance. These include the method that includes the confounders and the covariates related only to the expected value of the outcome (Targ_Ex), the method that includes only the confounders (Conf), and the method that includes the covariates related to the expected value of the outcome and the treatment (Pot_Conf).

The results of Scenario 1 are listed in Table 1. In Scenario 1, the outcome was generated using a normal regression model. Thus, Scenario 1 is ideal for OAL. Among the methods with covariate selection, OAL produced the smallest values of all evaluation indices at almost all quantiles. Among the methods without covariate selection, the values of rBias and rRMSE for Targ_Ex are small. The covariate selection rates of QOAL and OAL in Scenario 1 are shown in Figure 1. There were no differences in the covariate selection rates between the two methods.

Figure 1.
Rate of covariates selection for $n = 1000$ in Scenario 1.

Table 1.
Results of Scenario 1: Scenario 1 generated the outcome from a normal regression model.

The results of Scenario 2 are presented in Table 2. Scenario 2 is the ideal situation for QOAL because the variance of the outcome depends on the covariates. QOAL has the smallest SD and rRMSE values for all quantiles among the methods with covariate selection. For some quantiles, OAL has smaller values of rBias in the $n = 1000$ case. Among the methods without covariate selection, Targ_Ex performs best. The covariate selection rates for QOAL and OAL in Scenario 2 are shown in Figure 2. OAL and QOAL had high selection rates for confounding variables. The main difference between QOAL and OAL is the selection rate of the 10th covariate. This covariate does not affect the expected value of the outcome, but it does affect its variance. We observed that the 10th covariate was selected at a higher rate at the $25 \%$ and $75 \%$ quantiles for QOAL. Therefore, the rRMSE and SD of QOAL were smaller than those of OAL.

Figure 2.
Rate of covariates selection for $n = 1000$ in Scenario 2.

Table 2.
Results of Scenario 2: Scenario 2 generated an outcome from a model with heterogeneous error variance.

The results for Scenario 3 are listed in Table 3. In Scenario 3, the outcome was generated using a model with an interaction between the treatment variable and covariates. For $n = 500$ , QOAL had the smallest values of all evaluation indices among the methods with covariate selection. Among the methods without covariate selection, the rBias of Targ_Ex was smaller than that of the other methods under some conditions.

Table 3.
Results of Scenario 3: Scenario 3 was generated from a model with an interaction between the treatment variable and the covariates.

For $n = 1000$ , OAL had the best results for all evaluation indices among the methods with covariate selection at the $25 \%$ quantile. In contrast, QOAL had the best results for all evaluation indices among the methods with covariate selection at the $50 \%$ and $75 \%$ quantiles. The trend for the methods without covariate selection does not differ from that for $n = 500$ . The covariate selection rates of QOAL and OAL in Scenario 3 are shown in Figure 3. Figure 3 shows that both QOAL and OAL select covariates that are irrelevant to the outcome at a higher rate than in Scenarios 1 and 2.

Figure 3.
Rate of covariates selection for $n = 1000$ in Scenario 3.

The results of Scenario 4 are listed in Table 4. In Scenario 4, the outcome model differed depending on the treatment. Among the methods with covariate selection, OAL had a smaller SD and rRMSE at the $25 \%$ and $50 \%$ quantiles, whereas QOAL had a smaller rRMSE at the $75 \%$ quantile. Among the methods without covariate selection, Targ_Ex performed best. The covariate selection rates of QOAL and OAL are shown in Figure 4. Both QOAL and OAL have low selection rates for the first and third covariates, for which the signs of the partial regression coefficients of the outcome regression model differ between the treatment and control groups. The selection rate of the 10th covariate, which is related to the outcome quantile, was high for QOAL.

Figure 4.
Rate of covariates selection for $n = 1000$ in Scenario 4.

Table 4.
Results of Scenario 4: Scenario 4 generated outcomes from different models by treatment.

4. Application

This study investigated the effects of maternal smoking during pregnancy on birth weight using QTEs. Several studies have investigated the relationship between smoking during pregnancy and birth weight. For example, prospective population studies have shown that smoking at conception leads to an average reduction in birth weight of $5 %$ ¹⁷ and that smoking leads to a reduction in birth weight of approximately 250 to 640 g, depending on the maternal genotype.¹⁸ Studies using QTE¹⁹ suggest that maternal smoking during pregnancy leads to a weight loss of approximately 220 g in the $50 %$ quantile.

We evaluated the QTE of smoking at conception on birth weight (g) using a dataset²⁰ of mothers and children born in King County, Washington, in 2001. The sample size was 2500. The dataset includes 15 covariates. The variables included in the dataset are listed in Table 5. Among the variables in Table 5, “plural” was excluded from the analysis because all observations are 1. “smoker” and “drinker” were excluded because they are highly correlated with “smokeN” and “drinkN.” The categorical variables “gender,” “race,” “married,” “firstep,” and “welfare” were dummy coded and used in the analysis. The “race” variable contains five categories: “asian,” “black,” “hispanic,” “other,” and “white.”

Table 5.
List of covariates used in application.

Variable	Description
“gender”	M = male, F = female baby
“plural”	1 = singleton, 2 = twin, 3 = triplet
“age”	mother’s age in years
“race”	race categories (for mother)
“parity”	number of previous live-born infants
“married”	Y = yes, N = no
“bwt”	birth weight in grams
“smokeN”	number of cigarettes smoked per day during pregnancy
“drinkN”	number of alcoholic drinks per week during pregnancy
“firstep”	1 = participant in program; 0 = did not participate
“welfare”	1 = participant in public assistance program; 0 = did not
“smoker”	Y = yes, N = no, U = unknown
“drinker”	Y = yes, N = no, U = unknown
“wpre”	mother’s weight in pounds prior to pregnancy
“wgain”	mother’s weight gain in pounds during pregnancy
“education”	highest grade completed (add 12 + 1 / year of college)
“gestation”	weeks from last menses to birth of child

As the relationships among the variables were not identified in advance, we applied methods with covariate selection. We used OAL and QOAL as covariate selection methods because they performed well in the simulations. Sampling was repeated 100 times. The mean of the QTE estimated by each method is shown in Table 6. The QOAL results suggest that maternal smoking leads to a weight loss of 136.86 g at the $25 \%$ quantile, 161.69 g at the $50 \%$ quantile, and 157.49 g at the $75 \%$ quantile.

Table 6.

The mean of estimated QTE of maternal smoking on birth weight.

	$τ = 0.25$	$τ = 0.5$	$τ = 0.75$
QOAL	$-$ 120.25	$-$ 171.31	$-$ 163.35
OAL	$-$ 110.81	$-$ 173.21	$-$ 158.42
SimDiff	$-$ 227.81	$-$ 194.96	$-$ 212.95

QTE: quantile treatment effect; QOAL: quantile outcome adaptive lasso; OAL: outcome adaptive lasso.

5. Discussion

This study proposes a covariate selection method for a propensity score model that assumes QTE estimation. By using the partial regression coefficients of the model for the conditional quantile of the outcome as weights for the regularization term, it was possible to select covariates related to the quantiles of the outcome. The theoretical properties of QOAL guarantee that, as the sample size increases, QOAL does not select covariates that are unrelated to the outcome (those related only to the treatment and those related to neither the treatment nor the outcome). However, they do not guarantee that covariates associated with the outcome are selected as the sample size increases.

Through simulation, we confirmed that QOAL performed better in QTE estimation than the other methods with covariate selection in many cases. Additionally, QOAL tended to select covariates related to the outcome quantile. The simulation results suggest that QOAL tends to provide better QTE estimates than OAL when the relationship between the covariates and outcome is not obvious, although we examined the performance of these methods only in simple situations. However, if a linear model is assumed for the outcome regression model and the covariates associated with the expected value of the outcome are known a priori, it is recommended to use a method without covariate selection instead of a method with covariate selection.

We examined the effect of maternal smoking on birth weight using QTE with covariate selection. These results suggested that maternal smoking during pregnancy leads to weight loss in the upper quartiles. The results are similar to those of a previous study,¹⁷ in which maternal smoking during pregnancy was known to reduce birth weight by approximately $5 %$ .

Four points should be noted when using the proposed method. First, the regularization method employed in QOAL estimates the regression coefficients of covariates related only to the treatment to be 0. Thus, QOAL tends to misspecify the propensity score model. In particular, if the proportion of covariates related only to treatment among all covariates is high, the difference between the estimated and true models of the propensity score will be large. This may cause bias in the estimated treatment effect. Although the propensity score can be estimated for data with more covariates than the sample size, the difference between the estimated and true models of the propensity score may also be large in that case. Hence, the proposed method should be applied to high-dimensional data with care, and covariates for QTE estimation should be selected a priori where possible.

Second, as in Scenario 3, when the outcome regression model has an interaction term between the treatment variable and covariates, the selection rate of covariates related to the outcome is lower than that in the no-interaction case. Additionally, in Scenario 3, the selection rate of the covariates that were neither treatment nor outcome-related was higher. This may have caused a difference in the selection rates of the confounding variables. To address this, the proposed method needs to be extended to consider interaction terms in the quantile regression.

Third, as in Scenario 4, we did not assume a situation in which the outcome regression models of the treatment and control groups differed. This may lead to instability in the selection of covariates and estimation of the treatment effect.

Fourth, the estimation of the treatment effect at extreme quantiles may be unstable. This may be improved by using a QTE estimator²¹ and a quantile regression method²² designed for extreme quantiles, with the latter providing the weights for the regularization term. The performance of the proposed method depends strongly on the quantile regression model used as the outcome regression model.

Supplemental Material

sj-zip-1-smm-10.1177_09622802241299410 - Supplemental material for Quantile outcome adaptive lasso: Covariate selection for inverse probability weighting estimator of quantile treatment effects

Supplemental material, sj-zip-1-smm-10.1177_09622802241299410 for Quantile outcome adaptive lasso: Covariate selection for inverse probability weighting estimator of quantile treatment effects by Takehiro Shoji, Jun Tsuchida and Hiroshi Yadohisa in Statistical Methods in Medical Research

Footnotes

Acknowledgements

We would like to thank the reviewer for their considered and valuable comments, which have vastly improved this manuscript.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by JSPS KAKENHI Grant Number JP22K17862.

ORCID iD

Takehiro Shoji

Supplemental material

Supplemental material for this article is available online.

Proof

We assume the following regularity conditions.

Assumption A.1 (Regularity conditions)

The Fisher information matrix

$$I(\alpha^{*}) = E\left[\phi''(x'\alpha^{*})\, x x'\right]$$

is positive definite, where $\phi(\theta) = \log(1 + \exp(\theta))$.

$\phi(x'\alpha)$ is three times partially differentiable with respect to $\alpha$.

In a neighborhood of $\alpha^{*}$, there exist $M_1(x)$, $M_2(x)$ such that

$$\left|\frac{\partial \phi(x;\alpha)}{\partial \alpha_j}\right| \le M_1(x), \quad \left|\frac{\partial^2 \phi(x;\alpha)}{\partial \alpha_j\, \partial \alpha_\ell}\right| \le M_1(x), \quad \left|\frac{\partial^3 \phi(x;\alpha)}{\partial \alpha_j\, \partial \alpha_\ell\, \partial \alpha_m}\right| \le M_3(x),$$

$$\int M_1(x)\, dx < \infty, \quad \text{and} \quad E\left[M_2(x) \mid x_j, x_\ell, x_m\right] < \infty \quad (1 \le j, \ell, m \le p).$$
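As an illustration of the first condition, the sketch below evaluates a sample analogue of the Fisher information matrix for the logistic model and checks positive definiteness through its eigenvalues. This is a minimal sketch, not part of the proof: the function names are hypothetical, the expectation is replaced by a sample average, and the tolerance `tol` is an arbitrary choice.

```python
import numpy as np


def phi_second_derivative(theta):
    """phi''(theta) for phi(theta) = log(1 + exp(theta))."""
    s = 1.0 / (1.0 + np.exp(-theta))
    return s * (1.0 - s)


def empirical_fisher_information(X, alpha):
    """Sample average of phi''(x_i' alpha) * x_i x_i' over the rows of X."""
    X = np.asarray(X, dtype=float)
    weights = phi_second_derivative(X @ np.asarray(alpha, dtype=float))
    return (X * weights[:, None]).T @ X / X.shape[0]


def is_positive_definite(matrix, tol=1e-12):
    """True when every eigenvalue of the symmetric matrix exceeds tol."""
    return bool(np.all(np.linalg.eigvalsh(matrix) > tol))
```

With perfectly collinear covariates the matrix is singular and the check fails, which is the situation the condition rules out.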

We assume that $\hat{\beta}$ is a root-$n$-consistent estimator of $\beta^{*}$. Define $\alpha = (\alpha_A, \alpha_{A^c}) = \alpha^{*} + b/\sqrt{n}$, where $\alpha_A$ is the vector of partial regression coefficients corresponding to the covariates associated with the outcome and $\alpha_{A^c}$ is the vector of partial regression coefficients corresponding to the covariates not associated with the outcome. The penalized log-likelihood function used in the proof is defined as

$$\tilde{\ell}_n(\alpha) = \ell_n(\alpha) - p_n(\alpha),$$

where $\ell_n(\alpha)$ is the log-likelihood function corresponding to the logistic loss and

$$p_n(\alpha) = \lambda_n \sum_{j=1}^{p} \hat{\omega}_j |\alpha_j|.$$
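The penalized objective can be written out directly. The sketch below is a minimal Python illustration, assuming the logistic form of $\ell_n(\alpha)$ stated later in the proof, with the treatment indicator `z`, the covariate matrix `X`, the weights `weights` (the $\hat{\omega}_j$) and the tuning parameter `lam` (the $\lambda_n$) all supplied by the caller. The function names are hypothetical, and how the weights are built from the quantile regression coefficients is not shown here.

```python
import numpy as np


def logistic_log_likelihood(alpha, X, z):
    """l_n(alpha) = sum_i (z_i * x_i'alpha - log(1 + exp(x_i'alpha)))."""
    theta = np.asarray(X, dtype=float) @ np.asarray(alpha, dtype=float)
    z = np.asarray(z, dtype=float)
    # np.logaddexp(0, theta) evaluates log(1 + exp(theta)) without overflow.
    return float(np.sum(z * theta - np.logaddexp(0.0, theta)))


def adaptive_penalty(alpha, weights, lam):
    """p_n(alpha) = lam * sum_j weights_j * |alpha_j|."""
    alpha = np.asarray(alpha, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(lam * np.sum(weights * np.abs(alpha)))


def penalized_log_likelihood(alpha, X, z, weights, lam):
    """Penalized objective: log-likelihood minus the weighted L1 penalty."""
    return logistic_log_likelihood(alpha, X, z) - adaptive_penalty(alpha, weights, lam)
```

A coefficient with a large weight pays a large penalty for being nonzero, so maximizing this objective tends to set it to exactly zero.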

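Let $(\hat{\alpha}_A, 0)$ be the maximizer of the penalized log-likelihood function $\tilde{\ell}_n(\alpha_A, 0)$. It suffices to show that, in the neighborhood $\|\alpha - \alpha^{*}\| = O_p(n^{-1/2})$, $\tilde{\ell}_n(\alpha_A, \alpha_{A^c}) - \tilde{\ell}_n(\hat{\alpha}_A, 0) < 0$ with probability tending to 1 as $n \to \infty$, where $\|\cdot\|$ denotes the Euclidean norm. We have

$$
\begin{aligned}
\tilde{\ell}_n(\alpha_A, \alpha_{A^c}) - \tilde{\ell}_n(\hat{\alpha}_A, 0)
&= \left(\tilde{\ell}_n(\alpha_A, \alpha_{A^c}) - \tilde{\ell}_n(\alpha_A, 0)\right) + \left(\tilde{\ell}_n(\alpha_A, 0) - \tilde{\ell}_n(\hat{\alpha}_A, 0)\right) \\
&\le \tilde{\ell}_n(\alpha_A, \alpha_{A^c}) - \tilde{\ell}_n(\alpha_A, 0),
\end{aligned} \tag{6}
$$

where the inequality holds because $(\hat{\alpha}_A, 0)$ maximizes $\tilde{\ell}_n(\alpha_A, 0)$, so the second term is nonpositive.

Consider the sign of the right-hand side of inequality (6). By the definition of $\tilde{\ell}_n(\cdot)$, it follows that

$$\tilde{\ell}_n(\alpha_A, \alpha_{A^c}) - \tilde{\ell}_n(\alpha_A, 0) = \left(\ell_n(\alpha_A, \alpha_{A^c}) - \ell_n(\alpha_A, 0)\right) - \left(p_n(\alpha_A, \alpha_{A^c}) - p_n(\alpha_A, 0)\right).$$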

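We first focus on the first term. As the log-likelihood function is differentiable, the multivariate mean value theorem gives

$$
\begin{aligned}
\ell_n(\alpha_A, \alpha_{A^c}) - \ell_n(\alpha_A, 0)
&= \left[\frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_A}\right]' (\alpha_A - \alpha_A) + \left[\frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}}\right]' (\alpha_{A^c} - 0) \\
&= \left[\frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}}\right]' \alpha_{A^c}
\end{aligned} \tag{7}
$$

for some $\xi$ with $\|\xi\| \le \|\alpha_{A^c}\|$. For vectors $a$ and $b$, the triangle inequality holds:

$$\|a + b\| \le \|a\| + \|b\|.$$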

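Applying the above inequality, we get

$$
\begin{aligned}
\left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\|
&= \left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A, 0)}{\partial \alpha_{A^c}} + \frac{\partial \ell_n(\alpha_A, 0)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\| \\
&\le \left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A, 0)}{\partial \alpha_{A^c}} \right\| + \left\| \frac{\partial \ell_n(\alpha_A, 0)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\|.
\end{aligned} \tag{8}
$$

From the definition of $\ell_n(\cdot)$, we have

$$\ell_n(\alpha) = \sum_{i=1}^{n} \left( z_i x_i' \alpha - \phi(x_i' \alpha) \right).$$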

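For the first term on the right-hand side of (8), by the mean value theorem there exists $\eta$ such that

$$
\left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A, 0)}{\partial \alpha_{A^c}} \right\|
= \left\| \sum_{i=1}^{n} \left( \frac{\partial \phi(x_i'(\alpha_A, \xi))}{\partial \alpha_{A^c}} - \frac{\partial \phi(x_i'(\alpha_A, 0))}{\partial \alpha_{A^c}} \right) \right\|
= \left\| \sum_{i=1}^{n} \frac{\partial^2 \phi(x_i'(\alpha_A, \eta))}{\partial \alpha_{A^c}^2}\, \xi \right\|.
$$

Based on the regularity conditions, we have

$$\left\| \sum_{i=1}^{n} \frac{\partial^2 \phi(x_i'(\alpha_A, \eta))}{\partial \alpha_{A^c}^2}\, \xi \right\| \le \left[ \sum_{i=1}^{n} M_1(x_i) \right] \|\xi\|.$$

The second term on the right-hand side of (8) is bounded similarly:

$$\left\| \frac{\partial \ell_n(\alpha_A, 0)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\| \le \left[ \sum_{i=1}^{n} M_1(x_i) \right] \|\alpha_A - \alpha_A^{*}\|.$$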

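From equation (8), we obtain

$$
\begin{aligned}
\left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\|
&\le \left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A, 0)}{\partial \alpha_{A^c}} \right\| + \left\| \frac{\partial \ell_n(\alpha_A, 0)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\| \\
&\le \left[ \sum_{i=1}^{n} M_1(x_i) \right] \|\xi\| + \left[ \sum_{i=1}^{n} M_1(x_i) \right] \|\alpha_A - \alpha_A^{*}\| \\
&= \left\{ \|\xi\| + \|\alpha_A - \alpha_A^{*}\| \right\} O_p(n).
\end{aligned}
$$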

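In the neighborhood considered, $\|\alpha_A - \alpha_A^{*}\| = O_p(n^{-1/2})$. For $j \in F$, $\|\xi\| \le \|\alpha_F\| = O_p(n^{-1/2})$, and thus

$$\left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\| \le O_p(n^{1/2}).$$

For $j \in I$, $\|\xi\| \le \|\alpha_I\| = O_p(1)$, and thus

$$\left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\| \le O_p(n).$$

Therefore, for $j \in A^c$, we have

$$\left\| \frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}} - \frac{\partial \ell_n(\alpha_A^{*}, 0)}{\partial \alpha_{A^c}} \right\| \le O_p(n).$$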

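Applying these results to equation (7), we have

$$\ell_n(\alpha_A, \alpha_{A^c}) - \ell_n(\alpha_A, 0) = \left[\frac{\partial \ell_n(\alpha_A, \xi)}{\partial \alpha_{A^c}}\right]' \alpha_{A^c} = \sum_{j \in A^c} -|\alpha_j|\, O_p(n).$$

Since

$$p_n(\alpha_A, \alpha_{A^c}) - p_n(\alpha_A, 0) = \lambda_n \sum_{j \in A^c} \hat{\omega}_j |\alpha_j|,$$

we have

$$\tilde{\ell}_n(\alpha_A, \alpha_{A^c}) - \tilde{\ell}_n(\alpha_A, 0) = \sum_{j \in A^c} \left( -|\alpha_j|\, O_p(n) - \lambda_n \hat{\omega}_j |\alpha_j| \right). \tag{9}$$

The proof is complete because the right-hand side of equation (9) is negative with probability tending to 1 as $n \to \infty$.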

Application

Summary statistics for the King County 2001 birth data are shown in Table 7; categorical variables are reported as frequency (%) and numerical variables as mean ± standard deviation. Partial regression coefficients and covariate selection rates for the propensity score model applied to the real data are shown in Table 8. A violin plot of the QTE estimates for the real data application is shown in Figure 5. For some datasets, the method using propensity scores tends to produce unstable estimates.

Figure 5.

Violin plot of estimated quantile treatment effects (QTEs).

Table 7.

Summary statistics for King County 2001 birth data.

Characteristic	Treated group (smoker: yes)	Control group (smoker: no)
gender (male)	91 ( 52.00 )	1200 ( 51.61 )
plural
singleton	175 ( 100.00 )	2325 ( 100.00 )
twin	0 ( 0.00 )	0 ( 0.00 )
triplet	0 ( 0.00 )	0 ( 0.00 )
age	26.16 ± 6.01	29.54 ± 5.94
race
hispanic	8 ( 4.57 )	212 ( 9.12 )
white	138 ( 78.86 )	1541 ( 66.28 )
black	14 ( 8.00 )	164 ( 7.05 )
asian	8 ( 4.57 )	384 ( 16.52 )
other	7 ( 4.00 )	24 ( 1.03 )
parity	1.15 ± 1.31	0.79 ± 1.01
married (yes)	0.34 ± 0.47	0.81 ± 0.39
bwt	3186 ± 583.78	3431 ± 553.83
smokeN	6.00 ± 5.80	0.00 ± 0.00
drinkN	0.07 ± 0.47	0.03 ± 0.48
firstep (yes)	0.29 ± 0.46	0.15 ± 0.36
welfare (yes)	0.08 ± 0.27	0.01 ± 0.11
drinker (yes)	6 ( 3.43 )	23 ( 0.99 )
wpre	155.1 ± 43.45	146.30 ± 33.76
wgain	33.09 ± 17.32	32.22 ± 13.07
education	12.18 ± 1.80	14.22 ± 2.62
gestation	38.23 ± 3.36	38.92 ± 2.28
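
The two summary formats used in Table 7 can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical helper names; it assumes the sample standard deviation (denominator n − 1), which the table does not specify.

```python
import statistics


def frequency_percent(count, total):
    """Format a categorical summary as 'count (percent)' with two decimals."""
    return f"{count} ({100.0 * count / total:.2f})"


def mean_sd(values):
    """Format a numerical summary as 'mean ± SD' with two decimals.

    Uses the sample standard deviation (n - 1 denominator); this is an
    assumption, since the table does not say which version was used.
    """
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"
```

For example, `frequency_percent(91, 175)` gives the gender entry for the treated group, 91 (52.00).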

Table 8.

Average of estimated coefficients and rate of covariate selection.

	Average of estimated coefficients (% selected)
Covariate	25% QOAL	50% QOAL	75% QOAL	OAL
gender	−0.03₍₈₁₎	−0.02₍₈₁₎	−0.02₍₈₀₎	−0.02₍₇₉₎
age	−0.13₍₇₈₎	−0.14₍₇₉₎	−0.13₍₈₅₎	−0.12₍₈₂₎
parity	0.43₍₉₇₎	0.44₍₉₇₎	0.43₍₉₇₎	0.44₍₉₇₎
married	−1.04₍₉₉₎	−1.03₍₉₉₎	−1.03₍₁₀₀₎	−1.04₍₁₀₀₎
drinkN	0.95₍₈₅₎	0.81₍₈₂₎	1.04₍₇₉₎	0.82₍₇₄₎
firstep	−0.08₍₇₇₎	−0.07₍₇₃₎	−0.08₍₇₈₎	−0.07₍₈₀₎
welfare	0.94₍₈₂₎	1.15₍₈₃₎	6.76₍₈₆₎	1.53₍₈₄₎
wpre	0.30₍₉₁₎	0.29₍₉₀₎	0.30₍₉₀₎	0.30₍₉₂₎
wgain	0.04₍₈₀₎	0.04₍₈₀₎	0.04₍₈₀₎	0.04₍₈₁₎
education	−0.86₍₉₈₎	−0.83₍₉₇₎	−0.86₍₉₈₎	−0.86₍₉₉₎
gestation	−0.23₍₈₈₎	−0.23₍₈₈₎	−0.23₍₈₉₎	−0.23₍₈₉₎
race_other	0.56₍₈₇₎	0.51₍₈₈₎	8.31₍₈₈₎	0.55₍₈₃₎
race_black	−0.05₍₈₆₎	−0.03₍₈₇₎	−0.04₍₈₆₎	−0.04₍₈₄₎
race_hispanic	−0.38₍₉₀₎	−0.35₍₉₃₎	−0.36₍₉₂₎	−0.37₍₉₆₎
race_white	0.59₍₉₄₎	0.63₍₉₈₎	0.62₍₉₇₎	0.61₍₉₉₎

QOAL: quantile outcome adaptive lasso; OAL: outcome adaptive lasso.
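
The two quantities reported in Table 8 can be computed from a matrix of coefficient estimates with one row per repeated fit and one column per covariate. The sketch below is a minimal Python illustration with a hypothetical function name; it assumes that the average is taken over all fits, including those in which the coefficient was shrunk to zero, and that a covariate counts as selected when its estimate is nonzero. The procedure actually used to generate the repeated fits is not restated here.

```python
import numpy as np


def summarize_selection(coef_matrix, tol=0.0):
    """Column-wise mean coefficient and percentage of nonzero estimates.

    coef_matrix: rows are repeated fits, columns are covariates.
    tol: absolute values at or below tol are treated as not selected.
    """
    coefs = np.asarray(coef_matrix, dtype=float)
    mean_coef = coefs.mean(axis=0)
    percent_selected = 100.0 * (np.abs(coefs) > tol).mean(axis=0)
    return mean_coef, percent_selected
```

Averaging only over the fits in which a covariate was selected would give larger absolute values for rarely selected covariates, so the convention matters when reading such a table.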

References

1. Tang, Cai, Fang, et al. A new quantile treatment effect model for studying smoking effect on birth weight during mother’s pregnancy. J Manage Sci Eng 2021; 6: 336–343.
2. Frumento, Mealli, Pacini, et al. Evaluating the effect of training on wages in the presence of noncompliance, nonemployment, and missing outcome data. J Am Stat Assoc 2012; 107: 450–466.
3. Firpo. Efficient semiparametric estimation of quantile treatment effects. Econometrica 2007; 75: 259–276.
4. Zhang, Chen, Troendle, et al. Causal inference on quantiles with an obstetric application. Biometrics 2012; 68: 697–706.
5. Callaway. Quantile treatment effects in difference in differences models with panel data. Quant Econ 2019; 10: 1579–1618.
6. Brookhart, Schneeweiss, Rothman, et al. Variable selection for propensity score models. Am J Epidemiol 2006; 163: 1149–1156.
7. De Luna, Waernbaum, Richardson. Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika 2011; 98: 861–875.
8. Greenland. Invited commentary: Variable selection versus shrinkage in the control of multiple confounders. Am J Epidemiol 2008; 167: 523–529.
9. Vansteelandt, Bekaert, Claeskens. On model selection and model misspecification in causal inference. Stat Methods Med Res 2012; 21: 7–30.
10. Koch, Vock, Wolfson. Covariate selection with group lasso and doubly robust estimation of causal effects. Biometrics 2018; 74: 8–17.
11. Shortreed, Ertefaie. Outcome-adaptive lasso: Variable selection for causal inference. Biometrics 2017; 73: 1111–1122.
12. Koenker, Bassett Jr. Regression quantiles. Econometrica 1978; 46: 33–50.
13. Rosenbaum, Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.
14. Doksum. Empirical probability plots and statistical inference for nonlinear models in the two-sample case. Ann Stat 1974; 2: 267–277.
15. Tibshirani. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 1996; 58: 267–288.
16. Zou. The adaptive lasso and its oracle properties. J Am Stat Assoc 2006; 101: 1418–1429.
17. Brooke, Anderson, Bland, et al. Effects on birth weight of smoking, alcohol, caffeine, socioeconomic factors, and psychosocial stress. Br Med J 1989; 298: 795.
18. Wang, Zuckerman, Pearson, et al. Maternal cigarette smoking, metabolic gene polymorphism, and infant birth weight. JAMA 2002; 287: 195–202.
19. Xie, Cotton, Zhu. Multiply robust estimation of causal quantile treatment effects. Stat Med 2020; 39: 4238–4251.
20. Dukes, Vansteelandt. How to obtain valid tests and confidence intervals after propensity score variable selection? Stat Methods Med Res 2020; 29: 677–694.
21. Deuber, Engelke, et al. Estimation and inference of extremal quantile treatment effects for heavy-tailed distributions. J Am Stat Assoc 2023; 119: 2206–2216.
22. Chernozhukov. Extremal quantile regression. Ann Stat 2005; 33: 806–839.
