Sage Journals: Discover world-class research

Abstract

Estimating the causal treatment effects by subgroups is important in observational studies when the treatment effect heterogeneity is present. Existing propensity score methods rely on a correctly specified propensity score model. Model misspecification results in biased treatment effect estimation and covariate imbalance. We proposed a method for the propensity score analysis with controlled subgroup balance (G-SBPS) to achieve covariate mean balance in all subgroups. We further incorporated nonparametric kernel regression for the propensity scores and developed a kernelized G-SBPS (kG-SBPS) to improve the subgroup mean balance of covariate transformations in a rich functional class. This extension increased robustness to propensity score model misspecification. Extensive numerical studies showed that G-SBPS and kG-SBPS improve both subgroup covariate balance and subgroup treatment effect estimation, compared to existing approaches. For illustration, we applied G-SBPS and kG-SBPS to a dataset on right heart catheterization to estimate the subgroup average treatment effects on the hospital length of stay and a dataset on diabetes self-management training to estimate the subgroup average treatment effects for the treated on the hospitalization rate.

Keywords

Causal inference covariate balance inverse probability weighting nonparametric kernel regression subgroup analysis,treatment effect heterogeneity

1. Introduction

An important goal of observational studies is to estimate the treatment effect. Naive comparison between treatment groups is subject to selection bias when covariates are unbalanced between treatment groups due to lack of randomization. Propensity score, the conditional probability of treatment assignment given the covariates, is widely used to adjust for covariate imbalance and remove selection bias through matching,¹ stratification,² regression,³ and weighting.⁴ Commonly used estimands of the treatment effect include the average treatment effect (ATE) and the average treatment effect for the treated (ATT). When there are heterogeneous treatment effects (HTEs), subgroups with different characteristics respond to the treatment differently. For example, a drug may have better efficacy on patients with certain genetic traits. The overall treatment effects that ignore the underlying heterogeneity, such as the ATE or ATT, do not provide sufficient granular information for scientific investigation and clinical practice. The HTE is common in biomedical, epidemiological, and social research. In this article, we study HTEs among pre-specified subgroups of scientific interest, and these subgroups are defined through covariates.

The subgroup HTEs can be estimated with or without modeling the outcome. Examples of the former approach include regression models stratified on subgroups, Bayesian additive regression trees,^5,6 causal forest,⁷ etc. However, their performance depends on a correctly specified outcome model. Additionally, there are benefits to being blinded from the outcome data when developing causal models.⁸ In this article, we focus on the latter approach and study the causal subgroup analysis, an HTE estimation method that adjusts subgroup covariate imbalance in a propensity score analysis.^9,10 Our proposed approach is built upon propensity score weighting, and the weights development does not incorporate any information about the outcomes.

The propensity score is a balancing score, that is the treatment assignments and covariates are independent conditional on the propensity score.¹¹ Theoretically, the propensity score balances covariates in the overall population and any covariate-defined subgroups. We define overall balance as the mean difference of covariates or their transformations between the two treatments in the overall population. This is the type of covariate balance that most published propensity score methods deal with. We define subgroup balance as the mean difference of covariates or their transformations between the two treatments in subgroups. This is the focus of this article. The propensity scores are unknown and must be estimated from a parametric or nonparametric model, such as logistic regression, covariate balancing propensity score (CBPS),¹² boosting (GBM),¹³ or covariate balancing scoring rules (CBSR).¹⁴ When the estimated model deviates from the true propensity score model or has an explicit or hidden lack of fit, the estimated propensity scores do not have the desired balancing property.¹⁵ This scenario may result in a lack of overall or subgroup balance. The latter is usually more severe because the majority of propensity score analysis procedures are developed to achieve overall balance. The subgroups have smaller sample sizes and hence are more prone to covariate imbalance, and there are often many subgroups under simultaneous consideration. For example, when the estimated propensity model is misspecified, the CBPS ensures exact overall balance by method design but subgroup covariate imbalance may still arise, which causes bias in the subgroup treatment effect estimation.⁹

Nonparametric propensity score models, such as boosting,¹³ random forest¹⁶ and CBSR,¹⁴ do not guarantee overall and subgroup balance.^9,10,15,17 Although the nonparametric methods may reduce model misspecification and bias due to the higher flexibility than their parametric counterparts, the estimation may have more variability, a typical bias-variance trade-off phenomenon. This has been observed, for example, in the comparison between CBSR and CBPS.¹⁷ This trade-off is amplified in the subgroup analysis due to the large number of subgroups under research and the limited sample size of each subgroup.

Developing methods that ensure both overall and subgroup covariate balance is essential when studying subgroup HTEs. For this purpose, Dong et al.⁹ proposed the subgroup balancing propensity score (SBPS). SBPS selects among either parametric logistic regression models with covariate-by-subgroup interactions fitted to the overall sample or parametric logistic regression models fitted to the subgroup samples. However, this method cannot guarantee subgroup balance when the propensity score model is misspecified or when the sampling variability results in extreme inverse probability weights. The SBPS needs to examine up to $2^{R}$ logistic regression models with various combinations of covariates and covariate-by-subgroup interactions, where $R$ is the number of subgroups. When $R$ is large, the computational burden can be tremendous. Furthermore, the SBPS requires mutually exclusive subgroups, which limits its use in statistical practice. Yang et al.¹⁰ extended the overlap weights, originally developed for exact overall balance,¹⁸ to the subgroup analysis. The overlap weights require a deviation from the widely used estimands such as ATE and ATT. When the propensity score model is misspecified, the overlap weights produce biased estimation despite showing no signs of covariate imbalance.¹⁹

In this article, we propose the propensity score weighting analysis with controlled subgroup balance (G-SBPS), which optimizes both overall and subgroup balance simultaneously. The G-SBPS does not require mutually exclusive subgroups. We estimate the propensity scores by solving a system of equations that achieves the mean independence between the treatment indicator and the covariate terms in the propensity score model, which includes covariates, subgroup indicators and their interactions. We show that the G-SBPS controls both overall and subgroup balance. To further improve the flexibility of propensity score models and reduce misspecification, we extend the G-SBPS to nonparametric estimation by using kernel principle component analysis (PCA). We propose a parameter tuning algorithm tailored for the subgroup analysis, which optimizes the subgroup covariate balance while controlling the overall balance. This kernelized G-SBPS (kG-SBPS) optimizes the overall and subgroup balance of the covariates and their transformations from a rich functional class. In simulations and two empirical data applications, both the G-SBPS and kG-SBPS demonstrated robustness to model misspecification compared to existing propensity score methods or subgroup propensity score analysis methods. The proposed methods are implemented in statistic software R, which is available from GitHub (https://github.com/fiona19832008/GSBPS).

The rest of the article is organized as follows. Section 2 presents the model, the G-SBPS algorithm, the kG-SBPS algorithm, and the tuning algorithm. Section 3 evaluates the numerical performance of G-SBPS and kG-SBPS and compares them with other published methods in a simulation study. Section 4 presents two real data applications, one for the estimation of ATE and one for ATT. We conclude this article with a summary and discussion in Section 5. Some tables and figures are included in the online supplementary materials and numbered as Table S1, Figure S1, etc.

2. Methodology

2.1. Model set-up

We consider a sample of $N$ observations with $N_{0}$ untreated subjects, denoted by $T_{i} = 0$ , and $N_{1}$ treated subjects, denoted by $T_{i} = 1$ . For each subject, we observe a vector of $M$ covariates $Z_{i} = (Z_{i 1}, \dots, Z_{i M})^{T}$ and the outcome variable $Y_{i}$ . The observed outcome is $Y_{i} = T_{i} Y_{i} (1) + (1 - T_{i}) Y_{i} (0)$ , where $Y_{i} (1)$ and $Y_{i} (0)$ are two potential outcomes corresponding to the treated and untreated, respectively. Let the pre-specified subgroups of interest be denoted by $S_{i} = (S_{i 1}, \dots, S_{i K})^{T}$ , where $S_{i k} = 1$ if the $i^{t h}$ subject belongs to the $k^{t h}$ ( $k = 1, \dots, K$ ) subgroup and $0$ otherwise. The subgroup variables ${S_{i k}}$ are functions of the covariates $Z_{i}$ . For example, $S_{i k} = I (Z_{i 1} > 50 years, Z_{i 2} = male, Z_{i 3} = no diabetes)$ represents a male subject who is older than 50 and has no diabetes. The $K$ groups do not need to be mutually exclusive, and each subject can belong to multiple subgroups, that is $\sum_{k = 1}^{K} S_{i k} \neq 1$ . Let $N_{k}$ be the number of subjects in the $k^{t h}$ subgroup with $N_{0 k}$ untreated and $N_{1 k}$ treated.

The propensity score of subject $i$ is denoted by $p (Z_{i}) = P (T_{i} = 1 ∣ Z_{i})$ . We use standard assumptions for propensity score analysis,¹¹ including the stable unit treatment value assumption (SUTVA), the no unmeasured confounding assumption, and the overlap assumption ( $0 < P (Z) < 1$ ). The propensity score is a balancing score, that is the treatment assignment is independent of the covariates conditional on the propensity score: $T ⊥ Z ∣ p (Z)$ . The treatment assignment can be viewed as being “randomized” within a small neighborhood of the propensity scores. Thus, the propensity score can balance covariates in not only the overall sample but also the subgroups. The overall and subgroup treatment effects, such as ATE or ATT, can be estimated without bias. In the next two subsections, we present the parametric G-SBPS model for the propensity scores, and the nonparametric kG-SBPS model. The presentation focuses on how these models achieve covariate balance in the subgroups.

2.2. Parametric propensity score model with controlled subgroup balance (G-SBPS)

This subsection discusses G-SBPS as a parametric approach. We model the propensity score by the logit link as

π_{i} = p_{θ} (Z_{i}, S_{i}) = p_{θ} (X_{i}) = \frac{\exp (X_{i}^{T} θ)}{1 + \exp (X_{i}^{T} θ)},

(1)

where

X_{i} = ϕ (Z_{i}, S_{i}) = [1, S_{i}^{T}, Z_{i}^{T}, S_{i 1} Z_{i}^{T}, \dots, S_{i K} Z_{i}^{T}]

is the design vector that includes the observed covariates

Z_{i}

, the subgroup indicator

S_{i}

and their interactions.

We discuss ATE estimation first. Zhao et al.¹⁴ constructed a special loss function for the CBSR method and demonstrated that the function can be maximized by solving similar score equations as those used in CBPS,¹² which guarantees covariate mean balance between the two treatment groups in the overall study population. To optimize both the overall and subgroup balance, we consider the following adaptation of that loss function:

\begin{aligned} L_{A T E} & = \sum_{i = 1}^{N} T_{i} [\log (\frac{p_{θ} (X_{i})}{1 - p_{θ} (X_{i})}) - \frac{1}{p_{θ} (X_{i})}] \\ + \sum_{i = 1}^{N} (1 - T_{i}) [\log (\frac{1 - p_{θ} (X_{i})}{p_{θ} (X_{i})}) - \frac{1}{1 - p_{θ} (X_{i})}] . \end{aligned}

(2)

The propensity scores are estimated by maximizing this loss function with respect to the modeling parameters $θ$ , and the estimated parameters are consistent.¹⁴ When $π_{i} = p_{θ} (X_{i})$ is twice continuously differentiable with respect to $θ$ , the corresponding estimating equation is

\frac{\partial L_{A T E}}{\partial θ} = \sum_{i = 1}^{N} B_{A T E} (θ | T_{i}, X_{i}) = \sum_{i = 1}^{N} (\frac{T_{i}}{p_{θ} (X_{i})} - \frac{1 - T_{i}}{1 - p_{θ} (X_{i})}) X_{i} = 0 .

(3)

Notably, Equation (3) is the overall covariate balancing condition, as studied in CBPS.¹² However, our design vector

X_{i}

includes an intercept, the covariates

Z_{i}

, the subgroup indicator

S_{i}

, and the interactions between the covariates and subgroup indicators

Z_{i} \times S_{i}

. The balancing conditions corresponding to

Z_{i}

ensure the overall balance, and more importantly, the balancing conditions corresponding to

Z_{i} \times S_{i}

control the subgroup balance. The conditions corresponding to the intercept and

S_{i}

ensure an equal total sum of weights between the treated and untreated in the overall population and subgroups. There are a total of

(1 + K) (1 + M)

of balance conditions and parameters.

The loss function in (2) is concave with a global maximum. The Hessian matrix ( the matrix of second-order partial and cross-partial derivatives) is:

H_{A T E} (θ) = \sum_{i = 1}^{N} - [T_{i} \exp (- X_{i}^{T} θ) + (1 - T_{i}) \exp (X_{i}^{T} θ)] X_{i} X_{i}^{T},

which is negative semi-definite. Estimating

θ

by solving the balance conditions in (3) is equivalent to maximizing equation (2), which produces globally optimal solutions. Furthermore, we proved that the proposed ATE estimator is

\sqrt{N}

-consistent under correct model specification (Section 2 in Supplemental Materials).

Therefore, we estimate the coefficient $θ$ in the propensity score model by solving the balance conditions using the generalized method of moments (GMM), a similar approach utilized by CBPS.¹² Specifically, we consider the following GMM estimator,

{\hat{θ}}_{GMM} = \underset{θ \in Θ}{argmin} \bar{B} (θ | T, X)^{T} Σ (T, X)^{- 1} \bar{B} (θ | T, X),

(4)

with

\bar{B} (θ | T, X) = \frac{1}{N} \sum_{i = 1}^{n} B_{A T E} (θ | T_{i}, X_{i}) .

In the expressions above,

X_{i} = ϕ (Z_{i}, S_{i})

includes the covariates, subgroup indicators, and their interactions. A natural choice of the weight matrix

Σ (T, X)^{- 1}

is derived from the inverse of the following variance matrix

\begin{aligned} Σ (T, X) & = \frac{1}{N} \sum_{i = 1}^{n} E [B_{A T E} (θ | T_{i}, X_{i}) B_{A T E} (θ | T_{i}, X_{i})^{T} | X_{i}] \\ = \frac{1}{N} \sum_{i = 1}^{n} \frac{X X^{T}}{p_{θ} (X_{i}) (1 - p_{θ} (X_{i}))}, \end{aligned}

which leads to the “continuous updating” GMM estimator discussed in Hansen et al.²⁰ Since the consistency of the GMM estimator arises from the zero expectation of

\bar{B} (θ | T, X)

, other forms of the weight matrix can also be used. An alternative choice, analogous to the two-step estimator in Hansen et al.,²⁰ is

Σ (T, X) = X X^{T} .

The resulting weight matrix is simpler and numerically more robust compared with the “continuous updating” GMM estimator, though it may be less efficient because the weights do not exactly match the variability of

\bar{B} (θ | T, X)

. We have used the latter approach in our numerical studies.

Next, we discuss the estimation procedure for the ATT. The procedure is similar to the ATE but with the following changes. The CBSR loss function, the estimating equation, and the Hessian matrix become

\begin{aligned} L_{ATT} & = \sum_{i = 1}^{N} {T_{i} \log (\frac{p_{θ} (X_{i})}{1 - p_{θ} (X_{i})}) - \frac{1 - T_{i}}{1 - p_{θ} (X_{i})}}, \\ \frac{\partial L_{ATT}}{\partial θ} & = \sum_{i = 1}^{N} B_{ATT} (θ | T_{i}, X_{i}) = \sum_{i = 1}^{N} (T_{i} - (1 - T_{i}) \frac{p_{θ} (X_{i})}{1 - p_{θ} (X_{i})}) X_{i} = 0, \\ H_{ATT} (θ) & = \sum_{i = 1}^{N} - (1 - T_{i}) \exp (X_{i}^{T} θ) X_{i} X_{i}^{T} . \end{aligned}

Because

H_{ATT} (θ)

are negative semi-definite,

θ

can be estimated as a global maximum of

L_{ATT}

, which is also the solutions to the balance equations

\frac{\partial L_{ATT}}{\partial θ} = 0

The proposed G-SBPS avoids some of the limitations of SBPS mentioned in the Introduction section. The parameter estimation of G-SBPS produces global optimization, because it minimizes the loss due to the overall and subgroup covariate imbalance. In data analysis practice, we often observed that the G-SBPS achieved near-zero weighted mean differences in covariates between treatment groups within each subgroup, which is referred to as exact subgroup (covariate) balance in this article. G-SBPS builds on the CBPS method, which has been shown to achieve exact overall covariate balance.¹² Unlike CBPS, however, G-SBPS incorporates interactions between covariates and subgroup indicators, enabling it to achieve exact subgroup covariate balance. The SBPS stochastically searches through a large number of parametric propensity score models to find one with the best overall and subgroup balance among the candidate models. The result of this process depends on the set of candidate models and may not produce the optimal solution or exact overall or subgroup balance.

The G-SBPS relies on a parametric propensity score model (1). It can be viewed as an application of the CBPS¹² to the subgroup propensity score analysis problems. It inherits some desired properties of CBPS, such as good covariate balance and also limitations, such as requiring correct specification of the propensity score model. In the next section, we further extend the G-SBPS to mitigate the effect of model misspecification.

2.3. Nonparametric propensity score model with controlled subgroup balance (kG-SBPS)

A misspecified propensity score model leads to covariate imbalance and bias in estimated treatment effects.¹⁷ A commonly used practice to alleviate this problem is to tweak the logistic model by adding covariate transformations or interaction terms, until a satisfactory covariate balance is achieved. However, this process is ad hoc, lacks methodologically justified guidelines, and does not achieve good balance. Some methods that force overall covariate balance (e.g. CBPS¹²) are subject to model misspecification and biased treatment effect estimation.¹⁹

We propose to improve the flexibility of the propensity score model in G-SBPS through reproducing kernel Hilbert space (RKHS), which transforms the observed covariate vector into an $N$ -dimensional vector of features on which overall and subgroup balance can be achieved. Specifically, we use the kernel PCA.²¹ First, we construct the kernel matrix $K_{N \times N}$ by producing a measure of similarity between any two subjects using a pre-specified kernel function. In this article, we use the Gaussian kernel $k (Z_{i}, Z_{j}) = \exp (- ‖ Z_{i} - Z_{j} ‖^{2} / σ)$ . Second, we conduct eigen-decomposition of the kernel matrix, that is $K = P D P^{T}$ . The feature space containing the covariate transformations is $ω (Z) = P D^{1 / 2}$ . Each column of $ω (Z)$ represents one transformed feature of the covariate matrix $Z$ . Because we would like to balance the most informative features, we only select a finite number of columns in $ω (Z)$ , corresponding to $99 %$ variance calculated from the eigenvalues. We denote the transformed feature space as $ω_{l_{99}} (Z)$ . We replace the observed covariates $Z_{i}$ in equation (4) of G-SBPS with the new transformed features $ω_{l_{99}} (Z)$ such that $X_{i} = [1, S_{i}^{T}, ω_{l_{99}} (Z)_{i}^{T}, S_{i 1} ω_{l_{99}} (Z)_{i}^{T}, \dots, S_{i K} ω_{l_{99}} (Z)_{i}^{T}]$ . We aim to achieve overall and subgroup balances in those transformed covariates instead of the original covariates. The bandwidth of the Gaussian kernel $σ$ , which determines the set of covariate transformations, is tuned by the algorithm below.

The existing subgroup propensity score methods, such as the SBPS,⁹ models the propensity score parametrically. Although it produces improved subgroup balance, it suffers from propensity score model misspecification (shown in the simulation results below). Nonparametric methods not designed for subgroup analysis, such as boosting,¹³ often give unsatisfactory subgroup balance, leading to biased subgroup treatment effect estimation.^9,10 The kernelized G-SBPS is the first subgroup propensity score analysis method that aims at both goals: flexible nonparametric modeling and the overall and subgroup balance.

When modeling the propensity score without using the outcome, the commonly used out-of-sample target for optimization, such as prediction error, does not always produce overall or subgroup balance, which may lead to suboptimal performance in treatment effect estimation. We propose to optimize the subgroup balance, while controlling the overall balance. The standardized difference (S/D) is used to measure covariate balance.^22,23 The S/D is the absolute difference in weighted mean between treated and untreated groups, divided by the pooled standard deviation of the weighted data.²⁴ In addition to the S/D of covariates in the overall population, we optimize the S/D in the subgroups. The details of this tuning process are presented in the supplementary materials (Algorithm 1). We use this algorithm to tune the bandwidth $σ$ of the kernelized G-SBPS. Specifically, we set the range of $σ$ to be the $0.1$ and $0.9$ quantile of the Euclidean distances between samples. The 20 candidate values of $σ$ are equally spaced on the log scale within this range. We choose the $σ$ that optimizes subgroup balance while controlling the overall balance. We name the analytical framework that couples the kernelized G-SBPS and this tuning process as kG-SBPS.

3. Simulations

In this section, we compare the proposed G-SBPS (parametric method) and kG-SBPS (nonparametric method) with two popular propensity score methods and two representative subgroup propensity score analysis methods under various numerical settings.

3.1. Simulation design

Let $G$ be the subgroup indicator, taking values in ${1, 2, \dots, K}$ . We use $K = 4$ . Consistent with prior recommendations for estimating subgroup treatment effects using propensity score weighting,^9,10 we choose $4$ subgroups for our simulation design to maintain adequate covariate overlap, stable weights, and manageable model complexity. These studies have shown that an excessively large number of subgroups may increase variance and deteriorate covariate balance in propensity score weighting methods. To assess the robustness of our approach, we additionally conducted simulations with ten subgroups; the corresponding design and results are presented in Section 4 of the Supplemental Material. The sample size of the $k$ -th subgroup is $N_{k} = 500$ . There are four covariates. $X_{1} \sim N (μ_{k}, 1)$ , where $μ_{k} = 3 - 3 (k - 1) / (K - 1)$ . $X_{2} \sim Uniform (0, 1)$ . $X_{3} \sim N (0, 1)$ . $X_{4} \sim Bernoulli (0.4)$ . We consider two propensity score model specifications:

PS1:
(Correct PS model) The propensity score model is a logistic regression with the main effects of covariates and subgroup-specific intercept:
$logit (π) = \sum_{k = 1}^{K} δ_{k} 1 {G = k} + β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3} + β_{4} X_{4}$
where $β = (- 0.2, - 0.2, 0.4, - 0.4)$ .
PS2:
(Misspecified PS model) The propensity score model has additional interaction and nonlinear terms:
$logit (π) = \sum_{k = 1}^{K} δ_{k} 1 {G = k} + β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3} + β_{4} X_{4} + β_{5} X_{1}^{2} + β_{6} X_{1} X_{4}$
where $β = (- 1.5, - 0.5, 0.5, - 0.5, 0.5, 0.5)$ for ATE estimation and $β = (- 1.5, - 0.8, 0.2, - 0.8, 0.5, 0.5)$ for ATT estimation.
In practice, the data analyst usually just uses the main effect terms. With many covariates, there are numerous possible interactions and nonlinear terms, and it is difficult to determine which ones should be added to the model. In this simulation study, we call PS1 the “correct PS model” because our parametric data analysis uses this model, and call PS2 the “misspecified PS model” because the parametric analysis does not include any interaction or nonlinear terms. For both PS1 and PS2, $δ_{k} = - 1 + 2 (k - 1) / (K - 1)$ .

We consider two outcome models: OM1:
(Standard outcome model): The outcome model includes the main effects of all covariates in the propensity score model. The treatment effects ${η_{k}}$ vary across subgroups.
$Y = 200 + \sum_{k = 1}^{K} η_{k} [1 (G = k) T] + 20 X_{1} + 10 X_{2} + 10 X_{3} + 10 X_{4} + ϵ$

OM2:
(Extended outcome model): The outcome model includes additional interactions and nonlinear transformations of the covariates. These additional terms are unknown to the data analyst and hence are not explicitly accounted for in the data analysis.
$Y = 200 + \sum_{k = 1}^{K} η_{k} [1 (G = k) T] + 20 X_{1} + 10 X_{2} + 10 X_{3} + 10 X_{4} - 5 X_{1}^{2} + 10 X_{1} X_{4} + ϵ$

The residual is $ϵ \sim N (0, 1)$ . The true subgroup treatment effects are $η_{k} = - 10 + 20 (k - 1) / (K - 1)$ . As stated above, our parametric propensity score model includes all the main effects of covariates. Theory suggests that consistent treatment effect estimation can be achieved even when the propensity score model is misspecified, as long as the propensity score adjustment balances all the linear terms in the outcome model.²⁵ This result is established in the context of a linear outcome model. If all covariates included in the outcome model have equal means between treated and control units after weighting, then their confounding effects are canceled out in a two-sample comparison of the outcome variable between the treatment groups. This unbiasedness result holds regardless of whether the propensity score model is correctly specified. Therefore, the OM1 and OM2 help us observe a performance difference in the different methods under model misspecification. In addition, as shown in Section 5 of the Supplemental Material, we examined the effect of violating the assumption of no unmeasured confounder by omitting one covariate in the propensity score analysis of our main simulation setting. Although consistent estimation is theoretically unattainable in this case, the proposed G-SBPS still achieved relatively small bias (around $10 %$ ), demonstrating some robustness to unmeasured confounding. We emphasize that all propensity score methods are built on the assumption of no unmeasured confounders. This simulation result only reflects the effect of omitting a confounder from our main simulation setting and may not generalize. Nonetheless, by minimizing imbalance in the observed covariates, the G-SBPS may also indirectly reduce imbalance in the omitted covariate when it is correlated with the observed covariates in certain ways.

The two propensity score models (PS1, PS2) and the two outcome models (OM1, OM2) produce four scenarios. We simulated data under each scenario, and evaluated the performance of the following six methods in the subgroup treatment effect estimation and overall and subgroup balance. (a)
Logistic: the logistic regression analysis with the main effects of observed covariates and the subgroup indicator (R package glm).
(b)
Logistic-S: separately fitted logistic models within each subgroup. Each model includes the main effects of covariates. This was implemented in R package WeightIt.
(c)
CBPS: the just-identified CBPS with the main effects of covariates and the subgroup indicator (R package CBPS).
(d)
SBPS: implemented in the SBPS function of R package WeightIt with the main effects of covariates.
(e)
G-SBPS: the proposed parametric G-SBPS method with the main effects of covariates.
(f)
kG-SBPS: the proposed nonparametric kG-SBPS method.

We studied both ATE and ATT estimation these are widely used estimands. The treatment effect estimators were evaluated by percent bias and root mean squared error (RMSE) in each subgroup. The overall or subgroup covariate balance were quantified by the S/D of covariates in the overall population or the subgroups. In each simulation scenario, the results were aggregated from 500 Monte Carlo repetitions.
3.2. Covariate balance

First, we examine the overall balance. When the propensity score model was correctly specified (PS1), all methods under comparison had good overall balance in the sense that the S/Ds were generally less than $5 %$ (Figures S1 and S2). This result held when the estimand was either ATE or ATT. CBPS and G-SBPS had nearly zero imbalance in $X_{1}$ - $X_{4}$ because they both used the correct model and had balance constraints on these covariates. The balance is slightly worse with the interaction terms because they are not in the propensity score model used by logistic, logistic-S, CBPS, SBPS, and G-SBPS. The kG-SBPS, in contrast, achieved notably better balance in these two interaction terms because it is a nonparametric method and it has built-in balance constraints on a large number of covariate transformations. The SBPS and G-SBPS are both parametric methods, but the latter had better performance, probably due to its more effective balance control. While G-SBPS had constraints on subgroup balance and CBPS had constraints on overall balance, the G-SBPS achieved comparable overall balance as the CBPS, because good subgroup balance implies good overall balance.

When the propensity score model was misspecified (PS2), the only method that maintained good performance is the kG-SBPS because it is the only nonparametric method. CBPS and G-SBPS maintained good balance in $X_{1}$ - $X_{4}$ , because this was what their balance constraints were designed for. However, the enlarged imbalance in the interaction terms contradicted with the property of the propensity scores, suggesting that the “twisting” of a parametric propensity score model to satisfy the balance constraints under misspecification may inadvertently create scores that are not propensity scores. Therefore, checking the goodness-of-fit is as important as checking the balance when using a parametric model for propensity score.

Next, we study the subgroup balance and subgroup treatment effects. Figures 1 and 2 present the results from ATE estimation, and we comment on those results here. The results from ATT estimation are presented in Fig S3-S4 and they produce similar conclusions. Comparing Figures 1 and 2, we observe the expected results that all parametric methods performed better under the correct PS model (PS1). The subgroup S/Ds are usually higher than the corresponding overall S/Ds because the subgroups have smaller sample sizes. Nonetheless, the subgroup S/Ds are generally less than 10% under the PS1. The Logistic-S and SBPS methods performed better than Logistic and CBPS in subgroup balance, because the former methods were designed for subgroup propensity score analysis. This is the opposite of the overall balance results in Figure S1, where the latter methods were better. The proposed G-SBPS and kG-SBPS had equivalent or better performance than all other methods in terms of subgroup balance.

Figure 1.

Boxplots of the standardized differences (S/D; %) in the four subgroups when estimating the ATE in the simulation studies. The data are simulated from the Correct PS model (PS1). The boxplots show the distribution of S/D from 500 Monte Carlo repetitions. Red line: 10% S/D; green line: 5% S/D.

Figure 2.

Boxplots of the standardized differences (S/D; %) in the four subgroups when estimating the ATE in the simulation studies. The data are simulated from the Misspecified PS model (PS2). The boxplots show the distribution of S/D from 500 Monte Carlo repetitions. Red line: 10% S/D; green line: 5% S/D.

Under the misspecified PS model (PS2), Logistic, Logistic-S, CBPS and SBPS have deteriorated subgroup balance performance in all covariates and their transformations, as expected. The G-SBPS still gives exact subgroup balance of $X_{1}$ - $X_{4}$ despite the propensity score model misspecification, because it enforces the subgroup balance constraints. This is a step forward in the protection against model misspecification compared to the other 4 methods. The G-SBPS does not properly balance $X_{1}^{2}$ and $X_{1} X_{4}$ terms because they are not in the propensity score model. This deficiency was addressed by the kG-SBPS, which produced the best subgroup balance results in all scenarios.

In addition, to evaluate the effect of small sample size on the performance of the proposed methods, we applied all methods on the simulations with a correct PS model (PS 1) with only 40 units for subgroup 2. We found that only G-SBPS achieves good subgroup balance for subgroup 2 if $5 %$ of S/D is used as the threshold; this was observed in both the ATE and ATT estimation (Figures S7 and S8). This finding is supported by the theoretical justification that the estimation of G-SBPS is globally optimal. Therefore, G-SBPS always leads to the smallest weighted mean difference between two treatments in the covariate-defined subpopulations. Intriguingly, kG-SBPS does not achieve subgroup balance in this case, which suggests that it needs a larger sample size for accurate estimation, as expected from a nonparametric method.

3.3. Treatment effect estimation

Tables 1 and 2 show the estimation of subgroup ATEs under the correct or misspecified PS models. Lower % bias and smaller RMSE indicate better performance. The two methods that explicitly deal with subgroup balance (Logistic-S, SBPS) perform better than the ones that do not (Logistic, CBPS). This is consistent with the subgroup covariate balance results above. Compared with the other four methods (Logistic, Logistic-S, CBPS, SBPS), the G-SBPS has the best % bias and RMSE, due to its use of the globally optimal solution to the subgroup balance constraints (Section 2). The kG-SBPS has more variation than the G-SBPS, due to its nonparametric nature. Nonetheless, the kG-SBPS still has better % bias and RMSE than the other four methods.

Table 1.
The performance of various methods in the estimation of subgroup ATEs in the simulation.

Outcome Subgroup Logistic Logistic-S CBPS SBPS G-SBPS kG-SBPS

%Bias Standard 1 −0.35 0.23 −0.04 0.23 −0.09 −1.45

2 −4.85 −1.56 −4.54 −1.56 −0.17 −1.20

3 −1.41 0.37 −1.53 0.37 0.22 0.59

4 −0.80 −0.31 −0.91 −0.31 −0.01 0.38

Extended 1 −1.76 −1.49 −1.72 −1.49 −1.45 −1.23

2 −3.95 −2.33 −3.60 −2.33 −2.07 −2.43

3 0.23 3.13 0.06 3.13 2.95 0.76

4 −1.15 −0.71 −1.21 −0.71 −0.38 0.14

RMSE Standard 1 2.98 1.34 2.59 1.34 0.14 0.42

2 1.97 0.67 2.01 0.67 0.11 0.25

3 1.92 0.26 2.11 0.26 0.10 0.18

4 2.03 0.47 2.13 0.47 0.10 0.20

Extended 1 3.48 2.02 3.10 2.02 1.31 0.60

2 1.98 1.13 1.98 1.13 0.96 0.27

3 1.87 0.85 2.02 0.85 0.82 0.23

4 2.53 1.10 2.62 1.10 0.88 0.33

	Outcome	Subgroup	Logistic	Logistic-S	CBPS	SBPS	G-SBPS	kG-SBPS
%Bias	Standard	1	−0.35	0.23	−0.04	0.23	−0.09	−1.45
		2	−4.85	−1.56	−4.54	−1.56	−0.17	−1.20
		3	−1.41	0.37	−1.53	0.37	0.22	0.59
		4	−0.80	−0.31	−0.91	−0.31	−0.01	0.38
	Extended	1	−1.76	−1.49	−1.72	−1.49	−1.45	−1.23
		2	−3.95	−2.33	−3.60	−2.33	−2.07	−2.43
		3	0.23	3.13	0.06	3.13	2.95	0.76
		4	−1.15	−0.71	−1.21	−0.71	−0.38	0.14
RMSE	Standard	1	2.98	1.34	2.59	1.34	0.14	0.42
		2	1.97	0.67	2.01	0.67	0.11	0.25
		3	1.92	0.26	2.11	0.26	0.10	0.18
		4	2.03	0.47	2.13	0.47	0.10	0.20
	Extended	1	3.48	2.02	3.10	2.02	1.31	0.60
		2	1.98	1.13	1.98	1.13	0.96	0.27
		3	1.87	0.85	2.02	0.85	0.82	0.23
		4	2.53	1.10	2.62	1.10	0.88	0.33

The correct PS model (PS1) was used. The true treatment effect for subgroup 1 to 4 are $- 10$ , $- 10 / 3$ , $10 / 3$ , and $10$ , respectively. The results were aggregated from 500 Monte Carlo repetitions.

Table 2.

The performance of various methods in the estimation of subgroup ATEs in the simulation.

	Outcome	Subgroup	Logistic	Logistic-S	CBPS	SBPS	G-SBPS	kG-SBPS
%Bias	Standard	1	−204.95	108.82	−207.29	32.86	−0.14	−5.20
		2	−272.65	197.57	−286.45	180.88	−0.15	−7.14
		3	−291.67	32.14	−277.67	32.14	−0.11	−0.61
		4	−193.97	−17.49	−191.34	−17.49	−0.15	−2.64
	Extended	1	32.62	53.51	33.52	37.22	33.16	2.34
		2	−9.94	216.10	−14.31	205.92	114.39	1.19
		3	317.42	−73.53	−305.91	−73.53	−95.91	−4.97
		4	−236.97	−47.23	−234.09	−47.23	−29.74	−6.52
RMSE	Standard	1	20.57	16.21	20.80	12.73	0.15	0.66
		2	9.45	7.56	9.85	7.44	0.11	0.38
		3	9.95	1.30	9.49	1.30	0.09	0.21
		4	19.49	2.20	19.23	2.20	0.12	0.41
	Extended	1	3.94	8.61	4.03	4.93	3.91	0.46
		2	2.16	8.02	2.12	7.57	4.05	0.29
		3	10.78	2.65	10.40	2.65	3.33	0.29
		4	23.80	5.09	23.51	5.09	3.24	0.79

The benefit of kG-SBPS is best shown when the model is under misspecification, where it is the only method that performs well in all scenarios. Even under misspecification, the G-SBPS did not break down in all scenarios like the other four methods: it still performed well under standard outcome model. This is due to the theory by Hazlett,²⁵ which states that even when a propensity score model is misspecified, as long as it balances the linear additive terms in the outcome model (which is the case here with the standard outcome model), the misspecification does not cause bias to the treatment effect estimation. Of the three parametric subgroup propensity score analysis methods (Logistic-S, SBPS, G-SBPS), only our proposed method exploited this theoretical result, which produces the doubly robust-like performance shown in Table 2. This is because the optimal solution from the G-SBPS results in exact subgroup balance, while Logistic-S and SBPS do not have this guarantee.

The estimation of ATT is presented in the online supplementary materials (Tables S1 and S2). The results are similar, demonstrating better performance of the proposed G-SBPS and kG-SBPS methods over the four other existing methods (Logistic, Logistic-S, SBPS, CBPS). The G-SBPS has no bias under the standard outcome model but some bias under the extended outcome model. The kG-SBPS shows some bias in ATT estimation under both outcome models, although the bias is generally smaller than the other four existing methods. Increasing the subgroup sample size to 1000 per group can reduce the bias of kG-SBPS to no or slight bias in ATT estimation, which suggests that kG-SBPS requires a larger sample size as a nonparametric method (Table S3).

4. Data applications

4.1. Right heart catheterization (RHC) data

We applied the proposed methods, G-SBPS and kG-SBPS, to the right heart catheterization (RHC) data²⁶ to examine the ATE of RHC versus non-RHC on the length of hospital stay. The data set contains $5735$ subjects, including $2184$ receiving RHC and $3551$ who did not (non-RHC). The study design was reported previously.^26,27 We excluded one subject from the analysis due to a missing outcome value. The observed covariates include demographic characteristics, comorbidity conditions, lab test results, etc. Among the 72 covariates, 57 were tested to have statistically significant mean difference between the RHC and non-RHC groups, and these covariates were included in our subgroup analysis.

For illustration, we studied the subgroup treatment effects of RHC in a non-overlapping subgroup scheme and an overlapping subgroup scheme (Table S4). In the former scheme, three non-overlapping subgroups (“3-subgroup scheme”) were defined from mean blood pressure (<80, 80–120,or >120 mmHg); each patient belongs to only one subgroup. In the latter scheme, six overlapping subgroups were defined (“6-subgroup scheme”), with three based on the mean blood pressure and another three based on the estimated probability of surviving 2 months. These six subgroups overlap because a patient can belong to a blood pressure subgroup and a survival probability subgroup simultaneously. Here the estimated 2 months survival probability was calculated using the SUPPORT prognostic model.²⁸ It is known that high blood pressure can induce cardiovascular damage, which may lead to worse prognosis after RHC treatment. In addition, patients with lower estimated survival probability are usually sicker and may need longer hospital stay after the RHC treatment. These are the motivations to study those subgroups. The Logistic-S and SBPS methods do not work with overlapping subgroups. Therefore, we compared all six methods in the simulation among the three non-overlapping subgroups, but excluded Logistic-S and SBPS from the analysis with the overlapping subgroups.

All methods achieved overall covariate balance for the two subgroup schemes in the sense that the S/Ds were generally less than 5%, with CBPS and G-SBPS being the best (Figure S5). The subgroup balance results are in Figure 3. For the 3-subgroup scheme in Figure 3(a), all methods give reasonably good subgroup balance. The two subgroup analysis methods, Logistic-S and SBPS, performed better than Logistic and CBPS, but performed worse than the proposed methods, G-SBPS and kG-SBPS. For the 6-subgroup scheme in Figure 3(b), G-SBPS and kG-SBPS had notably better subgroup balance than Logistic and CBPS. On average, G-SBPS has slightly better subgroup balances than kG-SBPS, which may be attributed to the larger variation in kG-SBPS results, a typical bias-variance trade-off phenomenon between parametric versus nonparametric methods. These observations are consistent with the results of simulation studies.

Figure 3.

Boxplots of the subgroup standardized differences (S/D) of all covariates in the RHC data analysis. Red line: 10% S/D; green line: 5% S/D.

Before propensity score adjustment, the average lengths of hospital stay in the RHC was on average 4.20, 6.28, 8.22, 3.54, 5.59, and 4.87 days longer than in the non-RHC in the subgroups 1 to 6, respectively. The estimated subgroup ATEs by various methods are reported in Table S5. They are considerable differences in the estimated ATEs across subgroups, but these results are ignored in the usual propensity score analysis. This example highlights the importance of exploring subgroup treatment effects. The estimated subgroup ATEs are generally smaller after the propensity score adjustment, regardless which method was used. Notably, the estimated ATEs by the four subgroup analysis methods are smaller than the general propensity score methods (Logistic and CBPS), which may suggest that the improved subgroup covariate balance reduced heterogeneity between the RHC and non-RHC and hence also reduced bias. The ATEs of subgroup 1 (low blood pressure, <80) and subgroup 2 (normal blood pressure, 80–120) are similar after propensity score adjustment, but they are smaller than the ATE of subgroup 3 (high blood pressure, >120). These results show that the RHC causes longer hospital stay among patients with higher blood pressure. The RHC causes longer hospital stay in subgroup with highest mortality risk (subgroup4), whose baseline health may be worse.

4.2. Diabetes outpatient self-management training (DSMT) services data

We applied G-SBPS and kG-SBPS to a second dataset to illustrate their performance in estimating the ATT. The ATT is best applied to situations where the treated group has a much smaller sample size than the control group. We used Texas Cancer Registry-Medicare linkage data for diabetes self-management training (DSMT) program among 5 $+$ year cancer survivors with diabetes, aged 66 $+$ , and alive in 2006–2019.²⁹ If patients were eligible for multiple years, the first year meeting the eligibility criteria was selected as the index year. The treatment variable is the receipt of the DSMT training versus not. The outcome variable is the hospitalization rate within 3 years of the index year. The original data contain 3348 patients who received DSMT for the first time and 71,307 controls. We excluded around $2 %$ patients whose race is not identified as Hispanic, White or Black. The resulting dataset includes 3283 treated patients (DSMT) and 69,871 controls (no DSMT) in 2006–2018. As a pre-processing step to balance key patient characteristics, we performed 1 to 5 matches between treated and untreated subjects by gender, the type of diabetes, and incident diabetes. Since the pool of controls is very large and our goal is to estimate the ATT, this step controls for key covariates without excessive loss in statistical power. This step is for the DSMT data application only and is not a part of our proposed methods. The resulting matched data contains all 3283 treated units and $16, 401$ matched controls, which we used to illustrate our methods. Table S6 presents the summary statistics of the baseline covariates in the matched data. Baseline demographic covariates were extracted from Medicare enrollment file in the index year. Baseline comorbidities and diabetic complications were identified from inpatient and outpatient claims in the year before the index year. This study examines the effect of DSMT training on the hospitalization rate.

We are interested in the ATTs in subgroups of various social statuses, which are described in Table S7. Patients eligible for dual Medicaid or living in the metropolitan area have easier access to medical care. But we also speculate that patients living in rural areas or city may have different lifestyles, which can contribute to the heterogeneity in the effect of DSMT training. Similarly, married and unmarried patients might respond to DSMT training differently due to the differences in their way of life or personalities. There is overlap among the six subgroups. Again, Logistic-S and SBPS were only applied to the non-overlapped subgroups 1–4 (Table S7). All subgroups have relatively sufficient sample sizes except subgroup 4, which allows us to study the proposed method with a small subgroup size.

For both overlapped and non-overlapped subgroups, global balances are attained by all methods, with less than $1 %$ of S/D in most covariates (Figure S6). However, the subgroup balances are not achieved by all methods (Figure 4). For subgroup 1, all methods gave good balance. For subgroups 2 to 4, Logistic and CBPS did not achieve good balance for all covariates at the benchmark level of $10 %$ S/D. Logistic-S, SBPS, G-SBPS, and kG-SBPS achieved good balance, although G-SBPS and kG-SBPS resulted in even smaller S/D on average. For the overlapping subgroups 5 and 6, G-SBPS and kG-SBPS achieved better balance than other methods. However, the balances of Logistic and CBPS are reasonably well. In summary, Logistic and CBPS produced the worst subgroup balance, while G-SBPS and kG-SBPS resulted in the best subgroup balance.

Figure 4.

Boxplots of the subgroup standardized differences (S/D) of all covariates in the DSMT data analysis. Red line: 10% S/D; green line: 5% S/D.

Before propensity score adjustment, the average rates of 3-year hospitalization from the index year are $5.9 %$ , $18.7 %$ , $6.5 %$ , $33.7 %$ , $11.7 %$ , and $2.7 %$ lower in the DSMT than the control, from subgroups 1 to 6, respectively. Generally, the effect size becomes smaller after propensity score adjustment (Table S8). The impact of DSMT is most pronounced among individuals residing in metropolitan areas and possessing a dual-eligible health plan during the index year, and least among the population characterized by both factors being negative (Tables S7 and S8), following propensity score adjustment. These observations support our hypothesis that easier access to medical care may boost the effect of DSMT. For subgroup 1 with the largest sample size, the estimated ATTs are around $4 %$ after adjustment of all methods. For subgroups 2, 3, and 4, the estimated ATTs are similar for the four subgroup analysis methods, but larger for the general propensity score methods (Table S8). We attribute this result to the better subgroup covariate balance from the subgroup analysis methods (Table S6). DSMT training has a larger effect on the rate of hospitalization among the unmarried subgroups (Table S8). The ATTs were reduced to around $3 %$ by G-SBPS and kG-SBPS and $1 %$ by Logistic and CBPS among unmarried subgroups (Table S8).

5. Discussion

Subgroup causal effect estimation has wide application, but received limited attention in the propensity score analysis field. Previously, we demonstrated that global covariate balance is not equivalent to having propensity score’s balancing property when the fitted propensity score model is subject to misspecification.¹⁵ Thus, propensity score methods that optimize the global balance, such as CBPS, may result in subgroup imbalance and biased subgroup treatment effect estimation. There are critical limitations in the current subgroup propensity score analysis methods. Firstly, subgroup analysis methods, such as SBPS, are not applicable to overlapped subgroups. Secondly, while the SBPS can improve the subgroup balance (as shown in the numerical studies in this article), it suffers from suboptimal parameter estimation and may not ensure adequate subgroup balance. We propose the novel G-SBPS (parametric method) and kG-SBPS (nonparametric method) that achieve exact subgroup balance through globally optimal parameter estimations. Our numerical studies demonstrated that the proposed methods significantly outperform the existing methods in the literature.

G-SBPS shows a doubly robust-like property, that is if the fitted model contains all the terms in either the true propensity score model or the true outcome model, the estimated treatment effect shows no bias and small RMSE. Our simulations demonstrate this (Tables 1, 2, S1, and S2). Being a nonparametric method, kG-SBPS requires large subgroup sample sizes for its good performance (Table S3, Figures S7 and S8). However, kG-SBPS is more robust to model misspecification, especially when both the propensity score and outcome models are unknown (Tables 2, S2, and S3). While not pursued in this article, it is straightforward to augment the G-SBPS and kG-SBPS estimators with an outcome model for each treatment group to form a doubly robust estimator. Closed-form formula are available (e.g. Lunceford and Davidian⁴). Since the kG-SBPS reduces the misspecification of the propensity score model, this will help the bias control and efficiency improvement in the doubly robust estimation. The constraints used by the G-SBPS improves the finite sample balancing properties of the propensity score weights, for both the overall sample and subgroups. This is expected to improve the finite sample performance of the doubly robust estimation. Hence, it is necessary to conduct model diagnostics, particularly when the subgroup sample sizes are small. If the data analysts are confident that enough covariate terms have been included in the model, G-SBPS should perform the best because it has less variability than the kG-SBPS.

The kG-SBPS may depend on a kernel function to measure similarity in covariate space. In this work, we use the Gaussian kernel due to its smoothness, flexibility, and widespread success in nonparametric learning applications. While other kernels may behave differently, existing evidence suggests that performance differences are often modest when reasonable kernels are used. Thus, we expect kG-SBPS to be relatively robust to kernel choice. Furthermore, the bandwidth $σ$ is tuned by optimizing global and subgroup balance (Algorithm 1), without using outcome information. This outcome-free tuning helps prevent bias arising from data snooping in treatment effect estimation.

Next, we would like to discuss on how to choose between G-SBPS and kG-SBPS in practice. This choice can be guided by considerations of model complexity, potential nonlinearities, and covariate balance diagnostics. G-SBPS relies on a parametric specification of the propensity score and achieves global covariate balance through direct optimization, making it efficient when the parametric model is approximately correct. In contrast, kG-SBPS employs a kernel-based approach, which provides greater flexibility to capture nonlinear relationships and interactions, but may reduce efficiency if the parametric model is adequate. In practical applications, if prior knowledge or exploratory analysis suggests that the relationships between covariates and treatment assignment are relatively simple, G-SBPS is generally preferred. Conversely, if complex or nonlinear relationships are suspected, kG-SBPS may offer improved balance. Ultimately, empirical covariate balance diagnostics after weighting provide critical guidance for method selection, ensuring that the chosen method adequately balances both main effects and desirable higher-order terms.

We have focused on propensity score-based methods in this article, and the propensity score model development does not use any information from the outcomes. In addition, since our method is based on inverse probability of treatment weighting, it is straightforward to incorporate outcome models and calculate a doubly robust estimator.^4,19 The subgroup balancing can also be performed using “balancing weights,” that is searching for weights that directly balance covariates without explicitly using propensity scores.³⁰ The relationship and relative merits of model-based versus direct balancing methods are an ongoing debate,^17,31 and a comparison of their performance in the context of subgroup causal analysis will be future work. Other topics of future interest include developing new model selection methods for the subgroup propensity score analysis, and extending G-SBPS and kG-SBPS to multiple treatment groups.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802251415157 - Supplemental material for Parametric and nonparametric propensity score weighting analysis with subgroup covariate balance

Supplemental material, sj-pdf-1-smm-10.1177_09622802251415157 for Parametric and nonparametric propensity score weighting analysis with subgroup covariate balance by Yan Li, Yong-Fang Kuo and Liang Li in Statistical Methods in Medical Research

Footnotes

ORCID iD

Liang Li

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by NIH grants R01CA225646 and P30CA016672, and CPRIT grant RP210130.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.

References

Stuart

. Matching methods for causal inference: a review and a look forward. Stat Sci 2010; 25: 1–21.

Rosenbaum

Rubin

. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 1984; 79: 516–524.

Vansteelandt

Daniel

. On regression adjustment for the propensity score. Stat Med 2014; 33: 4053–4072.

Lunceford

Davidian

. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med 2004; 23: 2937–2960.

Hill

. Bayesian nonparametric modeling for causal inference. J Comput Graph Stat 2011; 20: 217–240.

Chipman

George

McCulloch

. BART: Bayesian additive regression trees. Ann Appl Stat 2010; 4: 266–298.

Wager

Athey

. Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc 2018; 113: 1228–1242.

Rubin

. For objective causal inference, design trumps analysis. Ann Appl Stat 2008; 2: 0.

Dong

Zhang

Zeng

, et al. Subgroup balancing propensity score. Stat Methods Med Res 2020; 29: 659–676.

10.

Yang

Lorenzi

Papadogeorgou

, et al. Propensity score weighting for causal subgroup analysis. Stat Med 2021; 40: 4294–4309.

11.

Rosenbaum

Rubin

. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.

12.

Imai

Ratkovic

. Covariate balancing propensity score. J R Stat Soc B: Stat Methodol 2014; 76: 243–263.

13.

McCaffrey

Ridgeway

Morral

. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods 2004; 9: 403–425.

14.

Zhao

. Covariate balancing propensity score by tailored loss functions. Ann Stat 2019; 47: 965–993.

15.

. Propensity score analysis with local balance. Stat Med 2023; 42: 2637–2660.

16.

Lee

Lessler

Stuart

. Improving propensity score weighting using machine learning. Stat Med 2009; 29: 337–346.

17.

. Propensity score analysis methods with balancing constraints: a Monte Carlo study. Stat Methods Med Res 2021; 30: 1119–1142.

18.

Morgan

Zaslavsky

. Balancing covariates via propensity score weighting. J Am Stat Assoc 2017; 113: 390–400.

19.

Mao

Greene

. Propensity score weighting analysis and treatment effect discovery. Stat Methods Med Res 2018; 28: 2439–2454.

20.

Hansen

Heaton

Yaron

. Finite-sample properties of some alternative GMM estimators. J Bus Econ Stat 1996; 14: 262–280.

21.

Schölkopf

Smola

. Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge, MA, USA: The MIT Press, 2002.

22.

Griffin

McCaffrey

Almirall

, et al. Chasing balance and other recommendations for improving nonparametric propensity score models. J Causal Inference 2017; 5: 0.

23.

Cannas

Arpino

. A comparison of machine learning algorithms and covariate balance measures for propensity score matching and weighting. Biometr J 2019; 61: 1049–1072.

24.

Greene

. A weighting analogue to pair matching in propensity score analysis. Int J Biostat 2013; 9: 215–234.

25.

Hazlett

. Kernel balancing: a flexible non-parametric weighting procedure for estimating causal effects. Stat Sin 2020; 30: 1155–1189.

26.

Connors

AFJ

Speroff

Dawson

, et al. The effectiveness of right heart catheterization in the initial care of critically ill patients. SUPPORT investigators. JAMA 1996; 276: 889–897.

27.

Hirano

Imbens

. Estimation of causal effects using propensity score weighting: an application to data on right heart catheterization. Health Serv Outcomes Res Methodol 2001; 2: 259–278.

28.

Knaus

Harrell

Lynn

, et al. The SUPPORT prognostic model: prediction of survival for seriously ill hospitalized patients. Ann Intern Med 1995; 122: 191–203.

29.

Lee

Digbeu

BDE

Serag

, et al. Utilization of diabetes self-management program among breast, prostate, and colorectal cancer survivors: using 2006–2019 texas medicare data. PLoS One 2023; 18: e0289268.

30.

Ben-Michael

Feller

Rothstein

. Varying impacts of letters of recommendation on college admissions: approximate balancing weights for subgroup effects in observational studies. 2020, arXiv e-prints. arXiv:2008.04394.

31.

Ben-Michael

Feller

Hirshberg

, et al. The balancing act in causal inference. 2021, arXiv e-printsp. arXiv:2110.14831.