Abstract
Causal mediation effect estimates can be obtained from marginal structural models using inverse probability weighting with appropriate weights. In order to compute weights, treatment and mediator propensity score models need to be fitted first. If the covariates are high-dimensional, parsimonious propensity score models can be developed by regularization methods including LASSO and its variants. Furthermore, in a mediation setup, more efficient direct or indirect effect estimators can be obtained by using outcome-adaptive LASSO to select variables for propensity score models by incorporating the outcome information. A simulation study is conducted to assess how different regularization methods can affect the performance of estimated natural direct and indirect effect odds ratios. Our simulation results show that regularizing propensity score models by outcome-adaptive LASSO can improve the efficiency of the natural effect estimators and by optimizing balance in the covariates, bias can be reduced in most cases. The regularization methods are then applied to MIMIC-III database, an ICU database developed by MIT.
Keywords
1 Introduction
1.1 Background
In the field of Biostatistics or medical research, often the focus is to establish causal relations between risk factors and disease outcomes using observational data, for which adjustments must be done to address confounding issues. Marginal structural models using inverse probability weighting is an effective method to handle confounders.1,2 This method utilizes propensity score models which are commonly fitted by logistic regression. In a simple binary treatment setting, the propensity score is defined as the probability of being treated given the covariates. Building propensity score models using logistic regression can sometimes be challenging, since data nowadays may have very high dimensionality. Regularization methods including LASSO and its variants, have been developed to solve this problem and to select essential variables for propensity score models.3,4
In this paper, different regularization methods will be applied to a unique database: the MIMIC-III database. In the past 10 years, there has been a trend towards implementation of electronic health record systems in hospitals and the Medical Information Mart for Intensive Care (MIMIC) database developed by the Laboratory for Computational Physiology at MIT is freely accessible under a data agreement. 5,6,7 The database contains clinical data of patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts. Medical information including demographics, vital signs, laboratory results and nursing progress notes are available in this database; International Classification of Diseases (ICD-9) codes were documented on patient discharge. A code repository is available for the MIMIC-III database and researchers are encouraged to make contributions to the code repository, to enhance reproducibility in health research. 8
Using the MIMIC-III database, we are interested in investigating how the use of transthoracic echocardiography (TTE) affects 28-day mortality for patients with sepsis, a life-threatening condition. 9 However, diseases can develop from risk factors through a complicated indirect pathway. In other words, the risk factor of interest may affect the disease outcome through intermediate variables. Mediation analysis is a research area that focuses on modeling this indirect pathway to estimate both the direct and indirect effects of the risk factors on the outcome.
One of the earliest approaches to conduct mediation analysis, proposed by Baron and Kenny, 10 models associations between treatment, mediator, and outcome by linear regressions. Direct and indirect effects can then be estimated from the fitted regression models. However, the estimated direct and indirect effects do not readily bear causal meanings. Robins et al. 2 highlighted that marginal structural models (MSMs) with inverse probability weighting can be used to establish a causal relationship between treatment and outcome in a setting without mediators. Later, VanderWeele, 11 Hong et al., 12 and Lange et al. 13 extended ideas of Robins et al. 2 and suggested that causal mediation analysis can be conducted using MSMs with inverse probability weighting. In the following, we will define notation and review the MSM approach for mediation analysis in detail.
1.2 Notation and review of MSMs
Consider a binary outcome Y and a baseline covariate vector
In this paper, we will follow the MSM approach by Lange et al.
13
for mediation analysis. Marginal structural models have many advantages in terms of assessing natural direct and indirect effects of a given treatment. Specifically, MSMs can model both the natural direct and indirect effects at the same time without incorporating a mediator into the models, resulting in parsimonious models.
13
A generalized linear MSM can be written in the following form
If the interaction term (c3) is zero and weights computed by equation (2) are used in MSMs, then the natural direct and indirect effect odds ratio is estimated as
In order to compute the stabilized weights in equation (2), the treatment and mediator propensity score models need to be fitted. We assume the treatment propensity score model
When the covariate space is large, regularizing the fitted propensity score models is necessary. Parsimonious models are usually preferred, either from an estimation perspective or an inference perspective. In this paper, LASSO and its variants are used to select the important variables for both treatment and mediator propensity score models. For a given model, LASSO conducts variable selection by shrinking the coefficients of some variables to exactly zero, leading to a simpler model.
16
Suppose we have a logistic regression model
Adaptive LASSO (AdaLASSO) is one variant of the traditional LASSO and has better statistical properties than the traditional LASSO. Zou
17
developed an efficient algorithm for computing the AdaLASSO estimator, which is a global optimizer of the objective function
In Section 2, outcome-adaptive LASSO which is another variant of LASSO proposed by Shortreed and Ertefaie, 4 is outlined in detail. We then extend it to the mediation setting to estimate the natural direct and indirect effects. A simulation study is conducted to illustrate the performance of different regularization methods in Section 3. These methods are then applied to analyze a dataset extracted from the MIMIC-III database in Section 4. Section 5 is for discussion and conclusion.
2 Methodology
In the causal inference literature, variables that are only related to outcome are suggested for inclusion in the propensity score models. Rubin 18 stated that including variables that are not related to the outcome in the propensity score models can result in efficiency loss. In a later paper, Brookhart et al. 19 conducted simulation studies and discovered that adding variables that are only related to the outcome in the propensity score models can improve efficiency without increasing bias. Wyss et al. 20 also found similar results in a simulation setting with multiple outcomes. Zhu et al. 21 found that, by achieving balance in the covariates that are related to the outcome, both finite bias and variance of the causal estimates can be reduced. Following similar ideas, outcome-adaptive LASSO (OAL) was developed for estimating the propensity scores in a high-dimensional setting by incorporating outcome information in the penalty function. 4 In a mediation setting, it is natural to ask if such a principle of variable selection on the covariates still applies. In our simulation section, by examining the performance of several benchmark models, we found the propensity score model including the variables that are marginally related to the outcome variable leads to the most efficient estimates of the natural direct and indirect effects and yield the smallest mean squared error, compared to other models. This again supports our speculation that the variable selection principle in the non-mediation setting also applies to the mediation framework. Therefore, we would extend the outcome-adapative LASSO to mediation analysis.
2.1 Outcome-adaptive LASSO for mediation analysis
Outcome-adaptive LASSO is a modified version of adaptive LASSO. When restricting γ to be greater than 1 and
Let
and
For logistic regression, the outcome-adaptive LASSO estimate has similar form as the adaptive LASSO estimate. The outcome-adaptive LASSO estimates,
and
2.2 Selecting tuning parameters by optimizing balance
According to Rosenbaum and Rubin, 22 the propensity score is one type of balancing score, which means that the covariates will be balanced among treatment groups conditioning on the propensity scores. This balancing property, along with the strongly ignorable treatment assignment (SITA) assumption, implies that unbiased causal treatment effects can be obtained if sufficient covariate balancing has been achieved. 23 Following this idea, Shortree and Ertefaie 4 proposed that λA in equation (5) can be tuned by minimizing weighted absolute mean difference (wAMD), which is a balance statistic instead of a deviance statistic.
wAMD is a measure of how similar different exposure groups are and can be computed using the fitted treatment propensity score model as
Similarly, we can define wAMD in a way that utilizes the fitted mediator propensity score model, as the following
Small wAMD indicates that, after applying the weighting procedure, the means of the covariates between different exposure groups are close to each other. Therefore, optimal λA and λM should be chosen by minimizing wAMD values. We will illustrate this idea in Section 3.3.
3 A simulation study
In this section, a simulation study was conducted to investigate the finite sample performance of the proposed methods. Results of multiple benchmark models and regularized models are presented. These methods are evaluated based on the relative bias, standard deviation, and mean squared error (MSE) of estimated NDE, NIE, and TE odds ratios. Assuming there is no interaction between the treatment and the mediator, we simply denote the natural direct effect as NDE and the natural indirect effect as NIE, since
3.1 Simulation design
In the simulation study, 500 subjects (n = 500) and 200 continuous covariates (p = 200) are generated for each data set. Covariates are generated from a multivariate normal distribution with zero mean vector and identity covariance matrix, i.e.
After the covariates are created, binary variables including the treatment, mediator, and outcome variables are generated using the covariates. The treatment of each subject is generated using a Bernoulli distribution with probability of being treated produced by a logistic regression model, i.e.
We set Scenario A (weekly related): Scenario B (moderately related): Scenario C (strongly related):
In the simulation setup,

Relations among different simulated variables.
3.2 Modeling procedure
The treatment and mediator propensity score models need to be fitted before we can estimate NDE and NIE odds ratios by MSMs. After normalizing covariates to have mean zero and standard deviation one, treatment and mediator propensity score models are fitted using logistic regression with either LASSO, AdaLASSO, or OAL. For OAL, a linear regression model of
In addition, four types of benchmark models are also fitted using logistic regression based on different subsets of covariates for comparison. For example, the true treatment benchmark model is fitted using
Following the modeling procedure proposed by Lange et al.,
13
NDE and NIE odds ratios are estimated by MSMs (equation (1)) with weights computed by equation (2). In addition, the original data set needs to be replicated twice to create a new data set before we can calculate
3.3 Parameter tuning process
When fitting regularized treatment and mediator propensity score models, tuning parameters in the penalty term need to be chosen carefully to achieve reasonable coefficient estimates. In the simulation study, the LASSO regularization parameter (λ) is chosen by minimizing deviance with 10-fold cross-validation. Similarly, for each pre-specified γ value from a list: {0.5,1,2, …}, AdaLASSO and OAL regularization parameters (λn) are also chosen by minimizing deviance with 10-fold cross-validation.
In addition, for each pre-specified power parameter γ value: {0.5,1,2,…}, AdaLASSO and OAL regularization parameters are chosen by minimizing both wAMD(λA) and wAMD(λM), from a set of regularization parameter values: {
Simulation results show that the performance of different adaptive methods is not sensitive to the value of γ and there is a slight increasing trend in the bias and standard deviation as the value of γ increases. Therefore, in the following, we only show the simulation results for
3.4 Simulation results
The simulation results based on 1000 replications are displayed in Table 1 for NDE, Table 2 for NIE, and Table 3 for TE. Additional graphs used to visualize simulation results are available in Web Appendix C in the Supporting Information.
Performance measures of natural direct effect odds ratio estimators.
Performance measures of natural indirect effect odds ratio estimators.
Performance measures of total effect odds ratio estimators.
First, we observe that the four benchmark models yield fairly close relative bias, standard deviation, and MSE of the estimated odds ratio. Across all benchmark models, the outcome benchmark model yields the smallest standard deviation and MSE of all the estimated NDE, NIE, and TE odds ratios. Adding outcome-related variables into the propensity score model improve the efficiency of the causal estimators, which is a common phenomenon found in the causal inference literature. 19
Comparing OAL with AdaLASSO, i.e. OAL (deviance) vs. AdaLASSO (deviance) or OAL (wAMD) vs. AdaLASSO (wAMD), we found that by incorporating outcome information in the penalty weights, the standard deviation of the estimated odds ratios is reduced in most cases, indicating more efficient estimators. It also leads to smaller MSEs. This is again consistent with the finding from the benchmark models.
By comparing the criteria for choosing the tuning parameters, i.e. AdaLASSO (wAMD) vs. AdaLASSO (deviance) or OAL (wAMD) vs. OAL (deviance), we found in Scenarios B and C, where the outcome-related covariates are at least moderately related to the outcome variable, wAMD leads to much less biased estimates of NDE and TE odds ratios. This indicates the bias of the causal estimates can be reduced if the balance in the covariates is optimized.24–26
Overall, all regularization methods we implement here outperform the benchmark models where the relationships among the variables are completely known. The underlying reason is that the propensity scores are nuisance parameters and by over-fitting the propensity scores, we can correct for the randomness and obtain better causal estimates. The phenomenon that the true propensity score model may perform worse than the estimated propensity scores has been shown in the literature, both theoretically and empirically, e.g. Lunceford and Davidian 27 and Brookhart et al. 19 In general, OAL (deviance) tends to outperform the other regularization methods.
In addition, we examine which variables are selected by different regularization methods. Under Scenario B, we record the percentage of 1000 replications that each variable of
Variables selection results in scenario B.
4 Data analysis: Transthoracic echocardiography and mortality
TTE has been widely adopted in hospitals during the past 10 years and some research has been conducted to examine the effectiveness of TTE on patients’ mortality. Feng et al. 9 found that the use of TTE can significantly reduce the 28-day mortality for ICU patients with sepsis and suggested that secondary analyses might be helpful to conduct, in order to understand the underlying causal mechanism. We employ the proposed methods to study whether the use of TTE affects patients’ mortality indirectly. More specifically, we are interested in knowing whether the use of TTE would affect 28-day mortality by changing how the doctors treat the patients. Our analysis is based on the same data set used in the article of Feng et al. 9 and this data set is extracted from the MIMIC-III database introduced in Section 1.
In this data set, patients who had TTE performed less than 24 h before their ICU admission or during their ICU stay are considered as the treatment group and the remaining patients are considered as the control group. Moreover, 28-day mortality from the date of ICU admission is the binary outcome variable. There are 39 variables in the covariate space, including demographics, interventions, comorbidities, vital signs, laboratory results, and admission information. Regularization methods discussed in this paper are used to select important variables from the covariate space, for both treatment and mediator propensity score models.
The data set contains 6361 patients and 3262 of them are in the treatment group, whereas 3099 of them are in the control group. In our data analysis, we focus on the four potential mediators with no missing values: ventilation-free days, vasopressor-free days, the use of dobutamine, and the maximum dose of norepinephrine, which are mentioned as secondary outcomes in Feng et al.. 9 We first use the mediation package in R 28 to conduct an exploratory data analysis. The estimated natural direct, indirect, and total effects (on the linear scale) by the bootstrap approach are summarized in Table 5. According to Table 5, vasopressor-free days, the use of dobutamine, and the maximum dose of norepinephrine have significant total effects and indirect effects. Since ventilation-free days do not have a significant indirect effect, we removed it from the list of potential mediators. Moreover, we found that the maximum dose of norepinephrine has a significant interaction between the treatment and the mediator using the test.TMint function in the mediation package. We did not observe a significant interaction term for the other three potential mediators.
Exploratory data analysis results of four potential mediators (p-value is the number in parentheses and * is used to indicate a significant effect at
It is worth noticing that more than half of the values are missing for several covariates including CVP values and laboratory results for BNP, troponin, and creatinine kinase. In Feng et al., 9 indicators are created to indicate whether patients are missing these covariates and we use the same set of indicators in our data analysis as well. Given that there are a large number of other variables that contain missing values, using imputation might create a tremendous amount of uncertainty. Therefore, we handle the other missing values using complete case analysis and patients who have any other covariates missing are excluded in our analysis; our reduced sample has 3021 patients with no missingness. In addition, six dummy variables are created to indicate which day of the week patients are admitted to ICU for the sake of computation efficiency. For mediators, the vasopressor-free days and the maximum dose of norepinephrine are dichotomized at their median for the applicability of the proposed method. For example, we replace each patient’s maximum dose of norepinephrine with 1 if the patient’s maximum dose of norepinephrine exceeds the median and 0, otherwise.
After handling the missing data issues and creating dummy variables, we then estimated the NDE and NIE log odds ratios using MSMs with different regularization methods for each of the three mediators: ventilation-free days, vasopressor-free days, and the maximum dose of norepinephrine. Here, we assume the potential mediators are conditionally independent given the treatment and baseline covariates, so natural direct and indirect effects can be estimated separately for each mediator. The treatment and mediator propensity score models were fitted using different regularization methods discussed in Sections 1 and 2, to compute weights for MSMs. NDE and NIE log odds ratio estimates can then be obtained from those MSMs. All MSMs for the three mediators suggest that the interaction term between the treatment and the mediator is not significant. Therefore, we only present results of the MSMs without the interaction term in this section. Treatment and mediator propensity score models with no variable selection were also fitted for comparison; we refer to these propensity score models as the full models. Data analysis results of the different regularization methods for the three mediators are summarized in Table 6. As in the simulation study, for AdaLASSO and OAL, we tried different values of the power parameter γ in the penalty weight. We only report results of the different regularization methods with the γ value that leads to the smallest standard error of the estimates.
Data analysis results of different regularization methods for three different mediators (p-value is the number in parentheses; only NIE estimates of dobutamine are non-significant and all other NDE/NIE estimates in this table are significant with a p-value < 0.001).
According to Table 6, all the estimated NDE log odds ratios obtained from the different regularization methods are significant for all three potential mediators. All these estimated NDE log odds ratios have negative signs indicating that the use of TTE can reduce the 28-day mortality directly. Moreover, the standard error of the estimated NDE log odds ratios obtained from the regularization methods is smaller than the standard error obtained from the full propensity score models with no variable selection conducted, which indicates that variable selection in propensity score models can lead to efficiency improvement.
From Table 6, we observe that the estimated NIE log odds ratios are not significant for the use of dobutamine and the estimated NIE log odds ratios are significant for vasopressor-free days and the maximum dose of norepinephrine. The positive signs for the estimated NIE log odds ratios for all three potential mediators indicate that the use of TTE may increase 28-day mortality through an indirect pathway. There are two possible explanations for the indirect pathway. The first explanation is that the use of TTE leads to an increase in the mediator level and then the increased mediator level increases patients’ mortality. An alternative explanation is that TTE reduces the mediator level and the reduced mediator level leads to a decrease in patients’ mortality. To find which explanation is more plausible, we can implement MSMs with inverse probability weighting to assess the causal relations between the potential mediator and mortality. In fact, according to the results of MSMs summarized in Web Appendix D in the Supporting Information, the use of TTE increases patients’ mortality rate by increasing the use of dobutamine or the maximum dose of norepinephrine; the use of TTE increases patients’ mortality rate by reducing the vasopressor-free days. In particular, the positive NIE log odds ratio estimates for dobutamine and norepinephrine seem to be reasonable since both have some adverse effects reported in the literature. 29
In Figure 2, matrix plots are used to show which covariates are selected in both the treatment and mediator propensity score models for dobutamine. If the cell of a given covariate is red, then the covariate is excluded from the propensity score model; if the cell of a given covariate is yellow, then the covariate is included in the propensity score model. We observe that the wAMD-based methods select more covariates in both the treatment and mediator propensity score models, compared to the deviance-based methods. In terms of estimating the NDE log odds ratio, the wAMD-based methods are less efficient since the propensity score models include too many covariates. Moreover, compared to AdaLASSO, OAL tends to select more covariates in the treatment propensity score model and thus is less efficient when estimating NDE and NIE log odds ratios. Matrix plots for the other two mediators are included in Web Appendix D in the Supporting Information.

Matrix plots of covariates selected in propensity score models for the use of dobutamine (indices at X-axis represent LASSO, AdaLASSO (deviance,
In conclusion, data analysis results of all 3 potential mediators suggest that the use of TTE can affect the 28-day mortality via a direct pathway and via an indirect pathway. It is worth noticing that when the power parameter
5 Concluding remarks
In this paper, the idea of outcome adaptive LASSO introduced by Shortreed and Ertefaie 4 is implemented and extended from a traditional causal inference setting to a mediation setting. By examining both oracle and regularized models, we observe that the efficiency of both NDE and TE odds ratio estimators can be improved significantly by incorporating outcome information into the propensity score models. Smaller relative bias is obtained by wAMD-based methods, which indicates bias reduction might be achieved by covariate balancing. In conclusion, based on our simulation results, estimating NDE and NIE odds ratios with propensity score models regularized by OAL (deviance) would be the best choice across all regularization methods, since this method yields the smallest MSEs. One limitation of the simulation study is that the logistic regression models we fit match the data-generating models. The setting is limited with respect to the number of covariates related to the treatment, mediator, and outcome variables, as well as to the zero relationship of the 196 out of the 200 covariates. The issue of model misspecification is not considered in our simulation study.
In the data illustration, the underlying mechanism of how TTE affects mortality is fully examined by the MSM approach. While previous studies show the use of TTE can reduce mortality rate, an interesting phenomenon is observed where the use of TTE can increase mortality by increasing either dobutamine use or norepinephrine dose; TTE can also increase mortality by reducing vasopressor-free days, according to our findings in Section 3.3.
While we focus on a binary treatment and a binary mediator variable in this article, the proposed methodology can be extended to other types of treatments and mediators. For example, if both the treatment and the mediator are continuous, the form of the inverse weights in equation (2) will remain the same but the conditional (marginal) probabilities should be replaced by the conditional (marginal) densities. While it appears to be a natural extension, the estimation of the conditional densities can be challenging because of the “curse of dimensionality”. Under the normality assumption, we can transform the estimation of the conditional density to the estimation of the conditional mean.2,30 For example, instead of estimating
Our methodological development and simulation results are based on a simple setting with only one mediator. However, it is possible that the indirect effect is carried by multiple mediators simultaneously and the mediators may interact with each other. This is often observed in real-world settings, such as in electronic health records. Advanced variable selection methods need to be developed to select the right subset of mediators and to estimate the mediator propensity score models. Regularization methods illustrated in this paper can be extended to this setting with multiple mediators.
Supplemental Material
sj-pdf-1-smm-10.1177_0962280221997505 - Supplemental material for Variable selection for causal mediation analysis using LASSO-based methods
Supplemental material, sj-pdf-1-smm-10.1177_0962280221997505 for Variable selection for causal mediation analysis using LASSO-based methods by Zhaoxin Ye, Yeying Zhu and Donna L Coffman in Statistical Methods in Medical Research
Footnotes
Acknowledgements
The authors thank Dr. Joel Dubin for critical readings of the original version of the paper and Sunny Huang for preparing the dataset. The content is solely the responsibility of the authors and does not necessarily represent the official views of NCI, OBSSR, or NIH.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Preparation of this article was supported by the National Institutes of Health (NIH) grant 1R01CA229542-01 (PI: Coffman) funded by the National Cancer Institute (NCI) and the Office of Behavioral and Social Science Research (OBSSR). Yeying Zhu’s research is supported by the National Sciences and Engineering Research Council of Canada (grant number RGPIN- 2017-04064).
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
