Sage Journals: Discover world-class research

Abstract

We used Monte Carlo simulations to compare the performance of marginal structural models (MSMs) based on weighted univariate generalized linear models (GLMs) to estimate risk differences and relative risks for binary outcomes in observational studies. We considered four different sets of weights based on the propensity score: inverse probability of treatment weights with the average treatment effect as the target estimand, weights for estimating the average treatment effect in the treated, matching weights and overlap weights. We considered sample sizes ranging from 500 to 10,000 and allowed the prevalence of treatment to range from 0.1 to 0.9. We examined both the robust variance estimator when using generalized estimating equations with an independent working correlation matrix and a bootstrap variance estimator for estimating the standard error of the risk difference and the log-relative risk. The performance of these methods was compared with that of direct weighting. Both the direct weighting approach and MSMs based on weighted univariate GLMs resulted in identical estimates of risk differences and relative risks. When sample sizes were small to moderate, the use of an MSM with a bootstrap variance estimator tended to result in the most accurate estimates of standard errors. When sample sizes were large, the direct weighting approach and an MSM with a bootstrap variance estimator tended to produce estimates of standard error with similar accuracy. When using an MSM to estimate risk differences and relative risks, it is in general preferable to use a bootstrap variance estimator rather than the robust variance estimator. We illustrate the application of the different methods for estimating risk differences and relative risks using an observational study on the effect on mortality of discharge prescribing of a beta-blocker in patients hospitalized with acute myocardial infarction.

Keywords

Inverse probability of treatment weighting, relative risk, risk difference, propensity score, variance estimation, bootstrap

1. Introduction

Propensity score weighting is frequently used to estimate the effects of treatments and interventions when using observational data. In a setting with a binary treatment or intervention (e.g. treatment vs control), the propensity score is the probability of a subject being treated (vs receiving the control) conditional on the subject's observed baseline covariates.¹ Propensity score weighting entails constructing weights that are derived from the propensity score. Several sets of propensity score-based weights have been proposed, each of which has a different target estimand. The original set of propensity score-based weights, referred to as inverse probability of treatment weights, are equal to the inverse of the probability of receiving the treatment that subject received.² These weights allow one to estimate the average treatment effect (ATE). Alternative sets of weights have as target estimand the average treatment effect in the treated (ATT) or the effect of treatment in those subjects for whom there is equipoise in treatment selection (i.e. those subjects whose propensity score is close to 0.5).^3–6 Note that the different estimands should not be seen as interchangeable and the choice of target estimand should be based on the research question of interest.⁷

Binary outcomes (e.g. death within 30 days or symptom resolution) are common in clinical and health services research.⁸ When outcomes are binary, the effect of treatment can be quantified using four metrics: the risk difference, the relative risk, the number needed to treat (NNT), and the odds ratio. Let Y denote the binary outcome, with Y = 1 denoting the occurrence of the outcome (e.g. death occurred within 30 days) and Y = 0 denoting the absence of the outcome (e.g. death did not occur within 30 days), and let Z denote treatment status (Z = 0 control vs Z = 1 treated). Let P₁ and P₀ denote the probability of the outcome in treated and control subjects, respectively: P₁ = Pr(Y = 1|Z = 1) and P₀ = Pr(Y = 1|Z = 0). The risk difference and the relative risk are defined as P₁ – P₀ and P₁/P₀, respectively. The NNT is the reciprocal of the risk difference and is equal to the number of subjects who must be treated to avoid one outcome event. The odds ratio is equal to $\frac{P_{1} / (1 - P_{1})}{P_{0} / (1 - P_{0})}$.
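As a small illustration of these four definitions, the following Python sketch computes each measure from a pair of outcome probabilities. The function name `effect_measures` and the input values are hypothetical and chosen only for illustration; the NNT is reported using the absolute value of the risk difference, which is a convention assumed here so that the number is positive.

```python
def effect_measures(p1: float, p0: float) -> dict:
    """Effect measures for a binary outcome.

    p1: probability of the outcome in treated subjects, Pr(Y = 1 | Z = 1)
    p0: probability of the outcome in control subjects, Pr(Y = 1 | Z = 0)
    """
    rd = p1 - p0                                   # risk difference
    rr = p1 / p0                                   # relative risk
    # Reciprocal of the risk difference; abs() is an assumed convention
    # so that the NNT is reported as a positive count.
    nnt = float("inf") if rd == 0 else 1.0 / abs(rd)
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return {"risk_difference": rd, "relative_risk": rr,
            "nnt": nnt, "odds_ratio": odds_ratio}


# Illustrative (hypothetical) risks: 15% in treated, 20% in controls.
measures = effect_measures(p1=0.15, p0=0.20)
print(measures)
```

With these illustrative inputs the relative risk and the odds ratio differ noticeably even though both describe the same pair of risks, which is one reason the choice of measure matters.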

Several clinical commentators have suggested that, at the very least, absolute measures of treatment effect be reported to complement relative measures of effect. Laupacis and colleagues suggested that the risk difference is superior to the relative risk because it incorporates both baseline risk and the reduction in risk due to treatment.⁹ However, they suggest that the NNT is more clinically interpretable. This is echoed by Cook and Sackett who suggested that the NNT is more meaningful for clinical decision making.¹⁰ Sinclair and Bracken suggest that risk differences, relative risks, and the NNT be reported when addressing clinically important questions.¹¹ Sackett, Deeks, and Altman suggested that the odds ratio not be used as a measure of effect in randomized controlled trials (RCTs).¹² The CONSORT statement for the reporting of RCTs recommends that both relative and absolute measures of treatment effects be reported in RCTs with dichotomous outcomes (Statement 17b).¹³ For RCTs, the BMJ (formerly the British Medical Journal) requires that authors report the absolute event rate in each arm, the relative risk reduction, and the NNT.¹⁴ The measures of effect reported in observational studies should reflect those reported in comparable RCTs.¹⁵ Thus, it is important to assess the performance of statistical methods for estimating risk differences and relative risks in observational studies.

When using propensity score-based weighting there are two primary approaches that can be used to estimate the effect of treatment when outcomes are binary. First, one can compute the weighted proportion of successes or failures in treated and control subjects separately. The risk difference and the relative risk can be computed as the difference and ratio of these two weighted proportions, respectively. Different authors have described asymptotic variance estimators for risk differences and relative risks.^16,17 A recent study examined the use of the bootstrap when using propensity score-based weights with continuous and binary outcomes.¹⁸ When sample sizes were ≤ 1000 and the ATE was the target estimand, the use of the bootstrap resulted in estimates of standard error that were more accurate than the asymptotic variance estimates. Second, one can fit a marginal structural model (MSM), in which the binary outcome is regressed on an indicator variable denoting treatment status using a weighted regression model.^19,20 Robins and colleagues suggested that weighted generalized linear models (GLMs) with a binomial distribution and identity link can be used to estimate risk differences, weighted GLMs with a binomial distribution and log link can be used to estimate relative risks, while a weighted GLM with a binomial distribution and a logistic link function (i.e. a weighted logistic regression model) can be used to estimate odds ratios.²⁰ Joffe and colleagues used a weighted logistic regression model to estimate a marginal odds ratio for a binary outcome.¹⁹ Both sets of authors suggested that the variance of the estimated treatment effect can be estimated using estimating equation methods with a robust variance estimator, with Robins and colleagues noting that the resultant estimated confidence intervals will be conservative (i.e. the true coverage rate will be at least as large as the advertised rate).^19,20 Neither set of authors evaluated the performance of these methods, either in terms of bias or the accuracy of variance estimation. Despite Robins and colleagues suggesting that marginal risk differences and marginal relative risks can be estimated using an appropriate MSM, it appears that this is rarely done in applied research. In the context of conventional multivariable regression adjustment, Zou proposed that an unweighted multivariable Poisson regression with a robust variance estimator be used to estimate adjusted relative risks for binary outcomes.²¹ Talbot and colleagues justified the use of Poisson regression to estimate relative risks through a semi-parametric framework that does not require specification of the nature of the distribution of the outcome.²²

There is a paucity of research on the performance of MSMs to estimate risk differences and relative risks with fixed binary exposures for binary outcomes. The objective of the current study was to assess the performance of MSMs for estimating risk differences and relative risks. We consider MSMs based on both weighted binomial-identity and Poisson-identity GLMs for estimating risk differences and both weighted binomial-log and Poisson-log GLMs for estimating relative risks. We examine the performance of both a robust variance estimator and the bootstrap for estimating the variance of the estimated risk difference or the logarithm of the relative risk. We used Monte Carlo simulations and considered scenarios with small, moderate and large sample sizes, as well as low, medium and high prevalences of treatment. The paper is structured as follows: In Section 2, we describe the design of a series of Monte Carlo simulations to examine the performance of MSMs for estimating risk differences and relative risks. In Section 3, we report the results of these Monte Carlo simulations. In Section 4, we provide a case study illustrating the application of different methods for estimating risk differences and relative risks in an observational study on the effect on mortality of discharge prescribing of a beta-blocker in patients hospitalized with acute myocardial infarction (AMI or heart attack). Finally, in Section 5 we summarize our findings and place them in the context of the existing literature.

2. Monte Carlo simulations – methods

We conducted a series of Monte Carlo simulations to assess the performance of MSMs based on weighted GLMs to estimate risk differences and relative risks when using propensity score-based weights. We examined four different sets of weights. The data-generating process that we used was identical to that used in a recent study that compared asymptotic variance estimates with bootstrap estimates when using direct weighting to estimate differences in means, risk differences and relative risks.¹⁸

2.1 Factors in the Monte Carlo simulations

We allowed two factors to vary in the Monte Carlo simulations: the size of the random sample drawn from the super-population and the prevalence of treatment in the super-population. The former took on four values: 500, 1000, 5000 and 10,000. The latter took on nine values: from 0.1 to 0.9 in increments of 0.1. We used a full factorial design and thus examined 36 different scenarios.

2.2 Data-generating process

We used a data-generating process identical to that used in a recent study.¹⁸ The reader is referred to that study for greater details on the data-generating process. Briefly, for each scenario, we simulated subjects from a super-population of size 1,000,000. For each subject we generated five continuous and five binary baseline covariates. We then generated a binary treatment variable, Z_i, using a logistic regression model, using a bisection approach to identify the model intercept that resulted in the desired prevalence of treatment.²³ The following model was used for simulating the binary treatment variable:

\begin{aligned} \operatorname{logit} (p_{i, treat}) = {} & α_{0, treat} + \log (1.1) x_{1, i} + \log (1.2) x_{2, i} + \log (1.5) x_{3, i} + \log (1.75) x_{4, i} + \log (2) x_{5, i} \\ & + \log (1.25) x_{6, i} + \log (1.5) x_{7, i} + \log (2) x_{8, i} + \log (0.8) x_{9, i} + \log (0.5) x_{10, i} . \end{aligned}

The values of $α_{0, treat}$ for the nine different prevalences of treatment are reported in Table 1. The distribution of the propensity score in treated and control subjects is depicted in Figure A1 in the Online Supplemental Material in the earlier publication that used the same data-generating process.¹⁸ In all scenarios there was moderate to good overlap in the distribution of the propensity score between treated and control subjects. The c-statistic of the propensity score model was approximately 0.80 across the scenarios.
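The bisection search for the intercept can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the covariate distributions (five standard normal and five Bernoulli(0.5) covariates), the sample of 20,000 subjects, the seed, and all function names are assumptions made here, since the source defers those details to the earlier study. The intercept found will therefore not necessarily match the values in Table 1.

```python
import math
import random

# Log-odds-ratio coefficients copied from the treatment-selection model above.
BETA = [math.log(v) for v in (1.1, 1.2, 1.5, 1.75, 2, 1.25, 1.5, 2, 0.8, 0.5)]


def expit(t: float) -> float:
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate_linear_predictors(n: int, seed: int = 1) -> list:
    """Covariate part of the linear predictor for n simulated subjects.

    ASSUMPTION: x1..x5 ~ N(0, 1) and x6..x10 ~ Bernoulli(0.5); the source
    does not state the covariate distributions in this section.
    """
    rng = random.Random(seed)
    lps = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(5)]
        x += [float(rng.random() < 0.5) for _ in range(5)]
        lps.append(sum(b * xi for b, xi in zip(BETA, x)))
    return lps


def find_intercept(lps, target, lo=-20.0, hi=20.0, tol=1e-8):
    """Bisection on the intercept.

    The mean treatment probability is strictly increasing in the intercept,
    so the interval [lo, hi] can be halved until it is shorter than tol.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        prevalence = sum(expit(mid + lp) for lp in lps) / len(lps)
        if prevalence < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lps = simulate_linear_predictors(20_000)
alpha0 = find_intercept(lps, target=0.3)
achieved = sum(expit(alpha0 + lp) for lp in lps) / len(lps)
print(alpha0, achieved)
```

Bisection is a natural choice here because the target function is monotone in the intercept, so no derivative information is needed.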

Table 1. Intercept for treatment selection model and value of different target estimands (RD: risk difference; RR: relative risk).

Prevalence of treatment	$α_{0, treat}$	RD (ATE)	RD (ATT)	RD (OW)	RD (MW)	RR (ATE)	RR (ATT)	RR (OW)	RR (MW)
0.1	−2.68559	−0.02	−0.029	−0.026	−0.027	0.893	0.905	0.908	0.91
0.2	−1.69434	−0.02	−0.026	−0.024	−0.025	0.893	0.907	0.904	0.905
0.3	−1.00098	−0.02	−0.026	−0.024	−0.024	0.893	0.903	0.896	0.897
0.4	−0.41748	−0.02	−0.025	−0.023	−0.023	0.893	0.901	0.893	0.892
0.5	0.127611	−0.02	−0.025	−0.021	−0.021	0.893	0.897	0.892	0.891
0.6	0.668945	−0.02	−0.024	−0.021	−0.021	0.893	0.897	0.886	0.883
0.7	1.253323	−0.02	−0.023	−0.019	−0.019	0.893	0.899	0.884	0.884
0.8	1.946411	−0.02	−0.023	−0.018	−0.017	0.893	0.895	0.882	0.88
0.9	2.929688	−0.02	−0.022	−0.014	−0.013	0.893	0.895	0.892	0.893

We generated a binary outcome using a logistic model so that the prevalence of the outcome in the super-population was 0.2 if all subjects were untreated and so that the risk difference for treatment was −0.02 with the ATE as the target estimand. The following model was used for simulating binary outcomes:

\begin{aligned} \operatorname{logit} (p_{i, outcome}) = {} & - 2.33643 - 0.15625 Z_{i} + \log (2) x_{1, i} + \log (1.75) x_{2, i} + \log (1.1) x_{3, i} + \log (1.5) x_{4, i} \\ & + \log (1.2) x_{5, i} + \log (2) x_{6, i} + \log (1.5) x_{7, i} + \log (1.1) x_{8, i} + \log (1.25) x_{9, i} + \log (2) x_{10, i} . \end{aligned}

In the super-population we determined the true value of the risk difference and relative risk when the ATE and the ATT were the target estimands and for the target estimands targeted by matching weights (MW) and overlap weights (OW) (weights defined below). The true values of the target estimands under the different sets of weights are reported in Table 1.

2.3 Statistical analyses

We drew a random sample of size N from the super-population. In the random sample we estimated the propensity score using a logistic regression model in which treatment status was regressed on the 10 baseline covariates. We estimated four different sets of weights: ATE weights, ATT weights, MW, and OW.^2–6 Let $e (X)$ denote the estimated propensity score for a subject with covariate vector $X$ and let Z denote treatment status (Z = 0 for control; Z = 1 for treated). The four sets of weights are defined as:

\begin{aligned} w_{ATE} (X) & = \frac{Z}{e (X)} + \frac{1 - Z}{1 - e (X)}, \end{aligned}

\begin{aligned} w_{ATT} (X) & = Z + \frac{(1 - Z) e (X)}{1 - e (X)}, \end{aligned}

\begin{aligned} w_{MW} (X) & = \frac{Z \min (e (X), 1 - e (X))}{e (X)} + \frac{(1 - Z) \min (e (X), 1 - e (X))}{1 - e (X)}, \text{ and} \\ w_{OW} (X) & = Z (1 - e (X)) + (1 - Z) e (X) . \end{aligned}

The use of the first set of weights targets the ATE, while the second set of weights targets the ATT. The latter two sets of weights target inference at the subpopulation for whom there is the greatest clinical equipoise about treatment (i.e. those subjects whose propensity score is close to 0.5).⁶
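The four weight definitions translate directly into code. The sketch below is illustrative only: the function name `ps_weights` and the example propensity score of 0.25 are hypothetical, and the propensity score is taken as given rather than estimated.

```python
def ps_weights(z: int, e: float) -> dict:
    """Propensity score-based weights for one subject.

    z: treatment status (1 = treated, 0 = control)
    e: estimated propensity score e(X), assumed to lie strictly in (0, 1)
    """
    m = min(e, 1 - e)
    return {
        "ATE": z / e + (1 - z) / (1 - e),        # inverse probability of treatment
        "ATT": z + (1 - z) * e / (1 - e),        # treated subjects keep weight 1
        "MW": z * m / e + (1 - z) * m / (1 - e),  # matching weights
        "OW": z * (1 - e) + (1 - z) * e,          # overlap weights
    }


# A treated and a control subject sharing the same (hypothetical) propensity score.
treated = ps_weights(z=1, e=0.25)
control = ps_weights(z=0, e=0.25)
print(treated)
print(control)
```

Unlike the ATE weights, the matching and overlap weights are bounded above by 1, so subjects with propensity scores near 0 or 1 cannot receive very large weights.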

In the random sample of size N we used five methods to estimate the risk difference and its standard error. First, the risk difference was estimated as $\frac{\sum_{i = 1}^{N} w_{1} (X) Z_{i} Y_{i}}{\sum_{i = 1}^{N} w_{1} (X) Z_{i}} - \frac{\sum_{i = 1}^{N} w_{0} (X) (1 - Z_{i}) Y_{i}}{\sum_{i = 1}^{N} w_{0} (X) (1 - Z_{i})}$, where $w_{1} (X)$ and $w_{0} (X)$ are the weights under treatment and control, respectively, Z denotes treatment status (Z = 1 for treated; Z = 0 for control), and Y denotes the binary outcome. Asymptotic estimates of the standard error of the risk difference were obtained using methods described by Zhou and colleagues.¹⁷ Second, an MSM based on a weighted GLM with a binomial distribution and identity link function was used. The model was estimated using generalized estimating equation (GEE) methods assuming an independent working correlation matrix (the choice of an independent working correlation matrix is based on the recommendation of Robins and colleagues²⁰). The standard robust variance estimator was used. Third, an MSM based on a weighted GLM with a binomial distribution and identity link function was used. Standard errors of the estimated risk difference were obtained using bootstrap methods with 200 bootstrap replicates. Fourth, an MSM based on a weighted GLM with a Poisson distribution and identity link function was used. The model was estimated using GEE methods assuming an independent working correlation matrix. The standard robust variance estimator was used. Fifth, an MSM based on a weighted GLM with a Poisson distribution and identity link function was used. Standard errors of the estimated risk difference were obtained using bootstrap methods with 200 bootstrap replicates. We refer to these five methods as direct weighting, MSM-BI-RobustSE, MSM-BI-BS, MSM-PI-RobustSE, and MSM-PI-BS, respectively. For the latter four names, the first component of the name indicates that an MSM was used, the second component of the name indicates the distribution family and link for the GLM, while the third component of the name indicates the method used to estimate standard errors.

We used five methods to estimate the relative risk and the standard error of the log-relative risk. First, the relative risk was estimated as $\frac{\sum_{i = 1}^{N} w_{1} (X) Z_{i} Y_{i}}{\sum_{i = 1}^{N} w_{1} (X) Z_{i}} / \frac{\sum_{i = 1}^{N} w_{0} (X) (1 - Z_{i}) Y_{i}}{\sum_{i = 1}^{N} w_{0} (X) (1 - Z_{i})}$ . Asymptotic estimates of the standard error of the log-relative risk were obtained using methods described by Zhou and colleagues.¹⁷ Second, an MSM based on a weighted GLM with a binomial distribution and log link function was used. The model was estimated using GEE methods assuming an independent working correlation matrix and the standard robust standard errors were used. Third, an MSM based on a weighted GLM with a binomial distribution and log link function was used. Standard errors of the estimated log-relative risk were obtained using bootstrap methods with 200 bootstrap replicates. Fourth, an MSM based on a weighted GLM with a Poisson distribution and log link function was used (i.e. a Poisson regression model). The model was estimated using GEE methods assuming an independent working correlation matrix and the standard robust standard errors were used. Fifth, an MSM based on a weighted GLM with a Poisson distribution and log link function was used. Standard errors of the estimated log-relative risk were obtained using bootstrap methods with 200 bootstrap replicates. We refer to these five methods as direct weighting, MSM-BL-RobustSE, MSM-BL-BS, MSM-PL-RobustSE, and MSM-PL-BS, respectively. The naming convention of the last four methods follows a similar structure as the naming convention described in the previous paragraph.
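The direct weighting point estimates of the risk difference and the relative risk can be sketched as weighted proportions in each arm. The function names and the toy data below are hypothetical, and the weights are taken as given. This sketch covers only the point estimates, not the asymptotic standard errors of Zhou and colleagues.

```python
import math


def weighted_risk(y, z, w, arm):
    """Weighted proportion of outcome events among subjects with Z == arm."""
    num = sum(wi * yi for yi, zi, wi in zip(y, z, w) if zi == arm)
    den = sum(wi for zi, wi in zip(z, w) if zi == arm)
    return num / den


def direct_weighting(y, z, w):
    """Risk difference, relative risk and log-relative risk by direct weighting."""
    p1 = weighted_risk(y, z, w, arm=1)
    p0 = weighted_risk(y, z, w, arm=0)
    return {"rd": p1 - p0, "rr": p1 / p0, "log_rr": math.log(p1 / p0)}


# Toy data (hypothetical): 3 treated and 5 control subjects with given weights.
y = [1, 0, 0, 1, 0, 1, 0, 0]
z = [1, 1, 1, 0, 0, 0, 0, 0]
w = [2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
est = direct_weighting(y, z, w)
print(est)
```

In the toy data the weighted risk is 2/4 in treated subjects and 2/6 in control subjects, so the estimates can be checked by hand.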

We constructed 95% confidence intervals using standard normal-theory methods based on the estimated standard error of the estimated treatment effect. We then determined whether the estimated 95% confidence interval contained the true value of the target estimand.
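A normal-theory interval of this kind, together with the check of whether it contains the true value, can be sketched as follows. The point estimates and standard errors used here are hypothetical illustrations, not results from the study; for the relative risk the interval is built on the log scale and then exponentiated.

```python
import math
from statistics import NormalDist


def wald_ci(estimate: float, se: float, level: float = 0.95):
    """Normal-theory confidence interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


def covers(ci, truth: float) -> bool:
    """True if the interval contains the true value of the estimand."""
    lower, upper = ci
    return lower <= truth <= upper


# Hypothetical estimates and standard errors, for illustration only.
rd_ci = wald_ci(estimate=-0.02, se=0.015)
log_rr_ci = wald_ci(estimate=math.log(0.893), se=0.08)
rr_ci = tuple(math.exp(bound) for bound in log_rr_ci)  # back-transform to the RR scale

print(rd_ci, covers(rd_ci, -0.02))
print(rr_ci, covers(log_rr_ci, math.log(0.893)))
```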

The above methods were repeated using each of the four sets of weights. The above set of analyses was repeated 1000 times. We thus drew 1000 samples, and in each sample we obtained an estimate of the risk difference and the logarithm of the relative risk using each set of weights and each of the methods. We also obtained estimates of the standard error of each estimate and constructed 95% confidence intervals.
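The bootstrap standard errors used by the methods with the BS suffix can be illustrated with a short Python sketch. Two simplifications are assumed here and are not taken from the source: the weights are treated as fixed and resampled along with each subject, whereas a fuller implementation would typically re-estimate the propensity score within each bootstrap sample; and the estimator is the weighted risk difference computed directly rather than through a fitted GLM. The toy data, seeds and function names are hypothetical.

```python
import random
import statistics


def weighted_rd(rows):
    """Weighted risk difference; rows is a list of (y, z, w) tuples."""
    def risk(arm):
        num = sum(w * y for y, z, w in rows if z == arm)
        den = sum(w for _, z, w in rows if z == arm)
        return num / den
    return risk(1) - risk(0)


def bootstrap_se(rows, estimator, n_boot=200, seed=1):
    """Standard deviation of the estimator across n_boot bootstrap samples.

    SIMPLIFICATION: subjects are resampled with their weights attached;
    the propensity score is not re-estimated within each sample.
    """
    rng = random.Random(seed)
    n = len(rows)
    replicates = []
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]  # draw n subjects with replacement
        replicates.append(estimator(sample))
    return statistics.stdev(replicates)


# Hypothetical toy data: 400 subjects with arbitrary weights.
rng = random.Random(42)
rows = []
for _ in range(400):
    z = int(rng.random() < 0.5)
    y = int(rng.random() < 0.2)
    w = rng.uniform(1.0, 3.0)
    rows.append((y, z, w))

se_rd = bootstrap_se(rows, weighted_rd)
print(se_rd)
```

Passing the estimator as a function means the same routine could be reused for the log-relative risk by supplying a different estimator.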

2.4 Performance measures

We used three performance measures to assess the performance of the different methods for estimating the risk difference and the relative risk: (i) relative bias in the risk difference or log-relative risk; (ii) relative per cent error in the estimated standard error of the risk difference and the log-relative risk; (iii) empirical coverage rates of estimated 95% confidence intervals.

Relative bias was computed as $100 \times \frac{\frac{1}{1000} \sum_{i = 1}^{1000} ({\hat{ϕ}}_{i} - ϕ_{true})}{ϕ_{true}}$ , where ${\hat{ϕ}}_{i}$ denotes the estimated treatment effect (risk difference or logarithm of the relative risk) in the ith simulation replicate and $ϕ_{true}$ denotes the true value of the treatment effect.

Relative per cent error in the estimated standard error of the estimated treatment effect was computed as $100 \times (\frac{\frac{1}{1000} \sum_{i = 1}^{1000} se({\hat{ϕ}}_{i})}{SD(\hat{ϕ})} - 1)$ , where $se({\hat{ϕ}}_{i})$ denotes the estimated standard error in the ith simulation replicate and $SD (\hat{ϕ})$ denotes the standard deviation of the estimated treatment effect across the 1000 simulation replicates.²⁴ If the relative error is equal to zero, then the estimated standard error is correctly estimating the standard deviation of the sampling distribution of the estimated treatment effect. If the relative error is less than zero, then the estimated standard errors are underestimating the standard deviation of the sampling distribution of the estimated treatment effects. If the relative error is greater than zero, then the estimated standard errors are overestimating the standard deviation of the sampling distribution of the estimated treatment effects.

Empirical coverage rates of estimated 95% confidence intervals were computed as the proportion of estimated confidence intervals that contained the true value of the treatment effect.
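The three performance measures can be written compactly in Python. This is a sketch under stated assumptions: the function names are hypothetical, the standard deviation across replicates uses the usual sample (n − 1) denominator, which the source does not specify, and the toy inputs are invented so that the results can be checked by hand.

```python
import statistics


def relative_bias(estimates, truth):
    """100 x (mean estimate - true value) / true value."""
    return 100 * (statistics.fmean(estimates) - truth) / truth


def relative_se_error(ses, estimates):
    """100 x (mean estimated SE / SD of the estimates - 1).

    ASSUMPTION: the SD across replicates uses the n - 1 denominator.
    """
    return 100 * (statistics.fmean(ses) / statistics.stdev(estimates) - 1)


def coverage(cis, truth):
    """Proportion of confidence intervals containing the true value."""
    return sum(lower <= truth <= upper for lower, upper in cis) / len(cis)


# Three hypothetical simulation replicates with a true risk difference of -0.02.
estimates = [-0.03, -0.02, -0.01]
ses = [0.011, 0.011, 0.011]
cis = [(-0.05, 0.0), (-0.04, -0.03), (-0.03, -0.01)]

bias = relative_bias(estimates, truth=-0.02)
se_err = relative_se_error(ses, estimates)
cov = coverage(cis, truth=-0.02)
print(bias, se_err, cov)
```

In this toy example the estimates average exactly to the true value, the estimated standard errors exceed the empirical standard deviation of 0.01 by 10%, and two of the three intervals cover the truth.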

2.5 Software

The simulations were conducted using the R statistical programming language (version 3.6.3). The weighted analyses were conducted using the PSweight function from the PSweight package (version 1.1.6). In PSweight, the variance of the estimated treatment effect is obtained using the empirical sandwich variance estimator for propensity score weighting estimators based on M-estimation theory.¹⁷ This variance estimator accounts for the uncertainty in estimating the propensity score. The reader is referred elsewhere for further details.¹⁷ Estimation of weighted GLMs using GEE methods was done using the geeglm function in the geepack package (version 1.3-2).

3. Monte Carlo simulations – results

The binomial-identity GLM and the Poisson-identity GLM resulted in identical estimates of both the risk difference and its standard error across all 1000 iterations of each of the 36 scenarios. For this reason, we only report results for the binomial-identity MSM and not the Poisson-identity MSM when estimating risk differences. Similarly, the binomial-log MSM and the Poisson-log MSM resulted in identical estimates of both the logarithm of the relative risk and its standard error across all 1000 iterations of each of the 36 scenarios. Consequently, we only report results for the binomial-log MSM and not the Poisson-log MSM when estimating relative risks.

3.1 Relative bias in estimating risk differences and log-relative risks

For each of the four sets of weights (ATE/ATT/MW/OW), the estimated risk difference or log-relative risk was equal across the different estimation methods within each simulation replicate. Thus, for a given set of weights and measure of effect, the different estimation methods resulted in the same estimated effect. Consequently, the relative bias was equal for the different estimation methods. Results for relative bias are reported in Figure 1. Results for risk differences are reported in the left panel while results for log-relative risks are reported in the right panel. Each panel contains a nested loop plot, with two loops.²⁵ The outer loop depicts sample size (500, 1000, 5000, and 10,000), while the inner loop depicts prevalence of treatment (0.1 to 0.9 in increments of 0.1). There is one set of lines for each of the four target estimands.

Figure 1. Relative bias (%): risk difference and log-relative risk.

For each target estimand, the magnitude of the relative bias tended to decrease with increasing sample size. Relative bias tended to be the greatest when using ATT weights with a sample size of 500 and a high prevalence of treatment. Note that the relative biases are not intended to be compared between target estimands. For instance, the relative bias when using ATE weights is not comparable to the relative bias when using OW, as the true target estimand for the ATE weights may differ from the true target estimand for the OW (see Table 1 for the values of the different target estimands under different weights).

3.2 Comparison of estimated standard error to the standard deviation of the sampling distribution

Relative per cent error in the estimated standard error of the estimated treatment effect is reported in Figure 2 (risk difference) and Figure 3 (log-relative risk). Each figure consists of four panels, one for each of the four target estimands (ATE/ATT/MW/OW). Each panel depicts three nested loop plots, one for each estimation method, with each nested loop plot having two loops. The outer loop depicts sample size (500, 1000, 5000, and 10,000), while the inner loop depicts prevalence of treatment (0.1 to 0.9 in increments of 0.1).

Figure 2. Relative per cent error in the estimated standard error (risk difference).

Figure 3. Relative per cent error in the estimated standard error (log-relative risk).

We first discuss results when the risk difference was the measure of effect. When the ATE was the target estimand and the sample size was ≤1000, then MSM-BI-BS tended to produce the most accurate estimates of standard error. When the sample size was 5000 or 10,000, both direct weighting and MSM-BI-BS tended to result in the most accurate estimates of standard error, with minimal differences between the two methods. When the ATT was the target estimand, MSM-BI-BS tended to result in the most accurate estimates of standard error across the different scenarios. When the sample size was ≤1000, all methods tended to underestimate the sampling variability when the prevalence of treatment was high. As with the ATE, when the sample size was 5000 or 10,000, then both direct weighting and MSM-BI-BS tended to result in the most accurate estimates of standard error, with minimal differences between the two methods. When using MW or OW, the use of direct weighting with asymptotic estimates of the standard error tended to result in the most accurate estimates of standard error. However, differences between direct weighting and MSM-BI-BS tended to be minimal. MSM-BI-RobustSE tended to result in the least accurate estimates of standard errors when using MW or OW. In particular, MSM-BI-RobustSE tended to over-estimate the sampling variability of the risk difference when using MW and OW.

We now discuss results when the relative risk was the measure of effect. Across all four sets of weights, when the prevalence of treatment was 0.9 and the sample size was 500, MSM-BL-BS resulted in a large over-estimation of the variance of the sampling distribution of the log-relative risk. Apart from when the prevalence of treatment was 0.9, the use of MSM-BL-BS tended to result in the most accurate estimates of the standard error of the log-relative risk. The large relative error in the estimated standard error when using MSM-BL-BS when the sample size was 500 and the prevalence of treatment was 0.9 makes it difficult to discern differences between the methods in the other scenarios. Figure S1 in the Supplemental Online Material replicates the results from Figure 3, after the exclusion of scenarios in which the prevalence of treatment was 0.9. In scenarios in which the prevalence of treatment was less than 0.9 and the ATE or ATT was the target estimand, none of the three methods had consistently superior performance compared to the other methods. However, the MSM-BL-BS method tended to have good performance. When OW or MW were used, the direct weighting approach tended to have the best performance of the three methods.

We hypothesize that the large relative error in the estimated standard error for the log-relative risk when using the MSM-BL-BS method with a prevalence of treatment of 0.9 and a sample size of 500 was due to the occurrence of very few outcome events in control subjects in a few of the bootstrap samples. With 500 subjects and a prevalence of treatment of 0.9, there would, on average, be 50 control subjects in each simulated dataset. We simulated outcomes such that the prevalence of the outcome in the super-population was 0.2 if all subjects were untreated. Thus, using a rough approximation, on average, one would anticipate approximately 10 outcome events amongst the 50 controls. The probability of observing three or fewer outcome events would be approximately 0.0057. One would anticipate that, on average, across the 200 bootstrap samples, at least one bootstrap sample would have three or fewer outcome events in the control subjects, potentially resulting in a very large estimated relative risk with a correspondingly large standard error. In a post-hoc analysis of the simulation results, across the 1000 simulation replicates when using ATE weights, the median standard error of the estimated log-relative risk was 1.00, while the 95th and 99th percentiles were 6.42 and 9.34, respectively. Thus, in a small number of simulation replicates the estimated standard error was very large. Qualitatively similar results were observed for the three other sets of weights.
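The rough approximation in this paragraph can be reproduced with a binomial calculation. The sketch below assumes, as the text does, that the number of outcome events among 50 controls is binomial with probability 0.2; the function name `binom_cdf` is hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


# Probability of three or fewer outcome events among 50 control subjects.
p_few_events = binom_cdf(3, n=50, p=0.2)

# Expected number of such samples among 200, under the same rough approximation.
expected_in_200 = 200 * p_few_events

print(round(p_few_events, 4), round(expected_in_200, 2))
```

The expected count slightly exceeds one, which is in line with the statement that at least one of the 200 bootstrap samples would, on average, contain so few control events.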

As noted in Section 3.1, for a given target estimand and measure of effect, each of the methods resulted in the same point estimate across all 1000 iterations of each of the 36 scenarios. Consequently, the empirical standard error (the standard deviation of the estimated measure of effect across the 1000 simulation replicates) was the same for each of the estimation methods. For this reason, we do not compare the empirical standard error across the estimation methods.

3.3 Empirical coverage rates of estimated 95% confidence intervals

Empirical coverage rates are reported in Figure 4 (risk differences) and Figure 5 (log-relative risk). The figures have a structure similar to Figures 2 and 3. Given our use of 1000 simulation replicates, empirical coverage rates that were less than 0.9365 or greater than 0.9635 were statistically significantly different from the advertised rate of 0.95 using a standard normal-theory test. We added horizontal lines denoting empirical coverage rates of 0.9365 and 0.9635 to each of the panels.
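The thresholds of 0.9365 and 0.9635 follow from a normal approximation to the binomial proportion with 1000 replicates, as the following sketch shows. The function name `coverage_bounds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def coverage_bounds(nominal: float = 0.95, n_reps: int = 1000, alpha: float = 0.05):
    """Range of empirical coverage rates consistent with the nominal rate.

    Uses the normal approximation: nominal +/- z * sqrt(nominal * (1 - nominal) / n_reps).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(nominal * (1 - nominal) / n_reps)
    return nominal - half_width, nominal + half_width


lower, upper = coverage_bounds()
print(round(lower, 4), round(upper, 4))
```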

Figure 4. Coverage of 95% confidence intervals (risk difference).

Figure 5. Coverage of 95% confidence intervals (relative risk).

We first discuss results when the risk difference was the measure of effect. When the ATE was the target estimand, all methods tended to result in confidence intervals with lower than advertised coverage rates when the prevalence of treatment was either low or high. The magnitude of the divergence from the advertised coverage rates decreased with increasing sample size. When the prevalence of treatment was close to 0.5, MSM-BI-BS tended to result in confidence intervals whose empirical coverage rates were closest to the advertised rate. When the ATT was the target estimand, all methods tended to result in confidence intervals whose coverage rate was lower than advertised when the prevalence of treatment was high. As with the ATE, the magnitude of divergence from the advertised rate decreased with increasing sample size. When using MW or OW, both direct weighting and MSM-BI-BS tended to produce confidence intervals whose empirical coverage rates did not differ from the advertised rate. In contrast to this, MSM-BI-RobustSE often produced confidence intervals whose empirical coverage rates were conservative.

We now discuss results when the relative risk was the measure of effect. When the ATE was the target estimand, both direct weighting and MSM-BL-RobustSE produced confidence intervals whose empirical coverage rates were lower than advertised when the sample size was ≤1000 and the prevalence of treatment was low or high. In contrast to this, in most scenarios MSM-BL-BS produced confidence intervals whose empirical coverage rates were not different from the advertised rate. When the ATT was the target estimand, both direct weighting and MSM-BL-RobustSE tended to produce confidence intervals whose empirical coverage rates were lower than the advertised rate when the prevalence of treatment was high. The magnitude of divergence from the advertised rate decreased with increasing sample size. When using MW and OW, all methods tended to produce confidence intervals whose empirical coverage rates were at least 0.9365. In many scenarios, MSM-BL-RobustSE tended to produce confidence intervals whose empirical coverage rates were conservative.

As noted above, the use of the Poisson-identity GLM produced estimates identical to the binomial-identity GLM in each of the simulation replicates, while the use of the Poisson-log GLM produced estimates identical to the binomial-log GLM. However, for the sake of completeness, results for the Poisson-based MSMs are reported in Figures S2 to S5 in the Supplemental Online Material.

4. Case study

We provide a case study to compare the use of different methods for estimating the risk difference and the relative risk. The case study consisted of patients discharged from hospital with a diagnosis of AMI. The exposure was receipt of a prescription for a beta-blocker at hospital discharge. The binary outcome was death within one year of hospital discharge.

4.1 Methods

We used a subset of data from a previously published study, consisting of 6984 patients who were discharged alive from hospital with a diagnosis of AMI in Ontario, Canada, between 1 April 1999 and 31 March 2001²⁶,²⁷ and who were eligible for beta-blocker therapy at hospital discharge. Baseline information was available on patient demographics, presenting signs and symptoms, classic cardiac risk factors, comorbid conditions and vascular history, vital signs on admission, and results of laboratory tests. The exposure of interest was whether the patient was prescribed a beta-blocker at hospital discharge. Seventy-two per cent of subjects were prescribed a beta-blocker at hospital discharge. The outcome was a binary outcome denoting whether the patient died within one year of hospital discharge. Twelve per cent of subjects died within one year of hospital discharge. The propensity score was estimated using a logistic regression model in which receipt of a beta-blocker at hospital discharge was regressed on 34 baseline covariates.

The statistical methods described above were used to estimate the risk differences and relative risks along with their associated 95% confidence intervals. We did not use the MSM based on a Poisson-identity GLM for estimating risk differences, as it produced identical results to the binomial-identity GLM. Similarly, we did not use the MSM based on a Poisson-log GLM for estimating relative risks, as it produced identical results to the binomial-log GLM.
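For readers who want to see how the four sets of weights are obtained from an estimated propensity score, the following is a minimal sketch. It uses the standard definitions of ATE, ATT, matching and overlap weights from the propensity score literature; the formulas are not restated in this section, and the function name `ps_weight` and its arguments are illustrative assumptions.

```python
def ps_weight(z, e, kind):
    """Propensity-score weight for a single subject.

    z    : treatment indicator (1 = treated, 0 = control)
    e    : estimated propensity score, strictly between 0 and 1
    kind : "ATE", "ATT", "MW" (matching weights) or "OW" (overlap weights)
    """
    if z not in (0, 1):
        raise ValueError("z must be 0 or 1")
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    # Probability of receiving the treatment that was actually received.
    p_received = e if z == 1 else 1.0 - e
    if kind == "ATE":
        return 1.0 / p_received
    if kind == "ATT":
        return 1.0 if z == 1 else e / (1.0 - e)
    if kind == "MW":
        return min(e, 1.0 - e) / p_received
    if kind == "OW":
        return 1.0 - p_received
    raise ValueError(f"unknown weight type: {kind}")


# Example: a control subject with an estimated propensity score of 0.25.
example = {kind: ps_weight(0, 0.25, kind) for kind in ("ATE", "ATT", "MW", "OW")}
```

In this sketch the matching and overlap weights are bounded above by one, whereas the ATE and ATT weights can become very large when the propensity score is close to zero or one.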

4.2 Results

The estimated risk differences and their associated 95% confidence intervals are reported in the left panel of Figure 6, while the estimated logarithms of the relative risk and their associated 95% confidence intervals are reported in the right panel of Figure 6. Each panel is a forest plot with one horizontal bar for each combination of estimand and method. For a given combination of estimand (ATE/ATT/MW/OW) and measure of effect (risk difference or log-relative risk), the point estimate was identical across estimation methods. When estimating confidence intervals for the risk difference, the ratio of the length of the widest confidence interval to that of the narrowest confidence interval ranged from 1.20 to 1.23 across the four sets of weights. When estimating confidence intervals for the log-relative risk, these ratios ranged from 1.19 to 1.23 across the four sets of weights. For a given set of weights, the use of the MSM with a bootstrap variance estimator resulted in the narrowest 95% confidence intervals, while the use of the MSM with the robust variance estimator resulted in the widest 95% confidence intervals. Despite these differences, in examining Figure 6, for a given target estimand and measure of treatment effect, one would draw qualitatively similar conclusions regardless of which estimation method was used.

Figure 6.

Case study: estimates of risk difference and log-relative risk.

5. Discussion

We assessed the performance of MSMs based on weighted univariate GLMs for estimating risk differences and relative risks. We compared the conventional robust variance estimator for a GLM estimated using GEE methods with an independence working correlation matrix with the use of the bootstrap for estimating the standard error of risk differences and the log-relative risk. These methods were compared with the use of direct weighted estimators with asymptotic variance estimators. We summarize our findings as follows. First, both direct weighting and MSMs resulted in identical point estimates of the risk difference or relative risk. Consequently, all methods had the same empirical standard deviation. Thus, the difference between the methods was the degree to which the estimated standard errors approximated the standard deviation of the sampling distribution of the risk difference or the log-relative risk. Second, when using ATE or ATT weights with a sample size ≤1000, an MSM with the bootstrap variance estimator tended to result in the most accurate estimates of the standard error. When the sample size was >1000, the direct weighting approach with asymptotic estimates of the standard error and the MSM with a bootstrap variance estimator tended to have comparable performance. Third, when using OW or MW, the direct weighting approach and the MSM with the bootstrap variance estimator tended to have comparable performance. Based on differences between these two methods when estimating the relative risk with a small sample size and a very high prevalence of treatment, we would recommend the use of the direct weighting estimator. Fourth, in general, the MSM with a robust variance estimator had worse performance than did the MSM with the bootstrap variance estimator. Consequently, when using an MSM, we would recommend that the bootstrap variance estimator be used, particularly if the prevalence of treatment is not close to one. Fifth, when using a univariate MSM, identical estimates of the risk difference or the log-relative risk and of the associated standard error were obtained when using either the binomial distribution or the Poisson distribution. Thus, in applied studies, either of these distributions can be used.
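The first finding can be illustrated on toy data. The sketch below is not the simulation design used in this study: it assumes a single binary confounder, estimates the propensity score as the treated fraction within each level of that confounder, and uses weighted least squares as a stand-in for a weighted identity-link GLM. With only a treatment indicator in the model, both reduce to the difference in weighted outcome proportions. The bootstrap re-estimates the weights in every resample. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def simulate(n, rng):
    """Toy data: binary confounder x affects both treatment z and outcome y."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        z = 1 if rng.random() < (0.7 if x else 0.3) else 0
        y = 1 if rng.random() < 0.10 + 0.10 * x - 0.03 * z else 0
        rows.append((x, z, y))
    return rows


def ate_weights(rows):
    """ATE weights from a propensity score estimated within each level of x."""
    e = {}
    for level in (0, 1):
        zs = [z for x, z, _ in rows if x == level]
        if zs:  # guard against an empty stratum in a resample
            e[level] = sum(zs) / len(zs)
    return [1.0 / e[x] if z == 1 else 1.0 / (1.0 - e[x]) for x, z, _ in rows]


def rd_direct(rows, w):
    """Direct weighting: difference in weighted outcome proportions."""
    t1 = sum(wi * y for (_, z, y), wi in zip(rows, w) if z == 1)
    n1 = sum(wi for (_, z, _), wi in zip(rows, w) if z == 1)
    t0 = sum(wi * y for (_, z, y), wi in zip(rows, w) if z == 0)
    n0 = sum(wi for (_, z, _), wi in zip(rows, w) if z == 0)
    return t1 / n1 - t0 / n0


def rd_wls(rows, w):
    """Slope from weighted least squares regression of y on z."""
    sw = sum(w)
    zbar = sum(wi * z for (_, z, _), wi in zip(rows, w)) / sw
    ybar = sum(wi * y for (_, _, y), wi in zip(rows, w)) / sw
    sxy = sum(wi * (z - zbar) * (y - ybar) for (_, z, y), wi in zip(rows, w))
    sxx = sum(wi * (z - zbar) ** 2 for (_, z, _), wi in zip(rows, w))
    return sxy / sxx


def bootstrap_se(rows, n_boot, rng):
    """Bootstrap standard error; weights are re-estimated in each resample."""
    est = []
    for _ in range(n_boot):
        sample = [rows[rng.randrange(len(rows))] for _ in rows]
        est.append(rd_direct(sample, ate_weights(sample)))
    mean = sum(est) / n_boot
    return math.sqrt(sum((v - mean) ** 2 for v in est) / (n_boot - 1))


rng = random.Random(2024)
data = simulate(2000, rng)
weights = ate_weights(data)
estimate_direct = rd_direct(data, weights)
estimate_wls = rd_wls(data, weights)
se_boot = bootstrap_se(data, 200, rng)
```

Here `estimate_direct` and `estimate_wls` agree up to floating-point error, so the two approaches differ only in how the standard error is obtained. Re-estimating the weights inside each bootstrap resample is a design choice of this sketch that lets the resampling reflect the uncertainty in the estimated propensity score.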

The robust variance estimator with conventional ATE weights or OW is known to be conservative. Despite this, applied investigators often use the robust variance estimator rather than the direct weighting method. We hypothesize that this is because the robust variance estimator can be implemented easily using standard statistical software (e.g. using standard procedures or functions in R, SAS or Stata). In contrast to this, the use of the appropriate asymptotic variance estimator for the direct weighted estimator often requires that a specialized package be used (e.g. the PSweight package in R). Similarly, the use of the bootstrap variance estimator with weighted GLMs may require more complex programming by the analyst.

Only a few studies have examined the performance of a robust variance estimator when using weighted regression models to estimate causal effects. An earlier study examined methods based on the propensity score to estimate marginal hazard ratios.²⁸ Weighted Cox models resulted in estimates of the marginal hazard ratio with minimal bias when either the ATE or the ATT was the target estimand. The use of a robust variance estimator with ATE weights tended to under-estimate the standard deviation of the log-hazard ratio by between approximately 0% and 10% depending on the true marginal hazard ratio. However, the use of a robust variance estimator with ATT weights tended to over-estimate the standard deviation of the log-hazard ratio by between approximately 30% and 45% depending on the true marginal hazard ratio. In a subsequent article, the author showed that the use of the bootstrap with weighted Cox models resulted in accurate estimation of the standard error of log-hazard ratios when either the ATE or the ATT was the target estimand.²⁹ Arona Diop and colleagues stated that the robust variance estimator used with OW is conservative.³⁰ Reifeis and Hudgens examined the performance of the robust variance estimator when used with ATT weights and found that the estimator may be either conservative or anticonservative, concluding that the resultant confidence intervals will not be valid.³¹ We are unaware of previous similar results for MW.

Only a few studies have examined the performance of propensity score methods for estimating risk differences and relative risks. The current study restricted its focus to the use of propensity score weighting to estimate risk differences and relative risks. A study from 2008 compared the performance of matching on the propensity score, stratification on the quintiles of the propensity score, and covariate adjustment using the propensity score (but not propensity score weighting) to estimate relative risks.³² Propensity score matching and stratification on the quintiles of the propensity score resulted in estimates of relative risk with similar mean squared error (MSE), with matching producing estimates with less bias and stratification producing estimates with greater precision. A study from 2010 compared the performance of different propensity score methods for estimating risk differences.³³ Estimators based on inverse probability of treatment weighting (IPTW) had lower MSE compared to other propensity score methods, while a doubly robust version of IPTW had superior performance compared to the other methods. A 2010 study examined the performance of weighted binomial-identity MSMs for estimating risk differences.³⁴ Using simulations, the authors showed that the absolute bias tended to be minimal. However, empirical coverage rates were lower than advertised when confounding was strong. There were four primary differences between their simulations and those in the current study. First, their simulation design only included two confounding variables, whereas the design in the current study had 10 confounding variables, which is more representative of settings in which propensity score methods are used. Second, while they restricted their focus to the robust variance estimator, the current study also considered the use of the bootstrap. We found that in several settings, the use of the bootstrap resulted in improved inferences compared to the use of the robust variance estimator. Third, the earlier study restricted attention to the risk difference, while we focused on estimation of both the risk difference and the relative risk. Fourth, the earlier study focused only on estimation using the MSM approach, whereas we compared the use of MSMs with direct weighting. A 2011 study compared the use of paired versus unpaired analyses when making inferences about risk differences when using propensity score matching.³⁵ It was shown that the use of paired analyses (e.g. the use of McNemar's test) was superior to that of unpaired analyses (e.g. the use of the standard Chi-squared test). A 2017 study compared the performance of full matching on the propensity score with that of nearest neighbour matching (NNM) and IPTW for estimating risk differences and relative risks.³⁶ All methods worked well when the strength of confounding was relatively weak. With stronger confounding, NNM with a caliper had good performance across a range of scenarios. A study from 2017 described how NNM on the propensity score can be combined with covariate adjustment using the propensity score (so-called double propensity score adjustment) to make inferences about risk differences.³⁷

There are certain limitations to the current study. The primary limitation relates to our use of Monte Carlo simulations. Due to the computational intensity of using Monte Carlo simulations to examine the performance of bootstrap methods, we were only able to examine a limited number of scenarios. However, we examined 36 different scenarios characterized by different combinations of sample size and prevalence of treatment. These scenarios reflect many of the settings in which propensity score methods are applied in clinical and epidemiological research. When using direct weighting we only examined the use of an asymptotic variance estimator and did not consider bootstrapping. The rationale for that limitation was that we had examined the performance of the bootstrap with direct weighting in a recent study.¹⁸ In that prior study we found that when the ATE was the target estimand and sample sizes were ≤1000, the bootstrap resulted in more accurate estimates of standard errors than the asymptotic variance estimator. Similarly, when the ATT was the target estimand, sample sizes were ≤1000 and the prevalence of treatment was moderate to high, the bootstrap produced estimates of standard errors that were more accurate than those from the asymptotic variance estimators. When using MW and OW, both the asymptotic variance estimator and the bootstrap resulted in accurate estimates of standard errors. Another limitation is that we restricted our attention to univariate GLMs and did not consider adjustment for additional covariates. While we observed that the use of a binomial distribution and a Poisson distribution produced identical results, we do not anticipate that this would necessarily be true if using multivariable GLMs. The results of the current study should not be extrapolated to analyses in which GLMs are adjusted for a set of baseline covariates in addition to the binary treatment variable. Another limitation is that we only used parametric regression models to estimate the propensity score from which the weights were derived. Further research is required to examine whether our conclusions persist when machine learning algorithms are used to estimate the propensity score.

We compared the performance of variance estimators for two measures of effect (risk differences and relative risks) and four sets of weights (ATE weights, ATT weights, OW, and MW). Our intent was not that results be compared between different sets of weights. In general, in applied research, analysts should select the set of weights that best addresses the study question. The different sets of weights should not be seen as interchangeable.⁷ The results of the current study should be used to guide the analyst in the choice of variance estimation method once a specific set of weights has been selected.

In the current paper we restricted our attention to weighting-based methods for estimating risk differences and relative risks. Regression-based methods, also known as G-computation, can also be used to estimate risk differences and relative risks.³⁸⁻⁴⁰ Cheung described a modified least-squares regression approach to estimating the risk difference.⁴¹ We did not examine these alternative approaches, as the focus of the current study was on the use of propensity score weighting.

In summary, both a direct weighting approach and MSMs based on weighted univariate GLMs resulted in identical estimates of risk differences and relative risks. When sample sizes were small to moderate, the use of an MSM with a bootstrap variance estimator tended to result in the most accurate estimates of standard errors. When sample sizes were large, the direct weighting approach and an MSM with a bootstrap variance estimator tended to produce estimates of standard error with similar accuracy. Finally, when using an MSM to estimate risk differences and relative risks, it is, in general, preferable to use a bootstrap variance estimator rather than the robust variance estimator.

Supplemental Material

Supplemental material, sj-pdf-1-smm-10.1177_09622802241247742 for The performance of marginal structural models for estimating risk differences and relative risks using weighted univariate generalized linear models by Peter C Austin in Statistical Methods in Medical Research

Footnotes

Acknowledgements

This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long-Term Care (MLTC). This study also received funding from the Canadian Institutes of Health Research (CIHR) (PJT 183902). This document used data adapted from the Statistics Canada Postal Codeᴼᴹ Conversion File, which is based on data licensed from Canada Post Corporation, and/or data adapted from the Ontario Ministry of Health Postal Code Conversion File, which contains data copied under license from ©Canada Post Corporation and Statistics Canada. Parts of this material are based on data and/or information compiled and provided by: CIHI and the Ontario Ministry of Health. The analyses, conclusions, opinions and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred. As a prescribed entity under Ontario's privacy legislation, ICES is authorized to collect and use health care data for the purposes of health system analysis, evaluation and decision support. Secure access to these data is governed by policies and procedures that are approved by the Information and Privacy Commissioner of Ontario. The dataset from this study is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g. healthcare organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at (email: das@ices.on.ca). The use of data in this project was authorized under section 45 of Ontario's Personal Health Information Protection Act, which does not require review by a research ethics board. The opinions, results and conclusions reported in this paper are those of the authors and are independent from the funding sources.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Canadian Institutes of Health Research (CIHR) (grant number PJT 183902).

ORCID iD

Peter C Austin

Supplemental material

Supplemental material for this article is available online.

References

1. Rosenbaum, Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.
2. Rosenbaum. Model-based direct adjustment. J Am Stat Assoc 1987; 82: 387–394.
3. Morgan, Winship. Counterfactuals and causal inference: methods and principles for social research. New York, NY: Cambridge University Press, 2007.
4. Greene. A weighting analogue to pair matching in propensity score analysis. Int J Biostat 2013; 9: 215–234.
5. Morgan, Zaslavsky. Balancing covariates via propensity score weighting. J Am Stat Assoc 2018; 113: 390–400.
6. Zhou, Matsouaka, Thomas. Propensity score weighting under limited overlap and model misspecification. Stat Methods Med Res 2020; 29: 3721–3756.
7. Austin. Differences in target estimands between different propensity score-based weights. Pharmacoepidemiol Drug Saf 2023; 32: 1103–1112.
8. Austin, Manca, Zwarenstein, et al. A substantial and confusing variation exists in handling of baseline covariates in randomized controlled trials: a review of trials published in leading medical journals. J Clin Epidemiol 2010; 63: 142–153.
9. Laupacis, Sackett, Roberts. An assessment of clinically useful measures of the consequences of treatment. N Engl J Med 1988; 318: 1728–1733.
10. Cook, Sackett. The number needed to treat: a clinically useful measure of treatment effect. Br Med J 1995; 310: 452–454.
11. Sinclair, Bracken. Clinically useful measures of effect in binary analyses of randomized trials. J Clin Epidemiol 1994; 47: 881–889.
12. Sackett. Down with odds ratios! Evid Based Med 1996; 1: 164–166.
13. Schulz, Altman, Moher. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. Br Med J 2010; 340: c332.
14. http://www.bmj.com/about-bmj/resources-authors/article-types (2018).
15. Austin, Laupacis. A tutorial on methods to estimating clinically and policy-meaningful measures of treatment effects in prospective observational studies: a review. Int J Biostat 2011; 7: 1–32.
16. Lunceford, Davidian. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med 2004; 23: 2937–2960.
17. Zhou, Tong, et al. PSweight: an R package for propensity score weighting analysis. arXiv:2010.08893v4 [stat.ME], 2021.
18. Austin. Bootstrap versus asymptotic variance estimation when using propensity score weighting with continuous and binary outcomes. Stat Med 2022; 41: 4426–4443.
19. Joffe, Ten Have, Feldman, et al. Model selection, confounder control, and marginal structural models: review and new applications. Am Stat 2004; 58: 272–279.
20. Robins, Hernan, Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 550–560.
21. Zou. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004; 159: 702–706.
22. Talbot, Mesidor, Chiu, et al. An alternative perspective on the robust Poisson method for estimating risk or prevalence ratios. Epidemiology 2023; 34: 1–7.
23. Austin. The iterative bisection procedure: a useful tool for determining parameter values in data-generating processes in Monte Carlo simulations. BMC Med Res Methodol 2023; 23: 45.
24. Morris, White, Crowther. Using simulation studies to evaluate statistical methods. Stat Med 2019; 38: 2074–2102.
25. Rucker, Schwarzer. Presenting simulation results in a nested loop plot. BMC Med Res Methodol 2014; 14: 129.
26. Austin, Stuart. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med 2015; 34: 3661–3679.
27. Donovan, Lee, et al. Effectiveness of public report cards for improving the quality of cardiac care: the EFFECT study: a randomized trial. J Am Med Assoc 2009; 302: 2330–2337.
28. Austin. The performance of different propensity score methods for estimating marginal hazard ratios. Stat Med 2013; 32: 2837–2849.
29. Austin. Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis. Stat Med 2016; 35: 5642–5655.
30. Diop, Duchesne, Cumming, et al. Confounding adjustment methods for multi-level treatment comparisons under lack of positivity and unknown model specification. J Appl Stat 2022; 49: 2570–2592.
31. Reifeis, Hudgens. On variance of the treatment effect in the treated when estimated by inverse probability weighting. Am J Epidemiol 2022; 191: 1092–1097.
32. Austin. The performance of different propensity-score methods for estimating relative risks. J Clin Epidemiol 2008; 61: 537–545.
33. Austin. The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies. Stat Med 2010; 29: 2137–2148.
34. Ukoumunne, Williamson, Forbes, et al. Confounder-adjusted estimates of the risk difference using propensity score-based weighting. Stat Med 2010; 29: 3126–3136.
35. Austin. Comparing paired vs non-paired statistical methods of analyses when making inferences about absolute risk reductions in propensity-score matched samples. Stat Med 2011; 30: 1292–1301.
36. Austin, Stuart. Estimating the effect of treatment on binary outcomes using full matching on the propensity score. Stat Methods Med Res 2017; 26: 2505–2525.
37. Austin. Double propensity-score adjustment: a solution to design bias or bias due to incomplete matching. Stat Methods Med Res 2017; 26: 201–222.
38. Austin. Absolute risk reductions, relative risks, relative risk reductions, and numbers needed to treat can be obtained from a logistic regression model. J Clin Epidemiol 2010; 63: 2–6.
39. Snowden, Rose, Mortimer. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol 2011; 173: 731–738.
40. Localio, Margolis, Berlin. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. J Clin Epidemiol 2007; 60: 874–882.
41. Cheung. A modified least-squares regression approach to the estimation of risk difference. Am J Epidemiol 2007; 166: 1337–1344.


		Risk difference				Relative risk			
Prevalence of treatment	$\alpha_{0,\mathrm{treat}}$	ATE	ATT	OW	MW	ATE	ATT	OW	MW
0.1	−2.68559	−0.02	−0.029	−0.026	−0.027	0.893	0.905	0.908	0.91
0.2	−1.69434	−0.02	−0.026	−0.024	−0.025	0.893	0.907	0.904	0.905
0.3	−1.00098	−0.02	−0.026	−0.024	−0.024	0.893	0.903	0.896	0.897
0.4	−0.41748	−0.02	−0.025	−0.023	−0.023	0.893	0.901	0.893	0.892
0.5	0.127611	−0.02	−0.025	−0.021	−0.021	0.893	0.897	0.892	0.891
0.6	0.668945	−0.02	−0.024	−0.021	−0.021	0.893	0.897	0.886	0.883
0.7	1.253323	−0.02	−0.023	−0.019	−0.019	0.893	0.899	0.884	0.884
0.8	1.946411	−0.02	−0.023	−0.018	−0.017	0.893	0.895	0.882	0.88
0.9	2.929688	−0.02	−0.022	−0.014	−0.013	0.893	0.895	0.892	0.893