Abstract

In this article, we introduce the evalue package, which performs sensitivity analyses for unmeasured confounding in observational studies using the methodology proposed by VanderWeele and Ding (2017, Annals of Internal Medicine 167: 268–274). evalue reports E-values, defined as the minimum strength of association on the risk-ratio scale that an unmeasured confounder would need to have with both the treatment assignment and the outcome to fully explain away a specific treatment-outcome association, conditional on the measured covariates. evalue computes E-values for point estimates (and optionally, confidence limits) for several common outcome types, including risk and rate ratios, odds ratios with common or rare outcomes, hazard ratios with common or rare outcomes, standardized mean differences in outcomes, and risk differences.

Keywords

st0593, evalue, E-value, sensitivity analysis, treatment effects, causality, confounding

1 Introduction

A fundamental concern when conducting evaluations using observational data is that unmeasured confounding—one or more additional factors that cause both the treatment assignment and the outcome—might be mistaken for a treatment effect. For this reason, researchers endeavor to adjust for all variables considered to influence these associations when performing analyses. However, in observational research, it is unlikely that data for all potential confounding variables will be available. Thus, one should conduct a postestimation sensitivity analysis to assess how strong a relationship would have to be between an unmeasured confounder and the treatment assignment, as well as between the unmeasured confounder and the outcome, to explain away an observed treatment effect.

Several sensitivity analyses have been developed for different statistical models (see, for example, Cornfield et al. [1959]; Rosenbaum and Rubin [1983]; Manski [1990]; Lin, Psaty, and Kronmal [1998]; Rosenbaum [2002, 2010]; Brumback et al. [2004]; VanderWeele and Arah [2011]; Imbens [2003]; Imbens and Rubin [2015]; Ding and VanderWeele [2016]; and VanderWeele and Ding [2017]). Four community-contributed packages are currently available for conducting sensitivity analysis in Stata: rbounds (Gangl 2004), mhbounds (Becker and Caliendo 2007), sensatt (Nannicini 2007), and episens (Orsini et al. 2008). The first three commands are designed for use with matching estimators based on the approaches developed by Rosenbaum and Rubin (1983) and Rosenbaum (2002), and the fourth uses the methods described by Greenland (1996) for assessing sensitivity in epidemiology (2 × 2) tables.

In this article, we introduce the evalue package, which performs sensitivity analyses for unmeasured confounding in observational studies using the methodology proposed by VanderWeele and Ding (2017). evalue reports the E-value, which is defined as the minimum strength of association on the risk-ratio (RR) scale that an unmeasured confounder would need to have with both the treatment assignment and the outcome, conditional on the measured covariates, to explain away a treatment-outcome association. In contrast with most other sensitivity analysis approaches, which focus on whether confounding of a specified strength would suffice to explain away an effect estimate, the E-value focuses on the magnitude of the confounder associations that could produce confounding bias equal to the observed treatment-outcome association. The E-value approach and formulas also apply to multiple confounders. In that case, the magnitude of the confounding associations is interpreted as the maximum RRs that could be produced comparing any two values of the whole set of unmeasured confounders (conditional on the measured covariates). See VanderWeele, Ding, and Mathur (2019) for further discussion and examples. The investigator does not choose the confounding variables (or specify their confounding associations) but merely reports how strongly an unmeasured confounder must be related to the treatment assignment and outcome to explain away an effect estimate; readers or other researchers may then assess whether confounder associations of that magnitude are plausible.

2 Methods

The E-value is computed on the RR scale, so results of statistical models other than the RR must be converted to the RR scale. In this section, we present the methods involved in computing the E-value for various model types.

2.1 E-value for RR and rate ratio

The basic formula for computing an E-value for any outcome type on the RR scale (and its confidence limit closest to the null) is as follows (VanderWeele and Ding 2017):¹

If RR > 1:

E-value (point estimate) $= RR + \sqrt{RR \times (RR - 1)}$

E-value (lower limit [LL]) = 1 if LL ≤ 1, else $LL + \sqrt{LL \times (LL - 1)}$

If RR < 1:

E-value (point estimate) $= \frac{1}{RR} + \sqrt{\frac{1}{RR} \times \left(\frac{1}{RR} - 1\right)}$

E-value (upper limit [UL]) = 1 if UL ≥ 1, else $\frac{1}{UL} + \sqrt{\frac{1}{UL} \times \left(\frac{1}{UL} - 1\right)}$
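The formulas above can be sketched in a few lines of Python. This is an illustration only, not the code of the evalue package: the function names `evalue` and `evalue_with_ci` are hypothetical, and the inputs in the usage line are illustrative numbers.

```python
import math

def evalue(rr):
    """E-value for a risk ratio (or a single confidence limit) on the RR scale."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective estimates: apply the formula to the inverse
    return rr + math.sqrt(rr * (rr - 1))

def evalue_with_ci(rr, lo=None, hi=None):
    """Return (E-value for the point estimate,
    E-value for the confidence limit closest to the null, or None)."""
    point = evalue(rr)
    if rr > 1:
        limit = lo
        crosses_null = limit is not None and limit <= 1
    else:
        limit = hi
        crosses_null = limit is not None and limit >= 1
    if limit is None:
        return point, None
    # A limit on or beyond the null gives an E-value of 1
    return point, (1.0 if crosses_null else evalue(limit))

# Illustrative inputs: RR = 3.9 with 95% CI (1.8, 8.7)
point, ci = evalue_with_ci(3.9, lo=1.8, hi=8.7)
print(round(point, 2), round(ci, 2))
```

Taking the inverse inside `evalue` lets one function serve both the RR > 1 and RR < 1 cases, so only the choice of confidence limit depends on the direction of the estimate.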

2.2 E-value for odds ratio

When the outcome is relatively rare (for example, < 15% prevalence by the end of follow-up), the odds ratio (OR) approximates the RR, so the basic E-value formula (in section 2.1) should be used. In a case–control study, the outcome needs to be rare only in the underlying population, not in the study sample. The same considerations hold when the outcome prevalence is instead approximately > 85% by the end of follow-up because the variable coding can simply be reversed. When the outcome is not rare (between 15% and 85% prevalence at the end of follow-up), an approximate E-value may be obtained by replacing the RR with the square root of the OR (VanderWeele 2017); that is, $RR \approx \sqrt{OR}$ in the E-value formula presented in section 2.1. Note that when the outcome is rare, the $\sqrt{OR}$ transformation provides a poor approximation, so the calculations under the “rare” outcome assumption should be used. However, when the probability of the outcome is between 15% and 85%, the $\sqrt{OR}$ approximation works quite well (Ding and VanderWeele 2016).
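As a rough sketch of this rule, the Python fragment below applies the section 2.1 formula to the OR directly when the outcome is rare and to $\sqrt{OR}$ otherwise. The function names and the `rare` flag are hypothetical; the analyst, not the code, decides whether the outcome counts as rare.

```python
import math

def evalue(rr):
    """Basic E-value formula on the RR scale (section 2.1)."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

def evalue_or(odds_ratio, rare):
    """E-value for an odds ratio.

    rare=True  : OR is taken as an approximation of the RR.
    rare=False : RR is approximated by sqrt(OR) (common outcome).
    """
    rr = odds_ratio if rare else math.sqrt(odds_ratio)
    return evalue(rr)

# Illustrative OR of 4.0 under both assumptions
print(evalue_or(4.0, rare=True))
print(evalue_or(4.0, rare=False))
```

With the same OR, the common-outcome E-value is smaller, because the square-root transformation pulls the estimate toward the null before the formula is applied.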

2.3 E-value for hazard ratio

When the outcome is relatively rare as described above, the basic E-value formula (in section 2.1) should be used. When the outcome is common, an approximate E-value may be obtained (VanderWeele 2017) by applying the approximation $RR \approx (1 - 0.5^{\sqrt{HR}}) / (1 - 0.5^{\sqrt{1 / HR}})$ in the E-value formula in section 2.1.
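The hazard-ratio conversion can be sketched in the same way. The names `hr_to_rr` and `evalue_hr` below are hypothetical, and the fragment assumes the analyst supplies the rare or common classification.

```python
import math

def evalue(rr):
    """Basic E-value formula on the RR scale (section 2.1)."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

def hr_to_rr(hr):
    """Approximate RR implied by a hazard ratio when the outcome is common."""
    return (1 - 0.5 ** math.sqrt(hr)) / (1 - 0.5 ** math.sqrt(1 / hr))

def evalue_hr(hr, rare):
    """E-value for a hazard ratio under a rare or common outcome."""
    rr = hr if rare else hr_to_rr(hr)
    return evalue(rr)

# Illustrative HR of 2.0 with a common outcome
print(hr_to_rr(2.0), evalue_hr(2.0, rare=False))
```

The approximation is symmetric around the null: the RR implied by 1/HR is the reciprocal of the RR implied by HR, so protective and harmful hazard ratios of equal strength give the same E-value.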

2.4 E-value for standardized mean difference

With a standardized effect size $d$ (the difference in mean outcomes divided by the pooled standard deviation [SD] of the outcome) and a standard error $s_d$ for this standardized effect size, an approximate E-value may be obtained (Lipsey and Wilson 2001; VanderWeele 2017; Linden 2019) by applying the approximation $RR \approx e^{0.91 \times d}$ in the E-value formula. Similarly, an approximate confidence interval (CI) for the RR may be obtained by using the approximation $(e^{0.91 \times d - 1.78 \times s_d},\ e^{0.91 \times d + 1.78 \times s_d})$. This approach relies on additional assumptions and approximations. Other sensitivity analysis techniques have been developed for this setting (Lin, Psaty, and Kronmal 1998; Imbens 2003; VanderWeele and Arah 2011), but they generally require additional assumptions and do not necessarily provide a corresponding E-value.
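A minimal Python sketch of this conversion follows. The function names are hypothetical, `se` stands for the standard error of the standardized effect size, and the inputs are illustrative.

```python
import math

def evalue(rr):
    """Basic E-value formula on the RR scale (section 2.1)."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

def smd_to_rr(d, se=None):
    """Approximate RR and CI limits implied by a standardized mean difference."""
    rr = math.exp(0.91 * d)
    if se is None:
        return rr, None, None
    lo = math.exp(0.91 * d - 1.78 * se)
    hi = math.exp(0.91 * d + 1.78 * se)
    return rr, lo, hi

def evalue_smd(d, se=None):
    """Return (E-value for the estimate, E-value for the limit closest to the null)."""
    rr, lo, hi = smd_to_rr(d, se)
    point = evalue(rr)
    if se is None:
        return point, None
    limit = lo if rr > 1 else hi
    crosses_null = (limit <= 1) if rr > 1 else (limit >= 1)
    return point, (1.0 if crosses_null else evalue(limit))

# Illustrative inputs: d = 0.5 with standard error 0.1
print(evalue_smd(0.5, se=0.1))
```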

2.5 E-value for risk difference

If the adjusted risks for the treated and untreated are $p_1$ and $p_0$, then the E-value may be obtained by replacing the RR with $p_1 / p_0$ in the E-value formula. The E-value for the CI on a risk-difference (RD) scale is more complex: it requires computing several measures and then using a grid search to find the bias factor that, when transformed to the RR scale, yields the E-value of the lower confidence limit (see Ding and VanderWeele [2016] for a comprehensive discussion). Alternatively, if the outcome probabilities $p_1$ and $p_0$ are neither small nor large (for example, if they are between 0.20 and 0.80), then the approximate approach for differences in continuous outcomes given in section 2.4 may be used. Other sensitivity analysis techniques have been developed for this setting (Lin, Psaty, and Kronmal 1998; Imbens 2003; VanderWeele and Arah 2011) but generally require additional assumptions and do not provide a corresponding E-value.
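For the point estimate only, the substitution of the ratio of adjusted risks can be sketched as below. The function name `evalue_rd_point` is hypothetical, and the grid search needed for the confidence limit is deliberately not attempted here.

```python
import math

def evalue(rr):
    """Basic E-value formula on the RR scale (section 2.1)."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

def evalue_rd_point(p1, p0):
    """E-value for a risk-difference point estimate, given the adjusted
    risks in the treated (p1) and untreated (p0). Point estimate only."""
    if not (0 < p1 < 1 and 0 < p0 < 1):
        raise ValueError("adjusted risks must lie strictly between 0 and 1")
    return evalue(p1 / p0)

# Illustrative adjusted risks: 30% in the treated, 15% in the untreated
print(evalue_rd_point(0.30, 0.15))
```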

2.6 E-values for nonnull hypotheses

Thus far, we have described how to calculate E-values to assess the minimum strength of the association an unmeasured confounder would need to have with both the treatment assignment and the outcome to move the point estimate, or one limit of the CI, to the null. However, a similar procedure can be used to assess the minimum magnitude of both confounder associations that would be needed to move an estimate to some other value of the RR. If we have an observed RR and want to assess the minimum strength of both associations that would be needed to shift the estimate to some other value $RR^T$, then we first take the ratio of the two values, $RR / RR^T$, and then apply the E-value formula presented in section 2.1 to this ratio. We encourage investigators to read the original article introducing the E-value (VanderWeele and Ding 2017) to aid in understanding and interpretation prior to using the package.
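The nonnull calculation amounts to one extra division before the basic formula, as in the sketch below. The function name `evalue_nonnull` and its `rr_true` argument (the alternative value the estimate is shifted to) are hypothetical, and the numbers are illustrative.

```python
import math

def evalue(rr):
    """Basic E-value formula on the RR scale (section 2.1)."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

def evalue_nonnull(rr_obs, rr_true=1.0):
    """E-value for shifting an observed RR to rr_true instead of to the null."""
    if rr_obs <= 0 or rr_true <= 0:
        raise ValueError("risk ratios must be positive")
    return evalue(rr_obs / rr_true)

# Illustrative: confounding needed to move an observed RR of 3.9 down to 2.0
print(evalue_nonnull(3.9, rr_true=2.0))
```

Setting `rr_true=1.0` recovers the usual E-value, so the null case is a special case of the same calculation.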

3 The evalue package

This section describes the syntax of the commands in the evalue package for various model types.