
Abstract

Randomized experiments remain the “gold standard” for establishing causality, yet ethical and practical constraints in certain fields often require researchers to rely on observational data. Although psychologists recognize that correlation does not imply causality, the conventional cautionary statements regarding correlation typically found at the end of articles have not sufficiently advanced psychological science, particularly in subfields, such as developmental and personality psychology, that predominantly rely on observational data. Sensitivity analyses commonly used in biostatistics and epidemiology offer powerful tools to quantify the risk of unmeasured confounding in observational data analysis, essentially encouraging applied researchers to assess how strongly an unmeasured confounder must be associated with both the predictor and outcome to negate an observed predictor-outcome association (i.e., reduce the effect to null). In this tutorial, we explore the frequently overlooked but critical issue of unmeasured confounding in psychological research and introduce psychologists to the E-value, a novel and straightforward method for assessing the robustness of exposure-outcome associations to unmeasured confounding. We demonstrate the application of E-value using common psychological-research scenarios in R and discuss its strengths, limitations, and recommended best practices. Psychologists can more accurately assess and transparently report research findings, particularly in subfields relying primarily on observational data, by more explicitly considering unmeasured confounding and incorporating sensitivity-analysis techniques such as the E-value into their methodological tool kits.

Keywords

E-value, sensitivity analysis, causal inference, unmeasured confounder, observational study, open materials

Randomized experiments remain the “gold standard” for establishing causality (Hariton & Locascio, 2018). However, circumstances sometimes render experiments unethical or unfeasible, leaving researchers to rely on observational data. Indeed, almost entire bodies of literature in psychology (e.g., parental warmth’s “effect” on youth adjustment) have long depended on nonexperimental data, (mis)guiding clinical practices even when evidence is weak and inconclusive because of lack of rigorous selection of confounders and considerations for the threat of (unmeasured) confounders (Davis et al., 2024).

Psychologists widely recognize that correlations identified in observational studies do not imply causality. This acknowledgment stems from the inherent susceptibility of observational studies to confounders—factors that, whether measured or unmeasured, causally influence both the treatment/exposure and the outcome, which can lead to spurious associations (Haneuse et al., 2019). Nevertheless, along with statistical-adjustment practices among applied researchers that are often deemed too flexible (Rohrer & Lucas, 2020; VanderWeele et al., 2020), the ubiquitous statement that “correlation does not infer causality” as a brief disclaimer in the limitations section of nonexperimental psychological studies has not effectively served the research community. This limitation impedes the advancement of cumulative psychological science, which ultimately relies on causal evidence (Brady et al., 2023; Grosz, 2023). In the causal-inference literature, the lack of randomization in observational studies often violates a central assumption termed the “exchangeability assumption” (i.e., no unmeasured confounding)—the idea that the treatment and control groups are comparable in all important ways except for the treatment/exposure (for brief explanations of key concepts in causal inference, see Table 1), making it challenging to establish valid counterfactual comparisons between treated and untreated groups. In the absence of random assignment, differences between these groups may be driven by confounding variables that influence both the exposure and the outcome rather than the exposure alone. Although methods that improve the plausibility of this assumption exist (e.g., regression control, propensity score matching [PSM]), the exchangeability assumption remains untestable. In practice, measuring and adjusting for all potential confounders is excruciatingly difficult—if not impossible. 
When observational data are used and rigorous statistical adjustments are made, it is advisable to conduct sensitivity analysis, such as the E-value technique, to quantify how much residual unmeasured confounding is needed to change the substantive conclusion of a given observational study’s finding (e.g., bringing the point estimate to null).

Table 1.

A Brief Primer of Key Concepts in the Causal-Inference Literature

Concept	Explanation	Example
Treatment/exposure	The intervention or variable that is being studied to see its effect on an outcome.	If you are studying the effect of social support on stress levels, social support is the “treatment” or “exposure.”
Confounders	Variables that have a causal effect on both the treatment/exposure and the outcome. They can distort the true effect.	If you are studying the effect of social support (exposure) on stress reduction (outcome), personality traits might be a confounder because they can influence both the level of social support a person receives and the person’s stress levels.
Exchangeability	The idea that the treatment and control groups are similar in all important ways except for the treatment/exposure status.	In a well-conducted observational study, two groups (one with high social support and one with low social support) should be exchangeable if they are similar in aspects (e.g., age, socioeconomic status) except for the level of social support they receive.
Consistency	The assumption that potential outcomes can actually be linked to observed outcomes. This implies that the treatment should be well defined, and there should be only one version of the treatment.	This assumption is violated, for example, when different people receive different kinds of social support (e.g., emotional support, financial help, or advice) but they are all lumped together as “social support.”
Noninterference	The noninterference assumption posits that the outcome for any unit does not depend on the treatment assigned to other units (i.e., no spillover effect).	When studying the effect of social support on stress, noninterference means that an individual’s level of stress is solely influenced by the individual’s own level of social support and not by the social support that other people (friends, family, etc.) receive. For instance, if someone’s friend receives a lot of support, it should not affect the individual’s stress level under the noninterference assumption.
Positivity	The assumption that all levels of treatment/exposure are possible for all values of the covariates in the study population. This means there is a nonzero probability of receiving each treatment level for all participants.	In a study examining social support and stress, positivity requires that each participant, regardless of factors such as age or socioeconomic status, has a nonzero probability of receiving different levels of social support, ranging from low to high.

When guided by the causal-inference framework, researchers typically follow three key steps to approach causality. (a) Define the estimand: The first step is to define the causal quantity of interest, known as the “causal estimand.” This could be the average treatment effect (ATE) or the average treatment effect among the treated (ATT), depending on the study design and method used, which may estimate effects for different subpopulations. (b) Identify the causal estimand: Identifying the estimand refers to the process of determining whether the causal effect can be estimated from the data. This requires making assumptions such as exchangeability (i.e., treated and untreated groups are comparable after adjustment) and positivity (i.e., there is a positive probability of receiving each treatment level). (c) Estimate the effect: Once the causal estimand is identified, various methods can be used to estimate it (i.e., translating the causal estimand into a statistical estimand). In observational data, this often involves adjusting for confounders using methods such as regression, PSM, or instrumental variables (IVs; Rubin, 2005; for introductory texts for psychologists, see Grosz et al., 2024; Rohrer, 2018).

Recent methodological advancements in causal inference (Rubin, 2005; for introductory texts for psychologists, see Grosz et al., 2024; Rohrer, 2018) that are commonly applied in biostatistics and econometrics have provided researchers analyzing observational data with more effective tools to address the threat of confounders,¹ allowing them to better meet arguably the most crucial assumption—the exchangeability assumption—during the identification phase of causal inference so that the causal estimand of interest is identifiable (Angrist et al., 1996; Caliendo & Kopeinig, 2008; Emdin et al., 2017). For example, matching methods, such as PSM, that, unlike the traditional analysis, clearly separate the design and analysis phases and avoid obtaining an adjusted effect through extrapolation (Chan et al., 2022; Thoemmes & Ong, 2016) allow researchers to minimize the measured differences between a treatment group and a control group beyond the treatment itself. As a result, they enhance the validity of comparisons, ensuring they are more akin to comparing “apples with apples” (i.e., exchangeability) rather than “apples with oranges” (Caliendo & Kopeinig, 2008; Iacus et al., 2012). As another example, IV estimation² (Angrist et al., 1996; for an accessible introduction for psychologists, see Grosz et al., 2024) enables researchers to mitigate even unmeasured confounding, provided certain stringent assumptions are met (Lal et al., 2024). Even in the field of psychology, methodologists focusing on within-units analyses have started to explicitly situate these methods within the potential-outcome framework, a principled approach in the causal-inference literature (e.g., Usami, 2023; Usami et al., 2019). 
For example, models such as the random-intercept cross-lagged panel model (RI-CLPM), which account for stable time-insensitive trait variances (Hamaker et al., 2015), are increasingly used in psychology to control for unmeasured time-invariant confounders (e.g., Bi et al., 2024). Despite methodological advances both within and outside psychological science, however, psychologists rarely discuss explicitly whether the causal-identification assumption of exchangeability is likely met, or what threat unmeasured confounding poses, when reporting their findings (Grosz et al., 2020). Indeed, a notable gap in addressing confounding exists even in epidemiology, a field that traditionally emphasizes causal interpretations from observational data. For example, a recent study revealed that of 69 sampled articles in a prominent epidemiological journal, 75.4% (52/69) omitted comments on potential unmeasured confounding (Blum et al., 2020).

Sensitivity analysis, commonly applied in biostatistics and epidemiology, offers psychologists working with observational data a valuable tool to critically evaluate the potential threat of unmeasured confounding (i.e., violation of the exchangeability assumption). In essence, sensitivity analysis prompts the researcher to ask a fundamental but often overlooked question: How strongly would the unmeasured confounder(s) need to be associated with both the exposure variable (i.e., independent variable) and the outcome variable to reduce the observed exposure-outcome association to zero (e.g., an odds ratio [OR] of 1.00) or render it statistically nonsignificant? Although sensitivity analysis has a long history in the psychological literature and in medical and social sciences more broadly (Cornfield et al., 1959), with significant contributions by Paul Rosenbaum (Rosenbaum & Rubin, 1983; see also G. Hong, 2004), many of the existing approaches are criticized for relying on simplified but untestable assumptions about the unmeasured confounder(s) (Ding & VanderWeele, 2016). For example, common assumptions include that the unmeasured confounder is binary, that there are no interactions between the effects of exposure and the confounder on the outcome, or that only one unmeasured confounder exists (Rosenbaum & Rubin, 1983; Schlesselman, 1978).

VanderWeele and Ding (2017) proposed an intuitive and straightforward sensitivity analysis technique without these assumptions, termed the “E-value,” where “E” stands for evidence. The E-value represents the minimum strength of association an unmeasured confounder³ would need to have with both the treatment and the outcome after accounting for measured confounders to fully explain the observed treatment-outcome relationship (for a conceptual model with more technical explanations, see Box 1). This metric is expressed on the risk-ratio (RR) scale⁴ but can also be calculated from ORs, hazard ratios, and standardized mean differences (SMDs; Haneuse et al., 2019). For readers interested in the technical aspects of the E-value, we refer them to Ding and VanderWeele (2016) and VanderWeele et al. (2019). Compared with other sensitivity-analysis techniques, the E-value is more intuitive (i.e., the lowest possible E-value is an RR of 1.00), easier to implement, and less subjective and requires fewer assumptions (Ding & VanderWeele, 2016; Haneuse et al., 2019; Trinquart et al., 2019). Note also that the E-value does not tell one whether unmeasured confounding is actually present—it simply quantifies the strength of confounding needed to fully explain an observed association.

Box 1.

Technical Aspects of the E-Value

Let RR_UD be the maximum risk ratio for the outcome when comparing any two categories of the unmeasured confounder (U) in either the exposed or unexposed group and let RR_EU be the maximum risk ratio for any specific level of the unmeasured confounder comparing individuals with and without treatment (i.e., the extent of confounder imbalance across treatment conditions). For example, RR_EU = 2.00 if the exposed group has a 40% prevalence of an unmeasured binary confounder and the unexposed group has a 20% prevalence (VanderWeele & Ding, 2017). To reduce RR_ED (the observed risk ratio for the exposure-outcome association) down to the null, RR_EU and RR_UD must be at least as large as the E-value (i.e., the minimum required joint strength for both) when RR_EU equals RR_UD. Ding and VanderWeele (2016) showed that adjusting RR_ED down to 1.00 (i.e., no effect) requires the magnitude of the confounder associations to produce confounding bias equal to RR_ED. That is, RR_ED divided by the bias factor must equal 1.00. This bias factor is determined by the equation RR_EU × RR_UD / (RR_EU + RR_UD – 1). Thus, the E-value formula provides the positive solution for the bias factor when RR_EU = RR_UD. Note that because the maximum risk ratios were considered, the resultant E-value was considered a worst case scenario. This means that it is conservative, and any given unmeasured confounder(s) may or may not generate this amount of bias—for example, a confounder with low prevalence may not cause this amount of bias.

In other words, the E-value essentially represents the minimum association strength needed for both the exposure-confounder and confounder-outcome relationships to (potentially) adjust RR_ED down to null, conditional on measured confounders.
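
The step from the bias factor to the E-value formula can be written out explicitly. The following is a sketch of the algebra under the Box 1 setup; the symbol x is introduced here for the common value RR_EU = RR_UD and is not the source's notation. Both roots of the quadratic are positive when RR_ED > 1, so the sketch keeps the larger one, which is the only root that is at least 1.

```latex
\begin{align*}
\frac{x \cdot x}{x + x - 1} &= RR_{ED}
  && \text{(bias factor equals the observed risk ratio)} \\
x^{2} - 2\,RR_{ED}\,x + RR_{ED} &= 0 \\
x &= RR_{ED} \pm \sqrt{RR_{ED}^{2} - RR_{ED}} \\
x &= RR_{ED} + \sqrt{RR_{ED}\,(RR_{ED} - 1)}
  && \text{(root with } x \ge 1\text{)}
\end{align*}
```

As a check, RR_ED = 2.00 gives x ≈ 3.41, and the bias factor 3.41 × 3.41 / (3.41 + 3.41 – 1) is approximately 2.00.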

Although the E-value is increasingly used in epidemiology (Haneuse et al., 2019), it remains relatively unfamiliar to most psychologists. Therefore, in this tutorial, we introduce psychologists to the E-value, a novel sensitivity-analysis technique that can help assess the robustness of treatment-to-outcome associations against unmeasured confounding. Furthermore, our tutorial responds to recent calls from methodologists advocating for practices that strengthen causal inference in psychological science (e.g., Davis et al., 2024; Grosz et al., 2020; Hamaker et al., 2015; Rohrer, 2018; Rohrer & Murayama, 2023). We contend that psychologists, many of whom (at least sometimes) rely on observational data, can benefit from the appropriate and contextualized use of the E-value.

In this tutorial, we first demonstrate how to calculate E-values across various scenarios and parameter estimates, including RRs, ORs, SMDs, and linear-regression coefficients. Second, using the EValue package (Mathur et al., 2018) in R (R Core Team, 2024), we illustrate the application of E-values in three common research scenarios encountered by psychologists: Scenario 1 involves a continuous exposure and a binary outcome, Scenario 2 addresses a continuous exposure and a continuous outcome, and Scenario 3 focuses on a binary exposure and a binary outcome (for a step-by-step pipeline for calculating and interpreting E-values, see Box 2). Finally, we discuss the E-value’s applicability in more complicated research settings, its strengths and limitations, and recommendations for best practices in its implementation.

Box 2.

Study Pipeline for Reporting E-Values of a Given Study

Step 1: Define the causal question and identify the estimand

Specify the causal effect of interest (e.g., exposure on outcome), clearly identifying the estimand (e.g., average treatment effect) and key assumptions (e.g., exchangeability, consistency, positivity).

Step 2: Estimate the causal effect in an observational model

Adjust for measured confounders using appropriate statistical methods, such as regression, inverse probability weighting, or propensity score matching.

Step 3: Calculate and report the E-value

Calculate E-values using appropriate formulas to assess robustness to unmeasured confounding. This can be implemented using the EValue package in R. Report two E-values: (a) the E-value for the point estimate and (b) the E-value for the confidence interval.

Step 4: Contextualize the E-value in reporting

Discuss whether rigorous statistical control was achieved, report or tabulate the associations between the measured covariates and the exposure and outcome, and interpret whether unmeasured confounding of a realistic magnitude could explain away the observed exposure-outcome association or change its statistical significance and whether the E-value is small or large given the context.

Step 5: Provide cautions for small E-values

Although the absence of evidence for causality is not the same as evidence of no causal effect, if E-values suggest vulnerability to unmeasured confounding, discuss how this affects the study’s conclusions and be transparent about how unmeasured confounding might have influenced the findings.

Definition and Implementation

As previously outlined, the E-value is calculated on the RR scale and measures the magnitude of association an unmeasured confounder would need to have with both the treatment and outcome to negate the observed treatment-outcome association in a nonexperimental study. In general, higher E-values indicate more robust treatment-outcome associations because unmeasured confounders would require correspondingly large RRs to negate the findings. In the sections that follow, we present the calculations of the E-value for four parameter estimates commonly of interest to psychologists, including RR, OR, SMD (Cohen’s d), and linear-regression coefficients. To facilitate understanding, we have provided numerical examples of how E-values can be calculated under different circumstances in Table 2.

Table 2.

Numerical Examples of E-Value Calculations Under Different Scenarios

Effect measure	Point estimate	Transformation to RR	E-value
RR	RR = 2.00	—	2.00 + sqrt(2.00 × (2.00 – 1.00)) = 3.41
OR, rare outcome	OR = 3.00	3.00 (RR ≈ OR)	3.00 + sqrt(3.00 × (3.00 – 1.00)) = 5.45
OR, common outcome	OR = 3.00	1.73 (RR ≈ sqrt(OR))	1.73 + sqrt(1.73 × (1.73 – 1.00)) = 2.85
SMD	SMD = 0.80	2.07 (RR ≈ exp(SMD × 0.91))	2.07 + sqrt(2.07 × (2.07 – 1.00)) = 3.56
Linear-regression coefficient	β = 0.15, residual SE = 0.90	1.16 (RR ≈ exp(β/residual SE × 0.91))	1.16 + sqrt(1.16 × (1.16 – 1.00)) = 1.59

Note: RR = risk ratio; OR = odds ratio; SMD = standardized mean difference.
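
The Table 2 arithmetic can be reproduced in a few lines of base R. This is a minimal sketch rather than the EValue package's implementation: `evalue_rr` and `rr_scale` are names made up for this illustration, and the RR is rounded to two decimals before the E-value formula is applied because that is how the table's displayed values appear to have been obtained.

```r
# E-value for a risk ratio; this sketch assumes rr >= 1.
evalue_rr <- function(rr) rr + sqrt(rr * (rr - 1))

# Convert each Table 2 point estimate to the approximate RR scale.
rr_scale <- c(
  RR        = 2.00,
  OR_rare   = 3.00,                     # RR ~ OR
  OR_common = sqrt(3.00),               # RR ~ sqrt(OR)
  SMD       = exp(0.91 * 0.80),         # RR ~ exp(0.91 * d)
  OLS       = exp(0.91 * 0.15 / 0.90)   # d = beta / residual SE
)

# Round the RR first, as in the table, then apply the E-value formula.
rr_rounded <- round(rr_scale, 2)
table2 <- round(cbind(RR = rr_rounded, E_value = evalue_rr(rr_rounded)), 2)
table2
```

Skipping the intermediate rounding changes the second decimal in two rows (2.86 instead of 2.85 for the common-outcome OR and 1.60 instead of 1.59 for the regression coefficient), which is worth keeping in mind when comparing hand calculations with software output.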

RR

Because the E-value is on the RR scale, we first demonstrate how to compute the E-value from RRs. For any given parameter, it is recommended that researchers report two E-values: one for the point estimate and another for the limit of the 95% confidence interval (CI) that is closer to the null (i.e., the strength of unmeasured confounding needed to make the association statistically nonsignificant). The 95% CI for an RR is (e^{ln(RR) – 1.96 × SE}, e^{ln(RR) + 1.96 × SE}), where SE is the standard error of ln(RR). The E-value for an observed outcome based on the RR is derived from the following equation⁵:

$$\text{E-value} = RR + \sqrt{RR \times (RR - 1)}. \tag{1}$$
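
To make the two recommended E-values concrete, here is a minimal base-R sketch. The function names and the example estimate (RR = 2.00, 95% CI [1.50, 2.67]) are hypothetical. Two conventions from VanderWeele and Ding (2017) that this section does not spell out are assumed: an RR below 1 is inverted before the formula is applied, and the E-value for the CI is 1 when the interval contains the null.

```r
# E-value for a risk ratio; estimates below 1 are inverted first (assumed convention).
evalue_rr <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)
  rr + sqrt(rr * (rr - 1))
}

# E-values for the point estimate and for the CI limit closer to the null.
evalue_rr_ci <- function(rr, lo, hi) {
  closer       <- if (rr >= 1) lo else hi
  crosses_null <- lo <= 1 && hi >= 1
  c(point = evalue_rr(rr),
    ci    = if (crosses_null) 1 else evalue_rr(closer))
}

# Hypothetical estimate: RR = 2.00 with 95% CI [1.50, 2.67]
res <- evalue_rr_ci(2.00, 1.50, 2.67)
round(res, 2)
```

In this hypothetical case, the point estimate yields an E-value of about 3.41 and the lower limit about 2.37, so weaker confounding would suffice to move the interval to include the null than to move the point estimate there.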

OR

To compute E-values from OR estimates, researchers must first assess the prevalence of the outcome at follow-up. Prevalence rates play a crucial role in how E-values are computed for ORs because the relationship between ORs and RRs changes depending on the prevalence of the outcome (VanderWeele, 2017). When the outcome is rare, the OR and RR are nearly equivalent, making it straightforward to use the OR in E-value calculations without adjustments. However, when the outcome is common, the OR tends to overestimate the RR, necessitating a transformation of the OR (e.g., a square root approximation; VanderWeele, 2017) to better approximate the RR before applying the E-value formula. If the prevalence of the outcome is relatively rare (< 15%),⁶ Equation 1 can then be directly applied (Ding & VanderWeele, 2016). For outcomes that are common (> 15%), the E-value can be approximated by replacing the RR with the square root of the OR (i.e., RR ≈ $\sqrt{OR}$) in Equation 1 (VanderWeele, 2017).
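
Why prevalence matters can be seen by comparing the two shortcuts with the RR that an OR actually implies at a given baseline risk. The sketch below uses the standard algebraic identity RR = OR / (1 – p0 + p0 × OR), where p0 is the outcome risk among the unexposed; this identity, the function name, and the baseline risks are illustrative assumptions and are not taken from the tutorial.

```r
# Exact RR implied by an OR when the outcome risk among the unexposed is p0
or_to_rr_exact <- function(or, p0) or / (1 - p0 + p0 * or)

or <- 3.00
p0 <- c(0.02, 0.10, 0.40)   # hypothetical baseline risks

comparison <- round(data.frame(
  p0          = p0,
  rr_exact    = or_to_rr_exact(or, p0),
  rr_identity = or,          # rare-outcome shortcut: RR ~ OR
  rr_sqrt     = sqrt(or)     # common-outcome shortcut: RR ~ sqrt(OR)
), 2)
comparison
```

With these inputs, the implied RR stays close to the OR at a 2% baseline risk (about 2.88) and drops to about 1.67 at a 40% baseline risk, which is near the square-root value of 1.73.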

Differences in continuous outcomes

Calculating the E-value from differences in continuous outcomes, such as SMDs, defined as the mean difference of the outcome divided by its pooled standard deviation (i.e., Cohen’s d), is slightly more complex. The RR can be approximated from Cohen’s d subject to distributional assumptions using Equation 2 (Chinn, 2000; Mathur et al., 2018):⁷

$$RR \approx \exp(0.91 \times d). \tag{2}$$

The RR can then be used to calculate the E-value using Equation 1. For instance, with d = 0.32, the corresponding approximate risk ratio is 1.34, and the E-value is 2.01. An E-value of 2.01 suggests that the unmeasured confounder would need to be twice as prevalent among the exposed as among the unexposed (i.e., the exposure groups are highly imbalanced in the unmeasured confounder; VanderWeele & Ding, 2017) and would also need to double the probability of being “high” versus “low” on the outcome following a hypothetical dichotomization of the continuous outcome, conditional on covariates already adjusted for. Because effects of this magnitude (i.e., RRs ≥ 2 or 3) are relatively uncommon in medical and social sciences once the estimate is conditional on important measured confounders, an E-value of 2 would in many situations suggest relative robustness of findings, especially when the observed exposure-to-outcome effect is already conditional on a rigorous selection of known measured confounders (VanderWeele & Ding, 2017). To approximate the 95% CI for the corresponding RR, the standard error⁸ of d is inserted into the following formula⁹: (exp{0.91 × d – 1.78 × SE_d}, exp{0.91 × d + 1.78 × SE_d}).
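
The conversion from a standardized mean difference, including the interval formula, can be sketched in base R as follows. The value d = 0.32 comes from the text, whereas the standard error of 0.10 and the function names are invented for illustration.

```r
# E-value for a risk ratio; estimates below 1 are inverted first (assumed convention).
evalue_rr <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)
  rr + sqrt(rr * (rr - 1))
}

# Approximate RR and 95% CI from a standardized mean difference,
# using RR ~ exp(0.91 * d) and the interval formula given in the text.
smd_to_rr <- function(d, se_d) {
  c(rr = exp(0.91 * d),
    lo = exp(0.91 * d - 1.78 * se_d),
    hi = exp(0.91 * d + 1.78 * se_d))
}

# d = 0.32 is from the text; se_d = 0.10 is a made-up value.
est <- smd_to_rr(d = 0.32, se_d = 0.10)
round(est, 2)
round(evalue_rr(est[["rr"]]), 2)   # E-value for the point estimate
round(evalue_rr(est[["lo"]]), 2)   # E-value for the CI limit closer to the null
```

The multiplier 1.78 is simply 0.91 × 1.96 rounded to two decimals, so the interval is the usual 95% Wald interval for d carried through the same transformation as the point estimate.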

Linear-regression coefficient

Because linear-regression coefficients, both standardized and unstandardized, are among the most widely estimated parameters in psychological science, in this tutorial, we focus on detailing the calculation and interpretation of E-values derived from these coefficients.

In the context of a categorical exposure (e.g., no exposure vs. exposure, lowest quartile vs. highest quartile), linear-regression coefficients quantify the adjusted mean difference(s) between the reference group and comparison group(s), controlling for covariates. These coefficients represent the unique effects on the outcome above and beyond covariates. To convert a linear-regression coefficient into the SMD measure d, divide the coefficient by the residual standard error from the model, which reflects the variability in the outcome that is not explained by the exposure and covariates (Linden et al., 2020). Subsequently, the RR can be approximated from d using Equation 2 to compute the E-value using Equation 1.

When calculating the E-value for a continuous exposure, an additional parameter called “delta” is required. Delta defines a dichotomization of the exposure variable between a hypothetical group of participants with an exposure value equal to an arbitrary value c versus another hypothetical group with an exposure value equal to c + delta (Mathur et al., 2018). Delta, therefore, represents the contrast in the exposure variable, typically defined as a 1-unit change. For example, this could mean a 1-SD increase in the exposure variable. The process involves converting the linear-regression coefficient, which reflects the effect of the continuous exposure on the outcome, into an SMD (Cohen’s d). This conversion is achieved by dividing the regression coefficient by the residual standard error from the regression model. For example, if a 1-unit increase in the exposure results in a Cohen’s d of 0.32, the corresponding RR would be approximately 1.34, and the E-value would be approximately 2.01. This implies that, to fully explain away the observed association conditional on covariates already adjusted for, the unmeasured confounder would need to be twice as prevalent among the hypothetical exposed group (E = c + delta) as among the hypothetical unexposed group (E = c; i.e., the hypothetical exposure groups are highly imbalanced in the unmeasured confounder; VanderWeele & Ding, 2017) and would also need to double the probability of being in the “high” versus “low” group on the dichotomized outcome variable (Mathur et al., 2018). The use of delta allows the continuous exposure to be treated similarly to a binary exposure, which makes it easier to apply the E-value framework to continuous variables. In other words, delta essentially represents the hypothetical intervention “dosage” or the magnitude of the exposure change that researchers would consider clinically or theoretically meaningful had they had access to experimental data.
Depending on the unit (e.g., a 1-point increase on a scale of 1 to 7 or a standardized 1-unit increase) and the chosen increment (1.0 unit or 2.0 units), the larger the increase (i.e., the larger the hypothetical dosage), the larger the resulting E-value will be. However, this comes with the caveat that two more extreme hypothetical exposure groups are likely more imbalanced on unmeasured confounders. This also implies that a larger E-value is needed for an indication of genuine robustness (VanderWeele et al., 2019). In general, we recommend a standardized 1-unit increase (i.e., delta = 1 with a standardized continuous exposure) as the unit of choice. Alternatively, one can categorize the continuous exposure into tertiles or quartiles and calculate the E-value by comparing the highest and lowest groups. This categorization approach is common in studies that calculated the E-value for a continuous exposure (e.g., Chen et al., 2024; J. H. Hong et al., 2023; Kim et al., 2021; Kim, Wilkinson, Case, et al., 2024) because it allows the researchers to assess threshold effects and obtain more intuitive interpretations (Kim et al., 2021). In this case, one is essentially evaluating whether a hypothetical intervention that moves participants from, for example, the first tertile to the third tertile may have a potentially causal effect on an outcome of interest provided that the causal assumptions (e.g., exchangeability, positivity, and consistency) are met (Kim et al., 2021).
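
How the choice of delta feeds into the E-value can be illustrated with a short base-R sketch. It assumes that, under a linear model, the mean difference between the groups at c + delta and c is delta times the coefficient, so that d = delta × β / residual SE. The function names are hypothetical, and the inputs are the Table 2 values (β = 0.15, residual SE = 0.90).

```r
# E-value for a risk ratio; estimates below 1 are inverted first (assumed convention).
evalue_rr <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)
  rr + sqrt(rr * (rr - 1))
}

# E-value for the coefficient of a continuous exposure.
# beta: coefficient per 1 unit of exposure; sd_resid: residual SE of the model;
# delta: exposure contrast (c + delta versus c).
evalue_ols <- function(beta, sd_resid, delta = 1) {
  d <- delta * beta / sd_resid   # standardized mean difference for the contrast
  evalue_rr(exp(0.91 * d))       # RR ~ exp(0.91 * d), then the E-value formula
}

# Same coefficient evaluated at two contrasts
ev_delta <- sapply(c(delta_1 = 1, delta_2 = 2),
                   function(dl) evalue_ols(0.15, 0.90, dl))
round(ev_delta, 2)
```

Doubling delta raises the E-value from about 1.60 to about 2.05 here even though the underlying coefficient is unchanged, which is why the contrast should be reported alongside the E-value.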

Transparency

The R script for the present tutorial is publicly available on OSF (https://osf.io/dbxm6/). The data set used is publicly available at https://www.icpsr.umich.edu/web/ICPSR/studies/21600.

Introducing the Data

In this tutorial, we use data from the public-use version of the National Longitudinal Study of Adolescent to Adult Health (Add Health; Harris et al., 2019), a population-based longitudinal cohort study of U.S. adolescents. It was approved by the Institutional Review Board of the University of North Carolina-Chapel Hill. For the purpose of our tutorial, we used data from Waves 1 (prebaseline, 1994–1995), 2 and 3 (baselines), and 4 and 5 (follow-ups) of the Add Health study. In the tutorial, we focus on three scenarios commonly encountered by psychologists: (a) regressing a rare binary outcome (occurrence of suicidal ideation) on a continuous exposure (depressive symptoms), (b) regressing a continuous outcome (poor self-rated health) on a continuous exposure (depressive symptoms), and (c) regressing a rare binary outcome (occurrence of binge drinking) on a binary exposure (recent volunteering). We note that the last scenario is a replication of Nakamura et al. (2024), who recently used the Add Health study to examine the potentially causal effect of volunteering in young adulthood, a binary exposure assessed in Wave 3, on a range of subsequent health and psychosocial outcomes. They found that conditional on a rich set of covariates including both prebaseline outcome and retrospectively assessed volunteering during adolescence (a proxy for prebaseline exposure), the association between volunteering in young adulthood and reduced likelihood of binge drinking at Wave 5 (defined as having four to five alcoholic drinks in a row at least one to two times a week [coded as 1] vs. less often [coded as 0]) was at least moderately robust, with an E-value of 2.40. Here, we replicated this finding to demonstrate how to calculate the E-value for a binary exposure.
All analyses control for several known confounders measured at prebaseline (Wave 1), including age, biological sex, family income, neighborhood satisfaction, and most importantly, prebaseline exposure and outcome, which are among the strongest confounders in many studies (VanderWeele et al., 2020). We chose to control for prebaseline confounders instead of baseline confounders to avoid overcontrolling in case the confounders we chose include potential mediators (Rohrer, 2018; VanderWeele et al., 2020), with the caveat that controlling for covariates in the prebaseline wave may not capture time-varying confounding that took place between the prebaseline wave and the baseline wave,¹⁰ especially if the time interval is long (e.g., 10 years vs. 1 year; Pelham et al., 2021). Note that the statistical adjustment is simplified for demonstration purposes and that the adjustment is not comprehensive. Selecting appropriate confounders for a particular analysis requires strong domain knowledge and can benefit greatly from drawing directed acyclic graphs (DAGs)—a principled and systematic way to transparently and explicitly encode causal assumptions among variables through the use of visual graphs even if important variables (e.g., a confounder) are not measured (for an accessible introduction for psychologists, see Rohrer, 2018). For recommendations on what confounders to consider for longitudinal analysis in general, we refer readers to VanderWeele et al. (2020), Mathur and VanderWeele (2022), and Davis et al. (2024).

Analytic Plan

For Scenario 1, a multivariable logistic regression model was fitted via the glm() function with family = "binomial" and stored as an R object (a data structure with attributes stored in R’s memory) labeled “glm1.” Specifically, depressive symptoms at baseline (named “dep2”) were entered as the exposure predicting suicidal ideation at follow-up (named “suicide4”), adjusting for known prebaseline confounders, including age, biological sex, family income, neighborhood satisfaction, and most importantly, depressive symptoms and suicidal ideation at prebaseline, with the following code: glm1 <- glm(suicide4 ~ age + BIO_SEX + income + neigh_satis + dep1 + suicide1 + dep2, family = "binomial", data = dat). Continuous variables, including depressive symptoms at both waves, age, income, and neighborhood satisfaction, were standardized (M = 0, SD = 1).
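
Because the Add Health data must be obtained separately, the following sketch runs the Scenario 1 steps on simulated data. Everything in it is an assumption made for illustration: the data-generating values, the reduced covariate set, the Wald interval, and the helper `evalue_rr`. It is not the tutorial's OSF script, and the numbers it prints say nothing about the real data.

```r
set.seed(1)
n <- 2000

# Simulated stand-ins for the tutorial's variables (NOT the Add Health data)
dat <- data.frame(age = rnorm(n), dep1 = rnorm(n))
dat$dep2 <- as.numeric(scale(0.5 * dat$dep1 + rnorm(n)))
dat$suicide4 <- rbinom(n, 1, plogis(-3 + 0.5 * dat$dep2 + 0.2 * dat$dep1))

glm1 <- glm(suicide4 ~ age + dep1 + dep2, family = "binomial", data = dat)

# OR per 1-SD increase in dep2 with a Wald 95% CI
b  <- coef(summary(glm1))["dep2", c("Estimate", "Std. Error")]
or <- exp(b[["Estimate"]] + c(est = 0, lo = -1.96, hi = 1.96) * b[["Std. Error"]])

# E-value for a risk ratio; estimates below 1 are inverted first
evalue_rr <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)
  rr + sqrt(rr * (rr - 1))
}

# The simulated outcome is rare, so the OR stands in for the RR
closer  <- if (or[["est"]] >= 1) or[["lo"]] else or[["hi"]]
crosses <- or[["lo"]] <= 1 && or[["hi"]] >= 1
ev <- c(point = evalue_rr(or[["est"]]),
        ci    = if (crosses) 1 else evalue_rr(closer))
round(c(or, ev), 2)
```

Because the exposure is standardized, the OR already corresponds to a contrast of delta = 1 SD, so no further rescaling is applied before the E-value is computed.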

For Scenario 2, a multivariable linear-regression model was fit via the lm() function and stored as an R object named “lm1.” Specifically, depressive symptoms at baseline (named “dep2”) were entered as the exposure predicting poor self-rated health at follow-up (named “health4”), adjusting for known prebaseline confounders, including age, biological sex, family income, neighborhood satisfaction, and most importantly, depressive symptoms and self-rated health at prebaseline, using the following code: lm1 <- lm(health4 ~ age+BIO_SEX+income+neigh_satis+dep1+health1+dep2, data = dat). Continuous variables, including the outcome variable (poor self-rated health), depressive symptoms at both waves, age, income, and neighborhood satisfaction, were standardized (M = 0, SD = 1). In addition to the lm() function, in the online R script (https://osf.io/dbxm6/), we also conducted the same analysis via the lavaan package under a structural equation modeling (SEM) framework¹¹ because SEM is one of psychologists’ most common analytic tools.

For Scenario 3, a multivariable logistic-regression model was fit via the glm() function with family = "binomial" and stored as an R object labeled “glm2.” Specifically, recent volunteering (0 = no recent volunteering, 1 = recent volunteering) at baseline (named “volunteer3”) was entered as the exposure predicting binge drinking (defined as drinking four to five alcoholic drinks in a row at least one to two times a week [coded as 1] vs. less frequently or never [coded as 0]) at follow-up (named “binge_drinking5”), adjusting for known prebaseline confounders, including age, biological sex, family income, neighborhood satisfaction, and most importantly, adolescent volunteering assessed retrospectively at baseline and binge drinking at prebaseline, with the following code: glm2 <- glm(binge_drinking5 ~ age+BIO_SEX+income+neigh_satis1+volunteer_pre+binge_drinking1+volunteer3, family = "binomial", data = dat). Continuous variables, including age, income, and neighborhood satisfaction, were standardized (M = 0, SD = 1).

Computing an E-value From a Logistic-Regression Model for a Continuous Exposure

Because the logistic-regression model is stored in an object named ”glm1,” we obtain the log OR estimate from the glm object using the following code:

Then, we convert the coefficient (the log OR) to OR by exponentiating it via the exp() function and then calculate the corresponding 95% CI of the OR using the following code and store them into three objects named ”OR,” ”lowerOR,” and ”upperOR,” respectively:

Then, we need to install and load the EValue package to calculate the corresponding E-values using the following code:

Finally, we insert the resulting OR (”OR”) and its corresponding 95% CI (”lowerOR” and ”upperOR”) into the evalues.OR() function of the EValue package (Mathur et al., 2018) using the following code:

Then, we inspect and interpret the results by running the following code:
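A minimal sketch of the steps above is given here. It is not the authors' script: the data are simulated stand-ins (the variable names follow the text, but the covariate set is shortened), and treating the outcome as common (rare = FALSE, which applies the square-root approximation of the RR) is an assumption. The hand computation uses the closed-form E-value, RR + sqrt(RR × (RR − 1)); the packaged call runs only if the EValue package is installed.

```r
# Simulated stand-in data (hypothetical; NOT the Add Health data used in the article)
set.seed(1)
n <- 2000
dat <- data.frame(dep1 = rnorm(n), suicide1 = rbinom(n, 1, 0.10))
dat$dep2 <- as.numeric(scale(0.5 * dat$dep1 + rnorm(n)))  # standardized exposure
dat$suicide4 <- rbinom(n, 1, plogis(-2 + 0.4 * dat$dep2 + 0.8 * dat$suicide1))

glm1 <- glm(suicide4 ~ dep1 + suicide1 + dep2, family = "binomial", data = dat)

# Step 1: log OR of the exposure and its standard error
est <- coef(summary(glm1))["dep2", "Estimate"]
se  <- coef(summary(glm1))["dep2", "Std. Error"]

# Step 2: OR and Wald 95% CI
OR      <- exp(est)
lowerOR <- exp(est - qnorm(0.975) * se)
upperOR <- exp(est + qnorm(0.975) * se)

# Step 3 (by hand): approximate the RR for a common outcome, then apply the closed form
evalue_rr <- function(rr) { if (rr < 1) rr <- 1 / rr; rr + sqrt(rr * (rr - 1)) }
rr      <- sqrt(OR)
limit   <- if (rr >= 1) sqrt(lowerOR) else sqrt(upperOR)  # CI limit nearer the null
crosses <- if (rr >= 1) limit <= 1 else limit >= 1        # does the CI include the null?
e_point <- evalue_rr(rr)
e_ci    <- if (crosses) 1 else evalue_rr(limit)
print(round(c(OR = OR, lower = lowerOR, upper = upperOR, E_point = e_point, E_CI = e_ci), 2))

# Step 3 (packaged equivalent), executed only when EValue is available
if (requireNamespace("EValue", quietly = TRUE)) {
  print(EValue::evalues.OR(est = OR, lo = lowerOR, hi = upperOR, rare = FALSE))
}
```

Because the data are simulated, the printed numbers will not match the values reported below; only the sequence of steps is meant to carry over.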

The E-value for our logistic-regression model with depressive symptoms at baseline predicting suicidal ideation at follow-up, conditional on a relatively robust set of confounders, was 1.81, indicating that an unmeasured confounder would need to be associated with both the exposure and the outcome by an RR of at least 1.81 each to fully negate the observed association between depressive symptoms and suicidal ideation. To shift the 95% CI to include the null (i.e., to render the observed association statistically nonsignificant), associations of at least RR = 1.37 with both would be needed. If the unmeasured confounder has a weaker association with either the exposure (depressive symptoms) or the outcome (suicidal ideation) than the E-value suggests (i.e., less than an RR of 1.81), it would need a stronger association with the other variable to fully explain away the observed effect. For example, if the confounder has an association equivalent to an RR of 1.50 with depressive symptoms, it would need a stronger association with suicidal ideation, greater than an RR of 1.81, to negate the exposure-outcome association. Conversely, if the unmeasured confounder has an association stronger than an RR of 1.81 with one variable, the required association with the other variable to negate the exposure-outcome association will be weaker than an RR of 1.81. To gain a better understanding of how the unmeasured confounder’s impact depends jointly on the strengths of its associations with both the exposure and outcome, one can draw a bias plot using the bias_plot() function of the EValue package (Fig. 1). In the literature, an E-value of 1.81 is often interpreted as “at least moderately robust” to unmeasured confounding (e.g., Kim et al., 2023; Kim, Wilkinson, Okuzono, et al., 2024). 
This interpretation may be justified in contexts in which larger confounder effects (e.g., RRs > 2) are rare, especially when both prebaseline outcome and exposure were controlled for and when the continuous exposure was not categorized (e.g., into tertiles or quartiles; e.g., J. H. Hong et al., 2023; Kim et al., 2023; Kim, Wilkinson, Case, et al., 2024). As noted by Trinquart et al. (2019), categorizing a continuous exposure produces greater E-values when the resulting exposure contrast is more extreme than a contrast based on a 1-SD increase or another less extreme difference. For example, Kim et al. (2021) studied the association between life satisfaction and all-cause mortality and concluded that this association was “moderately robust” because the categorized exposure (first tertile vs. third tertile) produced an E-value of 2.02, controlling for a rich array of confounders including but not limited to demographics, child-abuse history, religious-service attendance, health insurance, health behaviors, physical limitations, chronic health conditions, and prior assessment of the exposure.¹² Based on their supplemental material treating life satisfaction (the exposure) as a continuous predictor, we identified its corresponding OR/RR and its 95% CI¹³ and found that the 1-SD increase equivalent of the E-value based on comparing the first tertile and third tertile is 1.63, which is comparable with the E-value from our analysis of depressive symptoms’ effect on subsequent suicidal ideation. Although we did not control for as many confounders as Kim et al. did, Davis et al. (2024) showed that compared with adjusting for all known and measured confounders, controlling only for prior outcome and exposure eliminated almost all (known) bias in a large study investigating the causal effect of parental warmth on youth adjustment.

Fig. 1.

Bias plot for empirical Example 1. RR_EU and RR_UD represent the maximum magnitude of association that the unmeasured confounder has with the exposure/predictor and the outcome, respectively (for technical details, see Box 1). The equation in the plot is used to calculate the bias factor. Setting RR_EU = RR_UD and solving RR_EU × RR_UD / (RR_EU + RR_UD – 1) = 1.25, where 1.25 is the exposure-outcome risk-ratio association conditional on the measured confounders, gives the E-value for this particular observed exposure-outcome association, which is 1.81 (the solution greater than 1). This plot illustrates how the unmeasured confounder’s impact depends jointly on the strengths of its associations with both the exposure and outcome. It shows that when the strength of one association (e.g., RR_EU) is larger than the E-value of 1.81, the strength of RR_UD needed to negate the observed exposure-outcome association will be lower than 1.81.
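The trade-off shown in the bias plot can be checked numerically with base R. The sketch below uses only the bias-factor formula from the caption and the observed RR of 1.25; the helper names (bias_factor, evalue_rr, needed_rr_ud) are hypothetical and introduced here for illustration.

```r
# Bias factor: the maximum bias an unmeasured confounder with these associations can produce
bias_factor <- function(rr_eu, rr_ud) rr_eu * rr_ud / (rr_eu + rr_ud - 1)

# Closed-form E-value for an observed risk ratio above 1
evalue_rr <- function(rr) rr + sqrt(rr * (rr - 1))

# RR_UD required to reach a target bias factor given RR_EU (valid only when rr_eu > target),
# obtained by solving bias_factor(rr_eu, rr_ud) = target for rr_ud
needed_rr_ud <- function(rr_eu, target) target * (rr_eu - 1) / (rr_eu - target)

rr_obs <- 1.25
e <- evalue_rr(rr_obs)
round(e, 2)                  # 1.81
bias_factor(e, e)            # 1.25, up to floating-point error
needed_rr_ud(1.50, rr_obs)   # 2.5: a weaker RR_EU calls for a stronger RR_UD
needed_rr_ud(3.00, rr_obs)   # about 1.43: a stronger RR_EU calls for a weaker RR_UD
```

Under these numbers, a confounder with RR_EU = 1.50 would need RR_UD = 2.50 to negate the association, whereas one with RR_EU = 3.00 would need only about 1.43.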

Computing an E-value From a Linear-Regression Model for a Continuous Exposure

The linear-regression model is stored in an object labeled ”lm1.” First, we need to obtain the exposure’s coefficient estimate and its standard error into two objects named ”est” and ”se” using the following code:

Then, we can calculate the residual standard deviation of the outcome (a more conservative calculation of the outcome’s standard deviation is to calculate it via sd(data$outcome), which is the standard deviation of the outcome not conditional on the exposure and covariates) and store it in an object named ”sd” using the following code:

Then, we can insert the est, se, and sd objects created above into the evalues.OLS() function of the EValue package (Mathur et al., 2018) to obtain the E-values. Before executing the code, note that an additional argument delta needs to be specified; a common delta value is 1, which sets the contrast of interest between a hypothetical group of individuals with exposure values equal to an arbitrary value c versus another hypothetical group of individuals with exposure values equal to c + 1 unit (i.e., 1 SD because we standardized the exposure) following a dichotomization of the exposure variable (Mathur et al., 2018). Therefore, the E-values of our lm object can be obtained by running the following code:

Then, we inspect and interpret the results by running the following code:
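A minimal sketch of the linear-regression steps above follows, again on simulated stand-in data with a shortened covariate set rather than the authors' data. The hand computation assumes the standard conversion from a standardized mean difference d to an approximate risk ratio, RR ≈ exp(0.91 × d), with 95% limits exp(0.91 × d ± 1.78 × SE_d), as described by VanderWeele and Ding (2017); the packaged call runs only if the EValue package is installed.

```r
# Simulated stand-in data (hypothetical; NOT the Add Health data used in the article)
set.seed(2)
n <- 2000
dat <- data.frame(dep1 = rnorm(n), health1 = rnorm(n))
dat$dep2    <- 0.5 * dat$dep1 + rnorm(n)
dat$health4 <- 0.4 * dat$health1 + 0.1 * dat$dep2 + rnorm(n)
dat[] <- lapply(dat, function(x) as.numeric(scale(x)))  # standardize (M = 0, SD = 1)

lm1 <- lm(health4 ~ dep1 + health1 + dep2, data = dat)

# Exposure coefficient, its standard error, and the residual SD of the outcome
est <- coef(summary(lm1))["dep2", "Estimate"]
se  <- coef(summary(lm1))["dep2", "Std. Error"]
sd  <- sigma(lm1)  # sd(dat$health4) would be the more conservative choice

# Hand computation for a contrast of delta = 1 unit (1 SD) of the exposure
delta <- 1
d     <- delta * est / sd
se_d  <- delta * se / sd
rr    <- exp(0.91 * d)
lo    <- exp(0.91 * d - 1.78 * se_d)
hi    <- exp(0.91 * d + 1.78 * se_d)

evalue_rr <- function(rr) { if (rr < 1) rr <- 1 / rr; rr + sqrt(rr * (rr - 1)) }
limit   <- if (rr >= 1) lo else hi                  # CI limit nearer the null
crosses <- if (rr >= 1) limit <= 1 else limit >= 1  # does the CI include the null?
e_point <- evalue_rr(rr)
e_ci    <- if (crosses) 1 else evalue_rr(limit)
print(round(c(RR = rr, lower = lo, upper = hi, E_point = e_point, E_CI = e_ci), 2))

# Packaged equivalent, executed only when EValue is available
if (requireNamespace("EValue", quietly = TRUE)) {
  print(EValue::evalues.OLS(est = est, se = se, sd = sd, delta = 1))
}
```

As with the logistic example, the simulated output illustrates the workflow only and will differ from the values interpreted below.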

In our linear-regression model analyzing the effect of depressive symptoms at baseline on poor self-rated health at follow-up, conditional on measured confounders, the E-value was 1.37. This indicates that for unmeasured confounding to fully explain away the observed association, it needs to have an association equivalent to an RR of 1.37 with both the exposure and the outcome above and beyond the confounders already adjusted for, but weaker confounding could not. In addition, for the observed association’s 95% CI to include zero (i.e., for the observed association to become statistically nonsignificant), associations equivalent to an RR of 1.24 with both would be needed, but weaker confounding could not do so. These values suggest that the observed association between depressive symptoms and subsequent self-rated health could, at most, be viewed as only somewhat robust to unmeasured confounding. However, this relatively small E-value should be viewed in light of the fact that depressive symptoms and self-rated health at prebaseline and other measured confounders were statistically controlled for. It should be noted that a smaller E-value can suggest only that the evidence of causality for a given association is relatively weak; that is, the absence of evidence for causality is not the same as evidence of no causal effect (VanderWeele & Ding, 2017).

Computing an E-Value From a Logistic-Regression Model for a Binary Exposure

The logistic-regression model with a binary exposure is stored in an object labeled ”glm2.” First, we obtain the log OR estimate from the glm2 object using the following code:

Then, we inspect and interpret the results by running the following code:
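A minimal sketch for the binary-exposure case follows, on simulated stand-in data with a shortened covariate set. Two points differ from the continuous-exposure example: the OR is treated as approximating the RR (rare = TRUE, an assumption that suits a rare outcome), and a protective OR below 1 is inverted before the closed-form E-value is applied. When the 95% CI includes the null, the E-value for the confidence limit is 1.

```r
# Simulated stand-in data (hypothetical; NOT the Add Health data used in the article)
set.seed(3)
n <- 3000
dat <- data.frame(volunteer_pre = rbinom(n, 1, 0.30), binge_drinking1 = rbinom(n, 1, 0.10))
dat$volunteer3      <- rbinom(n, 1, plogis(-1 + 1.0 * dat$volunteer_pre))
dat$binge_drinking5 <- rbinom(n, 1, plogis(-2.5 + 1.0 * dat$binge_drinking1 - 0.3 * dat$volunteer3))

glm2 <- glm(binge_drinking5 ~ volunteer_pre + binge_drinking1 + volunteer3,
            family = "binomial", data = dat)

# Log OR of the binary exposure, then OR and Wald 95% CI
b        <- coef(summary(glm2))["volunteer3", c("Estimate", "Std. Error")]
OR2      <- exp(b[["Estimate"]])
lowerOR2 <- exp(b[["Estimate"]] - qnorm(0.975) * b[["Std. Error"]])
upperOR2 <- exp(b[["Estimate"]] + qnorm(0.975) * b[["Std. Error"]])

# Hand computation with the OR used directly as the RR (rare-outcome assumption)
evalue_rr <- function(rr) { if (rr < 1) rr <- 1 / rr; rr + sqrt(rr * (rr - 1)) }
limit   <- if (OR2 >= 1) lowerOR2 else upperOR2      # CI limit nearer the null
crosses <- if (OR2 >= 1) limit <= 1 else limit >= 1  # does the CI include the null?
e_point <- evalue_rr(OR2)
e_ci    <- if (crosses) 1 else evalue_rr(limit)
print(round(c(OR = OR2, lower = lowerOR2, upper = upperOR2, E_point = e_point, E_CI = e_ci), 2))

# Packaged equivalent, executed only when EValue is available
if (requireNamespace("EValue", quietly = TRUE)) {
  print(EValue::evalues.OR(est = OR2, lo = lowerOR2, hi = upperOR2, rare = TRUE))
}
```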

In our logistic-regression model with a binary exposure, the effect of recent volunteering at baseline on binge drinking at follow-up, conditional on measured confounders, was marginally significant (OR = 0.76, p = .082). Had we passed the conventional significance test and obtained a p < .05, we would have proceeded and obtained an E-value of 1.96. An E-value of this magnitude would suggest that for an unmeasured confounder to fully explain away the observed association, it would need to have an association equivalent to an RR of 1.96 with both the exposure and outcome above and beyond the confounders already adjusted for, but weaker confounding could not. Because the effect’s 95% CI includes null, the E-value for the 95% confidence limit would be 1.00. The calculated E-value for our analytic sample was 1.96, lower than the E-value of 2.40 that Nakamura et al. (2024) reported. The discrepancy in magnitude and significance testing can arise from (a) different analytic approaches and data preprocessing decisions (e.g., they may have used Poisson regression with a “log” link to directly estimate the RR), (b) our use of listwise deletion, and (c) our use of the public-use version of the Add Health data set, which has a smaller sample size than the sample size of Nakamura et al. Furthermore, given the binary nature of both exposure and outcome,¹⁵ we showcased how to tabulate associations between covariates and exposure/outcome in Table 3. Regarding covariate-outcome associations, the E-value (1.96) in this analysis is larger than all covariates’ effects on the outcome except the effects of biological sex (RR = 2.57). Regarding covariate-exposure associations, the E-value (1.96) is larger than all exposure-confounder associations except for the associations between exposure and the prior exposure value (RR = 2.66) and the association between exposure and the prior outcome value (RR = 2.34). These comparisons helped contextualize the E-value’s strengths. 
The E-value suggests that an unmeasured confounder associated with both the outcome and the exposure with a strength similar to biological sex’s association with the outcome (the association between exposure and biological sex has an RR of 1.12), conditional on measured confounders including but not limited to both prior exposure and outcome, would be needed to negate the observed association, implying moderate robustness of the study finding. Furthermore, because the exposure of this example is binary (0 = no volunteering vs. 1 = volunteering), a scenario in which researchers may naturally be inclined to use matching/weighting methods to arrive at the statistical estimand because these methods clearly separate the design phase from the inference phase (Chan et al., 2022), we also describe how E-values can be calculated from an analysis based on PSM, which generally targets the ATT, and inverse probability of treatment weighting (IPTW), which generally targets the ATE. For PSM, we estimated the probability of each participant receiving treatment (volunteering = 1), the so-called propensity score, by regressing the treatment on all measured confounders in a logistic regression, assuming conditional exchangeability. We then applied nearest neighbor matching with a caliper of 0.09 (wider and thus less stringent calipers such as 0.20 and 0.10 led to unacceptable imbalances in some confounders) via the MatchIt package in R. The propensity scores for the unmatched sample ranged from 0.03 to 0.70, and there was substantial overlap between the two treatment groups. The matched sample retained most treated subjects (812/850) with similar distributions of propensity scores across matched groups and showed acceptable confounder balance; the absolute values of SMDs in all measured covariates were less than 0.10. 
We then ran a multiple-regression analysis on this matched sample to estimate the treatment’s effect on the outcome, controlling for the same set of confounders used to estimate the propensity score to further reduce residual confounding. Cluster-robust standard errors, clustered on the matched pairs, were used to account for uncertainty in the estimates (Chan et al., 2022). The OR for this analysis is 0.78, similar to the OR estimated via the regression analysis above, and the E-value is similar at 1.88. For IPTW, we used the same estimated propensity scores of each participant to create a pseudopopulation in which treatment assignment is independent of the observed covariates, which mimics randomization assuming conditional exchangeability (for an accessible tutorial on weighting for psychologists, see Thoemmes & Ong, 2016). To avoid extreme weights, which may dramatically increase variance, we used stabilized weights ranging from 0.46 to 9.28 (Chan et al., 2022). Again, the weighted sample showed an acceptable balance of covariates between the treated and control groups (all |SMDs| < 0.10). We then estimated the effect of the exposure on the outcome using a weighted regression based on the stabilized weights generated by IPTW via the survey package, again controlling for the same set of confounders used to estimate the propensity score. The point estimate on the OR metric was 0.90, closer to the null than the estimate from the main analysis based on regression control (OR = 0.76), and the E-value was correspondingly lower, at 1.45. When a truncation (i.e., recoding weights outside the 5th-to-95th percentile range) based on the stabilized weights was applied to further reduce the effect of large weights (Thoemmes & Ong, 2016), the same regression analysis led to an OR of 0.81 and an E-value of 1.77.
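The point E-values reported for the regression, PSM, and IPTW analyses can be approximately reproduced from the rounded ORs alone. The sketch below assumes that each OR is used directly as an RR (the rare-outcome approximation) and inverts protective ORs before applying the closed form; the helper name evalue_rr is hypothetical.

```r
# Closed-form E-value; ratios below 1 are inverted first
evalue_rr <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)
  rr + sqrt(rr * (rr - 1))
}

# Rounded ORs as reported in the text
reported_or <- c(regression = 0.76, psm = 0.78, iptw = 0.90, iptw_truncated = 0.81)
round(evalue_rr(reported_or), 2)
# regression 1.96, psm 1.88, iptw 1.46, iptw_truncated 1.77
```

The IPTW value computed from the rounded OR (1.46) differs from the reported 1.45 in the second decimal, presumably because the reported E-value was computed from the unrounded estimate.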

Table 3.

Associations Between Measured Covariates and Exposure/Outcome in Empirical Example 3

E-value = 1.96	Outcome (RR)	Exposure (RR)	Exposure (RR; conservative)
Age (dichotomized)	1.22	0.92	1.17
Biological sex (reference = male)	2.65	1.12	1.12
Income (dichotomized)	1.15	0.77	1.33
Neighborhood satisfaction (dichotomized)	1.18	1.09	1.16
Volunteering in adolescence (reference = no)	1.50	2.66	2.66
Prebaseline binge drinking (reference = no)	1.71	2.34	2.34

Note: Continuous variables were dichotomized with a median split. We recoded volunteering (a protective factor) so that nonvolunteering now equals 1 (i.e., a higher risk “exposure” for an undesirable outcome). Likewise, protective confounders (with an OR/RR < 1.00) were reverse-scored. The OR estimates for the confounder-outcome associations were obtained from separate bivariate logistic regressions used to estimate each individual (binary) confounder’s effect on the outcome; given the rare prevalence of the outcome (binge drinking), RR is approximately equivalent to OR when predicting outcome. The RR association between exposure and each (binary) confounder is obtained by calculating the prevalence of each (binary) confounder and comparing the exposure group with the nonexposure group. For each binary confounder, a score of 1 (vs. 0) corresponds to the level that increases the risk for the outcome of binge drinking, which is the default following the reverse-scoring procedure mentioned earlier. The exposure-covariate association’s direction (RR_EU; see Box 1) is in relation to the covariate’s effect on the outcome (RR_UD) and thus may be lower than 1.00. Using the bias-factor formula provided in Box 1, we can calculate, for example, a hypothetical unmeasured confounder with the exact same associations (both in magnitude and direction) as (dichotomized) age’s associations with exposure and outcome that would produce minimum bias (i.e., (1.22 × 0.92) / (1.22 + 0.92 – 1) = 0.98). For a hypothetical unmeasured confounder that has the exact same strength and direction as age’s effect on the outcome to exert bias greater than 1.00 (i.e., to adjust down the observed exposure-outcome effect), its imbalance (age = 1) across exposure levels needs to be in the opposite direction as the age variable in relation to the exposure (i.e., the prevalence of age level = 1 is greater among the exposed group than among the unexposed group). 
If a hypothetical unmeasured binary confounder has the same magnitude of effect on the outcome as the age variable but with its coding reversed, its association with the exposure must be assessed in relation to the level (the current score of 1) that is associated with higher risks for the outcome (i.e., $\frac{\text{prevalence of confounder presence (recoded level = 1) among exposed}}{\text{prevalence of confounder presence (recoded level = 1) among unexposed}}$). In that case, the exposure-covariate relative risk association would be (699 / (699 + 1,114)) / (280 / (280 + 570)) = 1.17. Compare (699 / (699 + 1,114)) / (280 / (280 + 570)) with (1,114 / (699 + 1,114)) / (570 / (280 + 570)), which was used to obtain 0.92, age’s relative risk with exposure reported in the table. In short, to be conservative, one can calculate the maximum exposure-covariate association by doing both calculations (taking the inverse when a relative risk is lower than 1.00) so that a more conservative hypothetical scenario is considered in which a hypothetical unmeasured confounder’s associations with exposure and outcome are maximized and in the same direction (see Column 4). Bold RRs are greater than the identified E-value of the potentially causal association of interest conditional on measured confounders. RR = risk ratio; OR = odds ratio.
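The arithmetic in the table note can be reproduced in a few lines of base R. The counts are those given in the note for (dichotomized) age; the object names are hypothetical, and taking the maximum over both codings and their inverses is one reading of the "two calculations" rule described in the note.

```r
# Bias factor from Box 1
bias_factor <- function(rr_eu, rr_ud) rr_eu * rr_ud / (rr_eu + rr_ud - 1)

# Counts from the note: exposed (nonvolunteering) n = 699 + 1,114; unexposed n = 280 + 570
exposed_a   <- 699;  exposed_b   <- 1114
unexposed_a <- 280;  unexposed_b <- 570
n_exposed   <- exposed_a + exposed_b
n_unexposed <- unexposed_a + unexposed_b

# Exposure-covariate RR under each coding of the binary covariate
rr_coding_a <- (exposed_a / n_exposed) / (unexposed_a / n_unexposed)
rr_coding_b <- (exposed_b / n_exposed) / (unexposed_b / n_unexposed)
round(c(rr_coding_a, rr_coding_b), 2)   # 1.17 0.92

# Conservative exposure-covariate association (Column 4 of Table 3)
rr_conservative <- max(rr_coding_a, rr_coding_b, 1 / rr_coding_a, 1 / rr_coding_b)
round(rr_conservative, 2)               # 1.17

# Bias factor for a confounder with age's tabulated associations (RR_UD = 1.22, RR_EU = 0.92)
round(bias_factor(0.92, 1.22), 2)       # 0.98
```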

Discussion

In this tutorial, we address a notable gap in nonexperimental psychological research: the lack of discussion surrounding unmeasured confounding. We introduce psychologists to the E-value, a sensitivity-analysis technique that offers an accessible means to enhance observational studies that address causal questions, whether implicitly or explicitly (VanderWeele & Ding, 2017). We illustrate how to compute E-values using the EValue package (Mathur et al., 2018) in R under several circumstances commonly encountered by psychologists. Although this intuitive approach holds promise in helping psychologists assess the robustness of observed treatment-outcome associations to unmeasured confounding, we acknowledge that (a) the E-value is contextual because there exist no rigid cutoffs to categorize E-values into small, medium, or large ranges; (b) the E-value is most valuable when the most important known confounders (e.g., prior measurements of exposure and outcome) are statistically controlled for in the analysis; and (c) despite its utility, the E-value method possesses several limitations that warrant careful consideration.

The importance of contextualizing E-values and best practices

As highlighted by VanderWeele and Ding (2017), the interpretation of E-values does not follow strict cutoffs that classify them into small, medium, and large ranges. This underscores the importance of contextualizing E-values within the specific parameters of each study. For example, an E-value even as high as 5 may not be meaningful if existing known risk factors demonstrate RRs of 6 or 7, in which case, the observed association is still considered susceptible to unmeasured confounding. Conversely, a smaller E-value (e.g., 1.5) may indicate robust association if the known risk factors have lower RRs (e.g., 1.3; Chung & Chung, 2023). Moreover, to provide richer information in a given context, researchers are recommended to report two E-values for each potentially causal association—that is, both the E-value that negates the observed association and the E-value that makes an association statistically nonsignificant (VanderWeele & Mathur, 2020). Furthermore, the E-value cannot replace rigorous and thoughtful statistical adjustments, such as matching methods and regression analysis, because its sole purpose is to assess the plausibility of the exchangeability assumption. Studies that rigorously identify, measure, and control for known confounding before computing E-values to contextualize their findings will demonstrate (much) greater robustness (Chung & Chung, 2023). For example, studies that adjust for prebaseline exposure and outcome reduce the risk of reverse causality and likely report an E-value that is smaller than a similar study that does not adjust for prior measurements of the exposure and outcome, two of the strongest confounders influencing both the exposure and subsequent outcome (VanderWeele et al., 2020). 
As a result, researchers should always conduct a thorough literature review and explicitly state all known confounders of the exposure-outcome relationship under study, even those not measured in their research (VanderWeele & Mathur, 2020), in which case, the lack of control over known unmeasured confounding may bias the results and warrants thoughtful discussion (e.g., on whether the magnitude of association is known in the literature, how it compares with the E-value, and/or acknowledging its omission as a limitation). When important known and measured confounders, such as prior measurements of the exposure and outcome, are not controlled for, the E-value can be artificially inflated because the observed association may partly reflect the effects of these uncontrolled confounders rather than a true causal effect. In this case, the E-value suggests that a stronger unmeasured confounder would be needed to explain away the association, but this may be misleading because the effect is already confounded by factors that should have been accounted for. Conversely, when these known confounders are properly controlled for, the observed association becomes more accurate, and the E-value reflects the “true” robustness of the association to unmeasured confounding. This often results in a smaller but more realistic E-value that provides a more accurate assessment of how vulnerable the observed association is to potential unmeasured confounders.

Given the aforementioned considerations, we believe that E-values are particularly promising (a) in longitudinal studies that control for both prior assessments of exposure and outcome (ideally along with a rich set of measured confounders¹⁶; i.e., as in VanderWeele’s outcome-wide framework) and (b) in studies that rigorously select potential confounders using DAGs to block the majority of the noncausal paths (i.e., in the causal-graph framework). The latter criterion also implies that the E-value is most informative when only one unmeasured known confounder is present and all other known confounders are adequately controlled. In cases in which multiple known distinct confounders are unmeasured, the data may not be sufficient to answer the research questions effectively in the first place. Meanwhile, to gauge whether an E-value is small or large (i.e., the robustness of findings), we recommend that researchers report and comment on the magnitude of associations between each measured confounder (inverted for protective factors and median dichotomized for continuous confounders) and the exposure/outcome so that these associations can be compared with the E-value (VanderWeele & Mathur, 2020; for an example, see the comment section of Table 4 focusing on Kaster et al., 2022). Although the E-value is unlikely to be larger than the association between the prior outcome values and the outcome or the association between the prior exposure values and the exposure, if the E-value is larger than the associations of every other strong known confounder statistically controlled for, the researcher may be more confident that the observed association is indeed causal. In other words, the E-value facilitates a more meaningful discussion of causal thinking, for both authors and reviewers, by encouraging researchers to consider the strength of unmeasured confounding relative to known measured confounders. 
From an author’s perspective, when a reviewer raises issues with the use of observational data to infer causality because of the lack of measurement of a known key confounder, the authors can present a more meaningful discussion by employing sensitivity analyses, such as the E-value, by comparing the effects of that unmeasured but known confounder on the exposure and outcome—as reported and approximated in the literature—with the E-value. When a reviewer expresses concerns about the threat of unknown confounding, the authors can then provide context regarding the robustness of the observed exposure-outcome association (i.e., the E-value) by reporting and comparing the measured confounders’ effects with the reported E-value, assuming that the analysis is based on a rigorous selection of covariates. For example, if a strong known confounder shows a larger association with the exposure or outcome than the E-value, the exposure-outcome association may be seen as less robust and may necessitate more thoughtful discussions in the limitations section because an unmeasured confounder of similar magnitude could negate the observed association. Conversely, when the E-value exceeds the effects of a strongly measured confounder on the exposure and outcome, a stronger case for causality can be made. To further illustrate how E-values can be appropriately applied, we listed and commented on several recent articles—most focused on psychological constructs—that we believe have made good use of the E-value technique in Table 4. From a reviewer’s perspective, a deeper consideration of unmeasured confounding allows a critical evaluation of the quality of measured confounders in a longitudinal study. 
When the quality of measured confounders is deemed insufficient, a reviewer may require the authors to (a) apply more rigorous statistical control over the exposure-outcome association to evaluate whether the new results are consistent with the primary findings and (b) conduct sensitivity analyses, such as the E-value, to assess how sensitive the exposure-outcome association is to unmeasured confounding.

Table 4.

Recent Exemplary Articles With Application of the E-Value

Title	Authors	Exposure	Outcome(s)	Comments
“Gratitude and Mortality Among Older U.S. Female Nurses”	Chen et al. (2024)	Gratitude	Mortality	The first strength of this article is its adjustment of a rich set of potential confounders. Because the outcome is mortality, the prior outcome was also implicitly controlled for (i.e., analysis was done only on people who were alive at baseline; Mathur & VanderWeele, 2022). The second strength of this article is that it provided a table showing the associations between covariates and outcomes to contextualize the E-value of the article. The article could be even more robust if the data set also measured prior exposure of gratitude (a potential confounder unmeasured) and if the authors also provided a table showing the strengths of associations between the covariates (inverted for protective factors) and the exposure.
“Life Satisfaction and Subsequent Physical, Behavioral, and Psychosocial Health in Older Adults”	Kim et al. (2021)	Life satisfaction	Physical, behavioral, and psychosocial health	This article’s strongest strength is its adjustment of a rich set of covariates including both prior exposure and outcome. Because the E-values were conditional on these measured confounders, they were (much) more informative than E-values conditional on a limited set of confounders. The article could be even stronger if the authors also provided information regarding the strengths of associations between the exposure and confounders. Given that the article took an outcome-wide approach investigating a multitude of outcomes, providing covariate associations with all outcomes could be too burdensome. Providing a table reporting the three confounder-outcome associations of greatest magnitude for each outcome could be of great value (VanderWeele & Mathur, 2020). Although the E-value may likely be smaller than the prior outcome’s effect on itself at follow-up and the prebaseline exposure’s effect on itself at baseline (if the E-value is even stronger than this pair of associations, the finding is likely at least moderately robust as long as the prior outcome and exposure were not measured a long time ago), comparing the E-value with other covariates including known strong risk/protective factors and prior exposure’s effect on the outcome can be very informative.
“Association Between Bariatric Surgery and Macrovascular Disease Outcomes in Patients With Type 2 Diabetes and Severe Obesity”	Fisher et al. (2018)	Bariatric surgery	Macrovascular-disease outcomes	Although this article did not investigate psychological outcomes/exposures, we included this article because it exemplifies some of the best practices for the use of the E-value technique. First, it adjusted for a rich set of measured confounders via matching and regression analysis. Second, it provided information on the associations between covariates and the outcome. Doing so greatly facilitated benchmarking and contextualization of the E-value of the association of interest. For example, the E-value (2.72) of the primary exposure-outcome association is (much) greater than the effects of measured, known, strong risk factors on the outcome (e.g., being a current smoker) presented in its Table 4.
“Risk of Suicide Death Following Electroconvulsive Therapy Treatment for Depression: A Propensity Score-Weighted, Retrospective Cohort Study in Canada”	Kaster et al. (2022)	Electroconvulsive therapy treatment for depression	Suicide death	This article has multiple strengths worth noting. First, it implicitly controlled for the prior outcome at baseline because the outcome is suicide death. Second, it controlled for a wide range of covariates (more than 100, selected on the basis of a directed acyclic graph) via propensity score weighting, substantially reducing the risk of unmeasured confounding. Third, the E-value (3.2) is relatively large, even in absolute terms without any contextualization, given how comprehensive the statistical control already is. Fourth, the authors acknowledged a key limitation, namely that some known confounders remained unmeasured, and contextualized the E-value of 3.2 by commenting on why the effects of these unmeasured but known confounders are unlikely to approach the E-value the study observed: “Our analytic cohort (ie, the pseudo-population) was balanced for a wide range of clinically important covariates, which suggests our propensity score model is adequately specified and that bias because of confounding on these characteristics was mitigated. Furthermore, our E-value analysis suggests that strong unobserved confounding would be required to account for the observed results, which is unlikely considering the comprehensive set of covariates included in the analysis. There were some variables for which we did not have data, such as electrode placement (unilateral, bitemporal, or bifrontal) or stimulus parameters. Evidence from clinical trials suggests that the effect of these exposure characteristics on reducing depressive symptoms is likely to be small. We also had no information on patient preference or willingness to pursue electroconvulsive therapy; however, although this is strongly associated with receiving electroconvulsive therapy, its association with eventual death by suicide is likely to be much weaker and unlikely to have significant impact on our findings” (Kaster et al., 2022, p. 445). This article would be strengthened even further had the authors reported the measured confounders’ strengths of association with both the exposure and the outcome.

Applicability of the E-value in more complicated scenarios

Although the E-value has been shown to be useful in longitudinal analyses in which a single exposure predicts a future outcome, its applicability in more complex modeling frameworks requires careful consideration. To date, E-values have primarily been applied in three contexts: (a) longitudinal regression that measures a single exposure (e.g., Kim, Wilkinson, Case, et al., 2024); (b) weighting-based methods, which create pseudopopulations to address general and/or time-varying confounders (e.g., Zhong et al., 2018); and (c) matching-based methods, which minimize differences in measured confounders between the exposed and nonexposed groups (e.g., Fisher et al., 2018). The E-value can also be applied in mediation models, provided that confounders are rigorously controlled for all important paths (i.e., the exposure-outcome, exposure-mediator, and mediator-outcome associations), and it fits naturally under a causal-mediation framework (for an accessible application of the E-value in this context for psychologists, see Li et al., 2023). The E-value can theoretically be used in the context of the cross-lagged panel model (CLPM). However, we do not emphasize this approach here because of critiques of the CLPM’s large set of parametric assumptions (Mulder, Luijken, et al., 2024) and, more importantly, its conflation of within- and between-units variances (i.e., convergence; Hamaker et al., 2015). In addition, when only prior outcomes are controlled in a CLPM, the exchangeability assumption is likely to be violated because of residual confounding, which can artificially “inflate” the E-value. This issue is especially common in CLPM-based psychological studies, in which comprehensive confounder control is infrequent. It highlights the need for caution when using the E-value in complex modeling scenarios in which residual confounding and assumption violations may affect interpretability.
Furthermore, as we note in our Limitations section, to the best of our knowledge, the E-value has seldom been applied in models that use only within-units variation over time, such as multilevel models of intensive longitudinal data, fixed-effects models, and RI-CLPMs. In such cases, interpreting E-values is challenging, especially when few important time-varying confounders are included as benchmarks to help contextualize the E-value, as is common in psychological studies testing within-units associations. Although within-persons analysis is robust to unmeasured time-invariant confounders, which by definition cannot cause or confound within-persons variation, it is still susceptible to unmeasured time-varying confounding (Usami et al., 2019). The common practice among applied psychologists of omitting time-varying confounders from within-units analyses, such as the RI-CLPM, may preclude meaningful interpretation of sensitivity analyses such as the E-value in this context. For these reasons, we believe that the E-value may be of limited value for within-units methods, especially as they are currently applied in psychological research. Finally, a key issue across interdependent models, such as path models, CLPMs, and RI-CLPMs, is that one misspecified path can have implications for the rest of the model. When working with large multivariate models that require calculating several E-values, the challenge lies in interpreting them collectively; it is nevertheless important to consider all key paths together to assess the overall robustness of the model.
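
To make the point about reading several E-values together more concrete, the following minimal Python sketch tabulates E-values for each path of a hypothetical three-path mediation model. The path labels and risk ratios are invented for illustration and do not come from any study discussed here. The formula is the standard E-value expression for a risk ratio (VanderWeele & Ding, 2017), and flagging the path with the smallest E-value is only a simple reading aid assumed for this sketch, not an established decision rule.

```python
import math


def e_value(rr):
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)), after inverting RR < 1."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


def e_value_ci(lo, hi):
    """E-value for the confidence limit closest to the null (1 if the CI includes 1)."""
    if lo <= 1 <= hi:
        return 1.0
    return e_value(lo) if lo > 1 else e_value(hi)


# Hypothetical path estimates on the risk-ratio scale:
# (point estimate, lower 95% limit, upper 95% limit).
paths = {
    "exposure -> outcome": (1.80, 1.30, 2.49),
    "exposure -> mediator": (1.40, 1.05, 1.87),
    "mediator -> outcome": (0.70, 0.55, 0.89),
}

# One (E-value for estimate, E-value for CI limit) pair per path.
summary = {
    name: (e_value(rr), e_value_ci(lo, hi)) for name, (rr, lo, hi) in paths.items()
}

for name, (e_est, e_ci) in summary.items():
    print(f"{name:22s} E-value (estimate) = {e_est:.2f}, E-value (CI limit) = {e_ci:.2f}")

# Assumed reading aid: the path whose CI-limit E-value is smallest is the one
# that the weakest unmeasured confounder could move to include the null.
weakest = min(summary, key=lambda name: summary[name][1])
print("Path most vulnerable to unmeasured confounding:", weakest)
```

Reporting the full table, rather than a single summary number, keeps each path's robustness visible and allows each E-value to be benchmarked against the measured confounders relevant to that path.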

Limitations of E-value

Although the E-value technique provides useful insights, it is crucial to acknowledge its limitations. First, although E-values are a valuable tool for assessing the robustness of an observed association to unmeasured confounding, they do not address other important issues in causal inference. These include but are not limited to measurement error, sample-selection bias, treatment-effect heterogeneity, and bias introduced by third variables that do not necessarily cause both the treatment and outcome (i.e., collider bias; Elwert & Winship, 2014; Matthay & Glymour, 2020). It is important for researchers to consider these additional sources of bias when interpreting causal estimates. Second, E-values do not address common questionable practices in psychological science, such as p-hacking and selective statistical control (Friese & Frankenbach, 2020; VanderWeele & Ding, 2017). When selective statistical control is apparent (e.g., not adjusting for known confounders, including prebaseline exposure and outcome), the reporting of the E-value can give a false sense of confidence in results (Blum et al., 2020). Third, the primary purpose of E-values is to determine whether an observed effect could be completely confounded to the point of negating the association. However, the goal of research often extends beyond simply establishing whether an effect is significant; researchers also seek to accurately estimate the magnitude of the causal effect. In this regard, E-values offer limited utility because they do not directly contribute to estimating the size of the causal effect but rather assess the robustness of the association to potential unmeasured confounding. Consequently, although E-values are valuable for sensitivity analysis, other methods are needed to estimate causal effects accurately. Fourth, when the outcome is continuous, the E-value is less elegant, and additional assumptions are invoked (Ding & VanderWeele, 2016). 
When converting Cohen’s d to an OR, the continuous outcome in both groups is assumed to follow a logistic (approximately normal) distribution with equal variances (Anzures-Cabrera et al., 2011; Borenstein et al., 2009). The resulting OR may be biased upward or downward depending on how far the data depart from the logistic-distribution and equal-variance assumptions and on the control- and treatment-group risks. When these assumptions are unmet, the direction and magnitude of the bias vary, so the true OR may be overestimated or underestimated (for a comprehensive simulation, see Anzures-Cabrera et al., 2011). Furthermore, many other sensitivity-analysis approaches exist for estimating how strong the associations of unmeasured confounders would need to be to neutralize an observed effect, and the E-value represents only one of several recent perspectives on assessing the impact of unmeasured confounders (e.g., Cinelli & Hazlett, 2019). For example, the robustness value proposed by Cinelli and Hazlett (2019) is conceptually similar to the E-value (for an accessible tutorial on the robustness value and related metrics in R and Stata, see Cinelli et al., 2020). The E-value is an approximation for effect measures other than RRs, whereas the robustness value is exact when linear regression is used to estimate the effect. The two metrics also differ in scale: The robustness value expresses the confounder’s associations with the treatment and the outcome as the percentage of variance explained (partial R²), whereas the E-value expresses these associations as RRs. For a linear-regression model with a continuous outcome, we recommend that researchers become familiar with both sensitivity analyses and conduct both. For binary outcomes, we recommend the E-value as the go-to sensitivity analysis.
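
As a rough illustration of the two routes described above for a continuous outcome, the sketch below uses plain Python rather than the EValue or sensemakr packages. It converts a standardized mean difference d to an OR with the logistic-distribution conversion, ln(OR) = πd/√3 ≈ 1.81d (Borenstein et al., 2009; Chinn, 2000), applies the approximation RR ≈ exp(0.91d) used for E-values with continuous outcomes (VanderWeele & Ding, 2017), and computes the robustness value from a coefficient's t statistic and residual degrees of freedom using the closed-form expression in Cinelli and Hazlett (2019). The function names and all input numbers are assumptions made for this example.

```python
import math


def e_value(rr):
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)), after inverting RR < 1."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


def d_to_or(d):
    """Cohen's d to OR, assuming logistic outcome distributions with equal variances."""
    return math.exp(math.pi * d / math.sqrt(3))


def d_to_rr(d):
    """Approximate RR for a standardized mean difference: exp(0.91 * d).

    Numerically this is close to the square root of the OR returned by d_to_or.
    """
    return math.exp(0.91 * d)


def robustness_value(t_stat, dof, q=1.0):
    """Robustness value RV_q = 0.5 * (sqrt(f_q^4 + 4 f_q^2) - f_q^2).

    f_q = q * |t| / sqrt(dof) is the scaled partial Cohen's f of the exposure;
    q = 1 asks how much confounding would reduce the estimate to zero.
    """
    f_q = q * abs(t_stat) / math.sqrt(dof)
    return 0.5 * (math.sqrt(f_q**4 + 4 * f_q**2) - f_q**2)


# Hypothetical inputs: a standardized difference of d = 0.50, and a regression
# coefficient with t = 4.18 on 783 residual degrees of freedom.
d = 0.50
t_stat, dof = 4.18, 783

print(f"OR from d:        {d_to_or(d):.2f}")
print(f"Approximate RR:   {d_to_rr(d):.2f}")
print(f"E-value:          {e_value(d_to_rr(d)):.2f}")
print(f"Robustness value: {robustness_value(t_stat, dof):.3f}")
```

The two outputs are on different scales and are not interchangeable: The E-value is read as a risk ratio, whereas the robustness value is read as a share of residual variance (partial R²) that an unmeasured confounder would need to explain in both the exposure and the outcome.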
In other words, the choice among the wide array of sensitivity-analysis techniques depends on the specific research context (e.g., whether plausible assumptions regarding the prevalence of the unmeasured confounder can be made; Haneuse et al., 2019). Finally, for demonstration purposes, we controlled for only a limited set of sociodemographic covariates in addition to the prior outcome and the prior exposure; we did not control for a wider range of covariates, as did most of the exemplary articles reporting the E-value that we highlighted in Table 2. For empirical observational studies conducted in the real world to be methodologically rigorous, the researcher should (a) control for most known confounders and (b) report and tabulate the associations between every controlled confounder and the exposure and outcome so that they can be compared with the E-value.
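
The recommendation to tabulate measured confounders' associations alongside the E-value can be sketched as follows. This is a minimal Python illustration with made-up numbers: The observed risk ratio, the covariate labels, and their associations are all hypothetical. The bounding factor is the joint bias bound from Ding and VanderWeele (2016), B = RR_EU × RR_UD / (RR_EU + RR_UD − 1), where RR_EU and RR_UD denote a confounder's associations with the exposure and with the outcome. An unmeasured confounder as strong as a given measured covariate could fully explain away the observed association only if its bounding factor reached the observed risk ratio.

```python
import math


def e_value(rr):
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)), after inverting RR < 1."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


def bounding_factor(rr_eu, rr_ud):
    """Maximum bias a confounder with these two associations (each >= 1) can produce."""
    if rr_eu < 1 or rr_ud < 1:
        raise ValueError("express both associations as risk ratios >= 1")
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


# Hypothetical observed exposure-outcome association (risk-ratio scale).
observed_rr = 1.80

# Hypothetical measured covariates: (RR with exposure, RR with outcome).
benchmarks = {
    "current smoker": (1.50, 2.00),
    "low household income": (1.30, 1.40),
    "prior depressive symptoms": (3.50, 4.00),
}

print(f"Observed RR = {observed_rr:.2f}, E-value = {e_value(observed_rr):.2f}")

# For each benchmark, ask whether an unmeasured confounder of equal strength
# could shift the observed RR all the way to the null.
could_explain_away = {}
for name, (rr_eu, rr_ud) in benchmarks.items():
    b = bounding_factor(rr_eu, rr_ud)
    could_explain_away[name] = b >= observed_rr
    print(f"{name:28s} RR_EU = {rr_eu:.2f}, RR_UD = {rr_ud:.2f}, "
          f"bounding factor = {b:.2f}, explains away: {could_explain_away[name]}")
```

Setting both associations equal to the E-value makes the bounding factor equal the observed risk ratio, which is why the E-value serves as a single summary threshold, and the per-covariate table shows how far each measured confounder falls from it.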

Conclusion

We anticipate that this tutorial will assist psychologists in acknowledging the importance of explicitly evaluating the potential threat posed by unmeasured confounding. We believe that incorporating the reporting of E-values or metrics from other related sensitivity-analysis techniques in psychological research can contribute to this endeavor and move cumulative psychological science forward as the field moves toward greater emphasis on causal interpretations of findings (Brady et al., 2023; Davis et al., 2024; Grosz, 2023).

Footnotes

Acknowledgements

We thank Michael P. Grosz and the other anonymous reviewer for their constructive and insightful feedback.

Transparency

Action Editor: David A. Sbarra

Editor: David A. Sbarra

Author Contributions

Kaiwen Bi: Conceptualization; Data curation; Formal analysis; Methodology; Software; Visualization; Writing – original draft; Writing – review & editing.

Gabriel J. Merrin: Formal analysis; Methodology; Supervision; Validation; Writing – original draft; Writing – review & editing.

Tianyu Li: Methodology; Validation; Writing – original draft; Writing – review & editing.

Xianlin Sun: Formal analysis; Methodology; Supervision; Validation; Writing – original draft; Writing – review & editing.

Yi Chai: Methodology; Validation; Writing – original draft; Writing – review & editing.

Zekai Lu: Methodology; Validation; Writing – original draft; Writing – review & editing.

Mark Shuquan Chen: Conceptualization; Methodology; Supervision; Writing – original draft; Writing – review & editing.

ORCID iDs

Kaiwen Bi

Mark Shuquan Chen

Notes

References

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434), 444–455. https://doi.org/10.1080/01621459.1996.10476902

Anzures-Cabrera, Sarpatwari, & Higgins, J. P. (2011). Expressing findings from meta-analyses of continuous outcomes in terms of risks. Statistics in Medicine, 30(25), 2967–2985. https://doi.org/10.1002/sim.4298

Merrin, G. J., Zhang, Wang, Xiao, & Chen, M. S. (2024). Are there reciprocal interplays among Chinese adolescents’, fathers’, and mothers’ depression at the within-family level? A family systems perspective. Journal of Psychopathology and Clinical Science, 133(2), 140–154. https://doi.org/10.1037/abn0000883

Blum, M. R., Tan, Y. J., & Ioannidis, J. P. A. (2020). Use of E-values for addressing confounding in observational studies-an empirical assessment of the literature. International Journal of Epidemiology, 49(5), 1482–1494. https://doi.org/10.1093/ije/dyz261

Borenstein, Hedges, L. V., Higgins, & Rothstein, H. R. (2009). Introduction to meta-analysis. Wiley. https://doi.org/10.1002/9780470743386

Brady, A. C., Griffin, M. M., Lewis, A. R., Fong, C. J., & Robinson, D. H. (2023). How scientific is educational psychology research? The increasing trend of squeezing causality and recommendations from non-intervention studies. Educational Psychology Review, 35, Article 37. https://doi.org/10.1007/s10648-023-09759-9

Caliendo, & Kopeinig (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22(1), 31–72. https://doi.org/10.1111/j.1467-6419.2007.00527.x

Chan, G. C. K., Lim, Sun, Stjepanovic, Connor, Hall, & Leung (2022). Causal inference with observational data in addiction research. Addiction, 117(10), 2736–2744. https://doi.org/10.1111/add.15972

Chen, Okereke, O. I., Kim, E. S., Tiemeier, Kubzansky, L. D., & VanderWeele, T. J. (2024). Gratitude and mortality among older US female nurses. JAMA Psychiatry, 81(10), 1030–1038. https://doi.org/10.1001/jamapsychiatry.2024.1687

Chinn (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine, 19(22), 3127–3131. https://doi.org/10.1002/1097-0258(20001130)19:22<3127::aid-sim784>3.3.co;2-d

Chung, W. T., & Chung, K. C. (2023). The use of the E-value for sensitivity analysis. Journal of Clinical Epidemiology, 163, 92–94. https://doi.org/10.1016/j.jclinepi.2023.09.014

Cinelli, Ferwerda, & Hazlett (2020). Sensemakr: Sensitivity analysis tools for OLS in R and Stata. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3588978

Cinelli, & Hazlett (2019). Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society B: Statistical Methodology, 82(1), 39–67. https://doi.org/10.1111/rssb.12348

Cooper, Hedges, L. V., & Valentine, J. C. (Eds.). (2009). The handbook of research synthesis and meta-analysis (2nd ed.). Russell Sage Foundation.

Cornfield, Haenszel, Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., & Wynder, E. L. (1959). Smoking and lung cancer: Recent evidence and a discussion of some questions. Journal of the National Cancer Institute, 22(1), 173–203. https://doi.org/10.1093/jnci/22.1.173

Davis, Aks, I. R., Somers, Schulze, Patel, Leitz, Valente, & Pelham, W. E., III. (2024). Confounding in non-experimental studies linking caregiver warmth to youth adjustment: The problem and potential solutions. OSF. https://doi.org/10.31219/osf.io/j8ucs

Ding, & VanderWeele, T. J. (2016). Sensitivity analysis without assumptions. Epidemiology, 27(3), 368–377. https://doi.org/10.1097/ede.0000000000000457

Elwert, & Winship (2014). Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology, 40, 31–53. https://doi.org/10.1146/annurev-soc-071913-043455

Emdin, C. A., Khera, A. V., & Kathiresan (2017). Mendelian randomization. JAMA, 318(19), 1925–1926. https://doi.org/10.1001/jama.2017.17219

Fisher, D. P., Johnson, Haneuse, Arterburn, Coleman, K. J., O’Connor, P. J., O’Brien, Bogart, Theis, M. K., Anau, Schroeder, E. B., & Sidney (2018). Association between bariatric surgery and macrovascular disease outcomes in patients with type 2 diabetes and severe obesity. JAMA, 320(15), 1570–1582. https://doi.org/10.1001/jama.2018.14619

Friese, & Frankenbach (2020). p-Hacking and publication bias interact to distort meta-analytic effect size estimates. Psychological Methods, 25(4), 456–471. https://doi.org/10.1037/met0000246

Grosz, M. P. (2023). Should researchers make causal inferences and recommendations for practice on the basis of nonexperimental studies? Educational Psychology Review, 35, Article 57. https://doi.org/10.1007/s10648-023-09777-7

Grosz, M. P., Ayaita, Arslan, R. C., Buecker, Ebert, Hünermund, Müller, S. R., Rieger, Zapko-Willmes, & Rohrer, J. M. (2024). Natural experiments: Missed opportunities for causal inference in psychology. Advances in Methods and Practices in Psychological Science, 7(1). https://doi.org/10.1177/25152459231218610

Grosz, M. P., Rohrer, J. M., & Thoemmes (2020). The taboo against explicit causal inference in nonexperimental psychology. Perspectives on Psychological Science, 15(5), 1243–1255. https://doi.org/10.1177/1745691620921521

Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. (2015). A critique of the cross-lagged panel model. Psychological Methods, 20(1), 102–116. https://doi.org/10.1037/a0038889

Haneuse, VanderWeele, T. J., & Arterburn (2019). Using the E-value to assess the potential effect of unmeasured confounding in observational studies. JAMA, 321(6), 602–603. https://doi.org/10.1001/jama.2018.21554

Hariton, & Locascio, J. J. (2018). Randomised controlled trials - The gold standard for effectiveness research: Study design: Randomised controlled trials. BJOG, 125(13), Article 1716. https://doi.org/10.1111/1471-0528.15199

Harris, K. M., Halpern, C. T., Whitsel, E. A., Hussey, J. M., Killeya-Jones, L. A., Tabor, & Dean, S. C. (2019). Cohort profile: The National Longitudinal Study of Adolescent to Adult Health (Add Health). International Journal of Epidemiology, 48(5), 1415–1415k. https://doi.org/10.1093/ije/dyz115

Hong (2004). Causal inference for multi-level observational data with application to kindergarten retention [Unpublished doctoral dissertation, University of Michigan]. https://deepblue.lib.umich.edu/handle/2027.42/124428

Hong, J. H., Nakamura, J. S., Berkman, L. F., Chen, F. S., Shiba, Chen, Kim, E. S., & VanderWeele, T. J. (2023). Are loneliness and social isolation equal threats to health and well-being? An outcome-wide longitudinal approach. SSM - Population Health, 23, Article 101459. https://doi.org/10.1016/j.ssmph.2023.101459

Iacus, S. M., King, & Porro (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis, 20(1), 1–24. https://doi.org/10.1093/pan/mpr013

Kaster, T. S., Blumberger, D. M., Gomes, Sutradhar, Wijeysundera, D. N., & Vigod, S. N. (2022). Risk of suicide death following electroconvulsive therapy treatment for depression: A propensity score-weighted, retrospective cohort study in Canada. Lancet Psychiatry, 9(6), 435–446. https://doi.org/10.1016/S2215-0366(22)00077-3

Kim, E. S., Delaney, S. W., Tay, Chen, Diener, E. D., & VanderWeele, T. J. (2021). Life satisfaction and subsequent physical, behavioral, and psychosocial health in older adults. The Milbank Quarterly, 99(1), 209–239. https://doi.org/10.1111/1468-0009.12497

Kim, E. S., Nakamura, J. S., Strecher, V. J., & Cole, S. W. (2023). Reduced epigenetic age in older adults with high sense of purpose in life. The Journals of Gerontology A: Biological Sciences and Medical Sciences, 78(7), 1092–1099. https://doi.org/10.1093/gerona/glad092

Kim, E. S., Wilkinson, Case, B. W., Cowden, R. G., Okuzono, S. S., & VanderWeele, T. J. (2024). Connected communities: Perceived neighborhood social cohesion during adolescence and subsequent health and well-being in young adulthood-An outcome-wide longitudinal approach. Journal of Community Psychology, 52(6), 774–791.

Kim, E. S., Wilkinson, Okuzono, S. S., Chen, Shiba, Cowden, R. G., & VanderWeele, T. J. (2024). Positive affect during adolescence and health and well-being in adulthood: An outcome-wide longitudinal approach. PLoS Medicine, 21(4), Article e1004365. https://doi.org/10.1371/journal.pmed.1004365

Lal, & Lockhart (2024). How much should we trust instrumental variable estimates in political science? Practical advice based on 67 replicated studies. Political Analysis, 32(4), 521–540. https://doi.org/10.1017/pan.2024.2

Li, Yoshida, Kaufman, J. S., & Mathur, M. B. (2023). A brief primer on conducting regression-based causal mediation analysis. Psychological Trauma: Theory, Research, Practice and Policy, 15(6), 930–938.

Linden, Mathur, M. B., & VanderWeele, T. J. (2020). Conducting sensitivity analysis for unmeasured confounding in observational studies using E-values: The evalue package. The Stata Journal: Promoting Communications on Statistics and Stata, 20(1), 162–175. https://doi.org/10.1177/1536867x20909696

Mathur, M. B., Ding, Riddell, C. A., & VanderWeele, T. J. (2018). Web site and R package for computing E-values. Epidemiology, 29(5), e45–e47. https://doi.org/10.1097/EDE.0000000000000864

Mathur, M. B., & VanderWeele, T. J. (2022). Methods to address confounding and other biases in meta-analyses: Review and recommendations. Annual Review of Public Health, 43, 19–35. https://doi.org/10.1146/annurev-publhealth-051920-114020

Matthay, E. C., & Glymour, M. M. (2020). A graphical catalog of threats to validity: Linking social science with epidemiology. Epidemiology, 31(3), 376–384. https://doi.org/10.1097/EDE.0000000000001161

Mulder, J. D., Luijken, Penning de Vries, B. B., & Hamaker, E. L. (2024). Causal effects of time-varying exposures: A comparison of structural equation modeling and marginal structural models in cross-lagged panel research. Structural Equation Modeling: A Multidisciplinary Journal, 31(4), 575–591. https://doi.org/10.1080/10705511.2024.2316586

Nakamura, J. S., Wilkinson, Nelson, M. A., Suzuki, & VanderWeele, T. J. (2024). Volunteering in young adulthood: Complex associations with later health and well-being outcomes. American Journal of Health Promotion, 39(1), 39–51. https://doi.org/10.1177/08901171241273424

Pelham, W. E., West, S. G., Lemery-Chalfant, Goodman, S. H., Wilson, M. N., Dishion, T. J., & Shaw, D. S. (2021). Depression in mothers and the externalizing and internalizing behavior of children: An attempt to go beyond association. Journal of Abnormal Psychology, 130(1), 60–77. https://doi.org/10.1037/abn0000640

R Core Team. (2024). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/

Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42. https://doi.org/10.1177/2515245917745629

Rohrer, J. M., & Lucas, R. E. (2020). Causal effects of well-being on health: It’s complicated. PsyArXiv. https://doi.org/10.31234/osf.io/wgbe4

Rohrer, J. M., & Murayama (2023). These are not the effects you are looking for: Causality and the within-/between-persons distinction in longitudinal data analysis. Advances in Methods and Practices in Psychological Science, 6(1). https://doi.org/10.1177/25152459221140842

Rosenbaum, P. R., & Rubin, D. B. (1983). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society B: Statistical Methodology, 45(2), 212–218. https://doi.org/10.1111/j.2517-6161.1983.tb01242.x

Rubin, D. B. (2005). Causal inference using potential outcomes. Journal of the American Statistical Association, 100(469), 322–331. https://doi.org/10.1198/016214504000001880

Schlesselman, J. J. (1978). Assessing effects of confounding variables. American Journal of Epidemiology, 108(1), 3–8.

Thoemmes, & Ong, A. D. (2016). A primer on inverse probability of treatment weighting and marginal structural models. Emerging Adulthood, 4(1), 40–59.

Trinquart, Erlinger, A. L., Petersen, J. M., Fox, & Galea (2019). Applying the E value to assess the robustness of epidemiologic fields of inquiry to unmeasured confounding. American Journal of Epidemiology, 188(6), 1174–1180. https://doi.org/10.1093/aje/kwz063

Usami (2023). Within-person variability score-based causal inference: A two-step estimation for joint effects of time-varying treatments. Psychometrika, 88(4), 1466–1494. https://doi.org/10.1007/s11336-022-09879-1

Usami, Murayama, & Hamaker, E. L. (2019). A unified framework of longitudinal models to examine reciprocal relations. Psychological Methods, 24(5), 637–657. https://doi.org/10.1037/met0000210

VanderWeele, T. J. (2017). On a square-root transformation of the odds ratio for a common outcome. Epidemiology, 28(6), e58–e60. https://doi.org/10.1097/EDE.0000000000000733

VanderWeele, T. J., & Ding (2017). Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4), 268–274. https://doi.org/10.7326/M16-2607

VanderWeele, T. J., Ding, & Mathur (2019). Technical considerations in the use of the E-value. Journal of Causal Inference, 7(2). https://doi.org/10.1515/jci-2018-0007

VanderWeele, T. J., & Mathur, M. B. (2020). Commentary: Developing best-practice guidelines for the reporting of E-values. International Journal of Epidemiology, 49(5), 1495–1497. https://doi.org/10.1093/ije/dyaa094

VanderWeele, T. J., Mathur, M. B., & Chen (2020). Outcome-wide longitudinal designs for causal inference: A new template for empirical studies. Statistical Science, 25(3), 437–466.

Zhong, Q.-Y., Gelaye, VanderWeele, T. J., Sanchez, S. E., & Williams, M. A. (2018). Causal model of the association of social support with antepartum depression: A marginal structural modeling approach. American Journal of Epidemiology, 187(9), 1871–1879.

Navigating Unmeasured Confounding in Nonexperimental Psychological Research: A Practical Guide to Computing and Interpreting E-Value
