Causal mediation and sensitivity analysis for mixed-scale data

Abstract

The goal of causal mediation analysis, often described within the potential outcomes framework, is to decompose the effect of an exposure on an outcome of interest along different causal pathways. Using the assumption of sequential ignorability to attain non-parametric identification, Imai et al. (2010) proposed a flexible approach to measuring mediation effects, focusing on parametric and semiparametric normal/Bernoulli models for the outcome and mediator. Less attention has been paid to the case where the outcome and/or mediator model are mixed-scale, ordinal, or otherwise fall outside the normal/Bernoulli setting. We develop a simple but flexible parametric modeling framework to accommodate the common situation where the responses are mixed continuous and binary, and apply it to a zero-one inflated beta model for the outcome and mediator. Applying our proposed methods to the publicly available JOBS II dataset, we (i) argue for the need for non-normal models, (ii) show how to estimate both average and quantile mediation effects for boundary-censored data, and (iii) show how to conduct a meaningful sensitivity analysis by introducing unidentified, scientifically meaningful sensitivity parameters.

Keywords

Bayesian methods, causal inference, identification, ignorability, zero-inflated data

1. Introduction

Mediation analysis is conducted across many scientific fields to understand the underlying mechanisms behind cause and effect relationships; examples include epidemiology, economics, and social science. Causal mediation analysis, often couched in the potential outcomes framework,^1,2 decomposes the effect of an exposure on the outcome along different causal pathways. A schematic depiction of a standard single-mediator model is given in Figure 1. In this diagram, $A$ denotes the exposure for the observational unit, $Y$ denotes the outcome, and $M$ denotes a mediator which may be on the causal pathway from the exposure to the outcome. When the mediator is accounted for in the relationship between $A$ and $Y$ , we measure a direct effect $(c^{'})$ , while when the mediator is ignored we measure the total effect $(c)$ . The indirect effect of the exposure through its effect on the mediator uses pathways $a$ and $b$ to affect the outcome.

Figure 1.

Schematic depiction of a causal structure with a single variable $M$ mediating the effect of the treatment $A$ on the outcome $Y$ . Top: the causal structure ignoring the existence of the mediator. Bottom: the causal structure with the mediator included.

While most works on mediation analysis have focused on the case where the mediator and outcome are continuous/normal or Bernoulli distributed, in our experience, it is common that one (or both) of the mediator or outcome will have a mixed-scale support. In this article, we focus on the case where the mediator and outcome are mixed continuous and discrete random variables; in particular, we assume that they have a continuous distribution on $(0, 1)$ with mass at the boundary points $0$ and $1$ . We argue that, particularly when taking a parametric Bayesian approach to estimation, it is important to adequately model the data, both for the purpose of reducing bias and to adequately account for uncertainty in effect estimation. To meet this challenge, we develop a general framework for performing causal mediation analysis with mixed-scale data. In principle, this framework can be used regardless of the model for the observed data, and we use a zero-one inflated beta regression model to illustrate.

For the sake of reproducibility, we focus on the JOBS II study of Vinokur et al.,³ for which a subset of data is available in the mediation package in R. A description of this dataset is given in Section 2.1. Imai et al.¹ present several analyses of this dataset, essentially operating under the assumption that the mediator (a measure of self-efficacy in finding a job) and the outcome (a measure of depression) are normally distributed. As shown in Figure 2, however, it is apparent that neither the outcome nor the mediator is well-described by a normal distribution; both exhibit skewness and there is a substantial mass at the boundary values of $1$ for depression and $5$ for self-efficacy.

Figure 2.

Empirical distribution of (left) measured depression level (right) measured job-search self-efficacy at the end of study.

An additional challenge with mixed-scale models is assessing the sensitivity of inferences to untestable assumptions. As with most estimands in causal inference, it is well-known that the causal mediation effects are not identified on the basis of the observed data distribution alone, and can only be consistently estimated under additional (unfalsifiable) assumptions. A useful benchmark assumption is sequential ignorability (SI, Imai et al.¹), which essentially rules out the existence of unmeasured confounders. We found performing sensitivity analysis in the mixed-scale setting to be challenging, as to the best of our knowledge none of the existing proposals for sensitivity analysis can be applied directly. For example, the approaches proposed by Imai et al.¹ are justified by a linear structural equation model (LSEM, Baron and Kenny⁴), which is not applicable in this setting. Similarly, the limited work with non-continuous or categorical data^5,6 also does not apply directly to the mixed-scale setting. We develop a pair of widely-applicable sensitivity analysis strategies that accomplish the two goals of (i) assessing the extent to which our conclusions are driven by unmeasured confounding and (ii) neither imposing any additional restrictions on, nor adding information about, the distribution of the observed data. Our second goal is part of a recent trend in causal inference and missing data research of proposing sensitivity analyses that clearly and unambiguously separate the (parametric) assumptions used to model the observed data from the assumptions used to identify the causal effects of interest.^8,9,7,10

1.1. Review of existing methods

The traditional approach to mediation analysis uses structural equation modeling (SEM) to quantify mediation; linear structural equation models (LSEMs) are particularly popular.¹¹ However, LSEMs do not generalize easily to non-linear systems.¹² Additionally, Imai et al.¹ make the point that the identification assumptions used in LSEMs are inextricably tied to the choice of parametric model, stating: “[because] the key identification assumption is stated in the context of a particular model, [it is] difficult to separate the limitations of research design from those of the specified statistical model.” Motivated by this argument, Imai et al.¹ proposed a more general approach to mediation analysis using a potential outcomes framework, introduced the nonparametric assumption of sequential ignorability to identify the effects, and showed that the single mediator LSEM is a special case of the potential outcomes framework that is valid as long as the linearity assumption holds.

There is a rich literature addressing the causal mediation problem from the semiparametric perspective. An emphasis in this literature is the development of methods that are both statistically efficient and multiply robust in the sense that they produce consistent estimates even if one of several models required for estimation is misspecified.^13,14 An advantage of these approaches is that one can easily use them with modern machine learning methods via cross-fitting.¹⁵ To the best of our knowledge, however, these methods have not been developed in the context of mixed-scale data. Bayesian nonparametric and semiparametric models based on infinite mixture models have also been proposed,¹⁶ although not for mixed-scale data.

A variety of methods also extend beyond continuous/binary models for the mediator and outcome. These include models for zero-inflated count, survival, and ordinal data, as well as quantile regression models.^6,17,18,1

1.2. Contributions

We make the following contributions in this article. First, we describe how to implement the $g$ -formula for computing both mean and quantile causal effects for generic mixed-scale models under the sequential ignorability assumption. Second, we illustrate these concepts using a zero-one inflated beta regression model and argue for its appropriateness on the benchmark JOBS II dataset. Third, we show how to conduct a principled sensitivity analysis to check the sensitivity of our conclusions to the untestable sequential ignorability assumption; the sensitivity parameters we introduce are designed to be unidentified, so that varying them does not affect the distribution of the observed data. We show how to introduce sensitivity parameters that are shifts of the mean on either a linear or logit scale; both of these approaches are very easy to incorporate into our models by post-processing the model fit. These mean-shift assumptions are weaker than the usual sequential ignorability assumption in that they only identify the mean of the potential outcomes rather than their whole distribution, but include the results of sequential ignorability as a special case.

We also present a flexible zero-one inflated beta (ZOIB) model¹⁹ for mediation analysis with boundary-censored data and show how to perform inference with this model. The ZOIB models we use here are conceptually related to other zero-inflated models, such as the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-inflated gamma (ZIG), and zero-inflated log-normal (ZILN). See Zuur et al.²⁰ for a review of zero-inflated models for count data and Liu et al.²¹ for semicontinuous zero-inflated models. Like the ZIG and ZILN (and unlike the ZIP and ZINB) models, the ZOIB model is semicontinuous, as the boundary points 0 and 1 occur with probability 0 in a standard beta regression model. These models are cast as a covariate-dependent mixture model, with the ZOIB being a mixture of a beta distribution and a Bernoulli distribution (where the parameters of the beta and Bernoulli distributions themselves are also covariate dependent).

A Bayesian implementation of this model is given at www.github.com/theodds/ZOIBMediation. Our model is implemented in Stan, with both the average and quantile causal mediation effects computed using a Monte Carlo implementation of the $g$-formula. The Bayesian framework provides a straightforward approach to incorporating uncertainty in the sensitivity parameters through the use of informative priors, which can be elicited from subject-matter experts.²² In principle, however, one could also apply frequentist inference using the nonparametric bootstrap.²³

1.3. Outline

In Section 2 we review the potential outcomes framework for mediation analysis and argue for the use of mixed-scale models on the JOBS II dataset. In Sections 3 and 4 we present our framework for causal mediation analysis, show how to compute the mean and quantile mediation effects, and develop our zero-one inflated beta regression model. In Section 5, we present two alternative assumptions to sequential ignorability which allow for a sensitivity analysis, and show that these assumptions identify the average causal mediation effects. In Section 6, we illustrate our methodology on synthetic data and real data. We conclude in Section 7 with a discussion. Proofs are in the Appendix. Some additional algorithms, additional sensitivity analyses, and Markov chain Monte Carlo diagnostics are given in the Supplemental Material.

2. Notation and definitions of causal effects

2.1. JOBS II dataset

For the sake of reproducibility, we motivate concepts and illustrate our methods on a subset of the JOBS II dataset,³ which is available publicly in the mediation package in R. The JOBS II data was used to evaluate the potential benefits of participation in a job-search skills seminar in southeastern Michigan. Subjects were recently unemployed adults during 1991–1993. Participants were pre-screened and classified according to their risk of depression and anxiety. High-risk participants, along with a random sample of low-risk participants, were invited to participate in the study. Prior to the seminar, questionnaires were sent out to the respondents. The questionnaires covered a range of topics about the respondent, including their loss of employment, the quality of work-life at their previous job, their health problems, and their history of substance abuse. The primary baseline covariates, which we denote as $X_{i}$, include measures of depression at baseline, education, income, race, marital status, age, sex, previous occupation, and level of economic hardship. The participants were randomly assigned to treatment and control groups. The treatment group, $A_{i} = 1$, was assigned to attend a seminar that taught participants job search skills and coping strategies for dealing with setbacks in the job hunt. The control group, $A_{i} = 0$, received a booklet of job search tips. Prior to measuring the outcome, but post-intervention, researchers measured an underlying mechanism in the relationship between the intervention and outcome. This mediator was a continuous measure of job search self-efficacy, $M_{i}$. In this study, two outcomes were measured: a continuous measure of depression using the Hopkins Symptom Checklist²⁴ and a binary variable for employment at the follow-up time. We will focus on the continuous measure of depression, $Y_{i}$.

Even for a benchmark dataset like the JOBS II data, which has been analyzed using LSEMs,¹ there is overwhelming evidence that neither the outcome nor the mediator is normally distributed. The observed values of the outcome and mediator, which are supported on $[1, 5]$ and highly skewed, are displayed in Figure 2, and it is apparent from the figure that the assumption of (say) a normally distributed error is untenable.

2.2. The potential outcomes framework

Using the potential outcomes framework, the causal effect of the job training program can be defined as the difference between two potential outcomes. One potential outcome is realized if the subject participates in the training program and the other is realized if the subject does not.

Associated with the outcome and the mediator are potential outcomes that would have been observed had the treatment assignment been different. A potential outcome is defined as the outcome which would have been observed under a given exposure, possibly one the participant did not actually receive. We let $M_{i} (a)$ denote the value of the mediator had the treatment been set to the value $a$; in terms of the JOBS II study, this is the self-efficacy which would have been realized had the treatment for individual $i$ been fixed at either receiving the treatment ($a = 1$) or not ($a = 0$). Similarly, we let $Y_{i} (a, m)$ denote the value of the outcome that would have been realized had the treatment for individual $i$ been fixed at $a$ and the mediator fixed at $m$; in terms of the JOBS II study, this is the depression level which would have been observed at a given level of self-efficacy under the two treatments.

We link the potential outcomes to the observed data through the consistency assumption that $M_{i} = M_{i} (A_{i})$ and $Y_{i} = Y_{i} \{A_{i}, M_{i} (A_{i})\}$. The primary challenge in estimation of the mediation effects, that is, the effects of changes of $M_{i}$ and $A_{i}$ on $Y_{i}$, lies in the fact that we cannot observe the counterfactual outcomes $Y_{i} \{A_{i}, M_{i} (1 - A_{i})\}$, as this would require observing what would have happened under both $A_{i} = 1$ and $A_{i} = 0$.

Throughout, we assume that the potential outcomes, the mediator, and the exposure are jointly distributed according to a distribution $f_{θ}$ in some parametric family $\{f_{θ} : θ \in Θ\}$. Abusing notation, we use $f_{θ}$ to denote conditional and marginal density/mass functions as needed, with its meaning inferred from context; for example, $f_{θ} (A_{i} = a ∣ X_{i} = x) = f_{θ} (a ∣ x)$ denotes the probability of $A_{i} = a$ given $X_{i} = x$.

2.2.1. Sequential ignorability

As a starting point for identifying the causal effects of interest we will use the sequential ignorability assumption of Imai et al.¹ For subject $i$ , let $X$ be the support of the distribution of $X_{i}$ and let $M$ be the support of $M_{i}$ . The SI assumption imposes the following restrictions on the model parameterized by an unknown $θ$ :

\begin{aligned} \{Y_{i} (a^{'}, m), M_{i} (a)\} ⊥ A_{i} & ∣ X_{i} = x, θ \quad \text{and} \end{aligned}

(SI1)

\begin{aligned} Y_{i} (a^{'}, m) ⊥ M_{i} (a) & ∣ A_{i} = a, X_{i} = x, θ \end{aligned}

(SI2)

for all $a, a^{'} = 0, 1$ and $x \in X$, where the expression $[U ⊥ V ∣ W = w]$ means that $U$ is conditionally independent of $V$ given $W = w$. Additionally, we require the overlap condition (SI3) that $\underset{θ}{Pr} (A_{i} = a ∣ X_{i} = x) > 0$ and $f_{θ} \{M_{i} (a) = m ∣ A_{i} = a, X_{i} = x\} > 0$, for all $m \in M$. In words, SI1 states that, given the observed confounders, the treatment assignment is independent of the potential outcomes $Y_{i} (a^{'}, m)$ and $M_{i} (a)$; this will hold whenever the treatment assignment is randomized. On the other hand, SI2 states that, given the observed treatment and pre-treatment covariates, the mediator behaves as if it were randomly assigned, in that it is independent of the potential outcomes $Y_{i} (a^{'}, m)$. Of the two assumptions, SI2 is generally the more problematic; for example, in the JOBS II study, the job-search self-efficacy is not randomized by study design, so that SI2 makes the unfalsifiable assertion that all common causes of $M_{i} (a)$ and $Y_{i} (a, m)$ have been measured.

2.2.2. Causal mediation effects

We define the following causal mediation effects^26,25 using the JOBS II study for context. The natural indirect effect (NIE), also called the causal mediation effect, is defined for $a = 0, 1$ as

\begin{aligned} δ_{i} (a) = Y_{i} \{a, M_{i} (1)\} - Y_{i} \{a, M_{i} (0)\} . \end{aligned}

For example, in the JOBS II study, $δ_{i} (0)$ is the effect of increasing/decreasing a subject’s self-efficacy from their baseline level to the level we would have observed had they attended the seminar, holding fixed that the subject did not attend the seminar. The natural direct effect (NDE) is defined for $a = 0, 1$ as

\begin{aligned} ζ_{i} (a) = Y_{i} \{1, M_{i} (a)\} - Y_{i} \{0, M_{i} (a)\} . \end{aligned}

For example, in the JOBS II study, $ζ_{i} (1)$ is the difference between the two potential depression levels for subject $i$ according to whether they participated in the job training seminar or not, under the assumption that their job search self-efficacy is held constant at the level which would have been observed if they had attended the seminar.

Because we cannot observe $Y_{i} \{a, M_{i} (a^{'})\}$ when $a \neq a^{'}$, we cannot directly observe either $δ_{i} (a)$ or $ζ_{i} (a)$. Nevertheless, under sequential ignorability we can estimate the average mediation effects

\begin{aligned} δ (a) & = E_{θ} [Y_{i} \{a, M_{i} (1)\} - Y_{i} \{a, M_{i} (0)\}] \quad \text{and} \\ ζ (a) & = E_{θ} [Y_{i} \{1, M_{i} (a)\} - Y_{i} \{0, M_{i} (a)\}] . \end{aligned}

(1)

The mediation effects $δ (a)$ and $ζ (a)$ decompose the average total effect $τ = E_{θ} [Y_{i} \{1, M_{i} (1)\} - Y_{i} \{0, M_{i} (0)\}]$ in the sense that $δ (1) + ζ (0) = δ (0) + ζ (1) = τ$. The total effect is analogous to the usual average causal treatment effect (ATE) of the treatment assignment.
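The decomposition of the total effect follows from a telescoping argument. The short derivation below is added as a worked check and uses only the definitions of $δ (a)$, $ζ (a)$, and $τ$ given above; the intermediate counterfactual means cancel in each sum.

```latex
\begin{aligned}
δ(1) + ζ(0)
  &= E_{θ}[Y_{i}\{1, M_{i}(1)\} - Y_{i}\{1, M_{i}(0)\}]
   + E_{θ}[Y_{i}\{1, M_{i}(0)\} - Y_{i}\{0, M_{i}(0)\}] \\
  &= E_{θ}[Y_{i}\{1, M_{i}(1)\} - Y_{i}\{0, M_{i}(0)\}] = τ, \\
δ(0) + ζ(1)
  &= E_{θ}[Y_{i}\{0, M_{i}(1)\} - Y_{i}\{0, M_{i}(0)\}]
   + E_{θ}[Y_{i}\{1, M_{i}(1)\} - Y_{i}\{0, M_{i}(1)\}] \\
  &= E_{θ}[Y_{i}\{1, M_{i}(1)\} - Y_{i}\{0, M_{i}(0)\}] = τ.
\end{aligned}
```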

Under sequential ignorability, Imai et al.¹ showed that the distribution of the potential outcomes $Y_{i} \{a, M_{i} (a^{'})\}$ for any $a, a^{'}$ is nonparametrically identified as

\begin{aligned} & f_{θ} (Y_{i} \{a, M_{i} (a^{'})\} = y ∣ X_{i} = x) \\ & \quad = \int_{M} f_{θ} (Y_{i} = y ∣ M_{i} = m, A_{i} = a, X_{i} = x) \, f_{θ} (M_{i} = m ∣ A_{i} = a^{'}, X_{i} = x) \, d m \end{aligned}

(2)

for all $x \in X$. The marginal distribution of $Y_{i} \{a, M_{i} (a^{'})\}$ is then

\begin{aligned} \int f_{θ} (Y_{i} \{a, M_{i} (a^{'})\} = y ∣ X_{i} = x) \, F_{X} (d x) \end{aligned}

so that it (along with the average direct and indirect effects) is also identified. While there usually will not be a simple analytical expression for (2), it is nevertheless easy to approximate (2) using Monte Carlo integration; this approach was popularized by Robins²⁷ as a tool to implement the $g$-formula in causal inference.

While the average causal mediation effects are the most commonly studied, one may also be interested in causal effects on other aspects of the distribution of the outcome. Let $Q_{q} (Z)$ denote the $q^{th}$ quantile of a random variable $Z$ . Then the quantile mediation effects at the quantile $q$ are

\begin{aligned} δ_{q} (a) & = Q_{q} [Y_{i} \{a, M_{i} (1)\}] - Q_{q} [Y_{i} \{a, M_{i} (0)\}] \quad \text{and} \\ ζ_{q} (a) & = Q_{q} [Y_{i} \{1, M_{i} (a)\}] - Q_{q} [Y_{i} \{0, M_{i} (a)\}] . \end{aligned}

(3)

Because (2) fully identifies the distribution of $Y_{i} \{a, M_{i} (a^{'})\}$, the quantile mediation effects are also identified under SI. We note that, rather than the difference in the quantiles, one might be tempted to define $δ_{q} (a) = Q_{q} [Y_{i} \{a, M_{i} (1)\} - Y_{i} \{a, M_{i} (0)\}]$ (and similarly for $ζ_{q} (a)$), which represents the causal mediation effect as a quantile of the differences. There are two issues which arise from doing this. First, SI is not sufficient to identify this version of $δ_{q} (a)$, as SI does not identify the joint distribution of the potential outcomes. Second, we lose the decomposition property $τ_{q} = δ_{q} (a) + ζ_{q} (a^{'})$ for $a \neq a^{'}$, so that the mediation effects no longer serve as a decomposition of the total effect. The terms $δ_{q} (a)$ and $ζ_{q} (a)$ as defined in (3) do have a compelling causal interpretation: rather than representing a causal effect on the individual level, they capture the causal effect on the quantiles from shifting the entire population from untreated to treated.
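The distinction between a difference of quantiles and a quantile of differences can be illustrated numerically. The sketch below is not taken from the paper: it uses two hypothetical marginals (a Uniform(0, 1) variable and the square of a Uniform(0, 1) variable) and couples them in two ways. The difference of quantiles is the same under both couplings because it depends only on the marginals, whereas the quantile of the differences changes with the joint distribution, which SI leaves unidentified.

```python
import math
import random

def quantile(sample, q):
    """Empirical q-th quantile: the ceil(q * n)-th order statistic."""
    s = sorted(sample)
    return s[max(0, math.ceil(q * len(s)) - 1)]

rng = random.Random(0)
n, q = 50_000, 0.9

# Hypothetical marginals: Y0 = U is Uniform(0, 1) and Y1 = V ** 2.
# Only the coupling of (U, V) differs between the two scenarios.
u = [rng.random() for _ in range(n)]
v_indep = [rng.random() for _ in range(n)]

y0 = u
y1_como = [x ** 2 for x in u]          # comonotone coupling: V = U
y1_indep = [x ** 2 for x in v_indep]   # independent coupling

# Difference of quantiles, in the spirit of (3): depends on the marginals only.
dq_como = quantile(y1_como, q) - quantile(y0, q)
dq_indep = quantile(y1_indep, q) - quantile(y0, q)

# Quantile of the differences: depends on the joint distribution.
qd_como = quantile([b - a for a, b in zip(y0, y1_como)], q)
qd_indep = quantile([b - a for a, b in zip(y0, y1_indep)], q)

print(f"difference of quantiles:  {dq_como:.3f} (comonotone), {dq_indep:.3f} (independent)")
print(f"quantile of differences:  {qd_como:.3f} (comonotone), {qd_indep:.3f} (independent)")
```

Under these assumed marginals the first line should report two nearly equal numbers, while the second line reports values that differ in sign.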

3. Observed data models for zero-one inflated data

Estimating the causal mediation effects under SI requires only that we estimate the distribution of the observed data. Without loss of generality, we assume that $Y_{i}$ and $M_{i}$ can be rescaled to lie in the interval $[0, 1]$ ; for the JOBS II dataset, this can be done with the transformations $Y_{i} \leftarrow (Y_{i} - 1) / 4$ and $M_{i} \leftarrow (M_{i} - 1) / 4$ , as the measures of depression and self-efficacy were measured on a scale from 1 to 5.

A flexible distribution for zero-one inflated data on $[0, 1]$ is the zero-one inflated beta (ZOIB) distribution, which we denote by $ZOIB (α, γ, μ, ϕ)$ . If $Z \sim ZOIB (α, γ, μ, ϕ)$ then $Z$ is a mixed discrete-continuous random variable such that

\begin{aligned} \Pr (Z = 0) & = α, \\ \Pr (Z = 1 ∣ Z \neq 0) & = γ, \quad \text{and} \\ [Z ∣ Z \notin \{0, 1\}] & \sim \operatorname{Beta} \{μ ϕ, (1 - μ) ϕ\} . \end{aligned}

(4)

The parameterization of the beta distribution in (4) is chosen so that $μ$ is the mean of the beta distribution, that is, $E_{θ} (Z ∣ Z \notin \{0, 1\}) = μ$. The mean of the $ZOIB (α, γ, μ, ϕ)$ distribution is given by

\begin{aligned} E_{θ} (Z) = (1 - α) γ + (1 - α) (1 - γ) μ . \end{aligned}

(5)

Figure 3 shows that the beta distribution is effective at modeling the shape of the data for the continuous part of the JOBS II data, while the parameters $α$ and $γ$ allow for an increased chance of observing the boundary values $Z_{i} = 0$ and $Z_{i} = 1$.
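The ZOIB distribution in (4) is straightforward to simulate, which gives a quick check of the mean formula (5). The following is a minimal sketch using only the Python standard library; the function names and the parameter values are illustrative choices and do not come from the paper or its code repository.

```python
import random

def sample_zoib(alpha, gamma, mu, phi, rng):
    """Draw one value from ZOIB(alpha, gamma, mu, phi) as parameterized in (4)."""
    if rng.random() < alpha:      # Pr(Z = 0) = alpha
        return 0.0
    if rng.random() < gamma:      # Pr(Z = 1 | Z != 0) = gamma
        return 1.0
    # Continuous part: Beta(mu * phi, (1 - mu) * phi), which has mean mu.
    return rng.betavariate(mu * phi, (1.0 - mu) * phi)

def zoib_mean(alpha, gamma, mu):
    """Closed-form mean from (5); note that it does not involve phi."""
    return (1.0 - alpha) * gamma + (1.0 - alpha) * (1.0 - gamma) * mu

rng = random.Random(1)
alpha, gamma, mu, phi = 0.1, 0.2, 0.3, 5.0   # illustrative values only
draws = [sample_zoib(alpha, gamma, mu, phi, rng) for _ in range(200_000)]

mc_mean = sum(draws) / len(draws)
exact_mean = zoib_mean(alpha, gamma, mu)
print(f"Monte Carlo mean {mc_mean:.4f} vs. formula (5) {exact_mean:.4f}")
```

The simulated mean should agree with (5) up to Monte Carlo error, and the proportions of exact zeros and ones should be close to $α$ and $(1 - α) γ$ respectively.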

Figure 3.

Kernel density estimate (gray) and fitted beta distribution (solid black) of the distribution of $Y_{i}$ and $M_{i}$ conditional on $Y_{i} \notin \{0, 1\}$ and $M_{i} \notin \{0, 1\}$.

Our ZOIB model assumes that $[Y_{i} ∣ M_{i} = m, A_{i} = a, X_{i} = x] \sim ZOIB (α_{i}^{Y}, γ_{i}^{Y}, μ_{i}^{Y}, ϕ_{i}^{Y})$ and $[M_{i} ∣ A_{i} = a, X_{i} = x] \sim ZOIB (α_{i}^{M}, γ_{i}^{M}, μ_{i}^{M}, ϕ_{i}^{M})$ . We model the parameters of these ZOIB distributions with generalized linear models of the form

\begin{aligned} \operatorname{logit} (α_{i}^{Y}) & = (X_{i}, M_{i})^{⊤} β_{α}^{Y} (A_{i}), & \operatorname{logit} (γ_{i}^{Y}) & = (X_{i}, M_{i})^{⊤} β_{γ}^{Y} (A_{i}), \\ \operatorname{logit} (μ_{i}^{Y}) & = (X_{i}, M_{i})^{⊤} β_{μ}^{Y} (A_{i}), & \log (ϕ_{i}^{Y}) & = (X_{i}, M_{i})^{⊤} β_{ϕ}^{Y} (A_{i}) . \end{aligned}

(6)

The dependence of the $β$’s on $A_{i}$ is included to allow for heterogeneous effects of the covariates and mediator; the homogeneous model is included as a special case where only the intercept varies with $A_{i}$. Similar models are specified for $(α_{i}^{M}, γ_{i}^{M}, μ_{i}^{M}, ϕ_{i}^{M})$. As a default, all of the regression coefficients are given flat $Normal (0, τ^{2})$ priors, where $τ^{2}$ is taken to be large after centering and scaling the covariates $X_{i}$ (except for the intercept) to have mean $0$ and standard deviation $1$.


To estimate the mediation effects we also require a model for the distribution $F_{X}$ of the covariates. As a default, we assume that $F_{X}$ is discretely supported on the observed values of the $X_{i}$ ’s, that is, $\underset{θ}{Pr} (X_{i} = x_{j}) = ω_{j}$ where $(x_{1}, \dots, x_{N})$ are the observed values of the covariates. We then specify an improper Bayesian bootstrap²⁸ prior for $ω = (ω_{1}, \dots, ω_{N})$ , that is, $π (ω) = \prod_{i} ω_{i}^{- 1}$ . After observing the data, the posterior distribution of $ω$ is $Dirichlet (1, \dots, 1)$ , and can be sampled exactly. Specifying a Bayesian bootstrap prior for $F_{X}$ avoids the notoriously difficult task of estimating $F_{X}$ via density estimation, and has been shown in other settings to result in improved theoretical properties of Bayesian causal inference methods.²⁹
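Sampling $ω$ from its $Dirichlet (1, \dots, 1)$ posterior needs no MCMC. A minimal standard-library sketch is given below; it relies on the standard fact that normalizing independent Exponential(1) draws yields a flat Dirichlet vector. The covariate values are hypothetical placeholders and are not JOBS II data.

```python
import random

def bayesian_bootstrap_weights(n, rng):
    """Draw omega ~ Dirichlet(1, ..., 1) by normalizing n independent Exp(1) draws."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(2)
x_obs = [0.3, 1.2, -0.7, 2.1, 0.0]   # hypothetical observed covariate values
omega = bayesian_bootstrap_weights(len(x_obs), rng)

# One posterior draw of the population mean of X under the discrete model for F_X.
mean_draw = sum(w * x for w, x in zip(omega, x_obs))
print(omega, mean_draw)
```

Repeating the draw of `omega` once per posterior sample $θ_{b}$ propagates uncertainty about $F_{X}$ into any functional that averages over the covariates.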

4. Posterior computation and inference

We divide inference into two steps:

1. Draw a set of approximate samples $θ_{1}, \dots, θ_{B}$ from the posterior distribution of $θ$.

2. For each $θ_{b}$, compute $δ (a)$, $ζ (a)$, and $τ$, yielding approximate samples from the posterior distribution for these mediation effects.

For the first step, we use the probabilistic programming language Stan, which implements an adaptive version of Hamiltonian Monte Carlo (HMC), to sample the $θ_{b}$’s.³⁰ The sole exception to this sampling scheme is that we sample $ω \sim Dirichlet (1, \dots, 1)$ directly from the posterior distribution.

4.1. Average mediation effects

Due to the nonlinearities of the ZOIB model, the mediation effects $δ (a)$, $ζ (a)$, and $τ$ are not available in closed form and must be approximated. To compute the mediation effects, we use a Monte Carlo implementation of the $g$-formula. The idea is to note that, because (2) identifies the distribution of $Y_{i} \{a, M_{i} (a^{'})\}$ for all $a, a^{'}$, we can simulate $K$ new realizations $Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (a^{'})\}$, $k = 1, \dots, K$, for each $i = 1, \dots, N$ from the model, in which case

\begin{aligned} δ (a) & \approx K^{- 1} \sum_{i, k} ω_{i} [Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (1)\} - Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (0)\}] \quad \text{and} \\ ζ (a) & \approx K^{- 1} \sum_{i, k} ω_{i} [Y_{i k}^{⋆} \{1, M_{i k}^{⋆} (a)\} - Y_{i k}^{⋆} \{0, M_{i k}^{⋆} (a)\}] \end{aligned}

(7)

are unbiased estimators of $δ (a)$ and $ζ (a)$. The approximations in (7) are less efficient than using the true values $δ (a)$ and $ζ (a)$ because they contain Monte Carlo error, but are conservative in the sense that using them results in valid inference. The approximations can be improved by using various tricks to eliminate the Monte Carlo error. One improvement is to notice that we can decrease the variance of (7) by replacing $Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (a^{'})\}$ with the conditional expectation $E_{θ} [Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (a^{'})\} ∣ M_{i k}^{⋆} (a^{'}), X_{i}]$; for the ZOIB model, this is given by

\begin{aligned} (1 - α_{i k}^{Y}) γ_{i k}^{Y} + (1 - α_{i k}^{Y}) (1 - γ_{i k}^{Y}) μ_{i k}^{Y} \end{aligned}

where $α_{i k}^{Y}$, $γ_{i k}^{Y}$, and $μ_{i k}^{Y}$ are given by (6) with $A_{i}$ evaluated at $a$ and $M_{i}$ evaluated at $M_{i k}^{⋆} (a^{'})$.


The Monte Carlo integration strategy is summarized in Algorithm 1, and it applies to any model. In Algorithm 2, we give the special case of our ZOIB regression models. In these algorithms, $F_{M}^{-} (u ∣ A_{i} = a, X_{i} = x) = \inf \{m : F_{M} (m ∣ A_{i} = a, X_{i} = x) \geq u\}$ denotes the generalized inverse of the cumulative distribution function of $[M_{i} ∣ A_{i} = a, X_{i} = x]$.
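As a rough illustration of the Monte Carlo $g$-formula in (7), the sketch below computes the average effects for a single parameter draw, using the conditional-expectation variance reduction based on (5). It is not a transcription of Algorithm 1 or 2: the single scalar covariate, every regression coefficient, and the helper names (`mediator_params`, `outcome_mean`, `average_effects`) are hypothetical, and equal weights stand in for a Dirichlet draw of $ω$.

```python
import math
import random

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def sample_zoib(alpha, gamma, mu, phi, rng):
    """Draw from ZOIB(alpha, gamma, mu, phi) as parameterized in (4)."""
    if rng.random() < alpha:
        return 0.0
    if rng.random() < gamma:
        return 1.0
    return rng.betavariate(mu * phi, (1.0 - mu) * phi)

def mediator_params(a, x):
    """Hypothetical ZOIB parameters of [M | A = a, X = x]; coefficients are made up."""
    return (expit(-2.0 + 0.2 * x - 0.3 * a),   # alpha
            expit(-1.5 + 0.1 * x + 0.5 * a),   # gamma
            expit(0.2 + 0.3 * x + 0.4 * a),    # mu
            math.exp(1.5))                     # phi

def outcome_mean(a, m, x):
    """E(Y | M = m, A = a, X = x) via (5); coefficients are made up."""
    alpha = expit(-1.0 - 0.2 * x + 0.2 * a + 1.0 * m)
    gamma = expit(-2.0 + 0.1 * x - 0.2 * a - 0.5 * m)
    mu = expit(0.0 + 0.2 * x - 0.3 * a - 1.0 * m)
    return (1.0 - alpha) * gamma + (1.0 - alpha) * (1.0 - gamma) * mu

def average_effects(x_obs, omega, K, rng):
    """Approximate delta(a), zeta(a), and tau as in (7) for one draw of (theta, omega)."""
    # ey[(a, ap)] accumulates an estimate of E[Y{a, M(ap)}].
    ey = {(a, ap): 0.0 for a in (0, 1) for ap in (0, 1)}
    for x, w in zip(x_obs, omega):
        for _ in range(K):
            # Simulate the potential mediators M*(0) and M*(1) for this covariate value.
            m_star = {ap: sample_zoib(*mediator_params(ap, x), rng) for ap in (0, 1)}
            for a, ap in ey:
                ey[(a, ap)] += w * outcome_mean(a, m_star[ap], x) / K
    delta = {a: ey[(a, 1)] - ey[(a, 0)] for a in (0, 1)}
    zeta = {a: ey[(1, a)] - ey[(0, a)] for a in (0, 1)}
    tau = ey[(1, 1)] - ey[(0, 0)]
    return delta, zeta, tau

rng = random.Random(3)
x_obs = [rng.gauss(0.0, 1.0) for _ in range(200)]   # hypothetical covariates
omega = [1.0 / len(x_obs)] * len(x_obs)             # stand-in for a Dirichlet draw
delta, zeta, tau = average_effects(x_obs, omega, K=10, rng=rng)
print(delta, zeta, tau)
```

In a full analysis this computation would be repeated for each posterior draw $θ_{b}$ with a fresh $ω$, giving posterior samples of the effects. Because all four counterfactual means are built from the same simulated mediators, the returned values satisfy $δ (1) + ζ (0) = δ (0) + ζ (1) = τ$ up to floating-point error.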

4.2. Quantile mediation effects

Equation (2) can also be used to form a Monte Carlo estimate of the quantile mediation effects, although the implementation is somewhat more subtle. If we have a sample of $Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (a^{'})\}$’s from the marginal density $f_{θ} (Y_{i} \{a, M_{i} (a^{'})\} = y)$ then we can approximate its $q^{th}$ quantile as $Q_{q} (F_{a a^{'}})$, where $F_{a a^{'}}$ is the empirical distribution of the $Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (a^{'})\}$’s and $Q_{q} (F)$ is the $q^{th}$ quantile of $F$. We can then calculate (3) using the approximation

\begin{aligned} δ_{q} (a) \approx Q_{q} (F_{a 1}) - Q_{q} (F_{a 0}) \quad \text{and} \quad ζ_{q} (a) \approx Q_{q} (F_{1 a}) - Q_{q} (F_{0 a}) . \end{aligned}

(8)

Note that for this to be valid we must sample the covariates $X_{i k}^{⋆}$ used to generate $M_{i k}^{⋆} (a^{'})$ and $Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (a^{'})\}$ according to the weights $ω_{i}$, rather than averaging over $ω$ as in (7). This results in higher Monte Carlo error in (8) than in (7).

Reducing Monte Carlo error in (8) can also be done, although it requires different strategies; for example, it is no longer valid to replace $Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (a^{'})\}$ with its mean. One may take $K$ very large, but this can substantially increase computation time. Another trick is to construct the joint distribution of $\{Y_{i k}^{⋆} \{a, M_{i k}^{⋆} (a^{'})\} : a, a^{'} \in \{0, 1\}\}$ in a way which makes the potential outcomes highly correlated. Interestingly, because (3) depends only on the marginal distributions of the potential outcomes, it is invariant to our choice of joint distribution; hence, this does not actually imply any additional restrictions on the model. To ensure a strong dependence between the $Y_{i}^{⋆} \{a, M_{i}^{⋆} (a^{'})\}$’s, we simulate $M_{i}^{⋆} (a)$ and $Y_{i}^{⋆} (a, m)$ to be comonotone,³¹ that is, we simulate $U, V \sim Uniform (0, 1)$ and apply the probability integral transform to get $M_{i}^{⋆} (a) = F_{M}^{-} (U ∣ A_{i} = a, X_{i})$ and $Y_{i}^{⋆} \{a, M_{i}^{⋆} (a^{'})\} = F_{Y}^{-} (V ∣ M_{i} = M_{i}^{⋆} (a^{'}), A_{i} = a, X_{i})$ (note that the same $U$ and $V$ are used for different values of $a$ and $a^{'}$). This, combined with taking $K$ to be modestly large (say, $K = 10$), is sufficient to effectively eliminate the Monte Carlo error.
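The common-random-numbers construction can be sketched as follows. To keep the generalized inverse cdf in closed form using only the standard library, this example deliberately swaps the ZOIB model for a toy zero-one inflated Uniform(0, 1) model (mass $α$ at 0, mass $(1 - α) γ$ at 1, uniform in between). That substitution, the coefficients, and all function names are assumptions made for illustration; this is not Algorithm 3. The quantile total effect `tau_q` is taken here to be the difference of the $q^{th}$ quantiles of $Y_{i} \{1, M_{i} (1)\}$ and $Y_{i} \{0, M_{i} (0)\}$.

```python
import math
import random

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def inv_cdf(u, alpha, gamma):
    """Generalized inverse cdf of a toy zero-one inflated Uniform(0, 1) distribution."""
    cont = (1.0 - alpha) * (1.0 - gamma)   # mass of the continuous part
    if u <= alpha:
        return 0.0
    if u < alpha + cont:
        return (u - alpha) / cont
    return 1.0

def mediator_params(a, x):
    """Hypothetical (alpha, gamma) for [M | A = a, X = x]."""
    return expit(-2.0 + 0.2 * x - 0.3 * a), expit(-1.5 + 0.1 * x + 0.5 * a)

def outcome_params(a, m, x):
    """Hypothetical (alpha, gamma) for [Y | M = m, A = a, X = x]."""
    return (expit(-1.0 - 0.2 * x + 0.2 * a + 1.0 * m),
            expit(-2.0 + 0.1 * x - 0.2 * a - 0.5 * m))

def quantile(sample, q):
    """Empirical q-th quantile: the ceil(q * n)-th order statistic."""
    s = sorted(sample)
    return s[max(0, math.ceil(q * len(s)) - 1)]

def quantile_effects(x_obs, omega, q, n_draws, rng):
    """Approximate delta_q(a), zeta_q(a), and tau_q as in (8) for one parameter draw."""
    draws = {(a, ap): [] for a in (0, 1) for ap in (0, 1)}
    for _ in range(n_draws):
        x = rng.choices(x_obs, weights=omega)[0]   # sample X* according to omega
        u, v = rng.random(), rng.random()          # shared across all (a, a')
        m_star = {ap: inv_cdf(u, *mediator_params(ap, x)) for ap in (0, 1)}
        for a, ap in draws:
            draws[(a, ap)].append(inv_cdf(v, *outcome_params(a, m_star[ap], x)))
    Q = {key: quantile(val, q) for key, val in draws.items()}
    delta_q = {a: Q[(a, 1)] - Q[(a, 0)] for a in (0, 1)}
    zeta_q = {a: Q[(1, a)] - Q[(0, a)] for a in (0, 1)}
    tau_q = Q[(1, 1)] - Q[(0, 0)]
    return delta_q, zeta_q, tau_q

rng = random.Random(5)
x_obs = [rng.gauss(0.0, 1.0) for _ in range(200)]   # hypothetical covariates
omega = [1.0 / len(x_obs)] * len(x_obs)             # stand-in for a Dirichlet draw
delta_q, zeta_q, tau_q = quantile_effects(x_obs, omega, q=0.5, n_draws=2000, rng=rng)
print(delta_q, zeta_q, tau_q)
```

Reusing the same `u` and `v` for every pair $(a, a^{'})$ is what makes the four simulated outcomes strongly dependent, so sampling noise largely cancels when their quantiles are differenced.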

Our general algorithm for approximating the quantile mediation effects is given in Algorithm 3, with the extension to the specific setting of the ZOIB model being derived in the same way Algorithm 2 was derived from Algorithm 1.

4.3. Assessing the Monte Carlo error

Linero²³ introduced a method for computing and (in the case where the effects are approximately normal) correcting for the Monte Carlo error in the types of estimators we have proposed; code for implementing this is available at www.github.com/theodds/AGC. This approach requires $K > 1$ and, in all cases we have considered, the Monte Carlo error is negligible for $K = 2$ . Linero²³ also shows that naive estimators that are not designed to eliminate Monte Carlo error can be very inefficient unless $K$ is taken rather large (say, $K \geq 10$ ).

5. Sensitivity analysis

Because SI is an untestable assumption, it is essential to assess the extent to which the conclusions of an SI-based analysis are sensitive to the existence of unmeasured confounders, that is, to violations of SI2. Accordingly, we now present a framework for performing sensitivity analysis using the mixed-scale models we have developed. Without loss of generality, we assume that the data have been scaled so that both $Y_{i} (a, m)$ and $M_{i} (a)$ take values in $[0, 1]$. As a guiding principle, we require that any sensitivity parameters be pure in the sense that varying them does not alter the fit of the model to the data. This allows us to independently assess (i) goodness-of-fit for the observed data model and (ii) the impact of SI2 failing.

5.1. Sensitivity on the logit scale

We propose an approach to sensitivity analysis that allows for dependence between $Y_{i} (a, m)$ and $M_{i} (a)$ even after accounting for $X_{i}$ and $A_{i}$ . We replace assumption SI2 with the following two assumptions:

SI2A
Conditional on $X_{i}$ , the potential outcomes $M_{i} (0)$ and $M_{i} (1)$ are jointly distributed according to a Gaussian copula with correlation $ρ \in [0, 1]$ . More precisely, we have $M_{i} (a) = F_{M}^{-} {Φ (Z_{i a}) ∣ A_{i} = a, X_{i} = x}$ where $F_{M}^{-} (u ∣ A_{i} = a, X_{i} = x)$ denotes the generalized inverse cdf of $M_{i}$ given $A_{i} = a$ and $X_{i} = x$ , and $(Z_{i 0}$ , $Z_{i 1})$ is jointly standard normal with correlation $ρ$ .
SI2B
Conditional on $X_{i}$ , $M_{i} (0)$ , and $M_{i} (1)$ , the mean of $Y_{i} (a, m)$ is given by
$E_{θ} {Y_{i} (a, m) ∣ M_{i} (a), M_{i} (a^{'}), X_{i}} = expit [logit r_{y} (m, a, x) + λ {M_{i} (a) - m}]$
where $r_{y} (m, a, x) = E_{θ} (Y_{i} ∣ M_{i} = m, A_{i} = a, X_{i} = x)$ .
SI2B has been chosen specifically so that it reproduces the inferences under SI2 when $λ = 0$ while leaving the sensitivity parameters $λ$ and $ρ$ unidentified so that they can be varied freely without changing the fit of the model to the data. The most closely related sensitivity analysis framework which we are aware of is the “hybrid” approach of Albert and Wang,⁵ although this approach differs importantly in that the hybrid approach replaces the term $λ {M_{i} (a) - m}$ with a term of the form $λ (a - a^{'})$ , which is similarly designed to drop out of the distribution of the observed data.

To motivate the choice of SI2B (in particular, the term $M_{i} (a) - m$ ), we note that a natural way to induce correlation between the potential mediators and outcomes is to add an additional linear term to the regression model; that is, if we (for the sake of simplicity) replace our ZOIB model with a logistic regression model for the mean, we could incorporate $M_{i} (a)$ into the linear predictor as $β_{0}^{Y} + X_{i}^{⊤} β_{X}^{Y} + a β_{A}^{Y} + m β_{M}^{Y} + λ M_{i} (a)$ , with the term $λ M_{i} (a)$ capturing any association due to unmeasured confounding between $Y_{i} (a, m)$ and $M_{i} (a)$ . The issue with using this expression directly is that $λ$ is confounded with $β_{M}^{Y}$ ; adding and subtracting $λ m$ , however, gives
$β_{0}^{Y} + X_{i}^{⊤} β_{X}^{Y} + a β_{A}^{Y} + m (β_{M}^{Y} + λ) + λ {M_{i} (a) - m}$
Because the observed mediator is $M_{i} = M_{i} (A_{i})$ , we have $M_{i} (A_{i}) - M_{i} = 0$ , so the term $λ {M_{i} (a) - m}$ disappears from the distribution of the observed data; consequently, only the term $β_{M}^{Y} + λ$ is identified. Our decision to write our assumption as in SI2B (with $λ m$ subtracted off explicitly) only has the effect of reparameterizing the above model in terms of the identified parameter $β_{M}^{Y} + λ$ . While this argument only holds exactly for the logistic regression mean model, the intuition is the same for generic models: subtracting $λ m$ allows us to parameterize the distribution of the observed data with identifiable parameters.
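A small numerical check makes the purity of $λ$ concrete. The regression function and all coefficient values below are hypothetical; the sketch only illustrates that the SI2B-style mean is free of $λ$ whenever the mediator is set to its realized value, and depends on $λ$ otherwise.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit_r_y(m, a, x):
    """Hypothetical identified regression, logit E(Y | M = m, A = a, X = x)."""
    return -0.5 + 0.3 * x - 0.2 * a - 1.0 * m


def mean_given_mediator(m, a, x, m_a, lam):
    """SI2B-style conditional mean of Y(a, m) given M(a) = m_a."""
    return expit(logit_r_y(m, a, x) + lam * (m_a - m))


lams = (-2.0, 0.0, 2.0)
a_obs, x_obs, m_obs = 1, 0.4, 0.7

# Observed data: the mediator equals its realized value, so m = M(a)
# and the lam term is multiplied by zero.
observed = [mean_given_mediator(m_obs, a_obs, x_obs, m_obs, lam) for lam in lams]

# Counterfactual: m = 0.2 differs from M(a) = 0.7, so lam now matters.
counterfactual = [mean_given_mediator(0.2, a_obs, x_obs, m_obs, lam) for lam in lams]

print(observed)
print(counterfactual)
```

The first list is constant in $λ$, which is why the observed data carry no information about it; the second list varies with $λ$, which is what the sensitivity analysis exploits.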

The following proposition establishes that $E_{θ} [Y_{i} {a, M_{i} (a^{'})}]$ is identified for all $a, a^{'}$ so that the average causal mediation effects are also identified. A proof is given in the Appendix.
Proposition 1
Suppose that SI1, SI2A, SI2B, and SI3 hold. Then we have
$\begin{aligned} E_{θ} [Y_{i} {a, M_{i} (a^{'})}] = ∭ & expit {{\tilde{r}}_{y} (m^{'}, a, x) + λ (m - m^{'})} \\ & \times Normal {(z_{0}, z_{1})^{⊤} ∣ (0, 0)^{⊤}, Σ} d z_{0} d z_{1} F_{X} (d x) \end{aligned}$
where ${\tilde{r}}_{y} (m, a, x) = logit r_{y} (m, a, x)$ , $Σ = (\begin{matrix} 1 & ρ \\ ρ & 1 \end{matrix})$ , and we define $m^{'} = F_{M}^{-} {Φ (z_{a^{'}}) ∣ A_{i} = a^{'}, X_{i} = x}$ and $m = F_{M}^{-} {Φ (z_{a}) ∣ A_{i} = a, X_{i} = x}$ in the integral.

As in Section 4, there is no analytical expression for $E_{θ} [Y_{i} {a, M_{i} (a^{'})}]$ , and hence we must resort to Monte Carlo integration. Fortunately, by noting that the approximation
$E_{θ} [Y_{i} {a, M_{i} (a^{'})}] \approx K^{- 1} \sum_{i, k} ω_{i} expit [{\tilde{r}}_{y} {M_{i}^{⋆} (a^{'}), a, X_{i}} + λ {M_{i}^{⋆} (a) - M_{i}^{⋆} (a^{'})}]$
is unbiased, it is straightforward to modify Algorithm 1 to compute $E_{θ} [Y_{i} {a, M_{i} (a^{'})}]$ under this assumption as well. A procedure for computing the mediation effects under SI2A and SI2B is given in Algorithm S.1 of the Supplemental Material.
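The Monte Carlo approximation under SI2A and SI2B can be sketched as follows. This is not Algorithm S.1: covariates and the weights $ω_{i}$ are omitted, and the mediator quantile function, the regression coefficients, and the function names are all hypothetical placeholders. The sketch draws $(Z_{0}, Z_{1})$ from a bivariate normal with correlation $ρ$, maps them to potential mediators through the inverse cdf, and averages the resulting conditional means.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def mediator_quantile(u, a):
    """Toy F_M^-(u | A = a): zero-one inflated Kumaraswamy(2, 2) (assumption)."""
    p0, p1 = (0.10, 0.10) if a == 0 else (0.05, 0.15)
    if u <= p0:
        return 0.0
    if u >= 1.0 - p1:
        return 1.0
    v = (u - p0) / (1.0 - p0 - p1)
    return math.sqrt(1.0 - math.sqrt(1.0 - v))


def logit_r_y(m, a):
    """Hypothetical identified outcome regression on the logit scale."""
    return -0.5 - 0.2 * a - 1.0 * m


def mediation_mean(a, a_prime, lam, rho, n_draws=20000, seed=0):
    """Monte Carlo estimate of E[Y{a, M(a')}] under the Gaussian copula."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # (z[0], z[1]) is standard bivariate normal with correlation rho.
        z = (e0, rho * e0 + math.sqrt(1.0 - rho ** 2) * e1)
        m = mediator_quantile(STD_NORMAL.cdf(z[a]), a)
        m_prime = mediator_quantile(STD_NORMAL.cdf(z[a_prime]), a_prime)
        total += expit(logit_r_y(m_prime, a) + lam * (m - m_prime))
    return total / n_draws


# Indirect effect delta(0) = E[Y{0, M(1)}] - E[Y{0, M(0)}] for one (lam, rho).
delta_0 = mediation_mean(0, 1, lam=0.5, rho=0.95) - mediation_mean(0, 0, lam=0.5, rho=0.95)
print(delta_0)
```

When $a = a^{'}$ the two simulated mediators coincide, so the estimate does not depend on $λ$; this mirrors the fact that the observed-data fit is unaffected by the sensitivity parameters.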

As there is no information in the data about the sensitivity parameters $ρ$ and $λ$ , it is essential that we are able to elicit plausible ranges for their values. To gain better intuition about the role of $λ$ , suppose that we had instead posited the logistic regression model $logit r_{y} (m, a, x) = β_{0}^{Y} + x^{⊤} β_{X}^{Y} + a β_{A}^{Y} + m β_{M}^{Y}$ . In this case, we can rewrite
$logit E_{θ} {Y_{i} (a, m) ∣ M_{i} (a), M_{i} (a^{'}), X_{i}} = β_{0}^{Y} + x^{⊤} β_{X}^{Y} + a β_{A}^{Y} + m (β_{M}^{Y} - λ) + M_{i} (a) λ$
In words, $λ = 0$ (which produces the same inferences as SI2) attributes the entirety of the association between $M_{i}$ and $Y_{i}$ to a causal effect $β_{M}^{Y}$ from shifting the value of $m$ in the potential outcome $Y_{i} (a, m)$ . When $λ \neq 0$ this association is instead parsed into an effect of size $(β_{M}^{Y} - λ)$ for shifting $m$ in $Y_{i} (a, m)$ and a residual association of size $λ$ between $Y_{i} (a, m)$ and $M_{i} (a)$ , above and beyond the causal effect of shifting $m$ , due to unmeasured confounding. In the absence of subject-matter expertise about the likely values of $λ$ , one can use weak prior knowledge about the magnitude of the mediation effect to narrow down the range of plausible values. In the JOBS II study, we do this by constructing a pilot estimate of $r_{y} (m, a, x)$ of the form $logit {\hat{r}}_{y} (m, a, x) = {\hat{β}}_{0}^{Y} + x^{⊤} {\hat{β}}_{X}^{Y} + a {\hat{β}}_{A}^{Y} + m {\hat{β}}_{M}^{Y}$ . In most cases, we feel that it is reasonable to assume that the effect of unmeasured confounding will not dominate the causal effect associated with $m$ , so that $λ \in [- {\hat{β}}_{M}^{Y}, {\hat{β}}_{M}^{Y}]$ is a conservative collection of plausible values of $λ$ .

The parameter $ρ$ measures dependence in the mediator process $M_{i} (\cdot)$ . Since $| ρ | \leq 1$ and $ρ$ is typically believed to be positive, we recommend repeating the sensitivity analysis for a small number of $ρ$ ’s in $[0, 1]$ . For the JOBS II data, we consider $ρ \in {0, 0.5, 0.95}$ , with the results for $ρ \in {0, 0.5}$ given in the Supplemental Material.
Remark 1
The use of copulas in SI2A bears a passing resemblance to our use of comonotone random variables to reduce Monte Carlo error in Section 4. In this case, however, the choice of $ρ$ does impact the model, and so we can no longer reduce the Monte Carlo error by making $M_{i k}^{⋆} (a)$ and $M_{i k}^{⋆} (a^{'})$ comonotone.
Remark 2
In the Supplemental Material, we provide a scheme for performing sensitivity analysis on a linear, rather than logit, scale. The linear scale sensitivity analysis has the advantage of not requiring the specification of $ρ$ .
6. Illustrations

6.1. Application to JOBS II

We apply our methodology to the JOBS II dataset using SI as a benchmark. As potential confounders we include numeric variables measuring economic hardship (econ_hard), baseline depression (depress1), and age (age), as well as the categorical variables measuring sex (sex), race (nonwhite), income bracket (income), occupation (occp), marital status (marital), and education level (educ).

We posit generalized linear models for each component of the ZOIB models with a homogeneous effect for the treatment and mediator. That is, we use linear predictors of the form $X_{i}^{⊤} β^{M} + θ^{M} A_{i}$ for the mediator and linear predictors of the form $X_{i}^{⊤} β^{Y} + θ^{Y} A_{i} + η M_{i}$ for the outcome (with separate coefficients for each model component). Note that the homogeneity assumption implies neither that $δ (0) = δ (1)$ nor that $ζ (0) = ζ (1)$ due to the nonlinearity of the link functions in (6).

The observed data models for $M_{i}$ and $Y_{i}$ are fit using Markov chain Monte Carlo (MCMC) in Stan, with a total of $8000$ samples collected over four parallel chains and $4000$ samples discarded to burn-in. There is no evidence of failure of the chains to mix: all traceplots indicate rapid mixing (see Figures S.4 and S.5 in the Supplemental Material), all values of the Gelman-Rubin diagnostic³² $\hat{R}$ are effectively $1$ (minimum of 0.9991, maximum of 1.003), and the minimal effective sample size across all of the monitored parameters is 1807.
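For readers unfamiliar with the diagnostic, the classic Gelman-Rubin statistic can be computed from raw chains as in the sketch below. This is the textbook, non-split version applied to simulated chains; recent versions of Stan report a refined split and rank-normalized variant, so the values need not match those quoted above exactly. The function name and the toy chains are assumptions made for illustration.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic (non-split) potential scale reduction factor, equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                       # B
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(11)
# Four well-mixed toy chains targeting the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Same chains, but the last one is stuck in a different region.
stuck = mixed[:3] + [[v + 3.0 for v in mixed[3]]]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(rhat_mixed, rhat_stuck)
```

Values close to $1$ indicate that between-chain and within-chain variability agree, which is the pattern reported for the fitted ZOIB models.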

We now perform posterior predictive checks to compare the observed data $Y_{i}$ and $M_{i}$ to replicate datasets simulated from the fitted model. The goal of these checks is to assess how well the fitted model aligns with the observed data. In Figure 4, we check the fit of the beta distribution to the continuous part of the observed mediator/outcome distributions by comparing a kernel density estimate of the observed-data distribution of $Y_{i}$ and $M_{i}$ to 100 replicated datasets sampled from the posterior predictive distribution; the top row shows density estimates for the depression level under each treatment level for the continuous part of the data, while the bottom row shows the same for job-search self-efficacy. The posterior predictive distribution produces datasets that closely match the observed data, suggesting that the beta model for the continuous part of the data is adequate.

Figure 4.

Kernel density estimates of the non-boundary proportion of the original data (black) and 100 replicated datasets (red).

In Figure 5, we check the fit of the logistic regression models to the boundary points $0$ and $1$ by comparing the observed proportion of $0$ s and $1$ s for the outcome and mediator to what is observed in replicated datasets. Again, there is close agreement between the observed data and the data simulated from the model.

Figure 5.

Histograms of the proportions of individuals taking the boundary values $0$ (in dark blue) and $1$ (in light yellow) across 100 replicated datasets for the outcome (depression, first row) and mediator (efficacy, second row), separately for the treated and untreated groups. The proportions of the observed data taking boundary values are given by the pink vertical dashed lines.
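The boundary check can be sketched in a few lines. Everything here is a toy stand-in: the "observed" data are themselves simulated, the boundary probabilities and the Beta(2, 3) continuous part are hypothetical, and the parameters are held fixed across replicates, whereas a real posterior predictive check would use a fresh posterior draw for each replicated dataset.

```python
import random


def boundary_proportions(values):
    """Return the proportions of exact 0s and exact 1s in a sample."""
    n = len(values)
    return sum(v == 0.0 for v in values) / n, sum(v == 1.0 for v in values) / n


def simulate_replicate(n, p0, p1, rng):
    """Toy replicate: mass p0 at 0, mass p1 at 1, otherwise Beta(2, 3)."""
    out = []
    for _ in range(n):
        u = rng.random()
        if u < p0:
            out.append(0.0)
        elif u < p0 + p1:
            out.append(1.0)
        else:
            out.append(rng.betavariate(2.0, 3.0))
    return out


rng = random.Random(2024)
observed = simulate_replicate(899, 0.20, 0.05, rng)  # stand-in for real data
obs_p0, obs_p1 = boundary_proportions(observed)

# 100 replicated datasets and their boundary proportions.
replicates = [
    boundary_proportions(simulate_replicate(899, 0.20, 0.05, rng))
    for _ in range(100)
]

# Share of replicates whose proportion of zeros is at least the observed one;
# values near 0 or 1 would flag a poor fit at the boundary.
share_above = sum(p0 >= obs_p0 for p0, _ in replicates) / len(replicates)
print(obs_p0, obs_p1, share_above)
```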

Since the outcomes in the analysis were scaled by the transformation $y \mapsto \frac{y - 1}{4}$ , the causal estimates can be brought back to their original scale by multiplying the estimates by 4. The results for the average causal mediation effects on the original scale are given in Table 1.

Table 1.

Effect estimates for the JOBS II data using the zero-one inflated beta (ZOIB) formulation for the outcome (depression) and mediator (efficacy) assuming sequential ignorability.

Effect	Est.	SD	Lower	Upper	$Z$ -score	$P$ -value
$δ (0)$	−0.0110	0.0108	−0.0330	0.0098	−1.0131	0.3110
$δ (1)$	−0.0102	0.0101	−0.0308	0.0089	−1.0144	0.3104
$ζ (0)$	−0.0282	0.0403	−0.1065	0.0491	−0.7000	0.4839
$ζ (1)$	−0.0275	0.0400	−0.1058	0.0490	−0.6880	0.4915
$τ$	−0.0385	0.0416	−0.1202	0.0415	−0.9244	0.3553

Although the average causal mediation effect estimates are small (less than a tenth of a point for all effects) and not statistically significant, all treatment effect estimates are negative, which would imply that participation in the job training seminar decreased the depression of subjects both directly through their participation in the seminar and indirectly through the effect of the training seminar on the self-efficacy of participants in finding a job. Of particular interest for us is the decomposition $τ = δ (0) + ζ (1)$ , which captures (i) the benefit of increasing self-efficacy generally for those who do not receive treatment and (ii) the additional benefit of the seminar above and beyond its effect on increasing a subject’s self-efficacy. However, there is little evidence for either a direct or indirect effect of the treatment on the outcome; in particular, the signs of the mediation effects are uncertain. Similar results are obtained for the quantile mediation effects.
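As a quick arithmetic check on Table 1, the sketch below verifies that the reported estimates satisfy the decomposition $τ = δ (0) + ζ (1)$ , and the companion decomposition $τ = δ (1) + ζ (0)$ , up to rounding. It also checks that the reported $P$ -values agree with two-sided normal tail areas computed from the $Z$ -scores; that reading of the $P$ -value column is an inference from the numbers rather than something stated in the text.

```python
from statistics import NormalDist

# (estimate, Z-score, reported P-value), copied from Table 1.
TABLE_1 = {
    "delta(0)": (-0.0110, -1.0131, 0.3110),
    "delta(1)": (-0.0102, -1.0144, 0.3104),
    "zeta(0)": (-0.0282, -0.7000, 0.4839),
    "zeta(1)": (-0.0275, -0.6880, 0.4915),
    "tau": (-0.0385, -0.9244, 0.3553),
}


def two_sided_p(z):
    """Two-sided tail area of a standard normal beyond |z|."""
    return 2.0 * NormalDist().cdf(-abs(z))


est = {name: row[0] for name, row in TABLE_1.items()}

# Differences between tau and each decomposition; nonzero only through rounding.
gap_a = est["tau"] - (est["delta(0)"] + est["zeta(1)"])
gap_b = est["tau"] - (est["delta(1)"] + est["zeta(0)"])

# Largest discrepancy between recomputed and reported P-values.
p_gap = max(abs(two_sided_p(z) - p) for _, z, p in TABLE_1.values())
print(gap_a, gap_b, p_gap)
```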

6.2. Sensitivity analysis

While there is no evidence for either a direct or indirect treatment effect under SI, we may be concerned that the effects are being masked by unobserved confounding. We now apply the sensitivity analysis techniques introduced in Section 5 to assess the impact of unmeasured confounding.

Setting $ρ = 0.95$ , Figure 6 shows how inferences about the mediation effects change as $λ$ is varied across a range of plausible values. To calibrate $λ$ , we fit a linear model to the conditional mean $logit E (Y_{i} ∣ M_{i}, A_{i}, X_{i})$ using quasi-likelihood and then considered values of $λ$ no more than twice as large in magnitude as the estimated effect of $M_{i}$ in this model; this corresponds to the belief that most of the association of $Y_{i}$ with $M_{i}$ should be attributable to a causal effect of the mediator rather than confounding between the outcome and mediator processes.

Figure 6.

Sensitivity of inferences about $δ (a)$ and $ζ (a)$ to changes in the sensitivity parameter $λ$ under assumptions SI2A and SI2B. The dashed line is the posterior mean, and the bands delimit a pointwise 95% credible interval.

From Figure 6, inferences for the direct effects $ζ (a)$ are robust to unmeasured confounding between $Y_{i} (a, m)$ and $M_{i} (a)$ , while inferences for the indirect effects $δ (a)$ are less robust. When $λ$ is negative, there is less uncertainty in $δ (a)$ , although the estimates are also pulled towards zero. For larger values of $λ$ , the estimates of $δ (a)$ are larger (in magnitude), but also more uncertain. The substantive conclusion remains the same: there is little evidence regarding the sign of either the direct or indirect effects.

We provide a more in-depth sensitivity analysis in the Supplemental Material. In particular, we consider both varying $ρ$ and a linear variant of assumption SI2B. While the substantive conclusions do not change from varying $ρ$ , we do see that $ρ$ interacts strongly with $λ$ , with the trends for $ρ = 0, 0.5$ being markedly nonlinear.

6.3. Simulation example

We evaluate Algorithm 2 under a variety of different data generating mechanisms based on the JOBS II data. To devise relevant simulation settings, we first fit our model to the JOBS II data and then modified the estimated coefficients of the fitted beta-regression model. We consider homogeneous effects for both the mediator and outcome, and write $ξ^{M}$ and $ξ^{Y}$ for the estimated coefficient for the effect of $A_{i}$ on $M_{i}$ and $Y_{i}$ , respectively. The following features of the data generating mechanism were varied.

Sample Size
We consider $N \in {899, 1798}$ , which correspond to the size of the JOBS II dataset and twice that size, respectively.
Treatment Effect on Mediator
We consider $ξ^{M} \in {0, {\hat{ξ}}^{M}, 10 {\hat{ξ}}^{M}}$ ; the first setting corresponds to no indirect effect of treatment, the second to a realistic indirect effect of the treatment, and the last to a very large indirect effect of the treatment.
Direct Treatment Effect on the Outcome
We consider $ξ^{Y} \in {0, {\hat{ξ}}^{Y}, 10 {\hat{ξ}}^{Y}}$ , the choices of which are analogous to the ones for the treatment effect on the mediator.

We consider five data generating mechanisms: (1) no mediation, where $(ξ^{Y}, ξ^{M}) = ({\hat{ξ}}^{Y}, 0)$ ; (2) complete mediation, where $(ξ^{Y}, ξ^{M}) = (0, {\hat{ξ}}^{M})$ ; (3) strong no mediation, where $(ξ^{Y}, ξ^{M}) = (10 {\hat{ξ}}^{Y}, 0)$ ; (4) strong complete mediation, where $(ξ^{Y}, ξ^{M}) = (0, 10 {\hat{ξ}}^{M})$ ; and (5) no modifications, where $(ξ^{Y}, ξ^{M}) = ({\hat{ξ}}^{Y}, {\hat{ξ}}^{M})$ . For each scenario, we used 200 simulated datasets to compute the bias of the effect estimates, the root-mean-squared error (RMSE), the coverage of nominal 95% credible intervals, and the average length of a nominal 95% interval.
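The four performance summaries can be computed from per-dataset point estimates and intervals as in the following sketch. The true effect, the standard deviation, and the normal-approximation intervals are hypothetical stand-ins for the output of fitting the model to each simulated dataset; only the `summarize` logic reflects the metrics named above.

```python
import math
import random


def summarize(truth, estimates, intervals):
    """Bias, RMSE, interval coverage, and mean interval length across replicates."""
    n = len(estimates)
    bias = sum(estimates) / n - truth
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    coverage = sum(lo <= truth <= hi for lo, hi in intervals) / n
    length = sum(hi - lo for lo, hi in intervals) / n
    return {"bias": bias, "rmse": rmse, "coverage": coverage, "length": length}


rng = random.Random(7)
truth, se = -0.0126, 0.01  # hypothetical true effect and posterior SD

# Stand-in for 200 fitted datasets: noisy estimates with nominal 95% intervals.
estimates = [truth + rng.gauss(0.0, se) for _ in range(200)]
intervals = [(e - 1.96 * se, e + 1.96 * se) for e in estimates]

results = summarize(truth, estimates, intervals)
# Multiply by 100 for display, matching the convention used in the tables.
print({name: round(100 * value, 2) for name, value in results.items()})
```

With only 200 replicates, the Monte Carlo standard error of an estimated coverage near 95% is roughly 1.5 percentage points, which is worth keeping in mind when comparing coverage entries.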

Table 2 summarizes the results for each scenario. For readability, all entries of the table are multiplied by $100$ . Prior to the simulation experiment, we computed the true direct, indirect, and total effects using Monte Carlo integration with $90{,}799$ samples (101 times the size of the original data). For each simulated dataset, we collected a total of 2000 samples across eight parallel chains, with 250 burn-in samples per chain. Our method performs well in terms of bias for both sample sizes and, as expected, we observe lower RMSEs for the larger sample size. The 95% credible intervals are slightly conservative, particularly for the indirect effect; across all scenarios and effects, the smallest coverage probability was 94%. Ultimately, the results show that our approach to computing the mediation effects, while tending to be conservative, appears to work well.

Table 2.
JOBS II simulation results for average mediation effects using Algorithm 2.

			$n = 899$				$n = 1798$
Scenario	Effect	Truth	Bias	RMSE	Coverage	Length	Bias	RMSE	Coverage	Length
1	$δ (0)$	−0.26	0.01	0.31	99.00	1.53	0.03	0.22	99.00	1.07
1	$δ (1)$	−0.26	0.02	0.29	100.00	1.45	0.03	0.21	99.50	1.02
1	$ζ (0)$	−0.98	0.00	1.06	97.50	4.26	0.06	0.72	95.50	3.02
1	$ζ (1)$	−0.97	0.01	1.05	97.00	4.23	0.07	0.72	95.50	3.01
1	$τ$	−1.23	0.02	1.11	96.50	4.41	0.09	0.74	95.50	3.13
2	$δ (0)$	−0.41	0.10	0.33	97.50	1.55	0.02	0.22	98.50	1.08
2	$δ (1)$	−0.40	0.09	0.32	98.00	1.53	0.01	0.22	98.50	1.06
2	$ζ (0)$	0.22	0.01	1.11	95.00	4.33	0.01	0.73	95.50	3.06
2	$ζ (1)$	0.24	0.00	1.10	95.00	4.30	0.00	0.73	95.00	3.03
2	$τ$	−0.17	0.10	1.18	94.50	4.49	0.02	0.80	95.00	3.17
3	$δ (0)$	−0.11	−0.11	0.30	98.50	1.49	−0.13	0.26	96.50	1.05
3	$δ (1)$	−0.10	−0.02	0.16	99.50	0.93	−0.04	0.12	99.00	0.65
3	$ζ (0)$	−9.75	0.06	1.03	95.50	3.98	−0.12	0.74	94.50	2.81
3	$ζ (1)$	−9.73	0.14	1.03	94.00	3.96	−0.02	0.73	95.50	2.79
3	$τ$	−9.84	0.03	1.04	94.50	4.08	−0.15	0.75	94.00	2.88
4	$δ (0)$	−1.34	0.05	0.38	96.00	1.64	0.06	0.28	97.50	1.14
4	$δ (1)$	−1.37	0.09	0.38	94.00	1.64	0.11	0.29	97.00	1.14
4	$ζ (0)$	0.29	0.02	1.11	95.00	4.37	−0.07	0.72	97.00	3.09
4	$ζ (1)$	0.26	0.06	1.08	94.50	4.25	−0.02	0.70	97.50	3.01
4	$τ$	−1.08	0.11	1.13	94.50	4.43	0.04	0.74	96.50	3.14
5	$δ (0)$	−0.33	0.02	0.33	98.50	1.53	0.02	0.22	99.50	1.08
5	$δ (1)$	−0.34	0.06	0.31	98.50	1.44	0.05	0.21	99.00	1.01
5	$ζ (0)$	−0.99	−0.07	0.95	96.50	4.28	−0.07	0.74	95.00	3.03
5	$ζ (1)$	−1.00	−0.04	0.94	96.50	4.24	−0.03	0.73	95.00	3.01
5	$τ$	−1.33	−0.01	1.05	96.50	4.42	−0.02	0.78	96.50	3.13

Each value in the table is multiplied by 100. The columns correspond to the true value of the effect (Truth), the bias, the root-mean-squared error (RMSE), the coverage of nominal 95% credible intervals, and the average length of a nominal 95% credible interval. Scenarios 1–5 correspond to the following: (1) no mediation, where $(ξ^{Y}, ξ^{M}) = ({\hat{ξ}}^{Y}, 0)$ ; (2) complete mediation, where $(ξ^{Y}, ξ^{M}) = (0, {\hat{ξ}}^{M})$ ; (3) strong no mediation, where $(ξ^{Y}, ξ^{M}) = (10 {\hat{ξ}}^{Y}, 0)$ ; (4) strong complete mediation, where $(ξ^{Y}, ξ^{M}) = (0, 10 {\hat{ξ}}^{M})$ ; (5) no modifications, that is, $(ξ^{Y}, ξ^{M}) = ({\hat{ξ}}^{Y}, {\hat{ξ}}^{M})$ . Here ${\hat{ξ}}^{M}$ and ${\hat{ξ}}^{Y}$ are the estimated coefficients for the effect of $A_{i}$ on $M_{i}$ and $Y_{i}$ , respectively.

6.4. Robustness to model misspecification

We now consider two simulation experiments designed to answer the following questions: (i) is the ZOIB model robust to model misspecification when the data is semi-continuous but the continuous part is not a beta regression? and (ii) does the commonly-used linear structural equation modeling (LSEM) framework perform well with semi-continuous data despite assuming that the underlying distribution is continuous?

To assess the robustness of the ZOIB model to model misspecification, we generate data under a censored regression model, where latent variables $M_{i}^{'} (a)$ and $Y_{i}^{'} (a, m)$ are modeled using normal linear models such that we observe (say) $M_{i} (a) = 0$ if $M_{i}^{'} (a) < 0$ and $M_{i} (a) = 1$ if $M_{i}^{'} (a) > 1$ . We generated plausible linear models by fitting unconstrained linear models to the JOBS II data.
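The censoring mechanism can be sketched as below. The coefficients, the residual standard deviation, the single covariate, and the choice to feed the censored (rather than latent) mediator into the outcome model are all assumptions made for illustration; the simulation in the paper instead uses linear models fitted to the JOBS II data.

```python
import random


def censor(value):
    """Clip a latent value to [0, 1], creating point masses at 0 and 1."""
    return min(1.0, max(0.0, value))


def simulate_censored(n, seed=0):
    """Simulate (a, x, m, y) from a toy censored linear mediation model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.randint(0, 1)    # randomized binary treatment
        x = rng.gauss(0.0, 1.0)  # a single hypothetical covariate
        # Latent mediator from a normal linear model, then censored.
        m_latent = 0.6 + 0.10 * a + 0.05 * x + rng.gauss(0.0, 0.3)
        m = censor(m_latent)
        # Latent outcome from a normal linear model in (a, m, x), then censored.
        y_latent = 0.5 - 0.05 * a - 0.30 * m + 0.05 * x + rng.gauss(0.0, 0.3)
        y = censor(y_latent)
        data.append((a, x, m, y))
    return data


sample = simulate_censored(899)
mediators = [row[2] for row in sample]
outcomes = [row[3] for row in sample]
print(sum(m == 0.0 for m in mediators), sum(m == 1.0 for m in mediators))
print(sum(y == 0.0 for y in outcomes), sum(y == 1.0 for y in outcomes))
```

The resulting data are semi-continuous, like the ZOIB model's, but the continuous part is a truncated normal rather than a beta, which is what makes this a misspecified setting for the ZOIB fit.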

Results for the censored regression simulation are given in Table 3, with the estimands being the median causal mediation effects. For comparison, results for fitting an unconstrained LSEM are also given. Both LSEM and the ZOIB models have performance that does not reach the nominal coverage level, with the ZOIB performing slightly worse overall in terms of RMSE, interval length, and coverage and slightly better in terms of bias. Interestingly, LSEM seems to perform well when the ground-truth is the censored linear model despite the fact that it does not respect the semicontinuous nature of the data; considering that LSEM is correctly specified except for the fact that it does not capture the boundary behavior correctly, this is not entirely unexpected.

Table 3.
Results for the censored regression simulation. All quantities in the table (except for coverage) are multiplied by $100$ . CI length denotes the average length of a 95% credible interval while coverage denotes the coverage probability of a nominal 95% credible interval.

Method	Parameter	Bias	RMSE	Standard error	CI length	Coverage
LM	$δ_{0}$	0.016	0.194	0.203	0.795	0.960
	$δ_{1}$	0.016	0.194	0.203	0.795	0.960
	$ζ_{0}$	0.260	0.994	0.935	3.669	0.950
	$ζ_{1}$	0.260	0.994	0.935	3.669	0.950
	$τ$	0.276	1.049	0.952	3.736	0.910
ZOIB	$δ_{0}$	−0.028	0.220	0.234	0.919	0.980
	$δ_{1}$	−0.024	0.215	0.232	0.909	0.970
	$ζ_{0}$	0.179	1.188	1.042	4.091	0.910
	$ζ_{1}$	0.183	1.191	1.039	4.077	0.910
	$τ$	0.155	1.241	1.063	4.172	0.890

ZOIB: zero-one inflated beta; CI: credible interval; RMSE: root-mean-squared error; LM: linear model.

While LSEM performs reasonably well when the underlying data-generating process is a censored linear regression, our next simulation shows that the LSEM can perform poorly when the underlying data-generating process is a ZOIB model. This simulation setting is also based on the JOBS II data, but with stronger treatment effects in the beta part of the regression ( $ξ^{M} = 2$ and $ξ^{Y} = - 0.5$ ) and with a constant precision. Results are given in Table 4. For this setting, we find that the LSEM is heavily biased and inefficient, with coverage probabilities as low as 16%. The ZOIB model, while not perfect, is more efficient, less biased, and is much closer to the nominal coverage levels.

Table 4.

Results for the ZOIB simulation comparing the LSEM and ZOIB approaches. All quantities in the table (except for coverage) are multiplied by $100$ . CI length denotes the average length of a 95% credible interval while coverage denotes the coverage probability of a nominal 95% credible interval.

Method	Parameter	Bias	RMSE	Standard error	CI length	Coverage
LM	$δ_{0}$	−0.402	0.683	0.521	2.044	0.910
	$δ_{1}$	−1.454	1.556	0.521	2.044	0.170
	$ζ_{0}$	1.839	1.951	0.638	2.504	0.160
	$ζ_{1}$	0.786	1.022	0.638	2.504	0.790
	$τ$	0.384	0.518	0.382	1.499	0.820
ZOIB	$δ_{0}$	0.088	0.415	0.373	1.485	0.905
	$δ_{1}$	0.173	0.242	0.188	0.752	0.905
	$ζ_{0}$	−0.051	0.219	0.235	0.938	0.981
	$ζ_{1}$	0.034	0.394	0.396	1.575	0.933
	$τ$	0.122	0.257	0.221	0.887	0.905

ZOIB: zero-one inflated beta; CI: credible interval; LSEM: linear structural equation modeling; RMSE: root-mean-squared error; LM: linear model.

Summarizing, this simulation study shows that the ZOIB model performs reasonably well when the data-generating process is either a censored linear regression or ZOIB, while we found that LSEM performed well only when the data-generating process was a censored linear regression.

7. Discussion

In many practical situations, either the mediator or outcome (or both) will have a mixed-scale distribution, necessitating the use of models beyond the usual linear and generalized linear models. We proposed a zero-one inflated beta distribution after scaling the data to lie in $[0, 1]$ . This family of distributions includes many distributional shapes.

The framework proposed here is flexible enough for users to adapt Algorithm 1 and Algorithm 3 to fit essentially arbitrary zero-one inflated models; for example, it is straightforward to adapt this approach to handle censored regression models.³³ It is also straightforward to extend our approach to a Bayesian nonparametric setting using Dirichlet process mixture models¹⁶ and nonparametric Bayesian additive regression tree models.³⁴

Just as important as the model we have proposed is our description of how to perform a sensitivity analysis with this type of data. The framework for sensitivity analysis we presented is simple, interpretable, and also extends easily to other model specifications such as the censored linear regression model.

While we have taken a Bayesian approach in this paper, there is nothing in principle which stops us from applying our approach in the frequentist framework. Taking a frequentist approach using the bootstrap helps ensure that the estimated standard errors of the parameters remain honest even when the model is misspecified; on the other hand, under misspecification the estimands may no longer correspond to the de jure causal effects of interest, resulting in a tradeoff that may not be worthwhile. Bayesian approaches are also well-suited to settings with hierarchical structures, where software tools such as Stan make it very easy to consider many different models, and they allow researchers to incorporate subject-matter expertise via the prior distribution. A potential downside is that, depending on the number of parameters, the Bayesian approach may result in high computational costs. As shown by Linero,²³ the $g$ -computation framework we used here can be used with the nonparametric bootstrap as well; hence, we could have performed maximum likelihood estimation and used the bootstrap to perform uncertainty quantification rather than using Bayesian inference.

In this work, we assumed that all relevant causal effects are defined with respect to the population $F_{X}$ that the covariates were sampled from. However, it can easily be modified to accommodate a stratified sample from a target population $G_{X}$ . For example, in Algorithm 3, given a population defined by $G_{X}$ we simply replace sampling $X_{i k}^{⋆}$ from $F_{X}$ (which is sampled from the Bayesian bootstrap) with sampling $X_{i k}^{⋆}$ from $G_{X}$ . In the case of stratified sampling on a binary covariate $V_{i}$ , with $\underset{θ}{Pr} (V_{i} = 1) = ϖ$ known, we might model the distribution of $[X_{i} ∣ V_{i} = 1]$ and $[X_{i} ∣ V_{i} = 0]$ with independent Bayesian bootstraps $G_{X}^{1}$ and $G_{X}^{0}$ and then sample $X_{i k}^{⋆} \sim ϖ G_{X}^{1} + (1 - ϖ) G_{X}^{0}$ .
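The stratified resampling step can be sketched as follows. The covariate rows, the stratum sizes, and the function names are hypothetical; the sketch draws one set of Bayesian bootstrap (Dirichlet) weights per stratum and then samples covariates from the mixture $ϖ G_{X}^{1} + (1 - ϖ) G_{X}^{0}$ .

```python
import random


def bayesian_bootstrap_weights(n, rng):
    """Dirichlet(1, ..., 1) weights via normalized Exponential(1) draws."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [v / total for v in g]


def sample_target_covariates(x_v1, x_v0, varpi, k, seed=0):
    """Draw k covariate rows from varpi * G1 + (1 - varpi) * G0."""
    rng = random.Random(seed)
    w1 = bayesian_bootstrap_weights(len(x_v1), rng)  # one draw of G1
    w0 = bayesian_bootstrap_weights(len(x_v0), rng)  # one draw of G0
    draws = []
    for _ in range(k):
        if rng.random() < varpi:
            draws.append(rng.choices(x_v1, weights=w1, k=1)[0])
        else:
            draws.append(rng.choices(x_v0, weights=w0, k=1)[0])
    return draws


# Hypothetical sample with equal-sized strata, reweighted to a target
# population in which only 20% of units have V = 1.
x_v1 = [("V=1", i) for i in range(50)]
x_v0 = [("V=0", i) for i in range(50)]
draws = sample_target_covariates(x_v1, x_v0, varpi=0.2, k=5000)
share_v1 = sum(label == "V=1" for label, _ in draws) / len(draws)
print(share_v1)
```

In a full analysis, new weights would be drawn at every posterior iteration so that uncertainty about $G_{X}^{1}$ and $G_{X}^{0}$ propagates into the effect estimates.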

Code and a package which replicate our analysis are available online at www.github.com/theodds/ZOIBMediation.

Supplemental Material

Supplemental material, sj-pdf-1-smm-10.1177_09622802231173491 for Causal mediation and sensitivity analysis for mixed-scale data by Lexi Rene, Antonio R Linero and Elizabeth Slate in Statistical Methods in Medical Research

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This material is based upon work supported by the National Science Foundation under Grant No. DMS-2144933 and the National Institute of Health under Grant Nos. NIH/NICHD-R01HD093055, NIH/NIMH-R01MH121364, and NIH/NIMH-R01MH121627.

ORCID iD

Antonio R Linero

Supplemental material

Supplemental material for this article is available online.

Appendix. Proof of Proposition 1

By iterated expectation and SI2B, we have
$\begin{aligned} E_{θ} [Y_{i} {a, M_{i} (a^{'})}] & = E_{θ} E_{θ} [Y_{i} {a, M_{i} (a^{'})} ∣ M_{i} (a), M_{i} (a^{'}), X_{i}] \\ & = E_{θ} expit [{\tilde{r}}_{y} {M_{i} (a^{'}), a, X_{i}} + λ {M_{i} (a) - M_{i} (a^{'})}] \end{aligned}$
By SI2A, we have that $M_{i} (a) = F_{M}^{-} {Φ (Z_{i a}) ∣ A_{i} = a, X_{i} = x}$ and $M_{i} (a^{'}) = F_{M}^{-} {Φ (Z_{i a^{'}}) ∣ A_{i} = a^{'}, X_{i} = x}$ , where $(Z_{i 0}, Z_{i 1})$ is jointly standard normal with correlation matrix $Σ$ . Therefore
$\begin{aligned} E_{θ} expit [{\tilde{r}}_{y} {M_{i} (a^{'}), a, X_{i}} + λ {M_{i} (a) - M_{i} (a^{'})}] = ∭ & expit {{\tilde{r}}_{y} (m^{'}, a, x) + λ (m - m^{'})} \\ & \times Normal {(z_{0}, z_{1})^{⊤} ∣ (0, 0)^{⊤}, Σ} d z_{0} d z_{1} F_{X} (d x) \end{aligned}$
where $m = F_{M}^{-} {Φ (z_{a}) ∣ A_{i} = a, X_{i} = x}$ and $m^{'} = F_{M}^{-} {Φ (z_{a^{'}}) ∣ A_{i} = a^{'}, X_{i} = x}$ , completing the proof.

References

1. Imai, Keele, Tingley. A general approach to causal mediation analysis. Psychol Methods 2010; 15: 309.

2. Rubin. Direct and indirect causal effects via potential outcomes. Scand J Stat 2004; 31: 161–170.

3. Vinokur, Price, Schul. Impact of the JOBS intervention on unemployed workers varying in risk for depression. Am J Community Psychol 1995; 23: 39–74.

4. Baron, Kenny. The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol 1986; 51: 1173.

5. Albert, Wang. Sensitivity analyses for parametric causal mediation effect estimation. Biostatistics 2015; 16: 339–351.

6. Wang, Albert. Estimation of mediation effects for zero-inflated regression models. Stat Med 2012; 31: 3118–3132.

7. Franks, D’Amour, Feller. Flexible sensitivity analysis for observational studies without observable implications. J Am Stat Assoc 2019; 115: 1730–1746.

8. Linero. Bayesian nonparametric analysis of longitudinal studies in the presence of informative missingness. Biometrika 2017; 104: 327–341.

9. Linero, Daniels. Bayesian approaches for missing not at random outcome data: the role of identifying restrictions. Stat Sci 2018; 33: 198–213.

10. Scharfstein, Nabi, Kennedy, et al. Semiparametric sensitivity analysis: unmeasured confounding in observational studies. arXiv preprint arXiv:2104.08300, 2021.

11. Gunzler, Chen, et al. Introduction to mediation analysis with structural equation modeling. Shanghai Arch Psychiatry 2013; 25: 390.

12. MacKinnon, Dwyer. Estimating mediated effects in prevention studies. Eval Rev 1993; 17: 144–158.

13. Tchetgen Tchetgen, Shpitser. Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Ann Stat 2012; 40: 1816.

14. Zheng, van der Laan. Targeted maximum likelihood estimation of natural direct effects. Int J Biostat 2012; 8. DOI: 10.2202/1557-4679.1361.

15. Farbmacher, Huber, Lafférs, et al. Causal mediation analysis with double machine learning. Econom J 2022; 25: 277–300.

16. Kim, Daniels, Marcus, et al. A framework for Bayesian nonparametric inference for causal effects of mediation. Biometrics 2017; 73: 401–409.

17. Cheng, Guo, et al. Mediation analysis for count and zero-inflated count data. Stat Methods Med Res 2018; 27: 2756–2774.

18. VanderWeele. Causal mediation analysis with survival data. Epidemiology 2011; 22: 582.

19. Ospina, Ferrari SLP. A general class of zero-or-one inflated beta regression models. Comput Stat Data Anal 2012; 56: 1609–1623.

20. Zuur, Ieno, Walker, et al. Zero-truncated and zero-inflated models for count data. In: Mixed Effects Models and Extensions in Ecology with R. Springer, 2009, pp. 261–293.

21. Liu, Shih Y-CT, Strawderman, et al. Statistical analysis of zero-inflated nonnegative continuous data: a review. Stat Sci 2019; 34: 253–279.

22. Hogan, Daniels. A Bayesian perspective on assessing sensitivity to assumptions about unobserved data. In: Molenberghs G, Fitzmaurice G, Kenward MG, Tsiatis A and Verbeke G (eds), Handbook of Missing Data Methodology. CRC Press, 2014.

23. Linero. Simulation-based estimators of analytically intractable causal effects. Biometrics 2022; 78: 1001–1017. DOI: 10.1111/biom.13499.

24. Derogatis, Lipman, Rickels, et al. The Hopkins symptom checklist (HSCL): a self-report symptom inventory. Behav Sci 1974; 19: 1–15.

25. Pearl. Direct and indirect effects. In: Breese J and Koller D (eds), Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, San Francisco, CA. Morgan Kaufmann, 2001, pp. 411–420.

26. Robins, Greenland. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992; 3: 143–155.

27. Robins. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survival effect. Math Model 1986; 7: 1393–1512.

28. Rubin. The Bayesian bootstrap. Ann Stat 1981; 9: 130–134.

29. Ray, van der Vaart. Semiparametric Bayesian causal inference. Ann Stat 2020; 48: 2999–3020.

30. Carpenter, Gelman, Hoffman, et al. Stan: a probabilistic programming language. J Stat Softw 2016; 20: 1–37.

31. Deelstra, Dhaene, Vanmaele. An overview of comonotonicity and its applications in finance and insurance. In: Di Nunno G and Øksendal B (eds), Advanced Mathematical Methods for Finance. Berlin, Heidelberg: Springer, 2011, pp. 155–179. DOI: 10.1007/978-3-642-18412-3_6.

32. Gelman, Rubin. Inference from iterative simulation using multiple sequences. Stat Sci 1992; 7: 457–472.

33. McDonald, Moffitt. The uses of tobit analysis. Rev Econ Stat 1980; 62: 318–321.

34. Linero, Murray. Adaptive conditional distribution estimation with Bayesian decision tree ensembles. J Am Stat Assoc 2022. DOI: 10.1080/01621459.2022.2037431.


Scenario	Effect	Truth	Bias ($n = 899$)	RMSE ($n = 899$)	Coverage, % ($n = 899$)	Length ($n = 899$)	Bias ($n = 1798$)	RMSE ($n = 1798$)	Coverage, % ($n = 1798$)	Length ($n = 1798$)
1	$δ (0)$	−0.26	0.01	0.31	99.00	1.53	0.03	0.22	99.00	1.07
1	$δ (1)$	−0.26	0.02	0.29	100.00	1.45	0.03	0.21	99.50	1.02
1	$ζ (0)$	−0.98	0.00	1.06	97.50	4.26	0.06	0.72	95.50	3.02
1	$ζ (1)$	−0.97	0.01	1.05	97.00	4.23	0.07	0.72	95.50	3.01
1	$τ$	−1.23	0.02	1.11	96.50	4.41	0.09	0.74	95.50	3.13
2	$δ (0)$	−0.41	0.10	0.33	97.50	1.55	0.02	0.22	98.50	1.08
2	$δ (1)$	−0.40	0.09	0.32	98.00	1.53	0.01	0.22	98.50	1.06
2	$ζ (0)$	0.22	0.01	1.11	95.00	4.33	0.01	0.73	95.50	3.06
2	$ζ (1)$	0.24	0.00	1.10	95.00	4.30	0.00	0.73	95.00	3.03
2	$τ$	−0.17	0.10	1.18	94.50	4.49	0.02	0.80	95.00	3.17
3	$δ (0)$	−0.11	−0.11	0.30	98.50	1.49	−0.13	0.26	96.50	1.05
3	$δ (1)$	−0.10	−0.02	0.16	99.50	0.93	−0.04	0.12	99.00	0.65
3	$ζ (0)$	−9.75	0.06	1.03	95.50	3.98	−0.12	0.74	94.50	2.81
3	$ζ (1)$	−9.73	0.14	1.03	94.00	3.96	−0.02	0.73	95.50	2.79
3	$τ$	−9.84	0.03	1.04	94.50	4.08	−0.15	0.75	94.00	2.88
4	$δ (0)$	−1.34	0.05	0.38	96.00	1.64	0.06	0.28	97.50	1.14
4	$δ (1)$	−1.37	0.09	0.38	94.00	1.64	0.11	0.29	97.00	1.14
4	$ζ (0)$	0.29	0.02	1.11	95.00	4.37	−0.07	0.72	97.00	3.09
4	$ζ (1)$	0.26	0.06	1.08	94.50	4.25	−0.02	0.70	97.50	3.01
4	$τ$	−1.08	0.11	1.13	94.50	4.43	0.04	0.74	96.50	3.14
5	$δ (0)$	−0.33	0.02	0.33	98.50	1.53	0.02	0.22	99.50	1.08
5	$δ (1)$	−0.34	0.06	0.31	98.50	1.44	0.05	0.21	99.00	1.01
5	$ζ (0)$	−0.99	−0.07	0.95	96.50	4.28	−0.07	0.74	95.00	3.03
5	$ζ (1)$	−1.00	−0.04	0.94	96.50	4.24	−0.03	0.73	95.00	3.01
5	$τ$	−1.33	−0.01	1.05	96.50	4.42	−0.02	0.78	96.50	3.13