Abstract
Background:
Random allocation avoids confounding bias when estimating the average treatment effect. For continuous outcomes measured after treatment as well as before randomisation (baseline), analyses based on (A) post-treatment outcome alone, (B) change scores over the treatment phase or (C) conditioning on baseline values (analysis of covariance) provide unbiased estimators of the average treatment effect. The decision to include baseline values of the clinical outcome in the analysis is based on precision arguments, with analysis of covariance known to be most precise. Investigators increasingly carry out explanatory analyses to decompose total treatment effects into components that are mediated by an intermediate continuous outcome and a non-mediated part. Traditional mediation analysis might be performed based on (A) post-treatment values of the intermediate and clinical outcomes alone, (B) respective change scores or (C) conditioning on baseline measures of both intermediate and clinical outcomes.
Methods:
Using causal diagrams and Monte Carlo simulation, we investigated the performance of the three competing mediation approaches. We considered a data generating model that included three possible confounding processes involving baseline variables: The first two processes modelled baseline measures of the clinical variable or the intermediate variable as common causes of post-treatment measures of these two variables. The third process allowed the two baseline variables themselves to be correlated due to past common causes. We compared the analysis models implied by the competing mediation approaches with this data generating model to hypothesise likely biases in estimators, and tested these in a simulation study. We applied the methods to a randomised trial of pragmatic rehabilitation in patients with chronic fatigue syndrome, which examined the role of limiting activities as a mediator.
Results:
Estimates of causal mediation effects derived by approach (A) will be biased if any of the three processes involving baseline measures of intermediate or clinical outcomes is operating. Necessary assumptions for the change score approach (B) to provide unbiased estimates under any of these processes include the independence of baseline measures and change scores of the intermediate variable. Finally, estimates provided by the analysis of covariance approach (C) were found to be unbiased under all three processes considered here. When applied to the example, there was evidence of mediation under all methods, but the estimate of the indirect effect depended on the approach used, with the proportion mediated varying from 57% to 86%.
Conclusion:
Trialists planning mediation analyses should measure baseline values of putative mediators as well as of continuous clinical outcomes. An analysis of covariance approach is recommended to avoid potential biases due to confounding processes involving baseline measures of intermediate or clinical outcomes, and not simply for increased precision.
Introduction
There exists an extensive literature on how to handle baseline measures of a continuous clinical outcome variable in randomised controlled trials (RCTs). This literature relates to estimating the total treatment effect in terms of the clinical outcome.1–4 Mediation investigations that partition total intervention effects into mediated and non-mediated components have become increasingly popular, in particular in trials of mental health interventions such as psychological therapies. The UK National Institute for Health Research and the Medical Research Council fund the Efficacy and Mechanism Evaluation programme, one of whose aims is understanding treatment mechanisms – in our view, ideally evaluated using an appropriate and valid analysis of mediation. But little advice is available on how to deal with baseline measures when attempting such mediation analyses, or indeed on whether baseline measurement of clinical and putative mediator outcomes is necessary. This article focuses on comparing approaches for dealing with baseline measures of outcomes when total treatment effect estimation is to be supplemented by mediation assessment in trials.
Intention-to-treat analyses aim to evaluate the effect of treatment offer (effectiveness) or treatment receipt (efficacy, assuming full compliance with treatment offer). There are three well-known approaches for estimating this total treatment effect when the clinical outcome has also been measured before randomisation (baseline):
(A) Post approach: compare post-treatment clinical outcomes between trial arms.
(B) Change score approach: construct change scores by subtracting baseline values from the post-treatment values and compare the change scores between trial arms.
(C) Analysis of covariance (ANCOVA) approach: estimate the trial arm difference from a regression model that contains baseline measures of the outcome as a covariate, and models a linear effect of the baseline measure on post-treatment outcome.
The three intention-to-treat estimators belong to a wider class of unbiased baseline-adjusted estimators. Under a linear model, the best adjustment is achieved by ANCOVA, so approach (C) is the most precise estimator.1–4
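To make the precision argument concrete, here is a minimal Monte Carlo sketch under assumptions of our own: a hypothetical 1:1 trial with n = 500, a standardised baseline outcome, a baseline–post correlation of 0.6 and a true treatment effect of 1. On each simulated trial it computes the post (A), change score (B) and ANCOVA (C) estimators of the total effect by ordinary least squares (NumPy's `lstsq`).

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed so the sketch is reproducible


def ols_coef(y, *covariates):
    """OLS coefficients of y on an intercept plus the given covariates."""
    X = np.column_stack([np.ones_like(y), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def one_trial(n=500, tau=1.0, rho=0.6):
    """Hypothetical 1:1 trial; baseline y0 and untreated y1 have correlation rho."""
    r = rng.permutation(np.repeat([0.0, 1.0], n // 2))    # randomised arm
    y0 = rng.normal(size=n)                               # baseline outcome
    y1 = rho * y0 + np.sqrt(1 - rho**2) * rng.normal(size=n) + tau * r
    post = ols_coef(y1, r)[1]            # (A) post-treatment outcome only
    change = ols_coef(y1 - y0, r)[1]     # (B) change score
    ancova = ols_coef(y1, r, y0)[1]      # (C) ANCOVA on baseline
    return post, change, ancova


est = np.array([one_trial() for _ in range(2000)])   # rows: trials; cols: A, B, C
for name, col in zip(["post", "change", "ANCOVA"], est.T):
    print(f"{name:7s} mean {col.mean():.3f}  SD {col.std(ddof=1):.3f}")
```

All three means should sit close to the true effect of 1, while the Monte Carlo SDs should order ANCOVA < change score < post. The change score beats the post approach here only because the assumed correlation exceeds 0.5; with equal baseline and post variances the ordering of (A) and (B) reverses below that value.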
Trials of psychological therapies increasingly supplement total treatment effect estimation with a mediation analysis.5–9 The development of a psychological therapy is typically based on a theory regarding modifiable target factors, and it is of interest to assess empirically how much of the total intervention effect can be attributed to changes in such a target intermediate outcome (an indirect effect). On occasion, it can also be of interest to demonstrate that an effect does not come about solely by changing an intermediate variable (e.g. by changing adherence to medication), that is, to show a direct effect.
The traditional steps of Baron and Kenny 10 for mediation assessment fit three regression models: (1) a regression model describing the treatment effect on the clinical outcome, (2) a regression model describing the treatment effect on the intermediate outcome and (3) a regression model describing the joint effects of the intermediate variable and the treatment on the clinical outcome. Traditionally, none of these models contains interactions between treatment and baseline variables, nor do models (1) or (3) contain an interaction between treatment and the intermediate variable. Inferences regarding indirect and direct treatment effects can be obtained by fitting two of these regression models. We focus on fitting models (2) and (3), the combination more prominent in the behavioural/social sciences. Trialists again have three choices for incorporating baseline measures:
(A) Post approach for mediation. Use the intermediate outcome as the dependent variable for model (2); use the clinical outcome as the dependent and the intermediate outcome as an explanatory variable for model (3); ignore the baseline measures.
(B) Change score approach for mediation. Use change in the intermediate variable as the dependent variable for model (2); use change in the clinical outcome as the dependent and change in the intermediate outcome as an explanatory variable for model (3).
(C) ANCOVA approach for mediation: Use the intermediate outcome as the dependent variable for model (2); use the clinical outcome as the dependent and the intermediate outcome as an explanatory variable for model (3); include baseline measures of both variables in both models.
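The three approaches can be written out as pairs of models (2) and (3). The notation below is ours and is assumed for illustration: R is the randomisation indicator, M_0 and Y_0 are the baseline values, and M_1 and Y_1 the post-treatment values, of the intermediate and clinical outcomes. Under the no-interaction models described above, mediation effects follow from the product-of-coefficients rule, with hats denoting least-squares estimates:

```latex
\begin{aligned}
&\text{(A)}\quad M_1 = \alpha_2 + aR + \varepsilon_2,
  &&Y_1 = \alpha_3 + cR + bM_1 + \varepsilon_3\\
&\text{(B)}\quad M_1 - M_0 = \alpha_2 + aR + \varepsilon_2,
  &&Y_1 - Y_0 = \alpha_3 + cR + b(M_1 - M_0) + \varepsilon_3\\
&\text{(C)}\quad M_1 = \alpha_2 + aR + \gamma_M M_0 + \gamma_Y Y_0 + \varepsilon_2,
  &&Y_1 = \alpha_3 + cR + bM_1 + \delta_M M_0 + \delta_Y Y_0 + \varepsilon_3\\[4pt]
&\widehat{\mathrm{NIE}} = \hat a\,\hat b, \qquad
 \widehat{\mathrm{NDE}} = \hat c, \qquad
 \widehat{\mathrm{ATE}} = \hat c + \hat a\,\hat b
\end{aligned}
```

For ordinary least squares with the same covariates in models (1) to (3), \hat c + \hat a \hat b reproduces the total-effect estimate from the corresponding model (1) exactly, which is why the decomposition always adds up within each approach.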
To our knowledge, there exists little advice as to how to choose between these competing approaches, and the practitioner might be forgiven for thinking that this is solely a matter of precision, as is the case for total effects. In this article, we demonstrate that despite randomisation such arguments are too simplistic for mediation investigations. Indeed, we conclude that measurement of baseline variables and their subsequent incorporation into analysis models is necessary to avoid particular biases in estimators of causal mediation effects, and we therefore recommend approach (C) on the grounds of bias reduction.
Methods
Causal treatment effects
We consider trials that have observed a continuous putative mediator variable and a continuous clinical outcome at baseline
We observe the following variables for trial participants
We consider potential outcomes 13 for individuals from the trial’s target population:
This allows us to define a causal individual treatment (offer) effect in terms of the clinical outcome as the contrast
and the causal average treatment (offer) effect in the trial’s target population as
and the individual natural indirect treatment (offer) effect as
This leads to definitions of a causal (average) natural direct treatment (offer) effect as
and a causal (average) natural indirect treatment (offer) effect as
so that the average treatment effect is the sum of the natural direct and natural indirect effects.
These causal mediation estimands can be expressed as functions of parameters of structural models. Let denote
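For reference, the standard natural-effect definitions can be written in potential-outcome notation as follows. The symbols are our assumption and not necessarily the original notation: Y(r, m) is the post-treatment clinical outcome under arm r ∈ {0, 1} with the mediator set to m, and M(r) is the post-treatment mediator under arm r.

```latex
\begin{aligned}
\mathrm{ATE} &= E\{Y(1, M(1)) - Y(0, M(0))\}\\
\mathrm{NDE} &= E\{Y(1, M(0)) - Y(0, M(0))\}\\
\mathrm{NIE} &= E\{Y(1, M(1)) - Y(1, M(0))\}\\
\mathrm{ATE} &= \mathrm{NDE} + \mathrm{NIE}
\end{aligned}
```

Under a linear structural model without treatment–mediator interaction, NIE = ab and NDE = c, where a is the effect of R on the mediator, and b and c are the effects of the mediator and of R on the clinical outcome.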
Causal mediation analysis in trials
We turn our attention to understanding how baseline variables
We will employ linear structural equation modelling diagrams to describe models graphically. The resulting linear structural model equations are straightforward to read from these graphs. Briefly, observed variables are indicated by square boxes and unobserved variables by circles. A single-headed arrow represents a causal effect of one variable on another, and the associated path coefficient has a causal interpretation. A double-headed arrow indicates an unmodelled correlation between two variables. Importantly, the absence of an unblocked path connecting two variables reflects an independence assumption; for more details, including path tracing rules, see Pearl 18 or Spirtes et al. 19
Data generating model for trial outcomes
Figure 1 represents a realistic data generating model for RCTs. It is a simple change score model for longitudinal data: Baseline measures on the left-hand side are measured first (

Figure 1. Linear structural equation diagram describing a realistic data generating model for trials.
Importantly, Figure 1 includes three processes involving baseline measures of outcomes (indicated by dashed paths).
V common cause of baseline levels: A past unobserved variable
Each of these scenarios is plausible, in particular the existence of V. The measures taken at baseline do not represent the first occasion that the measures occur in the individual, but instead represent the first occasion that the investigators have observed the measures. Therefore, it is reasonable to assume that something has driven the values of M and Y at the first time they are measured. In practice, there will be multiple factors but we can represent these by a single unmeasured latent construct V.
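One possible linear version of such a data generating model is written out below as an illustration. The equations and parameter names (λ, β, a, b, c) are our own assumptions, and Figure 1 may differ in detail. R is randomised, so it is independent of V and of all error terms u and ε, and ΔM and ΔY denote the change scores over the treatment phase.

```latex
\begin{aligned}
V &\sim N(0, 1), \qquad M_0 = \lambda_M V + u_M, \qquad Y_0 = \lambda_Y V + u_Y\\
M_1 &= M_0 + \underbrace{aR + \beta_{MM} M_0 + \beta_{YM} Y_0 + \varepsilon_M}_{\Delta M}\\
Y_1 &= Y_0 + \underbrace{cR + bM_1 + \beta_{MY} M_0 + \beta_{YY} Y_0 + \varepsilon_Y}_{\Delta Y}
\end{aligned}
```

In this parameterisation, baseline Y acting as a common cause of post-treatment M and Y corresponds to β_YM ≠ 0 (Y_0 already reaches Y_1 through the change score), baseline M acting as a common cause corresponds to β_MY ≠ 0, and the past common cause V operates when λ_M λ_Y ≠ 0. The coefficients β_MM and β_YY let a baseline value predict change in its own variable, for example through regression to the mean. The true mediation effects are NIE = ab and NDE = c.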
Predicted biases of causal mediation effect estimators
We proceed to contrast the (true) data generating model in Figure 1 with the analysis models assumed by the three competing mediation approaches. These analysis models are fully described by the structural equation diagrams shown in Figure 2. Importantly, the diagrams in Figure 2 provide a graphical representation of the assumptions made by the various approaches as absences of paths between variables indicate independence assumptions. We can therefore utilise these graphs to make predictions about scenarios under which confounding biases might arise in estimators of mediation effects.

Figure 2. Linear structural equation diagrams describing the analysis models assumed by three approaches to mediation analysis: (a) post approach, (b) change score approach and (c) ANCOVA approach.
Under each mediation analysis approach, the target effect is estimated by the path coefficient labelled
The analysis model in Figure 2(a) implies that there are no paths connecting R and
The analysis model in Figure 2(b) implies that there are no paths connecting R and
The analysis model in Figure 2(c) assumes that all of R,
To validate our graphically derived bias predictions, we considered six pertinent data generating models for which we could make bias predictions. Table 1 summarises these data generating models together with our predictions. The models were chosen such that they reflected each of the three potential confounding processes (Models 1, 2 and 5 in Table 1). In addition, for each confounding process a second model was included that was subject to additional or altered parameter restrictions, since we predicted that the change score approach would perform favourably under these particular restrictions.
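The graphical predictions can also be checked algebraically with the standard omitted-variable-bias formula. As a sketch, assume the hypothetical linear change score model M_1 = M_0 + aR + β_MM M_0 + β_YM Y_0 + ε_M and Y_1 = Y_0 + cR + bM_1 + β_MY M_0 + β_YY Y_0 + ε_Y, with R randomised, σ_M² = Var(M_0), σ_Y² = Var(Y_0) and σ_MY = Cov(M_0, Y_0); these symbols are ours. The probability limits of the estimated mediator coefficient under approaches (A) and (B) are then:

```latex
\begin{aligned}
\operatorname{plim}\hat b_A - b &=
  \frac{(1+\beta_{MM})\{\beta_{MY}\sigma_M^2 + (1+\beta_{YY})\sigma_{MY}\}
      + \beta_{YM}\{\beta_{MY}\sigma_{MY} + (1+\beta_{YY})\sigma_Y^2\}}
       {\operatorname{Var}(M_1 \mid R)}\\[4pt]
\operatorname{plim}\hat b_B - b &=
  \frac{\beta_{MM}\{(b+\beta_{MY})\sigma_M^2 + \beta_{YY}\sigma_{MY}\}
      + \beta_{YM}\{(b+\beta_{MY})\sigma_{MY} + \beta_{YY}\sigma_Y^2\}}
       {\operatorname{Var}(M_1 - M_0 \mid R)}\\[4pt]
\operatorname{plim}\hat c - c &= -a\,(\operatorname{plim}\hat b - b)
\end{aligned}
```

Under this model, the (A) bias vanishes only if β_MY = β_YM = σ_MY = 0, that is, when none of the three processes operates; the (B) bias vanishes when β_MM = β_YM = 0, that is, when baseline values do not predict change in the mediator. Approach (C) fits the correctly specified model, so its mediator coefficient is consistent. In every approach the treatment effect on the mediator is consistent by randomisation, so the bias in the NDE estimator is −a times the bias in the mediator coefficient.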
Table 1. Bias predictions for three different estimators of causal mediation effects.
ANCOVA: analysis of covariance.
The target effect can be estimated without bias by any of the three approaches. Biases refer to estimators of the natural direct and indirect effects.
Results
Simulation study
Details of our simulation study design are provided in the Supplementary Material. Briefly, we simulated outcomes from a parallel group trial with n = 500 participants (250 per arm) under each of the six models listed in Table 1. Monte Carlo simulation techniques (s = 10,000 simulations) were used to evaluate the statistical properties of the three competing estimation approaches.
Table 2 shows the bias results from the simulation study. The expected values of the competing estimators can be compared against the true estimand values to assess bias:
Target effect estimation: As expected all three estimation approaches can estimate the target effect without bias. The ANCOVA approach (C) was the most precise across all data generating models. In models where an effect of baseline measures on change in the intermediate variable was assumed absent (
Causal mediation effects estimation: As expected, approach (A) suffered bias in estimates of natural direct and indirect effects under all models. Approach (B) provided unbiased estimates of mediation effects only under Models 3, 4 and 6. As predicted, a necessary condition for approach (B) to be unbiased is the absence of some effects of baseline measures on change scores; in particular, these absences must be such that they remove any backdoor paths connecting the mediator change score and the clinical outcome change score. Also as predicted, approach (C) provided unbiased estimates of mediation effects under all models.
Intention-to-treat effect estimation (clinical outcome): All three estimation approaches were able to estimate the average treatment effect on the clinical outcome without bias. This suggests that, where present, the biases in estimates of the natural indirect effect (NIE) and the natural direct effect (NDE) cancel each other out. Of the three estimators considered here, the ANCOVA approach (C) was the most precise under all models.
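A scaled-down sketch of this kind of simulation is shown below. It is not the authors' design, which is described in the Supplementary Material: the data generating model is a hypothetical linear change score model with all three baseline processes switched on (V affects both baselines, baseline Y affects mediator change, baseline M affects outcome change), baselines also predict change in their own variable, and all parameter values, the seed and the 500 replications are our assumptions. Each approach estimates NIE = ab and NDE = c by product of coefficients.

```python
import numpy as np

# Hypothetical parameter values (illustrative only, not the paper's settings).
A, B, C = 0.5, 0.5, 0.3     # R -> M1, M1 -> Y1, direct R -> Y1
B_MM, B_YM = -0.3, 0.3      # baseline M0, Y0 -> change in mediator
B_MY, B_YY = 0.3, -0.3      # baseline M0, Y0 -> change in clinical outcome
TRUE_NIE, TRUE_NDE = A * B, C


def simulate_trial(n, rng):
    """One 1:1 trial from a hypothetical change score model."""
    v = rng.normal(size=n)                               # past common cause V
    m0 = v + rng.normal(size=n)                          # baseline mediator
    y0 = v + rng.normal(size=n)                          # baseline clinical outcome
    r = rng.permutation(np.repeat([0.0, 1.0], n // 2))   # randomised arm
    m1 = m0 + A * r + B_MM * m0 + B_YM * y0 + rng.normal(size=n)
    y1 = y0 + C * r + B * m1 + B_MY * m0 + B_YY * y0 + rng.normal(size=n)
    return r, m0, y0, m1, y1


def ols(y, *covariates):
    """Least-squares coefficients of y on an intercept plus the covariates."""
    X = np.column_stack([np.ones_like(y), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def mediation(r, m0, y0, m1, y1, approach):
    """(NIE, NDE) by product of coefficients from models (2) and (3)."""
    if approach == "A":                       # post-treatment values only
        a = ols(m1, r)[1]
        c, b = ols(y1, r, m1)[1:3]
    elif approach == "B":                     # change scores
        a = ols(m1 - m0, r)[1]
        c, b = ols(y1 - y0, r, m1 - m0)[1:3]
    else:                                     # "C": ANCOVA on both baselines
        a = ols(m1, r, m0, y0)[1]
        c, b = ols(y1, r, m1, m0, y0)[1:3]
    return a * b, c


rng = np.random.default_rng(2024)
reps, n = 500, 500
draws = {k: [] for k in "ABC"}
for _ in range(reps):
    data = simulate_trial(n, rng)
    for k in "ABC":
        draws[k].append(mediation(*data, approach=k))
means = {k: np.mean(draws[k], axis=0) for k in "ABC"}   # Monte Carlo expected values

for k in "ABC":
    nie, nde = means[k]
    print(f"({k}) NIE {nie:.3f} vs {TRUE_NIE:.3f}; NDE {nde:.3f} vs {TRUE_NDE:.3f}")
```

With these assumed values, (C) should recover NIE = 0.25 and NDE = 0.3, (A) should overstate the NIE (roughly 0.5) and (B) understate it (roughly 0.1), while NIE + NDE stays close to the true total effect of 0.55 for all three approaches, mirroring the pattern described above.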
Table 2. Bias results from the simulation study: expected values of estimators based on s = 10,000 simulations.
ATE: average treatment effect; NIE: natural indirect effect; NDE: natural direct effect.
Biased estimators are shown in italics.
Simulations assumed that treatment effects or mediator effects did not vary between individuals (effect homogeneity).
Worked example in a randomised trial
The Fatigue Intervention by Nurses Evaluation (FINE) trial (ISRCTN74156610) was an RCT comparing pragmatic rehabilitation, supportive listening (a non-directive counselling treatment) and treatment as usual by the general practitioner for patients in primary care with chronic fatigue syndrome/myalgic encephalomyelitis or encephalitis. When the findings of the trial were reported, pragmatic rehabilitation and supportive listening were each compared with treatment as usual in an intention-to-treat analysis. 22
Wearden and Emsley 6 examined the potential mediators of the effect of pragmatic rehabilitation on improvements in fatigue. The outcome was the Chalder fatigue scale score at 70 weeks. Reduction in limiting activities at 20 weeks was found to mediate the positive effect of pragmatic rehabilitation on fatigue at 70 weeks. The focus here is on a secondary data analysis of the trial data in order to illustrate the different methods (A)–(C) for dealing with baseline variables in the mediation analysis.
In the trial, 95 patients were randomised to pragmatic rehabilitation and 100 to treatment as usual. We analyse a complete-case dataset containing 146 patients (70 in pragmatic rehabilitation and 76 in treatment as usual), with observed data on seven variables: the outcome (Chalder fatigue score, scored 0–33, with higher scores indicating more fatigue) at baseline and 70 weeks, the mediator (limiting activities, with lower scores indicating more adaptive behaviours) at baseline and 20 weeks, the randomisation indicator and the stratification variables (whether the patient was non-ambulatory (y/n) and the London myalgic encephalomyelitis criteria).
The analysis was conducted in Stata version 14.2 using the paramed command. 12 Since we assume no interactions and have continuous mediators and outcomes, this produces the same estimates as a structural equation modelling approach would. Note that this differs slightly from the analysis presented by Wearden and Emsley, 6 which also adjusted for other potential mediators at baseline.
Table 3 shows the results of fitting the three analysis models. For all three methods, the intention-to-treat estimate is statistically significant, indicating that pragmatic rehabilitation reduces fatigue score relative to treatment as usual; estimates vary from −2.76 to −3.25 points. All three estimators are valid, with approach (C) providing the most precise estimate. The post approach (A) decomposes this into an NIE of −1.59, with 57% of the total effect mediated. The change score approach (B) indicated that the indirect effect accounted for 68% of the total effect. The ANCOVA approach (C), the only approach that was unbiased under all scenarios in our simulations, gives an indirect effect of −2.55, accounting for 86% of the total effect. While approach (C) appears to remove bias, for the NIE this comes at the price of standard error (SE) inflation; however, the bias correction outweighs this inflation. Consistent with the results of our supplementary simulation study for the NIE, the closed-form SEs are close to the bootstrap SEs for methods (A) and (B), whereas for method (C) the closed-form SE of 0.71 underestimates the bootstrap SE of 0.85.
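The contrast between closed-form and bootstrap SEs can be illustrated on simulated data. The sketch below is hypothetical throughout: the data are simulated (they are not the FINE data) with arm sizes of 76 and 70 chosen only to echo the complete-case sample; the closed-form SE is the first-order delta-method (Sobel) formula, used as a stand-in without any claim that it is exactly what paramed computes; and the bootstrap resamples participants without respecting the stratification. It applies the ANCOVA approach (C) and reports PM% = 100 × NIE/ATE.

```python
import numpy as np


def ols(y, X):
    """OLS with intercept; returns coefficients and classical standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))


def ancova_mediation(r, m0, y0, m1, y1):
    """Approach (C): NIE, NDE, ATE and a delta-method SE for the NIE."""
    beta_m, se_m = ols(m1, np.column_stack([r, m0, y0]))        # model (2)
    beta_y, se_y = ols(y1, np.column_stack([r, m1, m0, y0]))    # model (3)
    a, se_a = beta_m[1], se_m[1]
    c, b, se_b = beta_y[1], beta_y[2], se_y[2]
    se_nie = np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)          # first-order (Sobel)
    return a * b, c, a * b + c, se_nie


# Hypothetical data only (NOT the FINE trial): 76 control, 70 treated.
rng = np.random.default_rng(7)
n = 146
r = rng.permutation(np.r_[np.zeros(76), np.ones(70)])
v = rng.normal(size=n)                       # shared past cause of both baselines
m0, y0 = v + rng.normal(size=n), v + rng.normal(size=n)
m1 = 0.8 * m0 - 1.0 * r + rng.normal(size=n)
y1 = 0.7 * y0 + 0.6 * m1 - 0.5 * r + rng.normal(size=n)

nie, nde, ate, se_delta = ancova_mediation(r, m0, y0, m1, y1)

# Nonparametric bootstrap: resample whole participants with replacement.
boot_nie = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    boot_nie.append(ancova_mediation(r[idx], m0[idx], y0[idx], m1[idx], y1[idx])[0])
se_boot = np.std(boot_nie, ddof=1)

pm = 100 * nie / ate                         # percentage of total effect mediated
print(f"NIE {nie:.2f} (delta SE {se_delta:.2f}, bootstrap SE {se_boot:.2f}); "
      f"NDE {nde:.2f}; ATE {ate:.2f}; PM% {pm:.0f}")
```

In practice one would often prefer percentile or bias-corrected bootstrap intervals to ±1.96 SE, because the sampling distribution of a product of coefficients can be skewed.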
Table 3. Causal treatment effect estimates from the FINE trial (standard errors in brackets).
PM% is percentage of total effect mediated by natural indirect effect.
For all three approaches, the indirect effect is statistically significant while the direct effect is not, indicating mediation through limiting activities. Qualitatively, the methods all lead to this conclusion, but because the aim of mediation analysis is to decompose the total effect accurately into an indirect and a direct effect, the differing estimates matter, and on the basis of our simulations we conclude that method (C) provides the valid estimate.
Discussion
We recommend that trialists measure baseline values of putative mediator variables as well as the clinical outcome when planning to conduct mediation investigations. Of the three mediation approaches considered here, we recommend the ANCOVA approach which includes baseline measures of the intermediate and clinical variable as covariates in all regression models. It was the only approach that was able to provide valid estimates under all the scenarios that we considered. Importantly, in contrast to total treatment effect estimation, when partitioning effects into mediated and non-mediated components, the decision for ANCOVA is based on bias reduction rather than precision arguments.
Mediation investigations based purely on post-treatment measures of the putative mediator and clinical outcome were found to suffer from bias under all the processes involving baseline variables that we considered. Importantly, the change score approach, often favoured by practitioners, removes biases only under additional assumptions regarding independence between baseline values and change over time. The minimum assumptions necessary are that neither baseline measures of the putative mediator nor those of the clinical variable predict change in the mediator variable. Such predictive effects might be present because baseline values predict illness trajectories. In addition, negative predictive effects of baseline levels on change in the same variable might occur as a result of regression to the mean, especially in populations that have been selected as ‘severe’ on the basis of outcome variables that might be subject to measurement error. 23 Independence assumptions have been questioned before (e.g. Gollob and Reichardt 24 or Cole and Maxwell 11) and are unlikely to hold in mental health trials.
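The regression-to-the-mean mechanism can be demonstrated with a small simulation. In this hypothetical setup, each person's stable true severity is measured twice with independent error and nothing changes over time; enrolling only those with a high observed baseline score (the threshold of 1.0 is an arbitrary choice) still produces a negative mean change and a negative slope of change on baseline.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
true_severity = rng.normal(size=n)                     # stable over time
y0 = true_severity + rng.normal(scale=0.7, size=n)     # baseline, with measurement error
y1 = true_severity + rng.normal(scale=0.7, size=n)     # follow-up: no true change
change = y1 - y0

severe = y0 > 1.0                                      # enrol only 'severe' observed scores
slope, intercept = np.polyfit(y0[severe], change[severe], 1)
print(f"mean change in enrolled: {change[severe].mean():.3f}")
print(f"slope of change on baseline: {slope:.3f}")     # negative despite no true change
```

Under these assumptions the slope is about −Var(error)/Var(baseline) = −0.49/1.49 ≈ −0.33: exactly the kind of baseline-on-change effect that breaks the independence assumption required by the change score approach.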
Our findings are consistent with the literature on observational studies, where estimation of the causal effect of a treatment suffers from the same confounding issue. Lepage et al. 25 found that when treatment assignment was driven by baseline variables, only the ANCOVA approach was unbiased for the average treatment effect, and a change score approach was again found to be biased when baseline measures predicted subsequent change. Finally, in line with Vandenberghe et al., 26 we found that for continuous mediators and outcomes, results were robust against misspecification of the mediator model.
For simplicity, we have here assumed no interactions between baseline variables and treatment allocation and no interaction between the mediator and the treatment in the model for the clinical outcome (moderated mediation). However, we anticipate that the bias results also hold under more complex data generating models. The ANCOVA approach to mediation analysis can be generalised to situations in which there is treatment moderation by including relevant interaction terms in the models.27–29 In practice, any ‘no interaction assumption’ should be verified before proceeding to use the simpler ANCOVA approach.
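As a minimal sketch of checking the no-interaction assumption, one can add an R × M_1 product term to the ANCOVA outcome model and inspect its coefficient. The data below are simulated with a hypothetical treatment-by-mediator interaction of 0.5, so the check should flag it; with real data, a non-negligible interaction would call for the extended effect formulas in the references cited above rather than the simple product of coefficients.

```python
import numpy as np


def ols_with_se(y, X):
    """OLS with intercept; returns coefficients and classical standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))


rng = np.random.default_rng(11)
n = 500
r = rng.permutation(np.repeat([0.0, 1.0], n // 2))     # randomised arm
v = rng.normal(size=n)
m0, y0 = v + rng.normal(size=n), v + rng.normal(size=n)
m1 = 0.8 * m0 + 0.5 * r + rng.normal(size=n)
# Hypothetical outcome model WITH a treatment-by-mediator interaction of 0.5.
y1 = 0.7 * y0 + 0.3 * r + 0.5 * m1 + 0.5 * r * m1 + rng.normal(size=n)

# ANCOVA outcome model plus the R x M1 product term (column index 3 after intercept).
beta, se = ols_with_se(y1, np.column_stack([r, m1, r * m1, m0, y0]))
t_interaction = beta[3] / se[3]
print(f"R x M1 coefficient {beta[3]:.3f} (t = {t_interaction:.1f})")
```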
Supplemental Material
760300_supp_mat – Supplemental material for Beyond total treatment effects in randomised controlled trials: Baseline measurement of intermediate outcomes needed to reduce confounding in mediation investigations
Supplemental material, 760300_supp_mat for Beyond total treatment effects in randomised controlled trials: Baseline measurement of intermediate outcomes needed to reduce confounding in mediation investigations by Sabine Landau, Richard Emsley and Graham Dunn in Clinical Trials
Footnotes
Acknowledgements
The authors thank Cedric Ginestet, Paul Clarke, Kim Goldsmith, Andrew Pickles and Ian White for their contributions and suggestions, and Alison Wearden and the FINE trial team for their permission to use the FINE data. Trial registration number and register: International Standard Randomised Controlled Trial Number (ISRCTN74156610).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
This work was supported by the UK Medical Research Council project Grant MR/K006185/1. In addition, S.L. received salary support from the UK National Institute for Health Research Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. R.E. and G.D. were supported by the MRC North West Hub for Trials Methodology Research (MR/K025635/1).
References
