A Revised Framework to Evaluate the Consistency Assumption Globally in a Network of Interventions

Abstract

Background

The unrelated mean effects (UME) model has been proposed for evaluating the consistency assumption globally in the network of interventions. However, the UME model does not accommodate multiarm trials properly and omits comparisons between nonbaseline interventions in the multiarm trials not investigated in 2-arm trials.

Methods

We proposed a refinement of the UME model that addresses both limitations. We also complemented the scatterplots of the posterior mean deviance contributions of the trial arms under the network meta-analysis (NMA) and UME models with Bland-Altman plots to detect outlying trials that contribute to poor model fit. We applied the refined and original UME models to 2 networks with multiarm trials.

Results

The original UME model omitted more than 20% of the observed comparisons in both networks. The thorough inspection of the individual data points’ deviance contribution using complementary plots in conjunction with the measures of model fit and the estimated between-trial variance indicated that the refined and original UME models revealed possible inconsistency in both examples.

Conclusions

The refined UME model allows proper accommodation of the multiarm trials and visualization of all observed evidence in complex networks of interventions. Furthermore, considering several complementary plots to investigate deviance helps draw informed conclusions on the possibility of global inconsistency in the network.

Highlights

We have refined the unrelated mean effects (UME) model to incorporate multiarm trials properly and to estimate all observed comparisons in complex networks of interventions.

Forest plots with posterior summaries of all observed comparisons under the network meta-analysis and refined UME model can uncover the consequences of potential inconsistency in the network.

Using complementary plots to investigate the individual data points’ deviance contribution in conjunction with model fit measures and estimated heterogeneity aids in detecting possible inconsistency.

Keywords

consistency assumption; deviance information criterion; Bland-Altman plot; network meta-analysis

Systematic reviews with network meta-analysis (NMA) have been at the forefront of evidence-based medicine over the past 2 decades.¹ The explosive rate of published systematic reviews with NMA from several health care fields and a recent comprehensive review on the methodological advances of NMA attest to the reception of this evidence synthesis design by the wider research community.^1,2 NMA has the advantage of providing a hierarchy of interventions for a specific research question to assist the end users of systematic reviews in selecting the best intervention for a condition. The intervention hierarchy results from simultaneously modeling direct evidence from the relevant clinical trials and indirect evidence for interventions never compared in any clinical trial. Consequently, NMA provides coherent evidence for all possible comparisons of interventions under the investigated outcome.³

The credibility of the results from NMA strongly depends on the validity of the consistency assumption that underlies this evidence synthesis tool. The consistency assumption dictates the agreement of direct and indirect evidence for any pairwise comparison in a closed loop of interventions (i.e., a path that starts and ends with the same intervention).⁴ The evaluation of the consistency assumption includes methods for local and global detection of possible inconsistency.^2,5 The local evaluation is the most prevalent in the published systematic reviews.^6,7 Among the methods for global evaluation, the unrelated mean effects (UME) model, introduced by Dias et al.,⁵ is the most frequently applied.⁶ The UME model is particularly useful in complex networks, in which the implementation of several statistical tests of inconsistency, such as the loop-specific approach,⁸ may become cumbersome, challenging when loops are also informed by multiarm trials, and prone to multiplicity issues.

A global evaluation using the UME model is achieved by comparing the Bayesian NMA model with the Bayesian UME model using measures of model fit,⁵ such as the deviance information criterion (DIC).⁹ The model with the smaller DIC value by 3 or 5 units may be preferred regarding model fit and complexity.¹⁰ If the UME model fits the data better, this is evidence of possible inconsistency in the network.⁵ A scatterplot of the posterior mean deviance of the individual data points under the UME model against the NMA model can reveal the trials with a higher than expected posterior mean deviance.⁵ These trials may help identify the loops with possible evidence of inconsistency.⁵
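The DIC rule described above can be sketched in a few lines. This is a minimal illustration, not the code used for the analyses: the function name `compare_dic`, the default threshold of 5 units, and the input values are assumptions made for the example. It relies on the standard definition of the DIC as the posterior mean of the residual deviance plus the number of effective parameters.

```python
def dic(dbar: float, p_d: float) -> float:
    """DIC as posterior mean residual deviance plus effective number of parameters."""
    return dbar + p_d


def compare_dic(dic_nma: float, dic_ume: float, threshold: float = 5.0) -> str:
    """Report which model the DIC favours, if the difference reaches the threshold."""
    diff = dic_nma - dic_ume  # positive when the UME model has the smaller DIC
    if diff >= threshold:
        return "UME preferred (possible inconsistency)"
    if diff <= -threshold:
        return "NMA preferred"
    return "no clear preference"


# Hypothetical values, chosen only to illustrate the rule
dic_nma = dic(dbar=80.0, p_d=40.0)   # 120.0
dic_ume = dic(dbar=70.0, p_d=42.5)   # 112.5
print(compare_dic(dic_nma, dic_ume))
```

A threshold of 3 units can be passed instead through the `threshold` argument.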

Furthermore, the inspection of the between-trial variance for substantial reductions also offers valuable information on the suitability of the contrasted models. Suppose the between-trial variance estimated from a model such as the UME that does not incorporate the consistency assumption is substantially lower than that estimated from the NMA model. This implies that the estimated between-trial variance had to increase for the NMA model to fit well given the lack of consistent effects, thus suggesting potential inconsistency.

The presence of multiarm trials in the network may challenge the application of the UME model. Suppose a comparison is informed by a multiarm trial alone. In that case, selecting a different baseline intervention for that trial may omit this comparison from the estimation process.¹¹ This was the case with the network of thrombolytic treatments that Dias et al.⁵ considered to illustrate the UME model (figure 3 in Ref 5). A closed loop of 3 interventions (SK, SK plus t-PA, and Acc t-PA) was informed by a multiarm trial and a 2-arm trial for 1 of the comparisons (SK versus SK plus t-PA). Of the 16 observed comparisons, the UME model estimated 15 treatment effects, inevitably omitting 1 of the comparisons in the multiarm trial (SK plus t-PA versus Acc t-PA). Considering a different baseline intervention for this trial (e.g., SK plus t-PA) would have resulted in the omission of a different comparison (in that case, SK versus Acc t-PA).

The omitted comparisons in the multiarm trials may carry evidence of possible design inconsistency in the network. The design inconsistency indicates disagreement in the treatment effects across different designs (i.e., 2-arm and multiarm trials) for the same comparison. Design inconsistency can be formally investigated using the design-by-treatment interaction model.^12,13 Contrasting the NMA model with the UME model regarding the treatment effect of all observed comparisons in the network offers an informal, exploratory investigation of the design inconsistency.

We aimed to propose a straightforward refinement of the UME model that accommodates the multiarm trials properly and yields treatment effects for all observed comparisons in networks with multiarm trials. Therefore, the proposed refinement allows the scrutiny of all observed evidence in the network to draw informed conclusions about the possibility of inconsistency. The article has the following structure. We first introduce 2 published systematic reviews with NMA as motivating examples. Then we present the Bayesian random-effects NMA and UME models, the latter as proposed by Dias et al.⁵ (called the UME-Dias model). We follow by proposing a straightforward refinement of the Bayesian random-effects UME model (called the refined UME model). We demonstrate the refined UME model using the motivating examples. Finally, we conclude with a discussion of the results, strengths, and limitations of the proposed refinement in complex networks with multiarm trials.

Motivating Examples

We considered 2 motivating examples: 1) the network of antimanic drugs for the mean change on mania rating scales,¹⁴ and 2) the network of pharmacologic interventions for the exacerbation of chronic obstructive pulmonary disease (COPD).¹⁵ Both networks included multiarm trials. The corresponding articles analyzed these networks in the standardized mean difference (SMD) and log odds ratio (OR) scales, respectively. Both outcomes were harmful; therefore, a negative SMD or log OR favored the first intervention in the comparison, and a positive SMD or log OR favored the second intervention. There was information on the number of missing (participant) outcome data in each arm of every trial for both networks. For illustrative purposes, we excluded the missing outcome data from the analysis; namely, we subtracted the number of missing outcome data from the number randomized in each arm of every trial. Methods to properly account for aggregate missing outcome data in NMA have been described elsewhere.^16,17

Methods

Random-Effects NMA Model with Multiarm Trials

For a network of $N$ trials comparing different sets of interventions, we have the following information for the investigated outcome in each trial arm. For a binary outcome, we collect the number of events, $r_{ik}$ , as reported in arm $k$ of trial $i$ out of the total randomized participants, $n_{ik}$ . We assume that $r_{ik}$ is sampled from a binomial distribution with an underlying probability of an event, $p_{ik}$ . Then the underlying log odds in arm $k$ of trial $i$ is a function of the underlying log odds of the baseline arm, $u_{i}$ , and the underlying log OR (the treatment effect), $δ_{ik 1}$ ,

$$\operatorname{logit}(p_{ik}) = u_{i} + δ_{ik 1}$$

where $u_{i} = \operatorname{logit}(p_{i 1})$ .

For a continuous outcome, we extract the mean outcome, $y_{ik}$ , as reported in arm $k$ of trial $i$ alongside the standard deviation, $s_{ik}$ , as measured in the total randomized participants, $n_{ik}$ . We assume a normal distribution for $y_{ik}$ with an underlying mean outcome, $θ_{ik}$ , and a variance, $s_{ik}^{2}$ that is commonly assumed known, even though it has been estimated. Then, the underlying mean in arm $k$ of trial $i$ is a function of the underlying mean of the baseline arm, $v_{i}$ , and the underlying SMD (the treatment effect), $δ_{ik 1}$ ,

$$θ_{ik} = v_{i} + δ_{ik 1} S_{i}$$

where $v_{i} = θ_{i 1}$ and $S_{i}$ is the pooled standard deviation in that trial,

$$S_{i} = \sqrt{\frac{\sum_{k = 1}^{a_{i}} s_{ik}^{2} (n_{ik} - 1)}{\sum_{k = 1}^{a_{i}} (n_{ik} - 1)}}$$

with $a_{i}$ being the number of arms in trial $i$ .
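The pooled standard deviation can be computed directly from the arm-level summaries. The sketch below is an illustration only; the function name `pooled_sd` and the arm-level values are hypothetical.

```python
import math


def pooled_sd(sds, ns):
    """Pooled standard deviation across the arms of one trial.

    sds: standard deviation reported in each arm (s_ik)
    ns:  number of participants in each arm (n_ik)
    """
    numerator = sum(s ** 2 * (n - 1) for s, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)


# Hypothetical 3-arm trial
S_i = pooled_sd(sds=[10.0, 12.0, 11.0], ns=[50, 48, 52])
print(round(S_i, 3))
```

When all arms report the same standard deviation, the pooled value equals that common value regardless of the arm sizes.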

Under the random-effects model, $δ_{ik 1}$ is assumed to follow a normal distribution with mean $μ_{t_{ik} t_{i 1}} = μ_{t_{ik} A} - μ_{t_{i 1} A}$ (the consistency equation) and between-trial variance, $τ^{2}$ , assumed common within the network.¹⁸ With $t_{ik}$ , we indicate the intervention studied in arm $k$ of trial $i$ , and with $A$ , we indicate the reference intervention of the whole network. The NMA model estimates the treatment effect of $| T | - 1$ comparisons with the selected reference intervention (i.e., $μ_{t_{ik} A}$ with $t_{ik} \in T \setminus \{A\}$ and $T = \{A, B, C, \dots\}$ ) and uses the consistency equation to obtain the remaining possible comparisons.¹⁸
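To make the consistency equation concrete, the sketch below derives the effect of any comparison from the effects against the reference intervention. The function name `consistency_effect`, the intervention labels, and the numerical values are hypothetical and serve only as an illustration.

```python
def consistency_effect(mu_ref, t_k, t_1, reference="A"):
    """Effect of t_k versus t_1 via the consistency equation.

    mu_ref maps each non-reference intervention to its effect against the
    reference (mu_{tA}); the reference has effect 0 against itself.
    """
    effects = {reference: 0.0, **mu_ref}
    return effects[t_k] - effects[t_1]


# Hypothetical effects against the reference: mu_{BA} and mu_{CA}
basic = {"B": -0.30, "C": 0.10}

# mu_{CB} = mu_{CA} - mu_{BA}
print(consistency_effect(basic, "C", "B"))
```

With $|T| = 3$ interventions, the 2 effects against the reference are enough to obtain all 3 possible comparisons.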

Likewise, in a multiarm trial $i$ , in which the consistency assumption is inherent, we estimate the treatment effects of $a_{i} - 1$ comparisons with the baseline intervention of the trial. The $(a_{i} - 1) \times 1$ vector of random effects, $δ_{i} = (δ_{i 21}, δ_{i 31}, \dots, δ_{i a_{i} 1})^{\prime}$ , is assumed to follow a multivariate normal distribution with $(a_{i} - 1) \times 1$ mean vector $μ = (μ_{t_{i 2} A} - μ_{t_{i 1} A}, μ_{t_{i 3} A} - μ_{t_{i 1} A}, \dots, μ_{t_{i a_{i}} A} - μ_{t_{i 1} A})^{\prime}$ and $(a_{i} - 1) \times (a_{i} - 1)$ variance-covariance matrix $Σ = τ^{2} \begin{pmatrix} 1 & \dots & 0.5 \\ \vdots & \ddots & \vdots \\ 0.5 & \dots & 1 \end{pmatrix}$ (equation (10) in Dias et al.¹⁰), which is equivalent to conditional univariate normal distributions of arm $k > 2$ given arms 2 to $k - 1$ (equation (11) in Dias et al.¹⁰),

$$δ_{ik 1} \mid \begin{pmatrix} δ_{i 21} \\ \vdots \\ δ_{i (k - 1) 1} \end{pmatrix} \sim N \left( (μ_{t_{ik} A} - μ_{t_{i 1} A}) + \frac{1}{k - 1} \sum_{j = 2}^{k - 1} [δ_{ij 1} - (μ_{t_{ij} A} - μ_{t_{i 1} A})], \frac{k}{2 (k - 1)} τ^{2} \right) .$$

By assuming a common $τ^{2}$ in the whole network, the correlation between any 2 random effects in the multiarm trial equals 0.5.¹⁹ These random effects refer to contrasts with the baseline arm of the multiarm trial. Choosing a different baseline arm will yield a different vector $δ_{i}$ .
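The equivalence between the multivariate normal distribution and the conditional univariate normal distributions can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `conditional_moments` and all numerical values are hypothetical. It applies the general conditioning formula for a multivariate normal distribution to the last arm of a 4-arm trial ($k = a_{i} = 4$), for which the closed form gives a weight of $1 / (k - 1) = 1 / 3$ on each preceding deviation and a variance of $\frac{k}{2 (k - 1)} τ^{2} = \frac{2}{3} τ^{2}$.

```python
import numpy as np


def conditional_moments(mu, tau2, delta_prev):
    """Mean and variance of the next random effect given the preceding ones.

    mu:         means of the random effects up to and including the target
    tau2:       common between-trial variance
    delta_prev: values of the preceding random effects (one fewer than mu)
    """
    m = len(delta_prev)
    size = m + 1
    # Variance tau2 on the diagonal, covariance tau2 / 2 elsewhere (correlation 0.5)
    sigma = tau2 * (0.5 * np.eye(size) + 0.5 * np.ones((size, size)))
    s11 = sigma[:m, :m]   # covariance of the preceding random effects
    s21 = sigma[m, :m]    # covariance between the target and the preceding ones
    weights = np.linalg.solve(s11, s21)
    deviation = np.asarray(delta_prev, dtype=float) - np.asarray(mu[:m], dtype=float)
    mean = mu[m] + weights @ deviation
    var = sigma[m, m] - weights @ s21
    return float(mean), float(var)


# Hypothetical 4-arm trial: means for arms 2, 3, 4 against the baseline arm
mu = [0.2, -0.1, 0.3]
tau2 = 0.04
delta_prev = [0.25, 0.0]   # random effects of arms 2 and 3

mean, var = conditional_moments(mu, tau2, delta_prev)

# Closed form for k = 4
k = 4
closed_mean = mu[2] + sum(d - m_ for d, m_ in zip(delta_prev, mu[:2])) / (k - 1)
closed_var = k / (2 * (k - 1)) * tau2
print(np.isclose(mean, closed_mean), np.isclose(var, closed_var))
```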

Random-Effects UME Model of Dias and Colleagues

Contrary to the NMA model, the UME-Dias model does not pose consistency equations. Therefore, the UME-Dias model comprises separate random-effects pairwise meta-analyses for the observed comparisons.⁵ The random-effects UME-Dias model also considers a shared between-trial variance across the observed comparisons to borrow strength from comparisons with many trials.^5,11 Hence, the estimation of between-trial variance is greatly improved.¹⁹ Suppose the network includes a total of $M$ observed comparisons with $M \leq \binom{| T |}{2}$ . Then, under the random-effects model, the random effects follow separate normal distributions with mean $μ_{t_{ik} t_{i 1}}$ ( $t_{ik}$ , $t_{i 1} \in T$ and $t_{ik} \neq t_{i 1}$ ) for the observed comparisons and shared between-trial variance, $τ^{2}$ .

In the absence of multiarm trials, the UME-Dias model estimates the treatment effects of all observed comparisons. When the network includes multiarm trials, the UME-Dias model estimates the same vector $δ_{i}$ as the NMA model in multiarm trial $i$ . However, contrary to the NMA model, the UME-Dias model treats the random effects in vector $δ_{i}$ as separate univariate normal distributions (see pp 651–2, appendix in Ref 5),

$$δ_{ik 1} \sim N (μ_{t_{ik} t_{i 1}}, τ^{2}), \quad k \geq 2 .$$

By making it similar to fitting separate pairwise meta-analyses to the data, the random-effects UME-Dias model retains its simplicity.¹¹ However, of the $\binom{a_{i}}{2}$ possible comparisons in that multiarm trial, the UME-Dias model will not estimate a total of $\binom{a_{i}}{2} - (a_{i} - 1)$ comparisons. These comparisons do not include the baseline arm. If these comparisons are not informed by any 2-arm trial and are not found in the vector $δ_{i}$ of other multiarm trials in the network, they will be omitted completely by the UME-Dias model. In the Bayesian framework, the posterior distribution of $μ_{jl}$ for the omitted comparisons will coincide with the prior distribution.¹¹ Hence, if a noninformative normal prior distribution with zero mean and variance equal to 10,000 is assigned on $μ_{jl}$ , the posterior standard deviation of $μ_{jl}$ will be approximately 100 for the omitted comparisons.

Refined Random-Effects UME Model

The UME-Dias model does not properly accommodate the multiarm trials. The random effects are inherently correlated in multiarm trials. Hence, as a matter of principle, the conditional univariate normal distributions of the random effects for the multiarm trials (equation (11) in Dias et al.¹⁰) should be maintained in the UME model. Dias et al.¹¹ also suggested accounting for the correlated effects in the multiarm trials; however, the authors did not formally implement this model in their book.

In our proposed refinement of the UME-Dias model, we have maintained the conditional univariate normal distributions of the random effects for the multiarm trials. We have also developed an algorithm to automatically detect pairwise comparisons in the multiarm trials that the UME-Dias model would omit. When there is at least 1 omitted comparison, we perform another random-effects NMA in the subset of multiarm trials, and we use the consistency equation to obtain the summary treatment effect of the omitted comparisons. When the subset of multiarm trials forms subnetworks rather than a fully connected network, we perform random-effects NMA with consistency equations in each subnetwork separately. To prevent the multiarm trials from contributing twice to the estimation of $τ^{2}$ , we consider a different $τ^{2}$ for the subset of multiarm trials, indicated by $τ_{m}^{2}$ . This “companion” model does not share any common parameter with the rest of the model. Therefore, using the multiarm trials twice does not affect the estimation of the parameters outside the companion model.
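The detection step can be illustrated with a short sketch. This is not the algorithm provided in the supplementary material; it is a simplified illustration in which the function name `omitted_comparisons` and the data layout (one tuple of interventions per trial, with the baseline arm listed first) are assumptions. A comparison is flagged as omitted when it is observed in at least 1 trial but never appears as a contrast with the baseline arm of any trial.

```python
from itertools import combinations


def omitted_comparisons(trials):
    """Observed comparisons that a baseline-contrast parameterization never estimates.

    trials: list of tuples of interventions; the first element of each tuple
            is taken as the baseline arm of that trial.
    """
    observed, estimated = set(), set()
    for arms in trials:
        baseline = arms[0]
        # Every pair of arms within a trial is an observed comparison
        observed.update(frozenset(pair) for pair in combinations(arms, 2))
        # Only contrasts with the baseline arm receive a random effect
        estimated.update(frozenset((baseline, other)) for other in arms[1:])
    return sorted(tuple(sorted(c)) for c in observed - estimated)


# Hypothetical network with one 3-arm trial and two 2-arm trials
print(omitted_comparisons([("A", "B", "C"), ("A", "C"), ("A", "B")]))  # [('B', 'C')]

# Choosing B as the baseline arm of the 3-arm trial leaves nothing omitted
print(omitted_comparisons([("B", "A", "C"), ("A", "C"), ("A", "B")]))  # []
```

In the refined UME model, the flagged comparisons are then estimated through the companion random-effects NMA fitted to the multiarm trials; that step is not sketched here.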

In the absence of multiarm trials, the refined UME model boils down to the UME-Dias model. In the presence of multiarm trials, comparisons can also be omitted when the fixed-effect model is considered.⁵ In that case, the weighting approach proposed by Rücker and Schwarzer²⁰ can be used to prevent comparison omission. Dias et al.¹¹ discussed this weighting approach for the fixed-effect UME-Dias model in a Bayesian context.

Model Implementation

For each network, we applied Bayesian random-effects NMA, the refined UME model, and the UME-Dias model.⁵ We considered a normal prior distribution with zero mean and variance 10,000 for the location parameters of the models, and we assigned a half-normal prior distribution with scale parameter 1 on $τ$ and $τ_{m}$ . We ran 3 chains with different initial values for 100,000 iterations with a burn-in of 10,000. For the network of antimanic drugs, thinning was 5 under the NMA model and 20 under both UME models; for the network for COPD, thinning was 10 under the NMA model and 20 under both UME models. We chose the thinning values after inspecting the autocorrelation plots. Convergence was assessed using the Gelman-Rubin convergence diagnostic, $\hat{R}$ , where $\hat{R} > 1.1$ indicates a lack of convergence for the corresponding model node.²¹ We applied JAGS via the R-package R2jags (statistical software R, version 4.1.1) to run the models.^22–24
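For readers unfamiliar with the convergence diagnostic, the sketch below computes the classical potential scale reduction factor for a single model node from several chains. It is an illustration only: the function name `gelman_rubin` and the simulated chains are hypothetical, and the variant reported by the software may include further corrections.

```python
import numpy as np


def gelman_rubin(chains):
    """Classical potential scale reduction factor for one parameter.

    chains: array of shape (number of chains, number of retained draws)
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    pooled = (n - 1) / n * within + between / n       # pooled variance estimate
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(1)
mixed = rng.normal(0.0, 1.0, size=(3, 2000))          # 3 chains sampling the same target
stuck = mixed + np.array([[0.0], [1.5], [3.0]])       # 3 chains centred at different values

print(gelman_rubin(mixed) > 1.1, gelman_rubin(stuck) > 1.1)
```

Chains that explore the same distribution yield a value close to 1, whereas chains centred at different values exceed the 1.1 threshold.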

We tabulated the posterior median and posterior standard deviation of $τ$ , the posterior mean of the residual deviance ( $\bar{D}$ ), the DIC, and the number of effective parameters under all 3 models. A DIC that is larger by 5 units under the NMA model, or a $\bar{D}$ that exceeds the number of data points under the NMA model, indicates possible inconsistency in the network. We created scatterplots and Bland-Altman plots of the posterior mean deviance contribution of the individual data points to detect data points for which the compared models have a poor fit. We considered the refined UME model to calculate the bias and limits of agreement in the Bland-Altman plot. We also obtained the leverage plots separately for the NMA and UME models to supplement our observations from the scatterplot and Bland-Altman plots. We used forest plots to illustrate the posterior mean and 95% credible interval of all observed comparisons under the NMA, the refined UME, and UME-Dias models. All figures were created using the R-package ggplot2.²⁵ For the network plots, we used the R-package pcnetmeta.²⁶ All functions related to the present article can be found as supplementary material (Supplementary Material 1).
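The quantities behind the Bland-Altman plot can be sketched as follows. The function name `bland_altman` and the deviance values are hypothetical, and the limits of agreement are taken as the bias plus or minus 1.96 standard deviations of the differences, which is the conventional definition and is assumed here.

```python
import numpy as np


def bland_altman(dev_nma, dev_ume):
    """Bias, 95% limits of agreement, and outlying points for two sets of
    posterior mean deviance contributions (one value per trial arm)."""
    dev_nma = np.asarray(dev_nma, dtype=float)
    dev_ume = np.asarray(dev_ume, dtype=float)
    diff = dev_nma - dev_ume             # y axis: NMA minus UME
    avg = (dev_nma + dev_ume) / 2        # x axis: average of the two models
    bias = diff.mean()
    sd = diff.std(ddof=1)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    outside = (diff < lower) | (diff > upper)
    return avg, diff, bias, (lower, upper), outside


# Hypothetical deviance contributions for 10 trial arms
dev_nma = [1.0, 0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 4.0]
dev_ume = [1.0, 1.0, 1.0, 1.0, 1.1, 0.9, 1.0, 1.0, 1.0, 1.0]

avg, diff, bias, limits, outside = bland_altman(dev_nma, dev_ume)
print(round(bias, 2), np.flatnonzero(outside))
```

A positive bias indicates that the NMA model tends to yield larger deviance contributions than the UME model, and points above the upper limit flag trial arms that the NMA model fits noticeably worse.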

Results

Network of Antimanic Drugs

Figure 1A illustrates the network of 13 antimanic drugs and placebo. There were 17 (27%) 3-arm trials in the network. Of the 33 (36%) observed comparisons, 7 (21%) were informed solely by at least 1 multiarm trial and contained no baseline intervention. The UME-Dias model omitted these comparisons. The maximum value of $\hat{R}$ across the parameters of all 3 models was 1.03, thus indicating convergence. All models yielded a posterior mean of residual deviance that exceeded the total number of 141 trial arms. The NMA model provided the largest $\bar{D}$ , followed by the UME-Dias and refined UME models ( $\bar{D} =$ 157.82, 149.73, and 146.67, respectively; Table 1). The DIC was similar for the NMA and UME-Dias models but lower by 2.54 units for the refined UME model than for the UME-Dias model (Table 1). The posterior median and 95% credible interval (CrI) of $τ$ were almost identical across the 3 models (Table 2). According to the DIC and the estimated $τ$ , there was little to choose between a model with and without consistency.

Figure 1

The network of antimanic drugs for the mean change on mania rating scales¹⁴ (plot A). The network of pharmacologic interventions for exacerbating chronic obstructive pulmonary disease¹⁵ (plot B). Each node refers to an intervention, and each link refers to a pairwise comparison. The size of the nodes is proportional to the number of observed comparisons that include that node. The thickness of the edge is proportional to the number of trials that investigated that comparison. The colored intervention loops indicate multiarm trials.

Table 1

Measures of Model Assessment for the NMA and UME Models

Model Assessment Measure	Network of Antimanic Drugs			Network for COPD		
	NMA	Refined UME	UME-Dias	NMA	Refined UME	UME-Dias
DIC	264.28	261.54	264.08	92.15	90.47	90.56
$p_{D}$	106.46	114.87	114.36	35.58	35.62	35.85
$\bar{D}$	157.82	146.67	149.73	56.58	54.85	54.70
Total trial arms	141			50		

COPD, chronic obstructive pulmonary disease; $\bar{D}$ , posterior mean of the residual deviance; DIC, deviance information criterion; NMA, network meta-analysis; $p_{D}$ , number of effective parameters; UME, unrelated mean effects.

Table 2

Estimated between-Trial Standard Deviation ( $τ$ ) under the Compared Models

$τ$	Network of Antimanic Drugs			Network for COPD		
	NMA	Refined UME	UME-Dias	NMA	Refined UME	UME-Dias
Posterior median	0.14	0.14	0.14	0.18	0.15	0.14
95% CrI	(0.09, 0.21)	(0.09, 0.21)	(0.08, 0.20)	(0.02, 0.40)	(0.01, 0.35)	(0.01, 0.35)

COPD, chronic obstructive pulmonary disease; CrI, credible interval; NMA, network meta-analysis; UME, unrelated mean effects.

Overall, the posterior mean deviance contributions of the trial arms were similar with and without the consistency equations and almost identical for the 2 UME models (Figure 2A). The exception was 7 trial arms with a larger deviance under the NMA model than under the UME models, greatly exceeding the expected deviance contribution of 1. Of those trial arms, the arms of the 40th trial yielded a remarkably large posterior mean deviance contribution under all models.

Figure 2

Scatterplot on the posterior mean deviance contributions of the trial arms under the unrelated mean effects (UME) models (refined and Dias and colleagues; y axis) and the network meta-analysis (NMA) model (x axis) for the network of antimanic drugs (plot A). The gray dotted lines refer to 1 posterior mean deviance contribution. The Bland-Altman plot on the difference in the posterior mean deviance contributions between the NMA and the UME models (refined and Dias and colleagues) against the average posterior mean deviance contributions of the compared models (plot B). Each data point corresponds to a trial arm, indicated by a pair of numbers. The first number refers to the trial ID, and the second number refers to the trial’s arm, as placed in the analyzed data set (Supplementary Table S1).

In the Bland-Altman plot, the bias was slightly positive, indicating a tendency for the NMA model to yield slightly larger posterior mean deviance contributions on average than the refined UME model (Figure 2B). The trial arms associated with larger deviance under the NMA model were found above the upper limit of agreement. These trial arms had a posterior mean deviance contribution of about 1 under the UME models (Figure 2A). Two of these points referred to 2-arm trials that were the sole contributors in divalproex versus carbamazepine (the 28th trial) and lithium versus divalproex (the 26th trial; Supplementary Table S1 in Supplementary Material 2). The remaining outlying points referred to haloperidol and lithium compared only in the multiarm trial (the 53rd trial), which included the omitted comparison of paliperidone versus lithium (Supplementary Table S1 in Supplementary Material 2). Supplementary Figure S1 (in Supplementary Material 2) located trials 28, 26, and 53 outside the red parabola (i.e., $x^{2} + y = 3$ ) of the leverage plot for the NMA model only (plot C) and trial 40 (olanzapine versus lithium) outside the red parabola of the leverage plot for all models. These points contributed an amount larger than 3 to the DIC.
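The role of the parabola in the leverage plot can be sketched in code. The sketch assumes the usual construction of the leverage plot, which is not spelled out in this article: the horizontal coordinate is the signed square root of a data point's posterior mean deviance contribution, and the vertical coordinate is its leverage. Under that assumption, a point lies outside the parabola $x^{2} + y = c$ exactly when its deviance contribution plus its leverage exceeds $c$ . The function name `outside_parabola` and the values are hypothetical.

```python
def outside_parabola(dbar_i, leverage_i, c=3.0):
    """Flag data points outside the parabola x^2 + y = c of a leverage plot.

    dbar_i:     posterior mean deviance contribution of each data point (x^2)
    leverage_i: leverage of each data point (y)
    """
    return [d + lev > c for d, lev in zip(dbar_i, leverage_i)]


# Hypothetical values for 3 data points
dbar_i = [1.0, 2.8, 0.5]
leverage_i = [0.7, 0.9, 0.4]

print(outside_parabola(dbar_i, leverage_i))
```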

Overall, there were similar posterior estimates and sufficient overlap in the 95% CrIs of SMD for the NMA and either UME model except for the comparisons of divalproex versus carbamazepine and of lithium versus divalproex and haloperidol (Figure 3). These comparisons corresponded to trials that contributed to the poor fit of the NMA model (Figure 2; Supplementary Figure S1), thus signaling possible inconsistency in the network.

Figure 3

A panel of forest plots on the standardized mean difference (SMD) for all observed comparisons under the network meta-analysis (NMA), the refined unrelated mean effects (UME) model, and the UME-Dias model. Results refer to the posterior mean and 95% credible interval of the SMD. Gray panels refer to the omitted comparisons. Red and black indicate weak and strong evidence, respectively. Namely, the corresponding 95% credible interval includes and excludes the null value, respectively.

Network of Pharmacologic Interventions for COPD

The network of 5 interventions (and their combinations) and placebo for COPD included 5 (24%) multiarm trials: 2 three-arm and 3 four-arm trials (Figure 1B). Of the 15 (54%) observed comparisons, 6 (40%) were omitted from the UME-Dias model. All models converged according to the $\hat{R}$ diagnostic: the maximum value across the parameters of all models was 1.01. Both UME models yielded a similar posterior mean of residual deviance ( $\bar{D} ≅ 55$ ) that was smaller than that of the NMA model; under all models, it exceeded the total number of 50 trial arms. The DIC indicated little to choose between these models (Table 1). The estimated $τ$ was substantial in all models but slightly larger and less precise under the NMA model (Table 2).

Overall, both UME models yielded similar posterior mean deviance contributions (Figure 4A). The scatterplot indicated a poor fit of the NMA model for 2 trial arms that exhibited substantial deviance under the NMA model and deviance close to 1 for both UME models (Figure 4A). These trial arms were flagged as outliers in the Bland-Altman plot as they exceeded the upper limit of agreement (Figure 4B). The 2 points referred to the unique trial that compared formoterol with tiotropium (Supplementary Table S2 in Supplementary Material 2). This trial was found outside the red parabola of the leverage plot for NMA (Supplementary Figure S2, plot C, in Supplementary Material 2) alongside the 2-arm trial 1 (0 events in fluticasone; Supplementary Table S2). Trial 1 also contributed to the poor fit of both UME models (Supplementary Figure S2, plots A and B) as both arms were associated with a substantial posterior mean deviance (Figure 4A).

Figure 4

Scatterplot of the posterior mean deviance contributions of the trial arms under the unrelated mean effects (UME) models (refined and Dias and colleagues; y axis) and the network meta-analysis (NMA) model (x axis) for the network of pharmacologic interventions for chronic obstructive pulmonary disease (plot A). The gray dotted lines refer to 1 posterior mean deviance contribution. The Bland-Altman plot on the difference in the posterior mean deviance contributions between the NMA and the UME models (refined and Dias and colleagues) against the average posterior mean deviance contributions of the compared models (plot B). Each data point corresponds to a trial arm, indicated by a pair of numbers. The first number refers to the trial ID, and the second number refers to the trial’s arm, as placed in the analyzed data set (Supplementary Table S2).

The panel of forest plots illustrated almost identical results for the UME models and, overall, similar results for the NMA and either UME model (Figure 5). For the comparison of formoterol with tiotropium, the posterior estimate of OR differed markedly between the NMA and UME models, which, in conjunction with the substantial deviance contribution of the corresponding trial to the poor fit of the NMA model (Figure 4), may suggest possible inconsistency in the network.

Figure 5

A panel of forest plots on the odds ratio (OR) for all observed comparisons under the network meta-analysis (NMA), refined unrelated mean effects (UME) model, and the UME-Dias model. The results refer to the posterior mean and 95% credible interval of the OR. Gray panels refer to the omitted comparisons. Red and black indicate weak and strong evidence, respectively. Namely, the corresponding 95% credible interval includes and excludes the null value, respectively.

Discussion

The refined UME model and the UME-Dias model gave the same conclusions regarding possible inconsistency in both examples. Both models elucidated the same data points that contributed to possible inconsistency in the investigated networks. They also yielded similar measures of model fit and almost identical estimated $τ$ . However, proper accommodation of the multiarm trials and visualization of all observed comparisons were possible only with the refined UME model, which are the main strengths of this model.

The present study considered a series of complementary plots to thoroughly investigate the individual data points’ deviance contribution, which laid the foundation for signaling possible inconsistency in both examples. Specifically, the scatterplot flagged the data points with higher posterior mean deviance than expected for the NMA model. The Bland-Altman plot complemented the scatterplot by detecting the outlying trial arms among those with substantial deviance contribution, as they lay outside the 95% limits of agreement. Furthermore, this plot offers exploratory insights that are not obvious from the scatterplot. For instance, the lack of randomness in the scattered points and a nonzero bias may indicate a possible mismatch between the direct and NMA evidence that questions the whole evidence base. The leverage plot for NMA revealed that the outlying data points were found outside the red parabola, thus contributing substantially to the DIC (an amount larger than 3) and the model’s poor fit. The panel of forest plots pinpointed poor overlap in the 95% CrI of the treatment effects of comparisons informed by the outlying trials, showing the consequences of possible inconsistency.

The panel of forest plots should be used to aid model critique and highlight the issues caused when potential inconsistency is detected rather than scrutinize the relative effects between the NMA and UME models. A comparison of the posterior mean of the treatment effects obtained via the NMA model with those obtained via the UME model has been criticized as an inappropriate method to evaluate inconsistency.⁶ This is because an NMA estimate is an amalgamation of direct and indirect evidence.

In the present study, we elaborated on the random-effects UME model for 2 reasons. First, the random-effects model is more appropriate than the fixed-effect model in systematic reviews, in which clinical and methodological heterogeneity should be expected and may manifest as statistical heterogeneity.²⁷ Second, the proper accommodation of multiarm trials and related parameterization issues are relevant in the random-effects model.

We have developed an algorithm to automatically detect the omitted comparisons (if any) in the network and incorporate them in the refined UME model. In Supplementary Material 1, we provide user-defined functions in R to run the refined UME model in 1 step and obtain the necessary plots of the present work. The user can employ the following effect measures: OR for binary outcomes, mean difference, SMD, and ratio of means for continuous outcomes.

The refined UME model is not immune to different parameterizations, which constitutes our work’s major limitation. The choice of parameterization will affect which comparisons are omitted, because selecting a different baseline intervention for the multiarm trials will yield different omitted comparisons.^5,11 Suppose a connected network comprises an ABC, an AC, and an AB trial. Selecting intervention A to be the baseline arm in the ABC trial will yield the BC comparison as omitted. Selecting intervention B or C as the baseline arm in the ABC trial will not yield omitted comparisons. Suppose the network did not include an AB trial. Selecting intervention C as the baseline arm in the ABC trial will yield AB as an omitted comparison. However, regardless of the selected parameterization, the refined UME model will estimate all observed comparisons of the network, contrary to the UME-Dias model.

Another limitation, common to the UME models, is that different parameterizations of the multiarm trials may affect the estimates and possibly the measures of model fit.¹¹ Suppose there is also a BC trial in the example above, and we are interested in the BC comparison. Since the multiarm and 2-arm trials together inform all comparisons, there are no omitted comparisons. However, whichever baseline arm is selected, one comparison will be informed solely by the corresponding 2-arm trial. When A is the baseline arm in the ABC trial, only the BC trial informs the BC comparison. When B or C is the baseline arm, both the ABC and BC trials contribute to the estimation of BC. The extent to which different parameterizations lead to different conclusions may also depend on the extent of between-trial variance. Different parameterizations may not affect the conclusions if the between-trial variance is low.
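The following Python sketch lists, for each choice of baseline arm in the ABC trial, which trials inform each comparison. It is an illustration only: the function name and trial labels are hypothetical, and it assumes, as in the example above, that a multiarm trial contributes only the comparisons that involve its baseline arm.

```python
def informing_trials(trials, baselines):
    """Map each comparison to the trials that inform it, assuming every
    trial contributes only baseline-versus-other contrasts."""
    support = {}
    for name, arms in trials.items():
        # Default to the first listed arm when no baseline is specified.
        base = baselines.get(name, arms[0])
        for arm in arms:
            if arm != base:
                label = "".join(sorted((base, arm)))
                support.setdefault(label, []).append(name)
    return support


# Network with an ABC trial and all three 2-arm trials.
network = {
    "ABC": ("A", "B", "C"),
    "AB": ("A", "B"),
    "AC": ("A", "C"),
    "BC": ("B", "C"),
}

# One support map per candidate baseline arm of the ABC trial.
support_by_baseline = {
    base: informing_trials(network, {"ABC": base}) for base in ("A", "B", "C")
}
```

With A as the baseline, `support_by_baseline["A"]["BC"]` contains only the BC trial, whereas with B or C as the baseline it contains both the ABC and BC trials. In each case, exactly one comparison is informed by a single 2-arm trial.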

Useful clinical decisions can be based only on models that assume the underlying evidence is consistent, because this assumption ensures coherent estimates for a proper incremental assessment of benefits and costs. Models like the UME that synthesize evidence without the consistency assumption are useful for assessing the feasibility of this assumption. When the consistency assumption is not deemed feasible, the whole evidence base should be called into question. Decisions should not be based on results from any syntheses that do not properly account for the reasons for this inconsistency.

Conclusion

The proposed refinement of the UME model handles multiarm trials properly, and it yields treatment effects for all observed comparisons. A thorough inspection of the deviance contribution of the individual data points in conjunction with visualizing the posterior summaries of all observed comparisons under the NMA and refined UME models can aid our conclusions about possible global inconsistency in the network. In the presence of inconsistency, we should not be making inferences based on any of the models because they do not adequately describe the totality of the evidence available.

Supplemental Material

Supplemental material, sj-7z-1-mdm-10.1177_0272989X211068005 for A Revised Framework to Evaluate the Consistency Assumption Globally in a Network of Interventions by Loukia M. Spineli in Medical Decision Making

Supplemental material, sj-docx-2-mdm-10.1177_0272989X211068005 for A Revised Framework to Evaluate the Consistency Assumption Globally in a Network of Interventions by Loukia M. Spineli in Medical Decision Making

Footnotes

Acknowledgements

We would like to thank 2 anonymous reviewers for their insightful comments that thoroughly improved the article.

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided entirely by a grant from the German Research Foundation (Deutsche Forschungsgemeinschaft, grant No. SP 1664/1-3). The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.

Research Data

Data supporting the findings of this study are available as supplementary material.

ORCID iD

Loukia M. Spineli

Supplemental Material

Supplementary material for this article is available on the Medical Decision Making website.

References

1. Zarin, Veroniki, Nincic, et al. Characteristics and knowledge synthesis approach for 456 network meta-analyses: a scoping review. BMC Med. 2017;15(1):3.

2. Efthimiou, Debray, van Valkenhoef, et al. GetReal in network meta-analysis: a review of the methodology. Res Synth Methods. 2016;7(3):236–63.

3. Caldwell. An overview of conducting systematic reviews with network meta-analysis. Syst Rev. 2014;3:109.

4. Salanti. Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Res Synth Methods. 2012;3(2):80–97.

5. Dias, Welton, Sutton, Caldwell, Ades. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Med Decis Making. 2013;33(5):641–56.

6. Nikolakopoulou, Chaimani, Veroniki, Vasiliadis, Schmid, Salanti. Characteristics of networks of interventions: a description of a database of 186 published networks. PLoS One. 2014;9(1):e86754.

7. Petropoulou, Nikolakopoulou, Veroniki, et al. Bibliographic study showed improving statistical methodology of network meta-analyses published between 1999 and 2015. J Clin Epidemiol. 2017;82:20–8.

8. Bucher, Guyatt, Griffith, Walter. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol. 1997;50(6):683–91.

9. Spiegelhalter, Best, Carlin, van der Linde. Bayesian measures of model complexity and fit. J R Stat Soc Series B Stat Methodol. 2002;64:583–639.

10. Dias, Sutton, Ades, Welton. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Decis Making. 2013;33(5):607–17.

11. Dias, Ades, Welton, Jansen, Sutton. Checking for inconsistency. In: Network Meta-Analysis for Decision Making. Chichester (UK): Wiley; 2018, pp 189–226.

12. Higgins, Jackson, Barrett, Ades, White. Consistency and inconsistency in network meta-analysis: concepts and models for multi-arm studies. Res Synth Methods. 2012;3(2):98–110.

13. White, Barrett, Jackson, Higgins. Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression. Res Synth Methods. 2012;3(2):111–25.

14. Cipriani, Barbui, Salanti, et al. Comparative efficacy and acceptability of antimanic drugs in acute mania: a multiple-treatments meta-analysis. Lancet. 2011;378(9799):1306–15.

15. Baker, Coleman. Pharmacologic treatments for chronic obstructive pulmonary disease: a mixed-treatment comparison meta-analysis. Pharmacotherapy. 2009;29(8):891–905.

16. Spineli, Kalyvas, Papadimitropoulou. Continuous(ly) missing outcome data in network meta-analysis: a one-stage pattern-mixture model approach. Stat Methods Med Res. 2021;30(4):958–75.

17. Spineli. An empirical comparison of Bayesian modelling strategies for missing binary outcome data in network meta-analysis. BMC Med Res Methodol. 2019;19(1):86.

18. Ades. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc. 2006;101:447–59.

19. Higgins, Whitehead. Borrowing strength from external trials in a meta-analysis. Stat Med. 1996;15(24):2733–49.

20. Rücker, Schwarzer. Reduce dimension or reduce weights? Comparing two approaches to multi-arm studies in network meta-analysis. Stat Med. 2014;33(25):4353–69.

21. Gelman, Rubin. Inference from iterative simulation using multiple sequences. Stat Sci. 1992;7(4):457–72.

22. Plummer. JAGS: Just Another Gibbs Sampler. Version 4.3.0 user manual. 2017. Available from: https://people.stat.sc.edu/hansont/stat740/jags_user_manual.pdf

23. Yajima. R2jags: Using R to Run ‘JAGS’. R package version 0.6-1. 2020. Available from: https://CRAN.R-project.org/package=R2jags

24. R Core Team. R: A Language and Environment for Statistical Computing. Vienna (Austria): R Foundation for Statistical Computing; 2021. Available from: https://www.r-project.org

25. Wickham. ggplot2: Elegant Graphics for Data Analysis. New York: Springer; 2016.

26. Lin, Zhang, Hodges, Chu. Performing arm-based network meta-analysis in R with the pcnetmeta Package. J Stat Softw. 2017;80:5.

27. Higgins. Commentary: heterogeneity in meta-analysis should be expected and appropriately quantified. Int J Epidemiol. 2008;37(5):1158–60.
