Abstract

Background: Network meta-analysis computes treatment ranking to assist with clinical decision making, but it is not always clear how reliable the ranking is and how likely the accumulation of new evidence may alter the ranking. Uncertainty and robustness of ranking are two concepts related to the reliability of ranking. However, it is still unclear whether these two approaches would always yield similar conclusions on the reliability of ranking, i.e., a robust ranking is also one of low uncertainty.

Purpose: This study aimed to investigate the relationship between the uncertainty and robustness of treatment ranking by using normalized entropy and quadratic weighted Cohen’s kappa, respectively.

Data: We used datasets of previously published NMAs from a database maintained by Petropoulou et al. at the University of Bern.

Analysis: Scatter plots and Pearson’s correlation coefficients were used to demonstrate the direction and strength of the association between uncertainty and robustness of ranking for NMA-level and treatment-level evaluation.

Results: We found that when the uncertainty of ranking is very low, treatment ranking is unlikely to be altered by deleting a trial from the complete data. However, network meta-analysis with robust treatment ranking may have high uncertainty of treatment ranking.

Conclusions: Therefore, although the robustness of the ranking can find the trial that has the most significant impact on the ranking, the high robustness of ranking does not mean that the ranking would not easily change when new trials are added in the future.

Keywords

Network meta-analysis, ranking, robustness, uncertainty, reliability

Introduction

Network meta-analysis (NMA) synthesizes direct and indirect evidence to compare multiple treatments within a connected network.¹ To assist with recommending the best treatments, NMA computes ranking probabilities for each treatment and then calculates indices, such as P-score and SUCRA (surface under the cumulative ranking curve), to determine the ranking of treatments.^2,3 As ranking can always be obtained from NMA, it is, therefore, important to know how reliable the ranking is^4–6 and how likely the ranking is to be altered by new evidence.^7–9

The uncertainty of ranking^10,11 and robustness of ranking¹² are two concepts related to the reliability of ranking. The uncertainty of ranking can be visualized by the distribution of ranking probabilities of a treatment. The more concentrated the ranking probabilities, the lower the uncertainty of ranking is. The 95% credible interval of SUCRA and normalized entropy are two quantitative approaches to represent the uncertainty of ranking for treatments.^10,11 However, the 95% credible interval of SUCRA is not a good index, because its range is affected by the total number of treatments included in the network.¹¹ The robustness of ranking measures how sensitive the ranking is to subtle alterations of a dataset. The approach proposed to evaluate the robustness of ranking is to remove one trial and then evaluate how the ranking would change.¹² When the agreement between the two rankings derived from the complete dataset and the modified dataset with one trial removed is high, the treatment ranking of a NMA is considered robust.

Both the uncertainty and robustness of ranking have been applied to published NMAs to evaluate the reliability of treatment ranking,^10,12 but whether these two approaches yield similar conclusions on the reliability of ranking has not yet been fully explored. One study explored the association between uncertainty and robustness of ranking.¹² However, that study analyzed only two NMAs, which is too small a sample to support generalization.

In this study, we aimed to empirically investigate the relationship between the uncertainty and the robustness of treatment ranking by using a database of NMAs. These two concepts are often presented in NMA results to show the reliability of ranking^7,13–15; however, they are rarely both reported and compared in the same NMA. We examined whether high robustness of treatment ranking is associated with low uncertainty or whether they are two independent concepts. Their association was investigated at both the treatment level and the NMA level.

Methods

Data source

We used datasets of previously published NMAs from a database maintained by Petropoulou et al. at the Institute of Social and Preventive Medicine (ISPM), University of Bern. The database can be downloaded by using the R package nmadb.¹⁶ We used NMAs that were flagged as verified, had odds ratio or mean difference as the outcome measure, were in the arm-based data format, and included ten or fewer treatments. If a network became disconnected when we used the leave-one-trial-out approach, it was also excluded.
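The exclusion rule for networks that become disconnected under leave-one-trial-out can be illustrated with a small graph check. This is a minimal sketch, not the authors' code: the function names and the data layout (one tuple of treatment labels per trial) are assumptions made for illustration, and treating the loss of a whole treatment as a failure is also an assumption.

```python
from collections import defaultdict


def is_connected(trials):
    """Return True if the treatments in `trials` form a single connected network.

    `trials` is a list of tuples; each tuple lists the treatments compared in one trial.
    """
    treatments = {t for arms in trials for t in arms}
    if not treatments:
        return True
    adjacency = defaultdict(set)
    for arms in trials:
        for a in arms:
            adjacency[a].update(x for x in arms if x != a)
    # Depth-first search from an arbitrary treatment.
    start = next(iter(treatments))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == treatments


def survives_leave_one_out(trials):
    """Return True if every leave-one-trial-out dataset keeps all treatments connected.

    Assumption: a reduced dataset that loses a treatment entirely is treated
    the same way as a disconnected network.
    """
    all_treatments = {t for arms in trials for t in arms}
    for i in range(len(trials)):
        reduced = trials[:i] + trials[i + 1:]
        remaining = {t for arms in reduced for t in arms}
        if remaining != all_treatments or not is_connected(reduced):
            return False
    return True
```

For example, a triangle network with two A-B trials, two B-C trials and one A-C trial passes the check, whereas a chain with a single B-C trial linking two parts of the network does not.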

Reanalysis of NMA

We reanalyzed the data and estimated relative effects by using the network suite in Stata.¹⁷ Based on the estimated relative effects of each treatment, the probabilities of being the best or of holding each of the other ranks were calculated from 1000 draws, from which SUCRA was obtained.² The ranking of treatments was then determined according to their SUCRA values: the greater a treatment’s SUCRA, the higher its ranking. Then, treatment-level and NMA-level assessments of the uncertainty and the robustness of treatment ranking were conducted.
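The step from simulated draws to ranking probabilities and SUCRA can be sketched as follows. This is an illustrative sketch only, not the Stata routine used in the study: the function names and the input layout (one list of sampled effects per draw) are assumptions, as is the `higher_is_better` switch. SUCRA is computed as the mean of the cumulative ranking probabilities over the first n − 1 ranks.

```python
def rank_probabilities(draws, higher_is_better=True):
    """Estimate ranking probabilities from simulated effects.

    `draws` is a list of draws; each draw is a list with one sampled effect
    per treatment. Returns probs[t][r], the proportion of draws in which
    treatment t holds rank r (r = 0 is the best rank).
    """
    n = len(draws[0])
    counts = [[0] * n for _ in range(n)]
    for draw in draws:
        order = sorted(range(n), key=lambda t: draw[t], reverse=higher_is_better)
        for rank, treatment in enumerate(order):
            counts[treatment][rank] += 1
    return [[c / len(draws) for c in row] for row in counts]


def sucra(rank_probs):
    """SUCRA for one treatment from its ranking probabilities (best rank first)."""
    n = len(rank_probs)
    cumulative = 0.0
    total = 0.0
    for r in range(n - 1):
        cumulative += rank_probs[r]
        total += cumulative
    return total / (n - 1)


def ranking_from_sucra(sucra_values):
    """Return ranks (1 = best) ordered by decreasing SUCRA."""
    order = sorted(range(len(sucra_values)), key=lambda t: sucra_values[t], reverse=True)
    ranks = [0] * len(sucra_values)
    for position, treatment in enumerate(order, start=1):
        ranks[treatment] = position
    return ranks
```

A treatment that is ranked first in every draw has a SUCRA of 1, and one that is always ranked last has a SUCRA of 0.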

Uncertainty of treatment ranking

The uncertainty of treatment ranking was quantified by using normalized entropy.¹¹ For each treatment, Shannon’s entropy $H (t)$ is first calculated from the distribution of its ranking probabilities across all positions and is then rescaled by dividing by the range between the maximum and minimum entropy for $n$ treatments in a network. The formula for normalized entropy is given in Appendix-1. Normalized entropy ranges from 0 to 1, and we classified the uncertainty of treatment ranking into five levels: very high (>0.8), high (0.6–0.8), medium (0.4–0.6), low (0.2–0.4), and very low (<0.2).
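As a rough illustration of the description above, the following sketch computes a normalized entropy for one treatment. The exact formula is in Appendix-1, which is not reproduced here, so the sketch assumes that the maximum entropy is log(n) (uniform ranking probabilities) and the minimum is 0 (all probability on one rank). The function names are hypothetical, and the handling of values that fall exactly on a class boundary is an assumption.

```python
import math


def normalized_entropy(rank_probs):
    """Shannon entropy of one treatment's ranking probabilities, rescaled to [0, 1].

    Assumes maximum entropy log(n) and minimum entropy 0 for n treatments.
    """
    n = len(rank_probs)
    entropy = -sum(p * math.log(p) for p in rank_probs if p > 0)
    return entropy / math.log(n)


def uncertainty_level(value):
    """Map normalized entropy to the five levels used in the text.

    Assumption: boundary values (0.2, 0.4, 0.6, 0.8) are assigned to the lower class.
    """
    if value > 0.8:
        return "very high"
    if value > 0.6:
        return "high"
    if value > 0.4:
        return "medium"
    if value > 0.2:
        return "low"
    return "very low"
```

Under these assumptions, a treatment that is equally likely to hold any rank has a normalized entropy of 1, and a treatment that always holds the same rank has a normalized entropy of 0.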

Robustness of treatment ranking

To evaluate the robustness of treatment ranking,¹² each trial within an NMA was deleted from the network in turn, the NMA was re-analyzed, and a new treatment ranking was computed. For each dataset with one trial removed, we computed the quadratic weighted Cohen’s kappa coefficient to assess the agreement between the new ranking and the ranking from the complete dataset. The formula for Cohen’s kappa coefficient (ĸ) is given in Appendix-2. We classified the robustness of treatment ranking into five levels: slight (<0.2), fair (0.2–0.4), moderate (0.4–0.6), substantial (0.6–0.8), and almost perfect (>0.8) agreement.
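The agreement between two rankings can be sketched with the standard quadratic weighted kappa, using disagreement weights (i − j)² / (n − 1)². The formula used in the study is in Appendix-2, which is not reproduced here, so this should be read as a sketch of the general statistic under that assumption. The function name and the input layout (one rank from 1 to n per treatment, in the same treatment order in both lists) are hypothetical.

```python
def quadratic_weighted_kappa(ranks_a, ranks_b):
    """Quadratic weighted Cohen's kappa between two rankings of the same treatments.

    `ranks_a[t]` and `ranks_b[t]` are the ranks (1..n) of treatment t under
    the complete dataset and under a dataset with one trial removed.
    """
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("both rankings must cover the same n >= 2 treatments")
    # Observed joint proportions of (rank in A, rank in B).
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(ranks_a, ranks_b):
        observed[a - 1][b - 1] += 1.0 / n
    row_totals = [sum(observed[i]) for i in range(n)]
    col_totals = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j]
    return 1.0 - weighted_observed / weighted_expected
```

With four treatments, swapping the top two ranks gives a kappa of 0.8, identical rankings give 1, and a fully reversed ranking gives −1, which shows how larger shifts in position are penalized more heavily.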

Association between the uncertainty and robustness of ranking

Since the uncertainty of ranking is quantified by normalized entropy for each treatment and the robustness of ranking is quantified by quadratic weighted Cohen’s kappa for each trial, we took the average of the normalized entropies of all treatments and the average of the quadratic weighted Cohen’s kappas of all trials within the NMA to represent the overall uncertainty and robustness of ranking (NMA-level evaluation). For each treatment, we also measured the percentage of trials whose removal did not affect the rank of that treatment, and compared this measure of robustness with the uncertainty of the treatment (treatment-level evaluation). Scatter plots and Pearson’s correlation coefficients were used to demonstrate the direction and strength of the association. A linear mixed model was fitted to investigate the effects of treatment-level or NMA-level factors (Appendix Table 1), such as the number of participants or the number of treatments included in the NMA, on the association between uncertainty and robustness of ranking.

In addition to using the average of quadratic weighted Cohen’s kappa to measure the robustness of treatment ranking of the whole network, we also used the minimum and maximum values of quadratic weighted Cohen’s kappa within each network to represent the worst and the best scenarios when one trial was deleted from the original dataset. Compared to the average value, the minimum and maximum values of quadratic weighted Cohen’s kappa are expected to be less related to the number of treatments in the network.
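The NMA-level summaries described above reduce to a few aggregates per network. A minimal sketch, with a hypothetical function name and input layout (one normalized entropy per treatment, one kappa per deleted trial):

```python
def nma_level_summary(entropies, kappas):
    """Summarize one NMA: average uncertainty and average/worst/best robustness.

    `entropies` holds the normalized entropy of each treatment;
    `kappas` holds the quadratic weighted kappa for each deleted trial.
    """
    return {
        "mean_entropy": sum(entropies) / len(entropies),
        "mean_kappa": sum(kappas) / len(kappas),
        "min_kappa": min(kappas),  # worst case: the most influential trial
        "max_kappa": max(kappas),  # best case: the least influential trial
    }
```

The minimum corresponds to the single deletion that changes the ranking the most, and the maximum to the deletion that changes it the least.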

Software

We used the statistical software package R (version 4.0.2, R Development Core Team) to download datasets from the nmadb package, and used the RStata package to call Stata (version 14, Stata Corp, 4905 Lakeway Drive, College Station, Texas, USA) from R to undertake the NMAs. All other analyses were undertaken in R.

Results

The selection process of NMAs from the nmadb database is shown in Appendix Figure 1. A total of 60 NMAs were included. Among them, 43 NMAs reported odds ratios and 17 NMAs reported mean differences. The basic information of the 60 NMAs is summarized in Table 1. The median number of interventions included within the network was 5 (Q1–Q3: 4–7), and over 70% of NMAs compared six or fewer interventions. The median number of trials included within the network was 26 (Q1–Q3: 17–36). More than one quarter (28.3%) of NMAs included fewer than 20 trials in the network, 50.0% of NMAs included 20–40 trials, 15.0% included 40–60 trials, and 6.7% included more than 60 trials. Regarding the type of interventions assessed in the network, 66.7% of NMAs were pharmacological versus placebo, 20.0% were non-pharmacological versus any, and 13.3% were pharmacological versus pharmacological. When one of their included trials was deleted, the treatment ranking of 48 NMAs (80.0%) was altered. Further information on each NMA, such as the condition/disease, the outcome measure, and the number of trials and treatments included, can be found in Appendix Table 2.

Table 1.

Summary of the 60 NMAs.

Characteristics	N (%)
Interventions, n	5 (4–7)^a
Four	19 (31.7%)
Five	15 (25.0%)
Six	10 (16.7%)
Seven	3 (5.0%)
Eight	4 (6.7%)
Nine	6 (10.0%)
Ten	3 (5.0%)
Trials, n	26 (17–36)^a
<20	17 (28.3%)
20–40	30 (50.0%)
40–60	9 (15.0%)
>60	4 (6.7%)
Type of interventions assessed, n
Non-pharmacological versus any	12 (20.0%)
Pharmacological versus pharmacological	8 (13.3%)
Pharmacological versus placebo	40 (66.7%)
Ranking of treatments after leave-one-trial-out approach
All remained unchanged	12 (20.0%)
Have some change	48 (80.0%)

^a Median (1st and 3rd quartiles).

NMA-level association between uncertainty and robustness

For the 60 NMAs, the associations between the average normalized entropy and the average, minimum, and maximum values of quadratic weighted Cohen’s kappa are presented in Figure 1. The Pearson’s correlation coefficients were −0.59, −0.50, and 0, respectively. When the average normalized entropy was less than 0.4, the average quadratic weighted Cohen’s kappa values were close to 1. As the normalized entropy increased, the variation in the average quadratic weighted Cohen’s kappa increased, but nearly all values remained greater than 0.9. The minimum value of quadratic weighted Cohen’s kappa showed greater variation when the average normalized entropy was high. When the average normalized entropy was less than 0.25, the minimum values of quadratic weighted Cohen’s kappa were 1, i.e. perfect agreement. When the average normalized entropy was greater than 0.75, the minimum value of quadratic weighted Cohen’s kappa ranged between 0.2 and 0.8. In general, the higher the average normalized entropy, the lower the minimum quadratic weighted Cohen’s kappa. However, some NMAs with a high average normalized entropy showed a high minimum quadratic weighted Cohen’s kappa. The maximum value of quadratic weighted Cohen’s kappa was 1 for all 60 NMAs, showing that each NMA contained at least one trial whose deletion did not change the treatment ranking. We also present the association between the five levels of uncertainty and the five levels of robustness of treatment ranking in Appendix Table 3.

Figure 1.

Scatter plots of average normalized entropy and (A) average/(B) minimum/(C) maximum quadratic weighted Cohen’s kappa for 60 networks.

Treatment-level association between uncertainty and robustness

The 60 NMAs included 348 treatments. Figure 2 shows the scatter plot of the normalized entropy against the percentage of trials whose deletion did not change the rank of the treatment. The Pearson’s correlation coefficient was −0.59. Each point represents a treatment. Among the 348 treatments, the percentage of trials whose deletion did not change the rank of the treatment ranged from 37% to 100%. Of the treatments whose ranks were changed by the deletion of a trial, over 25% were in the high or very high levels of ranking uncertainty.

Figure 2.

Scatter plot of normalized entropy and percentage of trials whose deletion did not change the rank of the treatment, for 348 treatments within 60 NMAs.

The detailed methods and results of the regression analysis are presented in Appendix-4. The inverse association between robustness and uncertainty of ranking persisted after adjustment for the treatment-level and NMA-level factors.

Discussion

In this empirical study of 60 NMAs, as we expected, the treatment ranking of an NMA with low uncertainty of ranking was unlikely to be altered by subtle changes to the dataset; however, when the uncertainty of ranking was high, the robustness of ranking showed a wide range. Therefore, high robustness of ranking does not always correspond to low uncertainty of ranking, indicating that robustness and uncertainty are two correlated but distinct concepts.

However, most NMAs only use one of them to evaluate the reliability of ranking.^13–15 When there is no outlying trial within the network, the robustness of ranking may be high, but the ranking can still be of great uncertainty. Therefore, the evaluation of the reliability of ranking should be conducted in two steps. The first is to evaluate the uncertainty of ranking. If the uncertainty is low, we expect the ranking to be reliable. If the ranking uncertainty is high, the robustness of ranking can determine whether any outlying trials influence the overall ranking. This two-step process provides a more comprehensive evaluation of treatment rankings from NMAs.
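The two-step process can be written down as a small decision routine. This is only a sketch of the logic described above: the function name, the input layout, and in particular the cutoff of 0.4 on the average normalized entropy (the upper bound of the "low" uncertainty level) are assumptions for illustration and are not thresholds recommended in the text.

```python
def two_step_evaluation(entropies, loo_kappas, entropy_cutoff=0.4):
    """Two-step reliability check for the treatment ranking of one NMA.

    `entropies` holds the normalized entropy of each treatment.
    `loo_kappas` maps a trial label to the quadratic weighted kappa obtained
    when that trial is deleted.
    `entropy_cutoff` is a hypothetical threshold for "low" uncertainty.
    """
    mean_entropy = sum(entropies) / len(entropies)
    # Step 1: low uncertainty, so the ranking is expected to be reliable.
    if mean_entropy < entropy_cutoff:
        return {
            "step": 1,
            "mean_entropy": mean_entropy,
            "most_influential_trial": None,
            "min_kappa": None,
        }
    # Step 2: high uncertainty, so look for the trial whose deletion
    # changes the ranking the most (lowest kappa).
    worst_trial = min(loo_kappas, key=loo_kappas.get)
    return {
        "step": 2,
        "mean_entropy": mean_entropy,
        "most_influential_trial": worst_trial,
        "min_kappa": loo_kappas[worst_trial],
    }
```

In the second step, a low minimum kappa points to a trial that may warrant further investigation, while a high minimum kappa combined with high entropy indicates a ranking that is stable to single deletions but still uncertain.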

The quadratic weighted Cohen’s kappa was recommended to quantify the agreement between treatment rankings.¹² It measures changes in ranking by assigning a larger penalty to a greater difference in ranking position. However, Cohen’s kappa is an NMA-level statistic and cannot be used to evaluate ranking robustness at the treatment level. We therefore used the percentage of trials whose deletion does not change the rank of a treatment to assess the robustness of ranking for individual treatments. At the NMA level, we computed the minimum, average, and maximum values of the quadratic weighted Cohen’s kappa to represent the robustness of ranking for an NMA. The maximum and average values of quadratic weighted Cohen’s kappa are less useful, since we want to know the maximum impact caused by deleting a trial from the NMA. Therefore, we recommend using the minimum value of quadratic weighted Cohen’s kappa to represent the overall robustness of ranking at the NMA level.

The normalized entropy we used to quantify ranking uncertainty can provide treatment-level or NMA-level information, while ranking robustness can additionally provide trial-level information. All of these levels of information are needed when we evaluate the ranking of NMAs. We may want to find out which NMA or which treatment needs more evidence, and also which trial affects the ranking the most and should be flagged for further investigation.

There are some limitations to our analysis. Firstly, NMAs included in this study are those with two or more trials in each arm, i.e. the selected NMAs contained more data. Since evaluating the robustness of ranking needs to remove each trial in turn, NMAs were excluded if removing a trial would break the network. Therefore, alternative approaches are required to assess the robustness of ranking for NMAs excluded from our evaluation. Secondly, we only included those NMAs using odds ratio and mean difference as outcome measures. Further analysis can be conducted to compare these two metrics for other outcome measures.

Using the robustness of ranking alone may lead to treatment rankings with high robustness but high uncertainty being considered reliable. We may then conclude that the results can be applied confidently in clinical practice and that no further evidence is required to improve our knowledge of the relative efficacy of those treatments. Therefore, we recommend using the two-step evaluation process to comprehensively evaluate treatment rankings derived from NMAs in the future.

Supplemental Material

Supplemental Material for High robustness does not always imply low uncertainty of treatment rankings: An empirical study of 60 network meta-analyses by Yun-Chun Wu and Yu-Kang Tu in Research Methods in Medicine & Health Sciences

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided entirely by grants from the Ministry of Science and Technology in Taiwan (grant number: MOST 109-2314-B-002 -150 -MY3). The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.

ORCID iDs

Yun-Chun Wu

Yu-Kang Tu

Supplemental Material

Supplemental material for this article is available online.

References

1. Ades. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004; 23(20): 3105–3124.

2. Salanti, Ades, Ioannidis. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol. 2011; 64(2): 163–171.

3. Rücker, Schwarzer. Ranking treatments in frequentist network meta-analysis works without resampling methods. BMC Med Res Methodol. 2015; 15(1): 58.

4. Bafeta, Trinquart, Seror, et al. Reporting of results from network meta-analyses: methodological systematic review. BMJ. 2014; 348: g1741.

5. Carroll, Hemmings. On the need for increased rigour and care in the conduct and interpretation of network meta-analyses in drug development. Pharm Stat. 2016; 15(2): 135–142.

6. Yepes-Nuñez, S-A, Guyatt, et al. Development of the summary of findings table for network meta-analysis. J Clin Epidemiol. 2019; 115: 1–13.

7. Faltinsen, Storebø, Jakobsen, et al. Network meta-analysis: the highest level of medical evidence? BMJ Evidence-Based Medicine. 2018; 23(2): 56–59.

8. Cipriani, Higgins, Geddes, et al. Conceptual and technical challenges in network meta-analysis. Ann Intern Med. 2013; 159(2): 130–137.

9. Rosenberger, Duan, Chen, et al. Predictive P-score for treatment ranking in Bayesian network meta-analysis. BMC Med Res Methodol. 2021; 21(1): 213.

10. Trinquart, Attiche, Bafeta, et al. Uncertainty in treatment rankings: reanalysis of network meta-analyses of randomized trials. Ann Intern Med. 2016; 164(10): 666–673.

11. Wu Y-C, Shih M-C, Tu Y-K. Using normalized entropy to measure uncertainty of rankings for network meta-analyses. Med Decis Making. 2021; 41(6): 706–713.

12. Daly, Neupane, Beyene, et al. Empirical evaluation of SUCRA-based treatment ranks in network meta-analysis: quantifying robustness using Cohen’s kappa. BMJ Open. 2019; 9(9): e024625.

13. Noma, Gosho, Ishii, et al. Outlier detection and influence diagnostics in network meta-analysis. Res Synth Methods. 2020; 11: 891–902.

14. Zhang, Yuan, Chu. The impact of excluding trials from network meta-analyses - an empirical study. PLoS One. 2016; 11(12): e0165889.

15. Zhang, Carlin. Detecting outlying trials in network meta-analysis. Stat Med. 2015; 34(19): 2695–2707.

16. Petropoulou, Nikolakopoulou, Veroniki A-A, et al. Bibliographic study showed improving statistical methodology of network meta-analyses published between 1999 and 2015. J Clin Epidemiol. 2017; 82: 20–28.

17. White. Network meta-analysis. Stata J. 2015; 15(4): 951–985.
