Abstract
Introduction
Network meta-analysis (NMA) synthesizes direct and indirect evidence to compare multiple treatments within a connected network.1 To assist with recommending the best treatments, NMA computes ranking probabilities for each treatment and then calculates indices, such as the P-score and SUCRA (surface under the cumulative ranking curve), to determine the ranking of treatments.2,3 Because a ranking can always be obtained from an NMA, it is important to know how reliable the ranking is4–6 and how likely it is to be altered by new evidence.7–9
The uncertainty of ranking10,11 and the robustness of ranking12 are two concepts related to the reliability of ranking. The uncertainty of ranking can be visualized by the distribution of ranking probabilities of a treatment. The more concentrated the ranking probabilities, the lower the uncertainty of ranking. The 95% credible interval of SUCRA and normalized entropy are two quantitative approaches to representing the uncertainty of ranking for treatments.10,11 However, the 95% credible interval of SUCRA is not a good index, because its range is affected by the total number of treatments included in the network.11 The robustness of ranking measures how sensitive the ranking is to subtle alterations of a dataset. The approach proposed to evaluate the robustness of ranking is to remove one trial and then evaluate how the ranking changes.12 When the agreement between the two rankings derived from the complete dataset and the modified dataset with one trial removed is high, the treatment ranking of an NMA is considered robust.
Both the uncertainty and the robustness of ranking have been applied to published NMAs to evaluate the reliability of treatment ranking,10,12 but whether these two approaches yield similar conclusions on the reliability of ranking has not yet been fully explored. One study explored the association between uncertainty and robustness of ranking.12 However, that study analyzed only two NMAs, which is too few to support generalization.
In this study, we aim to empirically investigate the relationship between the uncertainty and the robustness of treatment ranking by using a database of NMAs. These two concepts are often presented in NMA results to show the reliability of ranking7,13–15; however, they are rarely both reported and compared in the same NMA. We examine whether high robustness of treatment ranking is associated with low uncertainty or whether they are two independent concepts. Their association was investigated at both the treatment level and the NMA level.
Methods
Data source
We used datasets of previously published NMAs from a database maintained by Petropoulou et al. at the Institute of Social and Preventive Medicine (ISPM), University of Bern. The database can be downloaded by using the R package nmadb.16 We included NMAs that were flagged as verified, used the odds ratio or mean difference as the outcome measure, were in the arm-based data format, and contained ten or fewer treatments. If a network became disconnected when we applied the leave-one-trial-out approach, it was also excluded.
Reanalysis of NMA
We reanalyzed each NMA and estimated relative effects by using the network suite in Stata.17 Based on the estimated relative effects of each treatment, the probabilities of being the best or holding each of the other ranks were calculated from 1000 draws, and SUCRA was obtained from these probabilities.2 The ranking of treatments was then determined according to their SUCRA values; the greater the SUCRA of a treatment, the higher its ranking. Then, treatment-level and NMA-level assessments of the uncertainty and the robustness of treatment ranking were conducted.
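As an illustration of how ranking probabilities and SUCRA follow from simulated draws, the sketch below uses plain Python rather than the Stata network suite used in this study. The effect estimates, standard errors, function names, and the independent normal sampling are all hypothetical assumptions; a real analysis would draw from the joint distribution of the estimated relative effects.

```python
import random


def rank_probabilities(draws, higher_is_better=True):
    """Estimate P(treatment t holds rank r) from simulated effect draws.

    draws: list of draws; each draw is a list with one effect per treatment.
    Returns probs[t][r], where r = 0 is the best rank.
    """
    n_treat = len(draws[0])
    counts = [[0] * n_treat for _ in range(n_treat)]
    for draw in draws:
        # Treatment indices ordered from best to worst within this draw.
        order = sorted(range(n_treat), key=lambda t: draw[t],
                       reverse=higher_is_better)
        for rank, treatment in enumerate(order):
            counts[treatment][rank] += 1
    return [[c / len(draws) for c in row] for row in counts]


def sucra(probs):
    """SUCRA per treatment: mean cumulative ranking probability over the
    first n - 1 rank positions."""
    n_treat = len(probs)
    values = []
    for row in probs:
        cumulative, total = 0.0, 0.0
        for p in row[:-1]:
            cumulative += p
            total += cumulative
        values.append(total / (n_treat - 1))
    return values


# Hypothetical effects versus a reference treatment (index 0); larger is better.
means = [0.0, 0.3, 0.8]
ses = [0.0, 0.20, 0.25]

rng = random.Random(2021)
draws = [[rng.gauss(m, s) for m, s in zip(means, ses)] for _ in range(1000)]
probs = rank_probabilities(draws)
sucra_values = sucra(probs)

# Rank treatments by SUCRA: the greater the SUCRA, the higher the ranking.
ranking = sorted(range(len(means)), key=lambda t: sucra_values[t], reverse=True)
print(sucra_values, ranking)
```

Each row of `probs` is the ranking probability distribution of one treatment, which is also the input needed to assess ranking uncertainty.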
Uncertainty of treatment ranking
The uncertainty of treatment ranking was quantified using normalized entropy,11 which first uses the distribution of ranking probabilities across rank positions to calculate Shannon’s entropy and then normalizes it by its maximum possible value.
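A minimal sketch of this calculation in Python, assuming the usual definition in which Shannon’s entropy of a treatment’s ranking probabilities is divided by its maximum, the logarithm of the number of rank positions. The function name and the example probabilities are hypothetical and are not taken from the analyzed NMAs.

```python
import math


def normalized_entropy(rank_probs):
    """Normalized Shannon entropy of one treatment's ranking probabilities.

    rank_probs: probabilities of holding rank 1..n (they should sum to 1).
    Returns a value between 0 (rank is certain) and 1 (all ranks equally likely).
    """
    n = len(rank_probs)
    # Terms with p = 0 contribute nothing, following the convention 0*log(0) = 0.
    shannon = sum(p * math.log(1.0 / p) for p in rank_probs if p > 0)
    return shannon / math.log(n)


# Hypothetical ranking probabilities for three treatments in one network.
network = {
    "A": [0.90, 0.08, 0.02],   # concentrated: low uncertainty
    "B": [0.08, 0.60, 0.32],
    "C": [0.34, 0.33, 0.33],   # nearly uniform: high uncertainty
}

treatment_entropy = {name: normalized_entropy(p) for name, p in network.items()}
# NMA-level uncertainty taken as the average over treatments.
nma_entropy = sum(treatment_entropy.values()) / len(treatment_entropy)
print(treatment_entropy, nma_entropy)
```

Dividing by the logarithm of the number of rank positions makes the index comparable across networks with different numbers of treatments.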
Robustness of treatment ranking
To evaluate the robustness of treatment ranking,12 each trial within an NMA was deleted from the network in turn, the NMA was re-analyzed, and a new ranking of treatments was computed. For each dataset with one trial removed, we computed the quadratic weighted Cohen’s kappa coefficient to assess the agreement between the new ranking and the ranking from the complete dataset. The formula for Cohen’s kappa coefficient (κ) is given in Appendix-2. We classified the robustness of treatment ranking into five levels: slight (<0.2), fair (0.2–0.4), moderate (0.4–0.6), substantial (0.6–0.8), and almost perfect (>0.8) agreement.
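The agreement measure can be sketched as follows, assuming the standard quadratic weighting in which the penalty for a treatment grows with the squared difference between its two rank positions. This is an illustration in Python and not the formula reproduced from Appendix-2; the function names, the example rankings, and the handling of values that fall exactly on a class boundary are assumptions.

```python
def quadratic_weighted_kappa(ranks_a, ranks_b):
    """Quadratic weighted Cohen's kappa between two rankings.

    ranks_a, ranks_b: 1-based ranks of the same treatments, in the same order,
    e.g. from the complete dataset and from a dataset with one trial removed.
    """
    n = len(ranks_a)
    # Observed proportion of treatments in each (rank in A, rank in B) cell.
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(ranks_a, ranks_b):
        observed[a - 1][b - 1] += 1.0 / n
    row = [sum(observed[i]) for i in range(n)]
    col = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            disagreement_obs += weight * observed[i][j]
            # Expected proportion under independence of the two rankings.
            disagreement_exp += weight * row[i] * col[j]
    return 1.0 - disagreement_obs / disagreement_exp


def robustness_level(kappa):
    """Map kappa to the five levels used in the text.

    Assumption: boundary values 0.2, 0.4 and 0.6 go to the upper class,
    and 0.8 goes to 'substantial' because 'almost perfect' requires > 0.8.
    """
    if kappa < 0.2:
        return "slight"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "almost perfect"


# Hypothetical example: removing one trial swaps the third and fourth treatments.
full_ranking = [1, 2, 3, 4]
ranking_without_trial = [1, 2, 4, 3]
kappa = quadratic_weighted_kappa(full_ranking, ranking_without_trial)
print(kappa, robustness_level(round(kappa, 6)))
```

Repeating this comparison for every removed trial yields one kappa value per trial, which can then be summarized for the whole network.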
Association between the uncertainty and robustness of ranking
Since the uncertainty of ranking is quantified by normalized entropy for each treatment and the robustness of ranking is quantified by quadratic weighted Cohen’s kappa for each trial, we took the average of the normalized entropies of all treatments and the average of the quadratic weighted Cohen’s kappa of all trials within the NMA to represent the overall uncertainty and robustness of ranking (NMA-level evaluation). To represent the robustness of ranking for each treatment and compare it with that treatment’s uncertainty, we also measured the percentage of trials whose removal did not affect the rank of the treatment (treatment-level evaluation). Scatter plots and Pearson’s correlation coefficients were used to demonstrate the direction and strength of the association. A linear mixed model was fitted to investigate the effects of treatment-level or NMA-level factors (Appendix Table 1), such as the number of participants or the number of treatments included in the NMA, on the association between uncertainty and robustness of ranking.
In addition to using the average of quadratic weighted Cohen’s kappa to measure the robustness of treatment ranking of the whole network, we also used the minimum and maximum values of quadratic weighted Cohen’s kappa within each network to represent the worst and the best scenarios when one trial was deleted from the original dataset. Compared to the average value, the minimum and maximum values of quadratic weighted Cohen’s kappa are expected to be less related to the number of treatments in the network.
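The NMA-level and treatment-level summaries described above can be sketched in Python as below. All numbers are hypothetical placeholders rather than results from the 60 NMAs, and the helper names are assumptions introduced for illustration only.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percent_trials_rank_unchanged(full_rank, leave_one_out_ranks):
    """Treatment-level robustness: percentage of trials whose removal
    leaves the treatment's rank unchanged."""
    unchanged = sum(1 for r in leave_one_out_ranks if r == full_rank)
    return 100.0 * unchanged / len(leave_one_out_ranks)


# Hypothetical inputs: per-treatment normalized entropies and per-trial
# quadratic weighted kappas for four networks.
nmas = [
    {"entropies": [0.10, 0.15, 0.20], "kappas": [1.0, 1.0, 1.0]},
    {"entropies": [0.40, 0.50, 0.45], "kappas": [1.0, 0.95, 1.0]},
    {"entropies": [0.80, 0.85, 0.90, 0.75], "kappas": [0.9, 0.6, 1.0, 0.8]},
    {"entropies": [0.70, 0.65, 0.60], "kappas": [1.0, 0.9, 0.85]},
]

# NMA-level summaries: overall uncertainty and average, worst-case and
# best-case robustness when a single trial is removed.
avg_entropy = [mean(n["entropies"]) for n in nmas]
avg_kappa = [mean(n["kappas"]) for n in nmas]
min_kappa = [min(n["kappas"]) for n in nmas]
max_kappa = [max(n["kappas"]) for n in nmas]

r_avg = pearson(avg_entropy, avg_kappa)
r_min = pearson(avg_entropy, min_kappa)
print(r_avg, r_min)

# Treatment-level example: a treatment ranked 2nd in the complete dataset,
# with its rank after removing each of four trials in turn.
print(percent_trials_rank_unchanged(2, [2, 2, 3, 2]))
```

Pearson’s correlation is undefined for a summary that takes the same value in every network, so a constant maximum kappa would need to be handled separately in such a calculation.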
Software
We used the statistical software package R (version 4.0.2, R Development Core Team) to download datasets from the nmadb package, and used the RStata package to call Stata (version 14, Stata Corp, 4905 Lakeway Drive, College Station, Texas, USA) from R to undertake the NMAs. All other analyses were undertaken in R.
Results
[Table: Summary of the 60 NMAs. Footnote a: median (1st and 3rd quartile).]
NMA-level association between uncertainty and robustness
For the 60 NMAs, the associations between the average normalized entropy and the average, minimum, and maximum values of quadratic weighted Cohen’s kappa are presented in Figure 1. Their Pearson’s correlation coefficients were −0.59, −0.50, and 0, respectively. When the average normalized entropy was less than 0.4, the average quadratic weighted Cohen’s kappa values were close to 1. As the normalized entropy increased, the variation of the average quadratic weighted Cohen’s kappa increased, but almost all values remained greater than 0.9. The minimum value of quadratic weighted Cohen’s kappa showed greater variation when the average normalized entropy was high. When the average normalized entropy was less than 0.25, the minimum values of quadratic weighted Cohen’s kappa were 1, i.e. perfect agreement. When the average normalized entropy was greater than 0.75, the minimum value of quadratic weighted Cohen’s kappa ranged between 0.2 and 0.8. In general, the higher the average normalized entropy, the lower the minimum quadratic weighted Cohen’s kappa. However, some NMAs with a high average normalized entropy showed a high minimum quadratic weighted Cohen’s kappa. The maximum value of quadratic weighted Cohen’s kappa was 1 for all 60 NMAs, showing that each NMA had at least one trial whose deletion did not change the treatment ranking. We also used five levels of the uncertainty and robustness of treatment ranking to present their associations in Appendix Table 3.
Figure 1. Scatter plots of average normalized entropy and (A) average/(B) minimum/(C) maximum quadratic weighted Cohen’s kappa for 60 networks.
Treatment-level association between uncertainty and robustness
The 60 NMAs included 348 treatments. Figure 2 shows the scatter plot of the normalized entropy against the percentage of trials whose removal did not change the rank of the treatment; each point represents a treatment. The Pearson’s correlation coefficient was −0.59. Among the 348 treatments, the percentage of trials whose deletion did not change the rank of the treatment ranged from 37% to 100%. Of the treatments whose ranks were changed by the deletion of a trial, over 25% were in the high and very high levels of ranking uncertainty.
Figure 2. Scatter plot of normalized entropy and percentage of trials whose removal did not change the rank of the treatment, for 348 treatments within 60 NMAs.
The detailed methods and results of the regression analysis are presented in Appendix-4. The inverse association between robustness and uncertainty of ranking persisted after adjustment for the treatment-level and NMA-level factors.
Discussion
In this empirical study of 60 NMAs, as we expected, the treatment ranking of an NMA with low uncertainty of ranking was unlikely to be altered by subtle changes to the dataset; however, when the uncertainty of ranking was high, the robustness of ranking showed a wide range. Therefore, high robustness of ranking does not always correspond to low uncertainty of ranking, indicating that robustness and uncertainty are two correlated but distinct concepts.
However, most NMAs only use one of them to evaluate the reliability of ranking.13–15 When there is no outlying trial within the network, the robustness of ranking may be high, but the ranking can still be of great uncertainty. Therefore, the evaluation of the reliability of ranking should be conducted in two steps. The first is to evaluate the uncertainty of ranking. If the uncertainty is low, we expect the ranking to be reliable. If the ranking uncertainty is high, the robustness of ranking can determine whether any outlying trials influence the overall ranking. This two-step process provides a more comprehensive evaluation of treatment rankings from NMAs.
The quadratic weighted Cohen’s kappa was recommended to quantify the agreement between treatment rankings.12 It measures the changes in ranking by assigning a larger penalty to a greater difference in ranking position. However, Cohen’s kappa is an NMA-level statistic and cannot be used to evaluate ranking robustness at the treatment level. We, therefore, used the percentage of trials whose deletion does not change the rank of a treatment to assess the robustness of ranking for individual treatments. At the NMA level, we computed the minimum, average, and maximum values of the quadratic weighted Cohen’s kappa to represent the robustness of ranking for an NMA. The maximum and average values of quadratic weighted Cohen’s kappa are less useful, since we want to know the maximum impact caused by deleting a trial within the NMA. Therefore, we recommend using the minimum value of quadratic weighted Cohen’s kappa to represent the overall robustness of ranking at the NMA level.
The normalized entropy we used to quantify ranking uncertainty can provide treatment-level or NMA-level information, while ranking robustness can additionally provide trial-level information. These different levels of information are all needed when we evaluate the ranking of NMAs. We may want to find out which NMA or which treatment needs more evidence, and also which trial affects the ranking the most and needs to be flagged for further investigation.
There are some limitations to our analysis. Firstly, NMAs included in this study are those with two or more trials in each arm, i.e. the selected NMAs contained more data. Since evaluating the robustness of ranking needs to remove each trial in turn, NMAs were excluded if removing a trial would break the network. Therefore, alternative approaches are required to assess the robustness of ranking for NMAs excluded from our evaluation. Secondly, we only included those NMAs using odds ratio and mean difference as outcome measures. Further analysis can be conducted to compare these two metrics for other outcome measures.
Using the robustness of ranking alone may lead to treatment rankings with high robustness but high uncertainty being considered reliable. We may then conclude that the results can be applied confidently in clinical practice and that no further evidence is required to improve our knowledge of the relative efficacy of those treatments. Therefore, we recommend using the two-step evaluation process to comprehensively evaluate treatment rankings derived from NMAs in the future.
Supplemental Material
Supplemental Material for High robustness does not always imply low uncertainty of treatment rankings: An empirical study of 60 network meta-analyses by Yun-Chun Wu and Yu-Kang Tu in Research Methods in Medicine & Health Sciences
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided entirely by grants from the Ministry of Science and Technology in Taiwan (grant number: MOST 109-2314-B-002-150-MY3). The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.
