Abstract

While an increased cardiovascular risk from using cyclo-oxygenase-2 (COX-2) inhibitor nonsteroidal anti-inflammatory drugs (NSAIDs) such as rofecoxib has been clearly demonstrated, the risk–benefit profile of traditional NSAIDs is less clear. In a recent study published in the British Medical Journal (BMJ), Trelle and colleagues evaluated the cardiovascular risk of seven widely used NSAIDs (naproxen, ibuprofen, diclofenac, celecoxib, etoricoxib, rofecoxib and lumiracoxib) and concluded that there was evidence of an increased risk of myocardial infarction, stroke, cardiovascular death, death from any cause, or the Antiplatelet Trialists’ Collaboration composite endpoint, for each of the seven treatments [Trelle et al. 2011]. The findings are of interest not just for the results themselves, which lend support to those who argue that there are significant safety issues surrounding the use of both COX-2 inhibitor and traditional (non-COX-2 inhibitor) NSAIDs, but also for the network meta-analysis (NMA) design employed, which enabled a comparison of each of the seven NSAIDs against placebo. A total of 31 trials were included in the analysis. The frequency of evaluation of each treatment varied considerably: celecoxib was investigated in 15 of the trials and compared with five different treatments, whereas ibuprofen was investigated in only two of the 31 trials and compared with two other treatments. A standard meta-analysis approach would not have allowed data from all 31 trials to be used, nor would it have allowed a head-to-head comparison of the seven NSAIDs with each other and of each against placebo.
NMAs are also known as mixed treatment comparisons meta-analyses or multiple treatments meta-analyses; the term describes the network of comparisons arising when a collection of studies, each with different sets of treatments, is combined [Salanti et al. 2008]. Unlike traditional meta-analysis, which can only summarize results of trials that evaluate the same treatment–placebo or treatment–treatment combination, NMAs combine the results from all studies that have at least one treatment in common. Instead of obtaining one overall summary effect, the NMA provides individual point estimates and confidence (or credibility) intervals of each treatment against a common comparator (e.g. placebo) and/or of each treatment against any other included treatment. For example, in the study by Trelle and colleagues, three of the seven treatments studied (ibuprofen, diclofenac and etoricoxib) were not at any time compared directly with placebo for any of the outcomes, but the risk against placebo could still be determined for each treatment, and the treatments then ranked accordingly [Trelle et al. 2011].
Since patient prognostic factors at baseline of a randomized controlled trial (RCT) will likely differ from group to group between trials, the benefits of randomization would not normally hold in a direct comparison of treatment arms across trials [Glenny et al. 2005]. However, NMAs can partially preserve the benefits of randomization by effectively using a common control group for comparison, and can fully preserve randomization by incorporating a Bayesian random-effects design in which strength is borrowed across all treatments and all trials [Glenny et al. 2005; Lu and Ades, 2004]. A hierarchical Bayesian random-effects model was employed by Trelle and colleagues with two random effect levels: comparison and trial. The NMA assumed that log odds ratios were from the same common (random-effect) distribution, i.e. that the relative treatment effects of the different studies were sufficiently similar to be combined, and that they were additive. For example, the log odds ratio comparing placebo with celecoxib was deemed to be predictable from adding the log odds ratio of placebo versus naproxen with the log odds ratio of naproxen versus celecoxib.
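The additivity assumption described above can be illustrated with a simple adjusted indirect comparison. The following is a minimal sketch only: it chains two independent pairwise estimates on the log odds ratio scale and is not the hierarchical Bayesian random-effects model used by Trelle and colleagues. The function name `indirect_log_or` and all numerical inputs are hypothetical and chosen purely for illustration.

```python
import math


def indirect_log_or(log_or_ab, se_ab, log_or_bc, se_bc, z=1.96):
    """Chain an A-versus-B and a B-versus-C estimate into an indirect
    A-versus-C estimate, assuming additivity on the log odds ratio scale
    and independence of the two source estimates."""
    # Additivity: log OR(A vs C) = log OR(A vs B) + log OR(B vs C)
    log_or_ac = log_or_ab + log_or_bc
    # Variances of independent estimates add, so the indirect estimate
    # is always less precise than either of its components.
    se_ac = math.sqrt(se_ab ** 2 + se_bc ** 2)
    # Approximate 95% confidence interval, back-transformed to the OR scale
    ci = (math.exp(log_or_ac - z * se_ac), math.exp(log_or_ac + z * se_ac))
    return log_or_ac, se_ac, ci


# Hypothetical inputs (not taken from any trial):
# placebo versus naproxen OR = 1.2, naproxen versus celecoxib OR = 1.1
log_or, se, (lower, upper) = indirect_log_or(math.log(1.2), 0.15,
                                             math.log(1.1), 0.20)
print(f"Indirect OR: {math.exp(log_or):.2f}, SE(log OR): {se:.2f}")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

Because the standard error of the indirect estimate combines the uncertainty of both links in the chain, estimates that rely mainly on indirect evidence tend to have wide intervals.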
The use of NMA offers an attractive methodology for the aggregation of trial data from a number of different treatments, especially when direct comparisons are either limited or unavailable, perhaps for ethical reasons. However, results from RCTs with direct comparisons remain the gold standard for estimating treatment effects, and those obtained using largely indirect comparisons, especially those that appear to cover different settings and times, need to be interpreted cautiously [Salanti et al. 2009]. As with traditional meta-analyses, a careful check for evidence of heterogeneity should be undertaken and covariate adjustment should also be performed, if necessary [Song et al. 2003]. In addition, NMAs assume ‘coherence’ or ‘consistency’; that is, the treatment-effect estimates obtained from direct and indirect comparisons should match each other. Unfortunately, this is not always the case: in some instances, results from direct and indirect comparisons differ to the extent that they point in opposite directions [Ioannidis, 2006]. The most obvious reason for inconsistency is violation of the assumption that the patient populations of the various studies are exchangeable. For example, a treatment may be superior amongst patients resistant to improvement, i.e. patients receiving second- or even third-line therapy, but inferior when tested in the general population. Even when estimates do concur, the assumption of consistency is in practice difficult to establish, since direct effect estimates can only be determined using the usually small number of trials in which direct comparisons were possible, and the power to detect any such differences is therefore low. In addition, coherence of the network does not itself guarantee that the conclusions are generalizable and reliable [Lumley, 2002].
Against these caveats, Trelle and colleagues did at least establish consistency between direct and indirect comparisons, and additionally demonstrated that the results of traditional random-effects meta-analyses for all available direct comparisons were consistent with the NMA results. However, the authors still acknowledge that inconsistency cannot be ruled out by virtue of the low number of events and trials allowing direct comparisons [Trelle et al. 2011].
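The comparison of direct and indirect evidence discussed above can be sketched as a simple z-test on the difference between the two log odds ratios. This is an illustrative assumption rather than the consistency assessment actually performed by Trelle and colleagues, whose Bayesian model handles inconsistency differently. The function name `inconsistency_test` and the input values are hypothetical.

```python
import math


def inconsistency_test(direct, se_direct, indirect, se_indirect):
    """Compare a direct and an indirect log odds ratio for the same
    treatment contrast, assuming the two estimates are independent and
    approximately normally distributed."""
    diff = direct - indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, z, p_value


# Hypothetical inputs: direct OR = 1.5 from a few small trials,
# indirect OR = 1.32 obtained through a common comparator
diff, se_diff, z, p = inconsistency_test(math.log(1.5), 0.30,
                                         math.log(1.32), 0.40)
print(f"Difference in log OR: {diff:.3f} (SE {se_diff:.2f})")
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

With standard errors of this size, even a sizeable discrepancy between direct and indirect estimates would not reach statistical significance, which illustrates why a non-significant result cannot rule out inconsistency when few trials and events are available.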
So what can we conclude from the recent BMJ NMA? First, the analysis did appear to meet most of the relevant assumptions and criteria for quality. The chosen model fully preserved randomization across trials, inconsistency was generally low (all below the cut-off value of 50%, which is considered the limit for consistency [Lu and Ades, 2004]), and evidence of heterogeneity in direct comparisons was also low to moderate (the median between-trial variance τ² in the posterior distribution ranged from 0.03 to 0.12). However, although various secondary subset analyses were performed, no adjustments were made in the primary analysis for study-level characteristics such as study period, duration, or the use of low-dose aspirin, or for patient-level characteristics such as age, the number of other routine medications and the existence of chronic disease. Against this background, the evidence for an increased cardiovascular risk still appeared convincing, at least in the primary analysis, by virtue of the consistent demonstration of risk ratio point estimates greater than 1.3 for the majority (27 out of 35) of treatment–placebo comparisons across the five outcomes, and also with seven of the 35 treatment–placebo comparisons being statistically significant at the 0.05 level. A risk ratio of 1.3 was a level considered in advance to be clinically meaningful. There was, however, only limited evidence of a dose–response effect, and results were similar when the analysis was restricted to high-dose only trials. Finally, in a sensitivity analysis that included only trials with adjudicated events (which captured approximately 85% of events), there was no evidence of a statistically significant increase in risk for any of the treatments versus placebo for any outcome, except for rofecoxib versus placebo for myocardial infarction.
Therefore, when the primary and secondary analyses are considered together, the findings of Trelle and colleagues add evidence in favor of an overall increase in cardiovascular risk with the use of traditional NSAIDs; however, sufficient uncertainty remains that debate surrounding the exact level of that risk will likely continue for now. Perhaps results from current large-scale clinical trials can be included in an additional NMA in the future, by which time sufficient events will have occurred to enable an improved estimate of NSAID risk.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflict of interest statement
The authors declare no conflicts of interest in preparing this article.
