Abstract
Background:
Randomized controlled trials are the gold standard for demonstrating safety and efficacy of coronary devices with or without accompanying drug treatments in interventional cardiology. With the advent of latest-generation drug-eluting stents having enhanced technical attributes and long-term clinical benefits, the proof of incremental angiographic or long-term clinical efficacy becomes more challenging. The purpose of this review is to provide an overview of the most common and alternative study endpoints in interventional cardiology and their potential reimbursement value. Moreover, we describe the statistical limitations in demonstrating differences between treatment groups. Furthermore, careful endpoint recommendations for a given patient number are offered for future study designs.
Methods:
The number of patients per treatment group was estimated for various study designs such as noninferiority test hypotheses with hard clinical endpoints and various surrogate endpoints. To test for differences in various surrogate endpoint scenarios, the corresponding patient group sizes were explored. To evaluate these endpoints in terms of their reimbursement impact, preferred endpoints for technical appraisals in interventional cardiology at the National Institute for Health and Care Excellence (NICE) were used.
Results:
Even with the most stringent experimental control to reduce bias-introducing factors, studies with hard primary clinical endpoints such as the occurrence of major adverse cardiac events (MACE) or target-lesion revascularization (TLR) rates remain the gold standard, with numbers reaching into the 300–700 patient range per group. Study designs using loss in fractional-flow reserve (FFR) or stent-strut-coverage rates can be statistically formulated; however, the clinical ramifications for the patient remain to be discussed. Nonrandomized study designs with intrapatient angiographic controls in nontarget vessels may merit further thoughts and explorations.
Conclusions:
From a reimbursement perspective, the primary endpoints MACE and TLR are the best choices for a moderately sized study population of 500 patients per group. Angiographic endpoints, in particular minimal lumen diameter (MLD), are not useful in this context. The emerging endpoints such as loss in FFR or stent coverage require smaller patient populations. However, their impact on reimbursement-related decisions is limited.
Background
The Benestent trial [Serruys et al. 1994], conducted more than 20 years ago, is one of the most important landmark studies in interventional cardiology. It revealed that stenting was superior to plain old balloon angioplasty (POBA). Its ‘statistical engine’ was a continuity-corrected Chi-squared test (double-sided α = 5%, 80% power), based on an event rate of 30% in the POBA group and 18% in the bare-metal stent (BMS) group. Expected differences in event rates of 12% are a rare privilege for statisticians in today’s clinical practice. Since then, improvements in procedural outcomes, comedication regimens and devices have led to very small differences between two treatment groups. Therefore, clinical trials based on these small, expected differences are very challenging to conduct. If one ventures to compare two drug-eluting stents (DESs) in terms of binary-restenosis rates using the same statistical hypothesis as in the Benestent trial, a minimum of 984 patients per group would have to be recruited (double-sided α = 5%, 80% power, binary-restenosis rate = 5%, 50% reduction). Given bias-introducing factors, for example, comedication compliance with or without center effects, such a study would be a demanding undertaking.
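The figure of 984 patients per group can be checked with a short calculation. The sketch below assumes the classical normal-approximation formula for comparing two proportions with the Fleiss continuity correction; the function name and the choice of Python are illustrative and do not represent the nQuery implementation used for this review.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Patients per group for a two-sided test of two proportions.

    Normal approximation to the chi-squared test; optionally applies the
    Fleiss continuity correction. Illustrative helper, not nQuery output.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    diff = abs(p1 - p2)
    p_bar = (p1 + p2) / 2
    # Uncorrected sample size per group
    n = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2 / diff ** 2
    if continuity:
        # Fleiss continuity correction inflates the uncorrected estimate
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * diff))) ** 2
    return math.ceil(n)


# Binary-restenosis rate of 5% in the control group, reduced by 50% to 2.5%
print(n_per_group_two_proportions(0.05, 0.025))                    # 984
print(n_per_group_two_proportions(0.05, 0.025, continuity=False))  # 906
```

Under these assumptions, the continuity correction alone accounts for roughly 80 additional patients per group.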
This review provides sample-size estimates for standard designs in interventional cardiology such as ‘test for differences’ or ‘noninferiority’ designs of hard clinical and angiographic endpoints. In addition, other potential endpoints with the aim to minimize the number of patients for exploratory purposes are presented and discussed. The proposed endpoints are then gauged according to their reimbursement value.
Methods and results
Statistical analysis
All sample-size estimates were calculated with nQuery/nTerim version 2.0 (Statistical Solutions Ltd., Cork, Ireland). Study designs were classified by endpoint type (clinical or surrogate) and by test hypothesis (noninferiority or test for difference). For simplicity, we will refer to the ‘test for difference’ as a ‘superiority’ design.
Reimbursement impact
Due to the plethora of reimbursement systems worldwide, a grossly simplified approach was chosen to determine the level of impact for selected study endpoints in the field of interventional cardiology. The National Institute for Health and Care Excellence (NICE) in the UK publishes technology appraisals in a number of evaluation pathways for major indications. Within the framework of coronary artery disease (CAD), NICE appraised a total of five technologies: DESs [NICE, 2010a], BMSs [NICE, 2011], bio-absorbable scaffolds [NICE, 2014a], drug-coated balloons [NICE, 2010b] and CAD-relevant comedication [NICE, 2003]. These appraisals were examined for the frequency of the study endpoints they used, which formed the basis for the relative reimbursement impact assigned in this review.
Surrogate endpoint study designs
Angiographic endpoint/superiority
In the most commonly used study design, an angiographic benefit in the treatment group is shown by a significantly lower late lumen loss (LLL; in lesion or in segment). This corresponds to the following test hypothesis:
Ho: LLL in the treatment group is equal to or higher than the LLL in the control group.
Ha: LLL in the treatment group is lower than the LLL in the control group.
Angiographic endpoint/noninferiority
Another common design is defined by a noninferiority test hypothesis:
Ho: LLL in the treatment group is higher than or equal to the LLL in the control group plus a noninferiority margin ΔLLL.
Ha: LLL in the treatment group is lower than the LLL in the control group plus a noninferiority margin ΔLLL.
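For a continuous endpoint such as LLL, both hypotheses lead to the same textbook formula for the group size. The sketch below is a rough illustration only: it assumes a one-sided α of 2.5%, 80% power, a common standard deviation in both groups and a normal approximation instead of the t-distribution. The input values are hypothetical and are not meant to reproduce the nQuery estimates of this review.

```python
import math
from statistics import NormalDist


def n_per_group_means(effect, sd, alpha=0.025, power=0.80):
    """Patients per group for a one-sided comparison of two means.

    `effect` is the expected difference (superiority design) or, if the
    true difference is assumed to be zero, the noninferiority margin.
    Normal approximation with a common standard deviation `sd`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sd * (z_alpha + z_beta) / effect) ** 2)


# Hypothetical LLL inputs in mm
print(n_per_group_means(effect=0.3, sd=0.5))  # 44
print(n_per_group_means(effect=0.1, sd=0.4))  # 252
```

Because the effect enters the formula as a square, reducing the expected difference to one third increases the group size roughly ninefold.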
Alternative angiographic endpoints
A single-group design with intrapatient control measurements, such as reference mean lumen diameters in a nontarget vessel, may also be used when a patient control group is not feasible. This, however, may only be justified when lumen changes in the entire coronary vasculature are not expected. Kleber and coworkers investigated the effect of positive remodeling in target lesions as compared with reference diameters in nontarget vessels as an intrapatient control [Kleber et al. 2014]. They observed target-lesion lumen enlargement when compared with nontarget vessel mean lumen diameters within the same patient:
Ho: The mean lumen diameter in the treated segment (or lesion) of the target vessel is lower than or equal to the mean lumen diameter in the control segment (or lesion) of a nontarget vessel within the same patient.
Ha: The mean lumen diameter in the treated segment (or lesion) of the target vessel is larger than the mean lumen diameter in the control segment (or lesion) of a nontarget vessel within the same patient.
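Because each patient serves as his or her own control, this hypothesis corresponds to a paired comparison. A minimal sketch of the resulting patient number is given below, assuming a one-sided α of 2.5%, 80% power and a normal approximation to the paired t-test; the expected difference and the standard deviation of the within-patient differences are hypothetical values.

```python
import math
from statistics import NormalDist


def n_paired(mean_diff, sd_diff, alpha=0.025, power=0.80):
    """Total number of patients for a one-sided paired comparison.

    `mean_diff` is the expected within-patient difference (target minus
    nontarget vessel) and `sd_diff` the standard deviation of these
    differences. Normal approximation; a t-based calculation would
    return a slightly larger number.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd_diff / mean_diff) ** 2)


# Hypothetical inputs: 0.2 mm expected difference, 0.5 mm SD of differences
print(n_paired(mean_diff=0.2, sd_diff=0.5))  # 50
```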
Clinical endpoint/superiority
To show significantly lower event rates [e.g. target-lesion revascularization (TLR) or major adverse cardiac event (MACE)] the following test hypothesis is typically applied:
Ho: The event rate π1 in the treatment group is higher than or equal to the event rate π2 in the control group.
Ha: The event rate π1 in the treatment group is lower than the event rate π2 in the control group.
Clinical endpoint/noninferiority
Another common design is defined by the noninferiority test hypothesis:
Ho: The event rate π1 in the treatment group minus the event rate π2 in the control group is larger than or equal to the noninferiority margin Δ.
Ha: The event rate π1 in the treatment group minus the event rate π2 in the control group is smaller than the noninferiority margin Δ.
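The noninferiority hypothesis for event rates can be translated into a group size with the unpooled normal-approximation formula. The sketch below assumes a one-sided α of 2.5% and 80% power; the function name and all input values are hypothetical. Since group sizes depend strongly on these settings and on the method implemented in a given software package, the output should not be expected to match the nQuery estimates of this review.

```python
import math
from statistics import NormalDist


def n_per_group_noninferiority(p_treat, p_control, margin, alpha=0.025, power=0.80):
    """Patients per group for a noninferiority test of two event rates.

    Ho: p_treat - p_control >= margin; Ha: p_treat - p_control < margin.
    Unpooled normal approximation (illustrative, not nQuery output).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_treat * (1 - p_treat) + p_control * (1 - p_control)
    # Distance between the assumed true difference and the margin
    distance = margin - (p_treat - p_control)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / distance ** 2)


# Hypothetical efficacy endpoint: 10% event rate in both groups, 5% margin
print(n_per_group_noninferiority(0.10, 0.10, 0.05))   # 566
# Hypothetical safety endpoint: 1% event rate in both groups, 0.5% margin
print(n_per_group_noninferiority(0.01, 0.01, 0.005))  # 6217
```

The second call illustrates why rare safety events combined with narrow margins quickly lead to group sizes in the thousands.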
Fractional-flow reserve/noninferiority
Despite its lack of universal acceptance, a less common study design can be formulated based on a difference in fractional-flow reserve (FFR) between FFR values after the intervention and at the follow-up interval. Absolute FFR values can be obtained from the relevant literature for sample size estimates [Pijls et al. 2000; De Bruyne et al. 2012; Johnson et al. 2014]:
$\Delta\mathrm{FFR}_{\text{control}} = \mathrm{FFR}_{\text{control, post-PCI}} - \mathrm{FFR}_{\text{control, 6 months}}$
$\Delta\mathrm{FFR}_{\text{treatment}} = \mathrm{FFR}_{\text{treatment, post-PCI}} - \mathrm{FFR}_{\text{treatment, 6 months}}$
The noninferiority margin $\delta_{\mathrm{FFR}}$ can be defined as 0.05 or 0.10 depending on the expected treatment effect.
Ho: $\Delta\mathrm{FFR}_{\text{treatment}} - \Delta\mathrm{FFR}_{\text{control}} \geq \delta_{\mathrm{FFR}}$
that is, the investigational device is inferior to the device in the control group.
Ha: $\Delta\mathrm{FFR}_{\text{treatment}} - \Delta\mathrm{FFR}_{\text{control}} < \delta_{\mathrm{FFR}}$
that is, the investigational device is not inferior to the device in the control group.
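To get a feeling for how the margin and the measurement variability interact in this design, the group size can be tabulated over a small grid. The sketch below assumes a one-sided α of 2.5%, 80% power, no true difference in FFR loss between the groups and a normal approximation. The margins are the two values mentioned above; the standard deviations are hypothetical.

```python
import math
from statistics import NormalDist


def ffr_sample_size_grid(margins, sds, alpha=0.025, power=0.80):
    """Patients per group for each (margin, SD) pair in a noninferiority
    design on the loss in FFR, assuming no true difference between groups."""
    z_sum = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return {
        (margin, sd): math.ceil(2 * (sd * z_sum / margin) ** 2)
        for margin in margins
        for sd in sds
    }


grid = ffr_sample_size_grid(margins=(0.05, 0.10), sds=(0.05, 0.10, 0.15))
for (margin, sd), n in sorted(grid.items()):
    print(f"margin={margin:.2f}  SD={sd:.2f}  n per group={n}")
```

With a margin of 0.05 and a standard deviation of 0.10, this sketch returns 63 patients per group; doubling the margin reduces the group size to about one quarter.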
Vasomotility and mean lumen diameter
A potential endpoint to study endothelial cell regardless of smooth-muscle cell functionality can be defined in terms of drug-induced mean lumen diameter changes at baseline
Ho: The mean lumen diameter in the treated lesion at
Ha: Mean lumen diameter in the treated lesion at
Ratios of uncovered stent struts
The percentage of uncovered struts in two treatment groups could also be defined as a potential safety endpoint [Adriaenssens et al. 2014]. Moreover, ratios of uncovered struts could conceptually be determined at different time points within one group to document the time course of stent coverage (e.g. 4 weeks and 3 months). The rationale of this approach would be that if there are fewer uncovered struts as compared with a control group, the risk of stent thrombosis (ST) would be reduced. Recently, this design was used by Karjalainen and coworkers, who used a noninferiority margin of 5% and a standard deviation for their measurements of 5% [Karjalainen et al. 2015].
The corresponding test hypotheses can be formulated as follows with the difference in uncovered struts and the following definitions:
Δuncovered struts = percentage difference of uncovered stent struts between two treatment groups (BMS
H0: Δuncovered struts ⩾ δstrut coverage
HA: Δuncovered struts < δstrut coverage
Results
The most common angiographic endpoints are listed in Table 1. The most frequently used surrogate endpoint in interventional cardiology is late lumen loss (LLL) within the lesion or within a defined segment. In general, the larger the standard deviations of LLL, the more patients have to be recruited (Figure 1). With an expected difference in LLL of at least 0.3 mm, a superiority design would require fewer than 50 patients per treatment group independent of the assumed LLL standard deviations. However, the necessary number of patients can quickly increase into the 200–300 range if the expected LLL difference is in the range of 0.1 mm (top panel, Figure 1). The corresponding sample size three-dimensional plots are different for noninferiority and superiority designs (Figure 1: top
Table 1. Angiographic surrogate endpoints.
QCA, quantitative coronary angiography.

Figure 1. Sample-size estimates for a late lumen loss (LLL) superiority design with various common standard deviations and expected differences (top panel), and sample-size estimates for a noninferiority design with different noninferiority margins and commonly observed LLL standard deviations (bottom panel).
Other surrogate endpoints (Table 2) are also described in the literature, such as FFR [Pijls et al. 2000; De Bruyne et al. 2012; Johnson et al. 2014], which can be defined as a pressure drop across a lesion. Due to the novelty of the ‘loss in FFR’ as a primary endpoint, sample-size estimates were calculated for various common standard deviations and noninferiority margins (Figure 2). With a common standard deviation of 0.1 and a noninferiority margin of 0.4, 100 patients per group are needed.
Table 2. Other surrogate endpoints.
FFR, fractional flow reserve; OCT, optical coherence tomography; BVS, bioresorbable vascular scaffold; DCB, drug-coated balloon.

Figure 2. Sample-size estimates for a noninferiority design with difference in fractional-flow reserve (FFR) as the primary endpoint for various noninferiority margins and common standard deviations in FFR measurements.
Furthermore, the percentage of uncovered struts as a primary endpoint has gained popularity since rapidly covered stent struts may have an advantage in terms of the reduced risk of stent thrombosis (ST) [Tahara et al. 2011]. This would translate to a reduced duration of dual antiplatelet therapy (DAPT), a benefit to the patient. Figure 3 illustrates the number of patients per group in a noninferiority design for covered stent struts with various noninferiority margins and common standard deviations.

Figure 3. Sample-size estimates for a noninferiority design with the percentage of uncovered stent struts as the primary endpoint, various noninferiority margins and common standard deviations.
Table 3 lists clinical endpoints such as the TLR rate, MACE rate or the target-vessel failure (TVF) rate. ST and stroke rates may serve as safety endpoints for selected patient populations. Figure 4 illustrates the needed patient numbers for TLR in a noninferiority design. These estimates also hold true for MACE and TVF rates. Given an 8% expected event rate in the treatment group and a noninferiority margin of 4%, the group size would need to be 440 patients. The largest patient groups are necessary if the ST rate is chosen in a noninferiority design. Given the already-low incidence rates in the sub-1% range and noninferiority margins of 0.2–0.6%, thousands of subjects are required for a properly designed trial (Figure 5).
Table 3. Clinical endpoints.
CABG, coronary-artery bypass graft; DES, drug-eluting stents; MACE, major adverse cardiac event; TLR, target-lesion revascularization; Re-PCI, repeat percutaneous coronary intervention; TVR, target-vessel revascularization; TVF, target-vessel failure; MI, myocardial infarction; ST, stent thrombosis.

Figure 4. Sample-size estimates for a noninferiority design in terms of target-lesion revascularization (TLR) as the primary endpoint, with various noninferiority margins and expected TLR rates in the treatment group.

Figure 5. Sample-size estimates for a noninferiority design with stent-thrombosis (ST) rates as the primary endpoint with various noninferiority margins and expected ST rates in the treatment group.
The summarized study-population sizes are complemented with the postulated reimbursement value (Table 4). The reimbursement impact was based on the aforementioned NICE appraisals in the field of CAD. In the searched appraisals, MACE was identified as the most frequent endpoint for studying the clinical evidence for a particular technology. Myocardial infarction (MI) as a reimbursement-relevant endpoint was used in 17.6% of all referenced endpoints, followed by TLR, TVR and cardiac death, with 11.8%. Based on these findings, their relative impact was rated with 100% in the case of MACE, 80% in the case of MI, and 70% for TLR/TVR and cardiac death (Table 4).
Table 4. Study population sizes and postulated reimbursement value.
As of February 2016.
Based on NICE appraisals TA152, IPG492, TA71, TA236 and MTG1.
Based on NICE appraisal MIB2.
FFR, fractional flow reserve; MACE, major adverse cardiac event; MI, myocardial infarction; NICE, National Institute for Health and Care Excellence; ST, stent thrombosis; TLR, target-lesion revascularization; TVR, target-vessel revascularization.
Discussion
Late lumen loss
LLL, PS and MLD are based on edge-detection algorithms applied in quantitative coronary angiography (QCA), which requires a calibration based on the outer diameter of a contrast-dye-filled guiding-catheter tip (6F = 2.0 mm). With this reference calibre, all diameters obtained from the shadow images of an angiogram form the basis for computing LLL, PS and MLD. The major methodological advantage is that blinding to the treatment group can be elegantly achieved when the device type cannot be recognized in the angiogram. True blinding (investigator
Minimal lumen diameter
The advantages and disadvantages of LLL measurements also apply to the MLD. However, it is our opinion that the MLD seems to be hemodynamically indifferent because it does not consider the lesion length that impacts on the translesion pressure drop. This has been established in the human domain by Brosh and coworkers using FFR [Brosh et al. 2005], and also in experimental animal models [Young et al. 1975]; the latter demonstrated that there is a lesion-length-dependent pressure gradient across artificially produced stenoses. We should also not ignore the Navier–Stokes equation [Sherman, 1990] that provides the theoretical basis for pressure gradients across narrowed fluid boundaries defined by the diameter and length of the stenosis.
Fractional-flow reserve endpoints
By intuition, not every pressure drop across a lesion manifests itself as hemodynamically or clinically important. However, due to the averaging algorithm during pullback of the FFR wire, measurement variations are very small and in the ± 0.12 mmHg range [Pijls et al. 2000].
Even though a difference in FFR as an endpoint seems intuitive, it ought to be pointed out that a mere pressure drop independent of the lesion location may not have the merit to determine clinical efficacy. In addition, FFR cannot be interpreted in a linear fashion; that is, a decrease in FFR from 0.95 to 0.90 is certainly less severe than a decrease from 0.80 to 0.75. Only when the lesion locations have the same impact on the coronary circulation, for example, with a lesion inclusion criterion such as the proximal left-anterior descending artery, may it be worthwhile to further explore FFR endpoints. Needless to say, the absolute FFR values at follow up should be above the established cut-off value of 0.80 [De Bruyne et al. 2012]. A recently published randomized trial comparing DCB and DES treatment in
Strut coverage
Strut coverage, by definition, can only provide an estimate of the temporal healing characteristics. It is, by definition, not an efficacy marker for vessel patency. Furthermore, it appears important that the formation of a functional endothelium can neither be expected nor truly documented with optical coherence tomography (OCT). In the current interventional cardiology community, there is an unmet need to reduce the duration of DAPT. Nevertheless, given that a completely covered stent has a lower risk for ST, an estimate of how fast stent struts can be covered seems desirable. A novel healing-response endpoint that includes the ratios of uncovered and malapposed stent struts was proposed by the US National Institutes of Health and elaborated by Räber and coworkers, which seems to include more healing properties than the mere number of uncovered struts [Räber et al. 2015]. Unfortunately, there is no defined cut-off value for stent-strut coverage that translates to a no-risk situation for ST. There are literature reports, however, stating that when 90–95% of stent struts are covered, the risk of ST without DAPT is sufficiently reduced [Tahara et al. 2011]. Nonetheless, it is debatable whether the percentage of covered stent struts is an adequate measure to assess the risk of late ST and the continued need for DAPT. In the case of late ST, neoatheroma formation may be the culprit without underlying mechanical factors such as strut coverage.
Clinical endpoints
One side aspect of this review was to investigate other attractive endpoints to keep patient numbers low in early proof-of-concept trials. Clinical endpoints are most likely not the first approach when conducting early studies. However, it is indisputable that from a clinical standpoint and in terms of patient benefits, MACE, TVF or TLR rates are the most meaningful measures and the ‘gold standard’ for documenting efficacy. In this context, the importance of standardizing clinical endpoints in coronary stent trials remains unchanged [Cutlip et al. 2007]. Recently, a large noninferiority margin of 4.5% with event rates of 7% was used to demonstrate noninferiority between resorbable scaffolds and DES by Ellis and coworkers [Ellis et al. 2015].
TLR was suggested as the preferred efficacy endpoint by Silber and Herdeg, inasmuch as it establishes a direct cause-and-effect relationship between the lesion treatment and its failure [Silber and Herdeg, 2008].
Follow-up period
This review is an attempt to provide a first glance at study population sizes for preferred endpoints in patients with CAD. When biometric planning for a trial begins, comparable target populations and their expected outcomes for a specific treatment are elucidated first. If the number of patients is not manageable, study-budget-related questions gain importance. The provided estimates in this review may serve as a first guidance to eliminate possible trial designs that cannot be conducted with a reasonable number of patients. Often neglected, confounding factors may be introduced during the course of the follow up. By definition, once the study patients are discharged to the point of the follow up, there is a black box of confounding factors which may not be fully described. The patients’ reluctance to adhere to their DAPT may increase the risk of ST [Iakovou et al. 2005] or the level of exercise can have an impact on the recurrence of MI. Without the intention to delve into the postinterventional psychological aspects, there are also established relationships between compliance with comedication and psychological support for the patient [Warner et al. 2013].
Reimbursement
Showing a clinical or angiographic benefit in one treatment group is the first step towards demonstrating clinical efficacy. However, which incremental benefit is worth additional spending in a particular healthcare system? We will have difficulties explaining an LLL improvement of 0.1 mm to a health-insurance provider, unless these angiographic results translate into clinical improvements. To our knowledge, there is no accepted methodology available that gauges the reimbursement value for various endpoints as a function of the number of patients. A first step on this terrain is attempted with the endpoints illustrated in Figure 6 on the basis of the sample-size estimates in Figures 1–5. We are aware that this is a first crude attempt to relate endpoints, their sample-size estimates and their frequency of use in NICE appraisals to formulate a relative reimbursement value in %.

Number of patients
Given that TLR directly correlates with the treatment success or efficacy, as previously suggested [Räber et al. 2015], the reimbursement value of 70% seems adequate, whereas MACE remains the gold standard. In addition, TLR rates were the most useful endpoints of the main Markov transition-model inputs in a cost-efficacy study conducted by NICE [NICE, 2010b] and Bonaventura and coworkers, comparing DES and DCB angioplasty [Bonaventura et al. 2012]. The reimbursement value for the rate of ST as a primary endpoint is rather limited (20%) inasmuch as it is not an efficacy endpoint but a safety endpoint with a low incidence rate of 0.5–2.0% at 1 year. Despite this semiquantitative and certainly somewhat subjective attempt to portray these complex relationships of reimbursement value, clinical importance and study-population sizes, it seems obvious that angiographic endpoints are not the favourites in this context. Cardiac death is also of high interest from a reimbursement point of view. Nevertheless, there is an ethical dilemma due to the fact that there are no additional treatment costs when this endpoint is reached.
In this review, we refrained from quality-of-life endpoints, even though they have been used in various patient subsets, such as patients undergoing aortic-valve replacements [Tully et al. 2015]. In multimorbid patients, by nature, multiple factors contribute to the overall health status so that causal relationships between the primary treatment and the overall health status are difficult to determine.
Limitations
The estimated numbers of patients for a given test hypothesis are based on available literature references. They do not replace a properly conducted sample-size calculation for a particular design. Furthermore, most endpoints, such as those related to safety and efficacy, are not interchangeable for a given target population and the objective of a trial. The reimbursement value of each endpoint was gauged on the basis of NICE appraisals only; this basis should be expanded to other cost-benefit analyses.
Conclusions
In terms of reimbursement value, the primary endpoints MACE and TLR remain the best choice for a moderately sized study population of 500 patients per group. The angiographic endpoint MLD does not reflect all aspects of the hemodynamic environment distal to the lesion, and appears to be of low reimbursement impact. Even though it is desirable to refrain from angiographic endpoints from a reimbursement standpoint, other surrogate endpoints such as difference in FFR or strut coverage with smaller study populations may merit further exploration for proof-of-concept studies. Nevertheless, the emerging endpoints, such as loss in FFR or stent coverage, without clearly established clinical benefits, are not useful for reimbursement purposes. An intrapatient angiographic endpoint such as the mean lumen diameter in the target lesion as compared with a nontarget lesion may be useful if randomization is not possible.
Footnotes
Acknowledgements
We would like to express our gratitude to Viktor Breul (Medical Scientific Affairs, Aesculap AG, Tuttlingen, Germany) for his highly appreciated statistical support and expertise to verify the statistical basis of this review. We also wish to acknowledge Dr Christian Sperling (Medical Scientific Affairs, B. Braun Melsungen AG, Berlin, Germany) for his shared expertise.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflict of interest statement
The authors declare that there is no conflict of interest.
