Abstract
Background:
The averted events ratio (AER) is a recently developed estimand for non-inferiority active-control prevention trials with a time-to-event outcome. In contrast to the traditional rate ratio or rate difference, the AER is based on the number of events averted by each of the two treatments rather than on the number of events observed. The AER requires an assumption about either the background event rate (the counterfactual placebo incidence) or the counterfactual effectiveness of the control treatment. We develop and present sample size formulae for trials in which the AER is defined as the primary estimand, and draw comparisons with the conventional 95-95 method based on the rate ratio.
Methods:
We express sample size in terms of the expected number of events and required person-years follow-up in the control and experimental arms. Sample size formulae were based on Wald confidence intervals on a logarithmic scale, assuming the active and control treatments to be equally effective. When the AER is used, the sample size depends on whether the analysis will be based on the counterfactual placebo incidence or the counterfactual treatment effectiveness. For both approaches, and for the 95-95 method, sample size is a function of the background event rate, the effectiveness of the control treatment, the preservation-of-effect size (non-inferiority margin), the confidence limit for inferring non-inferiority, and the desired statistical power to demonstrate non-inferiority.
Results:
The smallest sample size is obtained using the AER based on the counterfactual placebo incidence. The advantage is greater the higher the effectiveness of the control treatment. For example, compared with the 95-95 method, it allows between a 2.6-fold and 4.0-fold reduction in sample size for 50% treatment effectiveness (depending on the non-inferiority margin), and between a 7.7-fold and 11.9-fold reduction for 80% treatment effectiveness. The AER based on the control treatment effectiveness is less efficient, requiring between 1.5-fold and 2.9-fold more events than the AER based on the counterfactual placebo incidence for 50% treatment effectiveness, and between 2.3-fold and 6.4-fold more for 80% treatment effectiveness, but it still requires smaller sample sizes than the 95-95 method. Sample size is highly sensitive to the non-inferiority margin: increasing the preservation-of-effect size from 50% to 60% implies a 1.84-fold increase in the sample size; from 60% to 70%, a 2.05-fold increase; and from 70% to 80%, a 2.55-fold increase.
Conclusion:
As well as having important advantages of interpretation, using the AER as the primary estimand in active-control non-inferiority trials permits smaller and more cost-effective studies. Ideally, the AER should be derived via the counterfactual placebo incidence when this is practicable.
Keywords
Introduction
Active-control trials, where an experimental treatment is compared with a standard treatment, are performed when the inclusion of a placebo control group is considered unethical.1 They are often conducted within a non-inferiority framework, where a critically important design decision is the specification of the non-inferiority margin.2,3 One approach to this problem is to attempt to demonstrate that the experimental treatment preserves a minimum fraction (typically 50%) of the effect of the standard, conventional treatment (preservation-of-effect criterion).4–7 The 95-95 method is a specific application of this approach for time-to-event outcomes.3,8,9 The name of the method derives from the use of (a) the 95% confidence interval from a meta-analysis of historical trials comparing the control treatment versus placebo to derive a conservatively low estimate for the intrinsic effect of the control treatment and (b) the 95% confidence interval from the active-control trial to derive a conservatively low estimate for the effect of the experimental treatment relative to the standard treatment. Although the 95% values are arbitrary, they have largely become a de facto standard.
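The fixed-margin arithmetic implied by the 95-95 method can be sketched as follows. This is our illustration rather than code from the article: the function name `fixed_margin_95_95` is hypothetical, and we assume that the historical effect of the control treatment is summarised by the lower 95% confidence limit of the placebo/control rate ratio, of which a fraction `preserve` of the log effect must be retained.

```python
import math


def fixed_margin_95_95(rr_pc_lower: float, preserve: float) -> float:
    """Hypothetical helper: non-inferiority margin on the experimental/control
    rate-ratio scale under a fixed-margin (95-95 style) calculation.

    rr_pc_lower: lower 95% confidence limit of the placebo/control rate ratio
        from the historical meta-analysis (a conservative control effect).
    preserve: fraction of the log control effect to be preserved, e.g. 0.5.
    """
    if rr_pc_lower <= 1:
        raise ValueError("the control effect must exceed 1 on the placebo/control scale")
    if not 0 < preserve < 1:
        raise ValueError("preserve must lie strictly between 0 and 1")
    # The allowable loss is the unpreserved fraction of the log control effect.
    return math.exp((1 - preserve) * math.log(rr_pc_lower))


margin = fixed_margin_95_95(2.64, 0.5)
# Non-inferiority would be declared if the upper 95% confidence limit of the
# experimental/control rate ratio in the active-control trial fell below `margin`.
print(round(margin, 3))
```

With a lower limit of 2.64 (the value used in the DISCOVER design described in the Example) and 50% preservation, the margin is the square root of 2.64, about 1.62.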
With the 95-95 method, inference is performed on a log rate ratio (or log hazard ratio) scale.8,9 However, we have pointed out that this can lead to major problems of interpretation.10–12 First, and most crucially, the method may fail to formally demonstrate non-inferiority even when the experimental treatment is demonstrably highly effective.11 Second, inference can be highly unstable, in that a slight re-distribution of endpoints between the two treatment groups can reverse the conclusion regarding non-inferiority.11 Third, it paradoxically gives more precise inference in less adherent populations than in highly adherent populations.10 Finally, the method is predicated on significance testing and lacks an interpretable estimand.4
These problems can be avoided by using an estimand we have developed, the averted events ratio (AER), which essentially considers inference on a rate difference scale.10,12 With this estimand, the experimental and standard treatments are compared in terms of averted events rather than observed events. The AER has a clinically appealing preservation-of-effect interpretation: the proportion of events prevented by using the experimental treatment that would otherwise have been prevented by the standard treatment.10
We have exemplified the AER in the field of HIV pre-exposure prophylaxis (PrEP), in which drugs are taken by HIV-negative individuals to prevent the acquisition of infection.13 Several active-control, non-inferiority trials have been conducted using the licensed two-drug combination tenofovir disoproxil fumarate and emtricitabine (TDF/FTC) as the control arm; all were designed and analysed using the 95-95 method.14,15 TDF/FTC is highly efficacious and reduces the risk of HIV acquisition by ~95% if taken as prescribed.16 These trials were consequently very large, with more than 10,000 person-years of follow-up (PYFU), in order to generate the required number of endpoints. However, we have previously shown that much tighter inference on non-inferiority can be achieved if the analysis is performed via the AER, suggesting that smaller, more affordable trials may be possible.11,17 In this article, we describe how to determine the sample size based on the AER and assess the reduction in sample size compared with the 95-95 method. We hope that this may stimulate the use of the AER, which, despite its broad applicability,12 has so far been adopted only slowly.
Methods
Definition of the AER
We conceptualise the active-control trial as including a hypothetical third arm that receives no intervention (called the counterfactual placebo group), in addition to the experimental and control arms. The (unobserved) incidence rate in the counterfactual placebo group can also be thought of as the background incidence in the study population. Denote the counterfactual placebo, control, and experimental arms by the subscripts P, C, and E, respectively. Let λ (subscripted by P, C, or E) represent the relevant incidence rate. The AER is defined as

$$\mathrm{AER} = \frac{\lambda_P - \lambda_E}{\lambda_P - \lambda_C} \qquad (1)$$
Alternatively, the AER can be expressed in terms of the counterfactual effectiveness of the control treatment relative to placebo (
We note that this formulation involves the standard rate ratio of the observed event rates. However, the transformation converts this into the ratio of the averted event rates.18
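Because the two formulations are algebraically equivalent, a quick numerical check can make the transformation concrete. Taking the AER as (λ_P − λ_E)/(λ_P − λ_C), the ratio of averted event rates, and writing θ_C = 1 − λ_C/λ_P for the counterfactual effectiveness of the control treatment (our notation, which may differ from the article's), substituting λ_P = λ_C/(1 − θ_C) gives AER = [1 − (1 − θ_C)·λ_E/λ_C]/θ_C. The function names are hypothetical and the rates are illustrative values, not trial data.

```python
import math


def aer_from_rates(lam_p: float, lam_c: float, lam_e: float) -> float:
    """Hypothetical helper: AER as the ratio of averted event rates."""
    if lam_p <= lam_c:
        raise ValueError("the control treatment must avert events (lam_p > lam_c)")
    return (lam_p - lam_e) / (lam_p - lam_c)


def aer_from_effectiveness(rate_ratio: float, theta_c: float) -> float:
    """Hypothetical helper: AER from the observed rate ratio lam_e / lam_c and
    the assumed counterfactual control effectiveness theta_c."""
    if not 0 < theta_c < 1:
        raise ValueError("theta_c must lie strictly between 0 and 1")
    return (1 - (1 - theta_c) * rate_ratio) / theta_c


# Illustrative rates per 100 person-years (hypothetical values).
lam_p, lam_c, lam_e = 4.0, 1.0, 2.5
theta_c = 1 - lam_c / lam_p  # control effectiveness implied by these rates
via_rates = aer_from_rates(lam_p, lam_c, lam_e)
via_effectiveness = aer_from_effectiveness(lam_e / lam_c, theta_c)
assert math.isclose(via_rates, via_effectiveness)
print(via_rates, via_effectiveness)
```

Both routes give an AER of 0.5: the experimental treatment averts half as many events as the control treatment, although its observed rate ratio relative to control is 2.5.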
As will be shown in the next section, the required sample size depends on whether formulation (1) or (2) is used for the AER.
Sample size for the AER
For simplicity, we assume that we observe equal PYFU, denoted by $F$, in the control and experimental arms. Let $X_C$ and $X_E$ be random variables denoting the number of observed events, where
Let
We assume throughout that the experimental and control treatments are equally effective (
Similarly, when estimating the AER via the counterfactual treatment effectiveness (equation (2)), the required expected number of events in each of the active treatment arms is:
The ratio of the number of events required when using
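The sample size calculations in this section can be sketched as follows. This is our own derivation, not the authors' code, and their displayed equations may differ in form. We assume Poisson event counts, equal follow-up in the active arms, equal effectiveness θ of both treatments, a non-inferiority margin m on the AER scale (for example 0.5), and a one-sided lower confidence limit at level α. For the placebo-incidence formulation, the delta method gives Var(log AER) ≈ 2(1 − θ)²/(Dθ²) when D events are expected per arm, so D = 2(1 − θ)²(z_{1−α} + z_{1−β})²/(θ² log² m). For the effectiveness formulation, we assume the Wald interval for the log rate ratio (variance 2/D) is transformed to the AER scale, so that non-inferiority requires the upper limit of the rate ratio to fall below (1 − mθ)/(1 − θ). All function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_sum(power: float = 0.90, alpha: float = 0.05) -> float:
    """z_{1-alpha} + z_{power} for a one-sided lower confidence limit at level alpha."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)


def events_aer_placebo(theta: float, margin: float,
                       power: float = 0.90, alpha: float = 0.05) -> float:
    """Expected events per active arm when the AER uses a known placebo incidence."""
    return (2 * (1 - theta) ** 2 * z_sum(power, alpha) ** 2
            / (theta ** 2 * math.log(margin) ** 2))


def events_aer_effectiveness(theta: float, margin: float,
                             power: float = 0.90, alpha: float = 0.05) -> float:
    """Expected events per active arm when the AER uses a known control effectiveness."""
    rr_margin = (1 - margin * theta) / (1 - theta)  # rate ratio at which AER = margin
    return 2 * z_sum(power, alpha) ** 2 / math.log(rr_margin) ** 2


for theta in (0.5, 0.8):
    print(theta,
          round(events_aer_placebo(theta, 0.5), 1),
          round(events_aer_effectiveness(theta, 0.5), 1))
```

For θ = 0.5 and a 50% margin, this gives about 36 events per arm via the placebo incidence and about 104 via the control effectiveness. For θ = 0.8 the placebo-based figure falls to about 2 events per arm, a count at which an asymptotic approximation is unlikely to be reliable.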
Sample size for the 95-95 method
With the 95-95 method, effect preservation is measured by
The ratio of the number of events when using the 95-95 method rather than
Alternatively, when the comparator is
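Under the same assumptions, the corresponding 95-95 calculation can be sketched as below. We assume that effect preservation is assessed on the log rate ratio scale, so that with a conservative control effectiveness θ the margin for the experimental/control log rate ratio is (1 − m)·|log(1 − θ)|, and that the Wald interval for the log rate ratio has variance 2/D. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def events_95_95(theta: float, margin: float,
                 power: float = 0.90, alpha: float = 0.05) -> float:
    """Hypothetical helper: expected events per active arm for the 95-95 method
    (fixed-margin sketch, equal true effectiveness in both arms)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)
    log_margin = (1 - margin) * -math.log(1 - theta)  # allowable log rate ratio
    return 2 * z ** 2 / log_margin ** 2


print(round(events_95_95(0.5, 0.5), 1))
```

For θ = 0.5 and a 50% margin this gives about 143 events per arm; the requirement rises steeply as m increases because (1 − m) enters the denominator squared.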
Determining PYFU
The formulae in the previous sections give the required number of events in each of the active arms. To obtain the required number of events in the counterfactual placebo arm (with equivalent follow-up to the active arms), equations (3), (4), and (6) are divided by
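Converting events into follow-up is simple arithmetic once the background incidence is specified. Assuming equal effectiveness θ in both active arms, so that λ_C = (1 − θ)λ_P, the sketch below divides the events per active arm by (1 − θ), which we take to be the divisor for counterfactual placebo events with equivalent follow-up, and divides by λ_C to obtain PYFU per arm. The function name and the incidence of 5 per 100 person-years are hypothetical.

```python
def placebo_events_and_pyfu(events_per_arm: float, theta: float, lam_p: float):
    """Hypothetical helper returning (counterfactual placebo events, PYFU per active arm).

    events_per_arm: required expected events in each active arm
    theta: assumed effectiveness of both active treatments
    lam_p: assumed background (counterfactual placebo) incidence per person-year
    """
    if not 0 < theta < 1 or lam_p <= 0:
        raise ValueError("need 0 < theta < 1 and a positive background incidence")
    lam_c = (1 - theta) * lam_p            # incidence in each active arm
    placebo_events = events_per_arm / (1 - theta)
    pyfu_per_arm = events_per_arm / lam_c
    return placebo_events, pyfu_per_arm


# Hypothetical inputs: 36 events per arm, 50% effectiveness, 5 per 100 person-years.
placebo_events, pyfu = placebo_events_and_pyfu(36, 0.5, 0.05)
print(placebo_events, round(pyfu))
```

Here 36 events per arm correspond to 72 counterfactual placebo events and about 1,440 PYFU per arm.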
Asymptotic assumptions
The formulae above are based on Wald confidence intervals derived via Taylor series expansions and are therefore only valid asymptotically. For the AER based on the counterfactual placebo incidence (
Results
Sample size for the different estimands
Table 1 shows the required number of events per active arm to achieve 90% power to demonstrate non-inferiority based on the lower 5% confidence limit, according to the estimand, the desired non-inferiority margin (
Sample size to demonstrate non-inferiority.
The table shows the values needed to achieve 90% power to demonstrate that the lower 5% confidence limit (CL) will exceed the specified non-inferiority margin.
To obtain the required PYFU per arm, values in the table should be multiplied by
Values should be scaled by 0.7219 for 80% power and 5% CL; by 1.2270 for 90% power and 2.5% CL; by 0.9165 for 80% power and 2.5% CL.
Derived from profile-likelihood confidence intervals.
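The scaling factors in the footnote are consistent with required events being proportional to (z_{1−α} + z_{1−β})², as in Wald-based formulae. The short check below, our own and assuming that proportionality, recomputes them relative to 90% power and a 5% confidence limit; it need not apply exactly to entries derived from profile-likelihood intervals. The function name is hypothetical.

```python
from statistics import NormalDist


def scale_factor(power: float, alpha: float,
                 ref_power: float = 0.90, ref_alpha: float = 0.05) -> float:
    """Ratio of (z_{1-alpha} + z_{power})^2 to the reference design's value."""
    nd = NormalDist()

    def z_sq(p: float, a: float) -> float:
        return (nd.inv_cdf(1 - a) + nd.inv_cdf(p)) ** 2

    return z_sq(power, alpha) / z_sq(ref_power, ref_alpha)


for power, alpha in [(0.80, 0.05), (0.90, 0.025), (0.80, 0.025)]:
    print(power, alpha, round(scale_factor(power, alpha), 4))
```

The computed values agree with 0.7219, 1.2270, and 0.9165 to within rounding in the last digit.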
To illustrate how to use Table 1, consider
The number of required counterfactual placebo endpoints is shown graphically in Figure 1, according to the value of

Sample size in terms of the required number of events in the counterfactual placebo group, by non-inferiority margin and effectiveness of the control treatment: (a) AER based on counterfactual placebo incidence, (b) AER based on control treatment effectiveness, (c) 95-95 method.

Sample size in terms of the required number of events in each active treatment group, by non-inferiority margin and effectiveness of the control treatment: (a) AER based on counterfactual placebo incidence, (b) AER based on control treatment effectiveness, (c) 95-95 method.
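If the events required with the AER based on the counterfactual placebo incidence are proportional to (1 − θ)²/(θ² log² m), as in the delta-method derivation we assume for that formulation, the effectiveness θ cancels when two margins are compared, and the relative increase depends only on log m. The sketch below, with a hypothetical function name, computes these ratios.

```python
import math


def margin_inflation(m_from: float, m_to: float) -> float:
    """Factor by which required events rise when the margin moves from m_from
    to m_to, assuming events are proportional to 1 / log(m)^2 (theta cancels)."""
    return math.log(m_from) ** 2 / math.log(m_to) ** 2


for m_from, m_to in [(0.5, 0.6), (0.6, 0.7), (0.7, 0.8)]:
    print(f"{m_from:.0%} -> {m_to:.0%}: {margin_inflation(m_from, m_to):.2f}-fold")
```

Under this assumption, moving the preservation-of-effect size from 50% to 60%, 60% to 70%, and 70% to 80% gives 1.84-fold, 2.05-fold, and 2.55-fold increases, respectively.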
Comparison of sample sizes between the different approaches
The ratio of sample sizes required with the three different approaches is shown in Figure 3. Using

Comparison of sample sizes under different analytical approaches: (a) AER based on counterfactual placebo incidence versus 95-95 method, (b) AER based on counterfactual placebo incidence versus AER based on control treatment effectiveness, (c) AER based on control treatment effectiveness versus 95-95 method.
If
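Under the delta-method and fixed-margin expressions we assume for these two approaches, the normal quantiles cancel in the ratio of 95-95 events to events for the AER based on the counterfactual placebo incidence, leaving θ² log² m / [(1 − θ)²(1 − m)² log²(1 − θ)]. The sketch below, with a hypothetical function name, evaluates it at margins of 80% and 50%.

```python
import math


def ratio_95_95_vs_aer_placebo(theta: float, margin: float) -> float:
    """Events needed by the 95-95 method divided by events needed by the
    placebo-incidence AER, under the assumed Wald-type formulae."""
    num = theta ** 2 * math.log(margin) ** 2
    den = (1 - theta) ** 2 * (1 - margin) ** 2 * math.log(1 - theta) ** 2
    return num / den


for theta in (0.5, 0.8):
    ratios = [ratio_95_95_vs_aer_placebo(theta, m) for m in (0.8, 0.5)]
    print(theta, [round(r, 1) for r in ratios])
```

The resulting ranges, about 2.6 to 4.0 for θ = 0.5 and 7.7 to 11.9 for θ = 0.8, agree with the reductions reported in the Abstract, which suggests, though does not prove, that the assumed expressions match those used in the article.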
Simulation
For all of the scenarios in Table 1, we simulated 10,000 trials to compare the empirical power with the nominal power of 90% (Appendix, Section 5 in the Supplementary Material). For the 95-95 method and
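The authors' simulations are described in the Supplementary Material; what follows is only our minimal sketch of such a check for the AER based on the counterfactual placebo incidence. We assume Poisson event counts, follow-up scaled to one unit, equal effectiveness θ in both arms, and a one-sided 5% lower Wald limit on log AER. The helpers `poisson` and `empirical_power_aer_placebo` are hypothetical, and 2,000 trials are used rather than 10,000 to keep the run short.

```python
import math
import random
from statistics import NormalDist


def poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (fine for moderate means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def empirical_power_aer_placebo(theta: float, margin: float, events_per_arm: float,
                                n_trials: int = 2000, alpha: float = 0.05,
                                seed: int = 1) -> float:
    """Proportion of simulated trials in which the lower Wald limit for log AER,
    computed with the placebo incidence treated as known, exceeds log(margin)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)
    lam_active = events_per_arm            # follow-up scaled so that F = 1
    lam_p = lam_active / (1 - theta)       # assumed known background incidence
    successes = 0
    for _ in range(n_trials):
        x_c = poisson(rng, lam_active)
        x_e = poisson(rng, lam_active)     # equal effectiveness in both arms
        averted_c, averted_e = lam_p - x_c, lam_p - x_e
        if averted_c <= 0 or averted_e <= 0:
            continue                       # log AER undefined: counted as a failure
        log_aer = math.log(averted_e) - math.log(averted_c)
        se = math.sqrt(x_e / averted_e ** 2 + x_c / averted_c ** 2)
        if log_aer - z * se > math.log(margin):
            successes += 1
    return successes / n_trials


power = empirical_power_aer_placebo(theta=0.5, margin=0.7, events_per_arm=134.63)
print(f"empirical power: {power:.3f}")
```

The value of about 135 events per arm is what the delta-method formula we assume for this approach gives for θ = 0.5, a 70% margin, and 90% power, so the printed proportion should lie close to 0.90.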
Example
DISCOVER was a randomised, double-blind, active-control trial comparing two PrEP regimens in gay men at high risk of HIV infection: TAF/FTC (experimental) versus TDF/FTC (control).15
The primary endpoint was an incident HIV infection. Sample size was calculated using the 95-95 method with the following parameters, based on data from three historical placebo-controlled trials of TDF/FTC: lower bound of the 95% CI for the rate ratio = 2.64 (which translates to
It transpired that the DISCOVER investigators had seriously underestimated the level of treatment adherence, and therefore the effectiveness of the PrEP regimens. In the actual trial, only 17 endpoints were observed in total (6 TAF/FTC, 11 TDF/FTC), against the anticipated 144. Non-inferiority was formally achieved (rate ratio = 0.55, 95% CI 0.20–1.48), but this finding was highly fragile.11 In a post hoc calculation, assuming both treatments were 95% effective, we estimated that the actual power of the trial, when analysed by the 95-95 method, was only 14%.
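The fragility of this result can be illustrated with the reported counts. The sketch below computes a Wald 95% confidence interval for the log rate ratio assuming equal follow-up in the two arms (our simplification), which gives about 0.55 (0.20 to 1.47), close to the reported interval. It also assumes a 95-95 margin of √2.64 ≈ 1.62, that is, 50% preservation of the historical lower limit; the preservation fraction is our assumption. The function name `rate_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def rate_ratio_ci(events_e: int, events_c: int, pyfu_e: float = 1.0,
                  pyfu_c: float = 1.0, level: float = 0.95):
    """Rate ratio (experimental/control) with a Wald interval on the log scale."""
    rr = (events_e / pyfu_e) / (events_c / pyfu_c)
    se = math.sqrt(1 / events_e + 1 / events_c)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


margin = math.sqrt(2.64)  # assumed 95-95 margin (50% preservation)
for events_e, events_c in [(6, 11), (7, 10)]:
    rr, lower, upper = rate_ratio_ci(events_e, events_c)
    verdict = "non-inferior" if upper < margin else "not shown"
    print(f"{events_e} vs {events_c}: RR {rr:.2f} ({lower:.2f} to {upper:.2f}), {verdict}")
```

Moving a single endpoint from the control to the experimental arm (7 versus 10) raises the upper limit to about 1.84, so non-inferiority would no longer be demonstrated.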
Discussion
Some readers may be surprised that using the AER can reduce sample sizes so dramatically while remaining a valid estimand. This is due to a fundamental paradox with the rate ratio: as adherence to treatment increases, the number of observed events decreases and the confidence interval for this measure becomes increasingly wide.10 The AER avoids this paradox and gives tighter inference the higher the adherence. It is important to note that this advantage does not arise automatically: it comes from making an assumption about one of the counterfactual parameters, and this assumption needs to be approximately correct for valid inference.
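The paradox can be seen by holding the background incidence and follow-up fixed and increasing the effectiveness θ, which here stands in for adherence. This sketch uses our assumed Poisson and delta-method approximations and a hypothetical 100 expected events in an untreated group with the same follow-up as each active arm; the function names are also hypothetical.

```python
import math


def se_log_rr(theta: float, placebo_events: float) -> float:
    """SE of the log rate ratio; placebo_events = lam_p * F for one arm's follow-up."""
    events = (1 - theta) * placebo_events  # expected events per active arm
    return math.sqrt(2 / events)


def se_log_aer_placebo(theta: float, placebo_events: float) -> float:
    """Delta-method SE of log AER when the placebo incidence is treated as known."""
    return math.sqrt(2 * (1 - theta) / (theta ** 2 * placebo_events))


thetas = (0.5, 0.7, 0.9)
se_rr = [se_log_rr(t, 100) for t in thetas]
se_aer = [se_log_aer_placebo(t, 100) for t in thetas]
for t, a, b in zip(thetas, se_rr, se_aer):
    print(t, round(a, 3), round(b, 3))
```

The standard error of the log rate ratio grows as θ rises, whereas that of the log AER shrinks: at θ = 0.9 they are about 0.45 and 0.05, respectively.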
Another key finding from our analysis is that much smaller sample sizes are obtained if the AER is estimated via the counterfactual placebo incidence rate rather than the counterfactual treatment effectiveness, particularly when effectiveness is high. In HIV prevention research, the critical importance of estimating the background incidence rate is now widely recognised, and various ways of achieving this have been proposed.20 In other contexts, depending on the epidemiology of the disease and the availability of surveillance systems, estimating the background event rate may not be feasible, and using the counterfactual treatment effectiveness may be the only possible approach.12
Specifying the sample size parameters
Here we address a possible point of confusion in using the AER. In the analysis, the AER is estimated via either the counterfactual treatment effectiveness or the counterfactual placebo incidence. However, at the design stage, estimating the sample size requires specification of both parameters, regardless of the analytical approach. Both counterfactual parameters will usually be subject to considerable uncertainty, and to avoid an under-powered study, it is prudent to use conservatively low values. If using the counterfactual treatment effectiveness, one can adopt the same approach as with the 95-95 method. We note that the term ‘constancy assumption’ is somewhat misleading: the actual effectiveness in the trial must be equal to or greater than this value for valid inference about non-inferiority. Also, between planning the trial and its completion, further information may emerge, and there is no objection in principle to using updated estimates in the final analysis, provided these are carefully justified. Our sample size formulae involve single, fixed values for
Supplemental Material
sj-docx-1-ctj-10.1177_17407745251377435 – Supplemental material for Sample size estimation for the averted events ratio
Supplemental material, sj-docx-1-ctj-10.1177_17407745251377435 for Sample size estimation for the averted events ratio by David T Dunn, Oliver T Stirrup and David V Glidden in Clinical Trials
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: DTD was supported by the UK Medical Research Council grant MC_UU_00004/03 and MC_UU_00004/07. DVG was supported by US National Institutes of Health grant R01AI143357.
References
