Abstract
Multi-armed multi-stage designs evaluate experimental treatments using a control arm at interim analyses. Incorporating response-adaptive randomisation in these designs allows early stopping, faster treatment selection and more patients to be assigned to the more promising treatments. Existing frequentist multi-armed multi-stage designs demonstrate that the family-wise error rate is strongly controlled, but they may be too conservative and lack power when the experimental treatments are very different therapies rather than doses of the same drug. Moreover, the designs use a fixed allocation ratio. In this article, Fisher’s least significant difference method extended to group-sequential response-adaptive designs is investigated. It is shown mathematically that the information time continues after dropping inferior arms, and hence the error-spending approach can be used to control the family-wise error rate. Two optimal allocations were considered. One ensures efficient estimation of the treatment effects and the other maximises the power subject to a fixed total sample size. Operating characteristics of the group-sequential response-adaptive design for normal and censored survival outcomes based on simulation and redesigning the NeoSphere trial were compared with those of a fixed-sample design. Results show that the adaptive design attains efficient and ethical advantages, and that the family-wise error rate is well controlled.
Introduction
Group sequential analysis, where data are sequentially evaluated over time and which allows stopping the trial early for success or futility and dropping of inferior treatments at interim analyses, has been well accepted from a regulatory review perspective in terms of saving time and resources and being more ethical. 1 Incorporating response-adaptive randomisation in a group sequential design can be more ethical by assigning more patients to the more promising treatments utilising the cumulative patients’ responses observed during the course of the trial. Some response-adaptive randomisation methods can target a pre-specified optimal allocation that is based on some optimality criterion. They are the so-called ‘optimal’ response-adaptive randomisation methods.
Methods combining group sequential tests with response-adaptive randomisation have been proposed. Jennison and Turnbull 2 derived theory to support that the combined approach still maintains the overall error rates for two-armed normal trials with known variances. The authors proved that the joint distribution of the test statistics has a standard form similar to that for a group-sequential non-adaptive design, but with the additional feature that the information level can depend on previous test statistics. In addition, a reduction in the inferior treatment number can be achieved at a cost of a slight increase in the expected total sample size. Morgan and Coad 3 compared several adaptive allocation rules in a group sequential setting for two-armed binary trials, including two urn-model type designs, the doubly-adaptive biased coin design (DBCD) and the sequential maximum likelihood estimation rule, which minimises the expected number of failures and is a special case of the DBCD. Among the designs that they investigated, the drop-the-loser rule 4 is found to be the most efficient method for achieving the competing objectives of reducing the expected number of failures and the expected total sample size. Zhu and Hu 5 studied the combined approach for two-armed clinical trials with normal and binary responses. By considering monitoring the adaptive design at a continuous information time, the authors proved that the sequence of test statistics converges to a Brownian motion in distribution and asymptotically satisfies the canonical joint distribution proposed by Jennison and Turnbull 6 for standard group sequential designs. Extension of the methods to two-armed censored survival trials was discussed in Liu and Coad. 7
For multi-armed clinical trials, in addition to the inflation of the type I error rate caused by sequential testing, one needs to ensure that the family-wise error rate induced by multiple treatment comparisons is preserved. One simple approach is the Bonferroni adjustment to control the nominal type I error rate for each pairwise test. The Bonferroni approach can be used in the group sequential setting and allows unequal allocation across treatment groups, and, if desired, different shapes of critical boundaries for different pairwise comparisons. Although the approach strongly controls the overall type I error rate, it can be too conservative, at the price of losing power. The multi-armed multi-stage (MAMS) design,8,9 which in essence is a pairwise comparison approach that simultaneously compares multiple arms with the control at each analysis, allows dropping of inferior treatments at interim analyses when considering efficacy and futility boundaries. It also strongly controls the family-wise error rate.10,11 Alternatively, one may choose to control the per-hypothesis error rate instead. Another approach to group sequential monitoring of multi-armed clinical trials is to use a global test. Jennison and Turnbull6,12 derived critical boundaries analogous to Pocock’s and the O’Brien and Fleming boundaries. These were derived based on multi-armed normal trials with equal variances and equal treatment allocation. For the unequal variances case, Proschan et al. 13 suggested obtaining the critical boundaries by simulation. Alternatively, by the significance level approach, the critical boundaries derived under the assumption of equal variances can be used to give an approximate test. 6 Incorporating response-adaptive sampling in such a group sequential global test for multi-armed trials was investigated by Liu and Coad. 14 Their simulation results showed that, using the significance level approach, the same critical boundaries can be used to give an approximate test and the use of the combined approach can preserve the advantages of both group sequential analysis and optimal response-adaptive randomisation.
Previous work in Liu and Coad7,14 has shown that the same critical boundaries as for a group-sequential non-adaptive design can be used approximately for both multi-armed and two-armed group-sequential response-adaptive clinical trials, and that they yield good operating characteristics. The idea of this study is that this work can be incorporated in an extension of the two-stage Fisher’s least significant difference (LSD) method, which consists of (i) a global test that sequentially monitors the homogeneity of multi-armed treatment effects and (ii) unadjusted two-armed pairwise tests if the global null hypothesis is rejected. Moreover, the extension allows dropping inferior arms at interim analyses and using response-adaptive randomisation to assign more patients to the more promising treatments. Fisher’s LSD method is more powerful than Scheffé’s method and is considered to be one of the most powerful multiple comparison methods. 15 It can serve as an alternative to monitoring pairwise comparisons using MAMS designs.
Proschan et al. 13 considered the group sequential Fisher’s LSD method in the cases of equal allocation and fixed unequal allocation determined prior to the commencement of the experiment. For unbalanced models, Hayter 16 discussed how to adjust testing to control the family-wise error rate for the LSD method for trials with more than four treatments. No adjustment is needed for three-treatment comparisons. It is possible to increase the power for the remaining pairwise comparisons at a later stage when any pairwise null hypothesis has been rejected and the inferior treatment has been dropped. Follmann et al. 17 proposed sequentially rejective procedures to alleviate the issue of power loss using critical boundaries for the subsequent tests based on the remaining number of treatments. Nevertheless, the authors pointed out that the sequentially rejective procedures do not necessarily strongly control the family-wise error rate. In practice, the choice between pairwise tests and a global test depends on the aim of a trial. Here, we will focus on Fisher’s LSD method extended to the group sequential design with optimal response-adaptive randomisation. We will describe how the overall type I error rate is controlled in the group sequential Fisher’s LSD method. More specifically, information time will be shown to be continuously increasing when the trial proceeds, even when inferior arms have been dropped at interim analyses. The combined approach in the Bayesian paradigm has been explored. With the advantage of flexibility in Bayesian designs, one can also consider adding experimental arms to a platform clinical trial. 18 However, unlike the frequentist approach, the Bayesian one does not focus on controlling the type I error rate, which is a common requirement by regulatory authorities. We focus on the frequentist approach here.
The form of the test for the analogue of Fisher’s LSD method generalised to group-sequential response-adaptive designs is described in Section 2. It is also shown that the information time continues after dropping inferior arms for immediate and censored survival outcomes. Two response-adaptive randomisation functions are presented in Section 3. They are both a function of the current and the optimal allocation proportions. In particular, optimal allocation proportions for multi-armed trials and their update after some inferior arms have been dropped are discussed. Operating characteristics of the extension of Fisher’s LSD method investigated via simulation of three-armed normal and censored survival trials and redesigning a four-armed binary trial are summarised in Section 4. Conclusions of the findings and some discussion of the trial design in practice are in Section 5. The definitions of the two randomisation functions and simulation results for some further scenarios are provided in the Appendix.
Form of test
An extension of Fisher’s LSD method 13 to group-sequential response-adaptive designs is investigated in this paper. First, a global test statistic is monitored sequentially to test for the homogeneity of treatment effects. If the global null hypothesis is rejected, unadjusted pairwise comparisons are conducted at this and subsequent looks if the trial proceeds. Inferior treatments can be dropped after pairwise comparisons. Note that each unadjusted pairwise comparison is essentially a two-sample test. One can conduct all pairwise comparisons. In a clinical trial, the primary interest is usually the comparison of each experimental treatment to the control. For simplicity, we illustrate the design for three-armed normal trials first, which can be applied to other types of responses and extended to more than three treatment arms, as described in later sections.
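The two-stage logic described above can be sketched for a single analysis as follows. This is a minimal illustration rather than the authors' implementation: it assumes normal responses with known variances, a weighted between-arm sum of squares as the global statistic, and two-sided pairwise tests against the control. The function names and the fixed-sample critical values used in the example (the 5% chi-square point with 2 degrees of freedom for three arms, and 1.96 for the pairwise tests) are assumptions made for illustration; a group sequential design would use look-specific boundaries instead.

```python
import math
from statistics import NormalDist


def global_statistic(means, sizes, variances):
    """Weighted between-arm sum of squares (known variances assumed).

    Under the global null hypothesis this is approximately chi-square
    distributed with K - 1 degrees of freedom for K arms.
    """
    weights = [n / v for n, v in zip(sizes, variances)]
    pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - pooled) ** 2 for w, m in zip(weights, means))


def pairwise_z(means, sizes, variances, arm, control=0):
    """Unadjusted two-sample Z statistic for one arm versus the control."""
    se = math.sqrt(variances[arm] / sizes[arm]
                   + variances[control] / sizes[control])
    return (means[arm] - means[control]) / se


def fisher_lsd(means, sizes, variances, global_crit, pairwise_crit):
    """Stage 1: global test. Stage 2: pairwise tests only if stage 1 rejects."""
    stat = global_statistic(means, sizes, variances)
    result = {"statistic": stat, "global_reject": stat >= global_crit,
              "pairwise": {}}
    if not result["global_reject"]:
        return result
    for arm in range(1, len(means)):  # arm 0 is taken to be the control
        z = pairwise_z(means, sizes, variances, arm)
        result["pairwise"][arm] = {"z": z, "reject": abs(z) >= pairwise_crit}
    return result


# Illustrative three-armed example with hypothetical summary data.
GLOBAL_CRIT = -2.0 * math.log(0.05)          # chi-square(2 df) 95% point
PAIRWISE_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% normal point
example = fisher_lsd([0.0, 0.5, 0.1], [100, 100, 100], [1.0, 1.0, 1.0],
                     GLOBAL_CRIT, PAIRWISE_CRIT)
```

In this hypothetical example the global statistic exceeds its critical value, so both pairwise comparisons are carried out without further adjustment, and only the first experimental arm is declared different from the control.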
Critical boundaries
Assume that the treatment responses for patients are normally distributed with mean
In the pairwise comparisons, we wish to test the pairwise null hypotheses
When the global null hypothesis
Immediate responses
Information time for immediate responses is a ratio of the current sample size to the maximum sample size. For trials that allow dropping of inferior treatments, Follmann, Proschan and Geller 17 showed that the information time at look
For optimal response-adaptive randomisation, the total sample size on treatment
For censored survival responses, the information time is proportional to the number of events. The information time can be approximated by the ratio of the expected number of events on the remaining arms at look
For simplicity, we consider the information time scale under
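As an illustration of how information time feeds into the error-spending approach, the sketch below computes the information time for immediate responses as the ratio of the current to the maximum sample size, and the cumulative type I error spent at each look. The two spending functions shown are the standard Lan–DeMets O'Brien–Fleming-type and Pocock-type forms; which spending function the design actually uses is not stated in this section, so their use here is an assumption, as are the function names and the look schedule.

```python
import math
from statistics import NormalDist


def information_time(current_n, max_n):
    """Information time for immediate responses: current / maximum sample size."""
    return min(current_n / max_n, 1.0)


def obf_spending(t, alpha=0.05):
    """O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    if t <= 0.0:
        return 0.0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(z / math.sqrt(t)))


def pocock_spending(t, alpha=0.05):
    """Pocock-type spending: alpha * log(1 + (e - 1) * t)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def alpha_increments(times, spending, alpha=0.05):
    """Type I error newly spent at each look, given increasing information times."""
    spent = [spending(t, alpha) for t in times]
    return [b - a for a, b in zip([0.0] + spent[:-1], spent)]


# Hypothetical schedule: three looks at 100, 200 and 300 of 300 patients.
looks = [information_time(n, 300) for n in (100, 200, 300)]
obf_steps = alpha_increments(looks, obf_spending)
pocock_steps = alpha_increments(looks, pocock_spending)
```

Both functions spend the full 5% by information time 1, but the O'Brien–Fleming-type function spends very little at early looks, whereas the Pocock-type function spends it more evenly. Because the spending depends only on the information time reached, the schedule remains usable as long as the information time keeps increasing, which is the property established in this section for designs that drop inferior arms.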
Optimal response-adaptive randomisation
Optimal allocation proportions for multi-armed trials
We consider two target optimal allocation proportions for multi-armed clinical trials.
Note that both of the above optimal allocation proportions
Two optimal response-adaptive randomisation procedures, the doubly-adaptive biased coin design (DBCD) 26 and the efficient randomised-adaptive design (ERADE), 27 aimed at targeting the pre-specified optimal allocation for multi-armed trials are considered. Both the DBCD and the ERADE are functions of the current and target treatment allocation proportions, and the treatment allocation for the two response-adaptive randomisation procedures depends on the cumulative data. The target optimal allocation is also a function of the unknown parameters of the treatment effect sizes. In practice, parameter estimates based on the current responses are used. A learning phase using methods such as permuted-block randomisation is usually required to obtain initial parameter estimates. Then the DBCD and the ERADE for multi-armed trials can be implemented. The applications of the two optimal response-adaptive randomisation procedures with group sequential analysis in two-armed and multi-armed trials have been elaborated upon in Zhu and Hu 5 and Liu and Coad.7,14 We briefly describe the two procedures in the Appendix.
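To make the idea of an allocation rule that depends on the current and target proportions concrete, the following sketch shows two commonly cited forms. The first is the Hu and Zhang allocation function for the DBCD with tuning parameter `gamma`, written for several arms; the second is the two-armed ERADE rule with tuning parameter `alpha`. These are given as assumptions for illustration: the definitions used in this paper are in its Appendix, are not reproduced here, and may differ, in particular for the multi-armed ERADE. The function names and default tuning values are hypothetical.

```python
def dbcd_probabilities(current, target, gamma=2.0):
    """DBCD allocation probabilities (Hu-Zhang form, assumed).

    current: observed allocation proportions so far (all must be > 0,
             which a learning phase is meant to guarantee).
    target:  estimated optimal allocation proportions (sum to 1).
    Arms that are under-allocated relative to the target get more weight.
    """
    if any(x <= 0.0 for x in current):
        raise ValueError("every arm needs at least one patient")
    raw = [rho * (rho / x) ** gamma for x, rho in zip(current, target)]
    total = sum(raw)
    return [r / total for r in raw]


def erade_probability(current_1, target_1, alpha=0.5):
    """Two-armed ERADE: probability of assigning the next patient to arm 1.

    The rule is discrete: it shrinks the probability below the target when
    arm 1 is over-allocated and raises it when arm 1 is under-allocated.
    """
    if current_1 > target_1:
        return alpha * target_1
    if current_1 < target_1:
        return 1.0 - alpha * (1.0 - target_1)
    return target_1


# Hypothetical state: arm 0 is over-allocated relative to an equal target.
probs = dbcd_probabilities([0.5, 0.25, 0.25], [1 / 3, 1 / 3, 1 / 3])
```

In this example the DBCD gives the over-allocated arm a probability below its target of one third and splits the remainder equally between the other two arms. In practice the target proportions would be re-estimated from the cumulative responses before each assignment.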
Simulation studies
Three-armed normal trials
Consider comparing
The two group-sequential response-adaptive designs DBCD and ERADE are compared with the group-sequential non-adaptive complete randomisation (CR) design in terms of the error probabilities, the expected number of patients (ENP), the expected number of failures (ENF) and the average allocation proportions with standard deviations. The following two error probabilities are considered: the probability of rejecting at least one of the two pairwise null hypotheses, that is,
The true underlying effect of each treatment influences the decisions of treatment dropping and trial termination. Here, we consider three potential scenarios: (i) the control is inferior to the two experimental treatments, (ii) one experimental treatment is superior and one is inferior to the control and (iii) both experimental therapies are inferior to the control. The order of
As shown in Table 1, the simulated type I error rate
Simulated type I error rates for three-armed normal trials using complete randomisation and response-adaptive randomisation with dropping of inferior treatment,
Table 2 considers the case where the control is inferior to the two experimental therapies. In this case, the ENP is about 264 for the group sequential CR design and 260 for the group-sequential response-adaptive designs, since the trials are allowed to stop early for superiority at interim analyses. Compared to CR, the response-adaptive designs can increase the power while using fewer patients. For example, the ERADE increases the power by nearly 4% while using five fewer patients on average compared to CR. Again, the target
Simulated powers for three-armed normal trials using complete randomisation and response-adaptive randomisation with dropping of inferior treatment,
Table 3 considers the case where one experimental treatment is superior and one is inferior to the control. In this case, trials may stop early with the claim that a treatment is superior or drop the inferior treatment at an interim analysis. Here, the ENP is about 280 for the group sequential designs. Compared to the previous table,
Simulated powers for three-armed normal trials using complete randomisation and response-adaptive randomisation with dropping of inferior treatment,
Table 4 considers the case where both experimental therapies are inferior to the control. The target
Simulated powers for three-armed normal trials using complete randomisation and response-adaptive randomisation with dropping of inferior treatment,
Consider testing the contrasts of the mean survival times between two experimental treatments and the control, where independent exponentially distributed survival times and uniformly distributed arrival and censoring times are assumed. This testing problem was investigated by Sverdlov, Tymofyeyev and Wong 23 using a fixed-sample design with the DBCD, and their simulation settings were based on the head and neck cancer experiment. 29 We wish to test the global null hypothesis
Under the null hypothesis, from Table 5, the simulated type I error rate
Simulated type I error rates for three-armed censored survival trials using complete randomisation and response-adaptive randomisation with dropping of inferior treatment,
Under the alternative hypothesis, in Table 6, the target
Simulated powers for three-armed censored survival trials using complete randomisation and response-adaptive randomisation with dropping of inferior treatment,
For Table 7, where one experimental treatment is superior and one is inferior to the control, the target
Simulated powers for three-armed censored survival trials using complete randomisation and response-adaptive randomisation with dropping of inferior treatment,
For Table 8, where the experimental treatments are inferior to the control, the target
Simulated powers for three-armed censored survival trials using complete randomisation and response-adaptive randomisation with dropping of inferior treatment,
The NeoSphere trial 30 is redesigned using the adaptive designs with dropping of inferior treatments at interim analyses. The probabilities of success for the treatments are
Suppose that a higher value of the test statistic indicates that the corresponding experimental treatment has greater efficacy. For the adaptive designs, early termination is allowed for treatment efficacy or futility. The possible cases when trials continue to the next look are shown below.
In addition, fixed-sample designs are provided alongside for comparison, where the critical boundaries are 7.81 and 1.96 for the global and pairwise tests, respectively. For optimal response-adaptive randomisation, the
As can be seen in Table 9, under the null hypothesis, the type I error rate
Simulated type I error rates for redesigning the NeoSphere trial using complete randomisation and response-adaptive randomisation with dropping of inferior treatments,
Under the alternative hypothesis, in Table 10, a similar power is obtained for all of the designs. However, use of the group-sequential response-adaptive designs can reduce the ENP and the ENF compared to the group sequential CR design. More specifically, about four fewer patients on average and five fewer failures are achieved for the adaptive designs with the
Simulated powers for redesigning the NeoSphere trial using complete randomisation and response-adaptive randomisation with dropping of inferior treatments,
Both optimal response-adaptive designs, the DBCD and the ERADE, can target the optimal allocations well. The use of the ERADE with the
The above simulation results show that the group-sequential response-adaptive designs with dropping of inferior treatments can well control the overall type I error rate. More precisely, the probability of falsely rejecting one or more pairwise null hypotheses when the parameters are all equal is less than or equal to 5%. In addition, the combined approach can achieve a higher or similar power while using fewer patients compared to the group sequential CR design. Furthermore, fewer failures are obtained for the adaptive designs. It is concluded that the combined approach can be more ethical in terms of reducing the total sample size and the total number of failures in a trial, since early stopping for efficacy and futility and dropping of inferior arms at interim looks are allowed. Both optimal response-adaptive designs can target the specified optimal allocations well, with the ERADE consistently having a lower variability in the allocation proportions than the DBCD. Comparing the two optimal allocations derived based on different optimality criteria, in general, the adaptive designs with the
Our previous study of the redesigning of the four-armed NeoSphere trial using the combined approach but without dropping of treatments in Liu and Coad 14 showed that the type I error rate was slightly inflated. This was because the critical boundaries under the assumption of equal variances for all treatments were used as an approximation. However, heterogeneity increases when the treatments have unequal variances, which could result in a higher probability of rejecting the global null hypothesis. Nevertheless, if the global null hypothesis is rejected, pairwise comparisons are conducted subsequently in the Fisher’s LSD method presented in this paper. The critical boundaries for the pairwise
Our simulation studies on normal, binary and censored survival responses show that the extension of Fisher’s LSD method to group-sequential response-adaptive designs can be applied to different types of response while preserving good operating characteristics. In practice, considerations on whether or not to incorporate response-adaptive randomisation (RAR) in a group sequential clinical trial, and whether to control the family-wise error rate strongly or weakly, include, but are not limited to, (a) the number of treatments being compared and the heterogeneity of the treatment effects, (b) the length of time to obtain the intermediate/surrogate outcome measurements, and (c) the phase and aim of the clinical trial. An RAR design is worth applying when more than two treatments with heterogeneous treatment effects are compared and rapidly accumulating outcomes are available. Although adaptive designs can be applied across different phases of a clinical trial, from an early-phase dose escalation study to a late-phase confirmatory trial, the latter usually requires strong control of the overall type I error rate. Conversely, a relaxed type I error rate may be used for the former, since the main objective is usually to quickly identify promising treatments with tolerable toxicities and some sign of efficacy. Whether or not to control the family-wise error rate has been widely discussed. 31 The main argument for not adjusting for multiplicity is that, if the same comparisons were made in separate trials, control of the family-wise error rate would not be required. In the second stage of our proposed group-sequential response-adaptive Fisher’s LSD method, an unadjusted two-sample pairwise comparison is used for each hypothesis test comparing an experimental treatment remaining in the trial with the control.
Trial design is usually considered on a case-by-case basis and carefully planned before the start of the trial. The proposed combined approach possesses the efficient and ethical advantages of adaptive designs. For instance, the interim analyses can take place at any continuous information time. Inferior treatments can be dropped at an interim analysis while the information time continues. Adding new arms to an ongoing clinical trial has practical advantages, but is even more complicated and requires a rigorous plan. Such a development is beyond the scope of this paper. In brief, previous work has described how the family-wise type I error rate, and any-pair and all-pairs powers can be calculated when adding new arms to a platform trial using an extension of Dunnett’s method. 32 The authors suggested the following strategies. If the new research arm is deemed to be very different to the arms already in the trial, then no strong control of the family-wise type I error rate is required. One can just control the pairwise type I error rate and power each research arm separately. On the other hand, if the research arms are similar, such as different dosages of a drug, then one needs to adjust the error rate based on the total number of pairwise comparisons, the allocation ratio for the new or ongoing comparisons and the overlapping shared control information time of the new research arm versus the other ones, that is, the correlation between the test statistics for the pairwise comparisons.
Footnotes
Acknowledgements
This work was carried out whilst the first author was in receipt of a scholarship from the Ministry of Education in Taiwan for her Ph.D. study at Queen Mary, University of London. The authors also wish to thank the Editor and two referees for their comments, which have led to an improved paper.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
