Adaptive enrichment designs allow subgroup selection of the patient population within a confirmatory trial via an interim analysis. However, this design complicates treatment effect estimation and uncertainty quantification. This paper introduces a p-value inversion method using various sample space orderings to construct confidence intervals either unconditionally or conditional on the subgroup selected for a general class of two-stage two-group designs. In addition, the p-value functions can be used to derive median-unbiased estimators and conditional moment estimators. Through simulation it is shown that the proposed intervals have close to nominal coverage, in contrast to naive confidence intervals based on the maximum likelihood estimator. Moreover, the median-unbiased estimators and conditional moment estimators have good performance with respect to median and mean bias, respectively. The method is illustrated by a re-analysis of a trial investigating treatment interactions with KRAS mutation type in patients with metastatic colorectal cancer.
With the widespread adoption of human genome sequencing techniques, there is an increasing need to identify patient heterogeneity in medical practice.1 As a result, precision medicine has become an appealing concept in clinical treatment development and has led to the realization that the traditional one-size-fits-all approach to treatment is insufficient.2 Identifying the most appropriate patient population group has begun to be part of the drug development process. In order to screen out the promising population of an experimental medication, the adaptive enrichment design was introduced in Phase II/III clinical trials.3 The enrichment design allows for various modifications based on the interim analysis, such as sample size re-estimation and subgroup selection. However, those adaptive modifications inevitably introduce bias and difficulties in parameter inference.
There is already a large body of research on enrichment designs, such as the approach proposed by Wang et al.4,5 which considers adaption in sample size and futility stopping in the first interim analysis. Rather than allowing only one subgroup to be selected in the first interim analysis, the design of Magnusson and Turnbull6 considers cases in which more than one subgroup treatment effect exceeds the futility threshold and proceeds to subsequent stages. They assume that the sampling rule following selection is fixed. In other words, for every possible selection result, the sample size in subsequent stages should be prespecified. Based on Magnusson and Turnbull’s approach, Lin et al.7 proposed a design involving sample size re-estimation for stage 2 that depends on the observed statistic values in stage 1 to ensure the conditional power is maintained at a desired level. Several recent papers have considered Bayesian decision-theoretic approaches to determining the sample size and decision boundaries in enrichment designs. Ondra et al.8 and Burnett et al.9 proposed Bayesian optimal rules for subgroup selection that maximize or improve expected utilities at the interim analysis. Rosenblum et al.10 use sparse linear programming to optimize the decision rule for subgroup selection and multiple testing procedures.
Developing an unbiased or consistent point estimator of the treatment effect remains a significant research area because of the impact of treatment or subgroup selection characteristics in adaptive enrichment. As the naive maximum likelihood estimate fails to account for the selection bias in the initial stage, it often yields an overestimation of the actual treatment effect. Robertson et al. provide a methodological review11 and practical guidance12 on point estimation for adaptive trial designs in general. Moreover, several researchers have proposed different unbiased or bias-reduced point estimators to address the issue specifically for adaptive enrichment designs. Kimani et al.13 proposed two estimators for a two-stage multi-arm enrichment design, where the most effective treatment in the first stage proceeds to the second stage, and any ineffective treatments are dropped at the first stage for futility. One of the estimators is an extension of the uniformly minimum variance conditionally unbiased estimator (UMVCUE) proposed by Cohen and Sackrowitz.14 However, Cohen and Sackrowitz14 assumed that the design would always continue to the second stage, whereas Kimani et al.’s13 approach allows for an early stop in the first stage. The other estimator proposed by Kimani et al.13 is the bias-adjusted estimator, which extends the estimator proposed by Stallard and Todd.15 Kunzmann et al.16 proposed a conditional moment estimator based on the work of Luo et al.17 The main idea is that the conditional expectation of the statistic of the target subgroup, given the interim analysis result and the observed statistic of the complementary subgroup, is a function of the true treatment effect and does not depend on complementary subsets. Magnusson and Turnbull6 evaluated the conditional and unconditional bias of the naive maximum likelihood estimate of the treatment effect and pointed out the absence of a perfectly unbiased estimator.
Hence, they suggested utilizing the bootstrap method to reduce bias. Di Stefano et al.18 performed a simulation study to compare different methods for adjusting for selection bias in the context of adaptive enrichment designs with a time-to-event endpoint. They found that UMVCUE was most successful at removing bias, but at the cost of a high variance, resulting in the highest mean squared error (MSE), while shrinkage estimators gave the best trade-off between bias and variance to produce the lowest MSE.
The use of point estimates alone neglects the uncertainty of parameter inference, which is why many regulations mandate reporting confidence intervals for all treatment effects in clinical trials. Furthermore, the ICH E9 guideline19 requires that “Estimates of treatment effects should be accompanied by confidence intervals, whenever possible, and the way in which these will be calculated should be identified.” To address this, numerous studies have focussed on developing confidence interval construction for various types of adaptive designs. One such method is the confidence region approach proposed by Posch et al.20 for the flexible group sequential design, which utilizes the closed testing procedure to adjust p-values at each stage and combines them using various combination functions. Stallard and Todd15 adopt the straightforward p-value inversion approach to construct confidence intervals; however, their design only allows the most effective treatment to be chosen at the interim analysis. Their p-value function is based on the ordering method proposed by Armitage21 and Fairbanks and Madsen22 which prioritizes subgroups that stop at the earlier stage for efficacy over those that stop at the later stages.
For those designs that allow flexible selection of treatment arms, Magirr et al.23 proposed an approach that utilizes the closed testing principle and p-value combination functions to construct a confidence region for all experimental treatment arms that strongly controls the family-wise error rate (FWER) at the desired level and is guaranteed to be concordant with the results of the hypothesis tests. Kimani et al.24 adopted this confidence region construction method to derive two-sided confidence intervals for time-to-event data with subgroup partition that is not prespecified but depends on the observed outcomes of patients. Nevertheless, Magirr et al.’s23 confidence intervals do not offer information for rejected hypotheses when just a subset of hypotheses are rejected, which potentially contributes to the conservativeness of the confidence region. Magnusson and Turnbull6 suggested using a double bootstrap technique for constructing confidence intervals. This approach commences with the basic maximum likelihood estimators (MLEs) and generates the initial set of bootstrap samples by simulating new datasets assuming the MLE values are correct. However, the simulation results in the paper indicated that the coverage probabilities of this method are often poor.
In this paper, we propose a p-value inversion method for the subgroup confidence interval construction similar to the approach for multi-arm trials proposed by Stallard and Todd.15 Stallard and Todd’s15 method first establishes a confidence region and then reduces it to a confidence interval for the chosen treatment through two approaches: (1) assuming that the treatment effects of the unselected subgroups are equal to their MLE; (2) assuming that the treatment effects of the unselected subgroups are equal to zero. Nonetheless, the naive MLE and the null assumption overlook the bias introduced by the selection rule. Thus, we embrace a concept similar to the conditional moment estimator proposed by Luo et al.17 to formulate the p-value function for a subgroup by conditioning on the interim statistic for the other group(s). In enrichment designs, only subgroups with evidence of a positive treatment effect are kept following the interim analysis. Therefore, our focus lies on estimating the treatment effects for the selected group(s). There is also interest in estimating the outcomes of all enrolled subgroups, which requires adjustment for multiplicity. Hence, we construct both conditional and unconditional confidence intervals to address these considerations. In the following sections, the term “conditional” means conditioning on the event that a certain subgroup is chosen in the first stage, while the term “unconditional” refers to the process of constructing confidence intervals for the target individual subgroup regardless of the selection results in the interim analysis. In addition, our approach incorporates enrichment designs that allow more than one subgroup to be selected at the first interim analysis and the trial to be terminated early due to futility or efficacy.
By inverting the p-value function derived for the confidence interval at the 0.5 significance level, we also construct the median-unbiased estimator for the enrichment design. A conditional moment estimator can also be constructed by noting that the p-value function corresponds to the conditional survivor function of the test statistic.
We focus on the class of adaptive enrichment designs that comprise two stages and two subgroups, incorporating an experimental arm and a control arm. In Section 2, we initially introduce a general form of the p-value function specific to the target subgroup, conditioning on its selection, as well as the p-value function applicable to the individual target subgroup irrespective of the selection outcome. Point estimates and confidence intervals are established using these p-value functions. The method is evaluated by simulation in Section 3. To illustrate the general method, we present a re-analysis of a clinical trial on patients with metastatic colorectal cancer in Section 4. The article concludes with a discussion.
General method of confidence interval construction and point estimate
Notation and setting
We assume a two-arm trial where at the first stage patients are recruited from a general patient population, but are screened to determine their membership in one of two disjoint groups . For instance, could represent biomarker positive and negative patients, respectively. More generally, a series of baseline covariates could be measured and group membership represents some known partition of the whole covariate space into two disjoint sets. The prevalence of the groups is assumed known a priori, such that if patients are planned to be recruited at the first stage then the number, , recruited from subgroup satisfies for and . Patients are randomized to either the experimental treatment or the control treatment and interest lies in determining which subgroup of the patient population benefits from the new treatment. Hence, at the end of the first stage there is an interim analysis which selects a subgroup, , from and determines whether to proceed to a second stage where recruitment is restricted to patients from the selected subgroup. Stopping for either futility or efficacy may also be possible.
Some designs may utilize prior knowledge of the treatment effect mechanism. For instance, if the treatment is assumed to be more promising for patients in group , then selection of could be precluded. Often, designs will specify a fixed stage 2 sample size assuming the trial proceeds. However, more generally, the stage 2 sample size can depend on the stage 1 data.
It is assumed that the treatment effects (experimental compared to control) for groups can be characterized by . For continuous response data, could represent the mean treatment difference in responses for patients in group , for binary data, could represent the log-odds ratio, and for survival data could represent the log-hazard ratio.
Let for denote the score statistic corresponding to . Asymptotically, where is the Fisher information (see for instance chapter 13.4 of Jennison and Turnbull25). and are assumed to be independent. In each case the alternative hypothesis to be tested is .
The selected subgroup, and the stage 2 Fisher information, are assumed to be functions of . Conditional on the decision, , the score statistics from the data observed in the second stage are then , where if group is not enriched at the second stage. In what follows,
denotes the density of for given .
Let represent the cumulative score statistic for group at the termination of the trial, and define the cumulative Fisher information for group at termination as .
We can also define to be the score statistic at stage corresponding to , where it is assumed that , and hence the score statistic is computed on data pooled across both groups. Asymptotically, and provided the homogeneity assumption holds, and, moreover, is asymptotically equivalent to , where . Similarly, is the cumulative score statistic for the whole population, with . The global statistic is also tested against a one-sided alternative, .
Framework for decisions
We assume that the adaptive enrichment design defines a mapping that maps from the sample space of stage 1 score statistics, , to a decision space consisting of where denotes the subgroup selection and is the stage 2 sample size. When , the trial terminates at stage 1, rejecting the null for and concluding futility for the unselected subgroup(s). It is assumed that are known in advance.
In general, the sample space can be partitioned into up to seven disjoint subspaces corresponding to the subspaces of to which they are mapped:
where some designs may preclude one or more of these types of decisions leading to an empty subspace. Note that this notation differs from the used in Magnusson and Turnbull,6 where corresponds to the set of patients in group of the patient population.
For designs where the stage 2 sample size is not set in advance, the stage 2 information may depend on precisely where within or the stage 1 statistics lie, meaning that and are functions of .
Magnusson–Turnbull design
In the general case, the enrichment design proposed by Magnusson and Turnbull6 involves an initial stage to establish the selected subgroup, , followed by a group sequential design of an arbitrary number of stages. The design also allows for the patient population to be partitioned into an arbitrary number of subpopulations. Here we focus on the two-stage design with two subgroups.
In the first stage, the treatment effect is individually evaluated in each of the subgroups, and we only continue randomization for selected populations (i.e. subgroups with evidence of a positive treatment effect). In other words, we only use observations from the remaining subgroups when performing conditional hypothesis tests.
The choice of is based on a boundary . Specifically, group can only be included in if . Two variant decision rules are considered:
A priori ordering: Without loss of generality, it is assumed that . In that case the trial terminates if and group 2 is only included in if for and . Hence the possible values of are and .
No prior ordering: involves all groups for which . Hence is also permissible.
If then the trial terminates. Otherwise, let and , then the trial stops for efficacy if and proceeds to stage 2, otherwise.
At the second stage, patients will only be recruited from the selected groups. However, the total information at stage 2, is assumed invariant to . The final decision at the end of stage 2 is based on the cumulative score statistic and corresponding cumulative Fisher information where efficacy for is declared if and the null hypothesis is accepted otherwise.
A choice can be made regarding the timing of the interim analysis, in relation to the maximum information level, , for instance corresponding to equal stagewise sample sizes. The values of and are chosen to ensure the Type I error under is equal to , with the stage 1 boundaries set via error spending functions. The value of is then chosen to satisfy a power constraint, where the power can either be to reject the null for or for any individual group. Full details of the calculations involved in setting the boundaries and sample size are given in Magnusson and Turnbull’s6 work.
Figure 1 illustrates the values of corresponding to , , in the cases where there is a priori ordering, (left panel) and where there is no prior ordering (right panel). In the former case, the prior ordering forces . The stage 2 information for group , only depends on the region in which lies. Specifically
and
Partition of the sample space of for Magnusson and Turnbull’s design in the presence of a priori ordering (a) and without prior ordering (b).
In Section S2 of the Supplemental Material we show that the design of Lin et al.7 also adheres to the same general framework, with the complication that the stage 2 sample size depends on the specific value of rather than just the region in which lies.
p-value functions
Whitehead26 describes an approach to constructing confidence intervals based on exploiting the relationship between hypothesis testing and confidence intervals. Assuming the parameter to be estimated is denoted by $\theta$, the general p-value function based on this relationship is defined as $P(\theta) = \Pr(T \geq t; \theta)$, where $T$ is some summary statistic which is a random variable depending on $\theta$, and $t$ is the observed statistic. If $P(\theta)$ is monotonically increasing in $\theta$ and $\theta_\alpha$ is defined by $P(\theta_\alpha) = \alpha$, then $\Pr(\theta_\alpha \leq \theta) = 1 - \alpha$, which provides a method for obtaining a distinct value of $\theta_\alpha$ for a given data set with a minimum coverage probability of $1 - \alpha$.
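As a minimal illustration of this inversion in the simplest possible setting, the sketch below assumes a single-stage design in which the score statistic is normally distributed with mean equal to the treatment effect times the Fisher information and variance equal to the Fisher information. The function names, the search bracket and the data values are hypothetical and chosen only for illustration; the adaptive designs considered in this paper require the more involved p-value functions developed in the following sections.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_value_function(theta, z_obs, info):
    """Pr(Z >= z_obs; theta) for Z ~ N(theta * info, info)."""
    # The upper tail is written as a lower tail of the standard normal.
    return STD_NORMAL.cdf((theta * info - z_obs) / info ** 0.5)

def invert(p_fun, target, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection search for theta with p_fun(theta) = target.

    Assumes p_fun is increasing in theta and that [lo, hi] brackets the root.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_fun(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z_obs, info, alpha = 25.0, 100.0, 0.025  # hypothetical data
lower = invert(lambda th: p_value_function(th, z_obs, info), alpha)

# Without any adaptation the inversion reproduces the usual Wald lower bound.
closed_form = z_obs / info - STD_NORMAL.inv_cdf(1 - alpha) / info ** 0.5
print(lower, closed_form)
```

In this fixed-sample case the numerical solution agrees with the closed-form Wald bound, which is a useful sanity check before moving to designs with selection.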
To construct a p-value function for a given parameter in the adaptive enrichment design, we consider the class of space orderings proposed by Emerson and Fleming.27 Specifically, using the score statistic and associated Fisher information from Section 2.1, we define a summary statistic and for some choice of . Here corresponds to the case where If , then is the standardized score statistic, whereas results (asymptotically) in the maximum likelihood estimate. Hence the p-value function considers the probability that would exceed the observed value , considering the possibility of stopping at any stage, as a function of .
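To make the role of the ordering concrete, the following sketch assumes that the family of summary statistics takes the form of the score statistic divided by the Fisher information raised to a power `delta`, so that `delta = 0.5` gives the standardized score statistic and `delta = 1` gives the maximum likelihood estimate. This parametrization and the two terminal outcomes are assumptions made for illustration only.

```python
def ordering_statistic(z, info, delta):
    """Summary statistic Z / I**delta (assumed form of the ordering family)."""
    return z / info ** delta

# Two hypothetical terminal outcomes: (cumulative score, cumulative information).
stop_stage1 = (2.0, 1.0)  # trial stopped at stage 1
stop_stage2 = (5.0, 4.0)  # trial continued to stage 2

for name, delta in [("score", 0.5), ("MLE", 1.0)]:
    t1 = ordering_statistic(*stop_stage1, delta)
    t2 = ordering_statistic(*stop_stage2, delta)
    more_extreme = "stage 1 outcome" if t1 > t2 else "stage 2 outcome"
    print(f"{name} ordering: T1 = {t1:.2f}, T2 = {t2:.2f}, more extreme: {more_extreme}")
```

The same pair of outcomes is ranked differently by the two orderings, which is why the choice of ordering can change the resulting p-value function and confidence limits.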
As noted in the introduction, interest may lie either in a confidence interval for the treatment effect in the selected subgroup or an individual component of , in which case the p-value function should consider probabilities conditional on that selection having occurred. Here, we assume that the subgroup selection occurs at the interim analysis and so a conditional confidence interval would still be computed after stage 2 even if ultimately the null hypothesis for was not rejected. In this way, the p-value functions do not depend on the decision boundaries of the design at the end of stage 2.
Alternatively, interest could instead lie in for a given group , regardless of whether group was selected. In this case, simultaneous confidence intervals for the treatment effects for group and would be required. In what follows, we consider the two main cases, conditional or not conditional on selection, separately.
Conditional on selection
Initially, suppose that the stage 1 data lead to a single group being chosen, such that for or . For the p-value function conditional on selection, ordering is with respect to and we condition on the event . This is equivalent to an event where
The p-value function therefore concerns the probability of the event
In the general case, is not necessarily a rectangular region of . As a consequence, for , depends on the whole vector . To avoid this issue, in addition to conditioning on , we also condition on , the realized value of where . Hence the probability of interest reduces to
where and . This is similar to the construction of the conditional moment estimator,17 which considers the expectation of the score statistic given the decision and the stage 1 statistic in the unselected group. Note that in the special, but common, case where is a rectangular region of , is invariant to the value of and hence the additional conditioning has no effect.
When calculating the p-value function, the stage at which the trial terminates is not conditioned upon. As a consequence, the p-value function can be written as , where the two terms correspond to the probability of exceeding the observed statistic by stopping at stage 1 for efficacy, and by proceeding to stage 2, respectively.
Contribution of stopping at stage 1
For the contribution of stopping at stage 1, the probability of interest is
We can first define
which represents the regions for which group is chosen but the trial stops at stage 1, and then , corresponding to the region where , and hence
Since and are at most a union of disjoint intervals of and , Equation (1) can be represented by a ratio of sums of differences of normal cdfs.
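The ratio of sums of differences of normal cdfs can be sketched as follows. The code assumes that the stage 1 score statistic of the group of interest is normal with mean equal to the treatment effect times the stage 1 information and variance equal to the stage 1 information, and that both regions are supplied as lists of disjoint intervals. The function names, the regions and the numerical values are hypothetical.

```python
from statistics import NormalDist

INF = float("inf")

def prob_in_intervals(intervals, mean, sd):
    """Pr(X in a union of disjoint intervals) for X ~ N(mean, sd^2).

    Each interval is a (lower, upper) pair; use +/- infinity for open ends.
    """
    dist = NormalDist(mean, sd)
    return sum(dist.cdf(upper) - dist.cdf(lower) for lower, upper in intervals)

def stage1_contribution(theta, info1, select_region, exceed_region):
    """Ratio Pr(Z1 in exceed_region; theta) / Pr(Z1 in select_region; theta).

    exceed_region is assumed to be a subset of select_region.
    """
    mean, sd = theta * info1, info1 ** 0.5
    numerator = prob_in_intervals(exceed_region, mean, sd)
    denominator = prob_in_intervals(select_region, mean, sd)
    return numerator / denominator

# Hypothetical example: the group is selected when Z1 >= 0 and the part of the
# stage 1 stopping region that exceeds the observed statistic is Z1 >= 20.
value = stage1_contribution(theta=0.2, info1=50.0,
                            select_region=[(0.0, INF)],
                            exceed_region=[(20.0, INF)])
print(value)
```

For regions far in the tails, differences of cdfs lose precision, so a production implementation would typically work with complementary error functions or log probabilities instead.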
Contribution of proceeding to stage 2
For the contribution of proceeding to stage 2, let represent the value of the stage 2 statistic for group that produces the observed cumulative score statistic if and The probability of interest can then be expressed as
where
which represent the regions of for which is chosen but the trial proceeds to stage 2, conditional on the stage 1 statistic in the unselected group.
p-value functions conditional on
For some designs, such as Lin et al.’s design considered in Section S2 of the Supplemental Material, the range of possible values of given and given a particular may not include . In those cases, rather than seeking a confidence interval for given , better-behaved confidence intervals will be obtained by conditioning only on Equally, if we could consider individual confidence intervals for or conditional on . Since in the above, we already condition on , the approach used in Section 2.4.1 can be easily adapted. It is only necessary to alter the definitions of , and to accommodate values that lead to either or . For instance, if we seek then we would take
whereas for we use
p-value functions for the common treatment effect
When , the adaptive enrichment design will typically test . It is therefore natural in that situation to seek a confidence interval for . For this purpose, we assume , although the consequences of making this assumption when it is not correct will be explored in Section 3.
Emulating the previous notation, define as the set of values of that lead to , and as the set of values for which the trial stops at stage 1 with . Then in general we can write
where .
Similarly, let be the set of values of for which and the trial proceeds to stage 2, then
where and is the joint density of
Often and the distribution of will depend at most on , in which case (3) can be simplified to be in terms of integrals over the conditional density of given This is the case in the examples considered below.
Unconditional p-value
Rather than considering a p-value function conditional on a given selection we may seek to construct a p-value function for , the treatment effect for group regardless of whether . In order to produce a probability that only depends on the of interest, we again condition on , the stage 1 score statistic for the other group. The ordering is with respect to , and as before the p-value function can be decomposed into two parts corresponding to group stopping at stage 1, or group proceeding to stage 2.
Group could stop at stage 1 either for futility or for efficacy. Hence we first define
which gives the region of for which group will stop at stage 1, and then let
which gives the set of values of that lead to stopping at stage 1 with an unstandardized score statistic that exceeds . The probability of interest is then
In order for group to stop at stage 2, the stage 1 score statistic must lie within regions in which group is enriched. We therefore define
and hence
where is defined as in Section 2.4.1. As before, the overall p-value function is then given by
The explicit forms of the p-value functions for the Magnusson–Turnbull design used in Sections 3 and 4 are given in the Appendix. The form of the p-value functions for Lin et al.’s7 design is given in Section S2 of the Supplemental Material.
Confidence interval construction
Once the relevant p-value function has been defined for a given case, confidence interval construction then involves inverting the function. Define for then Hence serves as a confidence region for . Provided is a monotonically increasing function in , there exists a unique such that and hence gives a one-sided confidence interval. Moreover, if desired, defined by and , for , gives a two-sided confidence interval. Assuming a monotonic function, the boundaries for the confidence intervals can be computed by using a numerical line search.
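The line search can be sketched on a deliberately simplified example. The code below assumes a single-stage statistic that is only reported when it exceeds a selection threshold, so that the conditional p-value function is a ratio of two normal upper-tail probabilities, and it takes an equal-tailed two-sided interval. The function names, the threshold, the bracket and the data are hypothetical; the p-value functions of the actual designs would replace `conditional_p`.

```python
import math
from statistics import NormalDist

def upper_tail(x, mean, sd):
    """Pr(X >= x) for X ~ N(mean, sd^2), using erfc for accurate small tails."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def conditional_p(theta, z_obs, info, z_select):
    """Pr(Z >= z_obs | Z >= z_select; theta), assuming z_obs >= z_select."""
    mean, sd = theta * info, math.sqrt(info)
    return upper_tail(z_obs, mean, sd) / upper_tail(z_select, mean, sd)

def line_search(p_fun, target, lo, hi, tol=1e-10):
    """Bisection for p_fun(theta) = target, assuming p_fun increases in theta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_fun(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z_obs, info, z_select, alpha = 15.0, 100.0, 0.0, 0.05  # hypothetical values
p_fun = lambda th: conditional_p(th, z_obs, info, z_select)

lower = line_search(p_fun, alpha / 2, -2.0, 2.0)
upper = line_search(p_fun, 1 - alpha / 2, -2.0, 2.0)

# Naive Wald lower bound that ignores the selection step.
naive_lower = z_obs / info - NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(info)
print(lower, upper, naive_lower)
```

Because the conditional p-value is never smaller than the unconditional one in this example, the selection-adjusted lower limit cannot exceed the naive Wald limit.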
For an entirely arbitrary design and an arbitrary choice of ordering parameter , there is no guarantee that increases with . This can occur, for instance, if score ordering is chosen (), but the stage 1 and stage 2 sample sizes are very imbalanced, and is more prone to occur for the unconditional p-values. In the context of group sequential designs, it is proven that the MLE ordering () is guaranteed to lead to proper intervals whereas counter-examples exist for other orderings (Emerson and Fleming, 1990). We did not encounter any issues with the Magnusson–Turnbull design using score ordering. In contrast, implementing Lin et al.’s design where the second stage sample size can be substantially larger than stage 1 led to issues using score ordering (), but was well-behaved for MLE ordering. However, if the p-value function is non-monotonic a (conservative) one-sided confidence interval could be constructed by setting the lower limit to be . In the simulations given below, we compare these confidence intervals to naive confidence intervals based on the MLE and Fisher information which do not account for selection. Specifically, a naive one-sided confidence interval for has lower bound .
Simultaneous confidence intervals
Often, it will be desirable to ensure the individual confidence intervals for and collectively have coverage. Since the p-value functions for and condition on the stage 1 score statistic for the other group, and will not be independent and will have a dependence that is difficult to characterize. We therefore propose to construct simultaneous confidence intervals for and by using a Bonferroni correction. Specifically, we take to obtain a simultaneous confidence interval for , where we would expect the resulting confidence region to be slightly conservative. Note that this approach can be used either with the individual unconditional p-values defined in Section 2.5 or alternatively the individual p-values conditional on considered in Section 2.4.4.
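The conservativeness of the Bonferroni construction follows from the union bound, which requires no knowledge of the dependence between the two intervals. Writing $C_1$ and $C_2$ for the individual intervals for the two subgroup effects $\theta_1$ and $\theta_2$, each constructed at level $1 - \alpha/2$ (this notation is assumed here for illustration), the argument is:

```latex
\Pr\bigl(\theta_1 \notin C_1 \ \text{or}\ \theta_2 \notin C_2\bigr)
  \;\le\; \Pr(\theta_1 \notin C_1) + \Pr(\theta_2 \notin C_2)
  \;\le\; \frac{\alpha}{2} + \frac{\alpha}{2} \;=\; \alpha ,
\qquad\text{so}\qquad
\Pr\bigl(\theta_1 \in C_1 \ \text{and}\ \theta_2 \in C_2\bigr) \;\ge\; 1 - \alpha .
```

The first inequality is strict whenever both intervals can fail simultaneously with positive probability, which is the source of the slight conservativeness.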
Point estimation
While the main focus of this paper is the construction of confidence intervals for the treatment effects, the construction of the p-value function naturally also facilitates a median unbiased estimator for , and also gives a direct approach for calculating conditional moment estimators.
Specifically, a median unbiased estimator is given by letting satisfy where this approach can be applied to any of the p-value functions defined above.
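A minimal sketch of the median unbiased estimator, again under the simplifying assumption of a single-stage statistic that is observed only when it exceeds a selection threshold, is given below. The estimator is the parameter value at which the conditional p-value function equals 0.5. All names and numerical values are hypothetical.

```python
import math

def upper_tail(x, mean, sd):
    """Pr(X >= x) for X ~ N(mean, sd^2), using erfc for accurate small tails."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def conditional_p(theta, z_obs, info, z_select):
    """Pr(Z >= z_obs | Z >= z_select; theta) for Z ~ N(theta * info, info)."""
    mean, sd = theta * info, math.sqrt(info)
    return upper_tail(z_obs, mean, sd) / upper_tail(z_select, mean, sd)

def median_unbiased(z_obs, info, z_select, lo=-2.0, hi=2.0, tol=1e-10):
    """Bisection for the theta at which the p-value function equals 0.5."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_p(mid, z_obs, info, z_select) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z_obs, info = 15.0, 100.0  # hypothetical data; the naive MLE is 0.15
mue_selected = median_unbiased(z_obs, info, z_select=0.0)
mue_no_selection = median_unbiased(z_obs, info, z_select=float("-inf"))
print(mue_selected, mue_no_selection)
```

With no selection the estimator coincides with the naive MLE, whereas conditioning on selection shrinks the estimate towards the selection threshold.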
Moreover, the conditional moment estimator,16,17 satisfies In general, we can note that is the corresponding conditional survivor distribution function of and hence
In practice, the additional integration may need to be performed numerically, making the conditional moment estimate (CME) significantly more computationally intensive to calculate than the corresponding median unbiased estimate (MUE).
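The additional integration step can be sketched as follows, using the identity that the mean of a random variable bounded below equals the lower bound plus the integral of its survivor function. The example assumes the same simplified single-stage setting with a selection threshold, a trapezoidal rule on a truncated range, and a bisection search; these choices and all names and values are illustrative assumptions.

```python
import math

def upper_tail(x, mean, sd):
    """Pr(X >= x) for X ~ N(mean, sd^2), using erfc for accurate small tails."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def survivor(z, theta, info, z_select):
    """Pr(Z >= z | Z >= z_select; theta) for z >= z_select."""
    mean, sd = theta * info, math.sqrt(info)
    return upper_tail(z, mean, sd) / upper_tail(z_select, mean, sd)

def conditional_mean(theta, info, z_select, n_steps=2000):
    """E[Z | Z >= z_select; theta] as z_select plus the integrated survivor function."""
    mean, sd = theta * info, math.sqrt(info)
    top = max(z_select, mean) + 10.0 * sd  # truncation point of the integral
    h = (top - z_select) / n_steps
    values = [survivor(z_select + i * h, theta, info, z_select)
              for i in range(n_steps + 1)]
    integral = h * (sum(values) - 0.5 * (values[0] + values[-1]))  # trapezoid rule
    return z_select + integral

def solve_cme(z_obs, info, z_select, lo=-2.0, hi=2.0, tol=1e-8):
    """Bisection for theta with E[Z | Z >= z_select; theta] = z_obs."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid, info, z_select) < z_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z_obs, info, z_select = 15.0, 100.0, 0.0  # hypothetical data
theta_cme = solve_cme(z_obs, info, z_select)
print(theta_cme, z_obs / info)  # conditional moment estimate and naive MLE
```

Each evaluation of the conditional mean requires a full numerical integration, so the root search is considerably slower than the search for the median unbiased estimate, consistent with the computational remark above.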
An additional disadvantage of the conditional moment estimator is that in some cases it will be undefined. This can occur if the statistic in group is sufficiently large that given group is chosen it is guaranteed that the procedure terminates for efficacy at stage 1. In that situation, has a lower bound at and . It is then possible to have leading to no solution for the CME equation.
In the simulations given below, we compare these point estimators with the naive maximum likelihood estimate given by where and are the cumulative score statistic and Fisher information for , respectively.
Numerical studies
In this section, we evaluate the performance of confidence intervals and point estimates for Magnusson and Turnbull’s design via simulation. We consider a similar setup to the trial described in Magnusson and Turnbull’s paper,6 but using two rather than three subgroups. Patients in each subgroup have an equal chance of receiving either the experimental treatment or the placebo treatment. We assume patient outcomes are normally distributed with a common variance , and where and denotes the expected response for subgroup under the control and experimental treatment, respectively. Thus the true treatment effect difference in subgroup is , and the efficient score and observed information are defined as
where for is the sample mean of the treatment or control arm. The prevalence of subgroup 1 and subgroup 2 is 0.6 and 0.4, respectively, and we randomly generate the sample size of each subgroup by drawing from a binomial distribution. The trial is designed on the basis of a clinically relevant treatment effect of 0.2 for each subgroup, meaning a maximum of 625 patients are needed for each stage to ensure 90% power to reject the null hypothesis for at least one subgroup assuming , and that , assuming a Type I error of 0.025. Utilizing the error spending functions described in the work of Magnusson and Turnbull,6 the standardized boundaries are computed as follows:
Without loss of generality, in the simulations, meaning . We test the one-sided hypotheses and at significance level. When evaluating the performance of confidence intervals that are conditional on a particular selected subgroup , we use rejection sampling to obtain 10,000 trials in which that subgroup is selected. For the unconditional intervals, we simply simulate 10,000 trials and retain them regardless of the selected subgroup(s). We consider seven scenarios with respect to the true treatment effects, where the first three correspond to the most anticipated outcomes: a null scenario where the experimental treatment causes no difference from the placebo treatment for the entire population, i.e. , a scenario where , in which the treatment is only effective for subgroup 1, and a further scenario where , in which the experimental treatment is effective for the entire population with a homogeneous treatment effect, which is also the scenario for which the design aims to have 90% power. The remaining scenarios consider less anticipated situations such as a more extreme positive treatment effect or cases where the treatment is harmful for one of the subgroups.
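The rejection sampling step can be sketched as follows. The code draws independent stage 1 score statistics for the two subgroups and retains only those trials in which subgroup 1 alone is selected. The selection rule (a single standardized threshold), the information levels (which assume unit variance and 375 and 250 stage 1 patients) and the treatment effects are hypothetical simplifications; the full Magnusson–Turnbull rule also involves efficacy stopping.

```python
import random

def simulate_stage1(theta, info, rng):
    """Draw independent stage 1 score statistics Z_j ~ N(theta_j * I_j, I_j)."""
    return [rng.gauss(t * i, i ** 0.5) for t, i in zip(theta, info)]

def only_group1_selected(z, info, l_std):
    """Hypothetical rule: group j is selected when Z_j / sqrt(I_j) exceeds l_std."""
    keep = [zj / ij ** 0.5 > l_std for zj, ij in zip(z, info)]
    return keep[0] and not keep[1]

def rejection_sample(n_keep, theta, info, l_std, seed=1):
    """Simulate trials until n_keep of them satisfy the selection event."""
    rng = random.Random(seed)
    kept, n_tried = [], 0
    while len(kept) < n_keep:
        z = simulate_stage1(theta, info, rng)
        n_tried += 1
        if only_group1_selected(z, info, l_std):
            kept.append(z)
    return kept, n_tried

info = (93.75, 62.5)  # hypothetical stage 1 information levels
kept, n_tried = rejection_sample(1000, theta=(0.2, 0.0), info=info, l_std=0.0)
print(len(kept), n_tried, len(kept) / n_tried)  # retained, simulated, acceptance rate
```

The acceptance rate estimates the probability of the selection event under the chosen scenario, which indicates how many raw simulations are needed to reach the target number of retained trials.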
Let and be the sample size of the experimental treatment arm and the control treatment arm. All s are estimated by the pooled sample variance
where and are the sample variances of the experimental treatment arm and the control treatment arm.
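For a single subgroup, the computation of the score statistic and information from two-arm normal data can be sketched as below. The code assumes the standard forms for a difference in means with common variance, namely information equal to the product of the arm sizes divided by the total size times the variance, and score equal to the information times the difference in sample means, with the variance replaced by the usual pooled estimate. These forms, the function names and the data are assumptions made for illustration.

```python
from statistics import mean, variance

def pooled_variance(x_e, x_c):
    """Pooled sample variance of the experimental and control arms."""
    n_e, n_c = len(x_e), len(x_c)
    return ((n_e - 1) * variance(x_e) + (n_c - 1) * variance(x_c)) / (n_e + n_c - 2)

def score_and_information(x_e, x_c):
    """Score statistic and Fisher information for the mean difference."""
    n_e, n_c = len(x_e), len(x_c)
    sigma2 = pooled_variance(x_e, x_c)
    info = n_e * n_c / ((n_e + n_c) * sigma2)
    score = info * (mean(x_e) - mean(x_c))
    return score, info

# Hypothetical responses in one subgroup.
x_e = [1.2, 0.8, 1.5, 1.1]
x_c = [0.9, 0.4, 1.0, 0.7]
score, info = score_and_information(x_e, x_c)
print(score, info, score / info)  # the ratio recovers the difference in means
```

Under these assumed forms, the ratio of score to information is the observed mean difference, which is the naive maximum likelihood estimate used as a comparator.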
Confidence intervals
Here, we assess the coverage properties of the proposed confidence intervals. Figure 2 shows histograms of the lower bounds under different scenarios, given that only subgroup 1 is selected at the first stage. Each row displays lower bounds of confidence intervals obtained under scenarios , and respectively. The red vertical line in each histogram is the 97.5% quantile. Figure 2 illustrates that around 2.5% of the lower bounds derived from the score and MLE ordering methods exceed the true treatment effect, which suggests that the coverage probability of these confidence intervals closely matches the nominal level.
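To see why the naive interval behaves differently, the following toy simulation may help. It is a deliberately simplified single-look setting, not the two-stage design studied here: an estimate is drawn from a normal distribution, trials are retained by rejection sampling only if the Z-statistic exceeds a hypothetical selection cutoff, and the proportion of naive Wald lower bounds lying above the true effect is recorded. All names and numerical settings are assumptions made for the illustration.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # one-sided 97.5% critical value

def naive_lower_bounds(theta, se, cutoff, n_keep, rng):
    """Rejection sampling: keep trials with Z > cutoff, return naive Wald lower bounds."""
    bounds = []
    while len(bounds) < n_keep:
        estimate = rng.gauss(theta, se)
        if estimate / se > cutoff:               # trial retained (subgroup "selected")
            bounds.append(estimate - Z_CRIT * se)
    return bounds

def exceed_fraction(bounds, theta):
    """Proportion of lower bounds above the true effect (one minus the coverage)."""
    return sum(b > theta for b in bounds) / len(bounds)

rng = random.Random(2024)
selected = naive_lower_bounds(0.0, 0.1, 0.0, 10_000, rng)
unselected = naive_lower_bounds(0.0, 0.1, float("-inf"), 10_000, rng)
print(exceed_fraction(selected, 0.0), exceed_fraction(unselected, 0.0))
```

In this toy case the conditional exceedance probability is 0.025/0.5 = 0.05 when the true effect is zero and selection requires a positive estimate, which is the kind of shift in the 97.5% quantile that the naive histograms display.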
Figure 2. Distribution of the lower bound of a one-sided 97.5% confidence interval for given subgroup 1 has been selected, based on score-ordering, maximum likelihood estimator (MLE)-ordering and a naive Wald confidence interval in the Magnusson–Turnbull design. The red line is the 97.5% quantile. (a) , (b) and (c) .
Table 1 gives empirical coverage probabilities and powers of confidence intervals conditional on only subgroup 1 being selected at the interim analysis. Here, power refers to the probability that the confidence interval excludes 0, and hence coincides with the Type I error for . These conditional confidence intervals are constructed using the score and MLE sample space orderings. In contrast to the naive confidence intervals, both the score and MLE confidence intervals exhibit coverage probabilities close to the nominal level. However, under scenarios and , the score ordering confidence intervals demonstrate higher power than the MLE ordering confidence intervals. Results for the two-sided conditional confidence intervals are also given in Table 1, and again the coverage probabilities are close to nominal across all scenarios. The naive confidence interval ignores the selection made at the interim analysis, leading to extremely poor coverage probability when at least one subgroup is likely to be chosen. However, its power exceeds that of the conditional confidence intervals constructed through the score and MLE sample space orderings.
Table 1. Empirical coverage and power (type I error for null case) of conditional one-sided 97.5% and two-sided 95% CIs when subgroup 1 is selected, under different scenarios for in the normal distribution case.

One-sided

| Scenario | Coverage (Score) | Coverage (MLE) | Coverage (Naive) | Power (Score) | Power (MLE) | Power (Naive) | Mean of lower bounds (Score) | Mean of lower bounds (MLE) | Mean of lower bounds (Naive) |
|---|---|---|---|---|---|---|---|---|---|
|  | 0.9751 | 0.9752 | 0.9205 | 0.0249 | 0.0248 | 0.0795 |  |  |  |
|  | 0.9732 | 0.9754 | 0.9682 | 0.7373 | 0.7218 | 0.9218 | 0.0481 | 0.0300 | 0.0829 |
|  | 0.9744 | 0.9742 | 0.9683 | 0.7346 | 0.7207 | 0.9206 | 0.0482 | 0.0294 | 0.0827 |
|  | 0.9751 | 0.9740 | 0.9740 | 0.9976 | 1 | 1 | 0.2419 | 0.2978 | 0.3011 |
|  | 0.9755 | 0.9739 | 0.9739 | 0.9978 | 1 | 1 | 0.2426 | 0.2976 | 0.3011 |
|  | 0.9714 | 0.9714 | 0.6298 | 0.0008 | 0.0009 | 0.0009 |  |  |  |
|  | 0.9781 | 0.9741 | 0.9682 | 0.7293 | 0.7173 | 0.9171 | 0.0461 | 0.0294 | 0.0818 |

Two-sided

| Scenario | Coverage (Score) | Coverage (MLE) | Coverage (Naive) | Power (Score) | Power (MLE) | Power (Naive) | Mean of CI width (Score) | Mean of CI width (MLE) | Mean of CI width (Naive) |
|---|---|---|---|---|---|---|---|---|---|
|  | 0.9459 | 0.9494 | 0.9151 | 0.0540 | 0.0506 | 0.0848 | 0.2960 | 0.2939 | 0.2505 |
|  | 0.9458 | 0.9447 | 0.9522 | 0.7365 | 0.7233 | 0.9145 | 0.4104 | 0.3353 | 0.2942 |
|  | 0.9409 | 0.9339 | 0.9441 | 0.7275 | 0.7204 | 0.9110 | 0.4085 | 0.3349 | 0.2948 |
|  | 0.9500 | 0.9495 | 0.9564 | 0.9976 | 1 | 1 | 0.4637 | 0.4056 | 0.4033 |
|  | 0.9482 | 0.9466 | 0.9545 | 0.9978 | 1 | 1 | 0.4634 | 0.4057 | 0.4031 |
|  | 0.9433 | 0.9433 | 0.6298 | 0.7407 | 0.7408 | 0.2767 | 0.3044 | 0.3044 | 0.2479 |
|  | 0.9499 | 0.9459 | 0.9524 | 0.7293 | 0.7173 | 0.9171 | 0.4090 | 0.3347 | 0.2940 |

CI: confidence interval; MLE: maximum likelihood estimator.
In the scenario where both subgroups are chosen at the first interim, Table 2 reveals that the coverage probability remains close to the nominal level. However, when the treatment effect varies across subgroups, the p-value function, which assumes the treatment effects are equal, is misspecified. As a consequence, the coverage probability in relation to the population-averaged effects is somewhat below the nominal 97.5%, with this issue becoming more pronounced for the and cases.
Table 2. Empirical coverage and power (type I error for null case) of conditional one-sided 97.5% confidence intervals when both subgroups are selected under different scenarios for in the normal distribution case.
Coverage probability
Power
Mean of lower bounds
Scenario
Score
Maximum likelihood estimator (MLE)
Naive
Score
MLE
Naive
Score
MLE
Naive
0.9772
0.9758
0.8238
0.0228
0.0242
0.1762
0.9681
0.9581
0.9100
0.3624
0.2683
0.8239
0.0505
0.9719
0.9734
0.9659
0.6673
0.5172
0.9774
0.0396
0.0219
0.0939
0.9732
0.9732
0.9732
0.9999
0.9999
1.0000
0.3394
0.3425
0.3443
0.9313
0.9281
0.9259
0.9768
0.9832
1.0000
0.1747
0.1893
0.2040
0.9665
0.9665
0.4624
0.002
0.0038
0.0162
0.9134
0.8927
0.6469
0.1777
0.182
0.5365
0.0121
The simultaneous confidence intervals for both subgroups are constructed using the Bonferroni approach outlined in Section 2.6, where the significance level assigned to each subgroup is . Table 3 compares the FWER, overall power and average number of rejections per trial for three scenarios. All of the FWERs are close to the desired nominal level, but not all of them are smaller than 0.025. Theoretically, the classic Bonferroni correction should make the FWER slightly conservative. However, under the null scenario, the coverage of the 97.5% confidence intervals is slightly below the nominal level. This is likely to be because the intervals do not account for the random variation in the observed subgroup prevalence, or because the pooled sample variance is used in the statistic rather than the true population value of . Moreover, within a single trial, the score ordering simultaneous confidence intervals reject more hypotheses than the MLE ordering simultaneous confidence intervals, consistent with their superior overall power. Histograms of the simultaneous confidence interval lower bounds are presented in Figure 3. The left histogram shows the lower bounds of the subgroup 1 simultaneous confidence intervals and the right histogram shows those of subgroup 2. Figure 3 shows that the 98.75% quantiles (vertical red lines) are located approximately at the true treatment effect in every case, which also implies that the individual p-value functions ensure that the individual confidence intervals have coverage probabilities close to the nominal level. For the conditional simultaneous confidence intervals, Table 4 shows that the coverage probabilities are still close to the nominal level under both the score and MLE orderings, but the score ordering confidence intervals have greater power.
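The link between the overall level of 0.025 and the 98.75% quantiles can be illustrated with a small sketch. It assumes the Bonferroni split is equal across the two subgroups (0.025/2 = 0.0125 each, consistent with the 98.75% quantiles quoted above) and uses ordinary fixed-sample Wald lower bounds for two independent normal estimates, so it does not reproduce the adaptive p-value functions of Section 2.6. Function and variable names are hypothetical.

```python
import random
from statistics import NormalDist

ALPHA = 0.025
PER_SUBGROUP_ALPHA = ALPHA / 2                         # equal Bonferroni split (assumed)
Z_CRIT = NormalDist().inv_cdf(1 - PER_SUBGROUP_ALPHA)  # 98.75% normal quantile

def simultaneous_coverage(thetas, ses, n_trials, rng):
    """Proportion of trials in which every one-sided lower bound lies below its true effect."""
    covered = 0
    for _ in range(n_trials):
        lower = [rng.gauss(t, s) - Z_CRIT * s for t, s in zip(thetas, ses)]
        covered += all(lb <= t for lb, t in zip(lower, thetas))
    return covered / n_trials

rng = random.Random(7)
coverage = simultaneous_coverage([0.2, 0.0], [0.1, 0.12], 20_000, rng)
print(round(Z_CRIT, 4), coverage)
```

With independent estimates the exact simultaneous coverage in this toy case is (1 - 0.0125)^2, which is marginally above 0.975, illustrating why Bonferroni intervals are expected to be slightly conservative in the absence of other approximation errors.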
Figure 3. Distribution of the Bonferroni simultaneous confidence interval lower bounds with family-wise error rate (FWER) constrained at or below 0.025. The vertical red lines are the 98.75% quantiles. (a) , (b) and (c) .
Table 3. Coverage and power of unconditional simultaneous confidence intervals for .
Coverage probability
Power
Average rejection
Mean of lower bounds
Subgroup 1
Subgroup 2
Scenario
Score
Maximum likelihood estimator (MLE)
Score
MLE
Score
MLE
Score
MLE
Score
MLE
0.9737
0.9737
0.0263
0.0263
0.0263
0.0264
0.9752
0.9734
0.7235
0.7150
0.7289
0.7194
0.0250
0.0142
0.9758
0.9773
0.7741
0.7549
0.9246
0.8627
0.0043
0.0019
0.9746
0.9746
0.9997
0.9999
1.9519
0.0264
0.2691
0.2694
0.2170
0.2170
0.9750
0.9757
0.9999
0.9999
1.0112
0.7193
0.2321
0.2717
0.9778
0.9777
0.0120
0.0120
0.0120
0.0120
0.9747
0.9741
0.7746
0.7747
0.7746
0.7747
0.0319
0.0195
Power refers to the proportion of intervals that exclude 0 for at least one component. Average rejection refers to the mean rejections of the null hypothesis in every trial.
Table 4. Coverage and power of conditional simultaneous confidence intervals for when .
Coverage probability
Power
Mean of lower bounds
Subgroup 1
Subgroup 2
Scenario
Score
Maximum likelihood estimator (MLE)
Score
MLE
Score
MLE
Score
MLE
0.9742
0.9775
0.0258
0.0225
0.9703
0.9756
0.4355
0.2922
0.9734
0.9768
0.5091
0.3906
0.9758
0.9758
0.9980
0.9986
0.2629
0.2631
0.1954
0.1955
0.9753
0.9753
0.9865
0.9921
0.2552
0.2657
0.9726
0.9799
0.0129
0.0057
0.9705
0.9736
0.4579
0.2759
Power refers to proportion of intervals which exclude 0 for at least one component.
Point estimates
In this section, we present results for the MUE of the treatment effect, obtained by inverting the associated p-value function at the 0.5 level, and for the CME, obtained by treating the p-value function as the conditional survival distribution of the test statistic. These estimates are compared to the naive maximum likelihood estimate (MLE). Tables 5 and 6 present the mean bias, median bias and root mean square error (RMSE) of the point estimators of the treatment effect when only subgroup 1 is selected and when both subgroups are selected, respectively. In all circumstances, the median bias of the MUE is close to zero and is generally closer to zero than that of the corresponding CME or naive MLE. However, the CME performs best in terms of mean bias. The naive MLE usually overestimates the treatment effect, as its bias is mostly positive.
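The mechanics of p-value function inversion can be illustrated on a much simpler problem than the two-stage designs considered in this paper. The sketch below assumes a single normally distributed estimate that is only reported when it exceeds a selection cutoff, so that the conditional p-value function is a ratio of normal tail probabilities. The MUE solves p(theta) = 0.5, the one-sided 97.5% lower bound solves p(theta) = 0.025, and the CME is taken as the value of theta whose conditional mean equals the observed estimate (using the closed-form truncated normal mean as a shortcut). The names `p_value`, `invert`, `cme` and all numerical inputs are hypothetical.

```python
from math import erfc, exp, pi, sqrt

def sf(z):
    """Standard normal survival function, numerically stable in the upper tail."""
    return 0.5 * erfc(z / sqrt(2.0))

def p_value(theta, x, cutoff, se):
    """P_theta(X >= x | X > cutoff) for X ~ N(theta, se^2) and x > cutoff."""
    return sf((x - theta) / se) / sf((cutoff - theta) / se)

def bisect(f, lo, hi, tol=1e-10):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def invert(level, x, cutoff, se, width=8.0):
    """Solve p_value(theta) = level; level 0.5 gives the MUE, 0.025 a lower bound."""
    return bisect(lambda t: p_value(t, x, cutoff, se) - level,
                  x - width * se, x + width * se)

def conditional_mean(theta, cutoff, se):
    """E_theta[X | X > cutoff] for the truncated normal distribution."""
    a = (cutoff - theta) / se
    return theta + se * exp(-0.5 * a * a) / sqrt(2.0 * pi) / sf(a)

def cme(x, cutoff, se, width=8.0):
    """Solve E_theta[X | X > cutoff] = x for theta."""
    return bisect(lambda t: conditional_mean(t, cutoff, se) - x,
                  x - width * se, x + width * se)

x_obs, cutoff, se = 0.30, 0.10, 0.10     # illustrative values only
mue = invert(0.5, x_obs, cutoff, se)
lower = invert(0.025, x_obs, cutoff, se)
cme_est = cme(x_obs, cutoff, se)
print(round(lower, 4), round(mue, 4), round(cme_est, 4))
```

Both corrected estimates fall below the observed value, reflecting the upward selection bias of the naive estimate; when the cutoff is far below the observed value the correction vanishes and the MUE coincides with the naive estimate.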
Table 5. Performance of point estimators for when subgroup 1 is selected.
Mean bias
Median bias
RMSE
Scenario
MUE
CME
MLE
MUE
CME
MLE
MUE
CME
MLE
0.0479
0.0040
0.0781
0.0782
0.0794
0.0237
0.0184
0.0295
0.0191
0.1094
0.1027
0.0884
0.0227
0.0178
0.0299
0.0177
0.1103
0.1039
0.0909
0.0002
0.1025
0.1063
0.0997
0.0016
0.0031
0.1022
0.1059
0.0997
0.1068
0.1060
0.0780
0.0775
0.1188
0.0107
0.0012
0.0295
0.0003
0.0189
0.0857
0.0861
0.0868
MUE and CME are computed based on p-value functions using MLE ordering (k = 1). MUE: median unbiased estimate; CME: conditional moment estimate; MLE: naive maximum likelihood estimate; RMSE: root mean square error.
Table 6. Performance of point estimators for when both subgroups are selected.
Mean bias
Median bias
RMSE
Scenario
MUE
CME
MLE
MUE
CME
MLE
MUE
CME
MLE
0.0479
0.0021
0.0656
0.0781
0.0782
0.0793
0.0205
0.0192
0.0585
0.0138
0.0183
0.0496
0.0857
0.0824
0.0910
0.0089
0.0051
0.0318
0.0003
0.0291
0.0829
0.0789
0.0697
0.0006
0.0001
0.0008
0.0010
0.0010
0.0011
0.0805
0.0812
0.0800
0.0551
0.0514
0.0590
0.0554
0.0524
0.0566
0.0908
0.0907
0.0894
0.0055
0.0021
0.1177
0.0048
0.0022
0.1160
0.0807
0.0785
0.1275
0.0480
0.0397
0.0953
0.0323
0.0189
0.0755
0.1115
0.1092
0.1241
MUE and CME are computed based on p-value functions using MLE ordering (k = 1). Assumed true value of used when , when , when and when . MUE: median unbiased estimate; CME: conditional moment estimate; MLE: naive maximum likelihood estimate; RMSE: root mean square error.
However, reducing bias often comes at the cost of a larger RMSE. There are also cases where both the bias and the RMSE are large, such as the conditional MLE under the null scenario. This is due to the substantial bias present in this scenario (the mean square error, i.e. the square of the RMSE, is the sum of the variance and the squared bias). Additionally, when there is heterogeneity in treatment effects, the estimate of the treatment effect exhibits the highest bias and RMSE among all three estimators. This is also a consequence of the homogeneity assumption employed in the p-value function.
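The performance measures reported in Tables 5 and 6 can be computed from a vector of simulated estimates as in the following sketch, which also checks the decomposition of the mean square error into variance plus squared bias. The function name and the toy input values are hypothetical and are not taken from the simulation study.

```python
from math import sqrt
from statistics import mean, median, pvariance

def summarize(estimates, truth):
    """Mean bias, median bias and RMSE of simulated estimates of a known true effect."""
    errors = [e - truth for e in estimates]
    return {
        "mean_bias": mean(errors),
        "median_bias": median(errors),
        "rmse": sqrt(mean([err * err for err in errors])),
    }

estimates = [0.10, 0.20, 0.60]          # toy values for illustration
summary = summarize(estimates, truth=0.20)

# Mean square error = variance of the estimates + squared mean bias.
mse_from_parts = pvariance(estimates) + summary["mean_bias"] ** 2
print(summary, mse_from_parts)
```

An estimator with small bias can therefore still have a larger RMSE than a biased one if its variance is sufficiently inflated, which is the trade-off noted above.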
A similar set of simulations based upon the design of Lin et al (2021) is presented in Section S2.2 of the Supplemental Materials.
Illustrative Example: Panitumumab-FOLFIRI versus FOLFIRI alone in patients with metastatic colorectal cancer
As a realistic motivating example, we re-analyse data from a randomized phase 3 trial on the use of FOLFIRI with panitumumab compared to FOLFIRI alone as a second-line treatment of metastatic colorectal cancer.28,29
The original trial (20050181) was initially designed as a conventional parallel group design, unselected by KRAS mutation status. However, emerging KRAS data from other studies of panitumumab indicated that monotherapy clinical benefit was isolated to patients with wild-type KRAS. As a consequence, the protocol was amended after completion of enrolment to incorporate patient stratification by KRAS status. Were information on the impact of KRAS status and recent advancements in adaptive enrichment design methods known at the onset of the trial, it may have been more appropriate to design the trial as a two-stage adaptive enrichment design. Given there is an a priori assumption of higher efficacy among those with wild-type KRAS, it would make sense to only continue to the second stage if there is evidence of a survival benefit for wild-type KRAS patients using panitumumab + FOLFIRI, but select the whole population if there is also evidence of a promising treatment effect for those without wild-type mutations.
Following the assumptions made in the original protocol amendment, we assume that 55% of patients are of wild-type KRAS tumour type and that a hazard ratio of 0.67 with respect to the primary endpoint of progression-free survival represents a clinically relevant treatment difference. Using a two-stage Magnusson–Turnbull design, aiming for 90% power to reject the null hypothesis for either wild-type KRAS tumours or the whole population, assuming the clinically relevant effect holds for the whole population, controlling Type I error at 1% and assuming equal information weights before and after the interim, leads to decision boundaries , where the maximum cumulative Fisher information requirement is 102.3.
Since patients are randomized equally to treatment groups, the Fisher information after d events have been observed is approximately d/4 (see Di Scala and Glimm30). Hence the interim analysis should occur after 205 events have occurred (from either KRAS tumour type). Using the potential follow-up time variable in the dataset to infer relative recruitment times, the interim analysis would occur 382 days after the first patient was randomized. At this point the respective log-rank Z-statistics are 2.73 for the wild type and -0.17 for the non-wild type. Hence, based on the Magnusson–Turnbull design, while there is strong evidence of a treatment effect in the wild-type subgroup, the statistic is just below the stopping threshold, . The trial would therefore proceed to a second stage in which subsequent patients would only be enrolled if their tumour is of wild type, and the final analysis occurs after a further 205 events (among wild-type tumour patients recruited at either stage). Taking these patients from the remaining wild-type tumour patients in the original trial, the final analysis would occur at 664 days, where the final Z-statistic is 2.670. Hence the conclusion is that there is a survival benefit of the combination treatment for wild-type tumours (since ). The stagewise results of the trial are given in Table 7.
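The figure of 205 events can be checked with a short calculation. The sketch assumes the usual approximation that, under 1:1 randomization, d events contribute Fisher information of about d/4, and that the maximum information of 102.3 is split equally between the two stages as stated for this design. Variable names are hypothetical.

```python
from math import ceil

MAX_INFORMATION = 102.3      # maximum cumulative Fisher information from the design
N_STAGES = 2                 # equal information weights before and after the interim

info_per_stage = MAX_INFORMATION / N_STAGES
# With 1:1 randomization, information is approximately (number of events) / 4,
# so the required number of events is about 4 times the required information.
events_per_stage = ceil(4 * info_per_stage)
print(info_per_stage, events_per_stage)
```

Rounding up to a whole number of events gives the interim analysis time in terms of observed events rather than calendar time.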
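Table 7. Results of the panitumumab-FOLFIRI trial run as a two-stage Magnusson and Turnbull design.

| Stage | Subgroup | Score statistic | Events | Information | Z-statistic | Estimate |
|---|---|---|---|---|---|---|
| Stage 1 | Wild type | 13.04 | 94 | 22.80 | 2.73 |  |
| Stage 1 | Not wild type | -0.87 | 111 | 26.29 | -0.17 |  |
| Stage 2 | Wild type | 9.94 | 207 | 51.26 |  |  |
| Total | Wild type | 22.98 |  | 74.06 | 2.67 | 0.31 |

refers to the number of events in group at stage .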
In order to implement the methods in Section 2, we make the approximation (which holds asymptotically) that the score (log-rank) statistic is such that can be used as an estimator for and also approximates the Cox partial likelihood MLE.
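As a consistency check on the stagewise results, the sketch below recomputes the Z-statistics and the approximate estimate from score statistics and information values. It assumes the standard relations Z = S / sqrt(I) and estimate = S / I for a score statistic S with Fisher information I; the pairing of the numbers with "score" and "information" is inferred from the fact that these relations reproduce the reported Z-statistics. Function names are hypothetical.

```python
from math import sqrt

def z_from_score(score, information):
    """Standardized statistic Z = S / sqrt(I)."""
    return score / sqrt(information)

def estimate_from_score(score, information):
    """Approximate (one-step) estimate S / I."""
    return score / information

# (score, information) pairs as read from the stagewise results
stage1_wild = (13.04, 22.80)
stage1_not_wild = (-0.87, 26.29)
total_wild = (22.98, 74.06)

print(round(z_from_score(*stage1_wild), 2),
      round(z_from_score(*stage1_not_wild), 2),
      round(z_from_score(*total_wild), 2),
      round(estimate_from_score(*total_wild), 2))
```

The cumulative score and information for the wild-type subgroup are the sums of the stage 1 and stage 2 values (13.04 + 9.94 and 22.80 + 51.26), which is what allows the final statistic to be formed from stagewise increments.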
When a subgroup stops before stage 2, the corresponding p-value function requires an estimate of the stagewise information that would have been observed had the trial proceeded (conditional on the stage 1 result for the other subgroup). For normally distributed response data, and assuming the stage 2 sample size were adhered to, it is reasonably uncontroversial to use the pooled residual variance estimate at stage 1 to estimate the counterfactual stage 2 information. For survival data, the correct way to estimate the stage 2 information is less clear. Here, we adopt the convention that the rate of stage 2 information per observed event is the same as that observed in stage 1. For instance, if the same number of events is to be observed in each stage, the stage 2 information should be equal to that of stage 1. Therefore, if subgroup is chosen on its own but stops for efficacy at stage 1, the potential stage 2 information for group (had the trial proceeded to stage 2) is taken as . Similarly, if both groups are chosen and the trial stops at stage 1, the stage 2 information for group is taken as .
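The convention of carrying the stage 1 information rate per event forward to stage 2 can be written as a one-line calculation. This is a sketch of the verbal rule only; the function name, its arguments and the planned number of stage 2 events used in the example are assumptions, and the exact expressions used for each selection outcome are those given in the text.

```python
def counterfactual_stage2_information(info_stage1, events_stage1, planned_events_stage2):
    """Stage 2 information if the stage 1 information per event also applied in stage 2."""
    information_per_event = info_stage1 / events_stage1
    return information_per_event * planned_events_stage2

# Illustration: stage 1 information 26.29 from 111 events, same number of events planned.
same_events = counterfactual_stage2_information(26.29, 111, 111)
more_events = counterfactual_stage2_information(26.29, 111, 205)
print(round(same_events, 2), round(more_events, 2))
```

When the planned number of stage 2 events equals the stage 1 number, the counterfactual stage 2 information equals the stage 1 information, as stated above.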
The 95% confidence interval for the log-hazard ratio of wild type KRAS tumour patients, conditional on selection, using MLE ordering (k = 1) is (-0.526, -0.015), corresponding to a HR of between 0.59 and 0.99. The median unbiased estimator is -0.284, while the conditional moment estimator is -0.260. These contrast with the uncorrected Cox proportional hazards model MLE, which is -0.309 (95% CI: -0.536, -0.082) and is itself very close to the approximate uncorrected estimate .
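The conversion from the log-hazard ratio scale to the hazard ratio scale is a simple exponentiation, sketched below. The endpoints are entered as negative log-hazard ratios, which is an assumption about the sign convention made here because it is the reading consistent with the quoted hazard ratio range of 0.59 to 0.99.

```python
from math import exp

def to_hazard_ratio(log_hr_interval):
    """Exponentiate the endpoints of a log-hazard ratio interval."""
    lower, upper = log_hr_interval
    return exp(lower), exp(upper)

conditional_ci = to_hazard_ratio((-0.526, -0.015))   # selection-adjusted interval
naive_ci = to_hazard_ratio((-0.536, -0.082))         # uncorrected Cox model interval
print([round(v, 2) for v in conditional_ci], [round(v, 2) for v in naive_ci])
```

On the hazard ratio scale, the selection-adjusted interval has an upper limit much closer to 1 than the uncorrected interval, reflecting the adjustment for selection at the interim analysis.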
The simultaneous unconditional 95% confidence intervals for the log-hazard ratios for wild type and non-wild type tumours are and , respectively, which in this case broadly agree with the conclusions of the trial. To compute the unconditional p-value function for non-wild type tumours, the counterfactual stage 2 information is taken to be equal to that group’s stage 1 information.
In Section S1 of the Supplemental Materials, additional simulations investigate the Magnusson–Turnbull designs for a time-to-event endpoint, where it is shown that performance comparable to the normally distributed case can be achieved for the confidence intervals and point estimators.
Discussion
In this paper, we have shown that confidence intervals, both conditional and unconditional on subgroup selection, can be constructed for adaptive enrichment designs by use of p-value function inversion. Unlike naive confidence intervals based on the MLE and Fisher information, our proposed intervals have close to nominal coverage in most cases. The exception is when but . In that case, it was assumed that in order to obtain a confidence interval for the overall population effect but the simulations indicated that when , the confidence interval for assuming homogeneity will have less than nominal coverage for the population effect , and it is a remaining open problem how to construct a confidence interval for in that situation. Nevertheless, when it is also possible to construct simultaneous confidence intervals for and which were shown to have close to nominal simultaneous coverage even when .
The constructed p-value functions were also shown to provide both an MUE and a CME. Through simulation, these estimators were shown to be effective at providing estimates with low median bias and low mean bias, respectively. Nevertheless, in many cases, the naive MLE may be comparable or superior on the basis of RMSE.
Throughout the paper, a trial with two stages and two subgroups is assumed. Provided subgroup selection still occurs at the end of the first stage, the methods can be extended to designs with more than two subgroups or to trials with more than two stages. If there are groups then the sample space of will be in dimensions and the possible decision space will involve partitioning into a higher number of regions. As in the two-stage case, p-value functions can be computed by considering, , where is the probability of exceeding and stopping at stage , for . However, in general, the calculation of requires an increasing dimension of integration as increases.
A limitation of the proposed confidence intervals is that they rely on asymptotic approximations for the distribution of the score statistic. Generally, these approximations will perform well for continuous endpoints with moderate sample sizes. Potentially, the methods in this paper could also be extended to assume a non-central t-statistic for the score statistic to allow robustness to even smaller sample sizes. However, for time-to-event data, the expected Fisher information depends on the treatment effects, whereas our method assumes the Fisher information is fixed. A larger sample may therefore be needed to achieve accuracy. However, in Section S1 of the Supplemental Materials the intervals are shown to perform well for a realistically sized trial designed to have 80% power to detect a hazard ratio of 0.74 (log HR = -0.3).
Ideally, confidence intervals in adaptive enrichment trials would have concordance with the trial conclusion. For trial designs involving a closed testing procedure and using a p-value combination formulation to combine data across the two stages, it should be possible to adapt the approach of Magirr et al.23 to produce concordant simultaneous intervals, although it is unclear whether they would lead to informative intervals. Our method aims to be general and to provide informative intervals but has the limitation of having no guarantee of concordance. Potentially, the degree of disagreement could be reduced by judicious choice of the ordering parameter . For instance, in Magnusson and Turnbull’s design score-ordering () leads to disagreement due to the design thresholds, and being different. Choosing such that removes this form of disagreement, except that the value of would depend on the group under consideration.
Functions in R to obtain confidence intervals as well as CME and MUEs for both the Magnusson–Turnbull design and the Lin et al design are provided in the Supplemental Materials. Our method can be applied to nearly all adaptive enrichment designs that specify subgroups in advance. However, further research is needed to develop a more comprehensive approach capable of accommodating designs like the one proposed by Simon and Simon,31 where subgroups are not predetermined.
Supplemental Material
sj-pdf-1-smm-10.1177_09622802261423180 - Supplemental material for Confidence intervals and point estimates for treatment effects in adaptive enrichment designs
Supplemental material, sj-pdf-1-smm-10.1177_09622802261423180 for Confidence intervals and point estimates for treatment effects in adaptive enrichment designs by Jinyu Zhu, Andrew Titman and Fang Wan in Statistical Methods in Medical Research
Footnotes
Acknowledgments
This publication is based on research using information obtained from , which is maintained by Project Data Sphere. Neither Project Data Sphere nor the owner(s) of any information from the web site have contributed to, approved or are in any way responsible for the contents of this publication.
ORCID iDs
Andrew Titman
Fang Wan
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
Appendix: Point estimates and confidence intervals for the Magnusson–Turnbull design
In this section the methods considered in Section 2.4 are applied directly to the Magnusson–Turnbull design introduced in Section 2.3. We specifically present the variant of the design where no prior ordering is assumed. However, the results are easily adapted to the case of a prior ordering.
Conditional p-values for
Suppose firstly that , then and and hence
where there is no direct dependence on the specific value of . Similarly, and if then . Hence,
For situations where either or a confidence interval for conditional on can be constructed in a similar manner, except conditioning on then has an impact. Specifically, if , implying , then is as above and and stay the same. However, if , then the decision to stop at stage 1 is based on , and hence and provided , and is empty otherwise. Let then the resulting expressions for and will be the same as above, except we replace with .
Unconditional p-values for
Using the same definition of as above, for the unconditional p-value function for ,
Hence
Similarly, and so
which is identical to the numerator in the conditional case.
Analogous expressions for a confidence interval for , unconditionally or conditional on will have the same form except using and in place of and .
p-value function for
For the p-value function for conditional on , let and let represent the distribution of conditional on . Here
for and is 0 otherwise. Then
where .
Moreover, since the stage 2 information in the design is fixed given ,
2. Knottnerus JA, Tugwell P. Heterogeneity and clinical reality. J Clin Epidemiol 2013; 66: 809–811.
3. Schmidli H, Bretz F, Racine A, et al. Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: applications and practical considerations. Biometr J 2006; 48: 635–643.
4. Wang SJ, O’Neill RT, Hung HJ. Approaches to evaluation of treatment effect in randomized clinical trials with genomic subset. Pharmaceut Stat 2007; 6: 227–244.
5. Wang SJ, James Hung H, O’Neill RT. Adaptive patient enrichment designs in therapeutic trials. Biometr J: J Math Methods Biosci 2009; 51: 358–374.
6. Magnusson BP, Turnbull BW. Group sequential enrichment design incorporating subgroup selection. Stat Med 2013; 32: 2695–2714.
7. Lin R, Yang Z, Yuan Y, et al. Sample size re-estimation in adaptive enrichment design. Contemp Clin Trials 2021; 100: 106216.
8. Ondra T, Jobjörnsson S, Beckman RA, et al. Optimized adaptive enrichment designs. Stat Methods Med Res 2019; 28: 2096–2111.
9. Burnett T, Jennison C. Adaptive enrichment trials: what are the benefits? Stat Med 2021; 40: 690–711.
10. Rosenblum M, Fang EX, Liu H. Optimal, two-stage, adaptive enrichment designs for randomized trials, using sparse linear programming. J R Stat Soc: Ser B (Stat Methodol) 2020; 82: 749–772.
11. Robertson DS, Choodari-Oskooei B, Dimairo M, et al. Point estimation for adaptive trial designs I: a methodological review. Stat Med 2023; 42: 122–145.
12. Robertson DS, Choodari-Oskooei B, Dimairo M, et al. Point estimation for adaptive trial designs II: practical considerations and guidance. Stat Med 2023; 42: 2496–2520.
13. Kimani PK, Todd S, Stallard N. Conditionally unbiased estimation in phase II/III clinical trials with early stopping for futility. Stat Med 2013; 32: 2893–2910.
14. Cohen A, Sackrowitz HB. Two stage conditionally unbiased estimators of the selected mean. Stat Probab Lett 1989; 8: 273–278.
15. Stallard N, Todd S. Point estimates and confidence regions for sequential trials involving selection. J Stat Plann Inference 2005; 135: 402–419.
16. Kunzmann K, Benner L, Kieser M. Point estimation in adaptive enrichment designs. Stat Med 2017; 36: 3935–3947.
17. Luo X, Li M, Shih WJ, et al. Estimation of treatment effect following a clinical trial with adaptive design. J Biopharmaceut Stat 2012; 22: 700–718.
18. Di Stefano F, Pannaux M, Correges A, et al. A comparison of estimation methods adjusting for selection bias in adaptive enrichment designs with time-to-event endpoints. Stat Med 2022; 41: 1767–1779.
19. EMA. ICH E9 statistical principles for clinical trials: scientific guideline. European Medicines Agency, online, 1998.
20. Posch M, Koenig F, Branson M, et al. Testing and estimation in flexible group sequential designs with adaptive treatment selection. Stat Med 2005; 24: 3697–3714.
Fairbanks K, Madsen R. P values for tests using a repeated significance test design. Biometrika 1982; 69: 69–74.
23. Magirr D, Jaki T, Posch M, et al. Simultaneous confidence intervals that are compatible with closed testing in adaptive designs. Biometrika 2013; 100: 985–996.
24. Kimani PK, Todd S, Renfro LA, et al. Point and interval estimation in two-stage adaptive designs with time to event data and biomarker-driven subpopulation selection. Stat Med 2020; 39: 2568–2586.
25. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. Boca Raton: CRC Press, 1999.
26. Whitehead J. The Design and Analysis of Sequential Clinical Trials. Chichester: John Wiley & Sons, 1997.
27. Emerson SS, Fleming TR. Parameter estimation following group sequential hypothesis testing. Biometrika 1990; 77: 875–892.
28. Peeters M, Price T, Cervantes A, et al. Randomized phase III study of panitumumab with fluorouracil, leucovorin, and irinotecan (FOLFIRI) compared with FOLFIRI alone as second-line treatment in patients with metastatic colorectal cancer. J Clin Oncol 2010; 28: 4706–4713.
29. Peeters M, Price T, Cervantes A, et al. Final results from a randomized phase 3 study of FOLFIRI ± panitumumab for second-line treatment of metastatic colorectal cancer. Ann Oncol 2014; 25: 107–116.
30. Di Scala L, Glimm E. Time-to-event analysis with treatment arm selection at interim. Stat Med 2011; 30: 3067–3081.
31. Simon N, Simon R. Adaptive enrichment designs for clinical trials. Biostatistics 2013; 14: 613–625.