
Abstract

In single-arm trials with a predefined subgroup based on baseline biomarkers, it is often assumed that a biomarker defined subgroup, the biomarker positive subgroup, has the same or higher response to treatment compared to its complement, the biomarker negative subgroup. The goal is to determine whether the treatment is effective in both subgroups, in the biomarker positive subgroup only, or not at all. We propose the isotonic stratified design for this problem. The design has a joint set of decision rules for biomarker positive and negative subjects and estimates the response probabilities jointly, using the assumed monotonicity of response between the biomarker negative and positive subgroups. The new design reduces the sample size requirement compared to running two Simon's designs, one in each of the biomarker positive and negative subgroups. For example, the new design requires 23%–35% fewer patients than running two Simon's designs for the scenarios we considered. Alternatively, the new design allows evaluating the response probability in both the biomarker negative and biomarker positive subgroups using only 40% more patients than are needed for running Simon's design in the biomarker positive subgroup only.

Keywords

Simon's design, biomarker positive group, biomarker-stratified trials, isotonic estimation

1 Introduction

Recent advances in cancer biology and genomics are shifting the focus of anti-cancer treatment development to therapies that are tailored to a particular subset of patients. There is a need to develop efficient phase 2 biomarker-guided clinical trial designs that evaluate the efficacy of a new therapy in biomarker negative and positive patients.¹ A biomarker is predictive if the treatment effect in a biomarker-defined subgroup, often referred to as the biomarker positive subgroup, is higher than the treatment effect in its complement, the biomarker negative subgroup. A prognostic biomarker is associated with response to treatment but does not necessarily modify the treatment effect. A two-arm trial is needed to evaluate a predictive biomarker. In oncology, an initial evaluation of therapies is usually done in a single-arm trial. Differential response to treatment in biomarker positive and negative patients in a single-arm study indicates that the biomarker is prognostic or predictive or both. Existing data regarding response to standard therapy is often available. Availability of such historical control data allows for predictive biomarkers to be evaluated in a single-arm trial.

The goal of an initial evaluation of targeted therapy is to reach one of three conclusions: to continue the development of the new therapy in both biomarker positive and biomarker negative patients, to continue the development in biomarker positive patients only, or to stop the development altogether. This is achieved by testing the null hypotheses about response to treatment in the biomarker positive subgroup and in the biomarker negative subgroup. This requirement guarantees that a biomarker negative subgroup is not incorrectly recommended for future development of the therapy if the therapy is ineffective in biomarker negative patients. Phase 3 trials with biomarkers often have a different goal: showing that response to treatment is significantly different from a control in the biomarker positive subgroup or in the overall population of combined biomarker negative and positive subgroups.^2–4

Two-stage designs with futility stopping are widely used in single-arm phase 2 trials in oncology.⁵ Simon's design⁶ allows stopping an ineffective therapy early. Jones and Holmgren⁷ described the simplest approach to the problem of treatment evaluation in the biomarker positive and negative subgroups, running two sub-trials with Simon's design in each biomarker-defined subgroup. The type I error rate is controlled by dividing α between the two trials. Jones and Holmgren⁷ and Berry et al.⁸ suggested that a biomarker stratified design is more efficient than running two independent sub-trials. A stratified design is essentially two sub-trials, one in each biomarker subgroup, with joint decision rules at the interim and final analyses. During the interim analysis, a stratified design might allow for patient enrichment to restrict the second stage enrollment to the biomarker positive subgroup. Tournoux-Facon et al.⁹ considered a stratified design with efficacy stopping at the interim analysis. Freidlin and Korn¹⁰ proposed a two-stage design with fixed order testing where the biomarker positive subgroup is tested first, and, if significant, the biomarker negative subgroup is tested. Parashar et al.¹¹ formalized the type I and II error rate considerations in a stratified design and tabulated decision rules for an optimal stratified Simon's design to yield the smallest expected sample size under the null hypothesis.

Stratified designs have been used in practice to evaluate targeted therapies. Andre et al.¹² evaluated trastuzumab with HER-2 overexpression as a biomarker using two Simon's designs. The trial started with a Simon's design in the combined population. If the futility boundary is crossed in the unselected population at the interim analysis, a second Simon's design is initiated in the biomarker positive subgroup only. More recently, a single-arm phase 2 “CirCe T-DM1” trial¹³ evaluated the efficacy of trastuzumab-emtansine in HER2-negative metastatic breast cancer patients with HER2-positive circulating tumor cells (CTCs). The design from Tournoux-Facon et al.⁹ with CTC count as a potential predictive biomarker was used in this trial.

In this article, we explore the use of isotonic estimation¹⁴ in biomarker-guided designs. We use the assumption that the response probability in the biomarker positive subgroup is the same or higher compared to that of the biomarker negative subgroup. Isotonic estimates are the maximum likelihood estimates of response probabilities in the biomarker positive and negative subgroups under the isotonic restriction.¹⁵ If the observed data in the biomarker positive and negative subgroups contradict the isotonic assumption, the data from the two subgroups are pooled in order to estimate the response probabilities. We show that isotonic estimation combined with joint decision rules increases the efficiency of the stratified Simon's design. We tabulate designs that minimize the expected sample size under the null hypothesis, and designs that minimize the weighted average of the expected sample sizes under the null and alternative hypotheses. We compare our proposed approach with the design of Parashar et al.¹¹ and with running two independent sub-trials in biomarker positive and negative subgroups using Simon's design.

The rest of the article is organized as follows. In Section 2, we describe our proposed isotonic stratified design. In Sections 3 and 4, the new design is compared with existing approaches. Our findings are summarized in Section 5.

2 The isotonic stratified design

Define the true response rates for the biomarker negative subgroup, B −, and the biomarker positive subgroup, B +, as $p^{-}$ and $p^{+}$ . Furthermore, define the response probability under the null hypothesis as $p_{0}$ and the response probability under the alternative hypothesis as $p_{1}$ , where $p_{0} < p_{1}$ . As pointed out by Parashar et al.,¹¹ setting $p_{0}^{-} = p_{0}^{+} = p_{0}$ implies that the biomarker is not prognostic. The goal is to test two null hypotheses, $H_{0}^{-} : p^{-} = p_{0}$ and $H_{0}^{+} : p^{+} = p_{0}$ , against the corresponding alternative hypotheses $H_{1}^{-} : p^{-} = p_{1}$ and $H_{1}^{+} : p^{+} = p_{1}$ .

We assume a monotonic non-decreasing relationship for true response rates in the two biomarker subgroups, $p^{-} \leq p^{+}$ , since biomarker positive patients are expected to have a better response to a targeted therapy compared to biomarker negative patients. Some existing designs^7,11 for testing $H_{0}^{-}$ and $H_{0}^{+}$ use the above assumption to set up joint decision rules at the interim and final analyses. If a sufficient response in B− is seen at the interim analysis, B + is concluded to have a sufficient response as well. In the final analysis, if a sufficient response in B − is seen, both hypotheses, $H_{0}^{-}$ and $H_{0}^{+}$ are rejected regardless of the outcome observed in B + . We propose to utilize the assumption $p^{-} \leq p^{+}$ further and use maximum likelihood estimates under an assumption of monotonicity¹⁵ for response probabilities in B − and B + in the testing procedure. In a two-stage design, after stage 1, we first estimate the response probabilities $p^{-}$ and $p^{+}$ by proportions, $\hat{p^{-}} = X_{1}^{-} / N_{1}^{-}$ and $\hat{p^{+}} = X_{1}^{+} / N_{1}^{+}$ . Here $N_{1}^{-}$ and $N_{1}^{+}$ are the number of subjects accrued in B − and B + and $X_{1}^{-}$ , $X_{1}^{+}$ are the number of responses observed in B − and B +, respectively. Let $\tilde{p^{-}}$ and $\tilde{p^{+}}$ be isotonic maximum likelihood estimates of $p^{-}$ and $p^{+}$ . If $\hat{p^{-}} \leq \hat{p^{+}}$ , we set $\tilde{p^{-}} = \hat{p^{-}}$ and $\tilde{p^{+}} = \hat{p^{+}}$ . If $\hat{p^{-}} > \hat{p^{+}}$ , violating the monotonicity assumption $p^{-} \leq p^{+}$ , we pool data from the B − and the B + subgroups and $\tilde{p^{-}} = \tilde{p^{+}} = (X_{1}^{-} + X_{1}^{+}) / (N_{1}^{-} + N_{1}^{+})$ . 
For ease of presentation, we compute the corresponding isotonically adjusted numbers of responses as $\tilde{X_{1}^{-}} = \tilde{p^{-}} N_{1}^{-}$ and $\tilde{X_{1}^{+}} = \tilde{p^{+}} N_{1}^{+}$ , where $\tilde{X_{1}^{-}}$ and $\tilde{X_{1}^{+}}$ need not be integers. If both subgroups proceed to the second stage, $N_{2}^{-}$ and $N_{2}^{+}$ patients are enrolled, respectively. Similar calculations are carried out in the final analysis to yield $\tilde{X^{-}}$ and $\tilde{X^{+}}$ , the numbers of isotonically adjusted responses in the two subgroups. If the B − subgroup is stopped for futility at the interim analysis, it might be beneficial to enrich the B + subgroup and enroll more B + participants in stage 2 than originally planned. We enroll $N_{2 E}^{+}$ participants from B + in stage 2 if B − is stopped for futility. The final analysis of B + includes data on B + participants from both stages.
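
The pooling rule above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' R program; the function name `isotonic_adjusted` and its argument names are assumptions introduced here. Exact fractions are used so that comparisons against integer boundaries are not affected by floating-point rounding.

```python
from fractions import Fraction


def isotonic_adjusted(x_neg, n_neg, x_pos, n_pos):
    """Return the isotonically adjusted response counts for B- and B+.

    x_neg, x_pos: observed responses; n_neg, n_pos: patients accrued.
    (Hypothetical helper, not taken from the paper's code.)
    """
    p_neg = Fraction(x_neg, n_neg)
    p_pos = Fraction(x_pos, n_pos)
    if p_neg > p_pos:
        # Observed proportions violate p- <= p+: pool the two subgroups.
        p_neg = p_pos = Fraction(x_neg + x_pos, n_neg + n_pos)
    return p_neg * n_neg, p_pos * n_pos


# Order respected (2/14 <= 8/28): the counts are left unchanged.
print(isotonic_adjusted(2, 14, 8, 28))
# Order violated (5/14 > 4/28): pooled estimate 9/42, adjusted counts 3 and 6.
print(isotonic_adjusted(5, 14, 4, 28))
```

Pooling preserves the total number of responses (3 + 6 = 5 + 4 in the second call) and only redistributes them between the subgroups in proportion to their sizes.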

The isotonic stratified design is described as follows:

Step 1:
Enroll $N_{1}^{-}$ biomarker negative patients and $N_{1}^{+}$ biomarker positive patients in stage 1.

Step 2:
Let $k_{1}^{-}$ and $k_{1}^{+}$ be the stage 1 boundaries for the B − and B + subgroups, and let $\tilde{X_{1}^{-}}$ and $\tilde{X_{1}^{+}}$ be the numbers of isotonically adjusted responses.
(i) If $\tilde{X_{1}^{-}} < k_{1}^{-}$ and $\tilde{X_{1}^{+}} < k_{1}^{+}$ , stop both subgroups for futility.
(ii) If $\tilde{X_{1}^{-}} < k_{1}^{-}$ and $\tilde{X_{1}^{+}} \geq k_{1}^{+}$ , stop the B − subgroup for futility. Enroll $N_{2 E}^{+}$ biomarker positive patients in stage 2.
(iii) If $\tilde{X_{1}^{-}} \geq k_{1}^{-}$ , enroll $N_{2}^{-}$ biomarker negative and $N_{2}^{+}$ biomarker positive patients in stage 2.

Step 3:
Final analysis. Let ${\tilde{X}}^{-}$ and ${\tilde{X}}^{+}$ be the numbers of isotonically adjusted responses in $N^{-} = N_{1}^{-} + N_{2}^{-}$ and $N^{+} = N_{1}^{+} + N_{2}^{+}$ patients following Step 2 (iii). Let $X_{E}^{+}$ be the number of responses in $N_{E}^{+} = N_{1}^{+} + N_{2 E}^{+}$ patients following Step 2 (ii).
(i) If ${\tilde{X}}^{-} \geq k^{-}$ , conclude efficacy in both subgroups. Here $k^{-}$ is the decision boundary for the B − subgroup.
(ii) If ${\tilde{X}}^{-} < k^{-}$ and ${\tilde{X}}^{+} < k^{+}$ , declare the treatment not effective in any population. Here $k^{+}$ is the decision boundary for the B + subgroup.
(iii) If ${\tilde{X}}^{-} < k^{-}$ and ${\tilde{X}}^{+} \geq k^{+}$ following Step 2 (iii), or $X_{E}^{+} \geq k_{E}^{+}$ following Step 2 (ii), conclude efficacy in the biomarker positive population only. Here $k_{E}^{+}$ is the decision boundary in the case when only the B + subgroup is enrolled in stage 2.
(iv) Otherwise, declare the treatment not effective in any population.
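
The three steps can be written as a small set of decision functions. The sketch below is one reading of the rules as stated, not the authors' implementation; the class `Design`, the function names, and the returned labels are hypothetical. The example design uses the boundaries and sample sizes of the isotonic row of Table 1 for $p_{0} = 0.15$ and $p_{1} = 0.30$.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Design:
    """Boundaries and sample sizes of one design (hypothetical container)."""
    k1_neg: int   # stage 1 boundary, B-
    k1_pos: int   # stage 1 boundary, B+
    n1_neg: int   # stage 1 sample size, B-
    n1_pos: int   # stage 1 sample size, B+
    k_e_pos: int  # final boundary for B+ when only B+ continues
    n_e_pos: int  # total B+ sample size when only B+ continues
    k_neg: int    # final boundary, B-
    k_pos: int    # final boundary, B+
    n_neg: int    # total sample size, B-
    n_pos: int    # total sample size, B+


def adjusted(x_neg, n_neg, x_pos, n_pos):
    """Isotonically adjusted response counts (pool if the order is violated)."""
    p_neg, p_pos = Fraction(x_neg, n_neg), Fraction(x_pos, n_pos)
    if p_neg > p_pos:
        p_neg = p_pos = Fraction(x_neg + x_pos, n_neg + n_pos)
    return p_neg * n_neg, p_pos * n_pos


def interim(d, x1_neg, x1_pos):
    """Step 2: decide which subgroups enter stage 2."""
    a_neg, a_pos = adjusted(x1_neg, d.n1_neg, x1_pos, d.n1_pos)
    if a_neg >= d.k1_neg:
        return "continue_both"
    if a_pos >= d.k1_pos:
        return "continue_pos_only"
    return "stop_both"


def final_both(d, x_neg, x_pos):
    """Step 3 when both subgroups were enrolled in stage 2 (totals over both stages)."""
    a_neg, a_pos = adjusted(x_neg, d.n_neg, x_pos, d.n_pos)
    if a_neg >= d.k_neg:
        return "effective_in_both"
    if a_pos >= d.k_pos:
        return "effective_in_pos_only"
    return "not_effective"


def final_enriched(d, x_e_pos):
    """Step 3 when only B+ was enrolled in stage 2."""
    return "effective_in_pos_only" if x_e_pos >= d.k_e_pos else "not_effective"


d = Design(2, 6, 14, 28, 16, 70, 12, 14, 50, 60)
print(interim(d, 1, 7))      # B- below its boundary, B+ at or above its boundary
print(interim(d, 3, 2))      # order violated, pooled counts fall below both boundaries
print(final_both(d, 8, 15))  # only the B+ boundary is reached
```

Checking the B − boundary first mirrors the order of the rules: once the adjusted B − count reaches its boundary, the B + count is not consulted.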

The flowchart of the isotonic stratified design is shown in Figure 1.

Figure 1.
Flowchart of the isotonic stratified design.

Consider three potential scenarios for the true response probabilities in B − and B +, $(p^{-}, p^{+})$ : treatment is ineffective in both subgroups, $(p_{0}, p_{0})$ ; treatment is effective only in the biomarker positive subgroup, $(p_{0}, p_{1})$ ; and treatment is effective in both subgroups, $(p_{1}, p_{1})$ . Three potential decisions can be made at the end of the trial: reject both $H_{0}^{-}$ and $H_{0}^{+}$ , reject $H_{0}^{+}$ only, or reject neither hypothesis. There are three routes to reject at least one of the null hypotheses (see Figure 1). Route 1 rejects both $H_{0}^{-}$ and $H_{0}^{+}$ when both subgroups pass the first stage. Route 2 rejects only $H_{0}^{+}$ when both subgroups pass the first stage. Route 3 rejects only $H_{0}^{+}$ when only the B + subgroup proceeds to the second stage. We denote the probability of rejection through Route j, j = 1, 2, 3, as $R_{j} (p^{-}, p^{+})$ . We describe how to compute the probability of rejecting $H_{0}^{-}$ and/or $H_{0}^{+}$ through each of these routes in Supplemental Appendix 1.

Similar to the criteria proposed by Parashar et al.,¹¹ we aim to control the type I error in the scenario where there is no activity in B − and B + . To achieve that, we impose the following constraint:
$\begin{aligned} R_{1} (p_{0}, p_{0}) + R_{2} (p_{0}, p_{0}) + R_{3} (p_{0}, p_{0}) \leq \alpha \qquad (1) \end{aligned}$
The power requirements of the design are (a) to have at least $1 - \beta$ power to reject only $H_{0}^{+}$ when the treatment is effective only in the biomarker positive subgroup; and (b) to have at least $1 - \beta$ power to reject both $H_{0}^{-}$ and $H_{0}^{+}$ when the treatment is effective in both subgroups. We impose the following constraint:
$\begin{aligned} \min (R_{1} (p_{1}, p_{1}), R_{2} (p_{0}, p_{1}) + R_{3} (p_{0}, p_{1})) \geq 1 - \beta \qquad (2) \end{aligned}$
Two additional errors can be made during the decision process. When the treatment is only effective in the biomarker-positive subgroup $(p_{0}, p_{1})$ , one can make the error of rejecting $H_{0}^{-}$ . When the treatment is effective in both populations $(p_{1}, p_{1})$ , one can make an error of failing to reject $H_{0}^{-}$ . Our design does not have a requirement to control these two errors. Controlling these two errors is possible but leads to a larger required sample size. A more detailed discussion of these two errors is presented in Section 4.

We assume that the response is immediate, or the trial is paused to wait for results of the interim analysis. All the probabilities can be calculated exactly. If the trial stops for futility in both subgroups, the sample size is $N_{1}^{-} + N_{1}^{+}$ . If the trial continues with only the biomarker-positive subgroup after the interim analysis, the sample size is $N_{1}^{-} + N_{E}^{+}$ , where $N_{E}^{+} = N_{1}^{+} + N_{2 E}^{+}$ . If the trial continues with both subgroups, the total sample size enrolled is $N^{-} + N^{+}$ , where $N^{-} = N_{1}^{-} + N_{2}^{-}$ and $N^{+} = N_{1}^{+} + N_{2}^{+}$ . The calculation of the expected sample size and probability of early termination for futility can be found in Supplemental Appendix 1. To describe the proposed design, one needs to specify the decision rules and sample sizes as $(k_{1}^{-} k_{1}^{+})$ / $(N_{1}^{-} N_{1}^{+})$ → $(k_{E}^{+} / N_{E}^{+})$ | $(k^{-} k^{+})$ / $(N^{-} N^{+})$ . In this notation, there are five decision boundaries to determine: $k_{1}^{-}$ and $k_{1}^{+}$ are the decision boundaries for B − and B + at the end of the first stage, $k_{E}^{+}$ is the final decision boundary associated with $N_{E}^{+}$ patients for the B + subgroup when the trial proceeds with B + only in stage 2, and $k^{-}$ and $k^{+}$ are the final boundaries for B − and B +, applied to the total number of responses in B − and B +, respectively.
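
Because stage 1 responses are binomial, the probabilities of the three interim outcomes and the expected sample size can be obtained by enumerating all stage 1 outcomes. The sketch below follows the Step 2 rules as written; it is not the authors' R program, the function names are hypothetical, and since the definitions in Supplemental Appendix 1 are not reproduced here it is not guaranteed to reproduce the tabulated PET or EN₀ values.

```python
from fractions import Fraction
from math import comb


def binom_pmf(x, n, p):
    """Binomial probability P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def interim_outcome(x_neg, x_pos, k1, n1):
    """Apply the Step 2 rules; k1 and n1 are (B-, B+) pairs."""
    p_neg, p_pos = Fraction(x_neg, n1[0]), Fraction(x_pos, n1[1])
    if p_neg > p_pos:  # monotonicity violated: pool the two subgroups
        p_neg = p_pos = Fraction(x_neg + x_pos, n1[0] + n1[1])
    if p_neg * n1[0] >= k1[0]:
        return "both"
    if p_pos * n1[1] >= k1[1]:
        return "pos_only"
    return "stop_both"


def stage1_summary(p_neg, p_pos, k1, n1, n_e_pos, n_total):
    """Return interim outcome probabilities and the expected sample size."""
    probs = {"stop_both": 0.0, "pos_only": 0.0, "both": 0.0}
    for x_neg in range(n1[0] + 1):
        for x_pos in range(n1[1] + 1):
            weight = binom_pmf(x_neg, n1[0], p_neg) * binom_pmf(x_pos, n1[1], p_pos)
            probs[interim_outcome(x_neg, x_pos, k1, n1)] += weight
    sizes = {
        "stop_both": n1[0] + n1[1],        # N1- + N1+
        "pos_only": n1[0] + n_e_pos,       # N1- + NE+
        "both": n_total[0] + n_total[1],   # N- + N+
    }
    ess = sum(probs[outcome] * sizes[outcome] for outcome in probs)
    return probs, ess


# Example inputs: k1 = (2, 6), N1 = (14, 28), NE+ = 70, (N-, N+) = (50, 60).
probs0, ess0 = stage1_summary(0.15, 0.15, (2, 6), (14, 28), 70, (50, 60))
probs1, ess1 = stage1_summary(0.30, 0.30, (2, 6), (14, 28), 70, (50, 60))
print(probs0, ess0)
print(probs1, ess1)
```

The enumeration visits only $(N_{1}^{-} + 1)(N_{1}^{+} + 1)$ outcomes, so the calculation is exact and fast for phase 2 sample sizes.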

Denote the expected sample size when the true response probabilities are $(p^{-}, p^{+})$ as $ESS (p^{-}, p^{+})$ . We propose two main optimality criteria. The first criterion is to minimize the expected sample size under the null hypothesis, denoted as $E N_{0} = ESS (p_{0}, p_{0})$ . This is the same criterion that is used in Simon's optimal design.⁶ The second criterion is to minimize the weighted average of the expected sample sizes under the three scenarios we consider:
$E N = 0.5 ESS (p_{0}, p_{0}) + 0.25 ESS (p_{0}, p_{1}) + 0.25 ESS (p_{1}, p_{1})$
Here the null scenario gets half of the weight, and the other half is divided equally between the two alternative scenarios. The component $ESS (p_{0}, p_{1})$ is important because we want to control the sample size in B + when B − is dropped due to low response. The component $ESS (p_{1}, p_{1})$ approximates the total sample size when both B − and B + continue in the second stage. We also considered designs that minimize the maximum sample size $N_{max} = \max (N_{1}^{-} + N_{E}^{+}, N^{-} + N^{+})$ ; the results are given in the Supplemental Material. The designs were tabulated for one-sided $\alpha = 0.05$ and $1 - \beta = 0.8$ and 0.9. Since isotonic estimation is used to obtain $\tilde{X_{1}^{-}}$ , $\tilde{X_{1}^{+}}$ , $\tilde{X^{-}}$ , and $\tilde{X^{+}}$ , the best decision boundaries $k_{1}^{-}$ , $k_{1}^{+}$ , $k^{-}$ , and $k^{+}$ might not be integers. Here, we only considered integer values for decision boundaries for ease of presentation and to reduce the search time.
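
As a small illustration of these two summary quantities, the sketch below computes the maximum sample size and the weighted criterion. The function names are hypothetical. The `n_max` inputs are the isotonic sample sizes listed in Table 1 for $p_{0} = 0.15$ and $p_{0} = 0.20$ , while the expected sample sizes passed to `weighted_en` are made-up numbers chosen only to show the weighting.

```python
def n_max(n1_neg, n_e_pos, n_neg, n_pos):
    """Maximum sample size: max(N1- + NE+, N- + N+)."""
    return max(n1_neg + n_e_pos, n_neg + n_pos)


def weighted_en(ess_null, ess_pos_only, ess_both):
    """Weighted criterion: 0.5 ESS(p0,p0) + 0.25 ESS(p0,p1) + 0.25 ESS(p1,p1)."""
    return 0.5 * ess_null + 0.25 * ess_pos_only + 0.25 * ess_both


print(n_max(14, 70, 50, 60))          # 110, the N_max listed for this design
print(n_max(10, 55, 34, 39))          # 73, the N_max listed for this design
print(weighted_en(40.0, 60.0, 80.0))  # 55.0 for these illustrative inputs
```
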
3 Comparison of the isotonic stratified design with other stratified designs

We compared the isotonic stratified design with three alternatives: two Simon's designs, one in B − and one in B +, with joint decision rules; the stratified Simon's design of Parashar et al.,¹¹ referred to as the PBSWM − design; and a similar design in which testing starts with B +, referred to as the PBSWM + design. The two Simon's designs have a set of joint decision rules at the final analysis: if $H_{0}^{-}$ is rejected, both the biomarker negative and positive subgroups are recommended for future trials without testing $H_{0}^{+}$ .^7,11 Both PBSWM − and PBSWM + have a set of joint decision rules at the interim and the final analyses. PBSWM − tests efficacy in B − first. If $H_{0}^{-}$ is rejected, $H_{0}^{+}$ is automatically rejected and both the biomarker negative and positive subgroups are recommended. In contrast, the PBSWM + design tests B + first, and B − is only considered if efficacy is shown in B +.

The difference between PBSWM − and PBSWM + lies in the decision made when the observed data indicate efficacy in B − but no efficacy in B + . In this case, PBSWM + does not reject any null hypotheses, while PBSWM − rejects both hypotheses. The isotonic stratified design instead pools the data observed in B − and B +, and the decision is made based on the pooled estimate.

Parashar et al.¹¹ considered early stopping for efficacy in the PBSWM − design. We did not include early efficacy stopping in our simulation to make all designs comparable and because it is often desirable to get more experience with a new therapy in a phase 2 trial.

The two Simon's optimal designs have the type I error rate of $α / 2$ each due to Bonferroni adjustment for testing two biomarker subgroups. The PBSWM designs and the isotonic stratified design were obtained using an optimization R program¹⁶ via a grid search over the 10-dimensional parameter space, $k_{1}^{-}$ , $k_{1}^{+}$ , $N_{1}^{-}$ , $N_{1}^{+}$ , $k_{E}^{+}$ , $N_{E}^{+}$ , $k^{-}$ , $k^{+}$ , $N^{-}$ , and $N^{+}$ , with constraints (1) and (2). The R code to generate designs in Supplemental Tables 1 to 3 is available on GitHub (https://github.com/IsotonicStratified/Isotonic-Stratified-Design).

Table 1 shows the designs that optimize EN₀. The isotonic stratified design minimizing EN₀ significantly reduces the expected number of subjects under the null hypothesis, EN₀, compared to other designs while maintaining the same or lower total sample size. Two independent Simon's optimal designs do not perform well compared to stratified designs because interim decision rules do not reflect the isotonic assumption. The PBSWM + design has the highest probability of stopping the trial early for futility. However, the isotonic stratified design yields about the same EN₀ as PBSWM + because the isotonic stratified design requires a much smaller total sample size N_max. The isotonic stratified design has the best EN (Table 1) and performs better overall compared to both PBSWM − and PBSWM + .

Table 1.
Comparison of designs minimizing EN₀ with two parallel optimal Simon's designs (Simon), the PBSWM + and PBSWM − designs, and the isotonic stratified design (isotonic) with $α = 0.05$ and $1 - β = 0.8$ .

$p_{0}$	$p_{1}$	Design	PET	EN₀	EN	$N_{max}$	$(k_{1}^{-} k_{1}^{+})$ / $(N_{1}^{-} N_{1}^{+})$ → $(k_{E}^{+} / N_{E}^{+})$ | $(k^{-} k^{+})$ / $(N^{-} N^{+})$
0.15	0.30	Simon	0.55	79.7	99.7	142	(5 5)/(23 23) → (−/−)|(17 17)/(71 71)
		PBSWM+	0.84	59.2	90.6	137	(1 8)/(10 36) → (19/79)|(15 15)/(68 69)
		PBSWM−	0.61	74.0	92.3	129	(6 5)/(26 23) → (17/72)|(20 11)/(85 44)
		Isotonic	0.73	58.4	79.2	110	(2 6)/(14 28) → (16/70)|(12 14)/(50 60)
0.20	0.40	Simon	0.56	55.2	71.5	110	(4 4)/(13 13) → (−/−)|(17 17)/(55 55)
		PBSWM+	0.77	42.9	60.5	87	(2 6)/(11 21) → (17/52)|(13 13)/(43 44)
		PBSWM−	0.58	50.1	62.3	87	(6 4)/(19 14) → (15/47)|(19 9)/(60 27)
		Isotonic	0.77	39.1	53.4	73	(2 6)/(10 20) → (17/55)|(11 12)/(34 39)
0.25	0.45	Simon	0.59	58.0	74.2	108	(6 6)/(17 17) → (−/−)|(20 20)/(54 54)
		PBSWM+	0.82	46.0	67.7	100	(2 9)/(10 26) → (21/55)|(16 19)/(45 55)
		PBSWM−	0.64	54.7	67.7	88	(8 6)/(22 17) → (21/57)|(21 12)/(56 32)
		Isotonic	0.73	43.8	60.6	83	(2 7)/(10 20) → (22/62)|(16 15)/(42 41)
0.30	0.50	Simon	0.60	63.9	84.8	130	(7 7)/(17 17) → (−/−)|(27 27)/(65 65)
		PBSWM+	0.81	48.0	72.3	111	(3 10)/(11 25) → (26/62)|(22 22)/(55 56)
		PBSWM−	0.61	58.3	70.8	93	(10 7)/(24 18) → (23/54)|(26 14)/(61 32)
		Isotonic	0.74	47.4	63.5	86	(5 8)/(15 20) → (26/63)|(18 18)/(42 44)

$p_{0}$ is the true response rate under the null hypothesis and $p_{1}$ is the true response rate under the alternative hypothesis. PET is the probability of early termination under the null hypothesis $(p^{-}, p^{+}) = (p_{0}, p_{0})$ .

A disadvantage of PBSWM − is that it oversamples B −. This oversampling is expected because the design relies heavily on testing the B − subgroup. Similarly, B + is oversampled in PBSWM +. Although the sequence of decision rules in the isotonic stratified design resembles that of PBSWM −, the isotonic adjustment allows data from both B − and B + to influence the decision, and the probability of rejecting the null hypothesis is calculated similarly to PBSWM + (see Supplemental Material). To summarize, the new isotonic stratified design minimizing EN₀ yields the smallest maximum sample size and the smallest EN₀, and, unlike PBSWM −, oversamples B +.

Table 2 tabulates the designs that minimize EN. For the two Simon's designs, we selected the Simon's design that minimizes the weighted average of EN₀ and the maximum total sample size with weight 0.5.¹⁷ As before, the one-sided type I error rate for each of the Simon's designs is $α / 2$ . The isotonic stratified design has the smallest EN for all four pairs of response probabilities. The total sample size requirement N_max is also the smallest in three out of the four cases tabulated. Looking at stage 2 sample sizes when both subgroups are being enrolled in the second stage, only 33% of the patients are from B + in the PBSWM − design on average. The proportion of B + is 55% for the PBSWM + design and 45% on average for the isotonic stratified design. We optimized the designs without setting any restrictions on the proportion of B + patients. Users can specify the desired prevalence of B − to generate designs if sampling a high proportion of the B − subgroup is not desirable. However, this will lead to an increased sample size if the specified proportion differs from the optimal proportion.

Table 2.

Comparison of the designs minimizing EN with two parallel Simon's designs (Simon), the PBSWM + and PBSWM − design, and the isotonic stratified design (isotonic) with $α = 0.05$ and $1 - β = 0.8$ .

$p_{0}$	$p_{1}$	Design	PET	EN₀	EN	$N_{max}$	$(k_{1}^{-} k_{1}^{+})$ / $(N_{1}^{-} N_{1}^{+})$ → $(k_{E}^{+} / N_{E}^{+})$ | $(k^{-} k^{+})$ / $(N^{-} N^{+})$
0.15	0.30	Simon	0.46	79.8	97.1	118	(6 6)/(31 31) → (−/−)|(15 15)/(59 59)
		PBSWM+	0.76	67.8	86.9	114	(2 8)/(16 40) → (16/65)|(12 14)/(53 61)
		PBSWM−	0.45	76.8	89.0	111	(6 4)/(27 22) → (16/66)|(18 10)/(74 37)
		Isotonic	0.43	63.8	77.3	100	(3 3)/(17 17) → (15/62)|(12 12)/(51 49)
0.20	0.40	Simon	0.71	52.0	65.6	84	(7 7)/(23 23) → (−/−)|(14 14)/(42 42)
		PBSWM+	0.75	48.4	59.2	77	(4 7)/(16 26) → (15/45)|(11 12)/(36 41)
		PBSWM−	0.49	51.5	60.6	75	(5 4)/(17 15) → (16/49)|(16 9)/(49 26)
		Isotonic	0.56	39.5	51.6	74	(3 3)/(11 11) → (15/46)|(13 10)/(43 31)
0.25	0.45	Simon	0.32	61.0	72.6	86	(6 6)/(21 21) → (−/−)|(17 17)/(43 43)
		PBSWM+	0.77	52.4	64.6	84	(4 10)/(15 31) → (18/47)|(14 16)/(39 45)
		PBSWM−	0.46	56.2	65.6	79	(7 4)/(20 13) → (20/54)|(21 10)/(55 24)
		Isotonic	0.45	44.4	57.0	80	(3 3)/(10 10) → (17/45)|(15 15)/(40 40)
0.30	0.50	Simon	0.31	65.4	76.8	100	(7 7)/(19 19) → (−/−)|(22 22)/(50 50)
		PBSWM+	0.77	53.2	68.0	91	(5 11)/(16 29) → (23/53)|(18 19)/(44 47)
		PBSWM−	0.51	59.0	70.0	84	(8 6)/(20 16) → (26/62)|(25 12)/(58 26)
		Isotonic	0.51	48.8	60.7	80	(5 5)/(14 14) → (22/52)|(18 16)/(43 37)

4 Comparison of error rates and power among biomarker-stratified designs

Table 3 compares the power and the error rates of the designs from Table 1. The rates were computed based on the binomial distribution (see Supplemental Appendix 1).

Table 3.
Comparison of the power and the error rates for designs minimizing EN₀ (Table 1) including two parallel Simon's designs (Simon), the PBSWM+, the PBSWM− designs, and the isotonic stratified design (isotonic) with $α = 0.05$ and $1 - β = 0.80$ .

$p_{0}$	$p_{1}$	Design	N	$(p_{0}, p_{1})$ : correctly reject $H_{0}^{+}$ only	$(p_{0}, p_{1})$ : wrongly reject $H_{0}^{-}$	$(p_{1}, p_{1})$ : reject any hypothesis	$(p_{1}, p_{1})$ : correctly reject $H_{0}^{-}$ and $H_{0}^{+}$	$(p_{1}, p_{1})$ : fail to reject $H_{0}^{-}$
0.15	0.30	Simon	142	0.79	0.03	0.96	0.80	0.16
		PBSWM+	137	0.80	0.06	0.87	0.80	0.07
		PBSWM−	129	0.80	0.02	0.96	0.80	0.16
		Isotonic	110	0.80	0.05	0.95	0.80	0.15
0.20	0.40	Simon	110	0.78	0.03	0.96	0.80	0.16
		PBSWM+	87	0.80	0.06	0.88	0.80	0.08
		PBSWM−	87	0.80	0.02	0.96	0.80	0.16
		Isotonic	73	0.80	0.05	0.95	0.80	0.15
0.25	0.45	Simon	108	0.78	0.03	0.96	0.80	0.16
		PBSWM+	100	0.80	0.06	0.88	0.80	0.08
		PBSWM−	88	0.80	0.02	0.96	0.80	0.16
		Isotonic	83	0.80	0.04	0.95	0.81	0.14
0.30	0.50	Simon	130	0.79	0.03	0.96	0.80	0.16
		PBSWM+	111	0.80	0.06	0.87	0.80	0.07
		PBSWM−	93	0.80	0.02	0.96	0.80	0.16
		Isotonic	86	0.80	0.04	0.95	0.80	0.15

$p_{0}$ is the true response rate under the null hypothesis and $p_{1}$ is the true response rate under the alternative hypothesis.

The probability of rejecting any hypothesis under $(p_{1}, p_{1})$ is the lowest for the PBSWM + design, which also has the highest probability of wrongly rejecting $H_{0}^{-}$ under $(p_{0}, p_{1})$ . Although controlling the probability of wrongly rejecting $H_{0}^{-}$ under $(p_{0}, p_{1})$ at $α = 0.05$ was not our objective when generating these designs, the Simon, PBSWM −, and isotonic stratified designs all controlled this error rate at 0.05 for the scenarios we considered. Our main comparison is between the isotonic stratified design and the PBSWM − design since the two designs require significantly fewer patients compared to the other two designs. Both designs yield similar power and error probabilities, with the PBSWM − design having a slightly lower probability of wrongly rejecting $H_{0}^{-}$ under $(p_{0}, p_{1})$ compared to the isotonic stratified design. As discussed in Section 3, the isotonic stratified design has the advantage of oversampling B + and has a slightly smaller sample size than the PBSWM + design. The probability of rejecting $H_{0}^{+}$ and not rejecting $H_{0}^{-}$ for two Simon's designs under $(p_{0}, p_{1})$ is equal to the power of Simon's design multiplied by the probability of not rejecting $H_{0}^{-}$ , for example, $0.80 \times (1 - 0.025) = 0.78$ for a design that achieves 80% power with a 0.025 type I error rate.
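
The same product rule, combined with the joint final decision rule for the two Simon's designs described in Section 3, accounts for the Simon rows of Table 3 under $(p_{1}, p_{1})$ . As a worked sketch, assuming two independent sub-trials that each have power exactly 0.80:

$\begin{aligned} P(\text{reject } H_{0}^{-} \text{ and } H_{0}^{+}) &= 0.80, \\ P(\text{reject } H_{0}^{+} \text{ only}) &= (1 - 0.80) \times 0.80 = 0.16, \\ P(\text{reject any hypothesis}) &= 1 - (1 - 0.80)^{2} = 0.96. \end{aligned}$

The first line holds because rejecting $H_{0}^{-}$ leads to recommending both subgroups without testing $H_{0}^{+}$ ; the second is the probability that the B − sub-trial fails to reject while the B + sub-trial rejects.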

We evaluated the design performance when the monotonicity of the response probabilities in B − and B + does not hold by considering a scenario $(p^{-}, p^{+}) = (p_{0} + δ, p_{0})$ , δ > 0. The type I error is preserved for PBSWM +. For PBSWM −, the type I error is not preserved because $H_{0}^{+}$ is automatically rejected when $H_{0}^{-}$ is rejected. We simulated the designs from Table 1 under the scenario $(p^{-}, p^{+}) = (p_{0} + 0.05, p_{0})$ . Results are shown in Table A2 of Supplemental Appendix 3. For example, for designs generated for $(p_{0}, p_{1}) = (0.15, 0.30)$ , the type I error rate of rejecting $H_{0}^{+}$ for PBSWM − was 0.21, while for the isotonic stratified design the inflation was smaller, with a rate of 0.08. We also investigated the effect on the probability of rejecting $H_{0}^{-}$ when the monotonicity assumption is violated by simulating designs from Table 1 under the scenario $(p^{-}, p^{+}) = (p_{1}, p_{1} - δ)$ with δ = 0.05. Since the possible decisions are to reject no hypotheses, to reject $H_{0}^{+}$ only, or to reject both $H_{0}^{+}$ and $H_{0}^{-}$ , we evaluated the probability of rejecting both hypotheses. This probability was retained at 0.80 for PBSWM −, was equal to 0.59 for PBSWM +, and was equal to 0.71 for the isotonic stratified design. That is, when monotonicity is violated with $p^{-} > p^{+}$ , PBSWM − has an increased type I error rate but maintains power, PBSWM + maintains the type I error rate but has low power, and the isotonic stratified design has both an increased type I error rate and lower power. However, its increase in error rate and its loss of power are less severe than those of the other designs.

5 Conclusions

We describe an isotonic stratified design for phase 2 clinical trials to evaluate the efficacy of a new therapy in biomarker-positive and biomarker-negative patients. Some existing approaches⁷,⁹,¹¹ used the assumption of monotonicity between response probabilities in the biomarker-negative and biomarker-positive subgroups in constructing the sequence of decision rules. We use this assumption not only to test the null hypotheses but also to estimate the response probabilities in B− and B+. If the observed response probability in the biomarker-negative subgroup is higher than in the biomarker-positive subgroup, we estimate the response rate in both subgroups from the combined sample.
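The estimation rule in the last sentence can be sketched as follows. This is a minimal illustration of pooling two subgroups under an order restriction, not the authors' implementation; the function name, argument names, and the response counts in the usage lines are hypothetical.

```python
from typing import Tuple


def isotonic_estimates(x_neg: int, n_neg: int, x_pos: int, n_pos: int) -> Tuple[float, float]:
    """Return estimates of (p-, p+) under the constraint p- <= p+.

    x_neg, x_pos: observed numbers of responses in B- and B+.
    n_neg, n_pos: numbers of patients evaluated in B- and B+.
    """
    if n_neg <= 0 or n_pos <= 0:
        raise ValueError("sample sizes must be positive")
    if not (0 <= x_neg <= n_neg and 0 <= x_pos <= n_pos):
        raise ValueError("responses must lie between 0 and the sample size")

    phat_neg = x_neg / n_neg
    phat_pos = x_pos / n_pos
    if phat_neg > phat_pos:
        # Observed order violates monotonicity: use the combined sample for both.
        pooled = (x_neg + x_pos) / (n_neg + n_pos)
        return pooled, pooled
    return phat_neg, phat_pos


# Hypothetical counts: order respected, so the observed proportions are kept.
print(isotonic_estimates(3, 20, 9, 30))  # (0.15, 0.3)
# Hypothetical counts: B- looks better than B+, so both use 14/50.
print(isotonic_estimates(8, 20, 6, 30))  # (0.28, 0.28)
```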

Compared with the sample size required to run two optimal Simon's designs, one in each biomarker subgroup, 2N, our proposed design yields up to a 30% smaller total sample size, or 1.4N, and a 22% smaller expected sample size under the null hypothesis. That is, with the isotonic stratified design we can evaluate the response probability in both B− and B+ using only 40% more patients than are needed to test B+ alone using Simon's design. Compared to the PBSWM− design, our proposed design yields up to a 10% smaller total maximum sample size and about a 22% smaller expected sample size under the null hypothesis. Another desirable property of our design compared to the PBSWM− design is that the optimal allocation proportion to B+ is generally higher than 0.5. It is more ethical to oversample B+ in a biomarker-stratified trial since patients in B+ are believed to be more likely to benefit from the experimental treatment. In fact, the optimal allocation proportion to B− in the PBSWM− design is as low as 0.40 when minimizing EN₀ and 0.30 when minimizing EN.
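The sample size comparisons above are simple ratios, which the following sketch makes explicit. The helper name is illustrative. The second part applies the same ratio to the N column of the article's table of operating characteristics, under the assumption that N there is the total sample size of each approach; those designs may have been generated under a different optimization criterion than the figures quoted in this section.

```python
def relative_reduction(new_n: float, reference_n: float) -> float:
    """Fraction by which new_n is smaller than reference_n."""
    if reference_n <= 0:
        raise ValueError("reference_n must be positive")
    return 1.0 - new_n / reference_n


# Arithmetic from the text, with N the size of one Simon's design (N = 1 unit).
N = 1.0
saving_vs_two_simon = relative_reduction(1.4 * N, 2.0 * N)  # about 0.30
extra_vs_one_simon = 1.4 * N / N - 1.0                      # about 0.40

# N column of the operating-characteristics table:
# (p0, p1) -> (two Simon's designs, isotonic stratified design).
table_n = {
    (0.15, 0.30): (142, 110),
    (0.20, 0.40): (110, 73),
    (0.25, 0.45): (108, 83),
    (0.30, 0.50): (130, 86),
}
table_savings = {
    scenario: relative_reduction(isotonic, simon)
    for scenario, (simon, isotonic) in table_n.items()
}

for scenario, saving in table_savings.items():
    print(scenario, f"{saving:.1%}")
```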

When choosing a design to use, one can minimize EN₀, EN, or N_max. The choice of optimization criterion is similar to the one in selecting optimal or minimax Simon's designs.⁶ Biomarker stratified designs optimizing N_max (see Table A1 of Supplemental Appendix 2) yield the smallest total maximum sample size while designs minimizing EN₀ optimize for the case where it is believed that the treatment is unlikely to work. When response heterogeneity is expected, we recommend using the EN criterion as it considers all three scenarios $(p_{0}, p_{0})$ , $(p_{0}, p_{1})$ , and $(p_{1}, p_{1})$ .

There are three possible conclusions at the end of the trial: reject both null hypotheses and recommend both subgroups for further development; accept the null hypothesis $H_{0}^{-}$ but reject the null hypothesis $H_{0}^{+}$ and proceed with only the B+ subgroup for further development; or accept both null hypotheses and halt the development. Single-arm phase 2 oncology trials usually report the number of responses, the proportion of responses, and whether or not the efficacy boundary was crossed. The results of a biomarker-stratified trial can be reported as the number and proportion of responses in each subgroup, B− and B+, and whether significance was achieved in each subgroup. Although our design uses isotonic estimates of response probabilities in the decision rule, we recommend reporting the observed number and proportion of responses in each subgroup.
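The three end-of-trial conclusions can be written as a small mapping from the two test outcomes. This is an illustrative sketch only; the function name and the returned strings are not from the article. It treats rejecting $H_{0}^{-}$ without rejecting $H_{0}^{+}$ as an invalid input, since that combination is not among the possible decisions of the design.

```python
def trial_conclusion(reject_pos: bool, reject_neg: bool) -> str:
    """Map the outcomes of testing H0+ and H0- to a development decision."""
    if reject_neg and not reject_pos:
        # The design rejects no hypotheses, H0+ only, or both.
        raise ValueError("rejecting H0- without H0+ is not a possible decision")
    if reject_pos and reject_neg:
        return "develop further in both B- and B+"
    if reject_pos:
        return "develop further in B+ only"
    return "halt development"


print(trial_conclusion(reject_pos=True, reject_neg=False))  # develop further in B+ only
```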

Phase 2 trials often use tumor response as the primary endpoint, while the primary endpoint in phase 3 trials is progression-free or overall survival. These time-to-event endpoints might not correlate well with tumor response. For this reason, if a biomarker-stratified phase 2 trial is successful with respect to tumor response in both subgroups, it might be wise to consider a biomarker-stratified design for the next trial rather than conducting subsequent trials in an unstratified population.

Supplemental Material

Supplemental material, sj-pdf-1-smm-10.1177_09622802241238978 for Isotonic design for single-arm biomarker stratified trials by Lang Li and Anastasia Ivanova in Statistical Methods in Medical Research

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the NIH Grant U24 HL138998.

ORCID iD

Lang Li

Supplemental material

Supplemental material for this article is available online.

References

1. Antoniou, Jorgensen, Kolamunnage-Dona. Biomarker-guided adaptive trial designs in phase II and phase III: a methodological review. PLoS ONE 2016; 11: e0149803.

2. Rosenblum, Luber, Thompson, et al. Group sequential designs with prospectively planned rules for subpopulation enrichment. Stat Med 2016; 35: 3776–3791.

3. Rosenblum, Qian, et al. Multiple testing procedures for adaptive enrichment designs: combining group sequential and reallocation approaches. Biostatistics 2016; 17: 650–662.

4. Rosenblum, Fang, Liu. Optimal, two-stage, adaptive enrichment designs for randomized trials, using sparse linear programming. J R Stat Soc Ser B 2020; 82: 749–772.

5. Ivanova, Paul, Marchenko, et al. Nine-year change in statistical design, profile, and success rates of phase II oncology trials. J Biopharm Stat 2016; 26: 141–149.

6. Simon. Optimal two-stage designs for phase II clinical trials. Control Clin Trials 1989; 10: 1–10.

7. Jones, Holmgren. An adaptive Simon two-stage design for phase 2 studies of targeted therapies. Contemp Clin Trials 2007; 28: 654–661.

8. Berry, Broglio, Groshen, et al. Bayesian hierarchical modeling of patient subpopulations: efficient designs of phase II oncology clinical trials. Clin Trials 2013; 10: 720–734.

9. Tournoux-Facon, Rycke, Tubert-Bitter. Targeting population entering phase III trials: a new stratified adaptive phase II design. Stat Med 2011; 30: 801–811.

10. Freidlin, Korn. Biomarker enrichment strategies: matching trial design to biomarker credentials. Nat Rev Clin Oncol 2014; 11: 81–90.

11. Parashar, Bowden, Starr, et al. An optimal stratified Simon two-stage design. Pharm Stat 2016; 15: 333–340.

12. Andre, Mazouni, Liedtke, et al. HER2 expression and efficacy of preoperative paclitaxel/FAC chemotherapy in breast cancer. Breast Cancer Res Treat 2008; 108: 183–190.

13. Jacot, Cottu, Berger, et al. Actionability of HER2-amplified circulating tumor cells in HER2-negative metastatic breast cancer: the CirCe T-DM1 trial. Breast Cancer Res 2019; 21: 121.

14. Robertson, Wright, Dykstra. Order restricted statistical inference. Chichester: John Wiley & Sons, 1988.

15. Barlow. Statistical inference under order restrictions: The theory and application of isotonic regression. Chichester: John Wiley & Sons, 1972.

16. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2013, http://www.R-project.org/.

17. Jung, Lee, Kim, et al. Admissible two-stage designs for phase II cancer clinical trials. Stat Med 2004; 23: 561–569.


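| $p_{0}$ | $p_{1}$ | Design | N | Under $(p_{0}, p_{1})$: correctly reject $H_{0}^{+}$ only | Under $(p_{0}, p_{1})$: wrongly reject $H_{0}^{-}$ | Under $(p_{1}, p_{1})$: reject any hypothesis | Under $(p_{1}, p_{1})$: correctly reject $H_{0}^{-}$ and $H_{0}^{+}$ | Under $(p_{1}, p_{1})$: fail to reject $H_{0}^{-}$ |
|---|---|---|---|---|---|---|---|---|
| 0.15 | 0.30 | Simon | 142 | 0.79 | 0.03 | 0.96 | 0.80 | 0.16 |
| | | PBSWM+ | 137 | 0.80 | 0.06 | 0.87 | 0.80 | 0.07 |
| | | PBSWM− | 129 | 0.80 | 0.02 | 0.96 | 0.80 | 0.16 |
| | | Isotonic | 110 | 0.80 | 0.05 | 0.95 | 0.80 | 0.15 |
| 0.20 | 0.40 | Simon | 110 | 0.78 | 0.03 | 0.96 | 0.80 | 0.16 |
| | | PBSWM+ | 87 | 0.80 | 0.06 | 0.88 | 0.80 | 0.08 |
| | | PBSWM− | 87 | 0.80 | 0.02 | 0.96 | 0.80 | 0.16 |
| | | Isotonic | 73 | 0.80 | 0.05 | 0.95 | 0.80 | 0.15 |
| 0.25 | 0.45 | Simon | 108 | 0.78 | 0.03 | 0.96 | 0.80 | 0.16 |
| | | PBSWM+ | 100 | 0.80 | 0.06 | 0.88 | 0.80 | 0.08 |
| | | PBSWM− | 88 | 0.80 | 0.02 | 0.96 | 0.80 | 0.16 |
| | | Isotonic | 83 | 0.80 | 0.04 | 0.95 | 0.81 | 0.14 |
| 0.30 | 0.50 | Simon | 130 | 0.79 | 0.03 | 0.96 | 0.80 | 0.16 |
| | | PBSWM+ | 111 | 0.80 | 0.06 | 0.87 | 0.80 | 0.07 |
| | | PBSWM− | 93 | 0.80 | 0.02 | 0.96 | 0.80 | 0.16 |
| | | Isotonic | 86 | 0.80 | 0.04 | 0.95 | 0.80 | 0.15 |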