
Abstract

Although response-adaptive randomisation (RAR) has gained substantial attention in the literature, it still has limited use in clinical trials. Amongst other reasons, the implementation of RAR in real world trials raises important practical questions, often neglected in the technical literature. Motivated by an innovative phase-II stratified RAR rare-disease trial, this paper addresses two challenges: (1) How to ensure that RAR allocations are desirable, that is, both acceptable and faithful to the intended probabilities, particularly in small samples? and (2) What adaptations to trigger after interim analyses in the presence of missing data? To answer (1), we propose a Mapping strategy that discretises the randomisation probabilities into a vector of allocation ratios, resulting in improved frequentist errors. Under the implementation of Mapping, we answer (2) by analysing the impact of missing data on operating characteristics in selected scenarios. Finally, we discuss additional concerns including: pooling data across trial strata, analysing the level of blinding in the trial, and reporting safety results.

Keywords

Adaptive designs, rare disease, implementation, adaptive randomisation, mapping

1. Introduction

Well-designed randomised controlled trials (RCTs) have long been valued for their well-understood statistical properties and are recognised as the gold standard for conducting evidence-based clinical research to assess the efficacy of interventions. Yet, standard RCTs can demand substantial time and resources, both in terms of sample size and cost, and can therefore prove impractical in cases such as rare diseases, where patient enrolment is slow and limited in size. Even in a common disease setting, many subtypes are increasingly being identified and may require personalised or stratified approaches to therapy, thus splitting the feasible number of patients that can be recruited for the overall trial into smaller groups for each subtype stratum trial.¹ Furthermore, conducting a trial with the main purpose of learning about treatment effectiveness (as in traditional RCTs) may be ill suited to fatal diseases, where some have suggested that the priority should be to treat trial participants as effectively as possible.^1,2 These drawbacks often prevent successful randomised experimentation and have been widely acknowledged as limiting medical innovation.³

Adaptive trial designs have been proposed as a means of addressing some of the practical limitations of traditional RCTs. They enable the possibility of not only enhancing the likelihood of detecting the most promising treatments without substantially increasing the sample size, but also offering expected benefit to the trial participants.⁴ The fundamental characteristic of an adaptive clinical trial is to allow, according to a prespecified plan, dynamic adjustments of design features while patient enrolment is ongoing⁵ based on data observed at interim analysis. The first proposal of a design of this nature can be traced back to Thompson’s⁶ idea of skewing the randomisation probabilities toward the most promising treatments according to their posterior probability of success. Due to this historical genesis, adaptive randomisation designs were often referred to as adaptive designs.^7–9 Although the more recent use of the term applies more generally (see e.g., Bhatt and Mehta,⁴ Pallmann et al.⁵ for an overview), in this work, our focus will be on response-adaptive randomisation (RAR) designs, which prespecify how and when the randomisation probabilities should be adjusted based on the accumulated response data. We also explore how the randomisation probabilities can be used to inform early stopping rules for experimental arms.

RAR has received substantial attention in the biostatistical literature, contributing to a fertile area of methodological and theoretical research. Despite this and the recent encouragement of RAR adoption from government agencies and health authorities,¹⁰ RAR uptake in clinical experimentation remains disproportionately low compared to the stream of theoretical work on this topic.^11,12 The reasons behind the gap between RAR methodology and theory on the one hand and RAR in practice on the other are diverse. First, the role of RAR in clinical trials has long been and still remains a subject of active debate within biostatistics due to its potential impact on statistical inference. Bias and hypothesis testing issues, among others, have been intensively studied (see e.g., Villar et al.¹³), and several solutions have emerged both from the biostatistics^11,14,15 and the machine learning^16–19 community. For a recent extensive review on the matter, we refer to Robertson et al.,¹² references therein, and related discussions. Second, the practical debut of RAR in clinical trials, that is, the two-armed ECMO trial,²⁰ resulted in a highly controversial interpretation of its results and their generalisability due to the final extreme treatment imbalance. This application of RAR to a clinical trial limited the use of RAR in clinical trials for the next 20 years. Third, the implementation phase of RAR in a real-trial context poses critical practical challenges, many of which may also apply to more traditional RCTs but which require a distinct approach when using RAR. These challenges include, for example:

(1)
For a given vector of theoretical randomisation probabilities, how can we minimise the chances of observing undesirable treatment allocations (that is, observed allocations diverging from their theoretical counterparts beyond an acceptable level) while taking into account the impact this may have on the design’s operating characteristics? More formally, let $π = (π_{0}, π_{1}, \dots, π_{K})$ and $ρ = (ρ_{0}, ρ_{1}, \dots, ρ_{K})$ denote the target randomisation probabilities and the observed allocation proportions, respectively, where $π_{k}$ is the randomisation probability of arm $k$ and $ρ_{k}$ is the proportion of participants assigned to arm $k$ (that is, $ρ_{k} = n_{k} / n$, with $n_{k}$ the number of participants in arm $k$ and $n$ the trial sample size). Then, our aim is to ensure $n ρ_{k} \approx n π_{k}$ for each arm $k$. Although this may be less of an issue in large-sample trials, the concern is crucial in rare-disease trials even with equal randomisation (and it is exacerbated with unequal sample sizes, whether predetermined from the start or resulting from RAR). To illustrate this, consider a three-arm trial with $n = 20$ and non-adaptive $π = (0.5, 0.4, 0.1)$. From Figure 1, which shows the distribution of $ρ$ across 10,000 simulations for each arm, it can be noted that $P (ρ_{0} \leq 0.4) \approx P (ρ_{1} \leq 0.3) \approx 25\%$ and $P (ρ_{2} = 0) \geq 12\%$. This indicates a considerable likelihood of the experimental arms receiving near-equal allocation, contrary to their target probabilities, and the control arm being completely excluded. Considering the above and the fact that there will be only one trial with 20 patients, it may be both safer and statistically more powerful to work on the basis of a discrete allocation ratio, such as $5 : 4 : 1$, ensuring that at least two patients are assigned to arm $k = 2$.
Furthermore, additional questions may arise when one has to practically define the allocation ratio corresponding to a target randomisation probability. This is of particular interest in RAR trials, where randomisation occurs sequentially in stages of intrinsically smaller size. For example, what should be the targeted allocation ratio corresponding to a stage-specific probability target of $π = (0.5, 0.4, 0.1)$ with a sample size of $n = 6$ ? Should it be $3 : 2 : 1$ or rather $3 : 3 : 0$ ?
(2)
In case of missing response data that inform interim decisions in an RAR trial, when and to what extent should we allow deviations from balanced allocation in the following stage as dictated by a complete-case approach? Once we address problem (1), the natural progression from there is to think about what to do if we encounter missing responses at the interim analyses. A critical design decision with respect to the adaptation of the trial is whether or not to adapt the allocation towards the more promising treatment under the presence of missing responses. To the best of our knowledge, this issue has not been formally explored in the literature.

Figure 1.
Empirical distribution of the observed allocation $ρ$ of arms $k = 0, 1, 2$ under the randomisation scheme $π = (0.5, 0.4, 0.1)$. Results are averaged across 10,000 replicas of the randomisation scheme using the ‘sample’ function in R.
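The illustration behind Figure 1 can be reproduced approximately with a few lines of simulation. The sketch below is not the authors' R code: it is a minimal Python analogue, assuming simple (unrestricted) randomisation of each of the $n = 20$ participants with probabilities $π = (0.5, 0.4, 0.1)$; the function name `simulate_allocation_counts` and the seed are illustrative choices.

```python
import random


def simulate_allocation_counts(pi, n, n_sims, seed=2024):
    """Simulate n_sims trials of size n under simple randomisation with
    probabilities pi; return the per-arm allocation counts of each trial."""
    rng = random.Random(seed)  # seed is an arbitrary, illustrative choice
    arms = list(range(len(pi)))
    all_counts = []
    for _ in range(n_sims):
        draws = rng.choices(arms, weights=pi, k=n)
        all_counts.append([draws.count(k) for k in arms])
    return all_counts


n, n_sims = 20, 10_000
counts = simulate_allocation_counts((0.5, 0.4, 0.1), n, n_sims)

# Integer thresholds avoid floating-point comparisons:
# rho_0 <= 0.4 means at most 8 of 20; rho_1 <= 0.3 means at most 6 of 20.
p_arm0_low = sum(c[0] <= 8 for c in counts) / n_sims
p_arm1_low = sum(c[1] <= 6 for c in counts) / n_sims
p_arm2_zero = sum(c[2] == 0 for c in counts) / n_sims

print(f"P(rho_0 <= 0.4) ~ {p_arm0_low:.3f}")
print(f"P(rho_1 <= 0.3) ~ {p_arm1_low:.3f}")
print(f"P(rho_2 = 0)    ~ {p_arm2_zero:.3f}")
```

Because each arm's count is marginally binomial, the three estimates can also be checked against exact binomial probabilities rather than simulation.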

In this work, we are specifically concerned with the research questions (1) and (2) as these are essential to the design and implementation of the motivating phase-II rare-disease RAR trial, StratosPHere 2. An overview of the study is presented in Section 2, with the detailed protocol given in Deliu et al.²¹ For (1), we propose a Mapping rule to convert the vector of continuous target randomisation probabilities into a discrete allocation ratio. The resulting rule preserves the randomisation properties to a chosen acceptable degree, avoids the occurrence of extreme allocation ratios by chance, and improves the operating characteristics of the original design in scenarios of interest by reducing the RAR design’s variability. For (2), we describe a procedure for handling missing data that re-evaluates the operating characteristics through simulations, taking into account the frequency of adaptations triggered in the resulting design. Other practical challenges, ranging from pooled analysis to safety reporting, are also raised.

Overall, with this work, we aim to discuss a set of critical practical problems along with potential solutions and recommendations, guided by our experience and collaboration with the clinical team in designing and conducting StratosPHere 2. Our research questions are directly inspired by addressing the practical needs and the well-known difficulties of a rare disease community. We emphasise that our proposals are not meant to be regarded as universal solutions, but rather to inspire and encourage greater synergy between methodological and practical research. We hope our work contributes to stimulating research to increase the adequate adoption and implementation of adaptive designs such as RAR into clinical practice.

The remainder of this paper is structured as follows. In Section 2, we provide an overview of the motivating RAR trial, StratosPHere 2, and present the preliminary notation and design setup (Section 2.1). In Section 3, we discuss the research question (1). We explore the research question (2) in Section 4. In Section 5, we discuss additional challenges that are of central interest in the final analysis of our motivating trial, and, potentially, other stratified RAR trials. Final considerations and concluding remarks are given in Section 6.
2. Motivating case study

This work is motivated by practical challenges we encountered while planning the implementation of a rare-disease trial in pulmonary arterial hypertension (PAH): StratosPHere 2.²¹ PAH is a life-threatening, progressive disorder characterised by high blood pressure in the arteries of the lungs. It affects between 15 and 50 people per million in the US and Europe and, although treatable, there is currently no cure. In this context, StratosPHere 2 represents the first-ever precision-medicine trial specifically designed to treat the causes of PAH rather than its symptoms. Importantly, it seeks to directly address devastating causes given by genetic mutations in the bone morphogenetic protein receptor type-2 (BMPR2), the most common genetic cause of familial PAH.²²

StratosPHere 2 is a three-armed, placebo-controlled phase-II stratified RAR trial. The primary objective is to explore the efficacy of two repurposed therapies as genetic modulators of BMPR2 signalling: hydroxychloroquine and phenylbutyrate. Patients are stratified according to two specific classes of BMPR2 mutations, namely haploinsufficient (herein, Stratum A) and missense (herein, Stratum B) mutations. More formally, denoting by $T_{1}$ and $T_{2}$ the two active treatments (hydroxychloroquine and phenylbutyrate, respectively, each given along with the standard of care) and by $C$ the control treatment (placebo with the standard of care), the trial’s primary objective is to test the hypothesis that a mean increase in the primary outcome, say $E (Δ Y)$, can be achieved in the two mutation strata $s = A, B$. The primary analysis endpoint $Δ Y$ is a measure of the engagement of the BMPR2 pathway, defined as the change in genetic expression from study entry to the 8-week follow-up after treatment initiation. This represents a novel composite panel of validated measurements of BMPR2 target genes using quantitative PCR; we refer to Section 12 of Deliu et al.²¹ and StratosPHere 1 Study²³ for further details. As the two active treatments, $T_{1}$ and $T_{2}$, have distinct mechanisms of action pertinent to each stratum, the following represent the primary hypotheses to test:

$\begin{aligned} \text{Stratum A:} \quad H_{0}^{A} & : E (Δ Y_{T_{1}}^{A}) - E (Δ Y_{C}^{A}) \leq 0 \quad \text{vs.} \quad H_{1}^{A} : E (Δ Y_{T_{1}}^{A}) - E (Δ Y_{C}^{A}) > 0, \\ \text{Stratum B:} \quad H_{0}^{B} & : E (Δ Y_{T_{2}}^{B}) - E (Δ Y_{C}^{B}) \leq 0 \quad \text{vs.} \quad H_{1}^{B} : E (Δ Y_{T_{2}}^{B}) - E (Δ Y_{C}^{B}) > 0. \end{aligned}$
(1)

Primary versus adaptation endpoint

The primary endpoint of the study, $Δ Y$, is a continuous outcome, whose choice was driven by domain and statistical aspects (see Deliu et al.,²¹ Jones et al.²³) and whose role is central in the final analyses. However, although the study’s hypotheses in equation (1) are based on this continuous variable, the response-adaptive nature of the trial calls for an additional endpoint to dictate the pre-planned adaptations of the trial’s features. This endpoint is a dichotomisation of the final continuous endpoint and is referred to as the adaptation endpoint. While the same parameter could be taken as both primary and adaptation endpoint, a binary endpoint has been chosen for the latter to ensure a safe and conservative adaptation in such a small trial. In fact, from the design point of view, a binary endpoint results in a more conservative RAR, whereas from the final analysis perspective, testing on the continuous data is more powerful. We introduce the binary indicator $I (Δ Y \geq δ)$, where $δ = 0.3$ is the minimum meaningful change associated with a positive BMPR2 engagement. Consequently, we take as the binary adaptation endpoint the parameter $θ_{k} \in [0, 1]$ defined as:
$θ_{k} = E [I (Δ Y_{k} \geq δ)], \quad k = C, T_{1}, T_{2} .$
(2)
This adaptation endpoint guides the randomisation probability throughout the trial, and it can be interpreted as the expected proportion of successes (that is, the success probability) associated with each arm $k$.
2.1. StratosPHere 2 design

Overall, across both strata, $N = 40$ patients are expected to be enrolled in 3 consecutive stages of randomisation, denoted by $t = 1, 2, 3$. Specifically, we expect to recruit $n_{1, s} = 6$, $n_{2, s} = 6$ and $n_{3, s} = 8$ participants per stratum, where $n_{t, s}$ denotes the sample size of the stage $t$ block, with $t = 1, 2, 3$, in stratum $s = A, B$. These values are determined based on practical recruitment considerations and their frequentist properties are evaluated in simulation studies. Specifically, for the overall study duration (of at least 2 years), the rare disease allows for an expected total sample size of $n = 20$ patients per stratum (and likely one patient per month) over the funding period of the trial grant. The stage-specific sizes are determined according to practical implementability and simulation evaluations aiming at maximising power.

Eligible subjects will be randomly assigned to one of the three arms ( $C$ , $T_{1}$ , or $T_{2}$ ) following a Bayesian response-adaptive randomisation (BRAR) design implemented independently in each stratum $s = A, B$ . In stage 1, a restricted 2:2:2 allocation ratio for $C : T_{1} : T_{2}$ is considered; this reflects a balanced randomisation probability scheme, that is, ${π_{1, s, k} = 1 / 3; k = C, T_{1}, T_{2}}$ with $s = A, B$ . Once all responses from stage 1 are observed, a first interim analysis is performed to (possibly) update the randomisation probabilities for stage 2, say ${π_{2, s, k}; k = C, T_{1}, T_{2}}$ for all $s$ . The selected design does not allow for arm dropping at this second stage. A second interim analysis (possibly) adapts the randomisation probabilities ${π_{3, s, k}; k = C, T_{1}, T_{2}}$ for the (final) stage 3 block of 8 patients in each stratum $s = A, B$ . This interim accounts for stage 2 as well as stage 1 response data. In stage 3, a futile active treatment arm is allowed to be dropped if the associated randomisation probability is lower than a prespecified threshold $τ \in [0, 0.2]$ . Its precise value is documented with the trial sponsor under restricted access and will be publicly disclosed at the end of the study to minimise BRAR predictability and preserve study’s integrity. A schematic of StratosPHere 2 design for a generic stratum $s$ is presented in Figure 2.

Figure 2.
Schematic of the StratosPHere 2 trial design for one stratum. Threshold $τ$ specifies the probability for arm dropping and will be disclosed at the end of the trial to preserve its integrity (e.g., to avoid predictability of the allocations).

Bayesian response-adaptive randomisation

The design of StratosPHere 2 builds on the RAR allocation rule proposed in Trippa et al.²⁴ and further discussed in Wason and Trippa.²⁵ It adopts a Bayesian framework for adjusting the randomisation probabilities at each interim analysis based on the accumulated response data up to that point. Let $D_{n_{\bar{t - 1}, s}}$ be all the observed data (assigned arms and corresponding outcomes or responses) from the $n_{\bar{t - 1}, s}$ participants of stratum $s$ , where $\bar{t - 1}$ refers to all accumulated stages ${1, \dots, t - 1}$ . For example, at stage $t = 3$ , $\bar{t - 1} = {1, 2}$ and includes $n_{1, s} + n_{2, s}$ patients. Then, Trippa’s rule defines the stage $t$ randomisation probabilities for stratum $s$ , say ${π_{t, s, k}^{Trippa}; k = C, T_{1}, T_{2}}$ , as:
$\begin{aligned} π_{t, s, k}^{\text{Trippa}} (γ_{t}, η_{t}) \propto \begin{cases} \dfrac{p (θ_{k} > θ_{C} \mid D_{n_{\bar{t - 1}, s}})^{γ_{t}}}{\sum_{j \in \{T_{1}, T_{2}\}} p (θ_{j} > θ_{C} \mid D_{n_{\bar{t - 1}, s}})^{γ_{t}}} & \text{if } k = T_{1}, T_{2}, \\ \dfrac{1}{K} \exp \left( \max (n_{\bar{t - 1}, s, T_{1}}, n_{\bar{t - 1}, s, T_{2}}) - n_{\bar{t - 1}, s, C} \right)^{η_{t}} & \text{if } k = C, \end{cases} \end{aligned}$
(3)
where $n_{\bar{t - 1}, s, k}$ represents the number of individuals assigned to arm $k$ in stratum $s$ up to stage $t - 1$, $K = 3$ is the number of study arms, and $θ_{k}$ reflects the adaptation endpoint associated with arm $k$, as defined in equation (2). The two stage-varying hyper-parameters $γ_{t}$ and $η_{t}$ are introduced to modulate, respectively: (i) the current and final allocation imbalance between the two active treatment arms $T_{1}$ and $T_{2}$; and (ii) the overall allocation of the control arm $C$ at the end of the study. We will refer to this rule as Trippa-BRAR $(γ_{t}, η_{t})$ and to the associated design as Control Protected. In StratosPHere 2, $η_{t}$ is tuned so as to guarantee a minimum allocation of the control arm of around $1 / K \approx 0.33$.
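To make the structure of equation (3) concrete, the following sketch evaluates it for one stratum at an interim analysis. It is a literal reading of the formula as printed, not the trial's software: the posterior probabilities are estimated by plain Monte Carlo under Beta(1, 1) priors, the hyper-parameter values and interim data are hypothetical, and the names `prob_beats_control` and `trippa_brar` are introduced here for illustration only.

```python
import math
import random


def prob_beats_control(succ, n, arm, rng, draws=20_000):
    """Monte Carlo estimate of p(theta_arm > theta_C | data), Beta(1, 1) priors."""
    wins = 0
    for _ in range(draws):
        theta_arm = rng.betavariate(1 + succ[arm], 1 + n[arm] - succ[arm])
        theta_ctl = rng.betavariate(1 + succ["C"], 1 + n["C"] - succ["C"])
        wins += theta_arm > theta_ctl
    return wins / draws


def trippa_brar(succ, n, gamma, eta, K=3, seed=0):
    """Randomisation probabilities following equation (3) as written."""
    rng = random.Random(seed)
    active = ["T1", "T2"]
    powered = {k: prob_beats_control(succ, n, k, rng) ** gamma for k in active}
    total = sum(powered.values())
    if total == 0:  # degenerate Monte Carlo estimate: fall back to equal shares
        weights = {k: 1 / len(active) for k in active}
    else:
        weights = {k: powered[k] / total for k in active}
    # Control term: (1/K) * exp(max(n_T1, n_T2) - n_C)^eta
    imbalance = max(n[k] for k in active) - n["C"]
    weights["C"] = (1 / K) * math.exp(eta * imbalance)
    norm = sum(weights.values())  # resolve the proportionality sign
    return {k: w / norm for k, w in weights.items()}


# Hypothetical stage-1 data: 2 patients per arm, successes are I(dY >= delta).
n = {"C": 2, "T1": 2, "T2": 2}
succ = {"C": 0, "T1": 1, "T2": 2}
probs = trippa_brar(succ, n, gamma=1.0, eta=0.25)  # gamma, eta are assumed values
print(probs)
```

The control weight grows exponentially when the control arm lags behind the larger active arm, which is how $η_{t}$ protects the control allocation.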

Note that Trippa-BRAR $(γ_{t}, η_{t})$ can be viewed as an extension of the popular Thompson sampling (TS; Thompson⁶) rule, recognised as the first prototype of a BRAR design. Here, randomisation probabilities are expressed in terms of their posterior probability of being associated with the maximum expected outcome $θ_{k}$ , that is,
$π_{t, s, k}^{\text{TS}} (γ) \propto \frac{p (θ_{k} > θ_{k'}, k \neq k' \mid D_{n_{\bar{t - 1}, s}})^{γ}}{\sum_{j \in \{C, T_{1}, T_{2}\}} p (θ_{j} > θ_{j'}, j \neq j' \mid D_{n_{\bar{t - 1}, s}})^{γ}}, \quad k = C, T_{1}, T_{2},$
(4)
where $γ$ is a positive tuning hyper-parameter introduced by Thall and Wathen²⁶ for stabilising the randomisation probabilities. When $γ = 1$, equation (4) reduces to vanilla TS, as first proposed in Thompson.⁶ We term this rule TS-BRAR $(γ)$ and the associated design Unrestricted, given that no restrictions are placed on the randomisation probabilities, which are uniquely guided by the observed responses. For example, compared to Trippa-BRAR $(γ_{t}, η_{t})$, TS-BRAR $(γ)$ does not impose any restrictions on the control arm. Finally, note that taking $γ = 0$ gives conventional balanced randomisation.

Being a Bayesian framework, we make use of prior distributions on the unknown parameters $θ_{k}$ for $k = C, T_{1}, T_{2}$ . To reflect the novelty of the study, we assume Beta( $1, 1$ ) prior distributions for all arms.
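As an illustration of equation (4) under the Beta(1, 1) priors, the sketch below estimates each arm's posterior probability of having the largest $θ_{k}$ by Monte Carlo, raises it to the power $γ$ and normalises. This is a minimal sketch rather than the study's implementation; the function name `ts_brar`, the number of draws and the interim data are assumptions made for the example.

```python
import random


def ts_brar(successes, failures, gamma=1.0, draws=20_000, seed=1):
    """TS-BRAR(gamma) probabilities with independent Beta(1, 1) priors.

    With binary responses, the posterior of arm k is
    Beta(1 + successes[k], 1 + failures[k]).
    """
    rng = random.Random(seed)
    arms = list(successes)
    best = dict.fromkeys(arms, 0)
    for _ in range(draws):
        sample = {
            k: rng.betavariate(1 + successes[k], 1 + failures[k]) for k in arms
        }
        best[max(sample, key=sample.get)] += 1  # arm with the largest draw
    powered = {k: (best[k] / draws) ** gamma for k in arms}
    total = sum(powered.values())
    return {k: v / total for k, v in powered.items()}


# Hypothetical interim data for one stratum (2 patients per arm).
successes = {"C": 0, "T1": 1, "T2": 2}
failures = {"C": 2, "T1": 1, "T2": 0}

ts_probs = ts_brar(successes, failures, gamma=1.0)   # vanilla Thompson sampling
flat_probs = ts_brar(successes, failures, gamma=0.0)  # balanced randomisation
print(ts_probs)
print(flat_probs)
```

Setting `gamma=0.0` returns 1/3 for every arm regardless of the data, matching the remark that $γ = 0$ recovers balanced randomisation; intermediate values shrink the probabilities towards balance.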
2.2. Operating characteristics

We evaluated by extensive simulations the frequentist properties (type-I error and power) as well as the probability of patients receiving a superior arm when this exists. As detailed in Deliu et al.,²¹ all evaluations for the final analysis were implemented using a fully non-parametric approach: a bootstrap resampling technique applied to response data available from a preliminary phase (StratosPHere 1; Jones et al.²³), corresponding to cases (patients) on standard-of-care treatment only, combined with a one-sided Wilcoxon test. The non-parametric approach is motivated by the small sample and the non-normally distributed data. As illustrated in Table 1, satisfactory results are expected with the proposed sample size. Simulation studies resulted in almost $80\%$ power under a $10\%$ type-I error control in each stratum of $n = 20$ patients for the BRAR design. Importantly, Table 1 highlights the benefits of a BRAR design over a fixed strategy in both allocating the most promising arms (column 5) and achieving an increased power in multi-armed cases (column 3).

Table 1.
Operating characteristics of the evaluated Bayesian response-adaptive randomisation (BRAR) designs.

	Frequentist properties		Empirical allocation		
Design	Power	Type-I error	Arm $C$	Arm $T_{1}$	Arm $T_{2}$
Fixed equal randomisation	0.748	0.12	0.34 (0.11)	0.33 (0.11)	0.33 (0.10)
Unrestricted	0.748	0.09	0.23 (0.12)	0.23 (0.12)	0.54 (0.17)
Control Protected	0.788	0.09	0.34 (0.07)	0.20 (0.13)	0.46 (0.13)
StratosPHere 2	0.788	0.10	0.32 (0.07)	0.21 (0.10)	0.47 (0.12)
Permuted block	0.783	0.12	0.30 (0)	0.35 (0)	0.35 (0)
Mapped- $α$	0.795	0.11	0.30 (0)	0.20 (0.11)	0.50 (0.11)
Mapped- $β$	0.789	0.11	0.30 (0)	0.20 (0.11)	0.50 (0.11)

Values are averaged across 10,000 independent replicas; results are reported in terms of mean (standard deviation). Here, a value 0 for the standard deviation reflects the imposed restrictions on the number of arms.

3. Mapping of allocation probabilities to allocation ratios

Randomisation is an essential component of a trial design. In complex trial settings, careful consideration is required from the implementation side to choose an appropriate randomisation procedure (be this adaptive or not). For example, in small trials, particular emphasis is placed on avoiding undesirable allocations, which can occur when targeting a particular allocation ratio. Furthermore, with the spread of adaptive designs, there is an increasing need for randomisation methods that allow unequal allocations. The rationale for using unequal allocation is provided by Peckham et al.²⁷ and Dumville et al.,²⁸ underlining factors such as cost considerations or ethical concerns. However, methodological challenges of randomisation in small samples remain; see, for example, Berger et al.²⁹ and van der Pas,³⁰ who discuss the permuted-block design and put forward merged-block designs to lower predictability. To achieve a target allocation ratio, a truncated multinomial design has been proposed,³¹ in which participants are randomised to treatments according to a multinomial distribution until the target allocation number is reached for each treatment. Kuznetsova and Tymofyeyev³² also point out the importance of robust randomisation methods that preserve the targeted allocation ratio, discussing and calling for alternative allocation procedures, such as biased coin randomisation, that better approximate the allocation ratio in small samples while reducing selection bias in open-label studies. Designs such as the big-stick design,³³ the block-urn design,³⁴ the maximal procedure,³⁵ or the brick-tunnel design,³⁶ among others, are some of the randomisation methods that effectively achieve unequal treatment allocations once the allocation ratio has been determined.
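The truncated multinomial idea mentioned above can be sketched in a few lines. The version below is one possible reading of the verbal description only: it assumes that, once an arm reaches its target number, the remaining arms keep being sampled with their probabilities renormalised. That renormalisation step, the function name `truncated_multinomial` and the example targets are assumptions, not details taken from the cited reference.

```python
import random


def truncated_multinomial(targets, probs, seed=None):
    """Generate an allocation sequence that hits the per-arm targets exactly.

    Each participant is drawn from the arms that still have open slots,
    with weights proportional to probs (assumed renormalisation rule).
    """
    rng = random.Random(seed)
    remaining = dict(targets)
    sequence = []
    while any(remaining.values()):
        open_arms = [k for k, left in remaining.items() if left > 0]
        weights = [probs[k] for k in open_arms]
        arm = rng.choices(open_arms, weights=weights, k=1)[0]
        sequence.append(arm)
        remaining[arm] -= 1
    return sequence


# Illustrative targets for a block of 6 participants.
targets = {"C": 2, "T1": 1, "T2": 3}
probs = {"C": 1 / 3, "T1": 1 / 6, "T2": 1 / 2}
tm_sequence = truncated_multinomial(targets, probs, seed=7)
print(tm_sequence)
```

By construction the final counts equal the targets, while the order of assignments remains random; the tail of the sequence becomes predictable once only one arm has open slots.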
In parallel, one also faces the problem of determining these practical allocation ratios, by which we mean allocation ratios that are desirable given the allocation probabilities and that adhere to the constraints of the trial design. Existing proposals, for example, Tymofyeyev et al.,³⁷ discuss how allocations can be optimised in order to maximise power; however, the resulting values are typically defined on a continuous range and their translation into practice is underdeveloped. Therefore, there is a gap between methods that define ideal allocation ratios reflecting the underlying design’s goal and the existing procedures for implementing the resulting ratios in a feasible randomisation system. This is particularly true for BRAR settings, BRAR being the most used form of RAR in practice so far.³⁸

To summarise, managing unequal allocation ratios with small sample sizes is a challenge in itself (even outside of RAR designs). In particular, BRAR designs pose additional challenges since, instead of working with whole-number allocation ratios, they primarily return (continuous) allocation probabilities. This makes it more challenging to achieve the desired allocation, as described in research problem (1) in the Introduction.

In this paper, we aim to fill the aforementioned gap. Specifically, we propose a method to map the (continuous) allocation probabilities derived according to the design’s allocation rule, for example, Thompson sampling or Trippa’s rule, to a suitable ratio using probability thresholds which act as decision boundaries for the adaptive design. We call this method Mapping. As a result, the randomisation process preserves its adaptability and operating characteristics up to the level of avoiding pre-established undesirable allocations. Furthermore, Mapping simultaneously covers both the determination of the target ratio and its implementation. This guarantees that the randomised sequence (randomisation list) adheres to the constraints of the design and maintains the expected allocation ratios.

This section focuses on research problem (1) as outlined in the Introduction. First, we provide a comparison amongst selected designs considered for StratosPHere 2 that differ in their degree of constraints on the allocation probabilities. We will refer to them as the Unmapped RAR designs, since their continuous randomisation probabilities are directly used to determine the allocations. Then, we consider the baseline design to be StratosPHere 2 (as illustrated in Figure 2 and discussed in the study protocol²¹ and statistical analysis plan³⁹). Finally, we present Mapped designs, which include the Permuted Block Design and our proposed designs that translate the continuous allocation probabilities to discrete target allocations. A preliminary summary of the comparators is given in Table 2.

Table 2.
Taxonomy of the evaluated BRAR designs, with corresponding allocation rule and restrictions per stage. $# C = 2$ indicates the exact number of controls allocated.

Design type	Design name	Allocation rule	Stage 1	Stage 2	Stage 3
Unmapped	Fixed equal randomisation	–	–	–	–
	Unrestricted	TS-BRAR $(γ)$	–	–	–
	Control protected	Trippa-BRAR $(γ_{t}, η_{t})$	–	–	–
Baseline	StratosPHere 2	Trippa-BRAR $(γ_{t}, η_{t})$	2 : 2 : 2	No arm dropping	–
Mapped	Permuted block	–	2 : 2 : 2	2 : 2 : 2	2 : 3 : 3
	Mapped- $α$	Trippa-BRAR $(γ_{t}, η_{t})$	2 : 2 : 2	No arm dropping; $# C = 2$	$# C = 2$
	Mapped- $β$	Trippa-BRAR $(γ_{t}, η_{t})$	2 : 2 : 2	No arm dropping; $# C = 2$	$# C = 2$

TS-BRAR: Thompson sampling Bayesian response-adaptive randomisation; BRAR: Bayesian response-adaptive randomisation.

Since each design is implemented independently within each stratum, we outline our proposal with reference to a single stratum. The same approach can be replicated for the other stratum. We recall that each stratum is structured in three stages $t = 1, 2, 3$ with sizes $n_{t}$ determined according to practical recruitment considerations: $n_{1} = 6$ , $n_{2} = 6$ and $n_{3} = 8$ . At the end of each stage $t - 1$ , that is, during the interim analysis, the BRAR updates the vector of randomisation probabilities for the allocation of the treatments for the next stage: ${π_{t, s, k}; k = C, T_{1}, T_{2}}$ .

3.1. Unmapped designs

We consider three types of Unmapped designs. The first is Fixed Equal Randomisation, where each patient is assigned to any given arm with probability $1 / K$ ($K$ being the total number of arms). The other two are: Unrestricted, where we do not impose any of the trial constraints and the allocations are given by the Thompson sampling probabilities TS-BRAR $(γ)$ in equation (4) with $γ = 1$; and Control Protected, where the allocation rule is Trippa-BRAR $(γ_{t}, η_{t})$ as given in equation (3), which ensures a minimum allocation to the control arm. Compared to the Control Protected design, the StratosPHere 2 design based on the motivating study has additional trial restrictions per stage, namely, a restricted randomisation with an allocation ratio of $2 : 2 : 2$ at stage 1 and no arm dropping allowed at stage 2, while early arm dropping is allowed in stage 3 (see Figure 2). Given its relevance, we shall use it as the baseline design for constructing our Mapping strategy. An outline of the compared designs is reported in Table 2.

These designs are implemented directly from the continuous randomisation probabilities estimated at each interim analysis (end of stages 1 and 2; see Figure 2), with no guaranteed properties of the resulting allocation sequences. As an example, think of assigning the second stage of 6 patients to the three treatment arms by rolling a fair 3-sided die, that is, with probability 1/3 each. Since the allocation is determined randomly only once, there is no guarantee that the assignments will result in a 2:2:2 allocation across the treatments due to the stochastic nature of the process; thus, the resulting sequence (randomisation list) of treatments may deviate from the expected or desired allocation.
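The die-rolling example can be quantified exactly with the multinomial distribution. The short sketch below computes the probability that 6 independent assignments with probability 1/3 each yield exactly 2:2:2, and the probability that at least one arm receives no patient at all; the helper `multinomial_pmf` is introduced here for illustration.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of observing the given per-arm counts under simple
    randomisation with per-assignment probabilities probs."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an integer at every step
    prob = float(coef)
    for c, q in zip(counts, probs):
        prob *= q ** c
    return prob


n, probs = 6, (1 / 3, 1 / 3, 1 / 3)

# Exactly balanced 2:2:2 allocation: 90 / 729, roughly 0.123.
p_balanced = multinomial_pmf((2, 2, 2), probs)

# At least one arm left empty: 189 / 729, roughly 0.259.
p_some_arm_empty = sum(
    multinomial_pmf((a, b, n - a - b), probs)
    for a in range(n + 1)
    for b in range(n + 1 - a)
    if min(a, b, n - a - b) == 0
)

print(f"P(2:2:2)               = {p_balanced:.4f}")
print(f"P(at least one arm = 0) = {p_some_arm_empty:.4f}")
```

In other words, with only 6 patients the intended balanced allocation is realised in about one stage in eight, and an arm is left without patients about twice as often.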

Implementation of Unmapped versus Mapped designs

Unmapped procedures require the randomisation system to use a probabilistic assignment methodology, that is, to utilise probabilities derived from the BRAR algorithm along with some type of random number generator and logic to perform patient assignments. Implementing this within a randomisation system is complex and requires advanced software/coding by randomisation system providers. This level of complexity can limit the number of providers capable of supporting such systems. For Mapped designs, by contrast, the derived discrete allocations can be implemented within standard randomisation schedules (e.g., a randomisation list with permuted blocks containing the included treatments and allocations). These standard randomisation schedules can be used in any randomisation system.
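To illustrate why a discrete allocation ratio is straightforward to implement, the sketch below turns a ratio into a permuted-block randomisation list. It is a generic illustration, not the randomisation system used in the trial; the function name `randomisation_list` and the example ratio 2:1:3 are hypothetical.

```python
import random


def randomisation_list(ratio, n_blocks=1, seed=None):
    """Build a randomisation list from permuted blocks.

    ratio maps each arm to its number of slots per block, e.g. a
    C:T1:T2 ratio of 2:1:3 gives blocks of six assignments.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = [arm for arm, slots in ratio.items() for _ in range(slots)]
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule


# Hypothetical stage of 6 patients with a mapped ratio of 2:1:3.
stage_ratio = {"C": 2, "T1": 1, "T2": 3}
stage_list = randomisation_list(stage_ratio, n_blocks=1, seed=42)
print(stage_list)
```

Whatever the random order, the list contains exactly the number of assignments per arm specified by the ratio, so the realised allocation cannot drift from the target.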

3.2. Mapped designs

Now, we describe our proposal for the Mapped versions of the Unmapped designs. In practice, Mapping can be viewed as an intermediate step to link the adaptive design’s definition to its implementation in a concrete trial setting. As such, Mapping involves a decision rule to translate the continuous randomisation probabilities derived in the Unmapped designs at the interim analyses into a target vector of discrete allocation ratios of the form $R_{0} : R_{1} : \dots : R_{K}$ for a trial with $K$ arms. Its fundamental principle is to define a set of probability thresholds to split the continuous probability space into discrete categories that match possible values of the allocation ratios $R_{0} : R_{1} : \dots : R_{K}$ . This ensures an efficient allocation of treatments to the small population where we want to avoid the possibility of undesired allocations, while maintaining interpretability of the process.

In Table 2, we have listed three designs as Mapped – first of which is the Permuted Block Design where the allocation ratio is fixed to maintain balance in the three different stages. The other two designs, which are discussed in the rest of the section, allow more room to deviate from the balanced allocations if needed.

Definition of the discrete allocation ratio space

The allocation ratio is directly driven by the allocation probabilities dictated by the underlying Unmapped design. Given its practical relevance, we will focus on the baseline design StratosPHere 2, where we are interested in the possible allocation ratios $R_{0} : R_{1} : R_{2}$ . An important step is to establish discrete “adaptation” categories within the Mapping strategy, that is, categories that represent how much preference a promising arm can be given, based on the observed data and as permitted by the design.

Specifically, in our implementation, we consider the following five decision categories depending on the stage of the trial and the data observed: Drop, Disfavour, Balance, Favour, and Keep. The first two categories refer to a situation in which an active treatment arm shows unpromising results relative to the other active arm, as opposed to the last two categories; category Balance indicates a case of relative indifference between the two active treatment arms. Note that, for a given arm, these categories are mutually exclusive – if an arm is determined as Drop and it is the only arm in this category, it will be excluded from further allocation and the Disfavour allocation ratios will not apply. To resemble the design of StratosPHere 2 (which has a protection on the control treatment), we start by fixing the number of allocations to control $R_{0}$ at 2 for each stage in the Mapped versions. This ensures that the pre-defined final allocation of approximately $1 / K \approx 0.3 = 6 / 20$ for the control arm is preserved. Therefore, we now focus exclusively on categorising the mapping for the active treatment arm(s).

Guided by the StratosPHere 2 design, in Table 3, we present possible decisions that can be made regarding the allocation ratios for $C : T_{1} : T_{2}$ , for each stage and for each category. At stage 1, in the absence of sufficient information about the treatment effects, we start with an equal allocation ratio (or, equivalently, we set the Balance category for all arms). In stage 2, we allow for mild skewing of the allocation ratio towards the more promising arm, but Drop or Keep is not an option here. Drop (that is, removing an active arm) and Keep (that is, dropping the other active arm) become additional possible choices, given the data, only in stage 3.

Table 3.
Possible options for the allocation ratios for $C : T_{1} : T_{2}$ at each stage of the trial based on the categories of the active treatment arms.

| Category | Description | Stage 1 | Stage 2 | Stage 3 |
| --- | --- | --- | --- | --- |
| Drop | $T_{1}$ can be dropped | Never | Never | 2 : 0 : 6 |
| Disfavour | $T_{1}$ can be disfavoured, but not dropped | Never | 2 : 1 : 3 | 2 : 1 : 5 or 2 : 2 : 4 |
| Balance | The active arms can be allocated equally | 2 : 2 : 2 | 2 : 2 : 2 | 2 : 3 : 3 |
| Favour | $T_{1}$ can be favoured, without dropping the other | Never | 2 : 3 : 1 | 2 : 4 : 2 or 2 : 5 : 1 |
| Keep | $T_{1}$ can be kept, while dropping the other | Never | Never | 2 : 6 : 0 |

“never” refers to a situation in which that category is not an option (e.g., at stage 1, we always choose balance, with arms allocated based on a $2 : 2 : 2$ ratio, and “never” the other categories).

Given the categories per interim stage, one can define multiple types of Mapped designs depending on the number of thresholds that are considered per stage on the continuous probability space. In this paper, we discuss two such designs, which we call Mapped- $α$ and Mapped- $β$ . In Mapped- $α$ , we introduce a probability threshold in stage 2 to distinguish between Disfavour and Favour, followed by two distinct probability thresholds in stage 3, each distinguishing between two consecutive categories among Drop, Disfavour, Favour and Keep. Let $M_{t}^{m}$ denote the mapping domain for Mapped design $m$ at stage $t$ of the trial. Thus, $M_{2}^{α} = \{\mathrm{Disfavour}, \mathrm{Favour}\}$ and $M_{3}^{α} = \{\mathrm{Drop}, \mathrm{Disfavour}, \mathrm{Favour}, \mathrm{Keep}\}$ at stages 2 and 3 respectively, while $M_{1}^{α} = \{\mathrm{Balance}\}$ . Notice that this design does not allow for any Balance region on the decision line for the categorisation of an active treatment arm. This motivates the second, more refined Mapped design, Mapped- $β$ , which accommodates a Balance option (that is, no adaptation) on the decision line. Mapped- $β$ has the following mapping domains, while preserving $M_{1}^{β} = M_{1}^{α} = \{\mathrm{Balance}\}$ : $M_{2}^{β} = \{\mathrm{Disfavour}, \mathrm{Balance}, \mathrm{Favour}\}$ and $M_{3}^{β} = \{\mathrm{Drop}, \mathrm{Disfavour}, \mathrm{Balance}, \mathrm{Favour}, \mathrm{Keep}\}$ . This is achieved by taking a higher number of probability thresholds, as shown in Figure 3, which provides an illustrative schematic of the proposed Mapped designs. Notice that as the Balance region shrinks to an empty set (as depicted by the arrows), Mapped- $β$ converges to Mapped- $α$ . An important consideration is whether the Balance option should remain available throughout the trial, in which case one may want to favour the design with higher granularity. Intuitively, the higher granularity mapping requires a higher level of “evidence” in the observed probabilities in order to deviate from a balanced allocation. It is also a pragmatic design in the sense that it allows the allocation to remain unchanged when the data are not favourable enough to warrant a change.

Figure 3.
Schematic of the proposed Mapped designs. Closing the Balance region in the top-row design Mapped- $β$ converts it into the Mapped- $α$ design as shown in the bottom row.

To summarise, Mapped designs use thresholds $p_{t, i}$ , indexed by $i$ at stage $t$ of the trial, to discretise the decision line. Mapped- $β$ has two thresholds at stage 2, $p_{2, 1}'$ and $p_{2, 1}''$ , and four thresholds at stage 3, $p_{3, 1}, p_{3, 2}', p_{3, 2}'', p_{3, 3}$ . When the primed thresholds are equal, that is, $p_{2, 1}' = p_{2, 1}''$ and $p_{3, 2}' = p_{3, 2}''$ , the design simplifies to Mapped- $α$ , eliminating the Balance region on the decision line.

Once a category for an active arm has been chosen based on the thresholds by the Mapped design (threshold selection is discussed later), the corresponding discrete allocation ratio is selected from the options listed in Table 3. If two active arms fall in the same category, say Disfavour, then we opt for the balanced allocation of that stage. If the category has two possible allocation ratios, then either of them is chosen at random with equal probability.

Definition of the Mapping functions

We now give a formal description of our proposed approach, encompassing both the decision rule and the allocation strategy. At the end of stages 1 and 2, we observe some interim data and obtain a vector of allocation probabilities $π = [π_{C}, π_{T_{1}}, π_{T_{2}}]$ where $C$ , $T_{1}$ , and $T_{2}$ denote Control, Treatment 1 and Treatment 2 respectively. For a Mapping $m$ in stage $t$ , for each active treatment arm $k = T_{1}, T_{2}$ , we define a decision rule $\mathrm{Dec}_{t}^{m} (π_{k})$ as a function of the allocation probabilities. This rule assigns the active treatment arm to an adaptation category and, based on that decision, the final allocation ratio is specified.

For example, the representation of Figure 3 can be formalised through a mapping function $\mathrm{Dec}_{t}^{m} (π_{k})$ when Mapped design $m$ is $β$ as:
$\mathrm{Dec}_{t}^{β} (π_{k}) : [0, 1] \to M_{t}^{β},$
where $[0, 1]$ is the domain of $π_{k}$ while $M_{t}^{β}$ is the stage $t$ domain of the discrete allocation ratio introduced by the mapping. Specifically, $M_{2}^{β} = \{\mathrm{Disfavour}, \mathrm{Balance}, \mathrm{Favour}\}$ and $M_{3}^{β} = \{\mathrm{Drop}, \mathrm{Disfavour}, \mathrm{Balance}, \mathrm{Favour}, \mathrm{Keep}\}$ . More generally, for a set $M_{t}^{m} = \{\mathrm{Category}_{1}, \dots, \mathrm{Category}_{J}\}$ of adaptation categories under a mapping $m$ at stage $t$ , the allocation probability space $[0, 1]$ is partitioned into subintervals by thresholds $\{p_{t, 0}, p_{t, 1}, \dots, p_{t, J}\}$ such that $0 = p_{t, 0} < p_{t, 1} < \dots < p_{t, J} = 1$ , and the decision rule outputs one of the given categories $j$ :

$\begin{aligned} \mathrm{Dec}_{t}^{m} (π_{k}) = \mathrm{Category}_{j} \quad \text{if } π_{k} \in [p_{t, j - 1}, p_{t, j}) . \end{aligned}$
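The decision rule amounts to locating $π_{k}$ among the ordered thresholds. The following is a minimal sketch under stated assumptions: the function name `decide` is ours, only the interior thresholds $p_{t, 1}, \dots, p_{t, J - 1}$ are passed in, and the values 1/3 and 0.45 are used purely as an illustration of a stage-2 Mapped- $β$ partition.

```python
from bisect import bisect_right


def decide(pi_k, interior_thresholds, categories):
    """Map an allocation probability to an adaptation category.

    `interior_thresholds` are p_{t,1} < ... < p_{t,J-1}; the end points 0 and 1
    are implicit. Intervals are half-open, [p_{t,j-1}, p_{t,j}), so a value
    exactly equal to a threshold falls in the higher category.
    """
    if not 0.0 <= pi_k <= 1.0:
        raise ValueError("pi_k must lie in [0, 1]")
    if len(interior_thresholds) != len(categories) - 1:
        raise ValueError("need exactly J - 1 interior thresholds for J categories")
    if list(interior_thresholds) != sorted(interior_thresholds):
        raise ValueError("thresholds must be increasing")
    return categories[bisect_right(interior_thresholds, pi_k)]


# Illustrative stage-2 partition for Mapped-beta (assumed values).
stage2_categories = ["Disfavour", "Balance", "Favour"]
stage2_thresholds = [1 / 3, 0.45]

for pi in (0.20, 0.40, 0.45, 0.60):
    print(pi, decide(pi, stage2_thresholds, stage2_categories))
```

Setting the two thresholds equal is not permitted by this sketch; the Mapped- $α$ case is obtained instead by passing a single threshold together with the two categories Disfavour and Favour.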

Once the mapping rule outputs an adaptation category for each active treatment arm, we can finalise the allocation ratio based on the number of arms in each category. This is defined using an auxiliary function $\mathrm{Alloc}_{t}^{m} (| T^{\mathrm{Category}} |)$ where $| X |$ denotes the cardinality of the active arms assigned to a given adaptation category and $t \in \{2, 3\}$ represents the trial stage. The bar notation above a category name, for example, $T^{\overline{\mathrm{Balance}}}$ , indicates the complement set for that active arm, meaning not in that category. Formally, for the more granular Mapped design $m = β$ , we define it as:
$\begin{aligned} \mathrm{Alloc}_{2}^{β} (| T^{\mathrm{Category}} |) = & \begin{cases} 2 : 1 : 3 :: C : T^{\mathrm{Disfavour}} : T^{\overline{\mathrm{Disfavour}}} & \text{if } | T^{\mathrm{Disfavour}} | = 1 \\ 2 : 2 : 2 :: C : T^{\mathrm{Balance}} : T^{\overline{\mathrm{Balance}}} & \text{if } | T^{\mathrm{Balance}} | = 2 \\ 2 : 3 : 1 :: C : T^{\mathrm{Favour}} : T^{\overline{\mathrm{Favour}}} & \text{if } | T^{\mathrm{Favour}} | = 1 \\ 2 : 2 : 2 :: C : T_{1} : T_{2} & \text{otherwise} \end{cases} ; \\ \mathrm{Alloc}_{3}^{β} (| T^{\mathrm{Category}} |) = & \begin{cases} 2 : 0 : 6 :: C : T^{\mathrm{Drop}} : T^{\overline{\mathrm{Drop}}} & \text{if } | T^{\mathrm{Drop}} | = 1 \\ 2 : 1 : 5 \text{ or } 2 : 2 : 4 :: C : T^{\mathrm{Disfavour}} : T^{\overline{\mathrm{Disfavour}}} & \text{if } | T^{\mathrm{Disfavour}} | = 1 \\ 2 : 5 : 1 \text{ or } 2 : 4 : 2 :: C : T^{\mathrm{Favour}} : T^{\overline{\mathrm{Favour}}} & \text{if } | T^{\mathrm{Favour}} | = 1 \\ 2 : 3 : 3 :: C : T_{1} : T_{2} & \text{if } | T^{\mathrm{Balance}} | = 1 \\ 2 : 6 : 0 :: C : T^{\mathrm{Keep}} : T^{\overline{\mathrm{Keep}}} & \text{if } | T^{\mathrm{Keep}} | = 1 \\ 2 : 3 : 3 :: C : T_{1} : T_{2} & \text{otherwise} \end{cases} . \end{aligned}$
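The stage-2 allocation function can be read as a short sequence of checks applied in the order of the cases above. The sketch below covers stage 2 of Mapped- $β$ only; the function name `alloc_stage2` and the tuple convention (control, $T_{1}$ , $T_{2}$ ) are our own illustrative choices, and stage 3 would follow the same pattern with its own ratios.

```python
def alloc_stage2(cat_t1, cat_t2):
    """Return the stage-2 allocation (C, T1, T2) from the two arm categories.

    The checks are applied in the same order as the cases of the stage-2
    allocation function; control is always fixed at 2.
    """
    cats = [cat_t1, cat_t2]
    if cats.count("Disfavour") == 1:
        active = [3, 3]
        active[cats.index("Disfavour")] = 1  # disfavoured arm gets 1, the other 3
    elif cats.count("Balance") == 2:
        active = [2, 2]
    elif cats.count("Favour") == 1:
        active = [1, 1]
        active[cats.index("Favour")] = 3     # favoured arm gets 3, the other 1
    else:
        active = [2, 2]                      # e.g. both arms in the same category
    return (2, active[0], active[1])


print(alloc_stage2("Favour", "Disfavour"))   # (2, 3, 1)
print(alloc_stage2("Balance", "Disfavour"))  # (2, 3, 1)
print(alloc_stage2("Favour", "Favour"))      # (2, 2, 2)
```

Every returned ratio sums to 6, the stage-2 sample size, so the output can be handed directly to a permuted-block schedule.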

Determination of the Mapping thresholds

The implementation of Mapping requires fixing the set of probability thresholds $\{p_{t, 0}, p_{t, 1}, \dots, p_{t, J}\}$ . Here we do this empirically by assessing the impact of possible threshold values on the resulting operating characteristics of the adopted Mapped design. For simplicity, we illustrate this process using the Mapped- $α$ design, but a similar (yet more complex) process can be used for the Mapped- $β$ design.

Thresholds are selected per stage using simulations, with calibration of an additional parameter that captures the level of adaptability achieved across simulations at each stage. With adaptability we refer to decisions for “Favouring” or “Dropping” an active arm, thereby deviating from a balanced allocation between the two active treatment arms. In our motivating trial, this occurs when the allocation ratios in stage 2 and stage 3 deviate from the initial 2:2:2 and 2:3:3 due to an active arm being favoured or dropped. Our goal is to determine thresholds such that adaptations are less often triggered under the null hypothesis (no difference in treatment arms) while also resulting in good operating characteristics, particularly in terms of power (under the alternative hypothesis: superiority of an active arm).

We express adaptability as a percentage: at each stage, we replicate the Mapped- $α$ design 10,000 times over a uniform grid of thresholds in $[0.33, 0.66]$ , and evaluate the proportion of simulated trials in which a specific adaptation (e.g., favouring $T_{1}$ or dropping $T_{1}$ ) is triggered under the two hypotheses. The grid is chosen considering that, for stage 2, the number of controls is fixed to 2 and there are 6 patients; therefore, the remaining probability of allocating the other two (active) arms is between $2 / 6 \approx 0.33$ and $1 - (2 / 6) \approx 0.66$ . Figure 4(a) shows that the choice of $p_{2, 1} = p_{2, 1}' = p_{2, 1}'' = 0.45$ produces the best trade-off between the two competing hypotheses. This serves as the mid point of the decision line for stage 2. For stage 3, we set the mid point of the decision line to the same value as decided for stage 2, that is, $p_{3, 2} = p_{3, 2}' = p_{3, 2}'' = 0.45$ , and fix the dropping threshold $p_{3, 1}$ to $τ \in [0, 0.2]$ . The value of $τ$ is the same as that taken in StratosPHere 2 and will be disclosed at the end of the trial to preserve the study’s integrity (see Section 2.1). The threshold $p_{3, 3}$ is chosen by simulation analyses similar to those performed for the previous stage: as shown in Figure 4(b), the probability of dropping the active arm under the Null is minimised for $p_{3, 3} \geq 0.55$ .
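The adaptability calculation itself is a simple tally once simulated allocation probabilities are available. The sketch below assumes that the BRAR simulation has already produced, for each replicate, a pair $(π_{T_{1}}, π_{T_{2}})$ ; the pairs shown are made-up placeholders and not trial or simulation results, and the function names are ours. Following the rule that two arms in the same category receive the balanced allocation, a stage-2 adaptation is counted only when exactly one active arm reaches the threshold.

```python
def threshold_grid(lo=0.33, hi=0.66, step=0.01):
    """Uniform grid of candidate thresholds, rounded to two decimals."""
    n_steps = round((hi - lo) / step)
    return [round(lo + i * step, 2) for i in range(n_steps + 1)]


def stage2_adaptability(pi_pairs, threshold):
    """Percentage of replicates in which stage 2 departs from 2:2:2.

    Under Mapped-alpha an arm is Favour if pi >= threshold and Disfavour
    otherwise; an adaptation is triggered when the two arms differ.
    """
    triggered = sum((p1 >= threshold) != (p2 >= threshold) for p1, p2 in pi_pairs)
    return 100.0 * triggered / len(pi_pairs)


# Placeholder pairs (hypothetical, for illustration only).
toy_pairs = [(0.30, 0.40), (0.50, 0.20), (0.46, 0.47), (0.10, 0.60)]

grid = threshold_grid()
curve = {p: stage2_adaptability(toy_pairs, p) for p in grid}
print(curve[0.45])  # 50.0
```

In practice the same tally would be run once on replicates generated under the null and once under the alternative, and the two curves compared over the grid.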

Figure 4.
Selecting probability thresholds for Mapped- $α$ design. (a) Trade off in favouring between the two hypotheses. The brown dashed lines intersect at the point of optimal trade off corresponding to threshold value 0.45 with the coordinates ( $x, y$ ) representing the favouring under the null and alternate hypotheses respectively. (b) Drop % for different thresholds under the two hypotheses. The brown dashed line passes through the Drop % values for the two hypotheses corresponding to the threshold 0.55.

For Mapped- $β$ , we should introduce the additional threshold by updating the values of $p_{2, 1}^{'}$ and $p_{3, 2}^{'}$ to $1 / K$ where $K = 3$ (the number of arms); this means the region between $1 / K$ and 0.45 represents the Balance region as defined before.

Operating characteristics of the Mapped versus Unmapped designs

A final part of addressing our research question (1) is to investigate: how do the operating characteristics of the original StratosPHere 2 design and other variants, including the Mapped versions for our selected thresholds, compare against each other? Results corresponding to the BRAR designs outlined in Table 2 are presented in Table 1. The operating characteristics are reported in terms of power, type-I error, and expected allocation probabilities under the alternative hypothesis of an optimal arm (see also Section 2). These are computed following a bootstrap approach (based on real data from a pilot StratosPHere 1 phase; see Jones et al.²³), and by replicating the design 10,000 independent times. As shown in Table 1, the implemented Mapping rule not only guarantees a safe allocation (preventing undesirable allocation ratios), but also results in non-negligible improvements both in terms of frequentist errors and participants allocated to the most promising arm.

4. Handling missing data

In this section, we address research problem (2) as outlined in the Introduction. Specifically, we discuss here how to handle the occurrence of missing response data during the conduct of the StratosPHere 2 trial to effectively implement the (adaptive) Mapped- $α$ design. Our main interest lies in the operating characteristics to assess how the BRAR is affected by potential missing data and different ways to handle it. To achieve this, we conduct simulation-based sensitivity analyses considering various cases of missing response data present only in the first two stages of the study. We do not include the possibility of missing response data in stage 3 of either of the strata, as this is the final stage of the trial and it does not guide any further adaptation. In particular, we want to understand whether and when an adaptation could still be implemented in the presence of missing data.

All simulations are carried out under a missing-at-random framework assuming all the components of the biomarker panel for a patient’s primary endpoint are missing. We focus on the case where a maximum of 2 patients, that is, 10% drop out of the study in the listed missing data cases. If the rate of missing data exceeds 10%, we assume that no adaptations can be safely triggered. We also assume that when a data point is missing we will refrain from imputing it as this is a small-sized trial. Imputation will be discussed later in this section. The following cases are considered and compared with a scenario with no missing data denoted as Case 0, that is, when the response data is available for (6, 6, 8) patients in each stage respectively.

Case 1 One patient’s data is missing at random from stage 1 of a stratum, that is, the composition of the stratum is (5, 6, 8) for the sample sizes of the first, second and third stages respectively.

Case 2 Two patients’ data are missing at random from stage 1 of a stratum, that is, the composition of the stratum is (4, 6, 8) for the first, second and third stages respectively.

Case 3 One patient’s data is missing at random from stage 2 of a stratum, that is, the composition of the stratum is (6, 5, 8) for the first, second and third stages respectively.

Case 4 Two patients’ data are missing at random from stage 2 of a stratum, that is, the composition of the stratum is (6, 4, 8) for the first, second and third stages respectively.

Case 5 One patient’s data is missing at random from each of stages 1 and 2 of a stratum, that is, the composition of the stratum is (5, 5, 8) for the first, second and third stages respectively.

Results, in terms of the operating characteristics of the Mapped- $α$ design evaluated under different cases of missingness, are reported in Table 4, where the missing values were excluded from the simulations.

We note that power decreases as the number of missing data points increases (from 0.80 in Case 0 to 0.74 in Cases 4 and 5; see Table 4), but overall the operating characteristics remain reasonable, allowing us not to interfere with the design.

We now address the key question at the design stage: whether or not to allow an adaptation when it is recommended by the BRAR design using the observed outcome data while ignoring any missing data. To investigate this, we use the adaptability parameter to observe the extent to which the design deviates from a balanced allocation in the simulations. Table 5 illustrates how adaptability varies across the cases considered in this work. Since the sample size is small, we wish to see low adaptability under the Null hypothesis; where this holds, we can let the trial design decide whether to adapt, even in the presence of missing data, without any interference. To decide whether or not to adapt at each interim analysis, we examine the situations one by one. At the first interim, that is, at the end of stage 1, we aim to understand how the adaptability for stage 2 varies when there are missing data points in stage 1, that is, in cases 1, 2 and 5. Referring to Table 5, we find that for these cases adaptability under the Null is very high (highlighted in bold in the table). Therefore, we are inclined not to adapt away from the balanced allocation in stage 2 if there is any missing data point in stage 1. Moving on to Interim 2, with a similar rationale, we consider cases 3, 4 and 5 and note that the Drop/Keep % is very high under the Null. This leads us not to permit dropping either of the active treatment arms in stage 3 if missingness is found in stage 2. However, we may allow adaptation towards favouring an arm if need be. The final decisions on missing data handling in the ongoing trial are provided in the statistical analysis plan of the trial StratosPHere 2.
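The tentative rules described in this paragraph can be written down as a small lookup, which may help when checking each missing-data case. This is only our reading of the text and not the statistical analysis plan: the function name and dictionary keys are hypothetical, and the handling of stage 3 when only stage 1 has missing data is not stated in the text, so the sketch leaves it unrestricted as an assumption.

```python
PLANNED = (6, 6, 8)  # planned responses per stage in one stratum


def permitted_adaptations(missing_stage1, missing_stage2):
    """Return the observed stage sizes and which adaptations stay permitted."""
    observed = (PLANNED[0] - missing_stage1, PLANNED[1] - missing_stage2, PLANNED[2])
    total_missing = missing_stage1 + missing_stage2
    # More than 10% missing (more than 2 of 20): no adaptation is triggered.
    if total_missing * 10 > sum(PLANNED):
        rules = {"stage2_skew": False, "stage3_skew": False, "stage3_drop": False}
        return observed, rules
    rules = {
        # Any missing point in stage 1: keep the balanced 2:2:2 in stage 2.
        "stage2_skew": missing_stage1 == 0,
        # Favouring an arm in stage 3 may still be allowed.
        "stage3_skew": True,
        # Missingness in stage 2: do not drop an active arm in stage 3.
        "stage3_drop": missing_stage2 == 0,
    }
    return observed, rules


for case, (m1, m2) in enumerate([(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]):
    print("Case", case, permitted_adaptations(m1, m2))
```

The loop reproduces the stage compositions of Cases 0 to 5, from (6, 6, 8) to (5, 5, 8).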

Table 4.
Operating characteristics of the Mapped- $α$ design evaluated under different cases of missingness.

| Scenario | # Missing data points (per stage) | Power | Type-I error | Empirical allocation, Arm $C$ | Empirical allocation, Arm $T_{1}$ | Empirical allocation, Arm $T_{2}$ |
| --- | --- | --- | --- | --- | --- | --- |
| Case 0 | (0, 0, 0) | 0.80 | 0.11 | 0.30 (0.0) | 0.20 (0.11) | 0.50 (0.11) |
| Case 1 | (1, 0, 0) | 0.76 | 0.11 | 0.30 (0.2) | 0.20 (0.12) | 0.50 (0.12) |
| Case 2 | (2, 0, 0) | 0.75 | 0.10 | 0.30 (0.3) | 0.20 (0.13) | 0.50 (0.13) |
| Case 3 | (0, 1, 0) | 0.76 | 0.11 | 0.30 (0.2) | 0.20 (0.12) | 0.50 (0.12) |
| Case 4 | (0, 2, 0) | 0.74 | 0.11 | 0.30 (0.3) | 0.20 (0.12) | 0.50 (0.13) |
| Case 5 | (1, 1, 0) | 0.74 | 0.10 | 0.30 (0.4) | 0.20 (0.13) | 0.50 (0.13) |

The missing values were ignored in the simulations. Results are averaged across 10,000 independent replicas and reported in terms of mean (standard deviation). A standard deviation of zero reflects the imposed restriction on the number of allocations to the control arm (always fixed to 2 per stage).

Table 5.
Adaptability under the null and alternative hypothesis.

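| Scenario | Stage 2 favour/disfavour %, $H_{0}$ | Stage 2 favour/disfavour %, $H_{1}$ | Stage 3 favour/disfavour %, $H_{0}$ | Stage 3 favour/disfavour %, $H_{1}$ | Stage 3 drop/keep %, $H_{0}$ | Stage 3 drop/keep %, $H_{1}$ |
| --- | --- | --- | --- | --- | --- | --- |
| Case 0 | 1.43 | 74.04 | 35.62 | 3.22 | 37.16 | 92.56 |
| Case 1 | 63.14 | 83.73 | 5.62 | 1.64 | 77.80 | 95.48 |
| Case 2 | 53.44 | 77.43 | 7.88 | 2.73 | 83.18 | 94.49 |
| Case 3 | 1.32 | 74.19 | 9.59 | 1.84 | 72.95 | 95.95 |
| Case 4 | 1.47 | 74.47 | 10.17 | 2.96 | 78.40 | 94.47 |
| Case 5 | 63.89 | 83.90 | 1.08 | 2.61 | 90.67 | 94.50 |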

Finally, we also review how the operating characteristics, particularly power, would look at the end of the study depending on our decision to adapt or not. This is illustrated in Figure 5, which shows power under different scenarios of adaptability. We can see that, when the data are complete (Case 0), adapting gives maximum power, as expected. Note that some of these scenarios are redundant: for example, for missing case 3, where stage 1 has no missingness, it is pointless to consider whether to allow Favour in stage 2, since up to that point the data are complete and we do not interfere. Nevertheless, the plot helps us to understand how power can be affected depending on the action we take.

Figure 5.
Power under different scenarios of adapting in the stage.

Imputation

Biswas and Rao⁴⁰ report that, based on simulations under the assumption of missing at random, statistical power improves when missing responses are imputed using the sample mean in their adaptive design, compared with not imputing. Here we assess how imputation affects the operating characteristics of the design.

Given the specified trial design and the implementation of Mapped- $α$ , we impute data only in the second stage of the trial with the sample mean of biomarker values observed up to the point of missingness for the corresponding missing arm in the following manner. Let $y_{i, k}$ be the vector of biomarker values for the $i$ th participant in the $k$ th arm, and ${\bar{y}}_{k}^{(t)}$ be the sample mean vector of the observed biomarker values in arm $k$ up to time $t$ (i.e., the timepoint at which the missingness occurs). If $y_{i, k}$ is missing in the second stage of the trial, it is imputed as $y_{i, k} = {\bar{y}}_{k}^{(t)}$ where ${\bar{y}}_{k}^{(t)} = \frac{1}{n_{k}^{(t)}} \sum_{m = 1}^{n_{k}^{(t)}} y_{m, k}$ . Here, $n_{k}^{(t)}$ is the number of observed biomarker vectors in arm $k$ up to time $t$ . Other imputation methods along with experts’ advice can be explored. Since this imputation can occur only when there is missingness in stage 2, only cases 3, 4 and 5 are relevant for comparisons.
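The imputation rule can be sketched as a running arm-wise mean. In this minimal illustration the function name is ours, the responses of one arm are assumed to be listed in order of observation, a missing response is encoded as `None`, and the numbers are invented; only observed vectors contribute to the mean, in line with the definition of $n_{k}^{(t)}$ .

```python
def impute_with_arm_mean(arm_values):
    """Replace each missing biomarker vector by the component-wise mean of the
    vectors observed in the same arm before the missing one.

    `arm_values` lists one arm's responses in time order; None marks a missing
    vector. Imputed vectors are not reused when computing later means.
    """
    observed = []
    completed = []
    for y in arm_values:
        if y is None:
            if not observed:
                raise ValueError("no earlier observed response to impute from")
            n = len(observed)
            y_hat = [sum(component) / n for component in zip(*observed)]
            completed.append(y_hat)
        else:
            observed.append(list(y))
            completed.append(list(y))
    return completed


# Hypothetical two-component biomarker vectors for one arm.
arm_t1 = [[1.0, 2.0], [3.0, 4.0], None, [5.0, 6.0]]
print(impute_with_arm_mean(arm_t1))
# [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0], [5.0, 6.0]]
```

Restricting the imputation to stage 2, as done in the paper, would be the caller's responsibility in this sketch, for example by passing stage-1 responses with their missing entries already excluded.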

In Table 6, the bold text cells show the updated values reporting adaptability when missing data points are imputed only in stage 2. Other slight changes in the non-bold values are due to the variability of results due to simulations. We note that when missing data points are imputed, then under the Null hypothesis, the adaptability values for the cases 3 and 4 are similar to the complete case (Case 0) as expected as there is no missingness anymore. However, the same values for Case 5 are dissimilar to Case 0 since there still exists an unimputed missing point which is from stage 1. The updated values for cases 3 and 4 show the chances of dropping an active treatment arm under the null have decreased which is a favourable outcome suggesting that imputation can be recommended. These results show that when we want to be conservative with respect to the extent of adapting in case of missingness in the data, then imputing can also lower the scope of extreme adapting such as Drop/ Keep % in stage 3. This observation is specific to our motivating trial but we believe it can be generalised to other studies highlighting the importance of checking levels of adaptability through simulations. However, the choice of imputation method will require a rigorous approach and expert advice.

Table 6.
Adaptability under imputation under the null and alternative hypothesis.

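| Scenario | Stage 2 favour/disfavour %, $H_{0}$ | Stage 2 favour/disfavour %, $H_{1}$ | Stage 3 favour/disfavour %, $H_{0}$ | Stage 3 favour/disfavour %, $H_{1}$ | Stage 3 drop/keep %, $H_{0}$ | Stage 3 drop/keep %, $H_{1}$ |
| --- | --- | --- | --- | --- | --- | --- |
| Case 0 | 1.43 | 74.04 | 35.62 | 3.22 | 37.16 | 92.56 |
| Case 1 | 63.14 | 83.73 | 5.62 | 1.64 | 77.80 | 95.48 |
| Case 2 | 53.44 | 77.43 | 7.88 | 2.73 | 83.18 | 94.49 |
| Case 3 | 1.33 | 74.82 | 35.46 | 3.26 | 36.98 | 92.63 |
| Case 4 | 1.41 | 73.93 | 36.52 | 3.49 | 37.05 | 92.60 |
| Case 5 | 63.61 | 84.31 | 5.49 | 1.56 | 78.27 | 95.76 |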

5. Further analysis challenges

Informative missing data

In Section 4, we addressed the issue of missing data from a design point of view. However, once the trial concludes, we will need to revisit the handling of missing data from an analysis perspective. First of all, the number of missing data cases to consider will increase as the missing data cases listed in Section 4 do not account for missingness in stage 3 due to their lack of impact on adaptive decisions. Secondly, the pattern of missingness needs to be examined. In our simulations, we assumed that missing data occurred at random. This makes sense in our trial because of the nature of the endpoint. However, in other settings there is the possibility of missingness not at random, where missing data may disproportionately occur in a specific treatment arm, signalling a non-random pattern. If this occurs, the risk of biased estimates increases, potentially compromising the trial’s operating characteristics. Therefore, identifying and understanding the missingness pattern is crucial. Finally, we will need to re-evaluate our imputation strategies and techniques, as those used previously may no longer be appropriate for the analysis stage of the trial. Before, in the design stage, we did not perform imputation in stage 1, and in stage 2 we imputed using historical data from the trial. However, after the conclusion of the study, if imputation is required, we can impute missing data in stage 1 also, as we now have more data points available to do so. A Bayesian imputation can be done to impute the missing data points.

Pooling strata versus independent analysis

In stratified designs or master protocols such as “umbrella”, platform or basket designs, which are differentiated into multiple parallel sub-studies,⁴¹ investigators are often faced with the dilemma of pooling data from subgroups, borrowing some information, or simply following a set of parallel analyses. In fact, while conducting independent or stand-alone analyses (such as the primary one in StratosPHere 2) fits perfectly into the tailored paradigm of precision medicine, there are a number of concerns with this approach (see e.g., Berger et al.,⁴² Berry⁴³), including the issue of multiple testing and the lack of sufficient power. This is particularly relevant in rare disease trials, where the sample size is inherently low for detecting significant effects. One of the secondary analyses pre-planned in StratosPHere 2 is, in fact, a final analysis based on a pooled sample from the two mutation strata. Nonetheless, this may conflict with the potential heterogeneity of the treatment effects in the two strata, especially when the primary analysis (per stratum) does not reach a decisive conclusion about the treatment effect(s) due to low power. Dedicated analyses should be conducted to assess when and to what extent a pooled analysis would be suitable and superior compared to independent stand-alone analyses: we report some preliminary exploration in Appendix A.1. Information borrowing principles, as in, for example, Zheng and Wason,⁴⁴ may then be included to enhance the informative content of a stratum. Alternatively, a decision rule could be identified to guide a “Pool vs. Don’t Pool” analysis, for example, using a “Test and Pool” approach as discussed in Li et al.⁴⁵ Such a decision rule may have several benefits in clinical research, starting from enhancing power and/or minimising the risk of incorrect conclusions (under strata and treatment heterogeneity) for the current trial, and ending with informing the design of future phases of the trial. In StratosPHere 2, for example, it may suggest whether to keep a stratified approach in a future phase-2b or phase-3 trial.

Blinded versus veiled analysis

In placebo-controlled trials with multiple treatments that are noticeably different in terms of their drug's physical appearance or mode of administration, to ensure complete blinding (of both patients and physicians), a double-dummy approach must be implemented. That is, if we denote the two drug kits of the active treatments by A and B and the drug kits of their corresponding placebos by PA and PB, then each patient should receive two treatments having one of the following forms: (i) active A treatment arm patients receive drug kits A, PB, (ii) active B treatment arm patients receive drug kits B, PA, and (iii) placebo arm patients receive drug kits PA, PB. In this way, neither patients nor physicians can be informed about the drug kits they are given or, most importantly, about the drug kits they are not receiving (as would occur in a standard placebo-controlled trial). This double-dummy technique is recommended by FDA⁴⁶ for (confirmatory) platform trials but it can increase costs and place additional burdens on patients, potentially reducing compliance. An alternative approach is based on having different placebos matching all the active treatments and administering a single drug kit from the following set: (i) active A treatment arm patients receive drug kit A, (ii) active B treatment arm patients receive drug kit B, (iii) placebo A treatment arm patients receive drug kit PA, and (iv) placebo B treatment arm patients receive drug kit PB. In this way, the patients remain unaware of whether they are receiving an active treatment or a placebo, but they do know which active treatment they have not been given. This partial blinding is termed Veiled in Senn.⁴⁷ The level of blinding ultimately leads to a re-definition of the hypothesis in equation (1): that is, should we compare the active treatment, say A, to the combined placebo control arm (i.e., both PA and PB), or only to the corresponding placebo arm PA? 
In StratosPHere 2, we adopted a veiled approach and the control arm is represented by the combination of the two control arms (i.e., they are not differentiated). If we were to follow a double-dummy approach, we would need to differentiate between $C_{T_{1}}$ and $C_{T_{2}}$ , and comparisons should be made exclusively between $T_{1}$ and $C_{T_{1}}$ or between $T_{2}$ and $C_{T_{2}}$ . This also raises a fundamental question for the trial: how does veiled blinding compare with the double-dummy approach in terms of power, given that in the latter the number of controls will increase? This can be answered by performing the simulations under the two levels of blinding; however, the ultimate choice of blinding may not rely entirely on the statistical output and may also consider costs and other practical issues.

Safety reports

In many trials, including adaptive trials, it is important to account for differential exposure to treatments. Patients will not all take their treatments for the same duration during the trial, and these differences in follow-up times are not captured by the commonly reported incidence proportion of an adverse event. Ignoring time on treatment or follow-up therefore biases the estimation of adverse event risk. One way to address this is to update the incidence proportions so that the duration of treatment until an adverse event is factored in, as suggested by Allignol et al.⁴⁸ Further methods are provided by Unkel et al.⁴⁹ In StratosPHere 2, the drugs are repurposed; therefore, their safety information is already known to an extent. However, further work is needed on reporting safety analyses, especially for adaptive trials, where follow-up times are bound to vary and thereby introduce bias.
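To make the issue concrete, the sketch below contrasts the incidence proportion with a simple exposure-adjusted incidence rate (events per unit of time at risk). The data are invented for illustration, and the rate used here is a simpler stand-in that assumes a constant hazard and no competing events; it is not necessarily the estimator recommended in the cited references.

```python
def incidence_proportion(event_flags):
    """Share of patients with at least one adverse event (ignores time)."""
    return sum(event_flags) / len(event_flags)


def exposure_adjusted_rate(event_flags, time_at_risk):
    """Adverse events per unit of time at risk.

    time_at_risk holds, per patient, the time until the first event, or the
    total follow-up time if no event occurred.
    """
    if len(event_flags) != len(time_at_risk):
        raise ValueError("one time at risk is needed per patient")
    return sum(event_flags) / sum(time_at_risk)


# Hypothetical arms: the same number of events, but arm B was followed
# for half as long (times in years).
events_a, years_a = [1, 0, 1, 0], [0.5, 1.0, 0.5, 1.0]
events_b, years_b = [1, 0, 1, 0], [0.25, 0.5, 0.25, 0.5]

for name, ev, yrs in [("A", events_a, years_a), ("B", events_b, years_b)]:
    print(name, incidence_proportion(ev), round(exposure_adjusted_rate(ev, yrs), 3))
```

Both arms have the same incidence proportion, yet the events in arm B accrued over half the exposure time, which only the exposure-adjusted rate reflects.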

6. Discussion

In this article, we present research directly motivated by our involvement in conducting a stratified BRAR trial for a rare disease. We have explored methodological solutions to practical problems that limit the wider adoption of RAR in clinical trials. These specific challenges have been poorly addressed in the literature and continue to be a barrier to implementation in practice. In particular, we focus on two implementation issues that play a fundamental role in small-sample trials: (1) how can we minimise the chance of undesirable empirical treatment allocations while reflecting the theoretical allocation probabilities dictated by the RAR design and preserving an adequate level of the design’s operating characteristics; and (2) how should we handle the occurrence of missing data in an RAR trial when decisions regarding adaptations need to be made at the interims?

In addressing (1), we have provided a general Mapping procedure that converts the vector of continuous allocation probabilities into a discrete allocation ratio. We have evaluated two instances of the proposed rule, which differ in the number of probability thresholds and hence in the granularity of the final discrete set. We would like to emphasise that the evaluation was driven by the design characteristics of our motivating study, StratosPHere 2. To decide the probability thresholds, we introduced an adaptability parameter recording deviations from the balanced allocation of the treatments. The thresholds were selected in simulation studies by optimising the trade-off of the adaptability parameter under the Null and Alternative hypotheses. Note that the number of probability thresholds and their respective values can be adjusted in light of other practical considerations, for example, statistical summaries of the allocation distribution or other constraints dictated by the unique requirements of the trial at hand. The implemented Mapping rule also resulted in useful improvements in the operating characteristics of the trial, both in terms of frequentist errors and participants allocated to the most promising arms, when compared to the original StratosPHere 2 design.
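The idea of thresholding continuous probabilities into a discrete ratio can be sketched as follows. The stage 3 ratios (C : T1 : T2) are those listed in the category table, but the cut-points in `EXAMPLE_THRESHOLDS`, the function name `map_to_ratio` and the choice to threshold the T1 share of the active-arm probability are illustrative assumptions, not the Mapping functions or thresholds selected for StratosPHere 2.

```python
from bisect import bisect_right

# Stage 3 allocation ratios (C : T1 : T2), ordered from "drop T1" to "keep T1".
STAGE3_RATIOS = [
    (2, 0, 6), (2, 1, 5), (2, 2, 4), (2, 3, 3), (2, 4, 2), (2, 5, 1), (2, 6, 0),
]

# Hypothetical cut-points on the T1 share of the active-arm probability.
# These are placeholders, not the thresholds chosen in the paper.
EXAMPLE_THRESHOLDS = [0.05, 0.20, 0.40, 0.60, 0.80, 0.95]


def map_to_ratio(p_t1, p_t2, thresholds=EXAMPLE_THRESHOLDS, ratios=STAGE3_RATIOS):
    """Map continuous active-arm probabilities to one discrete ratio."""
    if len(ratios) != len(thresholds) + 1:
        raise ValueError("need exactly one more ratio than thresholds")
    if p_t1 < 0 or p_t2 < 0 or p_t1 + p_t2 <= 0:
        raise ValueError("probabilities must be non-negative and not both zero")
    share_t1 = p_t1 / (p_t1 + p_t2)
    # bisect_right counts the thresholds at or below the share,
    # which is the index of the matching ratio.
    return ratios[bisect_right(thresholds, share_t1)]


print(map_to_ratio(0.35, 0.35))  # equal shares fall in the balanced category
print(map_to_ratio(0.01, 0.69))  # a very small T1 share falls in the drop category
```

Keeping the control allocation fixed inside every ratio is what protects the control arm, while the number of thresholds sets how finely the active-arm allocation can adapt.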

Additionally, we examined the impact of missing data on the operating characteristics of the trial and on the newly introduced adaptability parameters. This analysis suggested the potential to adapt probability thresholds based on the extent of missingness, with a primary goal, for example, of minimising extreme adaptations such as the dropping of a treatment arm during stage 3 under the Null hypothesis (i.e., when all treatments are equivalent). However, for the purpose of the ongoing trial, we have opted to keep the probability thresholds constant across the different scenarios of missing data. Further, we introduced the possibility of imputation and considered its impact on the trial design characteristics.

In conclusion, our solutions provide a way to conduct a safe trial (avoiding undesirable allocations and minimising wrong decisions in case of missing data) while serving as a strategy to enhance the frequentist properties of small-sample trials and keeping the essence of an RAR design. The Mapping procedure is also amenable to a straightforward implementation in any type of randomisation system. To illustrate, Sealed Envelope⁵⁰ is a randomisation system commonly used by Clinical Trial Units that randomises patients to the treatment groups by utilising blocked randomisation list(s). In the StratosPHere 2 trial, the unblinded statistician is responsible for generating the final randomisation list(s), which include permuted blocks containing treatments and allocation ratios that match the Mapped design for BRAR. The unblinded statistician then provides the generated randomisation list(s) for use in Sealed Envelope for patient assignments. Our experience with this trial shows that the Mapping procedure can address the challenges of achieving the allocation ratio within small sample sizes while also simplifying the implementation for the randomisation system. Moreover, our findings suggest that relying solely on operating characteristics for handling missing data in an RAR design, especially at the design stage, could be insufficient. Incorporating additional parameters, such as the frequency of adaptations triggered during the trial, could provide deeper insights, ultimately leading to more informed decision-making.
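A minimal sketch of how a permuted-block list matching a mapped ratio could be generated is given below. The function `permuted_block_list`, the arm labels and the seed are hypothetical, and the sketch does not reproduce the list format required by Sealed Envelope or the procedure used by the trial's unblinded statistician.

```python
import random


def permuted_block_list(ratio, n_blocks, arms=("C", "T1", "T2"), seed=None):
    """Build a randomisation list from permuted blocks with a fixed ratio.

    ratio gives the number of slots per arm within each block, for example
    (2, 1, 5) for C : T1 : T2. An arm with zero slots is never allocated.
    """
    if len(ratio) != len(arms):
        raise ValueError("ratio needs one entry per arm")
    rng = random.Random(seed)
    block = [arm for arm, slots in zip(arms, ratio) for _ in range(slots)]
    allocation = []
    for _ in range(n_blocks):
        shuffled = block[:]  # copy so every block starts from the same content
        rng.shuffle(shuffled)
        allocation.extend(shuffled)
    return allocation


# Hypothetical stage with three blocks at the mapped ratio 2 : 1 : 5.
stage_list = permuted_block_list((2, 1, 5), n_blocks=3, seed=2025)
print(stage_list)
```

Because every block contains exactly the mapped number of slots per arm, the empirical allocation matches the intended ratio at the end of each complete block, which is the property that matters in small samples.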

Finally, we highlighted four statistical problems for post-trial (final) analyses. First, more missing data patterns may arise at this stage, requiring an understanding of the missingness pattern before deciding on an imputation method, if one is needed. Second, we raised the question of whether or not to pool data from different strata, which is especially relevant for rare-disease early-phase trials. Third, we considered the impact of the level of blinding on hypothesis testing. As a last point, we emphasised the importance of thinking carefully about how best to report adverse events by factoring in the varying follow-up times in adaptive trials.

Directly motivated by our concrete experience in a stratified RAR trial for a rare disease, our proposals may warrant evaluation in broader contexts. This could include other trial designs or RAR rules. Further extensions could involve larger sample sizes or modifications to the number and size of trial stages. Another area for practical RAR exploration is missing data, particularly non-random missingness. While our analyses primarily focussed on design and practical implementation, this work can be extended to the final analysis stage; crucially, the final analysis stage warrants exploration of additional imputation methods. The overall goal of this article was to describe practical challenges often neglected in technical literature and offer potential solutions for addressing them. We hope this inspires greater synergy between practical and methodological research, which is crucial for translating RAR’s benefits into clinical practice.

Category	Description	Stage 1	Stage 2	Stage 3
Drop	$T_{1}$ can be dropped	Never	Never	2 : 0 : 6
Disfavour	$T_{1}$ can be disfavoured, but not dropped	Never	2 : 1 : 3	2 : 1 : 5
				2 : 2 : 4
Balance	The active arms can be allocated equally	2 : 2 : 2	2 : 2 : 2	2 : 3 : 3
Favour	$T_{1}$ can be favoured, without dropping the other	Never	2 : 3 : 1	2 : 4 : 2
				2 : 5 : 1
Keep	$T_{1}$ can be kept, while dropping the other	Never	Never	2 : 6 : 0

	Stage 2 favour/disfavour %		Stage 3 favour/disfavour %		Stage 3 drop/keep %
Scenario	$H_{0}$	$H_{1}$	$H_{0}$	$H_{1}$	$H_{0}$	$H_{1}$
Case 0	1.43	74.04	35.62	3.22	37.16	92.56
Case 1	63.14	83.73	5.62	1.64	77.80	95.48
Case 2	53.44	77.43	7.88	2.73	83.18	94.49
Case 3	1.32	74.19	9.59	1.84	72.95	95.95
Case 4	1.47	74.47	10.17	2.96	78.40	94.47
Case 5	63.89	83.90	1.08	2.61	90.67	94.50

Footnotes

ORCID iDs

Rajenki Das

Nina Deliu

Sofía S Villar

Funding

This research was supported by the UK Medical Research Council MC_UU_00002/15 (SSV) and Efficient Study Design MC_UU_00040/03 (SSV) and Cambridge NIHR Biomedical Research Centre (MRT).

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: SSV is a member of the advisory board for PhaseV. MRT received support from NIHR Cambridge BRC and MRC. He received consulting fees for advisory roles from Jansen, Apollo Therapeutics and Merck, and travel support from GSK and Jansen. He has been a member of the ComCov and FluCov data safety monitoring/advisory boards. This research is independent of these links.

Appendix A

References

1. May. Rare-disease researchers pioneer a unique approach to clinical trials. Nat Med 2023: 1884–1886.
2. Williamson, Jacko, Villar, et al. A bayesian adaptive design for clinical trials in rare diseases. Comput Stat Data Anal 2017; 113: 136–153.
3. Bothwell, Greene, Podolsky, et al. Assessing the gold standard–lessons from the history of RCTs. N Engl J Med 2016; 374: 2175–2181.
4. Bhatt, Mehta. Adaptive designs for clinical trials. N Engl J Med 2016; 375: 65–74.
5. Pallmann, Bedding, Choodari-Oskooei, et al. Adaptive designs in clinical trials: Why use them, and how to run and report them. BMC Med 2018; 16: 29.
6. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 1933; 25: 285–294.
7. Efron. Forcing a sequential experiment to be balanced. Biometrika 1971; 58: 403–417.
8. Lachin. Statistical properties of randomization in clinical trials. Control Clin Trials 1988; 9: 289–311.
9. Rosenberger, Stallard, Ivanova, et al. Optimal adaptive designs for binary response trials. Biometrics 2001; 57: 909–913.

10. FDA. Food and Drug Administration, U.S. Department of Health and Human Services. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. November 2019 (2019, accessed June, 2020).
11. Antognini, Vagheggini, Zagoraiou, et al. A new design strategy for hypothesis testing under response adaptive randomization. Electron J Stat 2018; 12: 2454–2481.
12. Robertson, Lee, López-Kolkovska, et al. Response-adaptive randomization in clinical trials: From myths to practical considerations. Stat Sci 2023; 38: 185–208.
13. Villar, Bowden, Wason. Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Stat Sci 2015; 30: 199.
14. Bowden, Trippa. Unbiased estimation for response adaptive clinical trials. Stat Methods Med Res 2017; 26: 2376–2388.
15. Deliu, Williams, Villar. Efficient inference without trading-off regret in bandits: An allocation probability test for Thompson sampling, 2021.
16. Deshpande, Mackey, Syrgkanis, et al. Accurate inference for adaptive linear models. In: Proceedings of the 35th International conference on machine learning, Vol. 80, 2018, pp.1194–1203. PMLR.
17. Hadad, Hirshberg, Zhan, et al. Confidence intervals for policy evaluation in adaptive experiments. Proc Natl Acad Sci 2021; 118: e2014602118.
18. Nogas, Song, et al. Algorithms for adaptive experiments that trade-off statistical analysis with reward: Combining uniform random assignment and reward maximization, 2022.
19. Nie, Tian, Taylor, et al. Why adaptively collected data have negative bias and how to correct for it. In: Proceedings of the 21st International conference on artificial intelligence and statistics (AISTATS 2018), Vol. 84, 2017, pp.1261–1269. PMLR.
20. Bartlett, Roloff, Cornell, et al. Extracorporeal circulation in neonatal respiratory failure: A prospective randomized study. Pediatrics 1985; 76: 479–487.

21. Deliu, Das, May, et al. StratosPHere 2: Study protocol for a response-adaptive randomised placebo-controlled phase II trial to evaluate hydroxychloroquine and phenylbutyrate in pulmonary arterial hypertension caused by mutations in BMPR2. Trials 2024; 25: 680.
22. Dunmore, Jones, Toshner, et al. Approaches to treat pulmonary arterial hypertension by targeting BMPR2: From cell membrane to nucleus. Cardiovasc Res 2021; 117: 2309–2325.
23. Jones, De Bie, et al. BMPR-II biomarkers for testing therapeutic efficacy in pulmonary arterial hypertension – novel findings from the StratosPHere 1 study. Under Rev 2024.
24. Trippa, Lee, Wen, et al. Bayesian adaptive randomized trial design for patients with recurrent glioblastoma. J Clin Oncol 2012; 30: 3258.
25. Wason, Trippa. A comparison of bayesian adaptive randomization and multi-stage designs for multi-arm clinical trials. Stat Med 2014; 33: 2206–2221.
26. Thall, Wathen. Practical bayesian adaptive randomisation in clinical trials. Eur J Cancer 2007; 43: 859–866.
27. Peckham, Brabyn, Cook, et al. The use of unequal randomisation in clinical trials–an update. Contemp Clin Trials 2015; 45: 113–122.
28. Dumville, Hahn, Miles, et al. The use of unequal randomisation ratios in clinical trials: A review. Contemp Clin Trials 2006; 27: 1–12.
29. Berger, Bour, Carter, et al. A roadmap to using randomization in clinical trials. BMC Med Res Methodol 2021; 21: 1–24.
30. van der Pas. Merged block randomisation: A novel randomisation procedure for small clinical trials. Clin Trials 2019; 16: 246–252.
31. Sverdlov, Ryeznik. Implementing unequal randomization in clinical trials with heterogeneous treatment costs. Stat Med 2019; 38: 2905–2927.

32. Kuznetsova, Tymofyeyev. Preserving the allocation ratio at every allocation with biased coin randomization and minimization in studies with unequal allocation. Stat Med 2012; 31: 701–723.
33. Soares, Jeff Wu. Some restricted randomization rules in sequential designs. Commun Stat-Theory Methods 1983; 12: 2017–2034.
34. Zhao, Weng. Block urn design–a new randomization algorithm for sequential trials with two or more treatments and balanced or unbalanced allocation. Contemp Clin Trials 2011; 32: 953–961.
35. Berger, Ivanova, Deloria Knoll. Minimizing predictability while retaining balance through the use of less restrictive randomization procedures. Stat Med 2003; 22: 3017–3028.
36. Kuznetsova, Tymofyeyev. Brick tunnel randomization for unequal allocation to two or more treatment groups. Stat Med 2011; 30: 812–824.
37. Tymofyeyev, Rosenberger. Implementing optimal allocation in sequential binary response experiments. J Am Stat Assoc 2007; 102: 224–234.
38. Neubauer, Robertson, et al. Clinical trials using response adaptive randomization. GitHub repository, version 1.0.0, released 2025-01-19. Available at: https://github.com/lukaspinpin/RA-ClinicalTrials, 2025.
39. Das, Deliu, Toshner, et al. StratosPHere 2: Statistical analysis plan for a response-adaptive randomised placebo-controlled phase II trial to evaluate hydroxychloroquine and phenylbutyrate in pulmonary arterial hypertension caused by mutations in BMPR2. Trials 2025; 26(1): 243.
40. Biswas, Rao. Missing responses in adaptive allocation design. Stat Probab Lett 2004; 70: 59–70.
41. Park, Siden, Zoratti, et al. Systematic review of basket trials, umbrella trials, and platform trials: A landscape analysis of master protocols. Trials 2019; 20: 1–10.
42. Berger, Wang, Shen. A bayesian approach to subgroup identification. J Biopharm Stat 2014; 24: 110–129.
43. Berry. Subgroup analyses. Biometrics 1990; 46: 1227–1230.

44. Zheng, Wason. Borrowing of information across patient subgroups in a basket trial based on distributional discrepancy. Biostatistics 2022; 23: 120–135.
45. Liu, Snavely. Revisit of test-then-pool methods and some practical considerations. Pharm Stat 2020; 19: 498–517.
46. FDA. Master protocols for drug and biological product development guidance for industry. Retrieved from https://www.fda.gov/media/174976/download, 2023.
47. Senn. A personal view of some controversies in allocating treatment to patients in clinical trials. Stat Med 1995; 14: 2661–2674.
48. Allignol, Beyersmann, Schmoor. Statistical issues in the analysis of adverse events in time-to-event data. Pharm Stat 2016; 15: 297–305.
49. Unkel, Amiri, Benda, et al. On estimands and the analysis of adverse events in the presence of varying follow-up times within the benefit assessment of therapies. Pharm Stat 2019; 18: 166–183.
50. Sealed Envelope Ltd. Simple randomisation service. [Online] https://www.sealedenvelope.com/simple-randomiser/v1/ (2024, accessed 23 January 2025).

	Frequentist properties		Empirical allocation
Design	Power	Type-I error	Arm $C$	Arm $T_{1}$	Arm $T_{2}$
Fixed equal randomisation	0.748	0.12	0.34 (0.11)	0.33 (0.11)	0.33 (0.10)
Unrestricted	0.748	0.09	0.23 (0.12)	0.23 (0.12)	0.54 (0.17)
Control Protected	0.788	0.09	0.34 (0.07)	0.20 (0.13)	0.46 (0.13)
StratosPHere 2	0.788	0.10	0.32 (0.07)	0.21 (0.10)	0.47 (0.12)
Permuted block	0.783	0.12	0.30 (0)	0.35 (0)	0.35 (0)
Mapped- $α$	0.795	0.11	0.30 (0)	0.20 (0.11)	0.50 (0.11)
Mapped- $β$	0.789	0.11	0.30 (0)	0.20 (0.11)	0.50 (0.11)

Missingness		Frequentist properties		Empirical allocation
Scenario	# Missing data points (per stage)	Power	Type-I error	Arm $C$	Arm $T_{1}$	Arm $T_{2}$
Case 0	(0, 0, 0)	0.80	0.11	0.30 (0.0)	0.20 (0.11)	0.50 (0.11)
Case 1	(1, 0, 0)	0.76	0.11	0.30 (0.2)	0.20 (0.12)	0.50 (0.12)
Case 2	(2, 0, 0)	0.75	0.10	0.30 (0.3)	0.20 (0.13)	0.50 (0.13)
Case 3	(0, 1, 0)	0.76	0.11	0.30 (0.2)	0.20 (0.12)	0.50 (0.12)
Case 4	(0, 2, 0)	0.74	0.11	0.30 (0.3)	0.20 (0.12)	0.50 (0.13)
Case 5	(1, 1, 0)	0.74	0.10	0.30 (0.4)	0.20 (0.13)	0.50 (0.13)

	Stage 2 favour/disfavour %		Stage 3 favour/disfavour %		Stage 3 drop/Keep %
Scenario	$H_{0}$	$H_{1}$	$H_{0}$	$H_{1}$	$H_{0}$	$H_{1}$
Case 0	1.43	74.04	35.62	3.22	37.16	92.56
Case 1	63.14	83.73	5.62	1.64	77.80	95.48
Case 2	53.44	77.43	7.88	2.73	83.18	94.49
Case 3	1.33	74.82	35.46	3.26	36.98	92.63
Case 4	1.41	73.93	36.52	3.49	37.05	92.60
Case 5	63.61	84.31	5.49	1.56	78.27	95.76

Implementing response-adaptive randomisation in stratified rare-disease trials: Design challenges and practical solutions
