Abstract
Early development of tuberculosis (TB) treatments often separates dose-finding in healthy volunteers (Phase I) and safety and early activity in patients (Phase IIa). The BTZ-043 study (NCT04044001) is a recent study that combined these two objectives in patients in a single study. In this work, we describe and compare three different design options which consider safety and activity endpoints differently in the dose escalation process. In simulations we show that the design that incorporates information about activity together with safety in the dose escalation process allows more precise estimation of the optimal dose and leads to a higher power on average in selecting at least one suitable dose at the end of the study compared to the design that considers only the safety endpoint for dose escalation.
Introduction
Tuberculosis (TB) remains the world’s second leading cause of death from a single infectious agent. 1 The development of novel TB drugs has recently progressed and a new consortium for clinical drug development in TB was founded in 2021. 2 The UNITE4TB Consortium aims to address major challenges in the design of clinical trials, from innovative Phase II designs to large-scale Phase III trials.
One of the challenges relates to the design of Phase IIa trials, which aim to investigate the early bactericidal activity (EBA) 3 of an agent. These are the first clinical studies that evaluate whether the drug is active in a small group of patients who are randomised to different doses of the drug. These studies are preceded by healthy volunteer Phase I dose-escalation trials that aim to identify the maximum tolerated dose (MTD) of a drug, which is the dose with a dose-limiting toxicity probability closest to a target toxicity value. Generally, standard early phase trials in the TB setting consist of dose-ranging studies where all patients (approximately 10–15 patients per group) receive a specific dose of the experimental or control compounds. These are normally descriptive studies with no inferential statistics or hypothesis testing.3,4 However, several studies have also reported results of statistical tests used to compare activity levels between arms. More details can be found in a recent systematic review that compared the methodological aspects of these trials. 5
In other clinical trial settings (e.g. oncology trials), early phase trials are performed directly in small groups of patients, due to the high toxicities of the drugs, 6 and they can be broadly divided into three classes of designs: the rule-based, the model-based and the model-assisted designs. 7 Rule-based designs assign the participants to doses following pre-specified rules that are based on the target events actually observed in the data, without assuming any parametric dose–toxicity model. The model-based designs 8 instead aim to model a dose–toxicity relationship and to update it after observing the data in order to determine the dose for the next cohort of patients. The model-assisted7,9 designs do not impose a particular shape on the dose–toxicity relationship, and their decision rules can be derived from a statistical model and pre-tabulated. Model-assisted and model-based designs can improve efficiency and accuracy in finding the maximum tolerated dose compared to the rule-based designs, but they are sometimes not used, as they might be less straightforward to implement than the conventional rule-based designs. Early phase oncology trials can be further improved by making efficient use of the collected data, combining information from the safety assessment and the early activity of the drug when making dose-escalation and de-escalation decisions. The so-called seamless phase I/II trials, which combine safety and activity assessments in one single protocol, have the potential to improve dose-finding, although drug activity may require longer observation windows, meaning that phase I/II designs might prolong trial duration (given the same sample size). 10
In TB, a small number of clinical trials have adopted these novel adaptive designs 11 and only a model-based adaptive design has been implemented for dose finding (ClinicalTrials.gov ID: NCT04044001 12 ) in participants with newly diagnosed pulmonary TB. The aim of this paper is to compare different dose escalation trial designs that use either safety alone or safety and EBA assessment within a single trial to increase the efficiency by trying to define the optimal dose rather than just the maximum tolerated dose. In particular, we will consider the designs which are summarised in Table 1. Our comparison of designs is informed by the recently completed TB trial (NCT04044001, described in Section 2) which evaluated safety and early activity of a novel compound in TB patients. Thus, dose escalation designs that consider both endpoints in a model-based dose escalation framework 13 are explored in this work as no previous work has explored these types of designs in the specific context of a Tuberculosis setting.
Description of the considered dose-escalation designs.
EBA: early bactericidal activity; MED: minimum effective dose.
The rest of the manuscript continues as follows: in Section 2 the motivating clinical trial is introduced before a detailed explanation of the different designs is provided in Section 3. In Section 4, a numerical evaluation in the setting of the motivating example is provided. We conclude with a discussion in Section 5.
The BTZ-043 study (ClinicalTrials.gov ID: NCT04044001 12 ) is a Phase I/IIa trial that evaluates the safety and early activity of the BTZ-043 compound in persons with TB. The study is divided into two parts.
The objective of the first part (Phase I) was to assess the safety and tolerability of BTZ-043, and it was designed using a two-parameter Bayesian logistic regression model-based design (BLRM). 14 Participants were screened and treated for 14 days, undergoing safety assessments, sputum sampling and additional pharmacokinetic evaluations. Patients could be assigned to several doses (250, 500, 750, 1000, 1250, 1500, 1750, 2000 mg) of the drug in cohorts of 3. The choice of dose was based on the data of previous patients, and a parametric dose–toxicity model (using the MoDEsT web app 15 ) was used to make a recommendation to the Trial Steering Committee (TSC) regarding whether to escalate, de-escalate or continue with the same dose for the next cohort of patients. Safety data for the current cohort of patients were reviewed after all patients in the cohort had completed at least 7 days of dosing. The starting dose was 250 mg and the aim of the adaptive dose-escalation algorithm was to identify the dose with a toxicity probability closest to the target of 10%. Thus, the primary endpoint for this phase was a binary endpoint (yes/no) indicating whether a dose-limiting toxicity (DLT) was observed within 7 days. Up to 33 patients were considered in this part of the study. At the end of the study, all doses up to 1750 mg were found to be safe. The highest dose (2000 mg) showed a plateau in the achieved exposures and no obvious increase in bacterial killing and thus no further escalation was recommended.
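As a rough illustration of how a two-parameter logistic dose–toxicity curve leads to a dose recommendation, the sketch below picks the dose whose modelled DLT probability is closest to the 10% target. The parameterisation (log dose scaled by a reference dose of 1000 mg), the function names and the parameter values are assumptions made for this example; the actual design works with the full posterior distribution of the parameters rather than point estimates.

```python
import math

# Dose grid from the Phase I part of the BTZ-043 study (mg).
DOSES = [250, 500, 750, 1000, 1250, 1500, 1750, 2000]
TARGET = 0.10  # target DLT probability stated in the text


def tox_prob(dose, alpha, beta, ref_dose=1000.0):
    """Two-parameter logistic curve: logit(p) = alpha + beta * log(dose / ref_dose).

    ref_dose and this exact parameterisation are assumptions for illustration.
    """
    eta = alpha + beta * math.log(dose / ref_dose)
    return 1.0 / (1.0 + math.exp(-eta))


def closest_to_target(doses, alpha, beta, target=TARGET):
    """Return the dose whose modelled DLT probability is closest to the target."""
    return min(doses, key=lambda d: abs(tox_prob(d, alpha, beta) - target))


# Hypothetical point estimates (not values from the study).
alpha_hat, beta_hat = -3.0, 1.5
recommended = closest_to_target(DOSES, alpha_hat, beta_hat)
```

With these hypothetical values the modelled DLT probability crosses 10% between 1500 and 1750 mg, and 1750 mg is the grid dose closest to the target.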
In the second part (Phase IIa) of the trial, the aim was to explore the early activity of the drug. Patients were randomised to receive one of three pre-specified doses or the standard of care, with a ratio of 3:3:3:2 and a total of 53 patients. When the trial started, the protocol mentioned a ‘low’, ‘medium’, and ‘high’ dose of BTZ-043 for this stage. As the trial progressed, a protocol amendment was made based on early data from Phase I (mainly pharmacokinetic data showing the exposure plateau) to target the 250, 500 and 1000 mg doses, as it was clear that exposure did not increase beyond 1000 mg. Overnight sputa were collected on several days during the treatment period. Time to positivity (TTP) in mycobacterial liquid culture was measured for each collected sputum sample. 16 The EBA of different doses was measured by estimating the average slope of change in time to positivity between sputa collected on day 1 and on day 14. Each patient had a follow-up visit 7 days after the end of treatment.
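The EBA summary described above can be illustrated with a small sketch that fits a least-squares slope to each patient's TTP measurements over the sampling days and averages the slopes within an arm. Working on the log(TTP) scale, the sampling days, the function names and the TTP values are all assumptions for illustration; the study's own analysis model may differ.

```python
import math


def ols_slope(days, values):
    """Ordinary least-squares slope of values regressed on days."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    return sxy / sxx


def eba_for_arm(patients):
    """Average per-patient slope of log(TTP) over the sampling days."""
    slopes = [
        ols_slope(days, [math.log(t) for t in ttp_hours])
        for days, ttp_hours in patients
    ]
    return sum(slopes) / len(slopes)


# Hypothetical data: (sampling days, TTP in hours) for two patients in one arm.
patients = [
    ([1, 3, 7, 14], [120.0, 135.0, 160.0, 210.0]),
    ([1, 3, 7, 14], [110.0, 118.0, 140.0, 175.0]),
]
arm_eba = eba_for_arm(patients)
```

A positive average slope means that TTP lengthens over the treatment period, which is the direction expected when the bacterial load in sputum falls.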
In the BTZ-043 study, the choice of the doses to continue to the second stage was based on safety and some pharmacokinetic considerations, but the EBA data were not formally incorporated into the dose escalation process. To be able to make a more accurate choice of the doses and thus potentially improve the efficiency of the overall trial, we will explore different dose-escalation designs that combine safety and EBA information in the dose escalation process in order to identify an optimal dose – based not only on safety but also on EBA – at the end of the trial. Toxicity and activity will be considered to be non-decreasing functions of dose, and several metrics will be compared among those designs.
Methods
In this section, we provide the details of three different design options that will be compared in the simulation study presented in Section 4.
BTZ-043 original study design – BLRM
Consider the setting of the trial described in Section 2, where
A two-parameter logistic regression (BLRM) 14 is used to model the dose–toxicity relationship. The model is defined as
The prior distribution for the parameters in the model,
The next dose
At the end of the first phase of the trial, three pre-specified doses (e.g.
Assume that a patient’s secondary outcome, an activity endpoint, follows a normal distribution, thus
In the original study design, there was no formal statistical test to compare each active dose to the control dose. However, in this work, in order to estimate the minimum effective dose (MED), that is the lowest dose that shows an improvement in terms of activity compared to the control one, we consider a hierarchical step-down Dunnett test. 20 The rejection procedure is presented in Section 3.5.
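The logic of a hierarchical step-down procedure for estimating the MED can be sketched as follows: doses are tested against control from the highest to the lowest, testing stops at the first non-rejection, and the MED is the lowest dose rejected before stopping. This is a simplified fixed-sequence version; comparing raw p-values with a single level `alpha` stands in for the Dunnett-type critical values, and the value 0.05, the function name and the p-values are assumptions for illustration only.

```python
def step_down_med(doses, p_values, alpha=0.05):
    """Fixed-sequence step-down search for the minimum effective dose.

    Tests doses from highest to lowest and stops at the first non-rejection.
    Returns the lowest dose rejected before stopping, or None if the highest
    dose is not rejected. alpha = 0.05 is an illustrative assumption.
    """
    med = None
    for dose, p_value in sorted(zip(doses, p_values), reverse=True):
        if p_value >= alpha:
            break  # once a dose fails, no lower dose is tested
        med = dose
    return med


# Hypothetical p-values for three Phase II doses versus control.
med_a = step_down_med([250, 500, 1000], [0.20, 0.01, 0.001])
med_b = step_down_med([250, 500, 1000], [0.01, 0.20, 0.001])
```

In the second call the 250 mg dose has a small p-value but is never tested, because the 500 mg dose is not rejected; this ordering is what makes the procedure hierarchical.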
Design with dual-endpoint dose escalation – BDEM
The first design to be compared to the BTZ-043 original study consists of two parts. In the first part, unlike in the original study design, the dose escalation is performed considering a dual endpoint, which combines information about the safety of the drug with the early activity information at a given dose. The evaluation of both safety and EBA in the dose-escalation process might result in a more accurate choice of the optimal dose with the best safety/activity trade-off. However, this design might take longer (up to 7 days more per participant, as 7 days of treatment are required for the safety evaluation and 14 days of sputum collection, followed by 6–8 weeks for cultures to grow, are required for the activity endpoint) in order to evaluate both safety and EBA endpoints for each cohort of patients. The second part of the study is the same as in the original BTZ-043 study, which is described in Section 3.1.
In order to incorporate the information on early activity in the dose escalation process, a dual endpoint is considered here. The toxicity endpoint is modelled using a Bayesian logistic regression model (BLRM) as described in Section 3.1, that is, a two-parameter logistic regression model describes the dose–toxicity relationship. The activity endpoint is modelled using a flexible Gaussian random walk. Thus, the full model comprises a BLRM for toxicity and a continuous response model for activity, and we refer to this as the ‘Bayesian dual-endpoint model’ (BDEM).
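To give some intuition for the Gaussian random walk on the activity side, the sketch below draws one set of mean activity values across the dose grid: the mean at the lowest dose is drawn from a normal distribution, and each subsequent mean equals the previous one plus a normal increment. The starting distribution, the increment standard deviation and the function name are assumptions for illustration and are not the prior settings used in the paper.

```python
import random


def simulate_rw_means(n_doses, start_mean, start_sd, rw_sd, rng):
    """Draw mean activity values across doses from a first-order random walk.

    mean_1 ~ N(start_mean, start_sd^2); mean_j = mean_{j-1} + N(0, rw_sd^2).
    All numerical settings passed in are illustrative assumptions.
    """
    means = [rng.gauss(start_mean, start_sd)]
    for _ in range(n_doses - 1):
        means.append(means[-1] + rng.gauss(0.0, rw_sd))
    return means


rng = random.Random(2024)  # fixed seed so the draw is reproducible
draw = simulate_rw_means(n_doses=7, start_mean=0.0, start_sd=1.0, rw_sd=0.1, rng=rng)
```

Because each increment can be positive or negative, a draw need not be monotone in dose, which is the flexibility this type of model offers. A smaller increment standard deviation ties neighbouring doses more closely together, and a value of zero forces a flat curve.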
To describe the dose–activity relationship, a first order random-walk structure is used21,22 to model the mean activity outcomes. As shown in Yeung et al., 22 this model provides a flexible non-parametric activity model, so that non-monotone dose–activity relationship curves can be captured. Thus,
The rule used in this design to choose the next dose
At the end of the first phase of the trial, three pre-specified doses (e.g.
Design with dual-endpoint dose escalation and different choice of doses for Phase II - BDEM_T
The second alternative design consists of two parts. In the first part, differently from the original study design, the dose escalation is performed considering a dual-endpoint, which combines information about the safety of the drug together with the early activity information at a given dose. This part follows the same procedure as described in Section 3.2 and the idea is to efficiently make use of both safety and EBA information in order to find an optimal dose.
At the end of the Phase I part, a different choice of doses, compared to the original study design, is allowed to continue to Phase II. This modification of the dose selection is explored in order to understand whether a more accurate choice of doses can be made at this point of the trial and thus lead to a better estimate of the optimal dose at the end of the trial. The three doses that are considered to be safe and for which the posterior estimate of the activity mean value is between the two target values at the end of Phase I are selected. This means finding the doses
Seamless Phase I/II design – BDEM_O
The third design consists of a single study where the dose escalation is conducted for all patients from Phase I and Phase II (
Hypothesis testing procedures for Phase II
As mentioned above, in the original study design there was no formal statistical test to compare each active dose to the control dose. However, in this work, in order to estimate the minimum effective dose (MED), which is the lowest dose that shows an improvement in terms of activity compared to the control one, we consider a hierarchical step-down Dunnett test. 20 All data accumulated until the end of the study (that is Phase I and Phase II patient data, with a maximum total of
Firstly, the highest dose (
Additional rejection procedures for the identification of the MED can be implemented in this setting. In this work, we also explore two other procedures. These additional rejection procedures are explored in the simulation study for one specific design only, as the potential differences and conclusions also apply to the other proposed designs. We choose to apply them to the design described in Section 3.3. The so-called ‘Dual Lower Test’ (BDEM_TDL) refers to the same rejection procedure as described above, but with an additional constraint that, at the end of the trial, the posterior estimate of the activity mean value of the dose compared to the control arm is above
The second is called ‘Dual Lower and Upper Test’ (BDEM_TDUL) and it consists of the same rejection procedure as ‘BDEM_T1’ but with an additional constraint that, at the end of the trial, the posterior mean estimate of the activity level of the dose compared to the control arm is above
Numerical evaluation
In this section, we evaluate the operating characteristics of the designs described above under different toxicity and activity scenarios. The numerical results are obtained using R 23 and the package crmPack, 24,25 which makes use of the package rjags 26 to estimate the posterior distributions using MCMC samples. One thousand replicate simulations are performed, and the code to reproduce them is available at: https://github.com/aspapercode/evalph12design.git
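As a guide to the precision that 1000 replicates can offer, the sketch below computes the Monte Carlo standard error of an estimated probability, assuming the replicates are independent. This is a standard binomial calculation added for orientation and is not part of the authors' code; the function name is hypothetical.

```python
import math


def mc_standard_error(p_hat, n_sims=1000):
    """Monte Carlo standard error of a proportion estimated from n_sims independent replicates."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_sims)


# The standard error is largest when the estimated probability is 0.5.
worst_case = mc_standard_error(0.5)
```

Under this assumption, any probability estimated from 1000 replicates has a standard error of at most about 1.6 percentage points, so differences between designs of only a few percentage points should be read with some caution.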
Setting
We consider the following range of doses – 250, 500, 750, 1000, 1250, 1500, 1750 mg – for the escalation process. The highest dose of 2000 mg is not considered as in the BTZ-043 study a further escalation to the maximum dose was not recommended at the end of the process due to a plateau in the achieved exposures and no obvious increase in bacterial killing.
Patients are recruited in cohorts of 3 and up to
The prior parameters
For the toxicity scenarios, we consider three different cases. Firstly, the same setting as in the original BTZ-043 study, where all doses up to 2000 mg were expected to be safe, is considered. Secondly, the case where the MTD coincides with 1000 mg is considered and thirdly, the case where 1000 mg is safe but the MTD is 1500 mg. These three toxicity scenarios for Phase I are summarised on the left panel of Figure 1 and are referred to as:
‘As Observed’: this corresponds to the scenario where all doses are considered safe, that is, the probability of toxicity is below the target value for all doses. Thus, the maximum tolerated dose in this setting is 2000 mg.
‘More Toxic’: in this scenario the maximum tolerated dose is 1500 mg.
‘Very Toxic’: in this scenario the maximum tolerated dose is 1000 mg.

(Left panel) Probabilities of toxicity for the considered range of doses. (Right panel) Different activity scenarios for the considered range of doses.
To generate the activity scenarios, we assume that the increase in slope in log(TTP) at day 14 follows a linear relationship with the dose level. The activity–dose relationship is informed by the BTZ-043 study, and the linear model is estimated using the summary data available in the Supplementary Material of the BTZ-043 study. 12 The considered linear model is as follows:
Several activity scenarios are considered and summarised on the right panel of Figure 1. These are referred to as:
‘Flat’: the increase in slope in log(TTP) at day 14 is equal to
‘VeryLow’, ‘Low’, ‘Moderate’, ‘High’: the increase in slope in log(TTP) at day 14 is equal to
‘Null’: the increase in slope is equal to zero for all active doses and the control dose.
For Phase II, the activity data are generated with standard deviation equal to
Probability of finding any dose in each activity scenario and each design for the ‘As Observed’ toxicity scenario. The probabilities of finding the safe dose with activity level equal to
In the next section, we summarise how the prior parameters are chosen for the dose-activity model.
The prior parameters used for the dual-endpoint escalation model can be selected from a range of plausible initial guesses. These are the variance of the random walk
Several values for these parameters were tested, in particular
The results of the calibration procedure are summarised in Table A.3. The maximum value for the geometric mean of proportions of correct selections is found to be 0.47 and thus, the following parameters were chosen in this work:
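The calibration criterion, the geometric mean of the proportions of correct selections across scenarios, can be sketched as below. The function name and the example proportions are hypothetical and are not taken from the calibration results.

```python
import math


def geometric_mean(proportions):
    """Geometric mean of a list of proportions; zero if any proportion is zero."""
    if any(p == 0 for p in proportions):
        return 0.0
    return math.exp(sum(math.log(p) for p in proportions) / len(proportions))


# Hypothetical proportions of correct selection in three scenarios.
balanced = geometric_mean([0.5, 0.5, 0.5])
one_failure = geometric_mean([0.9, 0.8, 0.0])
```

Unlike an arithmetic mean, this criterion drops to zero as soon as one scenario has no correct selections, so it favours prior settings that perform acceptably in every scenario over settings that do very well in some scenarios and fail in others.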
Metrics and operating characteristics
The following metrics are compared among the different designs:
the estimated maximum tolerated dose (MTD) at the end of Phase I or end of the study for the seamless Phase I/II design;
the estimated optimal dose – that is the minimum dose that satisfies the safety and activity constraints, thus the minimum dose
the estimated minimum effective dose (MED) at the end of Phase II or end of the study;
the probability of finding at least one dose, all doses (with the condition that the MED is above the lower activity level
the average power, expressed as the average probability across all non-null activity scenarios of finding at least one dose, all doses or any dose at the end of Phase II.
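The average power metric can be illustrated with a short sketch that averages a per-scenario probability over the non-null activity scenarios only. The scenario names follow those used in this paper, but the probabilities and the function name are hypothetical and are not simulation results.

```python
def average_power(prob_by_scenario, null_name="Null"):
    """Average a per-scenario probability over all scenarios except the null one."""
    non_null = [p for name, p in prob_by_scenario.items() if name != null_name]
    return sum(non_null) / len(non_null)


# Hypothetical probabilities of finding at least one dose (not simulation results).
probs = {
    "Flat": 0.60,
    "VeryLow": 0.40,
    "Low": 0.70,
    "Moderate": 0.80,
    "High": 0.90,
    "Null": 0.05,
}
avg = average_power(probs)
```

The ‘Null’ scenario is left out because a selection under that scenario is a false positive rather than a correct finding, so it belongs with the type I error rather than with power.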
Numerical results
In this section, we provide the results of the simulations for each activity scenario under the toxicity scenario where all doses are safe, which reflects the setting of the original BTZ-043 study. Similar results and patterns are observed under the other two toxicity scenarios (‘More Toxic’ and ‘Very Toxic’) – see results in Tables A.1, A.2 and Figures A.1, A.2 in the Appendix.
Figure 2 reports the probability of finding any dose at the end of Phase II for each activity scenario and each method given the toxicity scenario where all doses are considered safe. It also provides the average power across scenarios for each design. Table 2 provides a detailed summary of all other operating characteristics of the approaches as described in Section 4.2.
Probability of finding at least one dose (AtLeastOne), probability of finding all doses (All), probability of finding the dose with biomarker level at
(TargetBmk), minimum effective dose (MED), maximum tolerated dose (MTD), optimal dose at the end of Phase I in terms of activity and safety (OptimalDose) and average power across non-null activity scenarios for each design and activity scenario under the ‘As Observed’ toxicity scenario. For the first design, the true MTD and MED are also reported for each activity scenario.
It can be observed that the BLRM design, on average and for each activity scenario, provides a 38% to 60% chance of finding the correct dose at the end of Phase II, except for the ‘VeryLow’ activity scenario, where, by design, there is no chance of correctly finding the dose of 1500 mg. In contrast, for the BDEM design, the chance of finding the correct dose at the end of Phase II varies from 86% to 99% across activity scenarios, except for the ‘VeryLow’ activity scenario where, as before, there is no chance of correctly finding the dose of 1500 mg. For the BDEM_T1 design, the chance of finding the correct dose at the end of Phase II varies from 18% to 77% across activity scenarios. Here, the probability of finding the dose of 1500 mg in the ‘VeryLow’ activity scenario is 18%. Finally, for the seamless design (BDEM_O) the probability of finding the correct dose ranges from around 32% to 79%.
On average across all activity scenarios, the design that incorporates the information about activity in the dose escalation process (BDEM) provides a gain of around 13%, 21% and 33% in the probability of finding at least one dose, all doses and the correct dose, respectively, compared to the design that only uses safety data in the escalation process (BLRM). The design that incorporates the information about activity in the dose escalation process, but allows the three doses that are closer to the activity level to go to Phase II (BDEM_T1), provides a gain of around 9% and 42% in the probability of finding at least one dose and all doses, respectively, compared to the BDEM design. However, the probability of finding the correct dose is decreased by 16% compared to the BDEM design. The designs that consider more constraints in the rejection procedure (BDEM_T with the Dual Lower or the Dual Lower and Upper test) have similar operating characteristics to BDEM_T with no constraints, but on average slightly lower power across the activity scenarios. The seamless design is the one that leads to the highest probability of finding at least one dose (a 30% increase compared to the BLRM design).
In terms of estimation of the MTD, the BLRM underestimates it for all activity scenarios (the MTD is estimated to be around 1500 mg instead of 2000 mg). All the other designs, which consider a dual endpoint in the escalation process, provide the same estimates of the MTD for each activity scenario. This estimate is always below the true MTD of 2000 mg and is lower than the estimated MTD in the BLRM design, because additional constraints need to be taken into account in the escalation process (EBA information is included in the choice of the dose for the next cohort of patients). The estimation of the optimal dose, that is the minimum dose that is estimated to be safe and active at the end of Phase I, is quite accurate for each dual-endpoint escalation procedure. In terms of the estimate of the minimum effective dose (MED), the BLRM and BDEM provide a less accurate estimate than the BDEM_T design. The seamless design is the one that provides the most accurate estimate of the optimal dose in each activity scenario, although it underestimates the MED.
Overall, the designs that incorporate information about the safety and the early activity of a compound provide higher chances (up to 33%) of finding the correct dose at the end of the escalation process and they do provide an estimate of the optimal dose (that is the one that satisfies safety and activity constraints).
The aim of this work was to compare different dose escalation designs in the setting of a current TB trial in order to potentially increase the efficiency of the overall study by trying to define the optimal dose rather than just the maximum tolerated dose.
It has been shown that the designs that incorporate the information about the EBA outcome in the dose escalation process allow us to obtain an accurate estimate of the optimal dose and, on average, higher chances of selecting at least one suitable dose at the end of the study compared to the design that considers only the safety endpoint for the dose escalation. Overall, it has been found that the seamless Phase I/II trial is the one that leads to the most accurate estimation of the optimal dose, although it underestimates the minimum effective dose. The design that incorporates information about the EBA in the dose escalation process and allows the selection of the three doses that are closer to the activity target level (BDEM_T) shows probabilities of selecting the correct doses similar to those of the seamless design.
In this work, we have investigated and compared different designs in the setting of a specific TB study. However, further evaluations might be necessary in other TB trial settings. The exploration of these novel methodologies in this disease area is encouraged, as these novel designs might support better decision-making on the optimal doses to be tested in later phases of the development of a novel regimen. These approaches, which combine safety and early activity information, can make the drug development process more efficient, as they allow an optimal dose to be found and reduce the exploration of doses that might be safe but provide less activity. They make efficient use of all available data for decision-making and provide more knowledge and information on the range of doses that are closer to the target. On the other hand, these designs might require a significant amount of additional time in order to observe both endpoints, as incorporating activity into the decision process adds at least 6 to 8 weeks before each decision (e.g. for the specific trial setting explored here, these dual-endpoint approaches might extend the duration of the whole trial by roughly 2 additional years compared to the original study). Thus, trial-specific considerations might be necessary in order to fully evaluate the potential benefits these designs can provide.
Footnotes
Authors’ contributions
All authors have directly participated in the planning and execution of the presented work.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No 101007873. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA, Deutsches Zentrum für Infektionsforschung e. V. (DZIF), and Ludwig-Maximilians-Universität München (LMU). EFPIA/AP contribute to 50% of funding, whereas the contribution of DZIF and the LMU University Hospital Munich has been granted by the German Federal Ministry of Education and Research. P Mozgunov’s research is supported by the National Institute for Health and Care Research (NIHR Advanced Fellowship, Dr Pavel Mozgunov, NIHR300576). The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health and Care Research or the Department of Health and Social Care (DHSC). Additional funding was received from the UK Medical Research Council (MC_UU_00002/14, MC_UU_00040/03). For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
Appendix
Geometric mean of correct selections across scenarios at the end of Phase II using the design described in Section 3.3 for different values of
| Geometric mean | | |
|---|---|---|
| 0 | 0.001 | 0.001 |
| 0 | 0.001 | 0.01 |
| 0 | 0.001 | 0.1 |
| 0 | 0.001 | 1 |
| 0 | 0.001 | |
| 0 | 0.001 | |
| 0 | 0.001 | |
| 0 | 0.001 | 2 |
| 0 | 0.01 | 0.001 |
| 0 | 0.01 | 0.01 |
| 0 | 0.01 | 0.1 |
| 0 | 0.01 | 1 |
| 0 | 0.01 | |
| 0 | 0.01 | |
| 0 | 0.01 | |
| 0 | 0.01 | 2 |
| 0.27 | 0.1 | 0.001 |
| 0.26 | 0.1 | 0.01 |
| 0.25 | 0.1 | 0.1 |
| 0.26 | 0.1 | 1 |
| 0.00 | 0.1 | |
| 0.00 | 0.1 | |
| 0.00 | 0.1 | |
| 0.25 | 0.1 | 2 |
| 0.28 | 1 | 0.001 |
| 0.19 | 1 | 0.01 |
| 0.20 | 1 | 0.1 |
| 0.16 | 1 | 1 |
| 0.26 | 1 | |
| 0.11 | 1 | |
| 0.07 | 1 | |
| 0.16 | 1 | 2 |
| 0.30 | | 0.001 |
| 0.00 | | 0.01 |
| 0.00 | | 0.1 |
| 0.00 | | 1 |
| 0.23 | | |
| 0.17 | | |
| 0.33 | | |
| 0.00 | | 2 |
| 0.47 | | 0.001 |
| 0.45 | | 0.01 |
| 0.45 | | 0.1 |
| 0.45 | | 1 |
| 0.44 | | |
| 0.40 | | |
| 0.41 | | |
| 0.46 | | 2 |
| 0.36 | | 0.001 |
| 0.37 | | 0.01 |
| 0.37 | | 0.1 |
| 0.37 | | 1 |
| 0.35 | | |
| 0.33 | | |
| 0.32 | | |
| 0.36 | | 2 |
| 0.27 | 2 | 0.001 |
| 0.19 | 2 | 0.01 |
| 0.17 | 2 | 0.1 |
| 0.12 | 2 | 1 |
| 0.25 | 2 | |
| 0.12 | 2 | |
| 0.08 | 2 | |
| 0.19 | 2 | 2 |
The max value is highlighted in
