Abstract
Pilot studies and other small clinical trials are often conducted but serve a variety of purposes and there is little consensus on their design. One paradigm that has been suggested for the design of such studies is Bayesian decision theory. In this article, we review the literature with the aim of summarizing current methodological developments in this area. We find that decision-theoretic methods have been applied to the design of small clinical trials in a number of areas. We divide our discussion of published methods into those for trials conducted in a single stage, those for multi-stage trials in which decisions are made through the course of the trial at a number of interim analyses, and those that attempt to design a series of clinical trials or a drug development programme. In all three cases, a number of methods have been proposed, depending on the decision maker’s perspective being considered and the details of utility functions that are used to construct the optimal design.
1 Introduction
There is little consensus on the way in which pilot studies and other small exploratory clinical trials should be designed, with a relatively wide range of approaches proposed.1–4 In part, this reflects the range of objectives for such studies. Pilot studies are usually designed to explore and evaluate the efficacy and safety of a new/experimental treatment or new combination/regime of treatments, perhaps to provide some evidence of response in order to justify the financial input required for larger-scale studies, though they may also address specific additional or alternative research questions. Generally, the sample size required in pilot studies is quite small, as they commonly precede a larger definitive clinical trial. The terminology of such small exploratory clinical studies reflects the range of objectives, with small trials that are conducted prior to a major definitive study referred to as ‘pilot studies’, ‘feasibility studies’ or – if the following study is a ‘phase III’ trial – ‘phase II clinical trials’. While a phase II clinical trial can in some cases be relatively large, its key objective is usually to decide whether, and how, to conduct the subsequent phase III trial(s). The terminology sometimes depends on differences, often fairly subtle, in the aim of the study, but also on the setting in which the study is conducted; for example, different names are used for trials with essentially the same purpose conducted by pharmaceutical companies and in the public sector.
The variety in design approaches in pilot studies and other small early phase trials is in contrast to the setting of confirmatory or phase III clinical trials where almost all trials are designed using a frequentist paradigm so as to control probabilities of a type I error, corresponding in this setting to claiming an experimental treatment is better than the control treatment when it is not, and of a type II error, that is failing to claim that the experimental treatment is better when it is by some specified magnitude. The type I error rate is usually, although only by convention, set at a (two-sided) level of 0.05 and the type II error rate is conventionally set at 0.10 or 0.20, corresponding to a power of 0.9 or 0.8, respectively.5,6
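To make the conventional error rates concrete, the following is a minimal sketch of the usual normal-approximation sample size calculation for a two-arm trial with a normally distributed endpoint and known standard deviation. The function name and its arguments (`delta` for the targeted difference in means, `sigma` for the common standard deviation) are assumptions introduced here for illustration only.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(delta, sigma, alpha=0.05, power=0.9):
    """Per-arm sample size for a two-arm trial with a normal endpoint.

    Uses the normal-approximation formula for a two-sided test at level
    alpha with the stated power to detect a difference in means of delta:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


if __name__ == "__main__":
    # Standardized effect size of 0.5 at the two conventional power levels.
    print(per_arm_sample_size(0.5, 1.0, power=0.9))
    print(per_arm_sample_size(0.5, 1.0, power=0.8))
```

Under these assumptions, a standardized effect size of 0.5 requires 85 patients per arm for a power of 0.9 and 63 per arm for a power of 0.8, which illustrates why the conventional error rates can be difficult to meet in a small trial.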
The frequentist approach with conventionally used error rates for a typical effect size may lead to the relatively large sample sizes associated with confirmatory studies, so may not be appropriate when the sample size is smaller. This is the case in pilot studies and early phase trials. Small trials may also be unavoidable, for example, in the setting of trials in small population groups such as patients suffering from a rare disease, patient groups where patient recruitment is difficult such as children and other vulnerable populations or in a specifically targeted subpopulation. In these settings, then, either the frequentist approach can be applied with error rates relaxed, or some other method must be used to design the trial.7–10
A number of novel approaches have been proposed for pilot studies and other small clinical trials. As the sample sizes are small, often with recruitment occurring relatively slowly, sometimes in a single centre, multi-stage designs with decisions made at one or more interim analyses are often an attractive option, as are multi-arm trials, widening the range of design choices further. Alternatives to the frequentist paradigm that have been proposed for the design of such small clinical trials include the Bayesian approach and the decision-theoretic approach in which, as described in the next section, the consequences of decisions are explicitly modelled. The latter might seem particularly appropriate in pilot studies and early phase clinical trials as the outcome from such trials is often a relatively simple decision that is within the control of the clinical team conducting the trial, such as the decision whether or not to conduct further clinical research in the area, sometimes called ‘Go/No-Go’ decisions.
The aim of this article was to review methods for the design of small trials and pilot studies where the primary aim is to explore and evaluate treatment efficacy based on the Bayesian decision-theoretic framework. In order to provide as comprehensive as possible a review of the current literature in this area, we used systematic reviewing methodology to identify relevant published work in the area. In addition, human pharmacology studies that aim to assess toxicity, to explore drug metabolism and drug interactions, or to describe the pharmacokinetics and pharmacodynamics are usually designed as small studies. These studies are sometimes known as ‘phase I’ trials. Although the sizes of these studies tend to be small, their objectives are not to explore and evaluate the efficacy of a new/experimental or new combination/regime of treatments. As such, these studies are excluded from the review of this paper.
Pilot or feasibility studies may also be used to test practical aspects of a later study, such as drug supply, acceptability of randomization or visit schedules. Such studies, which may closely resemble a larger study conducted in miniature, are, however, outside the scope of this paper.
Following this introduction, the next section of the article gives a very brief overview of the decision-theoretic approach as it may be applied to clinical trial design. The third, and most substantial, section of the article then describes the review method used and gives details of publications found. Papers are classified into a number of groups according to the specific type of design being considered, the types of utility function proposed and the perspective of the decision-maker (commercial, regulatory/societal or patient). The aim is to allow the reader to rapidly identify key references in each particular area. The paper ends with a brief discussion of the scope of the review, suitability of the decision-theoretic paradigm for pilot study design and suggestion of areas where further research work might be most appropriate.
2 The decision-theoretic approach to clinical trial design
Decision theory is a statistical technique by which the problem of decision-making under uncertainty may be formalized. The method enables an optimal decision to be made between a number of possible actions on the basis of the consequences of each action under all possible scenarios.11–13
In the setting of a clinical trial, we may wish to decide between a number of possible design options. Prior to the start of the trial, this might entail a choice of the clinical trial sample size. During the trial at an interim analysis, a decision might be taken as to whether or not to terminate the trial or to modify the trial conduct in some way. At the end of the trial, this might correspond to a decision of whether or not to proceed with further trials, or in a multi-arm study, to choose an experimental treatment for further evaluation.
Denote by
= {
Uncertainty regarding the parameter
The utility functions should express the values of the consequences of possible actions from the perspective of the decision-maker. These could be monetary loss or reward, which is measurable on an existing scale, or could also express consequences that have no immediately obvious numerical scale of measurement, such as treatment success (patient experienced a positive response) or treatment satisfaction. In the latter cases, it can be very difficult to assign the numerical values required to specify the utility function to the qualitative values, as considered in the discussion section below.
Similarly, the prior distribution should reflect prior belief regarding the parameters of the distribution of the responses. It may be based on data from previous similar trials or on elicitation of expert opinion, or it may be chosen as a conjugate prior for computational convenience.
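For orientation, the standard Bayesian decision-theoretic formulation can be sketched as follows. The symbols used here (the action set $\mathcal{A}$, parameter $\theta$, data $x$, utility $U$ and prior density $\pi$) are notation assumed for illustration and need not match the notation used elsewhere in this article.

```latex
% Set of possible actions and unknown parameter of the response distribution
\mathcal{A} = \{a_1, \ldots, a_k\}, \qquad \theta \in \Theta .

% Prior expected utility of action a, for utility U(a, \theta) and prior \pi(\theta)
E\{U(a)\} = \int_{\Theta} U(a, \theta)\, \pi(\theta)\, d\theta .

% Once data x are observed, the prior is replaced by the posterior \pi(\theta \mid x),
% and the optimal action maximizes the posterior expected utility
E\{U(a) \mid x\} = \int_{\Theta} U(a, \theta)\, \pi(\theta \mid x)\, d\theta ,
\qquad
a^{*}(x) = \arg\max_{a \in \mathcal{A}} E\{U(a) \mid x\} .
```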
Once utility functions have been specified for all possible actions, the optimal trial design can often be obtained by working in a reverse time order using a method known as ‘dynamic programming’ or ‘backward induction’. For example, suppose we wish to obtain the optimal sample size for a single stage trial with a number of possible actions available at the end of the trial. First, all possible actions at the end of the trial are considered and, for given possible observed data
Although relatively easily described, the application of decision theory in a clinical trial setting can present a number of challenges: in the specification of appropriate prior distributions and, perhaps more especially, in the specification of utility functions. For decisions made during or prior to the start of a trial, the utility functions must also account for the impact of data as yet unobserved. The required computation can also be challenging, particularly in the case of multi-stage trials.15,16
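The backward induction argument for a single-stage trial can be illustrated with a minimal sketch. It assumes a single-arm trial with a binary response, a Beta(a, b) prior for the response probability p, two terminal actions ('go' and 'no-go'), a utility of gain × (p − p0) for 'go' and 0 for 'no-go', and a fixed sampling cost per patient. These choices, and all function and parameter names, are assumptions made for illustration and are not taken from any of the reviewed papers.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(x, n, a, b):
    """Prior predictive probability of x responses among n patients."""
    return comb(n, x) * exp(log_beta(a + x, b + n - x) - log_beta(a, b))


def expected_utility(n, a, b, p0, gain, cost):
    """Expected utility of a single-stage trial with n patients.

    Step 1 (end of trial): for each possible outcome x, take the better of
    'go', with posterior expected utility gain * (E[p | x] - p0), and
    'no-go', with utility 0.
    Step 2 (before the trial): average over the prior predictive
    distribution of x and subtract the sampling cost.
    """
    total = 0.0
    for x in range(n + 1):
        posterior_mean = (a + x) / (a + b + n)
        best_terminal = max(0.0, gain * (posterior_mean - p0))
        total += beta_binomial_pmf(x, n, a, b) * best_terminal
    return total - cost * n


def optimal_sample_size(n_max, a, b, p0, gain, cost):
    """Sample size in 0..n_max with the largest expected utility."""
    return max(range(n_max + 1),
               key=lambda n: expected_utility(n, a, b, p0, gain, cost))


if __name__ == "__main__":
    print(optimal_sample_size(50, a=1, b=1, p0=0.5, gain=12, cost=0.1))
```

In this sketch the optimal sample size balances the value of the information gained against the cost of sampling; with a sufficiently high cost per patient the optimum is to recruit no patients at all.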
3 Review of decision-theoretic approaches to pilot studies and small clinical trials
3.1 Literature review search strategy and results
[Table 1: Articles by types of design, utility and perspective of decision-makers. Note: the total number of articles in the cells exceeds 67, as some described more than one design or perspective.]

[Figure: Flow diagram of articles identified, excluded and included for review.]
Twenty-seven articles18–44 specifically described methods in pilot or phase II settings. The others14,45–83 did not describe methods specific to small clinical trials or pilot studies, but would nevertheless be appropriate in this setting.
A study of the articles identified indicated that the design problems considered fell into three broad categories, with different approaches used for each. The simplest type of design considered was that for clinical trials conducted in a single stage so that decisions considered are those taken at the end of the trial and those taken regarding trial design prior to the start of the trial essentially using the method outlined above. The second type of design considered was that for multi-stage trials, so that decisions taken through the course of the trial, at a number of interim analyses, are also considered. The third type of design considered was that concerning multi-arm trials or the simultaneous optimal design of a series of clinical trials. Within each of these three design types, there was a variety of approaches corresponding to the viewpoint of the decision-maker and the complexity of the utility functions considered, ranging from approaches in which a relatively simple utility function aims to reflect the number of patients successfully treated, to more complex utility functions based on a detailed elicitation of the costs and consequences of a range of possible outcomes from the perspective of a particular decision-maker. Table 1 shows the identified articles classified by types of design, utility function and perspective of the decision-maker. The literature in each of the areas identified is described in the following subsections. One paper 42 fell outside of the three categories just described, being concerned with the design of enrichment studies. This paper is also discussed further.
3.2 Single-stage designs
The simplest type of study design considered is the single-stage design in which no analysis of the data is conducted until the study is completed. For a trial comparing an experimental treatment with a control (a two-arm trial) or a historical control (a single-arm trial), the statistical design choices are relatively limited; the main one being the choice of sample size. Authors whose works considered this type of design include Brunier and Whitehead, 19 Chen and Beckman, 20 Chen et al., 21 Claxton and Posnett, 52 Claxton and Thompson, 53 Eckermann and Willan, 54 Gittins and Pezeshk,55–57 Halpern et al., 58 Hornberger and Eghtesady, 60 Kikuchi and Gittins,63,64 Kikuchi et al., 65 Lindley, 14 Maroufy et al., 68 Patel and Ankolekar, 71 Pezeshk and Gittins, 73 Pezeshk et al.,74,75 Staquet and Sylvester, 38 Sylvester, 39 Sylvester and Staquet, 40 Willan, 79 Willan and Eckermann80,81 and Willan and Pinto. 83
At the end of such a trial, a decision is made from a possible set of actions which typically consists of whether or not to accept the experimental treatment for further study. The utility function may be written simply as a function of the cost of sampling, which is independent of
Designs that considered simple utility functions did not generally specify the perspective of the decision-maker,14,38–40 whereas more realistic utility functions are often explicitly based on a commercial,21,55–58,65,68,71,73,75,79–81 regulatory54,56,57,73,74,81 or societal perspective.52,53,55,60,63,64,83 Exceptions are the work of Chen and Beckman 20 where the utility function is a simple function based on a commercial perspective that controls for type I and II error rates under the constraint of limited sample size, and Brunier and Whitehead 19 where the utility function is a more realistic one that incorporates costs that may be incurred during the trial but there was no specification of the perspective of the decision-maker.
Possible actions for designs based on a societal perspective include whether to delay the decision-making and start a new trial, to adopt the experimental treatment (granting licence or reimbursement of costs of the treatment in general clinical use) and start a new trial to gather more information or to adopt the experimental treatment without starting a new trial (i.e. requiring no further information).54,74,81
3.3 Multi-stage designs
A more complex decision-making process arises in trials that are conducted in a number of stages with interim analyses conducted at the end of each. The majority of clinical trial design methods based on decision theory have considered this setting, usually with the possible actions at the end of each stage taken to be those corresponding to stopping the trial for futility, stopping with a positive result or continuing to further stage(s). Specific articles include Banerjee and Tsiatis, 18 Berry and Ho, 45 Chen and Willan, 46 Chen and Smith, 22 Cheng and Berry, 47 Cheng and Shen,48,49 Cheng et al., 50 Chernoff and Petkau, 51 Ding et al., 23 Heitjan et al., 59 Jennison and Turnbull, 61 Jiang et al., 62 Jung et al., 25 Lewis and Berry, 66 Lewis et al., 67 Mehta and Patel, 69 Nixon et al., 27 Orawo and Christen, 70 Palmer, 29 Rossell et al., 30 Stallard,31,32 Stallard et al., 37 Wang, 76 Wathen and Christen, 77 Wathen and Thall, 78 Willan and Kowgier, 82 Zhao et al. 43 and Zhao and Woodworth. 44
A multi-stage trial may be designed based on consideration of a fixed and known maximum number of patients that can be included in the trial. In rare disease settings, it may be relatively easy to estimate the number of cases eligible for clinical trials; alternatively, the maximum may be tied to the budget allocated. In such scenarios, at the final stage when all patients have been recruited, there are two terminal actions to choose between: stop and accept the treatment for further study, or stop and reject the treatment from further study.
Some designs are aimed to optimize the patient allocation for a fixed and known number of future patients, known as the ‘patient horizon’,
The backward induction computation becomes very intensive as the number of stages increases. Orawo and Christen, 70 and Wathen and Christen 77 have considered using an approximation method rather than computing the exact value. For designs that do not assume a fixed known patient horizon, the optimal sequential design can sometimes be computed by forward simulation and constrained backward induction.23,30,43,78 Heitjan et al. consider a two-stage design, 59 reducing the computational burden considerably, and use direct numerical optimization rather than the backward induction approach to obtain optimal designs in this case, while Jung et al. 25 adopt a similar approach for two-stage single-arm cancer trials, comparing their approach with the commonly used design due to Simon. 84
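The growth in computation can be seen from a minimal sketch of exact backward induction for a multi-stage single-arm trial. It assumes a binary response, a Beta(a, b) prior, equal group sizes, a fixed maximum number of stages, a utility of gain × (p − p0) for stopping and accepting the treatment, 0 for stopping and rejecting it, and a fixed cost per patient. These modelling choices and all names are assumptions made for illustration rather than the method of any particular reviewed paper.

```python
from functools import lru_cache
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def make_design(a, b, p0, gain, cost, group_size, n_stages):
    """Return value(stage, successes) -> (expected future utility, action)."""

    def predictive(x, m, a_post, b_post):
        # Beta-binomial probability of x responses in the next group of m.
        return comb(m, x) * exp(
            log_beta(a_post + x, b_post + m - x) - log_beta(a_post, b_post)
        )

    @lru_cache(maxsize=None)
    def value(stage, successes):
        n = stage * group_size
        a_post, b_post = a + successes, b + n - successes
        # Terminal actions: accept (go) or reject (no-go).
        go = gain * (a_post / (a_post + b_post) - p0)
        stop = (go, "accept") if go > 0 else (0.0, "reject")
        if stage == n_stages:
            return stop  # maximum sample size reached: only terminal actions
        # Continuing costs one more group, then acts optimally at the next stage.
        cont = -cost * group_size + sum(
            predictive(x, group_size, a_post, b_post)
            * value(stage + 1, successes + x)[0]
            for x in range(group_size + 1)
        )
        return (cont, "continue") if cont > stop[0] else stop

    return value


if __name__ == "__main__":
    design = make_design(a=1, b=1, p0=0.5, gain=12, cost=0.1,
                         group_size=5, n_stages=4)
    print(design(0, 0))  # optimal action and value before any data
    print(design(1, 0))  # after 0 responses in the first group of 5
```

Each interim state (stage, number of responses) must be evaluated by averaging over every possible outcome of the next group, so the work grows quickly with the number of stages and, in more realistic settings, with the number of arms and endpoints.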
As with the single-stage designs considered above, authors have taken a number of approaches to the design of multi-stage trials. Some authors have considered simple utility functions (costs of making incorrect decisions and cost of sampling) in their designs.18,25,29,43,44,47–51,59,61,62,66,67,76–78 Cheng and Shen 49 related the utility function parameters to frequentist error rates, while Nixon et al. 27 related them to the expected prior probability of success, which is sometimes known as assurance, a term introduced by O’Hagan and Stevens. 85 Frequentist error rates are also considered by Jennison and Turnbull 61 who use backward induction to obtain group sequential designs that are optimal in that the expected sample size is minimized subject to the error rate requirements. Others have considered more realistic utility functions.22,23,30–32,37,43,45,46,69,70,82 In this latter case, the utility function is usually constructed from a commercial perspective.
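The notion of assurance can be illustrated with a minimal sketch for a single-arm trial with a binary response, in which the trial is declared a success if at least r responses are observed among n patients. The Beta(a, b) prior, the success rule and all function names are assumptions made here for illustration. Power is the probability of success at a fixed response probability p, whereas assurance averages that probability over the prior.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_power(n, r, p):
    """Probability of at least r responses among n patients at fixed p."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(r, n + 1))


def assurance(n, r, a, b):
    """Prior expected probability of success under a Beta(a, b) prior.

    Averaging the binomial power over the prior gives the beta-binomial
    (prior predictive) probability of observing at least r responses.
    """
    return sum(
        comb(n, x) * exp(log_beta(a + x, b + n - x) - log_beta(a, b))
        for x in range(r, n + 1)
    )


if __name__ == "__main__":
    print(binomial_power(10, 7, 0.7))  # power at an assumed p of 0.7
    print(assurance(10, 7, 1, 1))      # assurance under a uniform prior
```

Under a uniform prior the predictive distribution of the number of responses is uniform on 0, ..., n, so the assurance in this example is 4/11, whatever the power at any single assumed value of p.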
3.4 Enrichment designs
Trippa et al. proposed a decision-theoretic approach to an enrichment design. 42 In this two-stage design, all patients enrolled to stage 1 receive the same experimental treatment. The data from these patients are then used to optimally identify the population of patients to be used in the main, second, stage of the trial, in which patients are randomized to receive either the experimental or the control treatment. In the design proposed by Trippa et al., the utility function encompasses the benefit that will be received by future patients if the experimental treatment is recommended for further study in a phase III setting in the population identified, the costs incurred for conducting the phase II and III trials, and the duration of treatment in stage 1 which influences the population of patients for whom treatment is successful that will be used in the second stage.
Although represented by a single paper in our review, the area of enrichment designs is one of considerable recent statistical interest (see, for example, Graf et al., 86 Simon and Simon 87 and Wang et al. 88 ), suggesting that this is an area in which new work on decision-theoretic approaches might be anticipated.
3.5 Designs for multi-arm trials, programmes of studies or a series of trials
Some articles extend the multi-stage designs for single-arm trials or two-arm comparative trials to seek optimal multi-arm designs. Such a problem is considered by Chen and Beckman, 20 Lai et al., 26 Palmer, 29 Patel and Ankolekar, 71 Patel et al., 72 Stallard et al. 35 and Thall et al. 41 This introduces, in addition to possible actions corresponding to stopping or continuing the trial, the option of dropping one or more treatment arms at an interim analysis, so that the decision process can become increasingly complex.
In multi-arm trials,
In the designs considered earlier, patients are generally considered in groups, with decisions made at interim analyses once the data from each group of patients have been observed. If patients are considered one at a time, the problem becomes one of optimally allocating treatments to each patient. Such an approach is considered in a two-arm study by, for example, Jiang et al. 62 This problem is closely related to the multi-armed bandit problem. In some settings, the different treatments being compared in a multi-arm design may be different doses of the same drug. In this case, a parametric dose–response model may be assumed. These two settings are considered briefly in the Discussion section.
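The one-patient-at-a-time allocation problem can be sketched as a finite-horizon two-armed Bernoulli bandit solved by backward induction. The sketch assumes independent Beta priors for the two response probabilities, a fixed and known number of patients, and a utility equal to the total number of successfully treated patients. These assumptions and all names are illustrative and are not taken from the cited papers.

```python
from functools import lru_cache


def bandit_value(horizon, prior=(1, 1, 1, 1)):
    """Maximum expected number of successes among `horizon` patients.

    The state is the remaining number of patients together with the Beta
    posterior parameters (a1, b1, a2, b2) of the two arms.
    """

    @lru_cache(maxsize=None)
    def value(remaining, a1, b1, a2, b2):
        if remaining == 0:
            return 0.0
        p1 = a1 / (a1 + b1)  # posterior mean response probability, arm 1
        p2 = a2 / (a2 + b2)  # posterior mean response probability, arm 2
        # Expected total successes if the next patient receives arm 1 ...
        arm1 = (p1 * (1 + value(remaining - 1, a1 + 1, b1, a2, b2))
                + (1 - p1) * value(remaining - 1, a1, b1 + 1, a2, b2))
        # ... or arm 2, acting optimally for all later patients.
        arm2 = (p2 * (1 + value(remaining - 1, a1, b1, a2 + 1, b2))
                + (1 - p2) * value(remaining - 1, a1, b1, a2, b2 + 1))
        return max(arm1, arm2)

    return value(horizon, *prior)


if __name__ == "__main__":
    print(bandit_value(2))   # 13/12 under uniform priors
    print(bandit_value(20))
```

With two patients and uniform priors the optimal policy treats the second patient with the first arm only if the first patient responded, giving an expected 13/12 successes compared with 1 under allocation that ignores the data.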
The objective of the designs considered by Patel and Ankolekar 71 and Patel et al. 72 is to maximize the expected profit from a portfolio of treatments while incurring the costs of running the trials within the given budget. An optimal size is obtained for each treatment (each trial) and the trials may run concurrently.
Some articles use decision theory methods to design not a single study but a series of studies, which may themselves employ single stage or multi-stage designs. The problem of decision-making at the end of one trial in a series of potential trials is rather like that at an interim analysis in a multi-stage trial, so that the methods often build on those described above. In this setting, actions corresponding to moving on from one trial to another need to be considered. Articles considering designs for this setting include Chen and Beckman, 20 Hee and Stallard, 24 Pallay, 28 Stallard33,34 and Stallard and Thall. 36
The trials, usually with one experimental treatment per trial (either a single-arm design or two-arm comparison with a control), run sequentially, so that a decision made in one trial can affect possible future trials. In a series of single stage trials, suppose
The full backward induction approach can be very challenging in this case, so authors have generally either considered small sample sizes, 36 or sought simpler algorithms or asymptotic results to give approximately optimal designs.47,50
The expected utility for a series of sequential trials may also be computed via a backward induction algorithm similar to that described above. For a series of multi-stage trials, the backward induction is used within each trial as well as for the series of sequential trials. Almost all of these articles assumed a commercial perspective with realistic utility functions.24,28,33,36 Both Chen and Beckman 20 and Stallard, 34 on the other hand, assumed a commercial perspective with simple utility functions.
4 Discussion
The aim of this article was to review the literature on methods for pilot studies and small clinical trials that are based on the use of Bayesian decision theory. Methods have been published for single-stage and multi-stage clinical trials as well as for multi-arm trials or series of trials, with utility functions based on a number of different decision-makers’ perspectives. Most methods have focussed on a decision regarding the sample size of the trial, though in general other features of the design could be chosen in a similar way. Specific examples that have been considered include dropping of arms in a multi-arm trial and selection of the population in the enrichment design.
It is inevitable that when writing an article such as this, decisions must be made regarding the scope of the review. Within the limits of Bayesian decision-theoretic methods, our intention has been to keep the scope fairly wide, including discussion of methods for any clinical trial design that might be appropriate for a small trial or pilot study with an efficacy endpoint. One exclusion has been methods for phase I or dose-finding studies, where the main concern is usually a safety or toxicity endpoint. Although Bayesian and decision-theoretic methods are relatively common in this setting, the different endpoint, use of sequential designs with very small groups, often making decisions after each subject, and incorporation of dose–response information (so that data from one arm can lead to inference regarding other arms) mean that the methods proposed are rather different to those we have considered, and are less suited to other small trials. Readers interested in this area are directed to the work by Cheung 89 and Simes. 90 One of the few published applications of the decision-theoretic methodology is in the phase I oncology setting. 91 Our choice of search terms also excluded literature on the multi-armed bandit problem, identifying one paper 76 applying this methodology specifically to clinical trial design with the intention of optimally allocating patients one at a time to treatments in a multi-arm study. There is a relatively large body of literature on this problem in applied probability journals which, although considered from a more general viewpoint, might be relevant to clinical trials of this type.92–94
A challenge in any Bayesian methodology is the specification of a prior distribution. Most of the papers identified in the review used conjugate prior distributions to facilitate mathematical derivation.14,22–24,26–28,30–37,41–51,54–58,60,62–70,73–83 In most cases, this involved using a beta prior distribution for a Bernoulli distribution, or in some cases taking a two-point prior corresponding to an experimental treatment that is either effective or ineffective.18,20,21,25,29,38–40,52,53,59,61,72 In some cases, the prior distribution may be for a vector of unknown parameters. Examples include a normal distribution with both mean and variance unknown57,63,65 and a time-to-event endpoint where the hazard function is modelled with a three-parameter generalized gamma distribution and the unknown parameters are given gamma and inverse gamma priors. 78 Authors whose works consider more than one endpoint assume, for example, a Dirichlet distribution for Bernoulli efficacy and Bernoulli toxicity endpoints,22,37,75 or, for time-to-event and Bernoulli toxicity endpoints, a bivariate gamma (regression) distribution 41 or gamma and beta distributions 26 for the unknown parameters. Some papers used MCMC methods,43,68 and one paper was based on a non-parametric approach. 43 When priors were specified, they were usually informative, sometimes with a number of alternative priors used and results compared. Although there is a considerable literature on elicitation of Bayesian prior distributions (see, for example, Chaloner et al., 95 Kadane and Wolfson, 96 O’Hagan, 97 and case studies by Blanck et al., 98 and Kinnersley and Day 99 ), only two articles identified in our review described the use of formal methods for prior elicitation.42,52
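The two most common prior specifications mentioned above lead to very simple posterior calculations, which is the computational convenience referred to. The following is a minimal sketch for a Bernoulli response; the function names and the numerical values in the usage example are assumptions chosen purely for illustration.

```python
def update_beta(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior -> Beta(a + s, b + f) posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def two_point_posterior(prior_effective, p_effective, p_ineffective,
                        successes, failures):
    """Posterior probability that the treatment is effective.

    Two-point prior: the response probability is p_effective with prior
    probability prior_effective, and p_ineffective otherwise.
    """
    like_eff = p_effective ** successes * (1 - p_effective) ** failures
    like_ineff = p_ineffective ** successes * (1 - p_ineffective) ** failures
    numerator = prior_effective * like_eff
    return numerator / (numerator + (1 - prior_effective) * like_ineff)


if __name__ == "__main__":
    a_post, b_post = update_beta(1, 1, successes=7, failures=3)
    print(a_post, b_post, beta_mean(a_post, b_post))
    print(two_point_posterior(0.5, 0.4, 0.2, successes=7, failures=3))
```

In both cases the posterior depends on the data only through the numbers of successes and failures, so the state space needed for backward induction remains small.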
As described above, one major challenge in the development and application of decision-theoretic methods in clinical trials is that of constructing utility functions that accurately reflect the consequences of possible actions. It is clear from the articles discussed above that approaches to this challenge have varied. Some researchers have focussed on monetary costs and rewards, whilst others have compared these with improvement or deterioration in health states using approaches from health economics. The utilities should reflect the preferences of consequences from the point of view of the decision-maker. This can be particularly challenging when more than one individual or group will make a decision based on the results of a clinical trial, or be otherwise affected by the results, or if the decision-maker and the trialist have different viewpoints. For example, decision-making by a societal decision-maker such as National Institute for Health and Care Excellence (NICE) in the UK may primarily be based on cost-effectiveness, whereas decision-making by a pharmaceutical company may be based more on whether or not the current information is sufficient to apply for licensing for the experimental treatment. In an attempt to reconcile this challenge, Willan and Eckermann 81 proposed a design that combined both public health service and commercial perspectives where the utility function is made up of two thresholds, namely, a maximum price of the experimental treatment acceptable to the public health service for reimbursement and a minimum price to the pharmaceutical company that does not result in a loss of investment.
More general methods for construction of utility values based on direct consideration of consequences have been based on prioritizing preference, for example, using methods first proposed by Ramsey 100 or methods discussed by Lindley 101 or Emrich and Sedransk. 102 Although a variety of approaches have been taken, most researchers proposing single-stage designs have based utility functions on a patient or societal perspective, whereas commercial perspectives have been more common in development of multi-stage designs. In the description of the method above, we have taken the utility function to depend on the unknown parameters alone, as proposed by Lindley 14 and Raiffa and Schlaifer. 12 Most authors proposing simple utility functions have followed this approach. Some authors have proposed more complex utility functions in which the utility depends also on the observed trial data, for example, with a gain if a trial indicates a significant treatment effect. In spite of the numerous approaches proposed, it seems likely that it is this difficulty with specification of an appropriate utility function, together with a lack of familiarity, both with Bayesian methods in general and with decision-theoretic methods in particular, that is responsible for the very limited use of decision-theoretic methods in practice.
In spite of the challenges, we consider the Bayesian decision-theoretic approach to be appropriate for the design of pilot studies and early phase trials, given that the clear role of these trials is to inform decisions regarding future clinical research. However such trials are designed, these decisions will be made, and the decision-theoretic approach formalizes this by considering the decisions and their consequences explicitly. Even when trials are designed based on other approaches, we believe that the decision-theoretic methodology is a useful tool for trialists and statisticians designing trials, enabling the properties of trial designs obtained under one paradigm to be evaluated based on another. This is, perhaps, particularly important in small trials when compromise is inevitable, as it leads to a careful consideration of the purpose of the trial and its required properties, thus ensuring that it is fit for purpose. One thing that we believe could increase the use of decision-theoretic designs is a greater familiarity and improved understanding through retrospective evaluation of such approaches.
This review has identified many decision-theoretic approaches. In any real application, it is important to consider the purpose of the trial and ensure that this is reflected in the formulation of the decision problem and utility function so that the trial design proposed is appropriate to match this purpose.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was conducted as part of the InSPiRe (Innovative methodology for small populations research) project funded by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement number FP HEALTH 2013 – 602144.
