Abstract
Background/Aims:
Bayesian designs for clinical trials using assurance to choose the sample size have been proposed in various trial contexts. Assurance allows for the incorporation of uncertainty on both the treatment effect and nuisance parameters into the sample size calculation. In the case of two-arm cluster randomised trials with continuous outcomes, assurance has been proposed with both a frequentist analysis (hybrid designs) and a Bayesian analysis (fully Bayesian designs). A Bayesian analysis in this context ensures a consistent treatment of probability throughout the design and analysis of the trial. In the fully Bayesian design, inference has been achieved via Markov chain Monte Carlo sampling, and since assurance itself is evaluated via simulation, the result is a computationally intensive and often slow-to-run approach. In the case of two-arm cluster randomised trials with binary outcomes, assurance has not yet been explored to specify sample sizes, either in the hybrid or fully Bayesian case.
Methods:
This article considers fully Bayesian designs for two-arm cluster randomised trials with continuous and binary outcomes. For the analysis of the trial, we use a (generalised) linear mixed-effects model. We summarise the inference for the treatment effect based on quantiles of the posterior distribution. We use assurance to choose the sample size. In the continuous case, we investigate Integrated Nested Laplace Approximations for inference to speed up calculation of the assurance and compare Integrated Nested Laplace Approximations in computation time and accuracy to Markov chain Monte Carlo. In the binary case, we develop the first fully Bayesian design for cluster randomised trials and conduct a similar comparison between Integrated Nested Laplace Approximations and Markov chain Monte Carlo. We demonstrate our novel approach using assurance to choose sample sizes for the SPEEDY cluster randomised trial, based on the results of a formal prior elicitation exercise with two clinical experts.
Results:
We report comparisons of Integrated Nested Laplace Approximations and Markov chain Monte Carlo for a range of different scenarios for cluster randomised controlled trials (RCTs), to determine when each inference scheme should be used, balancing computational speed against accuracy. Overall, Markov chain Monte Carlo with a very large number of samples produces very accurate inference but does not scale well in terms of computational speed compared to Integrated Nested Laplace Approximations. Based on our simulation study, we recommend that Integrated Nested Laplace Approximations is used for inference in cluster trials with binary outcomes and large (n > 500) cluster trials with continuous outcomes, and that Markov chain Monte Carlo is used in smaller (n ≤ 500) cluster trials with continuous outcomes. Our case study demonstrated how to incorporate the uncertainty of trial clinicians into the sample size calculation to give an overall assessment of the likelihood of success of the trial.
Conclusion:
A fully Bayesian design can be used for two-arm cluster trials with both continuous and binary outcomes. Integrated Nested Laplace Approximations can allow for more efficient assessment of the assurance for cluster trials with binary outcomes and large cluster trials with continuous outcomes, without loss of accuracy in inference. A fully Bayesian design of a cluster randomised trial provides a coherent design and analysis framework and incorporates uncertainty in model parameters when choosing the sample size.
Introduction
Sample size calculations are important in clinical trials, as they balance the need for precision against practical considerations such as cost and time. It is unethical to recruit more participants than needed, but recruiting too few risks being unable to answer the research question, wasting time and money, and inconveniencing patients. In this article, we focus on sample size calculations for two-arm superiority cluster randomised trials (CRTs), both with a continuous outcome 1 and with a binary outcome. 2
For the CRT sample size calculation, we will use a Bayesian approach, as it has the advantage of using prior knowledge, or information from previous studies, which is useful when there is uncertainty in the parameters and complexity in the inferential model. The Bayesian approach gives an intuitive interpretation in these cases and allows more flexible decision-making. The Bayesian quantity used to calculate the sample size is assurance, 1 which is an alternative to power. The evaluation of the assurance typically requires a two-loop Monte Carlo scheme: the outer loop samples from a design prior distribution and, for each outer-loop sample, the inner loop performs a Markov chain Monte Carlo (MCMC) update to obtain samples of the treatment effect.
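The two-loop scheme can be illustrated with a minimal Python sketch. It is not the implementation used in this article: the design prior values are hypothetical, the analysis works on cluster means with the standard deviation of a cluster mean treated as known, and the inner MCMC update is replaced by direct draws from the resulting normal posterior so that the example runs quickly. The success criterion (2.5% posterior quantile of the treatment effect above zero) is also an assumption made for illustration.

```python
import random
from statistics import mean


def simulate_cluster_means(n_clusters, cluster_size, arm_mean, sd_between, sd_within, rng):
    """Simulate the observed cluster means for one trial arm."""
    sd_cluster_mean = (sd_between ** 2 + sd_within ** 2 / cluster_size) ** 0.5
    return [rng.gauss(arm_mean, sd_cluster_mean) for _ in range(n_clusters)]


def posterior_draws(control, treated, sd_cluster_mean, n_inner, rng):
    """Inner loop stand-in for MCMC: flat priors on the arm means and a known
    cluster-mean sd give a normal posterior for the treatment effect."""
    centre = mean(treated) - mean(control)
    spread = sd_cluster_mean * (1 / len(treated) + 1 / len(control)) ** 0.5
    return [rng.gauss(centre, spread) for _ in range(n_inner)]


def assurance(clusters_per_arm, cluster_size, n_outer=500, n_inner=1000, seed=1):
    """Estimate the assurance as the proportion of simulated trials that succeed."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_outer):
        # Outer loop: draw parameters from a (hypothetical) design prior.
        delta = rng.gauss(0.3, 0.15)
        sd_between = abs(rng.gauss(0.3, 0.05))
        sd_within = abs(rng.gauss(1.0, 0.1))
        # Simulate the trial data given the sampled parameters.
        control = simulate_cluster_means(clusters_per_arm, cluster_size, 0.0,
                                         sd_between, sd_within, rng)
        treated = simulate_cluster_means(clusters_per_arm, cluster_size, delta,
                                         sd_between, sd_within, rng)
        # Inner loop: posterior samples of the treatment effect.
        sd_cluster_mean = (sd_between ** 2 + sd_within ** 2 / cluster_size) ** 0.5
        draws = sorted(posterior_draws(control, treated, sd_cluster_mean, n_inner, rng))
        lower_quantile = draws[int(0.025 * n_inner)]
        successes += lower_quantile > 0  # assumed definition of 'Success'
    return successes / n_outer
```

Calling `assurance(50, 20)` and `assurance(5, 20)` gives the estimated assurance for 50 and 5 clusters per arm with 20 participants per cluster. The cost is roughly `n_outer` times the cost of one inner analysis, which is why the speed of the inner inference step matters.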
A particular challenge in this case, which is a problem more generally in Bayesian design of experiments, is computational cost. It can be time-consuming to run a full MCMC scheme for every iteration in the Monte Carlo procedure described above. In an attempt to reduce computation time, in this article, we investigate Integrated Nested Laplace Approximations (INLA) 3 as an alternative to MCMC. 4 This approach has been considered for individually randomised controlled trials, 5 but it has not been investigated before in CRTs, which are more complex trials inferentially, requiring modelling of the cluster effects and intra-cluster correlation coefficient (ICC). There are other papers that have discussed the comparison between INLA and MCMC 6–9 with regression models of various types, but either their sole focus was accuracy, they only considered very large MCMC runs, or the models they considered were not comparable to those in this article. Our investigation focuses on the trade-off between speed and accuracy of inference on the treatment effect based on approximation using INLA and MCMC under varying numbers of posterior samples. As such, it provides a new perspective on the relative merits of MCMC and INLA in a clinical trials context.
We compare the inference resulting from MCMC using different numbers of posterior samples and INLA for continuous outcomes, considering a linear mixed-effects model as in Wilson (2023). 1 In general, it should be faster to obtain the posterior distribution for the marginal treatment effect using INLA than using a sampling scheme such as MCMC, particularly for complex designs and large sample sizes. However, INLA is an approximation, whereas MCMC samples from the true posterior distribution, and so with enough samples, can be arbitrarily accurate. We further outline Bayesian inference for a CRT with a binary outcome and undertake a comparison of MCMC and INLA for this case. Based on our investigation, we provide guidance on when INLA and MCMC are most suitable for Bayesian analysis of CRTs.
We demonstrate the approach by calculating the sample size of the case study SPEEDY trial 10 using assurance, 1 for both continuous and binary co-primary outcomes. Based on our investigation, we use MCMC for the continuous outcome and INLA for the binary outcome. To evaluate the assurance, we use the prior distributions resulting from an expert elicitation exercise with the two co-leads in SPEEDY. We report the assurance and required sample sizes in each case from the priors for each expert and from an equally weighted prior between the two experts.
This article is structured as follows. We review a standard approach to power calculations for two-arm superiority CRTs for continuous and binary outcomes. We detail Bayesian inference for two-arm superiority CRTs with continuous and binary outcomes. We detail how to calculate assurance for CRTs. Then, we perform a simulation study comparing inference via MCMC and INLA in both cases, evaluating their accuracy and computation time. We then apply the approach to the SPEEDY trial. Finally, we summarise this article and identify future work.
Power calculation for two-arm superiority CRTs
Here, we summarise standard power calculations for CRTs, to provide a contrast to the assurance described later.
The power for a two-arm CRT with a continuous outcome is given by the conditional probability that we reject the null hypothesis of a treatment effect of zero (for example), given an assumed treatment effect and values chosen for a set of nuisance parameters detailed below. We can approximate the power function, for sample size
where
The power in the binary case can be expressed 12,13 as
where
and
here
In general, we will have uncertainty about the true values of the (nuisance) parameters in the power calculations above. By defining a prior distribution on the (nuisance) parameters, rather than assuming single values as in power, we can take this uncertainty into account in the sample size calculation. The resulting quantity is known as the assurance and can be used to choose the sample size for a CRT in combination with either a frequentist or a Bayesian analysis, respectively known as a hybrid and a fully Bayesian design.
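The contrast between power and assurance in a hybrid design can be sketched in Python as follows. The power function here is a common textbook normal approximation with design effect 1 + (m − 1)ρ for equal cluster sizes. It is an assumption for illustration and may differ from the power functions used in this article, for example by ignoring cluster size variability. The design prior distributions are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def crt_power(delta, sigma, icc, clusters_per_arm, cluster_size, alpha=0.05):
    """Approximate power for a two-arm CRT with a continuous outcome,
    conditional on fixed values of the effect and nuisance parameters."""
    design_effect = 1 + (cluster_size - 1) * icc
    n_per_arm = clusters_per_arm * cluster_size
    standard_error = sigma * (2 * design_effect / n_per_arm) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in the direction favouring treatment.
    return STD_NORMAL.cdf(delta / standard_error - z_crit)


def hybrid_assurance(clusters_per_arm, cluster_size, n_draws=5000, seed=1):
    """Average the power over draws from a (hypothetical) design prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        delta = rng.gauss(0.3, 0.15)      # treatment effect
        sigma = rng.uniform(0.8, 1.2)     # outcome standard deviation
        icc = rng.betavariate(2, 38)      # ICC, prior mean 0.05
        total += crt_power(delta, sigma, icc, clusters_per_arm, cluster_size)
    return total / n_draws
```

Comparing `crt_power(0.3, 1.0, 0.05, 30, 20)` with `hybrid_assurance(30, 20)` shows how a design that looks well powered at the assumed point values can have noticeably lower assurance once uncertainty in the treatment effect and nuisance parameters is averaged over.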
Methods
Bayesian inference for two-arm CRTs
An alternative to the hypothesis-testing analyses which formed the basis of the power functions in the previous section is to perform a Bayesian analysis of the trial. This has the advantage of allowing prior information to be incorporated into the analysis and provides a coherent framework for design and analysis if the assurance is to be used to choose the sample size, which will be described below. In this section, we detail Bayesian inference for CRTs.
We describe the inference for the treatment effect for a CRT with a continuous outcome, based on the posterior distribution, as described in Spiegelhalter. 14 For the binary outcome, we can perform inference using a similar approach to that of Turner. 2 Then, based on this inference, we use the developed assurance from Wilson 1 to choose the CRT sample size. For the inference, we consider comparison of treatment with control.
A (generalised) linear mixed-effects model can be used, with continuous response
with
For Bayesian inference, the parameters
where each
The inference in both cases is not conjugate, and so numerical or approximation methods are needed to evaluate the posterior distribution on the treatment effect
Previous work 1,14 in the continuous case has considered simulation from the posterior distribution of the treatment effect using MCMC. For large or complex CRTs, this can be computationally costly, and, when many runs of the MCMC are required as described for the design of the trial in the assurance, it may not be feasible to use MCMC at all. We propose INLA as an alternative to MCMC for inference on the treatment effect in CRTs and will compare MCMC and INLA under various scenarios, focusing on their accuracy and computational cost.
In the analyses in this article, we perform inference via MCMC using the R package rjags. 4 The rjags package is used for Bayesian data analysis and interfaces between R and the JAGS library. 15 It uses a combination of Gibbs sampling, Metropolis-Hastings sampling and slice sampling to sample from the posterior distribution. In our implementation of MCMC in rjags, we use a burn-in period to allow the MCMC chains to converge before recording samples.
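The role of the burn-in period can be illustrated with a minimal random-walk Metropolis sampler in Python. This is a generic sketch and not the samplers used by JAGS. The target density (a normal distribution with mean 2 and standard deviation 0.5), the starting value and the tuning values are hypothetical.

```python
import math
import random


def log_target(theta):
    """Unnormalised log density of a toy posterior: N(2, 0.5^2)."""
    return -0.5 * ((theta - 2.0) / 0.5) ** 2


def metropolis(n_keep, burn_in, start, step_sd=0.5, seed=1):
    """Random-walk Metropolis; the first `burn_in` states are discarded."""
    rng = random.Random(seed)
    theta = start
    kept = []
    for iteration in range(burn_in + n_keep):
        proposal = theta + rng.gauss(0.0, step_sd)
        log_ratio = log_target(proposal) - log_target(theta)
        # Accept with probability min(1, target(proposal) / target(theta)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        if iteration >= burn_in:
            kept.append(theta)
    return kept
```

Starting the chain far from the bulk of the target, for example `metropolis(20000, 1000, start=10.0)`, shows why early states are discarded: they reflect the starting value rather than the posterior distribution.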
To perform inference using INLA, we use the INLA package from the R-INLA project. 3 The idea behind INLA is that it approximates the required integral to evaluate the posterior distribution using Laplace’s method. It can be used for the analysis of CRTs since the (generalised) linear mixed-effects models can be written as latent Gaussian models, for which the Laplace method can be applied (for further information, see Gómez-Rubio 16 ). We obtain the required quantiles from the posterior distribution of the treatment effect directly from INLA, without the need for sampling.
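As a small illustration of Laplace's method itself, and not of the full nested scheme implemented in the INLA package, the following Python sketch approximates a one-parameter posterior by a normal distribution centred at the posterior mode, with variance given by the inverse of the negative second derivative of the log posterior at the mode. The model (a binomial likelihood on the log-odds scale with a normal prior) and the data values are hypothetical. A brute-force grid calculation is included only to check the approximation.

```python
import math
from statistics import NormalDist


def log_post(theta, y, n, prior_sd):
    """Unnormalised log posterior: binomial likelihood, N(0, prior_sd^2) prior."""
    return y * theta - n * math.log1p(math.exp(theta)) - 0.5 * (theta / prior_sd) ** 2


def laplace_approx(y, n, prior_sd, iterations=50):
    """Find the mode by Newton's method and return the normal approximation."""
    theta = 0.0
    for _ in range(iterations):
        p = 1 / (1 + math.exp(-theta))
        gradient = y - n * p - theta / prior_sd ** 2
        hessian = -n * p * (1 - p) - 1 / prior_sd ** 2
        theta -= gradient / hessian
    p = 1 / (1 + math.exp(-theta))
    hessian = -n * p * (1 - p) - 1 / prior_sd ** 2
    return NormalDist(theta, (-1 / hessian) ** 0.5)


def grid_quantile(prob, y, n, prior_sd, lo=-6.0, hi=6.0, points=4001):
    """Reference quantile from normalising the posterior on a fine grid."""
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    logs = [log_post(t, y, n, prior_sd) for t in grid]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    running = 0.0
    for t, w in zip(grid, weights):
        running += w
        if running / total >= prob:
            return t
    return hi
```

For example, `laplace_approx(30, 100, 2.0).inv_cdf(0.025)` can be compared with `grid_quantile(0.025, 30, 100, 2.0)`. Quantiles are read directly from the approximating distribution, with no sampling step.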
Assurance
Following Wilson, 1 assurance evaluates the unconditional probability that the trial finds a significant treatment effect. This allows an appropriate sample size choice in the planning of any cluster randomised controlled trial (RCT) and is not conditional on chosen values of unknown parameters in the same way as the power.
Define an event ‘Success’ to be the successful outcome of the CRT, that is, treatment is superior to control. Then, for the sample size
where
The total sample size in a cluster RCT is given by
where
The assurance for total sample size
where ‘Success’ denotes that treatment is found superior to control based on the posterior distribution in the analysis of the CRT and
In the case of MCMC, we obtain samples of
Results
Comparison of INLA versus MCMC
For both the continuous and binary outcomes, we simulate a CRT with two different numbers of clusters (
To compare MCMC to INLA, we consider a range of numbers of MCMC samples,
Figure 1(a) and (c) and Figure 2(a) and (c) show the reported mean values of the posterior median minus

The difference between the posterior median and the true treatment effect and the run time for each method, in the continuous outcome case for scenarios

The difference between the posterior median and the true treatment effect and the run time for each method in the binary outcome case for scenarios
Overall, in the continuous outcome case, we see that both INLA and MCMC are accurate for small trial sample sizes, with MCMC requiring at least 10,000 samples from the posterior distribution for trials with large sample sizes to ensure convergence. MCMC is faster than INLA for small sample sizes, but INLA is much faster than MCMC for large CRTs. This suggests that we should use MCMC with at least 10,000 samples to analyse continuous outcome CRTs with sample sizes of 100–500, and INLA for CRTs with a sample size above 500. In addition, as we increase the number of clusters to
For the binary case, INLA is as accurate as MCMC with a large number of posterior samples for all CRT sample sizes and is considerably faster. In the binary case, MCMC is not able to exploit the same conjugacy in the precision priors as the continuous case, explaining this disparity. The result is that INLA is a suitable approach to use for inference for two-arm cluster RCTs with a binary outcome, irrespective of the sample size of the CRT.
Based on the simulations in each of the four scenarios, we provide the following conclusions for the fastest approaches that provide accurate Bayesian inference in two-arm CRTs:
For a continuous outcome, we found MCMC with
For a binary outcome, we found INLA to be best for all total sample sizes.
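These recommendations can be written as a simple decision rule. The Python sketch below uses a hypothetical function name and takes the threshold of 500 on the total sample size from the simulation study conclusions stated in this article. For the MCMC branch, at least 10,000 posterior samples are advised.

```python
def recommend_inference(outcome, total_sample_size):
    """Return the recommended inference scheme for a two-arm CRT."""
    if outcome == "binary":
        # INLA was found best for all total sample sizes.
        return "INLA"
    if outcome == "continuous":
        # MCMC for smaller trials (n <= 500), INLA for larger trials (n > 500).
        return "INLA" if total_sample_size > 500 else "MCMC"
    raise ValueError("outcome must be 'continuous' or 'binary'")
```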
Application to the SPEEDY trial
Introduction to SPEEDY
SPEEDY 10 is a two-arm CRT which aims to determine the clinical and cost-effectiveness of a novel specialist prehospital redirection pathway intended to facilitate thrombectomy treatment for acute stroke. The comparator is standard care. The unit of cluster randomisation is ambulance stations, which are work bases for ambulance practitioners who initiate the SPEEDY pathway or standard care. A broad study population is being enrolled, but the power calculation was based on a subset titled ‘primary analysis population’. This primary analysis population comprises the group of participants who are eligible for both pathway deployment and subsequent thrombectomy treatment. The wider population allows for other impacts of the pathway to be evaluated. The study has a co-primary outcome of thrombectomy rate and time from stroke symptom onset to thrombectomy. The sample size is 894 participants for thrombectomy rate and 564 participants for time to thrombectomy.
The sample size for time to thrombectomy is based on 90% power,
Similarly, for the sample size calculation for the thrombectomy rate, the same values of power, significance level
Elicitation for the SPEEDY Trial
In line with standard frequentist sample size calculations, the SPEEDY trial did not account for uncertainty in the model parameters. We wish to incorporate such uncertainty by using the assurance in place of power. This requires informative design prior distributions for each model parameter. We used expert elicitation to determine suitable prior distributions for the SPEEDY trial parameters, relating elicited values on observable quantities to the design prior distributions of interest. We will use these design prior distributions in our assurance calculation in the next section.
To perform the elicitation, we first prepared an evidence dossier for the quantities of interest. We held an elicitation workshop with two experts who are the co-leads of the SPEEDY trial. In this elicitation workshop, we used the quartile method to perform individual elicitations of the quantities of interest. However, we did not elicit the cluster size variability
Assurance for the time to thrombectomy
We reproduce the sample size calculation for time to thrombectomy using the assurance, as detailed in the Assurance section. Based on the general advice from the results of the comparison simulation study, we use MCMC for inference with
The elicited prior distributions for each expert, and the average of both, for time to thrombectomy and thrombectomy rate.

The elicited prior probability density functions for the parameters (
The estimated sample size using the design prior distributions for expert 1, expert 2 and the average was 150 in each case, based on a minimum assurance of 90%. This is due to the fact that the experts were very optimistic about the improvement in time to thrombectomy in the treatment arm, represented by

(a) The assurance with different average cluster sizes for the time to thrombectomy. Expert 1 is more optimistic about the result than expert 2 since the assurance for any chosen average cluster size is larger. (b) The assurance with different average cluster sizes in the case of the thrombectomy rate. Expert 1 is more pessimistic about the result than expert 2 in this case, and we can see expert 2 and the average do reach an assurance of 0.9 in the plot, while for expert 1, the assurance with average cluster sizes of 25 is around 0.87.
We see that in each case the assurance, like power, is an increasing function of cluster size. Expert 1 is most optimistic about the treatment, with expert 2 less optimistic and the average lying between the two. As the cluster size gets very large, each assurance curve will tend to the probability, under that expert’s design prior distribution, that the treatment effect is positive.
Assurance for the thrombectomy rate
The elicited design prior distributions associated with the thrombectomy rate from expert 1, expert 2 and the average are given in Table 1 and plotted in Figure 5. We use these to calculate the assurance, and hence sample size, with a target assurance value of 90%. In this case, we used INLA for our assurance and sample size calculations based on the general conclusions from the simulation comparison result.
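Choosing the sample size from a target assurance amounts to a search over candidate sizes. The Python sketch below assumes a caller-supplied function that returns the assurance for a given total sample size. The function names, the grid of candidate sizes and the toy assurance curve are hypothetical and are not the SPEEDY calculation.

```python
import math


def smallest_size_meeting_target(assurance_fn, candidate_sizes, target=0.9):
    """Return the smallest candidate size whose assurance reaches the target,
    or None if no candidate does."""
    for size in sorted(candidate_sizes):
        if assurance_fn(size) >= target:
            return size
    return None


def toy_assurance_curve(plateau):
    """Build a hypothetical increasing assurance curve that levels off at `plateau`."""
    def curve(size):
        return plateau * (1 - math.exp(-size / 1000))
    return curve
```

When the assurance is itself a Monte Carlo estimate, values close to the target are noisy, so in practice a larger number of outer-loop samples near the threshold, or a smoothed curve, may be needed. A return value of `None` corresponds to a design prior under which the target cannot be reached on the grid considered.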

The elicited prior probability density functions of the parameters (
The estimated sample sizes using the design prior distributions from expert 1, expert 2 and the average are 4800, 1650 and 3150, respectively. In this case, the experts were relatively pessimistic about the likely values of the treatment effect for the thrombectomy rate, relative to the sample size estimate from the power calculation of 894. However, the required primary analysis population to ensure an adequate sample size for the time to thrombectomy outcome means that there will need to be between 2600 and 4300 patients recruited to the trial, meaning that in practice both expert 2 and the average will likely achieve 90% assurance, and expert 1 will achieve relatively high assurance. Assurance also has a different interpretation to power, and so there is no reason to expect that targeting the same numerical value for assurance as for power will give the same sample size.
We have produced a plot of the assurance for different average cluster sizes, based on the 150 clusters in SPEEDY, for both experts and the average, and this is given in Figure 4(b).
We see a similar scenario as in the continuous outcome case, with the assurance increasing for increasing numbers of patients in each cluster. The main difference is in the ordering of the curves, with expert 2 providing the highest assurance for each cluster size and expert 1 providing the lowest, whereas in Figure 4(a), this was the opposite way round.
Conclusion
In this article, we have considered the problem of choosing the sample size, using a Bayesian approach, for a two-arm superiority cluster RCT with a continuous outcome and a binary outcome. We have compared the inference using MCMC to INLA based on appropriate mixed-effect models. From the comparison, we found that the use of INLA has advantages in CRTs with Bayesian designs, as it was as accurate as MCMC with a large number of MCMC samples (
We used the SPEEDY trial as a case study of the sample size choice via an assurance calculation, as SPEEDY has two primary outcomes: one continuous and one binary. In the original sample size calculation, SPEEDY did not consider the uncertainty in the model parameters, and so we performed an expert elicitation to specify suitable design prior distributions for the parameters. The expert elicitation was performed with two experts, and we calculated the assurance and, hence, sample size for each expert separately and the average of both experts. For the continuous outcome, the resulting sample sizes were smaller than those from the original power calculation, since both experts were relatively optimistic that the SPEEDY pathway would reduce the time to thrombectomy by more than the value used in the power calculation. For the binary outcome, the resulting sample sizes were much larger than those from the power calculation, as both experts felt that the value used for power in this case was fairly ambitious. Due to the nature of the trial, with the two outcomes needing to be powered simultaneously, the actual sample size for the binary outcome realised in SPEEDY will provide high assurance for both experts.
In general, assurance is particularly beneficial when there is substantial uncertainty in the values of nuisance parameters to which the power calculation is sensitive. One such parameter considered in this article is the ICC in a CRT, which can be particularly challenging to estimate accurately a priori. Assurance provides a way to take this uncertainty into account and provides a sample size which is more robust to mis-specification than a power calculation using a single estimated value. Trial statisticians should consider using assurance in place of power whenever they have substantial uncertainty about sensitive parameters in a power calculation.
The calculation of the assurance and sample size for large trials, particularly with binary outcomes, would be almost prohibitively computationally expensive and time-consuming given current widely available computing power without the use of INLA, as MCMC takes a very long time in these cases. This assurance approach, together with INLA (or MCMC) for inference, could be extended to more complex CRT designs, including survival outcomes, longitudinal designs, multi-arm trials and adaptive designs. This is left for future work.
Supplemental Material
sj-pdf-1-ctj-10.1177_17407745261421842 – Supplemental material for Bayesian design and analysis of two-arm cluster randomised trials using assurance: Extension to binary outcomes and comparison of Markov chain Monte Carlo and Integrated Nested Laplace Approximations
Supplemental material, sj-pdf-1-ctj-10.1177_17407745261421842 for Bayesian design and analysis of two-arm cluster randomised trials using assurance: Extension to binary outcomes and comparison of Markov chain Monte Carlo and Integrated Nested Laplace Approximations by Abdullah Aloufi, Kevin J Wilson, Nina Wilson, Lisa Shaw and Christopher Price in Clinical Trials
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
