Abstract
Background/Aims:
Sample size determination for cluster randomised trials is challenging because it requires robust estimation of the intra-cluster correlation coefficient. Typically, the sample size is chosen to provide a certain level of power to reject the null hypothesis in a two-sample hypothesis test. This relies on the minimal clinically important difference and estimates for the overall standard deviation, the intra-cluster correlation coefficient and, if cluster sizes are assumed to be unequal, the coefficient of variation of the cluster size. Varying any of these parameters can have a strong effect on the required sample size. In particular, the required sample size is very sensitive to small differences in the intra-cluster correlation coefficient. A relevant intra-cluster correlation coefficient estimate is often not available, or the available estimate is imprecise because it is based on studies with low numbers of clusters. If the intra-cluster correlation coefficient value used in the power calculation is far from the unknown true value, the resulting trial could be substantially over- or under-powered.
Methods:
In this article, we propose a hybrid approach using Bayesian assurance to determine the sample size for a cluster randomised trial in combination with a frequentist analysis. Assurance is an alternative to traditional power, which incorporates the uncertainty on key parameters through a prior distribution. We suggest specifying prior distributions for the overall standard deviation, intra-cluster correlation coefficient and coefficient of variation of the cluster size, while still utilising the minimal clinically important difference. We illustrate the approach through the design of a cluster randomised trial in post-stroke incontinence and compare the results to those obtained from a standard power calculation.
Results:
We show that assurance can be used to calculate a sample size based on an elicited prior distribution for the intra-cluster correlation coefficient, whereas a power calculation discards all of the information in the prior except for a single point estimate. Results show that this approach can avoid misspecifying sample sizes when the prior medians for the intra-cluster correlation coefficient are very similar, but the underlying prior distributions exhibit quite different behaviour. Incorporating uncertainty on all three of the nuisance parameters, rather than only on the intra-cluster correlation coefficient, does not notably increase the required sample size.
Conclusion:
Assurance provides a better understanding of the probability of success of a trial given a particular minimal clinically important difference and can be used instead of power to produce sample sizes that are more robust to parameter uncertainty. This is especially useful when there is difficulty obtaining reliable parameter estimates.
Background
Cluster randomised trials (CRTs) are a type of randomised controlled trial (RCT) in which randomisation is at the cluster-level, rather than the individual-level as in standard RCTs. This means that ‘groups’ of individuals (e.g. general practices, schools or communities) are randomly allocated to different interventions (e.g. vaccination programmes or behavioural interventions). This design is commonly adopted to mitigate the risk of contamination or because individual randomisation is not feasible. Other justifications are detailed in the study by Eldridge and Kerry. 1
Individuals within a cluster are likely to share similar characteristics (e.g. demographics), as well as be exposed to extraneous factors unique to the cluster (e.g. delivery of the intervention by the same healthcare professional). Consequently, outcomes from members of the same cluster are often correlated, which can be quantified by the intra-cluster correlation coefficient (ICC). This lack of independence reduces the statistical power compared to a standard RCT of the same size, meaning that the sample size needs to be inflated to allow for the clustering effect.
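To make the inflation concrete, the following minimal sketch uses the standard design effect for equal cluster sizes, 1 + (m − 1) × ICC, where m is the cluster size. The function names and the numerical values (20 clusters of 20 participants, ICC of 0.05) are illustrative assumptions and are not taken from the trial discussed in this article.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Hypothetical trial: 20 clusters of 20 participants with an assumed ICC of 0.05.
deff = design_effect(20, 0.05)
ess = effective_sample_size(20, 20, 0.05)
print(f"design effect = {deff:.2f}, effective sample size = {ess:.0f} of 400")
```

Under these assumed values, the 400 recruited participants carry roughly the information of 205 independent participants, which is why an individually randomised sample size has to be multiplied by the design effect.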
Various methods for sample size determination in CRTs exist,2,3 which all rely on estimation of the ICC. In practice, ICC estimates are typically based on pilot studies, but these are often too small to provide precise and reliable estimates. 4 An alternative simple approach is to use a conservative estimate of the ICC (e.g. the upper limit of its confidence interval) in the sample size calculation. 5 However, this can lead to over-powered and unnecessarily large trials. A more reliable method is to combine ICC estimates from multiple sources, such as previous trials or databases listing ICC estimates, 6 and use information on patterns in ICCs. 7 This raises further issues such as how to effectively combine the ICC estimates, how to adequately reflect their varying degrees of relevance to the planned trial and how to capture the uncertainty in the individual ICC estimates. 8 It has been suggested that one could integrate over a range of possible ICC values, determined by confidence intervals obtained using methods in the study by Ukoumunne, 9 to provide an ‘average’ sample size with respect to the ICC. However, this does not consider the uncertainty present in other design parameters, such as the treatment effect and variability of the outcome measures. Furthermore, it assumes that each value of the ICC is equally likely. Other approaches to deal with uncertainty in the ICC include sample size re-estimation10,11 and robust designs, such as maximin designs. 12 For the latter approach, a range rather than a prior is used for the ICC.
Utilising a Bayesian approach for the trial design, in which prior distributions are assigned to the unknown design parameters such as the ICC, could further circumvent these issues and is particularly useful in settings where ICC estimates are not readily available. In the CRT literature, prior distributions for the ICC have been proposed based on subjective beliefs 13 and single or multiple ICC estimates, 14 which may be weighted by relevance of outcomes and patient population. 15 These are used to estimate a distribution for the power of the planned trial for a given sample size. Within the Bayesian framework, uncertainty in other design parameters can be incorporated into the sample size calculation in a similar way, and the relative likelihood of different parameter values is encompassed through specification of the prior distribution. For example, Sarkodie et al. 16 assigned a prior to the overall standard deviation, in addition to the ICC, then described a ‘hybrid’ approach to determine the sample size required to attain a desired ‘expected power’, defined as a weighted average of the probability that the null hypothesis is rejected (with weights determined by the priors).
Hybrid approaches, which combine a Bayesian design with a frequentist analysis of the final trial data, have gained increasing popularity, particularly with respect to standard RCTs.17,18 In this article, we adopt a hybrid approach by using the Bayesian concept of ‘assurance’ to determine the sample size for a two-arm parallel-group CRT with a Wald test for the analysis. Traditional frequentist power is a conditional probability that the trial is a success, given the values chosen for the design parameters and the hypothesised treatment effect. In contrast, assurance typically refers to the ‘unconditional’ probability that the trial will be ‘successful’. 19 We modify this definition by conditioning on the minimal clinically important difference (MCID) instead of assigning a prior distribution to, and integrating over, the treatment effect as is standard practice.17,20 This is more representative of the design stage of a trial, in which the treatment effect is typically fixed a priori by investigators. Moreover, this ensures that the assurance will tend to one as the sample size increases, so it can be used analogously to traditional power, thus aiding interpretation.
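The distinction between power and this modified assurance can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the calculation used in this article: it uses the textbook normal-approximation power of a one-sided Wald test with equal cluster sizes and known variance, fixes the treatment effect at a hypothetical MCID of 0.3, fixes the standard deviation at 1, and places a hypothetical Beta(2, 38) prior on the ICC. The names `power` and `assurance` and all numerical values are illustrative.

```python
import random
from statistics import NormalDist, median

STD_NORMAL = NormalDist()


def power(delta, sigma, icc, clusters_per_arm, cluster_size, alpha=0.05):
    """Normal-approximation power of a one-sided Wald test, equal cluster sizes."""
    deff = 1.0 + (cluster_size - 1.0) * icc  # design effect
    se = (2.0 * sigma**2 * deff / (clusters_per_arm * cluster_size)) ** 0.5
    return STD_NORMAL.cdf(delta / se - STD_NORMAL.inv_cdf(1.0 - alpha))


def assurance(delta, sigma, icc_draws, clusters_per_arm, cluster_size, alpha=0.05):
    """Average the power over prior draws of the ICC, with delta fixed at the MCID."""
    powers = [power(delta, sigma, icc, clusters_per_arm, cluster_size, alpha)
              for icc in icc_draws]
    return sum(powers) / len(powers)


rng = random.Random(2024)
icc_draws = [rng.betavariate(2, 38) for _ in range(10_000)]  # hypothetical ICC prior
icc_median = median(icc_draws)

for m in (10, 20, 40):
    p = power(0.3, 1.0, icc_median, 15, m)
    a = assurance(0.3, 1.0, icc_draws, 15, m)
    print(f"cluster size {m}: power at prior median = {p:.3f}, assurance = {a:.3f}")
```

Power is evaluated at a single ICC value, whereas assurance averages the same quantity over the whole prior. This sketch keeps the standard deviation fixed for brevity; the approach described in this article also places priors on the standard deviation and on the coefficient of variation of cluster size.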
A key consideration when applying a Bayesian design is how to specify suitable prior distributions. In contrast to the study by Sarkodie et al., 16 which assumes independent priors on the ICC and standard deviation, we suggest a joint prior distribution for these parameters, as described in the Methods section. In addition, we account for the fact that many CRTs have unequal cluster sizes by defining a prior distribution on the coefficient of variation of cluster size. This is often overlooked in standard sample size calculations for CRTs.21,22
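As a rough illustration of why unequal cluster sizes matter, the sketch below uses a commonly quoted approximation for the design effect with unequal cluster sizes, 1 + ((cv² + 1) × m̄ − 1) × ICC, where m̄ is the mean cluster size and cv is the coefficient of variation of cluster size. Whether this matches the exact expression used in the Methods is an assumption. The gamma prior on cv (shape 4, scale 0.1), the function name and the other values are hypothetical.

```python
import random


def design_effect_unequal(mean_cluster_size: float, icc: float, cv: float) -> float:
    """Approximate design effect allowing for variation in cluster size."""
    return 1.0 + ((cv**2 + 1.0) * mean_cluster_size - 1.0) * icc


rng = random.Random(7)
# Hypothetical gamma prior on cv: shape 4, scale 0.1, so the prior mean is 0.4.
cv_draws = [rng.gammavariate(4.0, 0.1) for _ in range(10_000)]

deff_equal = design_effect_unequal(20, 0.05, 0.0)  # cv = 0 recovers equal sizes
deff_draws = [design_effect_unequal(20, 0.05, cv) for cv in cv_draws]
deff_mean = sum(deff_draws) / len(deff_draws)

print(f"equal cluster sizes: {deff_equal:.3f}")
print(f"averaged over the cv prior: {deff_mean:.3f}")
```

Because cv enters through its square, every draw from the prior gives a design effect at least as large as the equal-size value, so ignoring cluster size variation can only understate the required sample size under this approximation.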
Our approach is motivated by a parallel-group CRT, Identifying Continence OptioNs after Stroke (ICONS), outlined in the ‘Results’ section. We illustrate the effects of redesigning this trial using the entire ICC prior distribution to inform sample size determination via an assurance calculation, rather than relying on a single point estimate from this distribution as in Tishkovskaya et al. 23 The impacts of varying the ICC prior distributions on the chosen sample size are evaluated. We perform sensitivity analyses on other design parameters in an additional simulation study provided in the Appendix.
Jones et al. 24 summarise the current state of play regarding the use of Bayesian methods in CRTs. In doing so, they highlight the ‘need for further Bayesian methodological development in the design and analysis of CRTs ... in order to increase the accessibility, availability and, ultimately, use of the approach’. This article is, therefore, a timely contribution.
Methods
Analysis for CRTs
Suppose that we are designing a two-arm, parallel-group CRT assuming 1:1 randomisation of clusters and normally distributed outcomes. A common analysis following the trial is to use a linear mixed-effects model. That is, if
where
The ratio of the variability between clusters
The superiority of the treatment is assessed via a hypothesis test of
Choosing a sample size using assurance
The power of the one-sided Wald test for significance level
where
In a standard power calculation, the sample size would be chosen as the smallest value which gives 80% or 90% power, based on values for
Alternatively, we can use assurance to choose the sample size. Whereas the power is conditioned on the chosen estimates for
One disadvantage of the assurance is that it tends to
The assurance in equation (3) assumes that we choose
The advantage of this is that the assurance will now tend to 1 as the sample size increases.
To evaluate the assurance in practice, we sample values of
Specification of priors
To evaluate the assurance, we are required to specify a prior distribution for
Since the coefficient of variation can only take positive values, a gamma distribution
One way to specify a joint prior distribution for
An alternative approach, relevant to our application, is to specify the joint distribution between
where
where
Results
The ICONS post-stroke incontinence CRT
The approach developed in this article is motivated by a planned parallel-group CRT, ‘Identifying Continence OptioNs after Stroke’ (ICONS), which investigates the effectiveness of a systematic voiding programme in secondary care versus usual care on post-stroke urinary incontinence for people admitted to NHS stroke units. 28 The primary outcome is the severity of urinary incontinence at three months post-randomisation, measured using the International Consultation on Incontinence Questionnaire. 29 Although a feasibility trial, ICONS-I, 30 was conducted, the resulting ICC estimate was of low precision and could not be used as a reliable single source to inform the planning of the proposed trial.
ICONS, therefore, considered a Bayesian approach to combine multiple ICC estimates from 16 previous related trials. The opinions of eight experts regarding the relevance of the previous ICC estimates were elicited 31 and used to assign weights to each study and each outcome within a study. The elicited study and outcome weights were combined using mathematical aggregation 32 and incorporated into a Bayesian hierarchical model following the method by Turner et al. 15 The resulting constructed ICC distribution had a posterior median of
For the ICONS CRT, the sample size was chosen to give 80% power with a 5% significance level to detect
Redesigning the ICONS CRT using assurance
We consider assurance as an alternative to power to determine the sample size for the ICONS CRT. This seems like a more natural approach given the uncertainty in the ICC and the extensive elicitation and modelling that was conducted to construct the ICC posterior distribution (which forms the prior distribution for the assurance-based sample size calculation). Moreover, assurance incorporates the full ICC distribution into the sample size calculation, rather than relying on a single point estimate from it as in the power calculation.
We consider the following two forms of assurance.
Assurance based on the ICC prior only
In the first case, we fix

Histogram of 10,000 samples of the ICC,
To obtain an assurance of 80%, the resulting average sample sizes per cluster are
Summary of sample sizes obtained for the ICONS CRT based on power and assurance calculations.
ICONS: Identifying Continence OptioNs after Stroke; CRT: cluster randomised trial.
The left-hand side plot of Figure 2 illustrates the trade-off between cluster size and assurance/power, for

Power and assurance curves for the ICONS CRT (left). The power using the posterior median ICC is red, the power using the median ICC from the 34 ICC estimates is light blue, the assurance with a prior only on the ICC is black and the assurance with a prior on all of the nuisance parameters
We illustrate the effect of changing
Assurance based on the prior for
In the second case, we obtain the sample size required using an assurance calculation, which averages over a prior distribution on
To incorporate the dependence between
Sample
Calculate
The quantile function
The resulting joint prior distribution for

The joint prior distribution between
The resulting average cluster sample sizes for an assurance of 80% are
Table 1 summarises the sample sizes required to attain a target power/assurance of 80% for the various approaches applied to the ICONS trial. ‘Classical approach’ refers to the multiple-estimate method of taking the median of the ICC estimates without taking the relevance of the different studies into account. Relative to the classical approach that is often used in practice, the total sample size required when using the assurance-based method remains the same while incorporating uncertainty on all three parameters. ‘Conservative values’ refers to using conservative values for each of
We include the solutions for a smaller number of clusters,
Sensitivity analysis for the ICC prior
In the above, we consider the ICC prior distribution based on all eight reviewers and all 16 relevant studies. In this section, we investigate the sensitivity of the assurance-based sample size (with priors on
To recognise uncertainty in the individual reviewers’ responses, and in how these responses were pooled, the mathematical aggregation was refitted with two alternative sets of reviewer importance weights: equal weights of 0.125 for all eight reviewers, and differentiated weights obtained using a rank sum approach. 23 For the rank sum approach, we rank the reviewers according to their Cronbach’s alpha scores. In addition, we rerun the Bayesian hierarchical model for only the top 4 (25%), top 8 (50%) and top 12 (75%) most relevant studies. We refer to the five variations of the original ICC prior distribution as: equal weights, differentiated weights, top 4, top 8 and top 12.
The differentiated weights prior (red) and equal weights prior (green) are provided alongside the original prior (black) in the left-hand side plot of Figure 4. The top 4 prior (red), top 8 prior (green) and top 12 prior (blue) are given alongside the original prior (black) in the right-hand side plot of Figure 4. In both plots, the prior medians are given by vertical dashed lines.

Left: The densities of the differentiated weights (red), equal weights (green) and original ICC prior (black). Right: The densities of the top 4 (red), top 8 (green), top 12 (blue) and original ICC prior (black). The prior medians are represented by vertical dashed lines.
We see that the ICC prior remains similar to the original prior whether differentiated weights or equal weights are used, although both alternative weightings assign more probability to the ICC taking larger values. There is a larger change when using the top 4, top 8 or top 12 studies. In each case, the alternative prior is more diffuse than the original prior. Relatively large changes in the prior can cause only small changes in the prior median (e.g. the original prior compared to the top 12 prior). The effects of the alternative ICC priors on the sample sizes are shown in Table 2.
The average sample size per cluster
The power is based on the posterior median of the ICC.
ICONS: Identifying Continence OptioNs after Stroke; CRT: cluster randomised trial; ICC: intra-cluster correlation coefficient.
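The way in which two priors with similar medians can lead to different sample sizes can be reproduced in a small sketch. Everything below is hypothetical and is not the ICONS calculation: the two Beta priors are chosen only because their medians are close to 0.05 while their upper tails differ, the power formula is the textbook normal approximation for equal cluster sizes, and the design values (16 clusters per arm, MCID of 0.25, standard deviation of 1, one-sided 5% level) are illustrative assumptions.

```python
import random
from statistics import NormalDist, median

Z = NormalDist()
Z_ALPHA = Z.inv_cdf(0.95)  # one-sided 5% significance level (assumed)


def power(icc, m, clusters_per_arm=16, delta=0.25, sigma=1.0):
    """Normal-approximation power for cluster size m and a given ICC."""
    deff = 1.0 + (m - 1.0) * icc
    se = (2.0 * sigma**2 * deff / (clusters_per_arm * m)) ** 0.5
    return Z.cdf(delta / se - Z_ALPHA)


def assurance(icc_draws, m):
    """Power averaged over prior draws of the ICC."""
    return sum(power(icc, m) for icc in icc_draws) / len(icc_draws)


def smallest_cluster_size(icc_draws, target=0.8, cap=1000):
    """Bisection search; valid because power increases with m for any ICC below 1."""
    if assurance(icc_draws, cap) < target:
        return None  # target assurance not attainable within the cap
    lo, hi = 1, cap
    while lo < hi:
        mid = (lo + hi) // 2
        if assurance(icc_draws, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


rng = random.Random(11)
tight = [rng.betavariate(20, 380) for _ in range(10_000)]   # concentrated prior
diffuse = [rng.betavariate(1, 14) for _ in range(10_000)]   # heavier upper tail

n_tight = smallest_cluster_size(tight)
n_diffuse = smallest_cluster_size(diffuse)
print(f"prior medians: {median(tight):.3f} vs {median(diffuse):.3f}")
print(f"cluster size for 80% assurance: {n_tight} vs {n_diffuse}")
```

A power calculation based on the prior median would treat these two priors almost identically, whereas the assurance calculation asks for a larger cluster size under the prior with more mass on large ICC values.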
We see smaller changes in sample sizes for
To illustrate this point, compare the original ICC prior (black) to the top 12 prior (blue) in the right-hand side of Figure 4. They have substantially different priors, resulting in large differences in sample sizes required under assurance (600 versus 750 when
In the Appendix, we further evaluate the properties of the hybrid approach compared to power via a simulation study.
Conclusion
A standard sample size calculation requires pre-specification of parameters that are unknown at the design stage of a trial. Unique to sample size calculations for typical CRTs is the ICC, which requires robust estimation to avoid over- or under-powering the trial. Assuming an unnecessarily high ICC value, for example, leads to an inefficient trial, increasing the number of clusters and/or participants and overall trial costs. In practice, parameter uncertainty is typically not considered, which can be problematic given the sensitivity of the sample size to small differences in the ICC.
This article proposes an alternative approach to sample size determination for CRTs using the Bayesian concept of assurance to incorporate parameter uncertainty into the design. The advantage of this approach is that it yields designs that provide adequate power across the likely range of parameter values and is, therefore, more robust to parameter misspecification. This is particularly important when there is difficulty obtaining a reliable ICC estimate, as in the ICONS post-stroke incontinence CRT used to motivate this work. Another approach in this context is to perform an interim analysis for sample size re-estimation. The approach proposed in this article could be used in combination with sample size re-estimation to provide further robustness to the design of CRTs.
We assign prior distributions to the ICC, overall standard deviation and coefficient of variation of the cluster size, while setting the treatment effect equal to the MCID in line with standard practice. We consider a joint prior for the ICC and standard deviation to model the dependency between these parameters. In the motivating case study, we use the entire ICC prior distribution elicited from expert opinion and data from previous studies to inform the sample size. Further work could consider using a commensurate prior to synthesise multiple sources of pre-trial information on the ICC, as in the literature. 33
Sensitivity analyses of the assurance-based sample size to different ICC priors showed that different behaviour of the prior, particularly in the upper tail, can have quite a strong effect on the required sample size. Using a point estimate from this prior, for example the median, can miss this overall behaviour and result in sample sizes which are systematically too small, based on current knowledge about the ICC. Additional sensitivity analyses conducted on the overall standard deviation showed that the greater the uncertainty expressed in the prior, the more robust the assurance-based sample size is (see Appendix).
Uncertainty in the treatment effect can also be incorporated into the assurance calculation in a similar way. This may be appropriate for non-inferiority trials, for example, where the non-inferiority margin is fixed in advance and the treatment difference can be considered a nuisance parameter.
In line with regulatory requirements, we have maintained a frequentist analysis to present a hybrid framework. Further work could consider a fully Bayesian approach by using assurance when the success criterion is based on the posterior distribution of the treatment effect. 13
The hybrid approach presented in this article can be applied to avoid incorrectly powered studies resulting from ill-estimated model parameters, to mitigate the impact of uncertainty in the ICC and other nuisance parameters, and to incorporate expert opinion or historical data when designing a CRT. The approach proposed has been outlined in the case that the number of clusters is fixed and we aim to determine the total sample size for the trial. The approach would also allow the reverse process – to calculate the necessary number of clusters given a fixed total sample size.
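The reverse calculation can be sketched in the same spirit. The code below is a minimal illustration under stated assumptions rather than the implementation accompanying this article: it fixes a hypothetical total sample size of 800, lets the average cluster size be implied by the number of clusters per arm, uses the textbook normal-approximation power for equal cluster sizes, and averages over a hypothetical Beta(2, 38) prior on the ICC. All names and values are illustrative.

```python
import random
from statistics import NormalDist

Z = NormalDist()
Z_ALPHA = Z.inv_cdf(0.95)  # one-sided 5% significance level (assumed)


def assurance_for_clusters(clusters_per_arm, total_n, icc_draws, delta=0.25, sigma=1.0):
    """Assurance when total_n participants are split over 2 * clusters_per_arm clusters."""
    m = total_n / (2.0 * clusters_per_arm)  # average cluster size implied by total_n
    total = 0.0
    for icc in icc_draws:
        deff = 1.0 + (m - 1.0) * icc
        se = (2.0 * sigma**2 * deff / (clusters_per_arm * m)) ** 0.5
        total += Z.cdf(delta / se - Z_ALPHA)
    return total / len(icc_draws)


def fewest_clusters(total_n, icc_draws, target=0.8):
    """Bisection on clusters per arm; more, smaller clusters never reduce power here."""
    lo, hi = 1, total_n // 2  # hi corresponds to clusters of size one
    if assurance_for_clusters(hi, total_n, icc_draws) < target:
        return None  # target not attainable with this total sample size
    while lo < hi:
        mid = (lo + hi) // 2
        if assurance_for_clusters(mid, total_n, icc_draws) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


rng = random.Random(3)
icc_draws = [rng.betavariate(2, 38) for _ in range(5_000)]  # hypothetical ICC prior
clusters = fewest_clusters(800, icc_draws)
print(f"clusters per arm needed for 80% assurance with 800 participants: {clusters}")
```

The search relies on the fact that, for a fixed total sample size, spreading participants over more clusters lowers the design effect, so the assurance is non-decreasing in the number of clusters.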
Supplemental Material
Supplemental material for Hybrid sample size calculations for cluster randomised trials using assurance, by S. Faye Williamson, Svetlana V Tishkovskaya and Kevin J Wilson, in Clinical Trials:
sj-pdf-1-ctj-10.1177_17407745241312635
sj-r-2-ctj-10.1177_17407745241312635
sj-r-3-ctj-10.1177_17407745241312635
sj-r-4-ctj-10.1177_17407745241312635
sj-txt-5-ctj-10.1177_17407745241312635
sj-txt-6-ctj-10.1177_17407745241312635
sj-txt-7-ctj-10.1177_17407745241312635
sj-txt-8-ctj-10.1177_17407745241312635
sj-txt-9-ctj-10.1177_17407745241312635
sj-txt-10-ctj-10.1177_17407745241312635
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research presented in this paper used data from ICONS-I trial funded by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research scheme (grant number: RP-PG-0707-10059).
Supplemental material
Supplemental material for this article is available online.
