
Abstract

Background:

Cluster randomised crossover designs (CRXOs) are a powerful type of longitudinal cluster randomised trial in which all participating clusters switch between two treatment conditions. “Multiple-period” CRXOs divide the trial duration into a number of periods of equal length and can allow for multiple crossovers between treatment conditions. It is often assumed that increasing the number of crossovers leads to an increase in statistical power. We investigate whether this is true for standard correlation structures and continuous outcomes, comparing CRXO designs with equal numbers of clusters, participants, and periods but differing numbers of crossovers.

Methods:

We consider the formula for the variance of the treatment effect estimator for multiple-period CRXOs under exchangeable and block-exchangeable within-cluster correlation structures, assuming equal cluster-period sizes, for patterns of treatment conditions in the treatment sequences that vary in the number of crossovers. We also conduct a simulation study to compare the statistical power of multiple-period CRXO designs with different numbers of crossovers that share the same number and duration of periods and the same number of participants in each cluster period.

Results:

Under exchangeable and block-exchangeable correlation structures and equal cluster-period sizes, the number of crossovers in the treatment sequences does not impact study power, provided the design is balanced in terms of the number of periods and clusters implementing each condition.

Conclusions:

When an exchangeable or block-exchangeable within-cluster correlation structure and a time-invariant effect of treatment are assumed, power calculations for CRXO designs are invariant to the specific ordering of treatment conditions. In particular, a CRXO design with additional crossovers does not lead to increased statistical power compared to a CRXO design with just one crossover for these within-cluster correlation structures. Further work is required to investigate the utility of multiple crossovers in situations where the treatment effect varies over time.

Keywords

Additional crossovers; cluster randomised crossover trial; design comparison; multiple-period design; statistical power

Background

Cluster randomised crossover designs (CRXOs) are a type of longitudinal cluster randomised trial in which clusters alternate between two treatment conditions over time; each alternation is called a “crossover”.¹ These designs are suitable for interventions that can be removed once implemented. When suitable, CRXOs are efficient design choices: unlike standard parallel cluster randomised trials, CRXOs enable within-cluster comparisons, permitting more precise treatment effect estimation. CRXOs thus generally require far fewer clusters than parallel cluster randomised trials,² potentially making infeasible trials feasible.³

While there are many variants and generalisations of CRXOs, such as those that include more than two sequences or interventions,⁴ we consider two-sequence CRXOs. The most widely used variant is the two-period CRXO, where half of the clusters are randomly selected to commence with condition 1 before switching to condition 2, while the other half start with condition 2 before switching to condition 1;⁵ an example is in Figure 1 (Design 0). Multiple-period CRXOs extend the two-period design to more periods, allowing each cluster to switch between treatment conditions multiple times. Here, we consider balanced designs consisting of a pair of dual sequences with an equal number of clusters assigned to each sequence. Examples of multiple-period CRXOs with two treatment sequences are shown in Figure 1, indicating that there may be one or multiple crossovers between treatment conditions in each sequence of a multiple-period CRXO.

Figure 1.

Cluster randomised crossover (CRXO) design schematics with varying numbers of periods (rows), with one crossover (left column), two crossovers (middle column) and $T - 1$ crossovers (right column).

Several recent trials have adopted multiple-period CRXO designs involving multiple crossovers. For example, the B-Free trial evaluated whether a restrictive benzodiazepine usage policy during cardiac surgery reduced the incidence of post-surgical delirium.⁶ Twenty hospitals were randomised to sequences that switched between the new restrictive and standard policies every 4 weeks, across 18 periods. The CALM-ICU trial was conducted over four 3-month periods in intensive care units, where clusters switched between two different antipsychotic drugs for treating hyperactive delirium among critically ill patients.⁷ The Aqueous-PREP trial similarly used a multiple-period CRXO design, with 14 hospitals alternating between two preoperative antiseptic solutions every 2 months.⁸

There may therefore be a perception that, in multiple-period CRXOs, statistical power increases with the number of times clusters switch between treatment conditions. The B-Free trial investigators noted: “We incorporated multiple crossovers to gain statistical power,”⁶ and similarly, in the Aqueous-PREP trial, the authors stated “performing multiple treatment crossovers was expected to create a marginal increase in statistical power.”⁸ Here, we investigate whether increasing the number of crossovers in a multiple-period CRXO leads to gains in statistical power under standard modelling assumptions.

Here, we consider balanced designs consisting of a pair of dual sequences with an equal number of clusters assigned to each sequence and an equal number of participants included per cluster period. We consider a cross-sectional sampling structure where each individual provides a single measurement throughout the trial; cohort sampling structures are not considered. We aim to compare the statistical power of the designs in Figure 1 that are equal in total duration and number of periods but differ in the number of crossovers, for example, Designs 1A versus 1B versus 1C, 2A versus 2B versus 2C, and 3A versus 3B versus 3C.

Methods

Statistical framework for CRXOs

We consider a linear mixed model for $Y_{kti}$, the continuous outcome of the $i$th ($i = 1, \dots, m$) participant from the $k$th ($k = 1, \dots, K$) cluster during time period $t$ ($t = 1, \dots, T$):

$$Y_{kti} = \mu + \beta_{t} + X_{kt}\theta + C_{k} + G_{kt} + \varepsilon_{kti} \tag{1}$$

where $μ$ represents the overall mean outcome, $θ$ is the treatment effect, $X_{kt}$ is the treatment indicator for cluster $k$ in period $t$ (1 for intervention, 0 for control), $β_{t}$ denotes the fixed effect for time period $t$ (with $β_{1} = 0$ for identifiability), $C_{k} \sim N(0, σ_{c}^{2})$ is the cluster-level random effect, $G_{kt} \sim N(0, σ_{g}^{2})$ is the cluster-period random effect, and $ε_{kti} \sim N(0, σ_{e}^{2})$ is the participant-specific error term. As is standard when calculating study power, we consider the generalised least squares estimator $\hat{θ}$ of $θ$.⁹ For a multiple-period CRXO, $var(\hat{θ})$ is given by:¹⁰

$$\mathrm{var}(\hat{\theta}) = \frac{4(\sigma_{c}^{2} + \sigma_{g}^{2} + \sigma_{e}^{2})}{mTK}\left[1 + (m - 1)\rho - m\rho\,\mathrm{CAC}\right] \tag{2}$$

where $ρ = \frac{σ_{c}^{2} + σ_{g}^{2}}{σ_{c}^{2} + σ_{g}^{2} + σ_{e}^{2}}$ is the correlation between any pair of outcomes from the same cluster in the same period, known as the within-period intra-cluster correlation (ICC), and the cluster autocorrelation, $CAC = \frac{σ_{c}^{2}}{σ_{c}^{2} + σ_{g}^{2} + σ_{e}^{2}} \Big/ \frac{σ_{c}^{2} + σ_{g}^{2}}{σ_{c}^{2} + σ_{g}^{2} + σ_{e}^{2}} = \frac{σ_{c}^{2}}{σ_{c}^{2} + σ_{g}^{2}}$, is the ratio of two correlations: the numerator is the correlation between outcomes of participants in the same cluster but different periods, and the denominator is the within-period ICC. $CAC = 1$ (i.e. $σ_{g}^{2} = 0$) indicates that all pairs of outcomes from the same cluster are equally correlated, no matter how far apart in time they were measured; this is referred to as the “exchangeable” within-cluster correlation structure. $CAC < 1$ indicates that outcomes from participants in the same cluster but different periods are less correlated than outcomes from participants in the same cluster and period; this is referred to as the “block-exchangeable” or “nested-exchangeable” within-cluster correlation structure.⁹ Equation (2) is a reduced form of the variance expression that applies only to Model (1) (i.e. to exchangeable or block-exchangeable within-cluster correlation structures). More complex within-cluster correlation structures, for example, discrete-time or continuous-time decay, where the correlation depends on the time interval between measurements, yield more complex variance expressions.¹¹⁻¹³
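Substituting the definitions of $ρ$ and $CAC$ into the bracket of Equation (2) makes explicit which variance components drive precision. The short derivation below is an algebraic check that uses only Model (1) and the definitions above, namely $(σ_c^2 + σ_g^2 + σ_e^2)\,ρ = σ_c^2 + σ_g^2$ and $(σ_c^2 + σ_g^2 + σ_e^2)\,ρ\,CAC = σ_c^2$:

```latex
\begin{aligned}
(\sigma_c^2 + \sigma_g^2 + \sigma_e^2)\left[1 + (m-1)\rho - m\rho\,\mathrm{CAC}\right]
  &= (\sigma_c^2 + \sigma_g^2 + \sigma_e^2) + (m-1)(\sigma_c^2 + \sigma_g^2) - m\sigma_c^2 \\
  &= \sigma_e^2 + m\sigma_g^2, \\
\mathrm{var}(\hat{\theta})
  &= \frac{4(\sigma_e^2 + m\sigma_g^2)}{mTK}
   = \frac{4}{TK}\left(\sigma_g^2 + \frac{\sigma_e^2}{m}\right).
\end{aligned}
```

With $σ_g^2 = 0$ ($CAC = 1$) this reduces to $4σ_e^2/(mTK)$. In either case, the only design quantities that appear are $m$, $T$ and $K$.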

The power of a design to detect an effect size $δ$ at two-sided significance level $α$ is calculated using the standard formula:

$$\mathrm{Power} = \Phi\left(-z_{1 - \frac{\alpha}{2}} + \frac{|\delta|}{\sqrt{\mathrm{var}(\hat{\theta})}}\right) \tag{3}$$

where $Φ$ is the cumulative distribution function of the standard normal distribution, and $z_{1 - \frac{α}{2}}$ is the corresponding critical value, satisfying $P(Z > z_{1 - \frac{α}{2}}) = \frac{α}{2}$ for a standard normal random variable $Z$.
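As a rough illustration of Equations (2) and (3), the Python sketch below (standard library only) evaluates the variance and power for given design parameters. It is not the Shiny CRT Calculator used in this paper; the function names are hypothetical, $K$ is taken to be the total number of clusters (as in Model (1)), and the total variance $σ_c^2 + σ_g^2 + σ_e^2$ defaults to 1 so that $δ$ is on a standardised scale.

```python
from statistics import NormalDist


def var_theta_eq2(T, K, m, icc, cac, total_var=1.0):
    """Equation (2): variance of the GLS treatment effect estimator.

    Hypothetical helper. K is the TOTAL number of clusters (assumption),
    total_var is sigma_c^2 + sigma_g^2 + sigma_e^2.
    """
    return 4.0 * total_var / (m * T * K) * (1 + (m - 1) * icc - m * icc * cac)


def power_eq3(delta, var_theta, alpha=0.05):
    """Equation (3): normal-approximation power for a two-sided test."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    return z.cdf(-z_crit + abs(delta) / var_theta ** 0.5)


# Illustrative inputs only: 4 periods, 40 clusters in total, 20 participants per cluster period
v = var_theta_eq2(T=4, K=40, m=20, icc=0.05, cac=0.8)
print(v, power_eq3(delta=0.1, var_theta=v))
```

The inputs are only $T$, $K$, $m$, the ICC and the CAC; no argument describes the order of conditions within the sequences.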

Results

Impact of the number of crossovers on statistical power

Equation (2) for $var (\hat{θ})$ does not depend on the specific ordering of control and intervention conditions within treatment sequences. Under exchangeable or block-exchangeable within-cluster correlation structures, this expression holds whenever each period contains equal numbers of clusters assigned to the control and intervention conditions and each sequence includes equal numbers of control and intervention periods. In these balanced cases, the treatment effect estimator is the difference between the condition means within each period, averaged over periods, regardless of the pattern of conditions in each sequence (a special case of Equation A1 in Matthews and Forbes¹⁴). Hence, the exact pattern of control and intervention conditions does not appear in the variance expression, and increasing the number of crossovers while keeping the number of periods ($T$) and cluster-period size ($m$) constant will not increase study power.
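The invariance can also be checked numerically. The sketch below, assuming NumPy is available, computes the GLS variance of $\hat{θ}$ directly from the design matrices of three four-period dual-sequence designs with one, two and three crossovers, and compares each with Equation (2). It works with cluster-period means, whose within-cluster covariance under Model (1) is $σ_c^2 J + (σ_g^2 + σ_e^2/m) I$ (a standard reduction when cluster-period sizes are equal). The sequence patterns, variance components and function names are illustrative assumptions and are not necessarily those used for Figure 1.

```python
import numpy as np


def gls_var_theta(sequences, clusters_per_seq, m, sigma_c2, sigma_g2, sigma_e2):
    """GLS variance of theta-hat for Model (1), computed from cluster-period means."""
    T = len(sequences[0])
    # Block-exchangeable covariance of one cluster's T cluster-period means
    V = (sigma_g2 + sigma_e2 / m) * np.eye(T) + sigma_c2 * np.ones((T, T))
    V_inv = np.linalg.inv(V)
    info = np.zeros((T + 1, T + 1))
    for seq in sequences:
        # Columns: one indicator per period (same span as mu + beta_t), then X_kt
        Z = np.column_stack([np.eye(T), np.asarray(seq, dtype=float)])
        info += clusters_per_seq * (Z.T @ V_inv @ Z)
    # Last diagonal element of the inverse information is var(theta-hat)
    return np.linalg.inv(info)[-1, -1]


def eq2_var_theta(T, K, m, sigma_c2, sigma_g2, sigma_e2):
    """Equation (2), with ICC and CAC derived from the variance components."""
    total = sigma_c2 + sigma_g2 + sigma_e2
    icc = (sigma_c2 + sigma_g2) / total
    cac = sigma_c2 / (sigma_c2 + sigma_g2)
    return 4 * total / (m * T * K) * (1 + (m - 1) * icc - m * icc * cac)


# Assumed four-period dual-sequence patterns (0 = control, 1 = intervention)
designs = {
    "one crossover": [[0, 0, 1, 1], [1, 1, 0, 0]],
    "two crossovers": [[0, 1, 1, 0], [1, 0, 0, 1]],
    "three crossovers": [[0, 1, 0, 1], [1, 0, 1, 0]],
}
# Illustrative variance components: ICC = 0.05, CAC = 0.8
params = dict(m=20, sigma_c2=0.04, sigma_g2=0.01, sigma_e2=0.95)
for name, seqs in designs.items():
    print(name, gls_var_theta(seqs, clusters_per_seq=10, **params))
print("Equation (2)", eq2_var_theta(T=4, K=20, **params))
```

All three designs give the same variance as Equation (2) up to floating-point error. A pair of sequences that breaks the balance conditions stated above falls outside the scope of Equation (2).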

This is most obvious when $CAC = 1$ , where Equation (2) reduces to $var (\hat{θ}) = \frac{4 (σ_{c}^{2} + σ_{e}^{2})}{mTK} [1 - ρ]$ . This variance expression, and thus, the power in Equation (3), depends on the design only through the total sample size of each cluster, $mT$ , and the number of clusters; it does not matter how many crossovers between control and intervention conditions there are. When $CAC < 1$ , that is, a block-exchangeable within-cluster correlation structure, the equivalence for different study designs is less obvious due to the presence of the cluster-period size $m$ without $T$ in $[1 + (m - 1) ρ - m ρ CAC]$ ; but once again, the exact pattern of intervention and control conditions in each sequence does not appear.

We emphasise that we should not make comparisons between designs with the same total number of participants in each cluster but differing numbers of study periods; that is, while we can compare Design 1A with 1B, 1A with 1C or 1B with 1C (all with four periods), we cannot compare these designs with Designs 2A, 2B, 2C (six periods) or 3A, 3B, 3C (eight periods). The reason is that the ICC ( $ρ$ ) and the CAC are dependent on the model specified (including the fixed effects) and the definition of “study period”; even when the CAC = 1, the ICC depends on the formulation of the model.¹⁵ If we want to make a valid comparison between designs where the correlation between participants depends on the definition of the study period, we must ensure time is divided up into the same number of periods in the compared designs so that the periods are of equal duration.

Hence, when exchangeable or block-exchangeable correlation structures are assumed, the power of a CRXO does not depend on the number of crossovers in the treatment sequences: for a fixed number of periods, including multiple crossovers offers no statistical advantage.

Simulation study

We conducted a simulation study to illustrate and empirically validate the theoretical findings. We simulated data from Equation (1) for CRXO designs with the same number of clusters, periods, and participants per cluster period but differing in the number of crossovers: comparing Designs 1A to 1C, 2A to 2C, and 3A to 3C. Table 1 summarises the parameter settings. For each combination of parameters, theoretical power was calculated using the Shiny CRT calculator.¹⁰

Table 1.

Simulation settings.

Parameter	Meaning	Values
$T$	Number of periods in the multiple-period design	4, 6, 8
$K$	Number of clusters assigned to each sequence	20
$m$	Total number of participants in each cluster period	20
$ICC$	Within-period intra-cluster correlation	0.01, 0.05, 0.1
$CAC$	Cluster autocorrelation	0.5, 0.8, 0.95, 1
$δ$	Effect size	0.1

For each of the 36 parameter combinations, 2000 datasets were simulated for the CRXO design with $T$ periods and one crossover, and 2000 for the design with $T$ periods and $T - 1$ crossovers.

Each simulated dataset was analysed by fitting the linear mixed-effects model with random effects for cluster and cluster period in Equation (1), with parameters estimated via restricted maximum likelihood. When CAC = 1, only the cluster-level random effect was included in the model. Since the objective of the analysis was to compare the empirical power between each pair of designs, we calculated the proportion of simulated datasets in which $H_{0} : θ = 0$ was rejected at the two-sided 5% significance level for each parameter setting. Code to replicate this simulation study is available at: https://github.com/KMTanvir/RedundantCrossovers.
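A lighter Monte Carlo sketch of the same comparison is given below. It is not the authors' simulation code, which fits the linear mixed model by restricted maximum likelihood. Instead, it simulates cluster-period means directly from Model (1), which have the same distribution as averages of $m$ individual outcomes. It then applies the balanced-design estimator described in the Results: the within-period difference in condition means, averaged over periods. The period effects, $μ = 0$, the variance components and the function name are hypothetical choices for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def simulate_estimates(seqs, clusters_per_seq, m, theta, sigma_c2, sigma_g2,
                       sigma_e2, n_sims, rng):
    """Simulate cluster-period means from Model (1); return theta estimates."""
    X = np.repeat(np.asarray(seqs, dtype=float), clusters_per_seq, axis=0)  # K x T
    K, T = X.shape
    beta = np.linspace(0.0, 0.3, T)  # hypothetical period effects, beta_1 = 0
    C = rng.normal(0.0, np.sqrt(sigma_c2), size=(n_sims, K, 1))  # cluster effects
    G = rng.normal(0.0, np.sqrt(sigma_g2), size=(n_sims, K, T))  # cluster-period effects
    e_bar = rng.normal(0.0, np.sqrt(sigma_e2 / m), size=(n_sims, K, T))  # mean of m errors
    y_bar = beta + X * theta + C + G + e_bar  # mu = 0 without loss of generality
    # Difference in condition means within each period, averaged over periods
    treated = (y_bar * X).sum(axis=1) / X.sum(axis=0)
    control = (y_bar * (1 - X)).sum(axis=1) / (1 - X).sum(axis=0)
    return (treated - control).mean(axis=1)


rng = np.random.default_rng(2024)
params = dict(clusters_per_seq=10, m=20, theta=0.1, sigma_c2=0.04,
              sigma_g2=0.01, sigma_e2=0.95, n_sims=20000, rng=rng)
one = simulate_estimates([[0, 0, 1, 1], [1, 1, 0, 0]], **params)   # one crossover
many = simulate_estimates([[0, 1, 0, 1], [1, 0, 1, 0]], **params)  # T - 1 crossovers
# Equation (2) with T = 4, K = 20 clusters in total, m = 20, ICC = 0.05, CAC = 0.8
eq2 = 4 * 1.0 / (20 * 4 * 20) * (1 + 19 * 0.05 - 20 * 0.05 * 0.8)
print(one.var(ddof=1), many.var(ddof=1), eq2)
```

Both empirical variances should agree with Equation (2) up to Monte Carlo error, mirroring the pattern in Figure 2.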

Figure 2 displays the empirical and theoretical power of the designs across all parameter settings, with separate panels for designs with different numbers of periods. In all scenarios, the empirical power for each pair of designs is identical up to Monte Carlo error and aligns with theoretical power. This supports our theoretical result: power is not influenced by the number of crossovers when the correlation structure is either block-exchangeable or exchangeable. As expected, the figure shows that power decreases with increasing ICC when the CAC is low (e.g. 0.5) but remains relatively stable as the CAC approaches 1. Power increases as the total number of participants in each cluster increases, regardless of the values of ICC and CAC. Similar results for smaller sample sizes ($K = 15$, $m = 15$ and $K = 10$, $m = 10$) are provided in Supplementary Figures 1 and 2.

Figure 2.

Empirical power for CRXO designs with a single crossover (solid line) and CRXO designs with multiple crossovers (dashed line), for designs with four periods (top panel), six periods (middle panel), and eight periods (bottom panel), across all combinations of intra-cluster correlation (ICC) and cluster autocorrelation (CAC). Theoretical power (dotted line) is included for comparison.

Conclusion

In this paper, we have demonstrated, theoretically and via simulation, that when an exchangeable or block-exchangeable within-cluster correlation structure is assumed when planning a cluster randomised crossover trial, all designs with a fixed number of periods of the same duration have the same precision, regardless of the number of crossovers in the treatment sequences. In particular, designs with the maximum number of crossovers confer no additional precision benefit compared to those with the same number of periods and just one crossover. Furthermore, as pointed out by Hemming et al.,¹ including multiple crossovers can reduce the practicality of a design by requiring clusters to stop and start interventions multiple times, and when washout periods are required to transition between interventions, it may not be desirable to include a large number of crossovers.

Researchers with access to a dataset to estimate ICCs and CACs when planning a trial may question how to divide time up into periods in this dataset: how many periods should they choose? In general, the length of periods used to estimate ICCs and CACs should match those assumed in the planned design. In many contexts, we would expect that the ICC and CAC will increase as the number of periods increases (due to the shorter duration of time of each period), but further investigation is required.

The results in this paper hold under two commonly used assumptions: (1) the within-cluster correlation structure is exchangeable or block-exchangeable and (2) treatment effects are immediate and constant across time. However, numerous crossovers may be beneficial in other circumstances. First, if a discrete-time or continuous-time decay within-cluster correlation structure is more appropriate, then there will be gains in study power through the inclusion of multiple crossovers.¹¹,¹⁶ Second, when the treatment effect is time-varying, a CRXO design with intervention periods of limited duration may provide more power to detect “immediate” or “time-limited” treatment effects; further work is required to investigate this.¹⁷,¹⁸ If the objective is to estimate carryover effects or period-by-intervention interactions, or there is a need to reduce predictability of treatment assignment, multiple crossovers may be justified. We have also assumed that cluster periods are of equal size; the impact of deviations from this assumption requires further work. In addition, when a cohort sampling structure is considered, additional crossovers may increase power, but we expect this will depend on the within-individual correlation structure. Moreover, extending the number or duration of periods could lead to increased power. Increases in power under these alternative scenarios could be explored in future work, and where gains are possible, trialists would need to weigh those benefits against feasibility constraints.

Supplemental Material

sj-pdf-1-ctj-10.1177_17407745261431140 – Supplemental material for Additional crossovers in cluster randomised crossover trials do not always increase statistical power

Supplemental material, sj-pdf-1-ctj-10.1177_17407745261431140 for Additional crossovers in cluster randomised crossover trials do not always increase statistical power by KM Tanvir, Andrew B Forbes, Kelsey L Grantham and Jessica Kasza in Clinical Trials

Footnotes

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: J.K. is supported by an NHMRC Investigator (Leadership 1) Grant (GNT 2033380).



References

1. Hemming, Taljaard, Weijer, et al. Use of multiple period, cluster randomised, crossover trial designs for comparative effectiveness research. BMJ 2020; 371: m3800.

2. Arnup, McKenzie, Hemming, et al. Understanding the cluster randomised crossover design: a graphical illustration of the components of variation and a sample size tutorial. Trials 2017; 18: 381.

3. Bellomo, Forbes, Akram, et al. Why we must cluster and cross over. Crit Care Resusc 2013; 15(3): 155–157.

4. Jones, Kenward MG. Design and analysis of cross-over trials. Boca Raton, FL: Chapman and Hall/CRC, 2003.

5. Arnup, Forbes, Kahan, et al. Appropriate statistical methods were infrequently used in cluster-randomized crossover trials. J Clin Epidemiol 2016; 74: 40–50.

6. Spence, Belley-Côté, Jacobsohn, et al. Benzodiazepine-free cardiac anesthesia for reduction of postoperative delirium (B-Free): a protocol for a multi-centre randomized cluster crossover trial. CJC Open 2023; 5(9): 691–699.

7. Ankravs, Udy, Bellomo, et al. Olanzapine versus quetiapine in critically ill patients with hyperactive delirium: protocol for a multicentre, cluster-randomised, double-crossover, pragmatic clinical trial (CALM-ICU). Crit Care Resusc 2024; 26(4): 249–254.

8. Slobogean, Sprague, Wells, et al. Aqueous skin antisepsis before surgical fixation of open fractures (Aqueous-PREP): a multiple-period, cluster-randomised, crossover trial. Lancet 2022; 400: 1334–1344.

9. Hughes, Hemming, et al. Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: an overview. Stat Methods Med Res 2021; 30(2): 612–639.

10. Hemming, Kasza, Hooper, et al. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. Int J Epidemiol 2020; 49: 979–995.

11. Grantham, Kasza, Heritier, et al. How many times should a cluster randomized crossover trial cross over? Stat Med 2019; 38: 5021–5033.

12. Kasza, Hemming, Hooper, et al. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Stat Methods Med Res 2019; 28(3): 703–716.

13. Grantham, Kasza, Heritier, et al. Accounting for a decaying correlation structure in cluster randomized trials with continuous recruitment. Stat Med 2019; 38: 1918–1934.

14. Matthews JNS, Forbes. Stepped wedge designs: insights from a design of experiments perspective. Stat Med 2017; 36: 3772–3790.

15. Kasza, Bowden, Ouyang, et al. Does it decay? Obtaining decaying correlation parameter values from previously analysed cluster randomised trials. Stat Methods Med Res 2023; 32(11): 2123–2134.

16. Moerbeek. Optimal design of cluster randomized crossover trials with a continuous outcome: optimal number of time periods and treatment switches under a fixed number of clusters or fixed budget. Behav Res Methods 2024; 56: 8820–8830.

17. Kenny, Voldal, Xia, et al. Analysis of stepped wedge cluster randomized trials in the presence of a time-varying treatment effect. Stat Med 2022; 41: 4311–4339.

18. Kenny, Voldal, Xia, et al. Factors affecting power in stepped wedge trials when the treatment effect varies with time. Trials. Epub ahead of print 20 February 2026. DOI: 10.1186/s13063-026-09558-x.