Abstract
Background/Aims:
The standard approach to designing stepped wedge trials that recruit participants in a continuous stream is to divide time into periods of equal length. But the choice of design in such cases is infinitely more flexible: each cluster could cross from the control to the intervention at any point on the continuous time-scale. We consider the case of a stepped wedge design with clusters randomised to just three sequences (designs with small numbers of sequences may be preferred for their simplicity and practicality) and investigate the choice of design that minimises the variance of the treatment effect estimator under different assumptions about the intra-cluster correlation.
Methods:
We make some simplifying assumptions in order to calculate the variance: in particular that we recruit the same number of participants from each cluster, and that participants are recruited at regularly spaced times.
Results:
There is a two-dimensional space of possible three-sequence, centrosymmetric stepped wedge designs with continuous recruitment. The variance of the treatment effect estimator for given
Conclusions:
In many different settings, a relatively simple design can be found (e.g. one based on simple fractions) that offers close-to-optimal efficiency in that setting. There may also be designs that are robustly efficient over a wide range of settings. Contour maps of the kind we illustrate can help guide this choice. If efficiency is offered as one of the justifications for using a stepped wedge design, then it is worth designing with optimal efficiency in mind.
Background/aims
Stepped wedge trials are longitudinal cluster randomised trials where clusters are randomised, not to treatment conditions, but to sequences which dictate when each cluster will cross over uni-directionally from the control condition to the intervention condition.1,2 Since the seminal discussion of stepped wedge trials by Hussey and Hughes, 3 methodological work has tended to treat prospective time as a series of discrete periods, and we now understand a great deal about the optimal design of this kind of stepped wedge trial.4–9
But in many stepped wedge trials, including the first to be published with this label, 10 participants from a cluster are recruited/identified, exposed and assessed in one, long, uninterrupted stream. This is known as a continuous recruitment design.11,12 The standard approach to the design of these stepped wedge trials is to divide time into periods of equal length, with cross-overs at the boundaries between periods. With time on a continuous scale, however, there are no canonical cross-over times, and the number of sequences is limited only by the total number of clusters: each cluster could cross from the control to the intervention at any point on the continuous time-scale.
While this means that designs in continuous time can become quite complicated,13,14 there may be practical advantages, from the point of view of trial conduct, in choosing a design that has some parsimony, symmetry, or other simplicity of form. In this article, we are motivated in particular by an interest in designing continuous recruitment stepped wedge trials with small numbers of randomised sequences. We keep this investigation simple by concentrating on designs with just three sequences. (Previous work has considered the case of a two-sequence design where one sequence remains in the control condition throughout.) 15 Aside from their simplicity, designs with small numbers of sequences also make it easier to balance the randomisation according to cluster characteristics.
The problem of optimal design in this case is quite different to the problem of designing an optimal stepped wedge trial with discrete time periods and an equal number of participants in each period.4–9 In the continuous recruitment design problem, we can effectively move the cross-over time in each sequence with a slider control: by moving it to the right on the time axis we steadily increase the number of control participants in each cluster in that sequence, but at the cost of steadily decreasing the number of intervention participants by a corresponding amount. A continuous time-scale also allows us to model an intra-cluster correlation (ICC) that varies smoothly as a function of separation in time: the further apart we recruit two participants from the same cluster, the weaker we might expect the correlation between their outcomes to be. 12
Three-sequence designs are relatively simple to characterise, particularly if we focus attention on designs that have centrosymmetry. A centrosymmetric design has the property that if we run time backwards, and swap intervention and control, then we arrive at the same design.16,17 Given the number of clusters, duration of the recruitment period, and rate of recruitment at each cluster, the things that we can control in a centrosymmetric, three-sequence stepped wedge trial are (a) the proportion of clusters allocated to the middle sequence, w, and (b) the timing of the first cross-over, s.

Schematic for a centrosymmetric, three-sequence stepped wedge trial with continuous recruitment. Centrosymmetry is the property that if we run time backwards, and swap intervention and control, then we arrive at the same design.
Here, we investigate the choice of design that minimises the variance of the intervention effect estimator under different assumptions about the correlation of outcomes within the same cluster and consider whether there might be simple design choices that are robustly efficient in this sense, even when there is a degree of uncertainty about these unknown correlation parameters. The lower the variance of the intervention effect estimator, the higher the power to detect a given intervention effect at a certain significance level.
We implicitly assume a large number of clusters are available to be randomised. We use generalised least squares methods and asymptotic approximations to calculate the variance of the intervention effect estimator, and we assume, whatever the number of clusters, that we can allocate them to different sequences in whatever allocation ratios we happen to be discussing. A surprisingly high proportion of published stepped wedge trials are conducted with fewer than 10 clusters (33% according to one review). 18 Designing stepped wedge trials with very small numbers of clusters goes against most methodological guidance,19,20 but for practical reasons, we can expect many stepped wedge trials to include only moderate numbers of clusters. We include simulations for one such trial design scenario to illustrate how empirical statistical power matches the power derived from our large-sample formula in this case.
Motivating example
PATHWEIGH is a weight loss intervention for use in a primary care setting that uses tools built into the electronic medical record to improve workflow and address various barriers to prioritising weight management. 21 Suresh and colleagues published a protocol for a stepped wedge trial of PATHWEIGH where the clusters are 57 family and internal medicine clinics in a large health system in Colorado, USA. 21 Patients are identified over a 4-year period beginning 17 March 2020 and are eligible to be included in the trial if they are aged 18 years or over and overweight (body mass index (BMI) ≥25 kg/m²) at an initial, index visit.
The investigators estimate that a minimum of 30 patients per clinic per year will be identified. The primary outcome measure is weight loss 6 months after the index visit, extracted from the electronic medical record. PATHWEIGH, being an electronic intervention, can be ‘turned on’ at a clinic at any time, and a patient’s intervention status is defined according to whether the clinic they attend was in the control (routine care) condition or intervention condition at the patient’s index visit.
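The trial randomises clinics 1:1:1 to three sequences, in which the intervention is turned on after 1 year, 2 years, or 3 years, respectively. In terms of the design parameters of a centrosymmetric, three-sequence stepped wedge design, the timing of the first cross-over, s, is 1/4 = 0.25 on a time-scale running from 0 to 1 over the 4-year period, and the proportion of clusters allocated to the middle sequence, w, is 1/3.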
Methods
Statistical model
To describe the general scenario that we consider, we suppose that time is re-scaled so that the recruitment/identification period runs from time 0.0 to time 1.0. For simplicity, we assume that in each cluster we recruit the same number of individuals,
We assume a continuous outcome
where
The parameter
We assume that
In this article, we consider a more general model that allows us to investigate what happens if the ICC decays with increasing separation in time 15
The parameter
Finally, to make headway with deriving a variance for the treatment effect estimator, we make the simplifying assumption that the times at which individuals are recruited from each cluster are regularly spaced, at intervals
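To make the correlation structure concrete, the following is a minimal sketch in Python, not the code used for the article. It assumes that the m recruitment times in a cluster sit at the midpoints of m equal intervals on the unit time-scale, and that two different participants recruited at times t and t' have correlation rho * tau**|t - t'|, where rho is the time-specific ICC and tau is the factor by which the ICC decays over unit time. The function names and the midpoint convention are our own illustrative choices.

```python
import numpy as np

def recruitment_times(m):
    """m regularly spaced times on (0, 1): midpoints of m equal intervals (assumed)."""
    return (np.arange(m) + 0.5) / m

def within_cluster_correlation(m, rho, tau):
    """Correlation matrix for the m participants recruited from one cluster.

    Assumed form: 1 on the diagonal, and rho * tau**|t - t'| between two
    different participants recruited at times t and t'.
    """
    t = recruitment_times(m)
    separation = np.abs(t[:, None] - t[None, :])
    corr = rho * tau ** separation
    np.fill_diagonal(corr, 1.0)
    return corr

# Hypothetical values, chosen only for illustration
C = within_cluster_correlation(m=4, rho=0.05, tau=0.5)
print(np.round(C, 4))
```

Setting tau = 1 in this sketch gives an exchangeable correlation structure, in which the ICC does not decay with separation in time.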
Time effect
The variance of the treatment effect estimator will be adjusted for the time effect, under the assumption that this is correctly specified in the analysis model. Exactly which design minimises this variance would seem to depend on the form of the time effect,
While it might seem strange at first sight to choose a model for the time effect that depends on the design, we make this choice for a number of related reasons. First, the regularly spaced recruitment times at each cluster,
In fact, if we were faced with a real-life dataset with
Variance of the treatment effect estimator
If we write outcomes
Then, the generalised least squares estimator for the parameters is obtained as
and the variance of
The results presented in this article were obtained with the help of numerical matrix inversion, making use of the fact that the matrix
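As an illustration of this kind of calculation, the sketch below builds the generalised least squares information matrix for a centrosymmetric, three-sequence design and inverts it numerically. It is a simplified reconstruction under stated assumptions rather than the authors' code: recruitment times at (k - 0.5)/m, a separate fixed time effect for each of the m recruitment times, within-cluster correlation rho * tau**|t - t'|, and fractional allocation of clusters to sequences in the proportions (1 - w)/2, w, (1 - w)/2. The function name and arguments are hypothetical.

```python
import numpy as np

def treatment_variance(s, w, m, rho, tau, n_clusters=1.0, sigma2=1.0):
    """Large-sample GLS variance of the treatment effect estimator (sketch).

    s: first cross-over time (0 <= s < 0.5); w: proportion of clusters in the
    middle sequence (0 <= w < 1); m: participants per cluster.
    """
    t = (np.arange(m) + 0.5) / m                      # assumed recruitment times
    corr = rho * tau ** np.abs(t[:, None] - t[None, :])
    np.fill_diagonal(corr, 1.0)
    v_inv = np.linalg.inv(sigma2 * corr)              # inverse covariance, one cluster

    crossovers = [s, 0.5, 1.0 - s]                    # centrosymmetric cross-over times
    weights = [(1 - w) / 2, w, (1 - w) / 2]           # proportion of clusters per sequence

    info = np.zeros((m + 1, m + 1))
    for cross, p in zip(crossovers, weights):
        x = (t > cross).astype(float)                 # 1 = intervention, 0 = control
        z = np.column_stack([np.eye(m), x])           # m time effects, then treatment
        info += n_clusters * p * (z.T @ v_inv @ z)    # sum of X' V^-1 X over clusters
    return np.linalg.inv(info)[-1, -1]                # variance of the treatment effect

# Hypothetical inputs, chosen only for illustration
print(treatment_variance(s=0.25, w=1/3, m=40, rho=0.05, tau=0.5, n_clusters=30))
```

With s = 0 and w = 0 the design reduces to a parallel groups trial, and with tau = 1 the sketch returns the familiar value 4 * sigma2 * (1 + (m - 1) * rho) / (J * m), which provides a simple check on the construction.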
Scenarios
Previous work on optimal stepped wedge design suggests, in the case
For given
Contour plots
We transform the variance of the treatment effect estimator to a log scale, and draw contour plots of the log-variance over the design parameter space
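A surface of this kind could be evaluated on a grid along the following lines. This is a hedged sketch, not the authors' code: it assumes a separate fixed time effect for each recruitment time, in which case the information for the treatment effect reduces to a weighted sum over sequences of d' C^-1 d, where d is the deviation of a sequence's exposure pattern from the weighted mean exposure and C is the within-cluster correlation matrix. The grid spacing and parameter values are arbitrary choices for illustration.

```python
import numpy as np

def log_variance(s, w, m, rho, tau):
    """Log of the treatment effect variance per cluster (sigma2 = 1, J = 1)."""
    t = (np.arange(m) + 0.5) / m                       # assumed recruitment times
    corr = rho * tau ** np.abs(t[:, None] - t[None, :])
    np.fill_diagonal(corr, 1.0)
    x = np.array([t > c for c in (s, 0.5, 1.0 - s)], dtype=float)
    p = np.array([(1 - w) / 2, w, (1 - w) / 2])        # sequence weights
    d = x - p @ x                                      # deviation from mean exposure
    info = sum(pq * dq @ np.linalg.solve(corr, dq) for pq, dq in zip(p, d))
    return -np.log(info)

# Hypothetical setting and grid over the design space 0 <= s < 0.5, 0 <= w < 1
m, rho, tau = 40, 0.05, 0.5
s_grid = np.arange(0.0, 0.5, 0.05)
w_grid = np.arange(0.0, 1.0, 0.05)
surface = np.array([[log_variance(s, w, m, rho, tau) for s in s_grid]
                    for w in w_grid])

# Grid point with the smallest variance
i_w, i_s = np.unravel_index(np.argmin(surface), surface.shape)
print("grid minimum at s =", s_grid[i_s], ", w =", w_grid[i_w])

# Contour levels separated by log(1.1): each step is a 10% increase in variance
levels = surface.min() + np.log(1.1) * np.arange(10)
```

The arrays `s_grid`, `w_grid`, `surface` and `levels` could then be passed to a plotting routine such as matplotlib's `contour` to draw the map.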
Results
Contour plots for

Contour plots of the log of the variance of the treatment effect estimator for different mr and m, where m is the recruitment rate at a cluster, and r is the intra-cluster correlation. Time is scaled from 0 to 1 over the recruitment period. Contour plots are drawn over the design parameter space 0 ≤ s < 0.5 and 0 ≤ w < 1, where s is the first cross-over time and w is the proportion of clusters allocated to the middle sequence. The solutions s = 0, w = 1/3; s = 0.15, w = 1/3; and s = 0.25, w = 1/3 (see text) are marked with a ‘+’ symbol. Contour lines are separated by log(1.1), so that moving from one contour to the next represents a 10% increase in the variance. The lowest contour value is set at the minimum of the log-variance surface (the small, circular mark on each plot marks this minimum). The factor, τ, by which the intra-cluster correlation decays over unit time is (a) τ = 1.0; (b) τ = 0.5; (c) τ = 0.1.
When
The variance surfaces in the plots have relatively flat bottoms, suggesting, first, that for any given
A design with equally spaced steps and equally weighted sequences (
Design and sample size in the motivating example
In our motivating example, the PATHWEIGH trial, the assumption was that 30 patients would be recruited per year from each clinic over 4 years, so that m = 30 × 4 = 120 patients per clinic over the recruitment period.
Looking at our contour plots, we might then choose
Recall from the Methods that the variance of the treatment effect estimator is a multiple of
where
Table 1 shows the variance of the treatment effect estimator for the design
Table 1. Total number of clusters needed in the PATHWEIGH example (see text) to achieve 80% power at the 5% significance level to detect different treatment effects, under different scenarios concerning correlations between outcomes from the same cluster. J is the number of clusters, and σ² is the variance of the outcome.
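The step from a variance to a number of clusters can be sketched as follows, using the standard normal approximation for a two-sided test. This is an illustrative calculation and is not claimed to reproduce the article's equations or the entries of Table 1. The argument `unit_variance` is a hypothetical name for the variance of the treatment effect estimator when J = 1 (that is, J times the variance), so it already includes the outcome variance.

```python
from math import ceil
from statistics import NormalDist

def power(variance, effect, alpha=0.05):
    """Two-sided power of a z-test given the variance of the estimator."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / variance ** 0.5
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

def clusters_needed(unit_variance, effect, target_power=0.80, alpha=0.05):
    """Smallest J with approximately the target power (lower tail ignored)."""
    z = NormalDist()
    total = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(target_power)) ** 2
    return ceil(total * unit_variance / effect ** 2)

# Hypothetical inputs: unit variance 1.0 and a treatment effect of 0.5
J = clusters_needed(unit_variance=1.0, effect=0.5)
print(J, round(power(1.0 / J, 0.5), 3))
```

Because J is rounded up to a whole number of clusters, the power achieved is at least the target. In practice J would also need to be compatible with the intended allocation ratios between sequences.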
The theoretical power (from equations (5) and (6)) of each of the designs presented in Table 1 was compared with the empirical power estimated by simulation. For each scenario and design, 1,000 replicated datasets were generated in R, and the continuous time decay model in equations (1) to (4) was fitted using the glmmTMB package in R. 27 This analysis did not include any small-sample correction to mitigate inflation of the nominal Type I error rate and power that might result from the moderate number of clusters. Such corrections are becoming more widely available in software implementations for the analysis of cluster randomised trials, 28 but also add considerably to the processing time, which can make large-scale simulations challenging. Our code can be accessed online (https://github.com/richard-hooper/SW-3sequence-continuous-recruitment). Supplemental Figure 2 displays the results in a nested loop plot. Empirical power closely matched theoretical power when the time-specific ICC was 0.05. Empirical power was inflated when the time-specific ICC was 0.02, and this was particularly evident with the non-standard design. The performance of the glmmTMB package (and alternatives) for analysing data from longitudinal cluster randomised trials with a continuous time decay model for the ICC warrants further investigation.
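The logic of comparing empirical with theoretical power can be illustrated with a much simpler simulation than the one described above. The sketch below is in Python rather than R, does not use glmmTMB, and treats the correlation parameters as known, so it checks only the variance formula and not the behaviour of a fitted mixed model. It assumes per-time-point fixed time effects, correlation rho * tau**|t - t'|, a known outcome variance of 1, and a hypothetical scenario with 18 clusters and 24 participants per cluster.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2024)

# Hypothetical scenario, chosen only for illustration
m, rho, tau, theta = 24, 0.05, 0.5, 0.4
sequence_sizes = [6, 6, 6]                  # clusters per sequence
crossovers = [0.25, 0.5, 0.75]              # s = 0.25, centrosymmetric
n_sims = 2000

t = (np.arange(m) + 0.5) / m                # assumed recruitment times
corr = rho * tau ** np.abs(t[:, None] - t[None, :])
np.fill_diagonal(corr, 1.0)
chol = np.linalg.cholesky(corr)

# Treatment indicators, one row per cluster
x = np.vstack([np.tile(t > c, (n, 1))
               for c, n in zip(crossovers, sequence_sizes)]).astype(float)
d = x - x.mean(axis=0)                      # adjusts for per-time-point time effects
a = np.linalg.solve(corr, d.T).T            # rows are corr^-1 (x_j - xbar)
info = np.sum(a * d)                        # information for the treatment effect
se = info ** -0.5

# Simulate outcomes; the arbitrary time effect cancels out of the estimator
time_effect = np.sin(2 * np.pi * t)
noise = rng.standard_normal((n_sims, x.shape[0], m)) @ chol.T
y = time_effect + theta * x + noise
theta_hat = np.einsum("sjm,jm->s", y, a) / info   # GLS estimate in each replicate

z = NormalDist()
crit = z.inv_cdf(0.975)
empirical = float(np.mean(np.abs(theta_hat / se) > crit))
theoretical = z.cdf(theta / se - crit) + z.cdf(-theta / se - crit)
print(round(empirical, 3), round(theoretical, 3))
```

Agreement in this idealised setting is expected by construction. Discrepancies of the kind reported above arise when the correlation parameters must be estimated from a moderate number of clusters.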
Conclusion
We have illustrated optimal designs for three-sequence stepped wedge trials with continuous recruitment, under different assumptions about the correlation of outcomes from the same cluster. We conclude that under given assumptions there may be a relatively simple design that offers close-to-optimal efficiency and that there may be designs that are robustly efficient over a wide range of assumptions. If efficiency is offered as one of the justifications for using a stepped wedge design over a parallel groups design, then we should design with optimal efficiency in mind. The focus of this article has been on design, informed by theory. Suitable approaches to analysis that can handle a continuous time model (including the decay in the ICC) and also control the Type I error rate when the number of clusters is moderate or small need further evaluation and comparison.
The way in which the ICC changes over time matters to the design, and it is important to articulate these assumptions when reporting sample size calculations for stepped wedge trials. 25 We assumed a particular parametric form for the decay in the ICC to help us understand the more general impact of this kind of decay on optimal design. Other models for the ICC could, of course, be investigated. As in earlier work, 15 we simplified considerably in assuming that eligible participants present at regular, fixed intervals rather than as a random continuous-time process, but assuming that the arrival rate is constant over time we would expect arrival times in a sample to become increasingly uniformly spread as the rate increases. Simulation studies that have investigated the impact of unevenly spaced arrival times on precision of the treatment effect estimator in the context of stepped wedge designs suggest that this impact is small. 29
Our focus has been on the optimal design of three-sequence trials, but our findings may also offer clues about optimal design with larger numbers of sequences. With more sequences, there are more degrees of freedom to the design space for centrosymmetric designs, which becomes correspondingly harder to visualise and requires more effort to search exhaustively. Nevertheless, previous work on optimal design in the discrete time case, with no decay in the ICC, has shown that the ‘internal’ sequences (i.e. sequences other than the first and last) should all be given equal weight.4,5 There may be similar simplifications when we move over to considering designs with many sequences in the continuous time setting. Ultimately, however, we may prefer a design with fewer sequences for its greater simplicity and practicality.
Supplemental Material
sj-docx-1-ctj-10.1177_17407745241251780 – Supplemental material for Efficient designs for three-sequence stepped wedge trials with continuous recruitment
Supplemental material, sj-docx-1-ctj-10.1177_17407745241251780 for Efficient designs for three-sequence stepped wedge trials with continuous recruitment by Richard Hooper, Olivier Quintin and Jessica Kasza in Clinical Trials
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
