Abstract
This article introduces the ‘staircase’ design, derived from the zigzag pattern of steps along the diagonal of a stepped wedge design schematic where clusters switch from control to intervention conditions. Unlike a complete stepped wedge design where all participating clusters must collect and provide data for the entire trial duration, clusters in a staircase design are only required to be involved and collect data for a limited number of pre- and post-switch periods. This could alleviate some of the burden on participating clusters, encouraging involvement in the trial and reducing the likelihood of attrition. Staircase designs are already being implemented, although in the absence of a dedicated methodology, approaches to sample size and power calculations have been inconsistent. We provide expressions for the variance of the treatment effect estimator when a linear mixed model for an outcome is assumed for the analysis of staircase designs in order to enable appropriate sample size and power calculations. These include explicit variance expressions for basic staircase designs with one pre- and one post-switch measurement period. We show how the variance of the treatment effect estimator is related to key design parameters and demonstrate power calculations for examples based on a real trial.
Introduction
Longitudinal cluster randomised trial designs such as the stepped wedge are made up of sequences where clusters may switch between implementing the control and intervention conditions over several trial periods. 1 The standard stepped wedge design, where all clusters begin in the control condition and then switch over just once to the intervention condition at staggered times over the trial, requires clusters to collect and provide measurements on subjects’ outcomes in every period of the trial. In some trial settings, there may be plenty of time to stagger the introduction of the intervention in different clusters, but it may be burdensome or costly to collect individual-level data in each cluster for the entire trial duration. Indeed, trials in practice are implementing non-standard stepped wedge designs despite the absence of appropriate methodology. Pragmatic alternatives to the stepped wedge design that do not require clusters to be involved for the entire trial duration are urgently needed, together with the underlying statistical theory to support their implementation.
Derived from the zigzag pattern of steps along the main diagonal of a stepped wedge which has been shown to contain a large amount of information for estimation of the treatment effect, 2 a staircase design is a novel longitudinal cluster randomised trial design in which the sequences consist of a limited number of pre-switch control periods and post-switch intervention periods. The staircase design shares several practical advantages with a stepped wedge design while reducing the implementation and data collection requirements from each cluster. As with a stepped wedge design, each sequence in a staircase design contains one unidirectional switch from the control to the intervention condition. This makes it suitable for testing interventions that cannot easily be revoked once implemented, such as education and training programmes where new knowledge is provided as part of the intervention. Moreover, this means that all participating clusters eventually receive the intervention during the course of the trial. Staircase designs also have the same staggered rollout of the intervention across clusters as with the stepped wedge, making it more logistically feasible to introduce the intervention over time. However, staircase designs may be more appealing to participating clusters than the stepped wedge: Each cluster receives the intervention sooner upon commencing data collection, and is only required to contribute data in a limited number of periods rather than in all periods of the trial.
Trials with staircase-like designs are already being implemented, driven by the need for a less burdensome design than the complete stepped wedge design. A trial from 2020, for example, sought to test the effectiveness of a peripheral intravenous catheter flushing education programme on all-cause catheter failure across several wards in a hospital. 3 The researchers wanted a design that would enable the staggered rollout of the intervention, but with a limited number of periods in each sequence to minimise the measurement burden on the wards. Another trial with a limited number of pre- and post-switch measurement periods in each sequence sought to test whether an education programme to improve self-regulation could reduce students’ disruptive behaviour across schools in remote Aboriginal communities. 4 The outcome measures for each student were determined by questionnaires completed by teachers and parents, and so data collection was somewhat onerous. The researchers stated that ‘the burden of data collection would have been too great in a stepped wedge’; in addition, a stepped wedge would have been too costly and also difficult to implement geographically as it would have required repeated physical collection of questionnaires from remote locations over an extended period of time.
While many of the trials with staircase-like designs have been referred to as stepped wedge designs,4–7 much of the methodology for stepped wedge designs assumes a complete design where all clusters provide measurements on subjects’ outcomes in all periods of the trial. Many formulae and publicly available tools for sample size and power calculations appropriate for stepped wedge designs do not readily extend to these types of ‘incomplete’ designs with periods of no measurement. This leads researchers into dangerous and uncharted territory at the trial design stage: formulae appropriate for stepped wedge designs may underestimate the required number of clusters for a desired level of power or overestimate trial power for a given sample size if applied to staircase designs. Despite some researchers explicitly acknowledging this issue, 4 in many cases it remains unclear how researchers conducted sample size and power calculations for trials with a staircase design in the absence of dedicated methodology.
Staircase designs have so far only made peripheral appearances in methodology papers in the cluster randomised trial design literature. These types of designs have been used as examples of incomplete stepped wedge designs, most commonly a staircase design with one control period followed by two intervention periods in each sequence.8,9 Designs with measurements concentrated along the main zigzag diagonal of a stepped wedge design have also arisen from investigations of the efficiency of potentially incomplete stepped wedge designs.10,11 A staircase design with treatment sequences consisting of just one control period followed by one intervention period can also be viewed as an extension of the dog-leg design. 12 While there is some indication that staircase designs can be efficient alternatives to the stepped wedge, 11 knowledge of the mathematical and statistical properties of these designs is lacking, thus limiting the uptake and proper implementation of such designs.
In this article, we formally introduce the staircase design by describing its properties and providing formulae to enable appropriate sample size and power calculations when a linear mixed model for an outcome is assumed. In Section 2 we present the notation, statistical model and an expression for the variance of the treatment effect estimator appropriate for general staircase designs. Section 3 focuses on basic staircase designs with one pre- and one post-switch measurement period in each sequence which permit explicit formulae for the variance of the treatment effect estimator. In Section 4 we demonstrate sample size and power calculations for staircase designs motivated by a real trial example and describe and demonstrate the use of some publicly available tools appropriate for staircase designs. Section 5 offers a discussion of our results and describes areas for further research.
Staircase designs
Design characteristics
A staircase design consists of overlapping treatment sequences that start in the control condition for one or more periods followed by the intervention condition for one or more periods, with periods of no measurements at one or both ends; design schematics for several staircase designs, each with six clusters, are shown in Figure 1. Each unique sequence begins taking measurements in a different period of the trial. We denote the general staircase design by

Design schematics for several staircase designs with 6 clusters: a basic staircase with two clusters assigned to each of three unique sequences (top left), a basic staircase with one cluster assigned to each of six unique sequences (top right), a balanced staircase with two control periods followed by two intervention periods in each sequence and one cluster assigned to each of six unique sequences (bottom left), and an imbalanced staircase with one control period followed by two intervention periods in each sequence and one cluster assigned to each of six unique sequences (bottom right).
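The four schematics described in the caption above can be generated programmatically. The following is a minimal sketch, not the authors' code: the function name and its arguments (`n_seq`, `n_ctrl`, `n_intv`, `clusters_per_seq`) are hypothetical labels introduced here, and it assumes that each successive sequence starts measuring exactly one period after the previous one.

```python
import numpy as np

def staircase_schematic(n_seq, n_ctrl=1, n_intv=1, clusters_per_seq=1):
    """Return a (clusters x periods) array for a staircase design.

    Cell values: 0 = control, 1 = intervention, nan = no measurement.
    Assumption (illustrative): sequence s starts measuring in period s,
    i.e. consecutive sequences are staggered by one period.
    """
    n_periods = n_seq + n_ctrl + n_intv - 1
    rows = []
    for s in range(n_seq):
        row = np.full(n_periods, np.nan)
        row[s:s + n_ctrl] = 0.0                             # pre-switch control periods
        row[s + n_ctrl:s + n_ctrl + n_intv] = 1.0           # post-switch intervention periods
        rows.extend(row.copy() for _ in range(clusters_per_seq))
    return np.vstack(rows)

# The four six-cluster designs described in the caption
basic_3_seq = staircase_schematic(3, clusters_per_seq=2)    # top left
basic_6_seq = staircase_schematic(6)                        # top right
balanced = staircase_schematic(6, n_ctrl=2, n_intv=2)       # bottom left
imbalanced = staircase_schematic(6, n_ctrl=1, n_intv=2)     # bottom right
```

Under this staggering assumption, a design with `n_seq` sequences spans `n_seq + n_ctrl + n_intv - 1` periods, so a basic staircase with three sequences runs over four periods.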
We define a balanced staircase design as having an equal number of pre-switch control periods and post-switch intervention periods in each sequence so that
Individual-level model
Letting
The effect of time is encoded by specifying a form for
Intracluster correlation structures
Several forms for
The block-exchangeable, or constant between-period intracluster correlation structure, is obtained if we assume that
Cluster-period mean level model
For both time parameterisations and the intracluster correlation structures described above, time is synonymous with study period, and is treated as a discrete phenomenon. As such, model (1) can be collapsed to cluster-period means without loss of information. 18
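To make the two correlation structures concrete at the cluster-period mean level, the sketch below builds the covariance matrix of one cluster's period means. It is an illustration under stated assumptions rather than a transcription of the paper's formulas: it uses the commonly adopted parameterisation in terms of a within-period ICC (`icc`), a cluster autocorrelation (`cac`) and a total variance fixed at 1, with `m` subjects per cluster-period, each measured once. All names are hypothetical.

```python
import numpy as np

def cluster_period_mean_cov(n_periods, m, icc, cac,
                            structure="block-exchangeable", total_var=1.0):
    """Covariance matrix of one cluster's period means (repeated cross-sections).

    Assumed parameterisation: icc = between-cluster variance / total variance;
    cac = correlation between cluster-level effects in different periods
    (constant for block-exchangeable, cac**lag for discrete-time decay).
    """
    sig2_cluster = icc * total_var
    sig2_subject = (1.0 - icc) * total_var
    idx = np.arange(n_periods)
    lags = np.abs(np.subtract.outer(idx, idx))
    if structure == "block-exchangeable":
        corr = np.where(lags == 0, 1.0, cac)
    elif structure == "discrete-time decay":
        corr = cac ** lags
    else:
        raise ValueError(f"unknown structure: {structure}")
    # Averaging over m subjects shrinks only the subject-level variance
    return sig2_cluster * corr + (sig2_subject / m) * np.eye(n_periods)

V_be = cluster_period_mean_cov(3, m=10, icc=0.05, cac=0.8)
V_dtd = cluster_period_mean_cov(3, m=10, icc=0.05, cac=0.8,
                                structure="discrete-time decay")
```

The two structures agree for adjacent periods and differ only at lags of two or more, which is why they coincide for a basic staircase, where each cluster is measured in just two consecutive periods.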
Letting
While the models above are appropriate for a sampling scheme where different subjects are measured in each period and just one measurement is taken on each subject (e.g., a ‘repeated cross sectional’ sampling scheme), 19 a cohort design could be modelled by including a subject-level random effect term to account for repeated measures on the same subject. Specifically, we could include the term
Variance of the treatment effect estimator for general staircase designs
We present a formula for the variance of the treatment effect estimator for general staircase designs,
We show in Section A of the Supporting Information that the variance of the treatment effect estimator for a general staircase design can be represented as:
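Expression (3) is not reproduced here. As an independent numerical check on any closed-form result, the variance can also be computed directly from the standard generalised least squares formula, taking the treatment-effect entry of the inverse of the sum over clusters of X'V⁻¹X, using only the measured cluster-period cells. The sketch below does this; the function and argument names are hypothetical, the total variance is fixed at 1, and the ICC and cluster autocorrelation parameterisation is an assumption of this illustration.

```python
import numpy as np

def treatment_effect_variance(n_seq, m, icc, cac, n_ctrl=1, n_intv=1,
                              clusters_per_seq=1, time="categorical",
                              structure="block-exchangeable"):
    """GLS variance of the treatment effect estimator for a staircase design.

    Assumes n_seq >= 2, one-period staggering between sequences, known
    variance components and a total variance of 1 (all illustrative choices).
    """
    n_periods = n_seq + n_ctrl + n_intv - 1
    n_time_par = n_periods if time == "categorical" else 2
    info = np.zeros((n_time_par + 1, n_time_par + 1))
    for s in range(n_seq):
        periods = np.arange(s, s + n_ctrl + n_intv)         # measured periods only
        treat = (np.arange(periods.size) >= n_ctrl).astype(float)
        if time == "categorical":
            T = np.eye(n_periods)[periods]                  # one indicator per period
        else:
            T = np.column_stack([np.ones(periods.size), periods + 1.0])  # intercept + slope
        X = np.column_stack([T, treat])
        lags = np.abs(np.subtract.outer(periods, periods))
        if structure == "block-exchangeable":
            corr = np.where(lags == 0, 1.0, cac)
        else:                                               # discrete-time decay
            corr = cac ** lags
        V = icc * corr + ((1.0 - icc) / m) * np.eye(periods.size)
        info += clusters_per_seq * (X.T @ np.linalg.solve(V, X))
    return np.linalg.inv(info)[-1, -1]

var_cat = treatment_effect_variance(4, m=10, icc=0.05, cac=0.8)
var_lin = treatment_effect_variance(4, m=10, icc=0.05, cac=0.8, time="linear")
```

Because the information matrix is a sum over clusters, assigning K clusters to every sequence divides the variance by K. The linear-time model is nested within the categorical one, so `var_lin` cannot exceed `var_cat` for the same inputs.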
Variance and correlation parameters for the basic staircase
For the basic staircase design,
Variance of the treatment effect estimator, four-sequence design
Treatment effect estimator
First consider the

(a) Design schematic for a four-sequence basic staircase design; (b) visualisation of the treatment effect estimator in terms of the mean outcomes from the measured cluster-periods, assuming categorical period effects; (c) visualisation of the treatment effect estimator in terms of the mean outcomes from the measured cluster-periods, assuming a linear time effect.
In the subsections to follow, we derive explicit expressions for the best linear unbiased treatment effect estimator and its variance, first assuming categorical period effects and then assuming a linear effect of time.
Under model (2) and assuming categorical period effects such that
Condition (i) implies that the outcomes from the cluster-periods in the first and last periods of the design,
Condition (ii) suggests that the weights on cells in the same period but different clusters have the same magnitude but opposite signs. We can then rewrite the general treatment effect estimator in terms of three unique weights, letting
Then the variance of the treatment effect estimator is given by
We can then find the weights that give lowest variance by minimising the expression
Weights on cluster-period mean differences, depending on the correlation
Finally, plugging these weights back into the previous variance expression and simplifying gives
Using a similar approach to that outlined in Section 3.2.2, we show in Supporting Information Section B.2 that when assuming linear rather than categorical period effects, for a four-sequence basic staircase, the treatment effect estimator can be written as a linear combination of differences between centrosymmetric pairs of cluster-period cells:
Finally, the variance of the treatment effect estimator is given by
Obtaining general results
We consider two different approaches to obtaining explicit expressions for the variance of the treatment effect estimator for a basic staircase design with S sequences: extending the approach from Section 3.2 or by simplifying expression (3) using matrix algebra and results on explicit inverses of particular matrices. We briefly describe these approaches in the subsections to follow, with more detail available in Supporting Information Section C.
Categorical period effects
Using a similar approach to that used in Section 3.2.2 for a four-sequence design, we show in Section C.1 of the Supporting Information that for a general

Variance of the treatment effect estimator for varying within-period intracluster correlation (ICC) values, assuming categorical period effects, for a basic staircase design with three and 10 sequences (columns) and cluster-period sizes of 10 and 100 (rows), where each subject is measured just once. The lines within each subplot correspond to different cluster autocorrelation values.

Another approach for deriving the variance is by simplifying expression (3) with results appropriate for the basic staircase design. Assuming categorical period effects, the matrices encoding the time effects,
Figure 3 displays the variance of the treatment effect estimator against the full range of within-period ICC values under the assumption of categorical period effects, for basic staircase designs with differing numbers of sequences (columns) and cluster-period sizes (rows), for several different cluster autocorrelation values. There appears to be a near-linear relationship between the variance of the treatment effect estimator and the within-period ICC. The relationship tends to be increasing; however, we see a slight decreasing relationship for the design with 10 sequences, a cluster-period size of 10, and a cluster autocorrelation of

Variance of the treatment effect estimator for varying cluster-period sizes, assuming categorical period effects, for a basic staircase design with 10 sequences, where each subject is measured just once, and for within-period ICC values of 0.05 (left) and 0.2 (right). The lines within each subplot correspond to different cluster autocorrelation values.
In a generalisation of the results in Section 3.2.3 for a four-sequence design, the treatment effect estimator for an S-sequence basic staircase design where linear period effects are assumed can similarly be written as a weighted sum of the differences between centrosymmetric cluster-period cells, i.e.,

Variance of the treatment effect estimator for varying within-period intracluster correlation (ICC) values, assuming linear period effects, for a basic staircase design with three and 10 sequences (columns) and cluster-period sizes of 10 and 100 (rows), where each subject is measured just once. The lines within each subplot correspond to different cluster autocorrelation values.
Figure 5 displays the variance of the treatment effect estimator from expression (8) against a range of within-period ICC values, for basic staircase designs with differing numbers of sequences (columns) and cluster-period sizes (rows), for several different cluster autocorrelation values. The observations above for categorical period effects are borne out: higher values of r and lower values of

Variance of the treatment effect estimator for varying cluster-period sizes, assuming linear period effects, for a basic staircase design with 10 sequences, where each subject is measured just once, and for within-period intracluster correlation (ICC) values of 0.05 (left) and 0.2 (right). The lines within each subplot correspond to different cluster autocorrelation values.
Sample size and power calculations using formulae
The variance expressions provided in Sections 2.3 and 3.3, together with a standard power formula, can be used to calculate the power of a trial for a desired effect size or the required number of clusters for a particular cluster-period size and desired level of power. Expression (3) is applicable to general staircase designs, and expressions (6) and (7) are applicable to basic staircase designs, with terms a and
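The calculation described above can be sketched as follows, assuming the usual normal-approximation power formula for a two-sided test: power is Φ(|θ|/√Var(θ̂) − z₁₋α/₂). The function names are hypothetical, and the cluster-count helper relies on the variance scaling inversely with the number of clusters assigned to each sequence when the variance components are treated as known.

```python
from math import ceil, sqrt
from statistics import NormalDist

def power_from_variance(effect, variance, alpha=0.05):
    """Approximate power of a two-sided level-alpha test of the treatment effect."""
    z = NormalDist()
    return z.cdf(abs(effect) / sqrt(variance) - z.inv_cdf(1 - alpha / 2))

def clusters_per_sequence_needed(effect, variance_one_per_seq,
                                 power=0.8, alpha=0.05):
    """Smallest number of clusters per sequence achieving the target power.

    variance_one_per_seq is the variance of the treatment effect estimator
    for the design with a single cluster assigned to each sequence.
    """
    z = NormalDist()
    target_variance = (effect / (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power))) ** 2
    return ceil(variance_one_per_seq / target_variance)
```

As noted in the Discussion, this approximation treats the variance components as known, so it may overstate power when few clusters are available.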
Staircase trial example
One example of a planned staircase design is described by White-Traut et al. 21 as a cluster randomised trial across four neonatal intensive care units (NICUs) to test whether a behavioural intervention for preterm infants leads to improved growth while in the NICU. The researchers specified a design with four unique sequences and four measurement periods in each sequence, with each sequence consisting of one control period followed by three intervention periods and each sequence commencing data collection in a different period. Labelled an ‘incomplete stepped wedge’ in the protocol paper, using the terminology and notation in this paper, this design would be deemed an ‘imbalanced staircase’, described in our notation as

Design schematics for a basic staircase design with one cluster assigned to each of four unique sequences (left) and an imbalanced staircase design with one control period followed by three intervention periods in each sequence and one cluster assigned to each of four unique sequences (right).
We first consider a basic staircase design for this trial setting and aim:
If categorical period effects were assumed, the variance of the treatment effect estimator for this design obtained from (4) is
If a staircase design with more than one control and/or intervention period were of interest, such as a
While statisticians and researchers could use our results to manually calculate sample size and power for their studies, some may wish to use existing tools to do so. Some, but not all, of the existing tools and software appropriate for stepped wedge and other cluster randomised trial designs, can also be used to calculate required sample size and power for staircase designs. 22 A requirement is that the tool or software can accommodate incomplete designs with periods in which no measurements are taken in some clusters, either by allowing users to specify an incomplete design schematic directly or by specifying an ‘inclusion’ matrix to exclude certain cluster-period cells from a standard design schematic. Further requirements to cover the scenarios in this paper are options for different intracluster correlation structures including the block-exchangeable and discrete-time decay correlation structures, and both categorical and linear forms for time. As of the time of writing, tools that fulfil all of the above requirements include the SteppedPower R package 23 and a SAS macro called %CRTFASTGEEPWR 24 ; a freely available web app called the Shiny CRT calculator 25 fulfils all of the above except allowing different forms for time. 22
The main power function in the SteppedPower R package 23 has an ‘incomplete’ argument: a scalar input retains the specified number of pre- and post-switch periods in each treatment sequence from a stepped wedge design, or a matrix input can be used to denote inclusion (cells with a value of 1) or exclusion (cells with a value of 0) of the cluster-period cells from a complete stepped wedge design schematic. A balanced staircase design, for example, could be specified by defining a complete stepped wedge along with ‘incomplete = 2’; however, the first and last sequences of the resulting design are truncated, lacking an initial control period and final intervention period, respectively. We note that this will not affect the variance of the treatment effect estimator or power when assuming categorical period effects but it will affect the resulting calculation if assuming linear period effects. We expand upon this point in Section 5. The %CRTFASTGEEPWR SAS macro 24 takes a matrix for the design specification, allowing incomplete designs through the use of a ‘2’ for cluster-period cells in which no measurements are taken. The Shiny CRT calculator web app 25 allows users to upload a design schematic as a CSV file. Staircase designs can be specified similarly to the design schematics in Figure 1, with empty cells for cluster-periods in which no measurements are taken. This app does not allow users to specify different time parameterisations, but rather assumes categorical period effects for all calculations. Sample code for each of these implementations is provided in Appendix A.
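The truncation at the edges can be seen by building the inclusion matrix by hand. The sketch below is a Python illustration of the idea only; it does not call or reproduce the interface of SteppedPower or of any other tool mentioned above, and the function names and the `keep` argument are hypothetical. It starts from a complete stepped wedge with one more period than sequences and retains up to `keep` periods on either side of each switch.

```python
import numpy as np

def stepped_wedge(n_seq):
    """Complete stepped wedge: sequence s (0-based) switches after period s."""
    periods = np.arange(n_seq + 1)
    return (periods[None, :] > np.arange(n_seq)[:, None]).astype(int)

def inclusion_matrix(n_seq, keep):
    """1 = cell measured, 0 = cell excluded, keeping `keep` periods each side of the switch."""
    periods = np.arange(n_seq + 1)
    first_intv = np.arange(n_seq)[:, None] + 1      # index of first intervention period
    offset = periods[None, :] - first_intv          # periods relative to the switch
    return ((offset >= -keep) & (offset < keep)).astype(int)

sw = stepped_wedge(4)
basic = inclusion_matrix(4, keep=1)     # every sequence keeps 1 control + 1 intervention
balanced = inclusion_matrix(4, keep=2)  # edge sequences cannot keep 2 + 2 periods
print(balanced)
```

With `keep=2` the first sequence retains only one control period and the last sequence only one intervention period, because the complete stepped wedge has nothing further to offer at its edges; with `keep=1` no truncation occurs.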
Discussion
In this article, we formally introduced the staircase design, considering the properties of this design when a linear mixed model for the outcome is assumed, focusing on the properties of the basic staircase design with one pre- and one post-switch measurement period. We derived a simplified analytical expression for the variance of the treatment effect estimator for a general staircase design to enable appropriate sample size and power calculations for these designs. In addition, we derived explicit expressions for the variance of the treatment effect estimator for the basic staircase design under assumptions of categorical and linear period effects.
A staircase design is a pragmatic alternative to a stepped wedge design. Since sequences in a staircase design contain only a limited number of pre- and post-switch measurement periods, this design can potentially be much less burdensome and expensive than a stepped wedge. These reduced data collection requirements would likely be more enticing for participating clusters and could carry a reduced risk of attrition of clusters during the trial. Beyond the appeal of the limited number of measurement periods, the staggering of these sequences means that each cluster would receive the intervention sooner upon commencing data collection. This could also be appealing for participating clusters, but in practice it would mean that not all clusters would be actively involved at all times. It is possible that this reduced engagement could cause inactive clusters to lose interest and withdraw from the trial.
The implications of clustering for the efficiency of a basic staircase design are similar to those for other longitudinal cluster randomised trial designs in some ways but different in others. The variance of the treatment effect estimator is a linear or near-linear function of the within-period ICC. Under a discrete-time decay correlation structure, the stepped wedge design has a nonlinear relationship between the variance and within-period ICC. 17 As with other cluster randomised trial designs, a basic staircase design sees diminishing returns from increasing the cluster-period size: beyond a certain cluster-period size, minimal useful information about the treatment effect can be gained by measuring more subjects in a cluster-period. 26
Unlike the stepped wedge design for which assuming categorical or linear time effects yields the same variance expression and hence same calculated sample size for the same parameter value inputs, 13 we obtain different expressions under these different assumptions for the basic staircase design. As seen here and also noted elsewhere, assuming a linear time effect yields a slightly lower variance of the treatment effect estimator than assuming categorical period effects. 14 A more conservative sample size or power calculation could therefore be conducted by assuming categorical rather than linear period effects.
We have derived explicit expressions for the variance of the treatment effect estimator for the basic staircase design and an expression for general staircase designs in terms of the matrices corresponding to model (2) at the cluster-period mean level. These expressions can be used in sample size and power calculations for staircase designs. While we would have preferred an explicit expression for general staircase designs, further simplification of expression (3) would require inverting the matrix
The variance expression we derived for the treatment effect estimator in general staircase designs assumes that the observed periods in each sequence follow the same schedule of control and intervention periods, simply shifted in time; there may be a benefit to sequences following different schedules. For instance, Kasza et al. 11 considered a staircase-like design where sequences commencing data collection earlier in the trial had more intervention periods than control periods and sequences appearing later in the trial had more control periods than intervention periods. These types of designs also naturally arise if a staircase design with more than one pre- and/or post-switch period is considered as a subset of the cluster-periods in a complete stepped wedge: since a standard stepped wedge has just one all-control period at the start and one all-intervention period at the end, any staircase design other than a basic staircase taken as a subset of the stepped wedge will have some of its sequences truncated at the edges of the stepped wedge. For example, this occurs when specifying a staircase design with the SteppedPower R package 23 : it can be obtained by specifying an incomplete stepped wedge design; however, the sequences at the edges of the design schematic may be truncated.
Interestingly, the cluster-period measurements in the first and last periods are not used in the treatment effect estimator obtained from generalised least squares under an assumption of categorical period effects because the treatment and time effects cannot be separated. For example, the treatment effect estimator for a three-sequence basic staircase design matches that for the three-sequence dog-leg design, in which the first and third sequences contain only one period each, an intervention and control period, respectively. 29 However, under the stronger assumption of a linear time effect, the cluster-period measurements in the first and last periods do contribute to the treatment effect estimate, albeit with less weight than measurements from the intermediate cluster-periods of the trial.
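This behaviour can be checked numerically by computing the generalised least squares weights that the treatment effect estimator places on each measured cluster-period mean, i.e. the treatment-effect row of (X'V⁻¹X)⁻¹X'V⁻¹. The sketch below does so for a basic staircase with one cluster per sequence; the function name is hypothetical, and the total variance of 1 and the ICC and cluster autocorrelation parameterisation are assumptions of this illustration.

```python
import numpy as np

def gls_weights(n_seq, m, icc, cac, time="categorical"):
    """Weights of the GLS treatment effect estimator on cluster-period means.

    Basic staircase, one cluster per sequence, n_seq >= 2. Returns an
    (n_seq x 2) array: column 0 = control cell, column 1 = intervention cell.
    """
    n_periods = n_seq + 1
    rows = []
    V = np.zeros((2 * n_seq, 2 * n_seq))
    block = icc * np.array([[1.0, cac], [cac, 1.0]]) + ((1.0 - icc) / m) * np.eye(2)
    for s in range(n_seq):
        for j, treat in enumerate((0.0, 1.0)):
            p = s + j                                   # period of this cell (0-based)
            if time == "categorical":
                t = np.eye(n_periods)[p]                # period indicators
            else:
                t = np.array([1.0, p + 1.0])            # intercept and linear slope
            rows.append(np.append(t, treat))
        V[2 * s:2 * s + 2, 2 * s:2 * s + 2] = block     # clusters are independent
    X = np.array(rows)
    Vinv_X = np.linalg.solve(V, X)
    W = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T)         # (X'V^-1 X)^-1 X'V^-1
    return W[-1].reshape(n_seq, 2)

w_cat = gls_weights(4, m=10, icc=0.05, cac=0.8)
w_lin = gls_weights(4, m=10, icc=0.05, cac=0.8, time="linear")
```

Under categorical period effects, `w_cat[0, 0]` and `w_cat[-1, 1]` (the cells in the first and last periods) are zero up to numerical precision, whereas the corresponding entries of `w_lin` are not. In both cases the weights are antisymmetric about the centre of the design, matching the description of the estimator as a combination of differences between centrosymmetric cells.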
The formulae we derive in this paper pertain to the generalised least squares estimator of the treatment effect, are based on asymptotic properties, and make the typical assumption of known variance and correlation parameters. The Type I error rate may be elevated and actual power may be lower than the theoretical power calculated with these formulae, particularly if the trial includes few clusters because there are few degrees of freedom for estimating the variance components. 30 Further, correlation parameter estimates arising from such trials ought to be treated with a degree of caution. When planning trials, simulation studies can be conducted to provide further insight into the performance of these estimators.
In this paper we have considered the staircase design in the context of linear mixed models for outcomes. Work is currently underway on the efficiency of staircase designs as compared with alternative incomplete designs and with complete designs run over fewer time periods. Future work will consider the properties of these designs when outcomes are modelled using non-linear link functions and estimation of the treatment effect is via generalised estimating equations. We expect that several of our theoretical results will provide useful starting points for these investigations. We also plan to investigate the properties of other types of staircase designs beyond the basic staircase in upcoming work. In particular, we intend to examine the efficiency of balanced designs with more than one pre- and post-switch measurement period (i.e., with
Supplemental Material
Supplemental material, sj-pdf-1-smm-10.1177_09622802231202364 for The staircase cluster randomised trial design: A pragmatic alternative to the stepped wedge by Kelsey L Grantham, Andrew B Forbes, Richard Hooper and Jessica Kasza in Statistical Methods in Medical Research
Footnotes
Acknowledgements
We thank the reviewer for insightful comments that significantly improved the paper.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Australian Research Council (grant number Discovery Project DP210101398).
Supplemental material
Supplemental materials for this article are available online.
Appendix
References
