Abstract
The stepped wedge design is an appealing longitudinal cluster randomised trial design. However, it places a large burden on participating clusters by requiring all clusters to collect data in all periods of the trial. The staircase design may be a desirable alternative: treatment sequences consist of a limited number of measurement periods before and after the implementation of the intervention. In this article, we explore the efficiency of the stepped wedge design relative to several variants of the ‘basic staircase’ design, which has one control period followed by one intervention period in each sequence. We model outcomes using linear mixed models and consider a sampling scheme where each participant is measured once. We first consider a basic staircase design embedded within the stepped wedge design, then basic staircase designs with either more clusters or larger cluster-period sizes, with the same total number of participants and with fewer total participants than the stepped wedge design. The relative efficiency of the designs depends on the intracluster correlation structure, correlation parameters and the trial configuration, including the number of sequences and cluster-period size. For a wide range of realistic trial settings, a basic staircase design will deliver greater statistical power than a stepped wedge design with the same number of participants, and in some cases, with even fewer total participants.
Introduction
The stepped wedge cluster randomised trial design, which randomises the order in which clusters of participants commence implementation of the intervention, is an appealing yet potentially burdensome design. In typical applications of the design, all clusters start out implementing the control condition and end up implementing the intervention condition, with the timing of the switch from control to intervention being staggered over the trial periods. The design schematic resembles two triangular wedges of control and intervention cells, with a zigzag of steps along the diagonal where the switches occur (Figure 1(a)). This design is particularly appealing because it allows the intervention to be gradually rolled out across the participating clusters and all clusters will eventually receive the intervention.1,2 In a stepped wedge design, all participating clusters collect and provide data in each period of the trial (i.e. in each ‘cluster-period’ cell of the design). Extending data collection over multiple time periods has advantages: the opportunity to collect more data from each cluster, and to switch a cluster from control to intervention so that it acts as its own control, for example. But even in this case, it may not be necessary to collect data from every cluster in every period – indeed, this may be costly or burdensome. For example, a recent study investigating the integration of palliative care into residential aged care facilities in Australia noted the high burden of data collection in a stepped wedge design. 3 In this study, data collection for several outcomes of interest involved completing questionnaires which would have taken large amounts of staff time to complete for each participant throughout the duration of the trial; the researchers described the need to limit the length and number of measures due to the burden data collection would place on staff. Thus, there is a need to investigate alternative, less burdensome designs.

Design schematics for three-sequence designs, shown with a fixed number of clusters per sequence for illustrative purposes: (a) a stepped wedge design with one cluster per sequence and a cluster-period size of
Recent investigations have revealed that not all cluster-period cells in a stepped wedge design contribute equally to the estimation of the treatment effect, and have also considered how we might plan data collection in a stepped wedge trial in a more efficient way. Kasza and Forbes 4 showed that some periods of measurement in stepped wedge designs are much more informative about the treatment effect than others. Specifically, measurements from clusters in periods just before and after the switch from the control to the intervention conditions are more ‘information-rich’ than many cluster-period cells further away from the steps. In addition, cells in the corners of the design schematic may also contribute a relatively large amount of information, depending on the underlying modelling assumptions. More recently, Rezaei-Darzi et al. 5 showed that when low-information cluster-period cells were iteratively removed from a complete stepped wedge design to obtain a series of progressively more ‘incomplete’ designs, designs containing only cluster-period cells around the time of the switch from control to intervention in each sequence often retained adequate power. Similar patterns have emerged in the study of stepped wedge designs where participants are recruited in continuous time. 6 While these studies have considered linear mixed models, cells around the time of the treatment switch also appear to be the most information-rich when marginal models are fit via generalised estimating equations. 7 These findings motivate the further study of incomplete stepped wedge designs, where clusters are not required to provide data in all periods of the trial. In certain situations, incomplete designs such as the staircase design may provide sufficient power for testing new interventions.
The staircase design is a longitudinal cluster randomised trial design with treatment sequences containing a limited number of control periods followed by a limited number of intervention periods, where these measurement periods are staggered over time (e.g. Figure 1(b)). 8 Staircase designs are already being conducted in practice despite limited investigations into the statistical properties of these designs.9–13 A few recent studies have examined the efficiency of incomplete relative to complete stepped wedge designs for particular trial configurations, with staircase-like designs among the incomplete designs considered.5,14,15 Unni et al. 14 compared the power of three incomplete designs (a stepped wedge design with implementation periods, a staircase design with sequences consisting of one control period and two intervention periods, and a batched stepped wedge design), to a complete stepped wedge design in the context of a trial targeting heart failure. Kasza et al. 15 considered the power of four incomplete stepped wedge designs in a different trial context, where the incomplete designs were chosen based on the patterns of information contributed by cluster-period cells for the particular trial example. These designs resembled staircase designs with or without cluster-periods in the corners of the design schematic, including a staircase design with one control period followed by one intervention period in each sequence, and a staircase design with sequences containing a mix of two or three measurement periods around the main diagonal. These studies found that staircase-like designs had notably less power than the complete stepped wedge design, though this is not surprising as the considered incomplete designs simply deleted cluster-period cells from the complete stepped wedge design and therefore included considerably fewer total participants. 
Staircase-like designs showed more promise in the study by Rezaei-Darzi et al., 5 where they lost little precision compared to the complete stepped wedge design in certain trial contexts. A better understanding of staircase designs, and of how their statistical properties compare to those of stepped wedge designs more generally, is therefore needed.
The set of all possible staircase designs is large, so here we focus on a subclass called ‘basic staircase’ designs. 8 A basic staircase design has just one pre- and one post-switch measurement period in each sequence, with each sequence commencing data collection in a different trial period, forming a zigzag of steps as shown in Figure 1(b) to (d). To get a more complete understanding of when a staircase design might be chosen on efficiency grounds, we will consider several variants of the basic staircase design. Figure 1(b) to (d) depicts example design schematics of these variants. As our measure of efficiency, we will consider the variance (or alternatively, the precision) of the treatment effect estimator, where here we consider the generalised least squares estimator from linear mixed models, as is commonly considered in the design of stepped wedge trials. We will consider continuous outcomes but note that linear mixed models are sometimes also assumed for binary outcomes, for example, Hemming et al. 16 We will evaluate the efficiency of several basic staircase designs relative to the stepped wedge design for a range of realistic trial configuration parameters. First, we aim to determine the situations where a basic staircase design (e.g. Figure 1(b)) can be nearly as efficient as the stepped wedge design from which it was derived (e.g. Figure 1(a)), acknowledging that with other parameters fixed including cluster-period size, the staircase design will always be less efficient simply because it necessarily has fewer participants. We then will consider basic staircase designs with additional clusters assigned to each sequence (e.g. Figure 1(c)) and basic staircase designs with larger cluster-period sizes (e.g. Figure 1(d)), that have the same total number of participants as their comparator stepped wedge designs, and that have fewer total participants. 
In situations where there is some flexibility in the number of participating clusters or the number of participants that can be measured in each cluster-period, our central question is whether a staircase design can be as precise or more precise than a stepped wedge design.
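To make the layout of these designs concrete, the following minimal Python sketch prints text schematics for a standard stepped wedge design and its embedded basic staircase design. It assumes S sequences spanning S + 1 periods with one switch per period; the function name `schematic` and the symbols `0` (control), `1` (intervention) and `.` (no data collected) are illustrative choices, not notation from the article.

```python
def schematic(n_sequences, staircase=False):
    """Return a text schematic: rows are sequences, columns are periods.

    Assumes a standard layout with n_sequences + 1 periods, where
    sequence s (0-based) switches to the intervention after period s.
    """
    rows = []
    for s in range(n_sequences):
        cells = []
        for t in range(n_sequences + 1):
            if staircase and t not in (s, s + 1):
                cells.append(".")  # cell not measured in a basic staircase
            else:
                cells.append("1" if t > s else "0")
        rows.append(" ".join(cells))
    return "\n".join(rows)


print(schematic(3))                  # stepped wedge, three sequences
print()
print(schematic(3, staircase=True))  # embedded basic staircase
```

In the staircase schematic each sequence keeps only the cell immediately before and the cell immediately after its switch, so each row contains exactly two measured periods.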
The article is organised as follows: in Section 2, we introduce the design notation, statistical model, expressions for the variance of the treatment effect estimator for stepped wedge and staircase designs, and the relative efficiency metric. In Section 3, we then assess the relative efficiency of the stepped wedge design compared to different variants of the basic staircase design, with one control period followed by one intervention period in each sequence. We consider a stepped wedge trial inspired by a real trial in Section 4, comparing the efficiency of this design to those reimagined as staircase designs, and provide some concluding remarks in Section 5.
Design characteristics and notation
We first consider standard stepped wedge designs in which all sequences implement the control condition in the first period and the intervention condition in the last period, and each sequence switches over to the intervention in a different intermediate period of the trial. We denote such stepped wedge (SW) designs by
Statistical model for continuous outcomes
We assume a linear mixed model for continuous outcomes appropriate for longitudinal cluster randomised trial designs, with special cases of stepped wedge (with
While
Similarly to Hooper et al., 17 Girling and Hemming 18 and Kasza et al., 19 we work with cluster-period averages, that is, model (1) averaged over the m participants’ outcomes within each cluster-period:

Correlation between cluster-period means under a block-exchangeable correlation structure, for cluster-period sizes of
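As a rough illustration of how this correlation behaves, the sketch below evaluates it under a common parametrisation of the block-exchangeable structure. The symbols are assumptions made for this example rather than the article's own notation: `rho` is the within-period intracluster correlation, `r` is the cluster autocorrelation (so the between-period intracluster correlation is `r * rho`), and `m` is the cluster-period size. Under these assumptions a cluster-period mean has variance proportional to (1 + (m - 1) rho) / m and two means from the same cluster have covariance proportional to r rho, which gives the ratio coded here.

```python
def cluster_period_mean_correlation(m, rho, r):
    """Correlation between two cluster-period means from the same cluster.

    Assumed parametrisation (not taken from the article's notation):
    rho = within-period ICC, r = cluster autocorrelation, m = cluster-period size.
    """
    return m * r * rho / (1 + (m - 1) * rho)


# With a single participant per cluster-period the correlation is r * rho;
# as m grows it approaches r.
for m in (1, 10, 100, 1000):
    print(m, round(cluster_period_mean_correlation(m, rho=0.05, r=0.8), 4))
```

Even a small intracluster correlation can therefore translate into a strong correlation between cluster-period means when cluster-period sizes are large.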
In this section, we provide expressions for the variance of the treatment effect estimator for the stepped wedge design under a variety of correlation structures. The following section provides analogous variance expressions for staircase designs, and Section 3 directly compares these variance expressions.
Under assumptions of either categorical period effects or a linear time effect over the trial periods, the variance of the treatment effect estimator obtained via generalised least squares appropriate for stepped wedge designs with
The variance of the treatment effect estimator obtained via generalised least squares for staircase designs differs from that for stepped wedge designs: the measured cells in all treatment sequences have the same pattern of control and intervention periods (i.e.
We define the relative efficiency as the ratio of the precision of the treatment effect estimator for the basic staircase design to the precision for the stepped wedge design as follows:
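Because precision is the reciprocal of variance, this ratio equals the stepped wedge variance divided by the staircase variance. The following Python sketch computes it numerically rather than from closed-form expressions. It is a minimal illustration under stated assumptions: categorical period effects, a block-exchangeable correlation structure parametrised by a within-period ICC `rho` and cluster autocorrelation `r`, total variance `sigma2`, S sequences spanning S + 1 periods, and K clusters per sequence. All function and parameter names are hypothetical.

```python
import numpy as np


def stepped_wedge(S):
    """Treatment indicators (S sequences x S+1 periods) of a standard stepped wedge."""
    return np.triu(np.ones((S, S + 1)), k=1)


def basic_staircase_mask(S):
    """Measured cells of the embedded basic staircase: one pre- and one post-switch period."""
    mask = np.zeros((S, S + 1), dtype=bool)
    idx = np.arange(S)
    mask[idx, idx] = True        # last control period of each sequence
    mask[idx, idx + 1] = True    # first intervention period of each sequence
    return mask


def treatment_effect_variance(X, mask, K, m, rho, r, sigma2=1.0):
    """GLS variance of the treatment effect estimator from cluster-period means."""
    S, T = X.shape
    info = np.zeros((T + 1, T + 1))  # T period effects plus the treatment effect
    for s in range(S):
        obs = np.flatnonzero(mask[s])
        n = len(obs)
        Z = np.zeros((n, T + 1))
        Z[np.arange(n), obs] = 1.0   # categorical period indicators
        Z[:, T] = X[s, obs]          # treatment indicator
        # Covariance of the cluster-period means within one cluster
        V = sigma2 * (((1 - rho) / m + (1 - r) * rho) * np.eye(n)
                      + r * rho * np.ones((n, n)))
        info += K * Z.T @ np.linalg.solve(V, Z)
    return np.linalg.inv(info)[T, T]


def relative_efficiency(S, K, m, rho, r):
    """Precision of the embedded basic staircase relative to the stepped wedge."""
    X = stepped_wedge(S)
    var_sw = treatment_effect_variance(X, np.ones_like(X, dtype=bool), K, m, rho, r)
    var_sc = treatment_effect_variance(X, basic_staircase_mask(S), K, m, rho, r)
    return var_sw / var_sc


print(relative_efficiency(S=5, K=8, m=20, rho=0.05, r=0.9))
```

Passing a different `K` or `m` for the staircase design inside `relative_efficiency` would give the extended and larger cluster-period variants; the parameter values in the final line are arbitrary and are not the article's.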
Relative design efficiency
Embedded staircase design versus encompassing stepped wedge design
We will first compare the efficiency of a standard stepped wedge design to its embedded basic staircase design (Figure 1(b) depicts an example with three sequences and one cluster per sequence and Figure A1(b) depicts an example with nine sequences and one cluster per sequence). All designs have the same cluster-period size, m, and K clusters allocated to each sequence. Note that since the designs also have the same number of sequences

Relative efficiency for
To put the results in a more direct trial context, we also show the relative efficiencies as contour plots for different values of the underlying parameters of

Relative efficiency for
The same total number of participants as the stepped wedge design
We now compare stepped wedge and basic staircase designs with the same total number of participants. These designs have the same number of sequences, the same cluster-period size, but the stepped wedge designs have just

Relative efficiency for
Figure 6 shows the relative efficiencies for designs with three sequences (left column) and nine sequences (right column), for cluster-period sizes of 10 (top row) and 100 (bottom row), assuming categorical period effects. Since the relative efficiencies involving extended staircase designs are inflated by a factor of

Relative efficiency for
The extended staircase designs in Section 3.2.1 with
Basic staircase design with larger cluster-period size versus stepped wedge design
The same total number of participants as the stepped wedge design
Here we compare stepped wedge and basic staircase designs that again have the same total number of participants, where the cluster-period size for the basic staircase designs is larger than that for the stepped wedge designs and the designs have the same number of sequences and K clusters per sequence (Figure 1(d) depicts an example with three sequences and one cluster per sequence and Figure A1(d) depicts an example with nine sequences and one cluster per sequence). Letting
Figure 7 displays the relative efficiencies for three-sequence designs (left column) and nine-sequence designs (right column), for cluster-period sizes of

Relative efficiency for
Here, rather than inflating the cluster-period size of the basic staircase designs by a factor of
Reimagining a stepped wedge design as a staircase design: The PROMPT trial
In this section, we consider a stepped wedge trial inspired by the PROMPT trial, 23 which was a stepped wedge trial aiming to test whether a psychosocial intervention rolled out across different cancer treatment centres (the clusters) could reduce cancer patients’ depression scores. While this trial included just five clusters, we will instead consider a stepped wedge design with five sequences spanning six periods, with eight clusters randomly assigned to each sequence, for a total of 40 clusters (Figure 8, top left). An exchangeable correlation structure with an ICC of
Now suppose that the basic staircase design embedded in the stepped wedge design were to be implemented instead (Figure 8, top middle). This design has 80 cluster-periods of measurement compared to 240 in the stepped wedge design. With the same cluster-period size of 20 and assuming categorical period effects, this basic staircase design would have 70.7% power to detect an effect size of 0.15 with a two-sided significance level of 5%. The relative efficiency of this embedded staircase design compared to the stepped wedge design is 0.653 (as can also be seen for the five-sequence designs in Figure 3 when
A basic staircase design with a larger cluster-period size of 30 (as depicted in Figure 8, top right), that is, a 50% increase, would have 84.1% power and a relative efficiency of 0.91 (only 9% less precise) compared to the stepped wedge design with a cluster-period size of 20. Notably, this design would require 50% fewer total participants, with only 2400 total participants required across the 80 cluster-periods rather than 4800 total participants across the 240 cluster-periods in the stepped wedge design. Figure S1 of the Supplemental Material shows that a five-sequence extended basic staircase design with twice as many clusters as the stepped wedge would yield a relative efficiency of around 1.25 when
We note that these power and relative efficiency calculations use formulae based on asymptotic properties. For designs with a small number of clusters, theoretical power may not reflect actual power; however, for the designs considered here with 40–60 clusters, the empirical power and relative efficiency values align fairly closely with the theoretical values (see Section C of the Supplemental Material for details).
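The participant counts and the link between relative efficiency and power in this example can be checked with a short calculation. The sketch below is illustrative only: it uses the usual normal approximation for a two-sided test and ignores the negligible rejection probability in the opposite tail, and the function names are hypothetical.

```python
from statistics import NormalDist


def total_participants(n_sequences, clusters_per_sequence, periods_measured, m):
    """Total participants when each cluster is measured in `periods_measured` periods."""
    return n_sequences * clusters_per_sequence * periods_measured * m


def power_after_rescaling(power, efficiency_ratio, alpha=0.05):
    """Power of a design whose precision is `efficiency_ratio` times that of a
    reference design with the given power, for the same effect size."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    ncp = z.inv_cdf(power) + z_alpha  # effect size / standard error, reference design
    return z.cdf(ncp * efficiency_ratio ** 0.5 - z_alpha)


print(total_participants(5, 8, 6, 20))  # stepped wedge: 4800
print(total_participants(5, 8, 2, 20))  # embedded basic staircase: 1600
print(total_participants(5, 8, 2, 30))  # staircase, cluster-period size 30: 2400

# Moving from the embedded staircase (70.7% power, relative efficiency 0.653)
# to the staircase with cluster-period size 30 (relative efficiency 0.91)
print(round(power_after_rescaling(0.707, 0.91 / 0.653), 2))  # approximately 0.84
```

The last line reproduces, to rounding, the 84.1% power quoted for the larger cluster-period design. The counts also show that the embedded staircase uses one third of the stepped wedge participants while retaining roughly two thirds of its precision.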

Design schematics for the designs inspired by the PROMPT trial: a stepped wedge design (top left), a basic staircase design with the same cluster-period size as the stepped wedge design (top middle), a basic staircase design with a 50% larger cluster-period size than the stepped wedge design (top right), each with five sequences
The basic staircase design is a particularly lean and potentially powerful alternative to the stepped wedge design. Basic staircase designs make use of only the immediate pre- and post-switch periods for each cluster, which are the cluster-periods in a stepped wedge design that have been shown to contribute a great deal of information to the estimation of the treatment effect. 4 At the trial design stage, trialists who are considering conducting a stepped wedge design may wish to also consider a basic staircase design if the candidate stepped wedge design yields power greater than what is required, or if there is some flexibility in the number of participants that could be measured in each cluster-period or in the number of available clusters. Without modification to cluster-period size or the number of participating clusters, the embedded staircase design contains a subset of the participants in the encompassing stepped wedge design and so will always be less powerful to some degree. However, the loss of efficiency associated with the use of an embedded staircase design instead of the stepped wedge design is far less than the proportionate reduction in the number of participants for most realistic trial settings. Moreover, in many realistic settings, that is, for many realistic combinations of correlation parameter values, the embedded staircase design is only slightly less efficient than the stepped wedge design. If the cluster-period size or number of clusters could feasibly be higher, then a basic staircase design can be more powerful than the stepped wedge design for the same number of total participants, as shown in Sections 3.2.1 and 3.3.1. Some basic staircase designs can even achieve power equal to or greater than stepped wedge designs while requiring measurements on fewer total participants, as was seen in Sections 3.2.2 and 3.3.2. 
Thus, depending on the design itself, the assumed correlation structure and parameters, and the desired effect size, there will be scenarios in which the staircase design offers sufficient power, while placing less of a burden on participating clusters than would the comparator stepped wedge design.
The variation in the relative efficiencies for the design comparisons in Section 3.1 can in part be explained by understanding when the treatment effect estimators for the stepped wedge and staircase designs do and do not have similar forms. Matthews and Forbes 25 showed that the stepped wedge design estimator uses a weighted combination of ‘vertical’ (within-column) and ‘horizontal’ (row-column) estimators of the treatment effect. In particular, the vertical estimator is a weighted combination of an intuitive form of within-column contrasts, namely the mean of the outcomes of the intervention cells minus the mean of the outcomes of the control cells within each column (period). Grantham et al. 8 showed that the basic staircase design estimator where categorical time effects are assumed is also a vertical estimator based on this same form of contrasts, simplified to one intervention and one control cell within each column. Note that the key results in Matthews and Forbes depend on the correlation between cluster-period means,
Across the comparisons between various basic staircase designs and a stepped wedge design, we observed that the relative efficiency is quite sensitive to the number of sequences and the strength of correlation between cluster-period means,
This article focused on situations where a block-exchangeable correlation structure and categorical period effects were assumed, for a repeated cross-sectional sampling scheme and designs without implementation periods. Other assumed forms for the ICC and time effects may be more appropriate for a given trial context, such as a discrete-time decay correlation structure and/or a linear effect of time over the trial periods. Section D of the Supplemental Material displays and describes relative efficiency results under these alternative assumptions. Results for a cohort sampling scheme and designs including implementation periods are provided in Section E of the Supplemental Material. In general, the relative efficiency results under these alternative assumptions, sampling schemes and design types are not vastly different from those in Section 3.
The relative efficiency results in this article apply to estimators obtained from linear mixed models for continuous outcomes and further work is required for other outcome types and/or the use of marginal models. Results for marginal models, for continuous and discrete outcomes, show that the cells near the time of the treatment switch in stepped wedge designs are typically the most information-rich cells.7,26 This is similar to results observed for the linear mixed model setting we explore, and suggests that embedded basic staircase designs comprised of only the cells before and after the treatment switch would be relatively efficient compared to the encompassing stepped wedge designs in these other settings too. However, the efficiency of staircase designs has not been directly compared to that of stepped wedge designs for binary and count outcomes using marginal models, and so the magnitude of the relative efficiencies may differ from our results. 7 In contrast to the results obtained for the linear mixed model setting we consider, recent results for continuous outcomes modelled using marginal models under a working independence assumption show that some cells further from the treatment switch in the stepped wedge can have information content less than one. Thus, in this setting, embedded staircase designs can be even more efficient than the encompassing stepped wedge designs. 26 This was shown for models with exchangeable and discrete-time decay correlation structures, but we would expect similar results for the block-exchangeable correlation structure primarily considered in this article.
This article considered when different basic staircase designs may be desirable over a stepped wedge design on the grounds of statistical efficiency, but feasibility and the associated trial costs would also factor into the choice of design. From a feasibility standpoint, a staircase design may hold greater appeal than a stepped wedge design in settings where data collection is particularly onerous. For example, in the INSPIRED stepped wedge trial assessing the impact of Palliative Care Needs Rounds in care homes, 3 secondary outcomes of interest pertained to subjective measures of patients’ quality of death and dying that were captured through surveys conducted by care home staff. In that study, the researchers needed to reduce both the length and number of outcome measures to reduce the burden of data collection on the staff, and train a large number of staff per site in data collection due to high staff turnover. Had a staircase design been considered, clusters would have been involved for much less time and may have had greater short-term capacity to take a variety of measurements. In addition, perhaps fewer staff would need to be trained. In choosing among the staircase design variants, there are likely to be setting-specific constraints on, for instance, the number of clusters and the number of participants that could be measured in each cluster-period that may guide the choice. From a trial cost standpoint, the staircase design variants would have different associated costs relative to the cost of the stepped wedge design. For example, suppose a simple formula for the total trial cost accounts for the cost of including a cluster and the cost of including and measuring a participant such as that considered in Grantham et al. 27 Then compared to the stepped wedge design, the embedded basic staircase would have a lower trial cost (an equal number of clusters but fewer total participants), the extended staircase would have a larger trial cost (greater number of clusters, same total participants), and the basic staircase with a larger cluster-period size would have the same trial cost (equal number of clusters and total participants). Additional factors could also affect the cost and feasibility; for example, it is possible that increasing the number of participants per cluster-period would require additional staff to take these measurements or implement the intervention. While beyond the scope of this article, it would be of interest to consider whether the additional cost and effort associated with including additional clusters or taking more measurements in each cluster-period would outweigh the benefits of increased precision that these staircase design variants would bring.
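The cost ordering described above can be illustrated with a simple linear cost function. This is a sketch under assumptions: the cost is taken to be a per-cluster cost times the number of clusters plus a per-participant cost times the number of participants, the unit costs are hypothetical values, and the configuration (five sequences, six periods, eight clusters per sequence, cluster-period size 20) is borrowed from Section 4 purely for illustration. The extended and larger cluster-period variants are sized so that they match the stepped wedge design's total number of participants.

```python
def trial_cost(n_clusters, n_participants, cost_per_cluster, cost_per_participant):
    """Linear trial cost: cluster costs plus participant costs (assumed form)."""
    return cost_per_cluster * n_clusters + cost_per_participant * n_participants


S, K, m = 5, 8, 20         # sequences, clusters per sequence, cluster-period size
T = S + 1                  # periods in the stepped wedge design
c, p = 5000.0, 100.0       # hypothetical unit costs per cluster and per participant

designs = {
    # name: (number of clusters, total participants)
    "stepped wedge": (S * K, S * K * T * m),
    "embedded staircase": (S * K, S * K * 2 * m),
    "extended staircase": (S * K * T // 2, (S * K * T // 2) * 2 * m),
    "larger cluster-period staircase": (S * K, S * K * 2 * (m * T // 2)),
}

for name, (clusters, participants) in designs.items():
    print(f"{name}: {clusters} clusters, {participants} participants, "
          f"cost {trial_cost(clusters, participants, c, p):,.0f}")
```

Under this cost function the ordering does not depend on the particular unit costs chosen, provided both are positive.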
In conclusion, in this article, we have demonstrated that the basic staircase design is an alternative to the stepped wedge design that will, for a wide range of realistic design parameters, require fewer participants than the stepped wedge design but still have comparable power to detect treatment effects. Further, we have shown that there are scenarios in which slight increases in cluster-period sizes or the number of clusters recruited to a staircase design may lead to a design with greater power compared to the stepped wedge design, for the same total number of participants or even with fewer total participants. Whether such modifications would be feasible depends on the trial context: hence the appropriateness of the staircase design is heavily dependent on the setting in which a trial is being conducted. These observations are also dependent on design parameters, and thus we would encourage researchers who are planning to conduct a stepped wedge design to investigate staircase designs as potentially less burdensome alternatives.
Supplemental Material
Supplemental material, sj-pdf-1-smm-10.1177_09622802251317613 for The relative efficiency of staircase and stepped wedge cluster randomised trial designs by Kelsey L Grantham, Andrew B Forbes, Richard Hooper and Jessica Kasza in Statistical Methods in Medical Research
Footnotes
Data availability
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Australian Research Council (grant number Discovery Project DP210101398).
Supplemental material
Supplemental material for this article is available online.
Appendix A
References