This article introduces the ‘staircase’ design, derived from the zigzag pattern of steps along the diagonal of a stepped wedge design schematic where clusters switch from control to intervention conditions. Unlike a complete stepped wedge design where all participating clusters must collect and provide data for the entire trial duration, clusters in a staircase design are only required to be involved and collect data for a limited number of pre- and post-switch periods. This could alleviate some of the burden on participating clusters, encouraging involvement in the trial and reducing the likelihood of attrition. Staircase designs are already being implemented, although in the absence of a dedicated methodology, approaches to sample size and power calculations have been inconsistent. We provide expressions for the variance of the treatment effect estimator when a linear mixed model for an outcome is assumed for the analysis of staircase designs in order to enable appropriate sample size and power calculations. These include explicit variance expressions for basic staircase designs with one pre- and one post-switch measurement period. We show how the variance of the treatment effect estimator is related to key design parameters and demonstrate power calculations for examples based on a real trial.
Longitudinal cluster randomised trial designs such as the stepped wedge are made up of sequences where clusters may switch between implementing the control and intervention conditions over several trial periods.1 The standard stepped wedge design, where all clusters begin in the control condition and then switch over just once to the intervention condition at staggered times over the trial, requires clusters to collect and provide measurements on subjects’ outcomes in every period of the trial. In some trial settings, there may be plenty of time to stagger the introduction of the intervention in different clusters, but it may be burdensome or costly to collect individual-level data in each cluster for the entire trial duration. Indeed, trials in practice are implementing non-standard stepped wedge designs despite the absence of appropriate methodology. Pragmatic alternatives to the stepped wedge design that do not require clusters to be involved for the entire trial duration are urgently needed, together with the underlying statistical theory to support their implementation.
Derived from the zigzag pattern of steps along the main diagonal of a stepped wedge which has been shown to contain a large amount of information for estimation of the treatment effect,2 a staircase design is a novel longitudinal cluster randomised trial design in which the sequences consist of a limited number of pre-switch control periods and post-switch intervention periods. The staircase design shares several practical advantages with a stepped wedge design while reducing the implementation and data collection requirements from each cluster. As with a stepped wedge design, each sequence in a staircase design contains one unidirectional switch from the control to the intervention condition. This makes it suitable for testing interventions that cannot easily be revoked once implemented, such as education and training programmes where new knowledge is provided as part of the intervention. Moreover, this means that all participating clusters eventually receive the intervention during the course of the trial. Staircase designs also have the same staggered rollout of the intervention across clusters as with the stepped wedge, making it more logistically feasible to introduce the intervention over time. However, staircase designs may be more appealing to participating clusters than the stepped wedge: Each cluster receives the intervention sooner upon commencing data collection, and is only required to contribute data in a limited number of periods rather than in all periods of the trial.
Trials with staircase-like designs are already being implemented, driven by the need for a less burdensome design than the complete stepped wedge design. A trial from 2020, for example, sought to test the effectiveness of a peripheral intravenous catheter flushing education programme on all-cause catheter failure across several wards in a hospital.3 The researchers wanted a design that would enable the staggered rollout of the intervention, but with a limited number of periods in each sequence to minimise the measurement burden on the wards. Another trial with a limited number of pre- and post-switch measurement periods in each sequence sought to test whether an education programme to improve self-regulation could reduce students’ disruptive behaviour across schools in remote Aboriginal communities.4 The outcome measures for each student were determined by questionnaires completed by teachers and parents, and so data collection was somewhat onerous. The researchers stated that ‘the burden of data collection would have been too great in a stepped wedge’; in addition, a stepped wedge would have been too costly and also difficult to implement geographically as it would have required repeated physical collection of questionnaires from remote locations over an extended period of time.
While many of the trials with staircase-like designs have been referred to as stepped wedge designs,4–7 much of the methodology for stepped wedge designs assumes a complete design where all clusters provide measurements on subjects’ outcomes in all periods of the trial. Many formulae and publicly available tools for sample size and power calculations appropriate for stepped wedge designs do not readily extend to these types of ‘incomplete’ designs with periods of no measurement. This leads researchers into dangerous and uncharted territory at the trial design stage: formulae appropriate for stepped wedge designs may underestimate the required number of clusters for a desired level of power or overestimate trial power for a given sample size if applied to staircase designs. Despite some researchers explicitly acknowledging this issue,4 in many cases it remains unclear how researchers conducted sample size and power calculations for trials with a staircase design in the absence of dedicated methodology.
Staircase designs have so far only made peripheral appearances in methodology papers in the cluster randomised trial design literature. These types of designs have been used as examples of incomplete stepped wedge designs, most commonly a staircase design with one control period followed by two intervention periods in each sequence.8,9 Designs with measurements concentrated along the main zigzag diagonal of a stepped wedge design have also arisen from investigations of the efficiency of potentially incomplete stepped wedge designs.10,11 A staircase design with treatment sequences consisting of just one control period followed by one intervention period can also be viewed as an extension of the dog-leg design.12 While there is some indication that staircase designs can be efficient alternatives to the stepped wedge,11 knowledge of the mathematical and statistical properties of these designs is lacking, thus limiting the uptake and proper implementation of such designs.
In this article, we formally introduce the staircase design by describing its properties and providing formulae to enable appropriate sample size and power calculations when a linear mixed model for an outcome is assumed. In Section 2 we present the notation, statistical model and an expression for the variance of the treatment effect estimator appropriate for general staircase designs. Section 3 focuses on basic staircase designs with one pre- and one post-switch measurement period in each sequence which permit explicit formulae for the variance of the treatment effect estimator. In Section 4 we demonstrate sample size and power calculations for staircase designs motivated by a real trial example and describe and demonstrate the use of some publicly available tools appropriate for staircase designs. Section 5 offers a discussion of our results and describes areas for further research.
Staircase designs
Design characteristics
A staircase design consists of overlapping treatment sequences that start in the control condition for one or more periods followed by the intervention condition for one or more periods, with periods of no measurements at one or both ends; design schematics for several staircase designs, each with six clusters, are shown in Figure 1. Each unique sequence begins taking measurements in a different period of the trial. We denote the general staircase design by $SC(S, K, R_0, R_1)$, where $S$ is the number of unique treatment sequences, with $K$ clusters assigned to each sequence and comprising $R_0$ control periods followed by $R_1$ intervention periods. The total number of clusters included in the trial is therefore equal to $SK$ and the total number of periods the trial spans is equal to $S + R_0 + R_1 - 1$. Clusters assigned to sequence $s$ are observed in periods $s$ to $s + R_0 + R_1 - 1$ only. Further extensions to this family of designs are possible; for example, each of the unique treatment sequences might be offset by more than one period relative to the previous sequence, but in this article we limit attention to the framework specified.
Figure 1. Design schematics for several staircase designs with 6 clusters: a basic staircase with two clusters assigned to each of three unique sequences (top left), a basic staircase with one cluster assigned to each of six unique sequences (top right), a balanced staircase with two control periods followed by two intervention periods in each sequence and one cluster assigned to each of six unique sequences (bottom left), and an imbalanced staircase with one control period followed by two intervention periods in each sequence and one cluster assigned to each of six unique sequences (bottom right).
We define a balanced staircase design as having an equal number of pre-switch control periods and post-switch intervention periods in each sequence so that $R_0 = R_1$. We further define a basic staircase design as a special case of the balanced staircase design, with sequences comprising measurements in just one control period followed by one intervention period ($R_0 = R_1 = 1$), denoted by $SC(S, K, 1, 1)$. An imbalanced staircase design may have different numbers of pre- and post-switch periods in each sequence so that $R_0 \neq R_1$. The basic staircase design comprises $SK$ clusters and a total of $S + 1$ periods. While the basic staircase design is embedded within a standard stepped wedge design with $S$ unique sequences and $K$ clusters per sequence spanning $S + 1$ periods (i.e., with just one control period before the first intervention step and one intervention period after the last step), staircase designs with more than one pre- and/or post-switch measurement period are not entirely contained within a standard stepped wedge with the same number of sequences. Sections 2.2 and 2.3 present a statistical model and the associated expression for the variance of the treatment effect estimator for general designs, and Section 3 presents explicit expressions for the basic staircase design.
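As a minimal sketch of the design family described above, the following Python function lays out the schematic of a staircase design with S unique sequences, K clusters per sequence, R0 control periods and R1 intervention periods per sequence. The function and argument names are illustrative assumptions, not part of any published tool.

```python
def staircase_schematic(S, K, R0, R1):
    """Return one row per cluster; each entry is None (no measurement),
    0 (control) or 1 (intervention). Names are hypothetical."""
    n_periods = S + R0 + R1 - 1          # total periods spanned by the trial
    rows = []
    for s in range(1, S + 1):            # sequence s starts measuring in period s
        row = [None] * n_periods
        for j in range(R0 + R1):
            row[s - 1 + j] = 0 if j < R0 else 1
        rows.extend([list(row) for _ in range(K)])  # K clusters share the sequence
    return rows


def render(rows):
    """Text rendering: '.' marks a cluster-period with no measurement."""
    symbols = {None: ".", 0: "0", 1: "1"}
    return "\n".join(" ".join(symbols[cell] for cell in row) for row in rows)


if __name__ == "__main__":
    # Basic staircase with three sequences and two clusters per sequence
    print(render(staircase_schematic(3, 2, 1, 1)))
```

Each row of the rendered output is a cluster and each column a trial period, so the diagonal band of measured cells makes the staircase shape visible.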
Statistical model for continuous outcomes
Individual-level model
Letting $Y_{skti}$ represent the outcome for subject $i = 1, \ldots, m$ in cluster $k = 1, \ldots, K$ assigned to sequence $s = 1, \ldots, S$ in period $t = s, \ldots, s + R_0 + R_1 - 1$, we consider the following mixed effects model for a continuous outcome:
$$Y_{skti} = \mathbf{T}_t \boldsymbol{\beta} + X_{st}\theta + C_{skt} + \epsilon_{skti}, \quad (C_{sks}, \ldots, C_{sk(s+R_0+R_1-1)})^T \sim N(\mathbf{0}, \boldsymbol{\Sigma}_s), \quad \epsilon_{skti} \sim N(0, \sigma^2_\epsilon), \tag{1}$$
where $\boldsymbol{\beta}$ is a $p$-dimensional column vector of fixed time effects and $\mathbf{T}_t$ is a $p$-dimensional row vector specifying the form for the fixed effects corresponding to period $t$. This dimension $p$ depends on the time parameterisation adopted, and we discuss some common choices of time parameterisation, such as categorical and linear time effects, in Section 2.2.2. $X_{st}$ is the treatment indicator for sequence $s$ in period $t$, $\theta$ is the treatment effect, and $\epsilon_{skti}$ is the subject-level error term. Model (1) is appropriate for a design where each participant provides only one measurement, and a modification for cohort designs is presented in Section 2.2.5. The term $C_{skt}$ is the random effect for cluster $k$ assigned to sequence $s$ in period $t$ and $\boldsymbol{\Sigma}_s$ is the covariance matrix of the cluster-period random effects across the periods of measurement. We assume that all clusters assigned to all sequences have random effects with an identical distribution, and so we let $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_s$ be the covariance matrix for the cluster-period random effects. We describe some possible structures for $\boldsymbol{\Sigma}$ in Section 2.2.3.
Time parameterisations
The effect of time is encoded by specifying a form for the period-specific row vector $\mathbf{T}_t$ and the vector of fixed time effects $\boldsymbol{\beta}$. In models for standard stepped wedge designs, it has been shown that including linear or categorical fixed time effects does not result in a different form for the variance of the treatment effect estimator.13 Hence, when a model of the form of model (1) is assumed, sample size expressions and power calculations are unaffected by the choice of time parameterisation. As a result, much of the development of the theory for the standard stepped wedge has considered only categorical period effects. However, as we will see in Section 3, for designs such as the staircase the variance of the treatment effect estimator is not invariant to the selected time parameterisation, i.e., the form of the time effects will have an impact on the variance of the treatment effect estimator. In our development we consider both categorical period effects and linear time effects, which may be a more appropriate assumption in some settings.14 Categorical period effects are returned if $\mathbf{T}_t$ is a $(S + R_0 + R_1 - 1)$-dimensional vector with a 1 in the $t$th position and zeros elsewhere, and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_{S+R_0+R_1-1})^T$ so that $\mathbf{T}_t\boldsymbol{\beta} = \beta_t$. A linear time effect is returned if $\mathbf{T}_t = (1, t)$ and $\boldsymbol{\beta} = (\beta_0, \beta_1)^T$ so that $\mathbf{T}_t\boldsymbol{\beta} = \beta_0 + \beta_1 t$.
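The two time parameterisations can be illustrated with a short Python sketch that stacks the period-specific row vectors for the periods in which a given sequence is measured. The function name and the "categorical"/"linear" labels are assumptions made for illustration only.

```python
def time_design_matrix(s, S, R0, R1, time="categorical"):
    """Rows of the time-effect design matrix for sequence s (hypothetical helper).

    Sequence s is measured in periods s, ..., s + R0 + R1 - 1.
    """
    periods = range(s, s + R0 + R1)
    if time == "categorical":
        n_periods = S + R0 + R1 - 1      # one indicator column per trial period
        return [[1.0 if col == t else 0.0 for col in range(1, n_periods + 1)]
                for t in periods]
    if time == "linear":
        return [[1.0, float(t)] for t in periods]   # intercept and period number
    raise ValueError("time must be 'categorical' or 'linear'")


if __name__ == "__main__":
    # Second sequence of a four-sequence basic staircase
    print(time_design_matrix(2, 4, 1, 1, "categorical"))
    print(time_design_matrix(2, 4, 1, 1, "linear"))
```

Under categorical period effects the number of columns grows with the number of trial periods, whereas the linear parameterisation always has two columns, which is why the two choices can lead to different variances in incomplete designs.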
Intracluster correlation structures
Several forms for $\boldsymbol{\Sigma}$ have been proposed for longitudinal cluster randomised trials, differing in the extent to which the off-diagonal terms, i.e., the covariance between outcomes from subjects in the same cluster measured in different periods, vary over the periods. A general form is induced by the following relationships for the cluster-period random effects $C_{skt}$: $\mathrm{Var}(C_{skt}) = \sigma^2_C$ and $\mathrm{Cov}(C_{skt}, C_{skt'}) = \sigma_{tt'}$, for $t \neq t'$, so that the diagonal elements of $\boldsymbol{\Sigma}$ are given by $\sigma^2_C$ and the off-diagonal elements are given by $\sigma_{tt'}$. These variance components, together with the other variance components in model (1), yield the intracluster correlation parameters. The within-period intracluster correlation (or within-period ICC) describes the correlation between the outcomes of subjects measured in the same cluster in the same period and is given by $\rho = \sigma^2_C/(\sigma^2_C + \sigma^2_\epsilon)$, where $\sigma^2_\epsilon$ is the variance of the subject-level error. The between-period intracluster correlation describes the correlation between different subjects’ outcomes in the same cluster measured in different periods of the trial, given by $\sigma_{tt'}/(\sigma^2_C + \sigma^2_\epsilon)$.
The block-exchangeable, or constant between-period intracluster correlation structure, is obtained if we assume that $\sigma_{tt'} = r\sigma^2_C$, for all pairs $t \neq t'$. This model was introduced by Hooper et al.15 and Girling and Hemming,16 and has been considered in many analyses of stepped wedge and related trials. This model assumes that the correlations between all pairs of subjects’ outcomes in different periods are the same and do not depend on the length of time between their measurements. An alternative correlation structure allows for these correlations to decrease as the time between subjects’ periods of measurement increases. This decreasing correlation over the trial's periods is encoded by the discrete-time decay correlation structure and is returned if we assume $\sigma_{tt'} = r^{|t-t'|}\sigma^2_C$, $t \neq t'$, where $r$ is called the cluster autocorrelation.17 This structure has been primarily discussed in the context of stepped wedge designs but is also applicable to other longitudinal cluster randomised trial designs. A special case of these intracluster correlation structures is returned when $r = 1$, yielding an exchangeable correlation structure.1
Cluster-period mean level model
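For both time parameterisations and the intracluster correlation structures described above, time is synonymous with study period, and is treated as a discrete phenomenon. As such, model (1) can be collapsed to cluster-period means without loss of information.18 Letting $\bar{Y}_{skt}$ denote the mean of all observations in cluster $k$ assigned to sequence $s$ during period $t$, model (1) collapses to:
$$\bar{Y}_{skt} = \mathbf{T}_t \boldsymbol{\beta} + X_{st}\theta + C_{skt} + \bar{\epsilon}_{skt}, \tag{2}$$
where $\bar{\epsilon}_{skt} \sim N(0, \sigma^2_\epsilon/m)$. We denote the $(R_0 + R_1)$-dimensional vector of cluster-period means for cluster $k$ assigned to sequence $s$ as $\bar{\mathbf{Y}}_{sk}$. The covariance matrix associated with this vector is then given by:
$$\mathbf{V} = \boldsymbol{\Sigma} + \frac{\sigma^2_\epsilon}{m}\mathbf{I}_{R_0+R_1},$$
where $\mathbf{I}_{R_0+R_1}$ is the $(R_0 + R_1) \times (R_0 + R_1)$ identity matrix.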
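The covariance matrix of the cluster-period means under the two intracluster correlation structures can be sketched as follows. This is an illustration only: the function name is hypothetical, and the total variance is fixed at 1 so that the cluster-period random effect variance equals the within-period ICC, which is an assumption made here for convenience.

```python
def cluster_mean_covariance(n_periods, m, rho, r, structure="decay"):
    """Covariance matrix of a cluster's period means (hypothetical helper).

    Assumes total variance 1, so the random-effect variance is rho and the
    subject-level error variance is 1 - rho.
    structure: "block" (block-exchangeable) or "decay" (discrete-time decay).
    """
    sigma2_c = rho
    sigma2_e = 1.0 - rho
    V = []
    for t in range(n_periods):
        row = []
        for u in range(n_periods):
            if t == u:
                row.append(sigma2_c + sigma2_e / m)       # variance of a period mean
            elif structure == "block":
                row.append(r * sigma2_c)                  # constant between-period covariance
            elif structure == "decay":
                row.append(r ** abs(t - u) * sigma2_c)    # covariance decays with lag
            else:
                raise ValueError("structure must be 'block' or 'decay'")
        V.append(row)
    return V


if __name__ == "__main__":
    for row in cluster_mean_covariance(3, m=10, rho=0.1, r=0.8, structure="decay"):
        print([round(x, 4) for x in row])
```

With only two periods of measurement the two structures give the same matrix, since the only off-diagonal lag is 1.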
Modifications for cohort designs
While the models above are appropriate for a sampling scheme where different subjects are measured in each period and just one measurement is taken on each subject (e.g., a ‘repeated cross sectional’ sampling scheme),19 a cohort design could be modelled by including a subject-level random effect term to account for repeated measures on the same subject. Specifically, we could include the term $\eta_{ski}$ in model (1), where $\eta_{ski} \sim N(0, \sigma^2_\eta)$, which would become $\bar{\eta}_{sk}$ in model (2), where $\bar{\eta}_{sk} \sim N(0, \sigma^2_\eta/m)$. The inclusion of this term would imply exchangeability between a subject's repeated measures, conditional on cluster and cluster-period, although more sophisticated relationships such as decaying correlation encoded by an autoregressive structure could be assumed. This additional random effect would then yield different correlation parameters, as $\sigma^2_\eta$ would appear in the denominator of $\rho$, and we would have an additional correlation parameter, representing the correlation between observations on the same subject at different periods, e.g., $(\sigma_{tt'} + \sigma^2_\eta)/(\sigma^2_C + \sigma^2_\eta + \sigma^2_\epsilon)$, where $\sigma_{tt'}$ is the covariance between the cluster-period random effects in periods $t$ and $t'$ and $\sigma^2_C$ is their variance. The expressions we derive below for the variance of the treatment effect estimator are applicable to cohort designs through the appropriate specification of the covariance matrix of the cluster-period means, $\mathbf{V}$.
Variance of the treatment effect estimator for general staircase designs
We present a formula for the variance of the treatment effect estimator for general staircase designs, $\mathrm{Var}(\hat{\theta})$, when $\hat{\theta}$ is the generalised least squares estimator. This variance is a key ingredient in sample size and power calculations.18 Let $\mathbf{X}_s$ denote the vector of treatment indicators for the periods of measurement in sequence $s$. Since for the designs we consider in this paper the vector of treatment indicators is common across sequences, we let $\mathbf{X} = \mathbf{X}_s = (\mathbf{0}_{R_0}^T, \mathbf{1}_{R_1}^T)^T$. Let $\mathbf{Z}_s$ denote the design matrix for sequence $s$ encoding the parameterisation of the time effects and $\mathbf{V}$ denote the covariance matrix for a cluster over its periods of measurement.
We show in Section A of the Supporting Information that the variance of the treatment effect estimator for a general staircase design can be represented as:
$$\mathrm{Var}(\hat{\theta}) = \frac{1}{K}\left[S\,\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X} - \left(\sum_{s=1}^{S}\mathbf{Z}_s^T\mathbf{V}^{-1}\mathbf{X}\right)^T\left(\sum_{s=1}^{S}\mathbf{Z}_s^T\mathbf{V}^{-1}\mathbf{Z}_s\right)^{-1}\left(\sum_{s=1}^{S}\mathbf{Z}_s^T\mathbf{V}^{-1}\mathbf{X}\right)\right]^{-1}. \tag{3}$$
Further simplification of this expression is limited by the ability to reason about the elements of the matrix obtained by inverting the inner term, $\sum_{s=1}^{S}\mathbf{Z}_s^T\mathbf{V}^{-1}\mathbf{Z}_s$, for an arbitrary number of sequences $S$ and complex forms of the covariance matrix $\mathbf{V}$. We return to this point in Section 5. However, the basic staircase with $R_0 = R_1 = 1$ permits explicit expressions which we present in the next section.
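Although a closed form is not available in general, the variance can be evaluated numerically. The sketch below uses the standard partitioned generalised least squares form (treatment information minus the part explained by the time effects) with NumPy. The function name, argument names and the covariance values in the usage example are assumptions for illustration, not the authors' software.

```python
import numpy as np


def staircase_variance(S, K, R0, R1, V, time="categorical"):
    """Numerical GLS variance of the treatment effect estimator (sketch).

    V is the (R0+R1) x (R0+R1) covariance matrix of a cluster's period means,
    assumed common to all clusters. time is "categorical" or "linear".
    """
    V = np.asarray(V, dtype=float)
    n_obs = R0 + R1
    X = np.concatenate([np.zeros(R0), np.ones(R1)])    # treatment indicators
    Vinv = np.linalg.inv(V)
    p = S + n_obs - 1 if time == "categorical" else 2  # number of time parameters
    ZVZ = np.zeros((p, p))
    ZVX = np.zeros(p)
    for s in range(1, S + 1):
        periods = np.arange(s, s + n_obs)
        if time == "categorical":
            Z = np.zeros((n_obs, p))
            Z[np.arange(n_obs), periods - 1] = 1.0     # one indicator per period
        else:
            Z = np.column_stack([np.ones(n_obs), periods])  # intercept and slope
        ZVZ += Z.T @ Vinv @ Z
        ZVX += Z.T @ Vinv @ X
    info = S * (X @ Vinv @ X) - ZVX @ np.linalg.solve(ZVZ, ZVX)
    return 1.0 / (K * info)


if __name__ == "__main__":
    V = [[0.19, 0.08], [0.08, 0.19]]   # illustrative values only
    print(staircase_variance(4, 1, 1, 1, V, "categorical"))
    print(staircase_variance(4, 1, 1, 1, V, "linear"))
```

The function fails with a division by zero when the treatment effect is not identifiable, for example a single sequence with categorical period effects.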
Basic staircase designs: An in-depth look
Variance and correlation parameters for the basic staircase
For the basic staircase design, $SC(S, K, 1, 1)$, the vector of treatment indicators is given by $\mathbf{X} = (0, 1)^T$ and $\mathbf{V}$ is a $2 \times 2$ covariance matrix. Suppose $\mathbf{V}$ contains elements $a$ on the diagonal, representing the variance of a cluster-period mean, and $b$ on the off-diagonal, representing the covariance between the cluster-period means. For model (2), these elements would be $a = \sigma^2_C + \sigma^2_\epsilon/m$ and $b = r\sigma^2_C$ for either the block-exchangeable or discrete-time decay intracluster correlation structure, where $\sigma^2_C$ and $\sigma^2_\epsilon$ are the variances of the cluster-period random effects and subject-level errors; these structures are identical when there are only two periods of measurement per sequence as in a basic staircase design. We could also represent these elements in terms of the correlation parameters, $\rho$ and $r$, and cluster-period size, $m$. Assuming the total variance $\sigma^2_C + \sigma^2_\epsilon = 1$, we would have $a = \rho + (1 - \rho)/m$ and $b = r\rho$. We can then write the correlation between cluster-period means, $\psi$, as $\psi = b/a = mr\rho/(1 + (m - 1)\rho)$. Although the variance of the treatment effect estimator for the general staircase design does not have a tractable form, we are able to reason about the treatment effect estimator and obtain explicit expressions for the variance of the treatment effect estimator for the basic staircase. We first assume categorical period effects and then assume a linear effect of time over the trial periods.
Variance of the treatment effect estimator, four-sequence design
Treatment effect estimator
First consider the $SC(4, 1, 1, 1)$ design, a four-sequence basic staircase design with one cluster per sequence (Figure 2(a)). Let $\bar{Y}_{st}$ denote the mean outcome corresponding to sequence $s$ in period $t$. We can represent the estimator for $\theta$ as a linear combination of the means for all measured cluster-periods:
$$\hat{\theta} = \sum_{s=1}^{4}\left(w_{s,s}\bar{Y}_{s,s} + w_{s,s+1}\bar{Y}_{s,s+1}\right).$$
Figure 2. (a) Design schematic for a four-sequence basic staircase design; (b) visualisation of the treatment effect estimator in terms of the mean outcomes from the measured cluster-periods, assuming categorical period effects; (c) visualisation of the treatment effect estimator in terms of the mean outcomes from the measured cluster-periods, assuming a linear time effect.
In the subsections to follow, we derive explicit expressions for the best linear unbiased treatment effect estimator and its variance, first assuming categorical period effects and then assuming a linear effect of time.
Categorical period effects
Under model (2) and assuming categorical period effects such that $\mathbf{T}_t\boldsymbol{\beta} = \beta_t$, $\hat{\theta}$ has expectation
$$E(\hat{\theta}) = w_{1,1}\beta_1 + \sum_{t=2}^{4}\left(w_{t-1,t} + w_{t,t}\right)\beta_t + w_{4,5}\beta_5 + \theta\sum_{s=1}^{4}w_{s,s+1},$$
where $w_{s,t}$ is the weight on the cluster-period mean $\bar{Y}_{st}$.
Since we want $\hat{\theta}$ to be an unbiased estimator of $\theta$, the following conditions must hold:
(i) $w_{1,1} = w_{4,5} = 0$;
(ii) $w_{t-1,t} + w_{t,t} = 0$ for $t = 2, 3, 4$; and
(iii) $w_{1,2} + w_{2,3} + w_{3,4} + w_{4,5} = 1$.
Condition (i) implies that the outcomes from the cluster-periods in the first and last periods of the design, $\bar{Y}_{11}$ and $\bar{Y}_{45}$, are not used at all to estimate the treatment effect. This holds for larger basic staircase designs: When a categorical period effect is included in the model, the outcomes from the cells in the first and last periods of the design do not contribute to the estimation of the treatment effect.
Condition (ii) implies that the weights on cells in the same period but different clusters have the same magnitude but opposite signs. We can then rewrite the general treatment effect estimator in terms of three unique weights, letting $w_1 = w_{1,2} = -w_{2,2}$, $w_2 = w_{2,3} = -w_{3,3}$, and $w_3 = w_{3,4} = -w_{4,4}$:
$$\hat{\theta} = w_1\left(\bar{Y}_{12} - \bar{Y}_{22}\right) + w_2\left(\bar{Y}_{23} - \bar{Y}_{33}\right) + w_3\left(\bar{Y}_{34} - \bar{Y}_{44}\right),$$
where $w_1 + w_2 + w_3 = 1$. Figure 2(b) depicts this expression in the context of the design schematic.
Then the variance of the treatment effect estimator is given by
$$\mathrm{Var}(\hat{\theta}) = 2a\left[w_1^2 + w_2^2 + w_3^2 - \psi\left(w_1w_2 + w_2w_3\right)\right],$$
where $a$ represents the common variance of a cluster-period mean and $\psi$ represents the correlation between the two cluster-period means within a cluster.
We can then find the weights that give lowest variance by minimising the expression using Lagrange multiplier equations, with the constraint from above that $w_1 + w_2 + w_3 = 1$. This approach (detailed in Supporting Information Section B.1) yields the following weights:
$$w_1 = w_3 = \frac{2 + \psi}{2(3 + 2\psi)}, \qquad w_2 = \frac{1 + \psi}{3 + 2\psi}.$$
Table 1 displays these weights in terms of the correlation between cluster-period means, $\psi$, and for three different scenarios: when the cluster-period means are uncorrelated ($\psi = 0$), moderately correlated, and fully correlated ($\psi = 1$). If the outcomes from the cluster-periods within a cluster are uncorrelated, then all but the outermost cluster-periods are weighted with the same magnitude to estimate the treatment effect. As the correlation between cluster-period means increases, the magnitude of the weights on the innermost cluster-periods increases slightly, as these cluster-periods are correlated in both directions with cluster-periods in the adjacent periods.
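Table 1. Weights on cluster-period mean differences, depending on the correlation between cluster-period means.
Columns: Weight; Cluster-period mean difference; General expression, followed by the value of the weight under each correlation scenario.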
Finally, plugging these weights back into the previous variance expression and simplifying gives
$$\mathrm{Var}(\hat{\theta}) = \frac{a\left(2 - \psi^2\right)}{3 + 2\psi}.$$
Note that if $K$ clusters were randomised to each of the $S$ sequences of the design, then the variance of the treatment effect estimator would simply be reduced by a factor of $K$.
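The behaviour of the weights as the correlation between cluster-period means changes can be explored with a small Python sketch. The weight and variance formulas coded here were obtained for this illustration by minimising the weighted-sum variance of the three vertical comparisons subject to the weights summing to one; the function names and the symbol psi for the correlation between a cluster's two period means are assumptions.

```python
def four_sequence_weights(psi):
    """Minimum-variance weights on the three vertical comparisons of a
    four-sequence basic staircase with categorical period effects (sketch)."""
    w_outer = (2 + psi) / (2 * (3 + 2 * psi))   # periods 2 and 4
    w_inner = (1 + psi) / (3 + 2 * psi)         # period 3
    return w_outer, w_inner, w_outer


def variance_from_weights(weights, a, psi):
    """Variance of the weighted sum of vertical comparisons, where a is the
    variance of a cluster-period mean and psi the within-cluster correlation."""
    w1, w2, w3 = weights
    return 2 * a * (w1**2 + w2**2 + w3**2 - psi * (w1 * w2 + w2 * w3))


if __name__ == "__main__":
    for psi in (0.0, 0.5, 1.0):
        w = four_sequence_weights(psi)
        print(psi, [round(x, 4) for x in w], round(variance_from_weights(w, 1.0, psi), 4))
```

Running the loop shows equal weights when psi is 0 and a modest shift of weight towards the middle comparison as psi approaches 1.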
Linear period effects
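Using a similar approach to that outlined in Section 3.2.2, we show in Supporting Information Section B.2 that when assuming linear rather than categorical period effects, for a four-sequence basic staircase, the treatment effect estimator can be written as a linear combination of differences between centrosymmetric pairs of cluster-period cells:
$$\hat{\theta} = w_1\left(\bar{Y}_{12} - \bar{Y}_{44}\right) + w_2\left(\bar{Y}_{23} - \bar{Y}_{33}\right) + w_3\left(\bar{Y}_{34} - \bar{Y}_{22}\right) + w_4\left(\bar{Y}_{45} - \bar{Y}_{11}\right),$$
where the weights have the following values:
$$w_1 = \tfrac{4}{10}, \quad w_2 = \tfrac{3}{10}, \quad w_3 = \tfrac{2}{10}, \quad w_4 = \tfrac{1}{10}.$$
The treatment effect estimator can then be written as
$$\hat{\theta} = \tfrac{1}{10}\left[4\left(\bar{Y}_{12} - \bar{Y}_{44}\right) + 3\left(\bar{Y}_{23} - \bar{Y}_{33}\right) + 2\left(\bar{Y}_{34} - \bar{Y}_{22}\right) + \left(\bar{Y}_{45} - \bar{Y}_{11}\right)\right].$$
Note that unlike when categorical period effects are assumed, all cluster-period means contribute to the treatment effect estimate when linear period effects are assumed. Moreover, the weights on the cluster-period means do not depend on the correlation between cluster-period means, $\psi$. A depiction of the treatment effect estimator in the context of the design schematic is shown in Figure 2(c).
Finally, the variance of the treatment effect estimator is given by
$$\mathrm{Var}(\hat{\theta}) = \frac{a\left(3 - 2\psi\right)}{5}.$$
As before, if $K$ clusters were randomised to each of the $S$ sequences of the design, then the variance of the treatment effect estimator would be reduced by a factor of $K$.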
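As a check on the linear-time estimator for the four-sequence design, the sketch below stores a set of weights on the eight measured cluster-period means, confirms that the resulting linear combination is unbiased under a linear time effect, and computes its variance. The specific weights (4/10, 3/10, 2/10 and 1/10 on the centrosymmetric differences) were derived for this illustration from generalised least squares under the stated model; the function names are hypothetical.

```python
def linear_time_cell_weights():
    """Weights on cluster-period means, keyed by (sequence, period)."""
    pairs = [((1, 2), (4, 4), 0.4),   # (cell added, cell subtracted, weight)
             ((2, 3), (3, 3), 0.3),
             ((3, 4), (2, 2), 0.2),
             ((4, 5), (1, 1), 0.1)]
    weights = {}
    for plus_cell, minus_cell, w in pairs:
        weights[plus_cell] = w
        weights[minus_cell] = -w
    return weights


def expectation_coefficients(weights):
    """Coefficients of (intercept, time slope, treatment effect) in the
    expectation of the weighted sum; unbiasedness needs (0, 0, 1)."""
    c_intercept = sum(weights.values())
    c_slope = sum(w * t for (s, t), w in weights.items())
    c_theta = sum(w for (s, t), w in weights.items() if t == s + 1)
    return c_intercept, c_slope, c_theta


def variance(weights, a, b):
    """Variance of the weighted sum; a and b are the variance of, and the
    covariance between, a cluster's two period means. Clusters are independent."""
    total = 0.0
    for s in range(1, 5):
        w_control = weights[(s, s)]
        w_interv = weights[(s, s + 1)]
        total += a * (w_control**2 + w_interv**2) + 2 * b * w_control * w_interv
    return total


if __name__ == "__main__":
    w = linear_time_cell_weights()
    print(expectation_coefficients(w))
    print(variance(w, a=0.19, b=0.08))   # illustrative values only
```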
Variance of the treatment effect estimator, S-sequence design
Obtaining general results
We consider two different approaches to obtaining explicit expressions for the variance of the treatment effect estimator for a basic staircase design with $S$ sequences: either extending the approach from Section 3.2, or simplifying expression (3) using matrix algebra and results on explicit inverses of particular matrices. We briefly describe these approaches in the subsections to follow, with more detail available in Supporting Information Section C.
Categorical period effects
Using a similar approach to that used in Section 3.2.2 for a four-sequence design, we show in Section C.1 of the Supporting Information that for a general $S$-sequence basic staircase design where categorical period effects are assumed, the outcomes from the cluster-periods in the first and last periods have weights of zero and therefore do not contribute to the estimation of the treatment effect. In addition, the set of weights within each period sum to zero. Therefore, the treatment effect estimator can be represented as the weighted sum of several ‘vertical comparisons’ between the cluster-period means within each of the intermediate periods of the trial:
$$\hat{\theta} = \sum_{s=1}^{S-1} w_s\left(\bar{Y}_{s,s+1} - \bar{Y}_{s+1,s+1}\right), \qquad \sum_{s=1}^{S-1} w_s = 1.$$
Extending the approach from Section 3.2.2, we show in Section C.1 of the Supporting Information that the resulting variance of the treatment effect estimator can be represented as
$$\mathrm{Var}(\hat{\theta}) = \frac{2a(1 - \psi)}{K\left[S - (1 + \psi)\dfrac{1 - \lambda^S}{1 - \lambda^S - \psi\left(\lambda - \lambda^{S-1}\right)}\right]}, \qquad \lambda = \frac{1 - \sqrt{1 - \psi^2}}{\psi}, \tag{6}$$
where, as described in Section 3.1, $a = \rho + (1 - \rho)/m$ and $\psi = b/a = mr\rho/(1 + (m - 1)\rho)$, with $\lambda = 0$ when $\psi = 0$.
Figure 3. Variance of the treatment effect estimator for varying within-period intracluster correlation (ICC) values, assuming categorical period effects, for a basic staircase design with three and 10 sequences (columns) and cluster-period sizes of 10 and 100 (rows), where each subject is measured just once. The lines within each subplot correspond to different cluster autocorrelation values.
Another approach for deriving the variance is by simplifying expression (3) with results appropriate for the basic staircase design. Assuming categorical period effects, the matrices encoding the time effects, $\mathbf{Z}_s$, $s = 1, \ldots, S$, are $2 \times (S + 1)$-dimensional matrices comprised entirely of zeros except for a $2 \times 2$ identity matrix starting in column $s$. The innermost term in expression (3), $\sum_{s=1}^{S}\mathbf{Z}_s^T\mathbf{V}^{-1}\mathbf{Z}_s$, is a tridiagonal matrix comprised of $-b/(a^2 - b^2)$ on the super- and sub-diagonals, with elements $a/(a^2 - b^2)$ in the first and last positions of the diagonal and $2a/(a^2 - b^2)$ in the inner positions of the diagonal. Utilising results in Tan20 for inverting real symmetric tridiagonal matrices of this form, we show in Section C.2 of the Supporting Information that the resulting variance of the treatment effect estimator is also given by expression (6).
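The tridiagonal route can also be followed numerically. The sketch below solves the tridiagonal system with the Thomas algorithm rather than an explicit inverse (a swapped-in technique, not the inversion results cited above) and compares the result with one algebraically equivalent closed form that was derived independently for this illustration; it is not claimed to match the layout of the published expression. The function names are hypothetical, a and b denote the variance of and covariance between a cluster's two period means, and the closed form assumes at least two sequences and b strictly less than a.

```python
import math


def tridiagonal_solve(diag, off, rhs):
    """Thomas algorithm for a tridiagonal system with constant off-diagonal."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def variance_categorical_numeric(S, K, a, b):
    """Variance for an S-sequence basic staircase, categorical period effects."""
    diag = [a] + [2 * a] * (S - 1) + [a]       # numerators of the diagonal
    u = [-b] + [a - b] * (S - 1) + [a]         # numerators of the cross term
    x = tridiagonal_solve(diag, -b, u)
    quad = sum(ui * xi for ui, xi in zip(u, x))
    return (a * a - b * b) / (K * (S * a - quad))


def variance_categorical_closed(S, K, a, b):
    """Equivalent closed form (derived for this sketch; requires S >= 2, b < a)."""
    psi = b / a
    lam = psi / (1 + math.sqrt(1 - psi * psi))  # same as (1 - sqrt(1 - psi^2)) / psi
    ratio = (1 - lam**S) / (1 - lam**S - psi * (lam - lam**(S - 1)))
    return 2 * a * (1 - psi) / (K * (S - (1 + psi) * ratio))


if __name__ == "__main__":
    for S in (3, 10):
        print(S, variance_categorical_numeric(S, 1, 0.19, 0.08),
              variance_categorical_closed(S, 1, 0.19, 0.08))
```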
Figure 3 displays the variance of the treatment effect estimator against the full range of within-period ICC values under the assumption of categorical period effects, for basic staircase designs with differing numbers of sequences (columns) and cluster-period sizes (rows), for several different cluster autocorrelation values. There appears to be a near-linear relationship between the variance of the treatment effect estimator and the within-period ICC. It tends to be an increasing relationship; however, we see a slight decreasing relationship for the design with 10 sequences, a cluster-period size of 10, and a cluster autocorrelation of $r = 1$, as for exchangeable intracluster correlation. The variance of the treatment effect estimator is lower for higher autocorrelation values: less decay in correlation from one period to the next means that subjects’ outcomes from the control period will generally be more similar to those in the intervention period, making it easier to attribute any difference to the treatment effect. Variances are lower for designs with more sequences, as these designs have more clusters and therefore more measurements with which to estimate the treatment effect. Increasing the cluster-period size yields a slight decrease in the variance, although there is less benefit to measuring more subjects in a cluster-period as the within-period ICC increases: the subjects are more similar and so each additional subject offers less new information about the treatment effect. Furthermore, Figure 4 illustrates that there are rapidly diminishing returns to increasing the cluster-period size. After an initial sharp decrease in the variance of the treatment effect estimator as the cluster-period size increases, the rate of decrease of the variance then quickly slows such that little precision would be gained from designs with increasingly large cluster-period sizes.
Figure 4. Variance of the treatment effect estimator for varying cluster-period sizes, assuming categorical period effects, for a basic staircase design with 10 sequences, where each subject is measured just once, and for within-period ICC values of 0.05 (left) and 0.2 (right). The lines within each subplot correspond to different cluster autocorrelation values.
Linear period effects
In a generalisation of the results in Section 3.2.3 for a four-sequence design, the treatment effect estimator for an $S$-sequence basic staircase design where linear period effects are assumed can similarly be written as a weighted sum of the differences between centrosymmetric cluster-period cells, i.e., $\bar{Y}_{s,t}$ and $\bar{Y}_{S+1-s,\,S+2-t}$:
$$\hat{\theta} = \sum_{s=1}^{S} w_s\left(\bar{Y}_{s,s+1} - \bar{Y}_{S+1-s,\,S+1-s}\right), \qquad w_s = \frac{(S + 1)(S + 2) - 6s}{S(S^2 - 1)}.$$
To obtain an explicit expression for the variance of the treatment effect estimator, we simplify expression (3) using results appropriate for an $S$-sequence basic staircase design where a linear time effect over the trial periods is assumed. The matrices encoding the time effects, $\mathbf{Z}_s$, $s = 1, \ldots, S$, are $2 \times 2$-dimensional matrices comprised of ones in the first column and the elements $s$ and $s + 1$ in the second column. The inner term in expression (3), $\sum_{s=1}^{S}\mathbf{Z}_s^T\mathbf{V}^{-1}\mathbf{Z}_s$, remains a $2 \times 2$ matrix and hence is straightforward to invert. We show in Section C.3 of the Supporting Information that the resulting variance of the treatment effect estimator is given by:
$$\mathrm{Var}(\hat{\theta}) = \frac{2a\left[(S^2 + 2) - (S^2 - 4)\psi\right]}{SK(S^2 - 1)}. \tag{7}$$
Since $a = (1 + (m - 1)\rho)/m$ and $\psi = mr\rho/(1 + (m - 1)\rho)$, this variance expression can also be written as a function of the intracluster correlation parameters $\rho$ and $r$ and cluster-period size $m$:
$$\mathrm{Var}(\hat{\theta}) = \frac{2}{SK(S^2 - 1)}\left\{\frac{S^2 + 2}{m} + \rho\left[\frac{(S^2 + 2)(m - 1)}{m} - (S^2 - 4)r\right]\right\}. \tag{8}$$
While the variance is linear in $\rho$, the coefficient on $\rho$ may be positive or negative, depending on the trial configuration: greater similarity between participants’ outcomes within a cluster may increase or decrease the precision of the treatment effect. This coefficient would be negative in settings where $r > (S^2 + 2)(m - 1)/[m(S^2 - 4)]$. One such example is a basic staircase design with $S = 10$ sequences, a cluster-period size of $m = 10$, and a cluster autocorrelation of $r > 0.956$, which includes exchangeable intracluster correlation ($r = 1$, shown in Figure 5). Under an assumption of exchangeable intracluster correlation, these settings could be denoted by a design with number of sequences $S > \sqrt{6m - 2}$, e.g., eight or more sequences for a cluster-period size of 10, and 25 or more sequences for a cluster-period size of 100.
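The dependence of the linear-time variance on the within-period ICC can be sketched in a few lines of Python. The formula coded here follows from combining the average within-cluster difference with the slope estimated from the cluster totals, and was derived for this illustration assuming a total variance of 1; the function names are hypothetical.

```python
import math


def linear_time_variance(S, K, m, rho, r):
    """Variance of the treatment effect estimator for an S-sequence basic
    staircase with a linear time effect (sketch; total variance assumed 1)."""
    coef = icc_coefficient(S, m, r)
    return 2 * ((S**2 + 2) / m + rho * coef) / (S * K * (S**2 - 1))


def icc_coefficient(S, m, r):
    """Term multiplying the within-period ICC inside the braces; its sign
    decides whether the variance rises or falls as the ICC increases."""
    return (S**2 + 2) * (m - 1) / m - (S**2 - 4) * r


def min_sequences_exchangeable(m):
    """Smallest S for which the coefficient is negative when r = 1,
    i.e. the smallest integer S with S^2 > 6m - 2."""
    return math.isqrt(6 * m - 2) + 1


if __name__ == "__main__":
    print(min_sequences_exchangeable(10), min_sequences_exchangeable(100))
    print(linear_time_variance(10, 1, 10, 0.05, 1.0),
          linear_time_variance(10, 1, 10, 0.20, 1.0))
```

The second line of output illustrates a configuration in which a larger ICC gives a smaller variance.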
Variance of the treatment effect estimator for varying within-period intracluster correlation (ICC) values, assuming linear period effects, for a basic staircase design with three and 10 sequences (columns) and cluster-period sizes of 10 and 100 (rows), where each subject is measured just once. The lines within each subplot correspond to different cluster autocorrelation values.
Figure 5 displays the variance of the treatment effect estimator from expression (8) against a range of within-period ICC values, for basic staircase designs with differing numbers of sequences (columns) and cluster-period sizes (rows), for several different cluster autocorrelation values. The observations above for categorical period effects are borne out: higher values of r and lower values of correspond to higher precision. There are again diminishing returns to increasing the cluster-period size, with the benefit quickly tapering off (Figure 6). Note that by a close inspection of Figures 3 and 5, for the same combination of within-period ICC and cluster autocorrelation values, the variance of the treatment effect estimator under the assumption of linear period effects (Figure 5) is slightly lower than under the assumption of categorical period effects (Figure 3). This is to be expected given the assumption of linear period effects requires the estimation of fewer parameters than the assumption of categorical period effects.
Figure 6. Variance of the treatment effect estimator for varying cluster-period sizes, assuming linear period effects, for a basic staircase design with 10 sequences, where each subject is measured just once, and for within-period intracluster correlation (ICC) values of 0.05 (left) and 0.2 (right). The lines within each subplot correspond to different cluster autocorrelation values.
Sample size and power calculations
Sample size and power calculations using formulae
The variance expressions provided in Sections 2.3 and 3.3, together with a standard power formula, can be used to calculate the power of a trial for a desired effect size or the required number of clusters for a particular cluster-period size and desired level of power. Expression (3) is applicable to general staircase designs, and expressions (6) and (7) are applicable to basic staircase designs, with terms a and again representing the variance of and correlation between cluster-period means, respectively. A standard power formula for cluster randomised trials8 is given by
so that
where is the cumulative standard Normal distribution, is the target effect size, is the two-sided significance level of interest, is the value for the standard Normal distribution corresponding to right tail area , and is the variance of the treatment effect estimator for the trial design of interest.
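As an illustration, the power calculation can be sketched in a few lines of Python. This sketch assumes the common normal-approximation form, power = Φ(|θ|/√var − z), with z the standard Normal quantile at 1 − α/2; the paper's displayed formula may be written differently. The function names are hypothetical, and the helper for the number of clusters additionally assumes that the variance is inversely proportional to the number of clusters per sequence.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Normal distribution (mean 0, sd 1)


def power_from_variance(theta, var_theta, alpha=0.05):
    """Approximate power (assumed form): Phi(|theta| / sqrt(var) - z_{1 - alpha/2})."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(theta) / sqrt(var_theta) - z_alpha)


def variance_for_power(theta, target_power, alpha=0.05):
    """Largest estimator variance that still reaches the target power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(target_power)
    return theta ** 2 / (z_alpha + z_beta) ** 2


def clusters_per_sequence(theta, var_one_cluster, target_power, alpha=0.05):
    """Clusters per sequence needed, assuming the variance scales as 1/K."""
    return ceil(var_one_cluster / variance_for_power(theta, target_power, alpha))
```

Here `var_one_cluster` stands for the variance of the treatment effect estimator for the design of interest with one cluster per sequence, obtained from whichever variance expression applies.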
Staircase trial example
One example of a planned staircase design is described by White-Traut et al.21 as a cluster randomised trial across four neonatal intensive care units (NICUs) to test whether a behavioural intervention for preterm infants leads to improved growth while in the NICU. The researchers specified a design with four unique sequences and four measurement periods in each sequence, with each sequence consisting of one control period followed by three intervention periods and each sequence commencing data collection in a different period. Although labelled an ‘incomplete stepped wedge’ in the protocol paper, this design would be deemed an ‘imbalanced staircase’ under the terminology of this paper, described in our notation as . While we will not replicate their exact study design, which appears to involve multiple cohorts, we note that for their power calculations the authors assumed a standardised effect size of 0.5, a two-sided significance level of 0.05, and an ICC of 0.1. The authors anticipated that between 62 and 252 infants would be eligible for the trial in a 6-month period in the participating NICU clusters. Generally, we would recommend that staircase designs, or any cluster randomised trial design, be conducted with a larger number of clusters than the four in the White-Traut et al. trial, as the asymptotic normality of the treatment effect estimator that we have considered relies on a large number of clusters. However, here we apply our derived expressions to two designs with four clusters, as inspired by this trial (Figure 7).
Figure 7. Design schematics for a basic staircase design with one cluster assigned to each of four unique sequences (left) and an imbalanced staircase design with one control period followed by three intervention periods in each sequence and one cluster assigned to each of four unique sequences (right).
We first consider a basic staircase design for this trial setting and aim: . We will suppose that 50 subjects per cluster per period could feasibly be included in the trial, with each subject measured just once. Then assuming an exchangeable intracluster correlation structure , we can calculate the power of a basic staircase design with , , and using equations (4) or (5) to first obtain the variance of the treatment effect estimator and then use a standard expression for power. The variance a of a cluster-period mean and correlation between cluster-period means are given by and .
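The two cluster-period mean quantities for this example can be computed with a short Python sketch. It assumes the usual block-exchangeable formulas with a total variance of 1 (consistent with a standardised effect size), a within-period ICC `rho`, a cluster autocorrelation `r` and `m` subjects per cluster-period, each measured once. The function and argument names are illustrative, and the paper's own expressions may be parameterised differently.

```python
def cluster_period_mean_terms(m, rho, r, sigma2=1.0):
    """Variance of a cluster-period mean and correlation between two
    cluster-period means from the same cluster (block-exchangeable model,
    different subjects in each period; formulas assumed, see lead-in)."""
    a = sigma2 * (1 + (m - 1) * rho) / m
    psi = m * rho * r / (1 + (m - 1) * rho)
    return a, psi


# Example setting: 50 subjects per cluster-period, ICC of 0.1,
# exchangeable correlation (cluster autocorrelation of 1).
a, psi = cluster_period_mean_terms(m=50, rho=0.1, r=1.0)
print(f"a = {a:.4f}, psi = {psi:.4f}")
```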
If categorical period effects were assumed, the variance of the treatment effect estimator for this design obtained from (4) is
corresponding to power to detect an effect size of 0.5:
Assuming linear period effects instead, the variance of the treatment effect estimator for this design obtained from (5) is
corresponding to power:
Note that equations (6) and (7) could instead be used to calculate the variance of the treatment effect estimator for this design, and for basic staircase designs with other numbers of sequences.
If a staircase design with more than one control and/or intervention period were of interest, such as a design with one control period followed by three intervention periods in each of four sequences, then equation (3) can instead be used to calculate the variance of the treatment effect estimator:
where , is a matrix consisting of elements along the diagonal and in each off-diagonal cell, and is either a matrix containing a identity matrix starting in column s and zeros elsewhere if categorical period effects are assumed, or a matrix made up of ones in the first column and integers s up to in the second column if linear period effects are instead assumed. The variance of the treatment effect estimator assuming categorical period effects is corresponding to power to detect an effect size of 0.5, and the variance assuming linear period effects is corresponding to power.
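For designs without an explicit variance expression, the variance can also be obtained numerically. The Python sketch below does not implement expression (3) itself. Instead, it builds the generalised least squares information matrix at the cluster-period mean level and inverts it, assuming a block-exchangeable correlation structure with known values for the variance of a cluster-period mean (`a`) and the correlation between cluster-period means (`psi`). All function names are hypothetical.

```python
def staircase(S, R0, R1):
    """Sequences as lists of (period, treatment) pairs: R0 control then R1
    intervention periods, with sequence s starting data collection in period s."""
    return [[(s + j, 0 if j < R0 else 1) for j in range(R0 + R1)]
            for s in range(1, S + 1)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("treatment effect not identifiable for this design")
        M[c], M[p] = M[p], M[c]
        for row in range(n):
            if row != c:
                f = M[row][c] / M[c][c]
                M[row] = [x - f * y for x, y in zip(M[row], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def treatment_variance(sequences, K, a, psi, time="categorical"):
    """Variance of the GLS treatment effect estimator with K clusters per
    sequence; time is 'categorical' or 'linear'. Requires psi < 1."""
    periods = sorted({t for seq in sequences for t, _ in seq})
    n_par = (len(periods) if time == "categorical" else 2) + 1
    info = [[0.0] * n_par for _ in range(n_par)]
    for seq in sequences:
        R = len(seq)
        # Design matrix rows for one cluster: time columns, then treatment.
        Z = []
        for t, x in seq:
            if time == "categorical":
                row = [1.0 if t == u else 0.0 for u in periods]
            else:
                row = [1.0, float(t)]
            Z.append(row + [float(x)])
        # Closed-form inverse of V = a[(1 - psi) I + psi J].
        d = 1.0 / (a * (1.0 - psi))
        off = -d * psi / (1.0 + (R - 1) * psi)
        for i in range(R):
            for j in range(R):
                w = K * (d + off if i == j else off)
                for u in range(n_par):
                    for v in range(n_par):
                        info[u][v] += w * Z[i][u] * Z[j][v]
    unit = [0.0] * (n_par - 1) + [1.0]
    return solve(info, unit)[-1]  # (theta, theta) element of the inverse


# Imbalanced staircase from the example: four sequences, one control and
# three intervention periods, one cluster per sequence (m = 50, ICC = 0.1,
# exchangeable correlation, giving a = 0.118 and psi = 5/5.9).
design = staircase(S=4, R0=1, R1=3)
var_cat = treatment_variance(design, K=1, a=0.118, psi=5 / 5.9)
var_lin = treatment_variance(design, K=1, a=0.118, psi=5 / 5.9, time="linear")
```

Because sequences are passed as explicit lists of (period, treatment) pairs, the same function also accepts designs in which sequences follow different schedules or have different numbers of observed periods.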
Available tools for calculating required sample size and power
While statisticians and researchers could use our results to manually calculate sample size and power for their studies, some may wish to use existing tools to do so. Some, but not all, of the existing tools and software appropriate for stepped wedge and other cluster randomised trial designs can also be used to calculate required sample size and power for staircase designs.22 One requirement is that the tool or software can accommodate incomplete designs with periods in which no measurements are taken in some clusters, either by allowing users to specify an incomplete design schematic directly or by specifying an ‘inclusion’ matrix to exclude certain cluster-period cells from a standard design schematic. Further requirements to cover the scenarios in this paper are options for different intracluster correlation structures, including the block-exchangeable and discrete-time decay correlation structures, and both categorical and linear forms for time. At the time of writing, tools that fulfil all of the above requirements include the SteppedPower R package23 and a SAS macro called %CRTFASTGEEPWR24; a freely available web app called the Shiny CRT calculator25 fulfils all of the above requirements except for allowing different forms for time.22
The main power function in the SteppedPower R package23 has an ‘incomplete’ argument: a scalar input retains the specified number of pre- and post-switch periods in each treatment sequence from a stepped wedge design, or a matrix input can be used to denote inclusion (cells with a value of 1) or exclusion (cells with a value of 0) of the cluster-period cells from a complete stepped wedge design schematic. A balanced staircase design, for example, could be specified by defining a complete stepped wedge along with ‘incomplete = 2’; however, the first and last sequences of the resulting design are truncated, lacking an initial control period and final intervention period, respectively. We note that this will not affect the variance of the treatment effect estimator or power when assuming categorical period effects but it will affect the resulting calculation if assuming linear period effects. We expand upon this point in Section 5. The %CRTFASTGEEPWR SAS macro24 takes a matrix for the design specification, allowing incomplete designs through the use of a ‘2’ for cluster-period cells in which no measurements are taken. The Shiny CRT calculator web app25 allows users to upload a design schematic as a CSV file. Staircase designs can be specified similarly to the design schematics in Figure 1, with empty cells for cluster-periods in which no measurements are taken. This app does not allow users to specify different time parameterisations, but rather assumes categorical period effects for all calculations. Sample code for each of these implementations is provided in Appendix A.
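The truncation of the first and last sequences can be seen by constructing the inclusion matrix by hand. The Python sketch below is an illustration only, not code from any of the tools above: it builds a complete stepped wedge schematic and then keeps a fixed number of periods either side of each sequence's switch, mirroring the scalar behaviour described for the ‘incomplete’ argument. The function names are hypothetical.

```python
def stepped_wedge(S):
    """Complete stepped wedge: S sequences over S + 1 periods, with
    sequence s in control for periods 1..s and intervention afterwards."""
    return [[1 if t > s else 0 for t in range(1, S + 2)]
            for s in range(1, S + 1)]


def inclusion_matrix(S, n):
    """1 for cells within n periods before or after the switch, else 0."""
    return [[1 if s - n < t <= s + n else 0 for t in range(1, S + 2)]
            for s in range(1, S + 1)]


def staircase_from_stepped_wedge(S, n):
    """Treatment indicators for included cells, None for excluded cells."""
    sw, inc = stepped_wedge(S), inclusion_matrix(S, n)
    return [[x if keep else None for x, keep in zip(sw_row, inc_row)]
            for sw_row, inc_row in zip(sw, inc)]


# Two pre- and two post-switch periods requested from a four-sequence
# stepped wedge: the first sequence has only one control period available
# and the last sequence has only one intervention period available.
for row in staircase_from_stepped_wedge(S=4, n=2):
    print(row)
```

With `n=1` the result is a basic staircase and no sequence is truncated, since a standard stepped wedge always has at least one control and one intervention period in every sequence.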
Discussion
In this article, we formally introduced the staircase design and considered its properties when a linear mixed model for the outcome is assumed, focusing on the basic staircase design with one pre- and one post-switch measurement period. We derived a simplified analytical expression for the variance of the treatment effect estimator for a general staircase design to enable appropriate sample size and power calculations for these designs. In addition, we derived explicit expressions for the variance of the treatment effect estimator for the basic staircase design under assumptions of categorical and linear period effects.
A staircase design is a pragmatic alternative to a stepped wedge design. Since sequences in a staircase design contain only a limited number of pre- and post-switch measurement periods, this design can potentially be much less burdensome and expensive than a stepped wedge. These reduced data collection requirements would likely be more enticing for participating clusters and could carry a reduced risk of attrition of clusters during the trial. Beyond the appeal of the limited number of measurement periods, the staggering of these sequences means that each cluster would receive the intervention sooner upon commencing data collection. This could also be appealing for participating clusters, but in practice it would mean that not all clusters would be actively involved at all times. It is possible that this reduced engagement could cause inactive clusters to lose interest and withdraw from the trial.
The implications of clustering for the efficiency of a basic staircase design are similar to those for other longitudinal cluster randomised trial designs in some ways but different in others. The variance of the treatment effect estimator is a linear or near-linear function of the within-period ICC. By contrast, under a discrete-time decay correlation structure, the stepped wedge design has a nonlinear relationship between the variance and within-period ICC.17 As with other cluster randomised trial designs, a basic staircase design sees diminishing returns from increasing the cluster-period size: beyond a certain cluster-period size, minimal useful information about the treatment effect can be gained by measuring more subjects in a cluster-period.26
Unlike the stepped wedge design, for which assuming categorical or linear time effects yields the same variance expression and hence the same calculated sample size for the same parameter value inputs,13 the basic staircase design yields different expressions under these different assumptions. As seen here and also noted elsewhere, assuming a linear time effect yields a slightly lower variance of the treatment effect estimator than assuming categorical period effects.14 A more conservative sample size or power calculation could therefore be conducted by assuming categorical rather than linear period effects.
We have derived explicit expressions for the variance of the treatment effect estimator for the basic staircase design and an expression for general staircase designs in terms of the matrices corresponding to model (2) at the cluster-period mean level. These expressions can be used in sample size and power calculations for staircase designs. While we would have preferred an explicit expression for general staircase designs, further simplification of expression (3) would require inverting the matrix which can have a complex form depending on the trial characteristics, the assumed intracluster correlation structure, and the assumed form for the time effects. Assuming categorical period effects, for example, pre-multiplication by and post-multiplication by places the elements of into a matrix in the location corresponding to the observed periods, with staggered overlap in the location of the elements of over the S sequences. The summation over all S sequences then adds the elements from these partially overlapping matrices, yielding diagonal bands of nonzero elements with the remaining diagonals made up of zeros. For the basic staircase, this yielded a tridiagonal matrix for which there are many papers covering representations of its inverse.27 Even still, the elements of the inverse of a tridiagonal matrix are obtained through recurrence relations, making a simple analytical form difficult.28 We instead used recent results on the summation of the elements in a row of the inverse20 to obtain a relatively simple analytical form for within expression (3), from which the explicit expression (6) followed. For designs with more than one pre- and/or post-switch measurement period, it is less clear how to obtain a general form for the inverse of the matrix or summations of certain sets of elements from this inverse.
The expression we derived for the variance of the treatment effect estimator for general staircase designs assumes that the observed periods in each sequence follow the same schedule of control and intervention periods, simply shifted in time; there may be a benefit to sequences following different schedules. For instance, Kasza et al.11 considered a staircase-like design where sequences commencing data collection earlier in the trial had more intervention periods than control periods and sequences appearing later in the trial had more control periods than intervention periods. These types of designs also naturally arise if a staircase design with more than one pre- and/or post-switch period is considered as a subset of the cluster-periods in a complete stepped wedge: since a standard stepped wedge has just one all-control period at the start and one all-intervention period at the end, any staircase design other than a basic staircase taken as a subset of the stepped wedge will have some of its sequences truncated at the edges of the stepped wedge. For example, this occurs when specifying a staircase design with the SteppedPower R package23: the design can be obtained by specifying an incomplete stepped wedge design; however, the sequences at the edges of the design schematic may be truncated.
Interestingly, the cluster-period measurements in the first and last periods are not used in the treatment effect estimator obtained from generalised least squares under an assumption of categorical period effects because the treatment and time effects cannot be separated. For example, the treatment effect estimator for a three-sequence basic staircase design matches that for the three-sequence dog-leg design, in which the first and third sequences contain only one period each, an intervention and control period, respectively.29 However, under the stronger assumption of a linear time effect, the cluster-period measurements in the first and last periods do contribute to the treatment effect estimate, albeit with less weight than measurements from the intermediate cluster-periods of the trial.
The formulae we derive in this paper pertain to the generalised least squares estimator of the treatment effect, are based on asymptotic properties, and make the typical assumption of known variance and correlation parameters. The Type I error rate may be elevated and actual power may be lower than the theoretical power calculated with these formulae, particularly if the trial includes few clusters because there are few degrees of freedom for estimating the variance components.30 Further, correlation parameter estimates arising from such trials ought to be treated with a degree of caution. When planning trials, simulation studies can be conducted to provide further insight into the performance of these estimators.
In this paper we have considered the staircase design in the context of linear mixed models for outcomes. Work is currently underway on the efficiency of staircase designs as compared with alternative incomplete designs and with complete designs run over fewer time periods. Future work will consider the properties of these designs when outcomes are modelled using non-linear link functions and estimation of the treatment effect is via generalised estimating equations. We expect that several of our theoretical results will provide useful starting points for these investigations. We also plan to investigate the properties of other types of staircase designs beyond the basic staircase in upcoming work. In particular, we intend to examine the efficiency of balanced designs with more than one pre- and post-switch measurement period (i.e., with ), imbalanced staircase designs (i.e., with ), staircase designs with a transition period, and more general staircase designs for which the sequences may follow different schedules.
Supplemental Material
Supplemental material, sj-pdf-1-smm-10.1177_09622802231202364 for The staircase cluster randomised trial design: A pragmatic alternative to the stepped wedge by Kelsey L Grantham, Andrew B Forbes, Richard Hooper and Jessica Kasza in Statistical Methods in Medical Research
Footnotes
Acknowledgements
We thank the reviewer for insightful comments that significantly improved the paper.
Data availability
No new data were created or analysed in this study. Project code is available at .
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Australian Research Council (grant number Discovery Project DP210101398).
ORCID iDs
Kelsey L Grantham
Jessica Kasza
Supplemental material
Supplemental materials for this article are available online.
Appendix
References
1. Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 2007; 28: 182–191.
2. Kasza J, Forbes AB. Information content of cluster-period cells in stepped wedge trials. Biometrics 2019; 75: 144–152.
3. Keogh S, Shelverton C, Flynn J, et al. Implementation and evaluation of short peripheral intravenous catheter flushing guidelines: a stepped wedge cluster randomised trial. BMC Med 2020; 18: 1–11.
4. Wagner B, Latimer J, Adams E, et al. School-based intervention to address self-regulation and executive functioning in children attending primary schools in remote Australian Aboriginal communities. PLoS One 2020; 15: 1–19.
5. Dreischulte T, Grant A, Donnan P, et al. A cluster randomised stepped wedge trial to evaluate the effectiveness of a multifaceted information technology-based intervention in reducing high-risk prescribing of non-steroidal anti-inflammatory drugs and antiplatelets in primary medical care: the DQIP study protocol. Implement Sci 2012; 7: 1–13.
6. Lundström E, Isaksson E, Wester P, et al. Enhancing Recruitment Using Teleconference and Commitment Contract (ERUTECC): study protocol for a randomised, stepped-wedge cluster trial within the EFFECTS trial. Trials 2018; 19: 1–11.
7. Mazurek MO, Parker RA, Chan J, et al. Effectiveness of the Extension for Community Health Outcomes model as applied to primary care for autism: A partial stepped-wedge randomized clinical trial. JAMA Pediatr 2020; 174: –9.
8. Hemming K, Lilford R, Girling AJ. Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs. Stat Med 2015; 34: 181–196.
9. Unni RR, Lee SF, Thabane L, et al. Variations in stepped-wedge cluster randomized trial design: insights from the Patient-Centered Care Transitions in Heart Failure trial. Am Heart J 2020; 220: 116–126.
10. Hooper R, Kasza J, Forbes A. The hunt for efficient, incomplete designs for stepped wedge trials with continuous recruitment and continuous outcome measures. BMC Med Res Methodol 2020; 20: 279.
11. Kasza J, Bowden R, Forbes AB. Information content of stepped wedge designs with unequal cluster-period sizes in linear mixed models: informing incomplete designs. Stat Med 2021; 40: 1736–1751.
12. Hooper R, Bourke L. Cluster randomised trials with repeated cross sections: alternatives to parallel group designs. Br Med J 2015; 350: h2925.
13. Grantham KL, Forbes AB, Heritier S, et al. Time parameterizations in cluster randomized trial planning. Am Stat 2020; 74: 184–189.
14. Hemming K, Taljaard M, Forbes A. Analysis of cluster randomised stepped wedge trials with repeated cross-sectional samples. Trials 2017; 18: 101.
15. Hooper R, Teerenstra S, de Hoop E, et al. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med 2016; 35: 4718–4728.
16. Girling AJ, Hemming K. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med 2016; 35: 2149–2166.
17. Kasza J, Hemming K, Hooper R, et al. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Stat Methods Med Res 2019; 28: 703–716.
18. Grantham KL, Kasza J, Heritier S, et al. Accounting for a decaying correlation structure in cluster randomized trials with continuous recruitment. Stat Med 2019; 38: 1918–1934.
20. Tan LSL. Explicit inverse of tridiagonal matrix with applications in autoregressive modelling. IMA J Appl Math 2019; 84: 679–695.
21. White-Traut R, Brandon D, Kavanaugh K, et al. Protocol for implementation of an evidence based parentally administered intervention for preterm infants. BMC Pediatr 2021; 21: 1–13.
22. Ouyang Y, Li F, Preisser JS, et al. Sample size calculators for planning stepped-wedge cluster randomized trials: a review and comparison. Int J Epidemiol 2022; 51: 2000–2013.
24. Zhang Y, Preisser JS, Li F, et al. %CRTFASTGEEPWR: A SAS macro for power of generalized estimating equations analysis of multi-period cluster randomized trials with application to stepped wedge designs. arXiv 2022.
25. Hemming K, Kasza J, Hooper R, et al. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. Int J Epidemiol 2020; 49: 979–995.
26. Hemming K, Eldridge S, Forbes G, et al. How to design efficient cluster randomised trials. Br Med J 2017; 358: j3064.
27. Meurant G. A review on the inverse of symmetric tridiagonal and block tridiagonal matrices. SIAM J Matrix Anal Appl 1992; 13: 707–728.
28. Mallik RK. The inverse of a tridiagonal matrix. Linear Algebra Appl 2001; 325: 109–139.
29. Hooper R, Bourke L. The dog-leg: an alternative to a cross-over design for pragmatic clinical trials in relatively stable populations. Int J Epidemiol 2014; 43: 930–936.
30. Senn SJ. Various varying variances: The challenge of nuisance parameters to the practising biostatistician. Stat Methods Med Res 2015; 24: 403–419.