Abstract
Trials of interventions that aim to slow disease progression may analyze a continuous outcome by comparing its change over time—its slope—between the treated and the untreated group using a linear mixed model. To perform a sample-size calculation for such a trial, one must have estimates of the parameters that govern the between- and within-subject variability in the outcome, which are often unknown. The algebra needed for the sample-size calculation can also be complex for such trial designs. We have written a new user-friendly command,
1 Introduction
Sample size is a critical design consideration when planning a randomized controlled trial (RCT). Given an estimate of the target treatment effect, a formula for the variance of the treatment effect (which will depend on the trial design and analysis model), and the acceptable type I and type II error rates, the sample size is calculated with a simple algebraic formula (Campbell, Julious, and Altman 1995). However, for some designs and analysis models, the algebra to obtain the formula for the treatment-effect variance can be complex, and it can be difficult to derive reasonable guesses for the parameters that appear in that formula.
Consider a disease where progression can be measured by a continuous variable that is expected to deteriorate over time. Now consider an intervention whose aim is to slow that disease progression: we could use the continuous outcome as our trial outcome and see whether it responds to treatment over time. In such a trial, this outcome is typically recorded at participants’ baseline visits (prior to treatment allocation) and at least one follow-up visit with the aim of comparing randomized groups.
One way to analyze such an outcome is to use a linear mixed model (LMM) (Verbeke and Molenberghs 2000; Goldstein 2011; Longford 1993; Rabe-Hesketh and Skrondal 2012). In the simple case of a single follow-up measure and no missing data, a properly specified LMM can also be expressed as a generalized least-squares model (Frost, Kenward, and Fox 2008) and will give the same estimated treatment effect as analysis of covariance, albeit with reported standard errors that are only asymptotically equal (Frost, Kenward, and Fox 2008; White and Thompson 2005; Winkens et al. 2007). When there are multiple follow-up times, LMMs offer a flexible way of modeling the data that allows various assumptions to be made about the way the outcome changes over time. For example, it could be assumed that the outcome will change linearly over time in both groups and hence that the treatment difference between the groups is proportional to time (Frost, Kenward, and Fox 2008). LMMs also provide a convenient way of handling missing data, provided that a missing-at-random assumption can be made (Molenberghs and Kenward 2007).
Specifying the treatment-effect variance formula from such an LMM for a sample-size calculation requires knowledge of all the parameters that govern between- and within-subject variability in outcomes, which are often unknown. In such situations, one can use data from any relevant previously conducted longitudinal studies to estimate these parameters. We introduce a new package,
In section 2, we summarize existing methodology for estimating sample sizes for this design; in section 3, we describe the
2 Methods
2.1 Future trial setup and analysis method
It is important to base a sample-size calculation on the model that will be used to analyze the trial. In this section, we therefore describe the sort of trial that
When analyzing this outcome, we assume that it can be modeled as a linear change over time in the control group, with treatment acting to lessen that change proportionally over time. The analysis model can be written as

$$ y_{ij} = \beta_0 + \beta_1 t_j + \beta_2 g_i t_j + a_i + b_i t_j + \epsilon_{ij} \tag{1} $$

where $y_{ij}$ is the outcome for person $i$ at time point $j$, $\beta_0$ is the expected mean baseline measurement of the outcome in both arms, $\beta_1$ is the change in the outcome over time (slope) in the control group, $t_j$ represents the times of the visits, $\beta_2$ is the treatment effect (that is, the difference in slopes between the arms), $g_i$ is an indicator that is 0 in the control group and 1 in the active group for the $i$th person, $a_i$ is a random person-level intercept, $b_i$ is a random person-level slope, and $\epsilon_{ij}$ is a residual error.
Note that in this model, the baseline measure of the outcome, $y_{i0}$, is treated as a correlated outcome. We assume that randomization is successful, so there is no expected difference between the two groups at baseline (that is, at $t_0 = 0$), and we estimate a single intercept for both arms. After baseline, it is assumed that the outcome changes linearly over time; the difference in slopes between the arms is therefore constant, and the difference in mean outcome between the arms grows linearly with time. In this formulation, the treatment effect is defined as the difference between the slope in the treated arm and that in the untreated arm.
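To make model (1) concrete, the sketch below builds the fixed-effects design rows $[1, t_j, g_i t_j]$ for one person in each arm and shows that, with the random effects set to zero, the expected difference between the arms at visit $j$ is $\beta_2 t_j$. The function names and parameter values (an intercept of 34, a control slope of −1.8, and $\beta_2 = 1$) are hypothetical and chosen only for illustration; this is not code from slopepower.

```python
# Hypothetical helper names; a sketch of the fixed-effects part of model (1):
# y_ij = beta0 + beta1*t_j + beta2*g_i*t_j + a_i + b_i*t_j + e_ij

def design_rows(times, treated):
    """Fixed-effects design rows [1, t_j, g_i*t_j] for one person's visits."""
    g = 1.0 if treated else 0.0          # g_i: 0 = control arm, 1 = active arm
    return [[1.0, float(t), g * float(t)] for t in times]

def expected_mean(row, beta):
    """Population-average outcome for one design row (random effects set to zero)."""
    return sum(x * b for x, b in zip(row, beta))

visits = [0, 1, 2]                        # baseline plus two annual follow-ups
beta = (34.0, -1.8, 1.0)                  # hypothetical (beta0, beta1, beta2)
control = design_rows(visits, treated=False)
active = design_rows(visits, treated=True)

# Difference in expected means between the arms at each visit: beta2 * t_j
gaps = [expected_mean(a, beta) - expected_mean(c, beta)
        for a, c in zip(active, control)]
for t, gap in zip(visits, gaps):
    print(f"t = {t}: active minus control = {gap:.2f}")
```

Running it prints differences of 0.00, 1.00, and 2.00 at $t = 0, 1, 2$: the slopes differ by the constant $\beta_2$, so the gap between the arms grows linearly with time.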
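Once the analysis model is specified, the treatment effect and its variance, and thus the sample-size requirements, follow from the theory of linear mixed models (Frost, Kenward, and Fox 2008). A general formulation for a linear mixed model is

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\epsilon} \tag{2} $$

where $\mathbf{y}$ is the vector of outcomes, $\mathbf{X}$ and $\mathbf{Z}$ are the design matrices for the fixed and random effects, $\boldsymbol{\beta}$ is the vector of fixed parameters, $\mathbf{u}$ is the vector of random effects with variance–covariance matrix $\mathbf{G}$, and $\boldsymbol{\epsilon}$ is the vector of residual errors with variance–covariance matrix $\mathbf{R}$.

Now, let us rewrite our model in a form that is not conditional on the random effects:

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}^*, \qquad \operatorname{Var}(\boldsymbol{\epsilon}^*) = \boldsymbol{\Sigma} = \mathbf{Z}\mathbf{G}\mathbf{Z}^\top + \mathbf{R} \tag{3} $$

Here $\boldsymbol{\Sigma}$ is the variance–covariance matrix for the unconditional outcomes. The generalized least-squares estimator of the fixed parameters is

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Sigma}^{-1} \mathbf{y} \tag{4} $$

and

$$ \operatorname{Var}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^\top \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \tag{5} $$

Equation (4) can be used to estimate the treatment effect, while (5) defines a variance–covariance matrix for the estimated fixed parameters that permits calculation of the standard error of the treatment effect.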
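Equations (4) and (5) are the generalized least-squares estimator $(X^\top\Sigma^{-1}X)^{-1}X^\top\Sigma^{-1}y$ and its variance $(X^\top\Sigma^{-1}X)^{-1}$. The minimal NumPy sketch below computes both, assuming $\Sigma$ is known (in practice it is estimated from data); the function name `gls` is hypothetical.

```python
import numpy as np

def gls(X, Sigma, y=None):
    """Generalized least squares for y = X beta + e*, Var(e*) = Sigma.

    Returns (beta_hat, cov_beta): cov_beta = (X' Sigma^-1 X)^-1 as in (5) and
    beta_hat = cov_beta X' Sigma^-1 y as in (4); beta_hat is None if y is omitted.
    """
    Sigma_inv_X = np.linalg.solve(Sigma, X)        # Sigma^-1 X without forming an inverse
    info = X.T @ Sigma_inv_X                        # X' Sigma^-1 X (information matrix)
    cov_beta = np.linalg.inv(info)                  # equation (5)
    beta_hat = None
    if y is not None:
        # (Sigma^-1 X)' y equals X' Sigma^-1 y because Sigma is symmetric
        beta_hat = cov_beta @ (Sigma_inv_X.T @ y)   # equation (4)
    return beta_hat, cov_beta

# Usage: with Sigma = I, GLS reduces to ordinary least squares
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
b, V = gls(X, np.eye(3), y)
print(b, np.sqrt(np.diag(V)))
```

Only the variance part (5) is needed for a sample-size calculation, which is why `y` is optional: the design and the variance parameters alone determine the standard error.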
To illustrate these equations, let us relate the general formulation in (2) to our particular analysis model in (1) for the simple case of a two-person trial (one person per treatment group) with a baseline visit and two follow-up visits. In this case, we see that
Σ from (3) therefore becomes a 6 × 6 matrix of form
where
and we can see that the algebra to obtain
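For the two-person trial, $\Sigma$ is block diagonal because the two people are independent, and each 3 × 3 block follows from the random intercept and slope plus independent residual error. The sketch below assumes that structure, uses hypothetical parameter names (`var_int`, `var_slope`, `cov_int_slope`, `var_resid`), builds the 6 × 6 $\Sigma$, and reads $s^*$ off the diagonal of $(X^\top\Sigma^{-1}X)^{-1}$, so the algebra can be checked numerically rather than by hand. It is an illustration, not the slopepower implementation.

```python
import numpy as np

def person_cov(times, var_int, var_slope, cov_int_slope, var_resid):
    """Marginal covariance V of one person's outcomes.

    V[j, k] = var_int + cov_int_slope*(t_j + t_k) + var_slope*t_j*t_k
              + var_resid*(j == k)
    """
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])              # random-effects design [1, t_j]
    G = np.array([[var_int, cov_int_slope],
                  [cov_int_slope, var_slope]])              # covariance of (a_i, b_i)
    return Z @ G @ Z.T + var_resid * np.eye(len(t))

def two_person_se(times, var_int, var_slope, cov_int_slope, var_resid):
    """s*: standard error of beta2 for a trial with one person per arm."""
    t = np.asarray(times, dtype=float)
    V = person_cov(t, var_int, var_slope, cov_int_slope, var_resid)
    X_control = np.column_stack([np.ones_like(t), t, np.zeros_like(t)])
    X_active = np.column_stack([np.ones_like(t), t, t])
    X = np.vstack([X_control, X_active])                   # 6 x 3 for three visits
    Sigma = np.kron(np.eye(2), V)                          # 6 x 6, block diagonal
    cov_beta = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))   # equation (5)
    return float(np.sqrt(cov_beta[2, 2]))                  # beta2 is the third parameter

print(two_person_se([0, 1, 2], var_int=100.0, var_slope=1.0,
                    cov_int_slope=2.0, var_resid=9.0))    # illustrative values
```

As a check, with all random-effect variances set to zero and unit residual variance, $\Sigma$ is the identity and $(s^*)^2 = 0.4$ for visits at 0, 1, and 2, which is the value obtained by inverting $X^\top X$ by hand.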
2.2 Predicting a sample size for a future trial
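Now that we have set up our trial design and analysis model, we can move on to how we would calculate a sample size for such a trial. For a sample-size calculation, we need a formula for the variance of the treatment-effect estimate, and we have shown how we can calculate this in the previous section. Because the matrices in (5) can get very large, we will use a simplifying trick: we shall first calculate the treatment-effect standard error for a two-person trial, $s^*$. Because the standard error for the treatment effect from a trial with $N$ independent subjects in each arm is $s^*/\sqrt{N}$, the number of subjects required in each arm to detect a target treatment effect $\beta_2$ with a two-sided test at significance level $\alpha$ and power $1-\gamma$ is

$$ N = \frac{(z_{1-\alpha/2} + z_{1-\gamma})^2 \, (s^*)^2}{\beta_2^2} \tag{6} $$

where $z_q$ denotes the $q$ quantile of the standard normal distribution.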
Note that s* will depend on the design matrix
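Once $s^*$ is known, the sample size follows from the usual normal-approximation formula $N = (z_{1-\alpha/2} + z_{\text{power}})^2 (s^*)^2/\beta_2^2$ per arm, which is how we read (6). The sketch below assumes that formula, a two-sided test, and rounding up to whole subjects per arm (slopepower's own rounding may differ); the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(s_star, effect, alpha=0.05, power=0.80):
    """Subjects per arm: (z_{1-alpha/2} + z_{power})^2 * s*^2 / effect^2, rounded up.

    s_star -- treatment-effect standard error for a two-person trial
    effect -- target treatment effect (difference in slopes), nonzero
    """
    z = NormalDist().inv_cdf                      # standard normal quantile function
    n = (z(1 - alpha / 2) + z(power)) ** 2 * s_star ** 2 / effect ** 2
    return ceil(n)                                # whole subjects (assumed rounding rule)

def total_sample_size(s_star, effect, alpha=0.05, power=0.80):
    """Total over two equal arms."""
    return 2 * n_per_arm(s_star, effect, alpha, power)

print(total_sample_size(s_star=1.0, effect=1.0))  # illustrative inputs
```

Because $N$ scales with $(s^*/\beta_2)^2$, doubling $s^*$ or halving the target effect quadruples the required sample size.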
2.3 Stage 1: Slope and variance parameter estimation
1. Single group: the dataset contains data from subjects with the disease of interest who are considered to be similar to the control group in the prospective trial. These may be subjects who are not receiving any treatment, for example, or who are receiving standard of care. For simplicity, we shall refer to these subjects as untreated subjects. Such data could be from an observational study or from the control arm of a previously conducted RCT.
2. Two group, observational: optionally, the data can also include subjects without the disease (healthy controls).
3. Two group, RCT: again optionally, the data can include subjects with the disease who are receiving an additional treatment, possibly the treatment of interest in the future RCT (treated subjects).
First, let us consider a single-group dataset that contains only untreated subjects with the disease (situation 1 above). The outcomes $y_{ij}$ for person $i$ at occasion $j$ are modeled as a linear function of time elapsed since baseline, $t_{ij}$, with random intercepts $a_i$, slopes $b_i$, and residual errors $\epsilon_{ij}$:

$$ y_{ij} = \beta_0' + \beta_1' t_{ij} + a_i + b_i t_{ij} + \epsilon_{ij} \tag{7} $$
Note that we have marked the coefficients from the model in (7) with primes to distinguish them from the coefficients in the proposed analysis model for the future RCT from (1). Note also that time is now indexed by $i$ and $j$ because if the data are from an observational study, then visit times might vary by participant. For each person, the baseline visit is at time zero: $t_{i0} = 0$, and
The expected slope from the user-supplied data in (7) is $\beta_1'$.
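The stage 1 model (7) can be illustrated by simulating untreated subjects. The pure-Python sketch below (hypothetical function names and parameter values; not the authors' simulation code from the appendix) draws correlated random intercepts and slopes, generates long-format rows of (id, time, outcome), and checks that the average of per-person least-squares slopes is close to $\beta_1'$. That average is only a crude check of the expected slope; stage 1 fits the linear mixed model (7), which this sketch does not attempt.

```python
import random

def simulate_untreated(n_people, times, beta0, beta1, sd_int, sd_slope, corr,
                       sd_resid, seed=1):
    """Long-format rows (id, t, y) from y_ij = b0' + b1' t_ij + a_i + b_i t_ij + e_ij."""
    rng = random.Random(seed)
    rows = []
    for pid in range(n_people):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = sd_int * z1                                            # random intercept
        b = sd_slope * (corr * z1 + (1 - corr ** 2) ** 0.5 * z2)   # correlated random slope
        for t in times:
            y = beta0 + beta1 * t + a + b * t + rng.gauss(0, sd_resid)
            rows.append((pid, t, y))
    return rows

def mean_ols_slope(rows):
    """Average of per-person least-squares slopes (a crude check, not an LMM fit)."""
    by_person = {}
    for pid, t, y in rows:
        by_person.setdefault(pid, []).append((t, y))
    slopes = []
    for obs in by_person.values():
        tbar = sum(t for t, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        sxy = sum((t - tbar) * (y - ybar) for t, y in obs)
        sxx = sum((t - tbar) ** 2 for t, _ in obs)
        slopes.append(sxy / sxx)
    return sum(slopes) / len(slopes)

data = simulate_untreated(2000, [0, 1, 2, 3], beta0=34.0, beta1=-1.8,
                          sd_int=10.0, sd_slope=1.0, corr=0.2, sd_resid=3.0)
print(round(mean_ols_slope(data), 2))   # close to the simulated beta1' of -1.8
```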
If the supplied dataset also includes healthy controls (two-group, observational data, situation 2), then parameters are estimated separately in each group, such that the healthy controls have their own intercept, slope over time, and variances and covariances. It is possible to have
Under this scenario,
Finally, if the dataset is from a previous RCT and includes treated subjects (two-group, RCT data, situation 3), then the model in (1) is used. In this model, both groups have the same intercept because we expect the two groups to have the same mean at baseline under randomization, but the slopes over time are allowed to differ. The variance parameters are constrained to be the same in the two groups.
In this final scenario,
2.4 Stage 2: Treatment-effect variance estimation, sample-size calculation
In the second stage,
In addition to s*, (6) depends on the target treatment effect. The command allows three scenarios regarding the effectiveness of the treatment under study. In these scenarios, the treatment effect is defined as being the following:
1. Toward no annual change; that is, it will reduce the rate of (future) change by a certain proportion of the way to zero. Under this scenario, a 100% effective treatment is defined as one that would halt change but not reverse it. Using single-group data without healthy controls or trial data from unrelated interventions implies this scenario. In this situation, the target treatment effect used in the sample-size calculation, $\beta_2$, is calculated from the slope obtained from the user-supplied data (
2. Toward the slope observed in healthy controls; that is, it will reduce the rate of change over and above that seen in a disease-free population (the “excess” rate of change) by a certain proportion. Under this scenario, a 100% effective treatment would slow the change in subjects with the disease to the change observed in healthy controls but would not halt or reverse it. Using two-group observational data that include healthy controls implies this scenario, and the target treatment effect in this case is calculated as
where
Note that the slope in the healthy controls could be interpreted as an upper limit on what is achievable with treatment, particularly when the outcome is expected to change over time even in healthy people. For example, say the outcome is a measure of cognitive decline in patients with Huntington’s disease (HD), and we know that even healthy people experience cognitive decline because of aging. Then, even a very effective treatment for patients with HD is unlikely to eliminate or reduce cognitive decline to a level below that of aging.
3. Equal to a previously observed treatment effect. For example, if a dataset from a previously conducted trial of the same or a similar treatment is available (perhaps a phase II trial that is being used to plan a phase III trial), the treatment effect observed in the previous trial can be used. Using such trial data, along with the appropriate
where
Note that one can also use a treatment effect that is proportional to the previously observed treatment effect in the previously conducted trial. This can be done by running the model under treatment effectiveness scenario 3 to obtain the sample size when targeting the previously observed treatment effect and then multiplying by the appropriate inflation factor (see example in section 4.1.3).
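The three definitions of the target treatment effect can be written as simple rules. The sketch below is one reading of scenarios 1 to 3, assuming that an effectiveness $e$ moves the untreated slope a proportion $e$ of the way toward zero (scenario 1) or toward the healthy-control slope (scenario 2), with $\beta_2$ defined as the treated slope minus the untreated slope. The function and argument names are hypothetical, and the numbers are illustrative.

```python
def target_effect(scenario, slope_untreated=None, effectiveness=None,
                  slope_healthy=None, previous_effect=None):
    """Target beta2 (treated slope minus untreated slope) under scenarios 1-3."""
    if scenario == 1:
        # Treated slope = (1 - e) * untreated slope, so beta2 = -e * untreated slope
        return -effectiveness * slope_untreated
    if scenario == 2:
        # Only the excess over the healthy-control slope is reduced by a proportion e
        return -effectiveness * (slope_untreated - slope_healthy)
    if scenario == 3:
        # Reuse the treatment effect observed in a previous trial
        return previous_effect
    raise ValueError("scenario must be 1, 2, or 3")

print(f"{target_effect(1, slope_untreated=-1.8, effectiveness=0.33):.2f}")
print(f"{target_effect(2, slope_untreated=-1.8, effectiveness=0.33, slope_healthy=0.2):.2f}")
```

With an untreated slope of −1.8 units per year, a healthy-control slope of 0.2, and an effectiveness of 0.33, the two lines print 0.59 and 0.66: a positive healthy-control slope makes the scenario 2 target larger, in line with the example in section 4.1.2.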
The sample size calculated by
Note that by fitting a model to data observed at discrete time points,
2.5 Sample-size adjustment for trial dropouts
To compensate for individuals who withdraw early from the trial,
In brief,
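The details of slopepower's dropout adjustment are not reproduced here. One standard way to account for monotone dropout in a likelihood-based analysis, assuming data are missing at random, is to average the expected information over dropout patterns: a person last seen at visit $k$ contributes information only from visits 0 to $k$. The sketch below applies that idea to model (1). It is an assumption for illustration, not necessarily the method slopepower uses, and all names (including the `retained` fractions) are hypothetical.

```python
import numpy as np

def se_with_dropout(times, retained, var_int, var_slope, cov_int_slope, var_resid):
    """Two-person s* for model (1) with information averaged over dropout patterns.

    retained[j] is the (hypothetical) fraction of enrolled people still observed
    at visit j; it must start at 1.0 and be nonincreasing. Dropout is assumed to
    be monotone and missing at random.
    """
    t = np.asarray(times, dtype=float)
    r = list(retained) + [0.0]
    Z = np.column_stack([np.ones_like(t), t])
    G = np.array([[var_int, cov_int_slope], [cov_int_slope, var_slope]])
    V = Z @ G @ Z.T + var_resid * np.eye(len(t))      # one person's full covariance
    info = np.zeros((3, 3))
    for k in range(len(t)):
        p_k = r[k] - r[k + 1]                         # P(last observed visit is k)
        if p_k <= 0:
            continue
        tk, Vk = t[: k + 1], V[: k + 1, : k + 1]      # visits 0..k only
        for g in (0.0, 1.0):                          # control arm, then active arm
            Xk = np.column_stack([np.ones_like(tk), tk, g * tk])
            info += p_k * (Xk.T @ np.linalg.solve(Vk, Xk))
    return float(np.sqrt(np.linalg.inv(info)[2, 2]))

params = dict(var_int=100.0, var_slope=1.0, cov_int_slope=2.0, var_resid=9.0)  # illustrative
for design, retained in [([0, 1, 2], [1.0, 0.9, 0.8]),
                         ([0, 0.5, 1, 1.5, 2], [1.0, 0.95, 0.9, 0.85, 0.8])]:
    print(design, round(se_with_dropout(design, retained, **params), 3))
```

Under this approach, people who drop out after an interim visit still contribute information, which is consistent with the finding in section 4.2 that designs with interim visits lose less power as dropout increases.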
2.6 Some notes of caution
It is important that the dataset used for the first stage of model fitting is from a population that is sufficiently similar to that in the proposed trial so that we can generalize the estimates of the variance parameters to the planned RCT. In practice, that might mean that inclusion criteria used in the previous dataset are similar to those proposed in the future trial and that the untreated subjects suffer from a severity of disease similar to that expected in the participants of the planned trial at baseline. It may be that no such dataset exists, and in such a case it might be necessary to collect some data in a pilot study.
Note that, as always, variances and covariances will be estimated more precisely given more people and time points in the dataset. Users should proceed with caution, especially if they have a small dataset, and be aware that their sample-size estimates will contain uncertainty due to the estimation of the variance parameters in the first stage.
As with any statistical model, one can make out-of-sample predictions. The command
Also note that, other than subject-specific random effects,
3 The slopepower command
The syntax of
3.1 Description
The user-provided dataset can be of three basic types as described in section 2.3: containing subjects with the disease who are untreated only (or minimally treated, for example, receiving standard of care); containing untreated subjects with the disease and healthy controls; or a previous RCT containing subjects who are untreated and subjects who are treated. In all cases, the data should contain repeated measurements of the outcome in long format (see
3.2 Options
3.2.1 Options for data in memory
3.2.2 Options for planned trial
3.2.3 Model options
4 Examples
4.1 How to use the code
In this section, we use simulated data to illustrate the options described above. The three examples given cover the three types of data that can be used with
These example datasets together contain three groups of people: (1) people with HD who are receiving standard of care (untreated subjects); (2) people without HD (or the genetic mutation that leads to it) (healthy controls); and (3) people with HD who are being treated as part of a trial (treated subjects).
Section 4.1.1 describes the situation when you have a dataset containing only people from group 1. Section 4.1.2 describes a dataset containing people from groups 1 and 2, and section 4.1.3 is for a dataset containing groups 1 and 3.
In all datasets, we have assumed that the “cases” (or untreated subjects) are people with HD, a neurodegenerative disorder in which cognitive functioning typically declines during disease progression. The outcome of interest is their score on the Symbol Digit Modalities Test (Smith 2007), a measure of cognitive function taking integer values between 0 and 110, with higher scores indicating better function. We have not simulated any missing data. In all cases, the data are in long format, ready for use with
4.1.1 Single-group data with untreated subjects only
We have simulated three years of data on 200 people with HD, with measurements recorded each year; the
We first show the syntax to plan an RCT with annual visits over two years, assuming no dropouts, with 80% power to detect a treatment effect that will eliminate one-third of the slope. Note that here the assumed effectiveness is toward “no annual change” or a slope of zero. The
This shows that a total sample size of 712 will be required for the planned trial. The first section of the output shows three results from the linear model run on the data in memory: the number of observations and subjects that were included in the model and the estimated slope from the data. The remaining output confirms the user-contributed parameters, or the defaults used if they were not specified, and gives the target treatment effect that
Visits do not have to be scheduled at regular intervals. If you wish to extend the above trial to five years, with no additional interim visits, you would specify the command below. However, note that this extrapolates the estimates beyond the three years of follow-up available in the original data. Here we have also assumed that 10% of participants would be lost to follow-up between the visit at year two and the final visit.
Here the sample size is reduced because of the extended follow-up, despite the loss to follow-up, which is shown as a proportion in parentheses after each visit in the schedule list.
If you wish to schedule visits every six months, you must use the
Again, the sample size is slightly reduced compared with the first example because of an increase in efficiency gained from the interim visits. Also note that the slope observed in the data has halved; this is because it is reported in the units of the planned trial, so here it relates to a difference per six months (rather than per year as in the earlier examples).
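Both effects described above can be checked numerically: adding interim visits reduces $s^*$ (and hence $N$), and changing the time unit from years to six-month periods halves both the slope and $s^*$, leaving $N$ unchanged. The sketch below uses a compact, hypothetical version of the two-person standard-error calculation with illustrative variance parameters; it assumes a random intercept and slope model and no dropout.

```python
import numpy as np

def two_person_se(times, var_int, var_slope, cov_int_slope, var_resid):
    """s* for model (1): one person per arm, random intercept and slope, no dropout."""
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])
    G = np.array([[var_int, cov_int_slope], [cov_int_slope, var_slope]])
    V = Z @ G @ Z.T + var_resid * np.eye(len(t))
    info = np.zeros((3, 3))
    for g in (0.0, 1.0):                                   # control arm, then active arm
        X = np.column_stack([np.ones_like(t), t, g * t])
        info += X.T @ np.linalg.solve(V, X)
    return float(np.sqrt(np.linalg.inv(info)[2, 2]))

# Illustrative variance parameters with time measured in years
params = dict(var_int=100.0, var_slope=1.0, cov_int_slope=2.0, var_resid=9.0)
annual = two_person_se([0, 1, 2], **params)
six_monthly = two_person_se([0, 0.5, 1, 1.5, 2], **params)
print(f"relative N, six-monthly vs annual visits: {(six_monthly / annual) ** 2:.3f}")

# Same six-monthly design with time in half-years: the slope variance scales by
# 1/4 and the intercept-slope covariance by 1/2, so s* (like the slope) halves.
half_year_units = two_person_se([0, 1, 2, 3, 4], var_int=100.0, var_slope=0.25,
                                cov_int_slope=1.0, var_resid=9.0)
print(f"s* in years: {six_monthly:.4f}; in half-years: {half_year_units:.4f}")
```

Since the target effect also halves when expressed per six months, the ratio $s^*/\beta_2$ in (6), and therefore $N$, does not depend on the time unit.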
4.1.2 Observational data with cases and healthy controls
Here we have simulated 250 people with HD and 250 without, with dates of observation used rather than visit number. For cases (or untreated subjects), the
Note that
Because we now have healthy controls in our data, we drop the
The first thing to note is that because the time variable is a date,
The output shows that a total sample size of 296 will be required for the planned trial. The decreased sample size compared with that in the previous section is partly because here we have an estimate for the slope of healthy individuals, so instead of relating our effectiveness to no change over time (a slope of zero), we relate it to the difference between the slope in untreated subjects and that in healthy controls. Hence, the target treatment effect is larger here than above, even though an effectiveness of 0.33 was specified both times, because the healthy controls have a positive slope.
Let us suppose that we are interested in obtaining a bias-corrected and accelerated bootstrap CI for this predicted sample size. We can do this by using the following command:
There are several important things to note about the
One can also calculate the power for a specified sample size by using the
The estimated power is 60%. The other main difference in the output here is that two values for the total N are given: the value specified by the user and the value actually used in the power calculation, which is the specified value n if it is even, or n − 1 if the user specified an odd number, so that the two arms are of equal size.
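Inverting (6) gives approximate power for a given total sample size. The sketch below assumes a two-sided z test, ignores the negligible chance of rejecting in the wrong direction, and reduces an odd total to n − 1 so that the arms are equal, as described above; the function name and inputs are hypothetical.

```python
from statistics import NormalDist

def power_for_total(total_n, s_star, effect, alpha=0.05):
    """Approximate power of a two-sided z test of beta2 for a given total N."""
    per_arm = total_n // 2                       # an odd total loses one person
    nd = NormalDist()
    se = s_star / per_arm ** 0.5                 # SE of beta2 with per_arm people per arm
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability that the estimate exceeds the critical value in the true direction
    return nd.cdf(abs(effect) / se - z_crit)

print(f"{power_for_total(16, s_star=1.0, effect=1.0):.3f}")   # illustrative inputs
```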
4.1.3 RCT data with treated and untreated groups
The simulated RCT data contain 75 people who received treatment and 75 who did not receive active treatment. In this dataset, the outcome was generated from a model with an intercept of 34, a slope in the untreated arm of −1.8 units/year, a slope in the treated arm of −0.8 units/year, and variance and covariance parameters as in section 4.1.1.
Example data from one participant in each arm are shown here:
Again, note that
If the aim of the planned study is to detect the same effect size as in the previous RCT, then the
Here we see that a total sample size of 318 is required to detect the difference of 0.75 units per year in the rate of decline that was seen in the previous RCT.
Suppose that the previous RCT is a pilot study or phase II trial and that the investigators suspect that, because of its small size, the treatment effect might have been overestimated. They may wish to plan the future RCT so that it has power to detect a treatment effect that is 50% of that observed previously. To do this, we can multiply the sample size above by 4 (that is, 1 over 0.5 squared), so we would need a sample size of 1,272. More generally, if we want the sample size $N_p$ for a target treatment effect that is $p$ times that observed in the previous trial, we need to multiply the $N$ that uses the previously observed treatment effect (318 in this example) by $p^{-2}$. This follows from (6): because $N$ is inversely proportional to the square of the target treatment effect, replacing $\beta_2$ by $p\beta_2$ gives $N_p = N/p^2 = p^{-2}N$.
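The $p^{-2}$ rule is a one-line calculation. The sketch below applies it and, as an assumption, rounds up to the next even total so that the arms stay equal; because the starting $N$ has already been rounded, the result can differ slightly from rerunning the full calculation with the smaller target effect. The function name is hypothetical.

```python
from math import ceil

def inflate_for_fraction(n_previous, p):
    """Total N to detect p times a previously targeted effect, using N_p = N / p**2."""
    n = n_previous / p ** 2
    return 2 * ceil(n / 2)        # round up to an even total (assumed rounding rule)

print(inflate_for_fraction(318, 0.5))   # the example in the text: 318 * 4
```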
Note that if we had data from a previous RCT that was trialing a completely different treatment from that under consideration in the future trial, we might have decided to use only the untreated arm as our dataset and use the options for a single group of untreated subjects as shown in section 4.1.1.
4.2 Exploring future trial designs with slopepower
Table 1. Estimated power for different trial designs and dropout scenarios
As can be seen from table 1, adding extra follow-up visits increases the power. For example, when there are no dropouts, the power increases from around 80% with a single follow-up visit to almost 87% with six-month follow-up visits. As the anticipated rate of dropouts increases, the trial designs that include extra follow-up visits become relatively more efficient, because data collected at interim visits from participants who later drop out can still be used in the analysis. Note that in this simulated example, when 10% of participants are expected to be lost each year, adding six-month visits recovers information to the extent that it achieves nearly the same power as a trial with a single follow-up visit with no dropouts.
5 Conclusion
We have presented a new command,
The package is based on linear mixed-model methodology, described for this setting by Frost, Kenward, and Fox (2008), and requires a user-supplied dataset containing longitudinal data on a similar population to that expected in the future trial. In the first stage of this approach,
Supplemental Material
Supplemental Material, sj-zip-1-stj-10.1177_1536867X211045512 for Power and sample-size calculations for trials that compare slopes over time: Introducing the slopepower command by Stephen Nash, Katy E. Morgan, Chris Frost and Amy Mulick in The Stata Journal
6 Acknowledgments
We acknowledge the contribution of Mike Kenward to developing the approach to sample-size calculation that
Katy Morgan acknowledges the support of an MRC skills development fellowship.
7 Programs and supplemental materials
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
A Appendix
Here we provide the code used to simulate example datasets used in section 4. All data were generated using Stata 16.
References
