Abstract
For two-treatment randomized trials with clustering in one of the treatment arms and a continuous outcome, designs are presented that minimize the number of subjects or the research budget needed to attain a desired power level. These designs optimize not only the treatment-to-control allocation ratio of study participants but also the choice between the number of clusters (such as therapy groups) and the number of persons per cluster (therapy group) in the arm with clustering. Optimal designs require prior knowledge of parameters from the analysis model, which are unknown during the design stage. We present maximin designs, which address this by ensuring a pre-specified power level for plausible ranges of the unknown parameters, while maximizing the power for worst-case values of these parameters. Maximin designs are also derived for the case where the number of clusters or the cluster size is fixed due to practical constraints. An empirical example illustrates how to calculate sample sizes for such practical designs and shows how much these maximin designs can reduce the required research budgets compared to designs with equal subject numbers in treatment and control. A user-friendly R Shiny app facilitates these sample size calculations.
Introduction
In randomized trials, observations can be correlated when, prior to randomization, individuals are nested within clusters and these clusters are assigned to treatments. Examples are persons nested within health centers, pupils nested within schools, or employees nested within companies. When whole clusters are assigned to treatments, these trials are known as cluster randomized trials. 1 Clustering may also occur when individuals are assigned to treatments, but these treatments are given to groups of individuals. 2–4 In such trials, 5,6 interactions between persons within a group may lead to their outcomes being correlated. These trials are referred to as individually randomized group treatment (IRGT) trials. 6 Clustering may occur exclusively in one of the treatment arms, for example, when comparing group therapy with a condition that lacks any form of intervention or with a condition involving only medication. Examples are a trial in which patients with chronic musculoskeletal pain receive either usual treatment supplemented by a learning program given in group sessions or individual treatment with medication only, 7 or a trial in which tinnitus patients receive either group-based cognitive behavioral therapy addressing their dysfunctional cognitions or no treatment at all. 8 Figure 1 displays an IRGT trial with groups of size 6 in treatment arm G, and with no clustering in treatment arm I.

Figure 1. Graphical display of an Individually Randomized Group Treatment trial, with individuals assigned to six therapy groups in one of the treatment arms.
Clustering effects can also arise from a treatment that is administered individually rather than to a group. If multiple patients receive treatment from the same therapist, these patients are likely to receive a more similar treatment compared to patients treated by different therapists. This therapist-related impact can result in correlated observations from different patients having the same therapist.6,9 Also in such scenarios, clustering may occur exclusively in one of the two treatment arms, for instance, when comparing psychotherapy with medication, 10 or with a waiting-list condition.11–13 This design is also represented by Figure 1, where the therapy groups are actually persons sharing the same therapist, and the group size represents the caseload of a therapist.
An important aim when planning a trial is to choose a design such that a sufficient power level is achieved for the test of the treatment effect. For designs with partial nesting, this involves choosing an appropriate number of groups and group sizes for one treatment, and an appropriate number of persons for the other treatment. When testing the null hypothesis of no treatment effect against the alternative hypothesis of a treatment effect, we perform a two-tailed test, and the variance of the treatment effect estimate,
Here
The variance of the treatment effect estimate is also related to the width of a confidence interval for the treatment effect. For a
Study design optimization entails different types of optimality, each determined by a different optimization criterion. For an overview, see Berger et al. 14 and Atkinson et al. 15 This paper targets maximizing the power of the test of the treatment effect by deriving designs that minimize the variance of the treatment effect estimate for a given research budget. Designs minimizing the variance of an estimate of a single parameter, like the treatment effect, are termed c-optimal designs. As shown in equation (1), minimizing the estimate's variance maximizes the power of the statistical test. Since the width of the confidence interval is proportional to the square root of the variance of the treatment effect estimate, these c-optimal designs also minimize, for a given research budget, the width of the confidence interval for the treatment effect. It should be noted that the optimal designs also minimize the research budget needed for a pre-specified power of a test on the treatment effect, or for a required precision of the effect estimate. If another design existed that attained the same power or precision of the effect estimate with a smaller research budget, then, since the variance of the treatment effect estimate decreases with increasing budget, the design derived for the given research budget would not be optimal. 16
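The link between variance, power and interval width can be illustrated with a short sketch. It assumes the usual normal approximation for a two-tailed test, under which the largest admissible variance equals the squared ratio of the effect size to the sum of two standard normal quantiles. The function names are hypothetical, and the formula is the textbook form rather than a quotation of equation (1).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def required_variance(delta, alpha=0.05, power=0.80):
    """Largest variance of the effect estimate that still yields the desired
    power for a two-tailed z-test (normal approximation, assumed here)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return (delta / (z_alpha + z_power)) ** 2


def approximate_power(delta, variance, alpha=0.05):
    """Power of the two-tailed z-test, ignoring the negligible chance of
    rejecting in the direction opposite to the true effect."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(delta) / variance ** 0.5 - z_alpha)


def ci_half_width(variance, alpha=0.05):
    """Half-width of the (1 - alpha) confidence interval; it scales with the
    square root of the variance, not with the variance itself."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2) * variance ** 0.5


target = required_variance(delta=0.50)  # roughly 0.032 for 80% power at 5%
print(round(target, 4), round(approximate_power(0.50, target), 3))
```

Any design whose variance stays at or below `target` reaches the desired power under this approximation, which is why minimizing the variance for a given budget and minimizing the budget for a given variance lead to the same design.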
The more a design minimizes the variance of the treatment effect estimate for a given research budget, or minimizes the budget needed for a given variance and thus given power and precision, the more efficient the design is. Now, it is well known that a cluster randomized trial is less efficient than an individually randomized trial, especially as the dependency of the observations within a cluster increases and the number of individuals per cluster grows.17–19 A trial with clustering in one arm lies between these two designs in terms of efficiency. A trial with clustering in only one arm typically arises when one treatment is administered in groups and the other individually, or when one treatment is delivered by therapists while the other involves only medication or no treatment at all. In such cases, researchers generally are not able to choose the most efficient one of these three designs.
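The ordering of the three designs can be checked numerically. The sketch below is a simplified illustration, assuming equal numbers of subjects per arm, a common total outcome variance, and the standard design effect 1 + (n - 1) * ICC for clustered arms; the function names and input values are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation of a mean due to clustering."""
    return 1 + (cluster_size - 1) * icc


def effect_variances(n_per_arm, cluster_size, icc, sigma2=1.0):
    """Variance of the treatment effect estimate for three designs that use
    the same number of subjects in each arm."""
    deff = design_effect(cluster_size, icc)
    individually_randomized = 2 * sigma2 / n_per_arm
    partially_nested = (deff + 1) * sigma2 / n_per_arm  # clustering in one arm only
    cluster_randomized = 2 * deff * sigma2 / n_per_arm  # clustering in both arms
    return individually_randomized, partially_nested, cluster_randomized


ind, part, clus = effect_variances(n_per_arm=60, cluster_size=6, icc=0.05)
print(ind, part, clus)
```

With these inputs the partially nested variance lies between the other two, and the gap widens as the intraclass correlation or the cluster size increases.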
Finding an optimal design for the trials considered in this paper involves determining how many subjects to allocate to each treatment, and for the arm with clustering, how many clusters or therapy groups versus how many subjects within each cluster or group to include. A complication is that both the value of
Several approaches to address local optimality have been proposed in the literature. The Bayesian approach starts from a prior distribution for the unknown parameters and by repeatedly drawing from the prior, one can calculate the mean, median, or other desired percentile of the power. 20 In this approach the design is chosen which maximizes the average power or some power percentile. This process is computationally time-intensive and yields a design that does not guarantee the required power level for an individual trial. In adaptive designs one starts with a predefined design, followed by intermediate analyses to update guesses about relevant model parameters and adapt the design accordingly. 21 For cluster randomized trials, proposals for adapting the number of clusters 22 or the sample size within each cluster 23 have been examined. More recent studies considered design adaptions that take care of the uncertainty in the estimates obtained in the interim analysis, in a frequentist approach 24 or in a Bayesian approach updating a prior in the interim analysis. 25 These approaches require a duration of the trial that is sufficient for conducting intermediate analyses and subsequently modifying the design.
In this paper, we will take a rather simple approach, known as the maximin approach. 17
Deriving a maximin design involves four steps:
1. Specify plausible ranges for those parameters of the analysis model on which the variance of the treatment effect estimate depends.
2. Given a research budget, specify the set of feasible designs.
3. For each design, find the parameter values within their plausible ranges that maximize the variance of the treatment effect estimate.
4. Choose the design that minimizes this maximum (worst-case) variance.
The resulting design is called the maximin design, which, for a given research budget, is the optimal design for the worst-case scenario, as defined by the set of parameter values chosen in step 3. The maximin design offers the advantage of not only maximizing efficiency, power and estimation precision in the worst-case scenario but also ensuring at least the same efficiency, power and precision for all other plausible parameter values. So, for all other parameter values than the worst-case values chosen in step 3, the variance of the effect estimate is smaller, and the power for hypothesis testing and the precision of estimating this effect is larger.
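The four steps can be sketched as a brute-force search. This is a minimal illustration, not the closed-form solution derived in this paper. It assumes that the variance of the effect estimate has the usual form for K clusters of size n in arm G and m unclustered subjects in arm I, and that the budget is spent as K * (c + s_g * n) + s_i * m. All names, cost values and parameter ranges are hypothetical.

```python
from itertools import product


def effect_variance(K, n, m, rho, var_g, var_i):
    """Assumed variance of the treatment effect estimate: K clusters of size n
    in arm G, m unclustered subjects in arm I."""
    return var_g * (1 + (n - 1) * rho) / (K * n) + var_i / m


def feasible_designs(budget, c, s_g, s_i, max_n=30):
    """Step 2: integer designs (K, n, m) affordable under the assumed budget
    function K * (c + s_g * n) + s_i * m <= budget."""
    designs = []
    for n in range(2, max_n + 1):
        cluster_cost = c + s_g * n
        K = 2  # at least two clusters, so cluster variation can be estimated
        while K * cluster_cost + s_i <= budget:
            m = int((budget - K * cluster_cost) // s_i)  # spend the rest on arm I
            designs.append((K, n, m))
            K += 1
    return designs


def grid(lo, hi, points=5):
    """Equally spaced values covering a plausible range."""
    return [lo + (hi - lo) * i / (points - 1) for i in range(points)]


def maximin_design(budget, c, s_g, s_i, rho_range, var_g_range, var_i_range):
    # Step 1: plausible parameter ranges, discretised on a grid.
    params = list(product(grid(*rho_range), grid(*var_g_range), grid(*var_i_range)))
    best = None
    for K, n, m in feasible_designs(budget, c, s_g, s_i):
        # Step 3: worst-case (largest) variance of this design over the grid.
        worst_var, worst_params = max(
            (effect_variance(K, n, m, *p), p) for p in params
        )
        # Step 4: keep the design whose worst-case variance is smallest.
        if best is None or worst_var < best[0]:
            best = (worst_var, (K, n, m), worst_params)
    return best


worst_var, design, worst_params = maximin_design(
    budget=10_000, c=200, s_g=50, s_i=50,
    rho_range=(0.01, 0.10), var_g_range=(0.8, 1.2), var_i_range=(0.8, 1.2),
)
print(design, round(worst_var, 4), worst_params)
```

Because the assumed variance increases in the intraclass correlation and in both outcome variances, the worst case found in step 3 is always the upper end of each range, so the grid search could be replaced by a single evaluation at those upper bounds.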
Instead of considering the efficiency of a design, the maximin approach can also be employed with a relative efficiency criterion: the variance of the estimated treatment effect under the optimal design relative to the variance of the estimated treatment effect under the design that is being considered. For each feasible design, this relative efficiency is then first determined as a function of each parameter vector in the parameter space because the optimal design itself varies across the parameter space. Comparable to step 3 above, for each feasible design, the smallest relative efficiency is then obtained across the parameter space. Then, comparable to step 4 above, of all feasible designs, that design is selected that maximizes this minimum relative efficiency.14,26 This design is safe in that it stays as close to the optimal design as possible over the whole range of plausible parameter values. If the design's minimum relative efficiency is close to 1, then it can be considered a robust design. Such a design may be different from the design that maximizes the minimum efficiency. 27 The efficiency approach may yield a design that may be much less efficient than the optimal design for some of the parameter values in their plausible ranges. Also, the maximin design based on efficiency, as in this paper, may turn out to be optimal at the boundary values of parameter ranges – values that may not be most plausible. This overemphasis on an unlikely scenario may lead to a large research budget. But, on the other hand, employing a relative efficiency criterion is not safe in that it does not yield a design that guarantees a desired power level across the whole parameter space. Since a maximin approach based on efficiency is safe in that sense, we adopt this approach in this paper.
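The contrast between the two criteria can be made concrete with a toy comparison. The sketch assumes the same variance and budget functions as are usual for partially nested designs, and uses a closed-form optimal variance that follows from them when K, n and m may be non-integer. The three candidate designs, the costs and the parameter values are hypothetical.

```python
from math import sqrt


def effect_variance(K, n, m, rho, var_g, var_i):
    """Assumed variance of the effect estimate for design (K, n, m)."""
    return var_g * (1 + (n - 1) * rho) / (K * n) + var_i / m


def optimal_variance(budget, c, s_g, s_i, rho, var_g, var_i):
    """Smallest variance attainable with the budget when K, n and m may take
    non-integer values (derived for the assumed model, with 0 < rho < 1)."""
    root = sqrt(var_g) * (sqrt(rho * c) + sqrt((1 - rho) * s_g)) + sqrt(var_i * s_i)
    return root ** 2 / budget


def min_relative_efficiency(design, budget, costs, params):
    """Smallest ratio of optimal variance to design variance over the parameters."""
    K, n, m = design
    return min(
        optimal_variance(budget, *costs, *p) / effect_variance(K, n, m, *p)
        for p in params
    )


budget, costs = 10_000, (200, 50, 50)  # c, s_g, s_i (hypothetical)
params = [(rho, 1.0, 1.0) for rho in (0.01, 0.05, 0.10)]
candidates = [(12, 6, 80), (9, 10, 74), (6, 20, 56)]  # each costs exactly 10,000

min_re = {d: min_relative_efficiency(d, budget, costs, params) for d in candidates}
worst_var = {d: max(effect_variance(*d, *p) for p in params) for d in candidates}

best_by_re = max(min_re, key=min_re.get)              # maximin relative efficiency
best_by_variance = min(worst_var, key=worst_var.get)  # maximin efficiency
print(best_by_re, best_by_variance)
```

In this toy example the two criteria select different designs: the relative efficiency criterion favours larger clusters, which perform reasonably across the whole range of intraclass correlations, whereas the efficiency criterion favours the design tuned to the largest intraclass correlation.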
In this paper, we will examine two-treatment parallel trials with nesting in one of the arms. We will derive optimal and maximin designs and will also consider when practical constraints fix the total number of groups or therapists or the size of groups or the caseload per therapist. We propose a linear mixed model for analysis, allowing for different outcome variances as well as different costs across treatments, thus presenting rather general optimal and maximin designs.
We will show how to calculate sample sizes for maximin designs with a real example. While the optimal and maximin designs assume group sizes or therapist caseloads within a treatment to be equal, real-world scenarios often involve varying group sizes and caseloads. Even if one recruits an equal number of individuals for each included group or for each therapist, dropout may lead to varying group sizes and caseloads in the data analysis phase. This results in efficiency and power loss. 28 We will address how to restore the efficiency lost due to varying group sizes and caseloads. Below, we first present the model for the analysis of a parallel trial with clustering in one arm and then move on to the optimal and maximin designs.
Let treatment G be the condition with clustering (G for groups) and treatment I be the condition without clustering (I for individuals). If there are K clusters in treatment arm G and there are
Here
This design and associated analysis model are special cases of the design and model for cluster randomized trials with clusters in both arms, in that the intraclass correlation
Since optimal designs minimize the variance of the treatment effect estimate for a given research budget, a budget function has to be defined. Let c be the costs for including a cluster (e.g. group or therapist) in treatment arm G, and, similarly, let
Moerbeek et al. 31 derived the optimal ratio of the total number of subjects in treatment G,
Substituting these optimal sample sizes into equation (3) gives the variance of the treatment effect estimate for the optimal design. From equation (5) it follows that the optimal allocation ratio of persons to the two treatments is given by:
The optimal design in equation (5) and the variance of
Substituting the calculated budget b into equation (5), then yields the number of clusters,
In some cases, the cluster size for treatment G will be (more or less) fixed, for instance, in case of group therapy, where there may be an ideal group size or, in case of multiple persons being assigned to the same therapist, there may be an ideal caseload. In these cases,
By substituting equation (8) into equation (3), we obtain the variance of the effect estimate for the optimal design.
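The design with a fixed cluster size can be sketched numerically. The code below is an illustration under assumptions, not a transcription of equation (8): it takes the variance as var_g * (1 + (n - 1) * rho) / (K * n) + var_i / m and the budget as K * (c + s_g * n) + s_i * m, and derives the cheapest non-integer K and m for a target variance. All function names and input values are hypothetical.

```python
from math import sqrt


def optimal_cluster_size(c, s_g, rho):
    """Unrestricted optimum of n under the assumed variance and budget function."""
    return sqrt(c * (1 - rho) / (s_g * rho))


def design_for_fixed_cluster_size(n, target_var, c, s_g, s_i, rho, var_g, var_i):
    """Cheapest (non-integer) K and m that reach target_var when n is fixed."""
    a = var_g * (1 + (n - 1) * rho) / n  # variance of one cluster mean
    cluster_cost = c + s_g * n           # cost of one complete cluster
    root = sqrt(a * cluster_cost) + sqrt(var_i * s_i)
    budget = root ** 2 / target_var
    K = sqrt(a / cluster_cost) * root / target_var
    m = sqrt(var_i / s_i) * root / target_var
    return budget, K, m


def budget_ratio(n, c, s_g, s_i, rho, var_g, var_i):
    """Budget with the optimal cluster size relative to the budget with n fixed."""
    n_opt = optimal_cluster_size(c, s_g, rho)
    args = (1.0, c, s_g, s_i, rho, var_g, var_i)  # target variance cancels in the ratio
    unrestricted = design_for_fixed_cluster_size(n_opt, *args)[0]
    restricted = design_for_fixed_cluster_size(n, *args)[0]
    return unrestricted / restricted


# Worst-case (largest) parameter values stand in for the maximin design.
for n in (3, 6, 15):
    print(n, round(budget_ratio(n, c=200, s_g=50, s_i=50, rho=0.10, var_g=1.2, var_i=1.2), 3))
```

The ratio equals 1 when the fixed cluster size coincides with the unrestricted optimum and drops below 1 as the fixed size moves away from it in either direction. In practice K and m would be rounded up to integers.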

Figure 2. Ratio of the budget of an unrestricted maximin design versus the budget of a maximin design with the cluster size in treatment G fixed at a value on the horizontal axis.
The maximin design and its variance of the effect estimate is again obtained by choosing the largest values for
It is instructive to explore how much more budget is required if
In practice, as in group therapy, cluster sizes cannot always be chosen freely. Figure 2 shows that suboptimal choices for
Suppose that for the cognitive behavioral treatment feasible values for the size of the groups,
Table 1. Number of groups K, group size, and number of subjects in the individual condition needed by the maximin design (MMD) and the balanced design (BD, in which both treatment conditions have the same number of persons) of the group-based cognitive behavioral treatment trial, for a power of 80% to detect a treatment effect of size 0.50 with a two-tailed test, for various cost ratios.
The maximin design can be determined as follows: For each
For the maximin design it follows from equation (8) and equation (9) that for the same
If therapists treat multiple patients individually, there may be a fixed or limited number of therapists, and if a therapist carries out group therapy, there may be a maximum number of groups that can practically be handled. In such a case K is fixed, and one can determine the optimal values for
Substituting
The maximin design and associated variance of the treatment effect estimate are again obtained by choosing the largest values for
If the cluster sizes of the maximin design required for a certain power level are too large, for instance because such a group size is not feasible in group therapy, one either has to accept a lower power level or plan the study for a larger effect size. One could also try to slightly increase the number of groups, K. Finally, note that when the number of clusters is fixed, a desired power level may not always be attainable, since the variance of the treatment effect estimate has a lower bound. Specifically, even if
We also consider how much more budget is required if the number of clusters K is fixed compared to a design in which K is chosen optimally. For the same scenarios as in Figure 2, Figure 3 shows the budget ratio, that is, the budget required by a maximin design without restrictions on K relative to the budget required by a maximin design in which K is fixed. In all scenarios

Figure 3. Ratio of the budget of an unrestricted maximin design versus the budget of a maximin design with the number of clusters in treatment G fixed at a value on the horizontal axis.
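The case of a fixed number of clusters, including the lower bound on the variance, can be sketched as follows. The code assumes the variance var_g * rho / K + var_g * (1 - rho) / (K * n) + var_i / m, which is the usual partially nested form split into a part that no cluster size can reduce and a part that behaves like an ordinary two-arm trial. Because the cluster costs are constant once K is fixed, only the subject costs s_g and s_i enter the optimization. The function name and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def design_for_fixed_clusters(K, target_var, s_g, s_i, rho, var_g, var_i):
    """Cheapest (non-integer) cluster size n and arm-I sample size m that reach
    target_var with K clusters, or None if K is too small."""
    floor = var_g * rho / K  # variance left even with infinitely large clusters
    if target_var <= floor:
        return None
    remaining = target_var - floor  # to be met by within-cluster and arm-I terms
    a = var_g * (1 - rho)
    root = sqrt(a * s_g) + sqrt(var_i * s_i)
    subjects_g = sqrt(a / s_g) * root / remaining  # total subjects in arm G, K * n
    m = sqrt(var_i / s_i) * root / remaining
    return subjects_g / K, m


z = NormalDist()
# Target variance for 80% power, 5% two-tailed, effect size 0.50 (normal approximation).
target = (0.50 / (z.inv_cdf(0.975) + z.inv_cdf(0.80))) ** 2

print(design_for_fixed_clusters(22, target, 1, 1, 0.10, 1.0, 1.0))
print(design_for_fixed_clusters(3, target, 1, 1, 0.10, 1.0, 1.0))  # None: too few clusters
```

Setting both subject costs to 1, as in the example calls, makes the function minimize the total number of subjects rather than a monetary budget.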
Returning to the example in Table 1, suppose that the clusters do not represent therapy groups, but caseloads of different therapists who give cognitive behavioral therapy on an individual basis. Because of a limited pool of therapists, there is a maximum of 22 therapists in the trial, so that a maximin design must be determined with restrictions on K. The same cost scenarios are considered as in Table 1. Also assume that
Column 3 of Table 2 contains the number of therapists of the maximin design if there are no restrictions on K. Column 4 displays the maximin design where K is at most 22, whereas column 5 displays the balanced design with the same restrictions on K. Since the maximum feasible number of therapists is, for each cost scenario, lower than the maximin number of therapists if there are no restrictions on K (see column 3 of Table 2), K = 22 is always the choice that minimizes the required research budget (see column 4 of Table 2). Furthermore, for a fixed K, as
Table 2. Number of therapists K (at most 22), group size, and number of subjects in the individual condition needed by the maximin design (MMD) and the balanced design (BD, in which both treatment conditions have the same number of persons) of the group-based cognitive behavioral treatment trial, for a power of 80% to detect a treatment effect of size 0.50 with a 5% two-tailed test, for various cost ratios.
Number of clusters in the maximin design without restrictions on the number of clusters.
The maximin design is also compared with a balanced design in which the number of persons in both treatment conditions is the same. For the balanced design, for each K
Until now, we considered minimization of the study budget needed, either with, or without, constraints on the cluster size or the number of clusters. In some cases, one may want to minimize the total number of persons involved in a trial. This can be accommodated by setting
In case there are restrictions on the number of clusters K, where K can be 22 at most and we aim to minimize the total sample size, we also set
Interactive Shiny app for sample size calculation
For individually 17 and cluster randomized trials 35 menu-driven interactive programs are available to calculate sample sizes for optimal and maximin designs. To also facilitate sample size calculation for maximin designs for trials with nesting in one arm, an R Shiny app 36 has been developed: https://unimaasmc.shinyapps.io/Sample_size_PNRT_MMD/. In Table 1, calculations for each integer-valued
For the empirical illustration in Section 4 with
Correcting sample sizes for unknown variances and unequal cluster sizes
Sample sizes were determined using equation (1), which assumes a standard normal approximation for the test statistic. Adjustments are needed when estimating intraclass correlations and outcome variances. For non-varying cluster sizes, the treatment effect can also be assessed by an independent samples t-test comparing group or therapist means in one treatment to individual scores in the other. This implies that power according to the t-distribution can be used to adjust the sample sizes. The Supplemental materials (Section 3) and the Open Science Framework (https://osf.io/p68gm/) include the R code 36 to calculate the minimum number of units to be added to the groups or therapists, and to individuals in the other arm to achieve the required power in a maximin design. In cluster randomized trials where outcome variances, intraclass correlations, cluster sizes, or number of clusters differ across arms, numerical evaluations show that for 80% or 90% power, with at least 8 clusters per arm, two additional clusters are needed for two-tailed tests at a 5% significance level and four for tests at a 1% level. 37 In partially nested trials the intraclass correlation in one of the arms is zero, but since Candel et al. 37 also considered intraclass correlations as small as 0.01, we expect this rule of thumb also to hold for partially nested trials, though in the arm without clustering the required increase applies to the number of subjects instead of the number of clusters.
Our results assumed equal group sizes or therapist caseloads. However, therapy groups often vary in participant numbers, and therapist caseloads can differ. Even if equal recruitment is achieved initially, dropout leads to varied group or therapist sample sizes during data analysis. Unequal group sizes or therapist caseloads in treatment G reduce efficiency and power. This can be repaired in an almost cost-efficient way, by recruiting more groups or therapists in one arm and increasing the number of individuals in the other by the same percentage. Let CV be the standard deviation of cluster sizes divided by the average cluster size in treatment G. If CV
For the empirical illustration in Section 4 with
Conclusion and discussion
This paper presents optimal designs for trials with clustering in one arm and quantitative outcomes, minimizing research costs while achieving a desired power level. The designs assume data analysis with a linear mixed model with heterogeneous outcome variances and heterogeneous costs for the two arms. Since optimal designs require knowledge of parameters of the analysis model that are not known at the design stage, maximin designs are presented. Maximin designs guarantee a specified power level for plausible parameter ranges at the lowest cost and maximize, for a fixed research budget, power for the worst-case values of these parameters. Maximin designs are also developed with constraints on the number of clusters or cluster size. Sample size formulas are provided for all design types and implemented in an interactive R Shiny app. The formulas are based on a z-test assuming known variance components, but in practice variance components are unknown and a t-test will be done. For using a t-test instead of a z-test, a rule of thumb is provided to adjust the number of clusters and participants. Guidelines are also given to correct for power loss due to size variation between groups or caseloads.
In planning a study, it is useful to have some information on unknown model parameters. For IRGT trials there is an overview study documenting the intraclass correlations related to groups in psychotherapy trials 33 and for trials involving individual psychotherapy there is an overview of intraclass correlations associated with therapists. 41 However, to plan a maximin design, not only intraclass correlations are relevant, but also the ratio of outcome variances of one treatment versus the other. Researchers should thus be encouraged to report not only the intraclass correlation, but also the total outcome variance, for each treatment arm, thereby facilitating future planning of similar studies.
An assumption of IRGT trials and trials where therapists treat multiple persons is that persons are randomly assigned to groups or therapists after having been randomly allocated to one of two treatments. However, in the second stage of the assignment process nonrandom sorting of individuals into groups or therapists may occur. This may be because of self-assignment to groups or therapists or because geography or other practical constraints do not allow for randomly assigning an individual to a specific group or therapist. 42 Such non-random allocation may be a source of additional outcome variance between groups or therapists on top of that caused by group dynamics or therapist effects. That in turn increases the standard error of the treatment effect estimate. Some strategies to mitigate these effects are discussed by Weiss et al. 42
Our work presented maximin designs for trials with a quantitative outcome. For group treatment trials with binary outcomes, Moerbeek et al. 31 derived optimal designs assuming fixed group sizes. Future research could extend this by developing maximin designs without fixed group sizes. Additionally, studies on three-level designs, where units are allocated to different treatments at the highest level,43–45 are relevant for settings where persons are assigned to (therapy) groups and these groups in turn are assigned to different therapists or counsellors. Some of these studies may involve partial nesting, with three-level nesting in one arm and no nesting in the other. Also, therapists, for example, might serve both groups in one arm and individuals in the arm with individual therapy. 46 Further research into optimal and maximin designs for such nested trials, incorporating cost and outcome variance heterogeneity, would be valuable.
Supplemental Material
Supplemental material, sj-docx-1-smm-10.1177_09622802251409388 for Efficient design of partially nested randomized trials: A maximin approach by Math JJM Candel and Gerard JP van Breukelen in Statistical Methods in Medical Research
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
Appendix A: Derivation of the optimal design for a two-treatment parallel randomized trial with clustering in one arm
The variance of the treatment effect estimate is given by (see eq. (8) of the main text):
Suppose the total budget
So the budget
Taking the derivative of
Let
The derivative of eq. (A5) with respect to
Taking the derivative of eq. (A5) with respect to
The second derivative is positive for the second solution in eq. (A6),
Rewriting u in terms of the optimal cluster size in treatment G given in eq. (A2), we obtain:
For the optimal
Fixed cluster size
Substituting the expression of
Multiplying the numerator and denominator of eq. (A13) by
Since
Multiplying the numerator and denominator of eq. (A16) by
Fixed number of clusters K
The variance of the treatment effect estimate is:
Since the number of clusters K is fixed, and thus
Eq. (A20) is the expression for the variance of the treatment effect estimate in a randomized controlled trial, where the number of subjects in one arm is
The budget function for the optimal design can now be written as
Substituting
Noting that
