Abstract
Background
This article studies the design of trials that compare three treatment conditions that are delivered by two types of health professionals. One type of health professional delivers one treatment and the other type delivers two treatments; hence, this design is a combination of a nested and a crossed design. As each health professional treats multiple patients, the data have a nested structure. This nested structure has thus far been ignored in the design of such trials, which may result in an underestimate of the required sample size. In the design stage, the sample sizes should be determined such that a desired power is achieved for each of the three pairwise comparisons, while keeping costs or sample size at a minimum.
Methods
The statistical model that relates outcome to treatment condition and explicitly takes the nested data structure into account is presented. Mathematical expressions that relate sample size to power are derived for each of the three pairwise comparisons on the basis of this model. The cost-efficient design achieves sufficient power for each pairwise comparison at lowest costs. Alternatively, one may minimize the total number of patients. The sample sizes are found numerically and an Internet application is available for this purpose. The design is also compared to a nested design in which each health professional delivers just one treatment.
Results
Mathematical expressions show that this design is more efficient than the nested design. For each pairwise comparison, power increases with the number of health professionals and the number of patients per health professional. The methodology of finding a cost-efficient design is illustrated using a trial that compares treatments for social phobia. The optimal sample sizes reflect the costs for training and supervising psychologists and psychiatrists, and the patient-level costs in the three treatment conditions.
Conclusion
This article provides the methodology for designing trials that compare three treatment conditions while taking the nesting of patients within health professionals into account. As such, it helps to avoid underpowered trials. To use the methodology, a priori estimates of the total outcome variances and intraclass correlation coefficients must be obtained from experts’ opinions or findings in the literature.
Introduction
Subjects are often nested within health professionals in trials on the prevention or treatment of addiction, disease or disorder. Examples of health professionals are dentists, surgeons, psychologists and psychiatrists. As health professionals vary with respect to their skills, experience, competence and enthusiasm, it is very likely that outcomes of subjects treated by the same health professional are dependent. It is therefore important that a random factor for health professional is included in the model that relates treatment condition to outcome.1–4
Walwyn and Roberts 5 give an overview of developments in trials where treatment is delivered by therapists, and provide a review of different designs that can be encountered in such trials. In the nested design, therapists are nested within treatments, so each therapist delivers just one treatment. Such a design is often chosen to avoid the risk of contamination 6 and may also lower costs since each therapist has to be trained to deliver only one treatment. A parallel can be drawn between a nested design and a cluster randomized trial by equating the cluster in a cluster randomized trial to a therapist in a nested design. The design and analysis of cluster randomized trials have been widely discussed in the statistical literature.7–17
In the partially nested design, there is no therapist involved in one of the treatments, which occurs when the control is a waiting list or self-help. See the statistical literature for analysis methods18–21 and sample size calculations.22–25
In the crossed design, therapists are crossed by treatment, so that each therapist delivers multiple treatments, which makes it a more efficient design than the nested design. 15 Another advantage is that it allows for the estimation of the variability of the treatment effect across therapists. The crossed design is in particular feasible in pharmaceutical trials where the new medication is administered to patients using injections or tablets that differ from the placebo only by the amount of active substance. In the ideal case, double blinding is used so that neither the patient nor the health professional knows which treatment the patient receives. Double blinding may eliminate bias due to preferences or expectations with respect to the effect of medication. A parallel can be drawn between a crossed design and a multisite trial.15,26,27
This overview of designs is not exhaustive. There are trials in the field of mental health that used designs that are a combination of a nested and crossed design, where one type of health professional delivers just one treatment while another type of health professional delivers multiple treatments.28–32 Let us use a trial on treatment of social phobia 29 as an illustrative example. Cognitive therapy was delivered by clinical psychologists, whereas medication and placebo were delivered by psychiatrists. Even if psychologists were licensed to deliver fluoxetine and placebo, it would not be advisable to let psychologists deliver each treatment, because it would be difficult for them to keep patients in the fluoxetine or placebo group from benefiting from cognitive therapy. However, it may well be feasible to let psychiatrists deliver both fluoxetine and placebo, especially when double blinding is used. Using a nested design rather than a crossed design for these two treatments would be less efficient.
The flow diagram in Figure 1 shows that the design is a multi-tiered experimental design since randomization is done in two steps. 33 First, all eligible patients are randomized to a psychologist or psychiatrist. Second, all patients who were randomized to a psychiatrist are randomized to medication or placebo. Those who were randomized to a psychologist receive cognitive therapy, so no randomization takes place in the second step for these patients, as is indicated by a dashed arrow.

Figure 1. Flow diagram of the multi-tiered experimental design. It is assumed all patients receive allocated treatment. Follow-up and data analysis are not included in this graph. Sample size notation:
In this design, the risk of contamination of patients in the medication and placebo groups by those in the cognitive therapy group is minimized, while the efficiency of the comparison of medication and placebo is maximized. It may be considered an interesting alternative to a nested design and has indeed been used in the field of mental health. However, it is also very relevant for other fields where therapy is provided by one type of health professional and is compared to medication and placebo that are provided by another type. Examples are trials to treat excessive alcohol use, binge eating or hypertension. So, even though the remainder of this article uses an example and terminology from mental health, it is also very relevant for practitioners in other fields.
To my knowledge, there are no papers on power and sample size issues for this type of design. A relevant question in the design phase is how many psychologists, how many psychiatrists and how many patients per psychologist and per psychiatrist are required. It is obvious that treatment effects are estimated more efficiently when these sample sizes increase, but in practice, they cannot increase without bounds. As an example, the total number of patients may be limited when treatments for a rare disease are compared. It is therefore necessary to study which combination of sample sizes results in adequate power. It is important that the nesting of patients within health professionals is taken into account while calculating the sample size, as ignoring this nested data structure may result in inadequately powered designs. The aim of this article is to provide methodology to calculate the required sample size correctly. As such, it helps researchers to plan their trials such that sufficient power is guaranteed and the costs (or total sample size) are minimized.
Mixed-effects model and statistical power
As outcome scores of patients within the same health professional are dependent, the mixed-effects model should be used for analyzing the data.34–37 In addition to that, the variances between and within health professionals may vary across the two types of health professional and three treatment conditions.2,3,22 The following mixed-effects model for patient i treated by the jth health professional takes dependency and heterogeneity into account
Here,
These dummies are also used to indicate which random effects are associated with each treatment. The random effects
The amount of dependency between outcomes of patients within the same psychologist is quantified by the intraclass correlation coefficient
Fairly simple expressions for the estimators of
and an estimate is obtained when the (co)variance components are replaced by their estimates. The entries in equation (2) depend on the sample sizes at the level of the health professional and patient. The numbers of psychologists and psychiatrists are indicated as
The covariance matrix in equation (2) shows that the estimated mean score for cognitive therapy is independent of the mean estimates in the other two treatments because they are delivered by different health professionals. However, the estimated means for medication and placebo are correlated since both treatments are available within each psychiatrist. The precision of the estimated mean for cognitive therapy depends on the number of psychologists and the number of patients per psychologist, but not on the sample sizes in the other two treatments. The precision of the estimated mean for medication depends on the number of psychiatrists and the number of patients per psychiatrist who receive medication, but not on the number of placebo patients per psychiatrist or the sample sizes in the cognitive therapy group. Similarly, the precision of the mean estimate for placebo is only determined by the number of psychiatrists and the number of placebo patients per psychiatrist.
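The dependence pattern described above can be made concrete with a small numerical sketch. The entries below are an assumed form that is consistent with the verbal description (treatment-specific random effects for health professionals plus patient-level residuals); they are not a transcription of equation (2). All names (`covariance_of_means`, `contrast_variance`, the keys of `v`) and all numerical values are hypothetical.

```python
def covariance_of_means(k1, k2, n_c, n_m, n_p, v):
    """Covariance matrix of the estimated means, ordered (C, M, P).

    Assumed form, not taken from the article:
      k1, k2        numbers of psychologists and psychiatrists
      n_c, n_m, n_p patients per professional in each condition
      v             dict with between-professional variances (var_u_*),
                    residual variances (var_e_*) and the covariance of the
                    medication and placebo random effects (cov_u_mp)
    """
    var_c = (v["var_u_c"] + v["var_e_c"] / n_c) / k1
    var_m = (v["var_u_m"] + v["var_e_m"] / n_m) / k2
    var_p = (v["var_u_p"] + v["var_e_p"] / n_p) / k2
    cov_mp = v["cov_u_mp"] / k2  # both treatments occur within each psychiatrist
    return [[var_c, 0.0, 0.0],
            [0.0, var_m, cov_mp],
            [0.0, cov_mp, var_p]]


def contrast_variance(cov, weights):
    """Variance of a linear contrast of the three estimated means."""
    return sum(weights[a] * cov[a][b] * weights[b]
               for a in range(3) for b in range(3))


# Hypothetical variance components (total variance 1, intraclass correlation 0.1).
v = {"var_u_c": 0.1, "var_e_c": 0.9,
     "var_u_m": 0.1, "var_e_m": 0.9,
     "var_u_p": 0.1, "var_e_p": 0.9,
     "cov_u_mp": 0.05}

cov = covariance_of_means(k1=10, k2=10, n_c=5, n_m=5, n_p=5, v=v)
var_m_vs_p = contrast_variance(cov, [0, 1, -1])  # medication versus placebo
var_c_vs_p = contrast_variance(cov, [1, 0, -1])  # cognitive therapy versus placebo
print(var_m_vs_p, var_c_vs_p)
```

Under these assumptions, changing `n_p` leaves the variance of the medication mean untouched, and the positive covariance between the medication and placebo means lowers the variance of their difference.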
The trial contains three conditions, hence three pairwise comparisons can be made. The effect of medication versus placebo is estimated by the difference in their average outcomes,
The significance of the difference in means is tested with the test statistic
where
Similar relations between power and effect size can be formulated for the other two pairwise comparisons by making the appropriate changes in equation (4). For the comparison between cognitive therapy and placebo
For the comparison of cognitive therapy and medication
Comparison to a nested design
As both medication and placebo are available within each psychiatrist the variance in equation (3) does not depend on the between-psychiatrist variance component
with
A specific case is a balanced design:
As this value is always <1, the crossed design outperforms the nested design.
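As a rough numerical illustration of the relation between power and design, and of the efficiency gain of the crossed part of the design, consider the sketch below. It uses a normal approximation to the power of a two-sided test, which may differ from the exact expression in equation (4). The balanced comparison rests on assumptions that are not taken from the article: equal variance components for medication and placebo, a correlation `rho_u` between the two psychiatrist-level random effects, and a nested alternative in which half of the psychiatrists deliver each treatment to twice as many patients. All function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided(delta, variance, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference `delta`
    whose estimator has the given variance (the far tail is ignored)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(delta) / variance ** 0.5 - z_crit)


def crossed_variance(k2, n, var_u, var_e, rho_u):
    """Assumed variance of the medication-placebo difference when each of
    k2 psychiatrists treats n patients with each treatment."""
    return (2 * var_u * (1 - rho_u) + 2 * var_e / n) / k2


def nested_variance(k2, n, var_u, var_e):
    """Assumed variance when k2 / 2 psychiatrists deliver each treatment,
    each to 2 * n patients (same total numbers as the crossed design)."""
    return 2 * (var_u + var_e / (2 * n)) / (k2 / 2)


# Hypothetical values: 12 psychiatrists, 5 patients per treatment each.
v_crossed = crossed_variance(k2=12, n=5, var_u=0.1, var_e=0.9, rho_u=0.5)
v_nested = nested_variance(k2=12, n=5, var_u=0.1, var_e=0.9)

print("variance ratio crossed/nested:", v_crossed / v_nested)
print("power crossed:", power_two_sided(0.5, v_crossed))
print("power nested: ", power_two_sided(0.5, v_nested))
```

In this sketch the ratio stays below 1 whenever the between-psychiatrist variance is positive and `rho_u` exceeds -1, which mirrors the conclusion that the crossed design outperforms the nested design.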
Finding the cost-efficient design
Design space
The power of the test for a pairwise comparison depends on the design
Costs of a trial
The costs of a trial depend on the costs for training and supervising health professionals and for treating and measuring patients. The costs for training and supervising one psychologist to deliver cognitive therapy are denoted as
A special case of equation (9) is achieved when
In this case, the total number of patients is used to select the design. This is relevant when the trial compares treatments for a rare disorder and where the number of patients is limited but costs are less relevant.
Finding the cost-efficient design
The cost-efficient design is found by evaluating all possible combinations of sample sizes
Conditional designs
In some studies, one or more sample sizes may be fixed to a constant. The number of patients per health professional may be fixed based on the professionals’ work schedules. The number of health professionals may be fixed due to contracts that were made while planning the trial. Such designs are referred to as conditional optimal designs 38 and they are in general more expensive than the cost-efficient design. They can be found by using the same Internet application.
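The search over the design space can be sketched as a plain grid search: every combination of sample sizes is evaluated, combinations that reach the desired power for all three comparisons are kept, and the cheapest one is selected. This is a minimal sketch and not the Internet application referred to above. The variance expressions, the normal approximation to power, the linear cost function and every number in `PARAMS` are assumptions made for illustration.

```python
from itertools import product
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Hypothetical inputs: standardized means, variance components and costs.
PARAMS = {
    "mu_c": 0.0, "mu_m": 0.5, "mu_p": 1.0,
    "var_u_c": 0.1, "var_e_c": 0.9,
    "var_u_m": 0.1, "var_e_m": 0.9,
    "var_u_p": 0.1, "var_e_p": 0.9,
    "cov_u_mp": 0.05,
    "cost_psychologist": 3000, "cost_psychiatrist": 2000,
    "cost_patient_c": 500, "cost_patient_m": 300, "cost_patient_p": 150,
}

# Design space: (psychologists, psychiatrists, n_c, n_m, n_p).
RANGES = [range(2, 21), range(2, 21), range(1, 7), range(1, 7), range(1, 7)]


def power(delta, variance, alpha=0.05):
    """Normal-approximation power of a two-sided test."""
    return STD_NORMAL.cdf(abs(delta) / variance ** 0.5
                          - STD_NORMAL.inv_cdf(1 - alpha / 2))


def evaluate(design, p):
    """Return (cost, powers) for one design under the assumed model."""
    k1, k2, n_c, n_m, n_p = design
    var_c = (p["var_u_c"] + p["var_e_c"] / n_c) / k1
    var_m = (p["var_u_m"] + p["var_e_m"] / n_m) / k2
    var_p = (p["var_u_p"] + p["var_e_p"] / n_p) / k2
    cov_mp = p["cov_u_mp"] / k2
    powers = (
        power(p["mu_m"] - p["mu_p"], var_m + var_p - 2 * cov_mp),
        power(p["mu_c"] - p["mu_p"], var_c + var_p),
        power(p["mu_c"] - p["mu_m"], var_c + var_m),
    )
    cost = (p["cost_psychologist"] * k1 + p["cost_psychiatrist"] * k2
            + k1 * n_c * p["cost_patient_c"]
            + k2 * (n_m * p["cost_patient_m"] + n_p * p["cost_patient_p"]))
    return cost, powers


def cheapest_design(ranges, p, target=0.8):
    """Exhaustive search; returns (cost, design, powers) or None."""
    best = None
    for design in product(*ranges):
        cost, powers = evaluate(design, p)
        if min(powers) >= target and (best is None or cost < best[0]):
            best = (cost, design, powers)
    return best


best = cheapest_design(RANGES, PARAMS)
# Conditional design: the number of psychologists is fixed at 15.
conditional = cheapest_design([range(15, 16)] + RANGES[1:], PARAMS)
print(best)
print(conditional)
```

A conditional design is obtained by shrinking the corresponding range to a single value, so it can never be cheaper than the unconstrained optimum. Setting the professional-level costs to 0 and the patient-level costs to 1 in this sketch turns the criterion into the total number of patients.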
Example: placebo-controlled comparison of treatments for social phobia
A total of 60 patients with social phobia were randomly assigned to cognitive therapy, fluoxetine plus self-exposure, or placebo plus self-exposure. 29 Each treatment was delivered to 20 patients, and allocation to fluoxetine or placebo was double-blind. Cognitive therapy was delivered by four experienced clinical psychologists, so the average number of patients per psychologist was five. Fluoxetine and the placebo were delivered by four psychiatrists, so on average 10 patients were treated by each psychiatrist.
Patients had up to 16 weekly treatment sessions; measurements on 10 quantitative outcome variables were taken at baseline, halfway through treatment and at posttest. Analyses were intention-to-treat. One-way analyses of variance were performed to identify any differences between treatment groups before the start of treatment. One-way analyses of covariance, with pretreatment scores as covariate, were performed at the next two measurements. The original article did not mention any strategies to deal with the nesting of patients within psychologists and psychiatrists.
The Beck Anxiety Inventory as measured at posttest will be used to illustrate the design methodology. The average outcome in the cognitive therapy group was
Assume this study is to be replicated in a larger study such that power levels of at least 80% are achieved for all pairwise comparisons while costs are minimized. The estimates for the means and standard deviations as given above are used in finding the design. Values of the intraclass correlation coefficients were not provided. Baldwin et al. 40 investigated intraclass correlation coefficients for a variety of outcomes in psychotherapy trials. The mean estimate for the Beck Depression Inventory was
The following costs are used:
Table 1 lists the three scenarios that are used in this example. In the first scenario, all sample sizes have upper limits, where the maximum number of patients per psychologist is less than the maximum total number of patients per psychiatrist. This reflects the fact that cognitive therapy is more time-consuming to deliver. In the second scenario, the number of health professionals is fixed to a constant, while in the third scenario, the number of patients per health professional is fixed. Hence, in the latter two scenarios, we seek conditional optimal designs.
Table 1. Description of three scenarios in the example on social phobia.
The cost-efficient designs for these three scenarios are given in Table 2, along with their total sample size, costs and power levels for the three pairwise comparisons. For scenarios 1 and 3, the number of psychologists is lower than the number of psychiatrists, which is not surprising given the higher costs to train and supervise a psychologist. For a similar reason, the number of patients in the placebo group is higher than the number of patients in the medication group. In all three scenarios, a psychiatrist treats more patients than a psychologist.
Table 2. Cost-efficient designs for the three scenarios in the example on social phobia.
The design for scenario 1 has the lowest costs but the highest total sample size. The costs for the other two scenarios are higher than those for scenario 1 because these are conditional designs. However, the difference in costs is only minor while the conditional designs include fewer patients. For each scenario, the comparison of cognitive therapy versus placebo has the highest power, and the power levels for the other two comparisons are close to the desired value of 0.8. Furthermore, for each scenario, the total sample size is much higher than the total of 60 patients in the original study.
Conclusion and discussion
This article provides the methodology to calculate optimal sample sizes in trials with one or two treatments per health professional. Optimal sample sizes are calculated such that sufficient power is achieved at minimal costs or minimal total sample size. The optimal design does not necessarily assign equal numbers of patients to each treatment condition, nor is the number of psychologists necessarily equal to the number of psychiatrists. In the illustrative example, the optimal sample sizes reflect the costs for the different treatment conditions and for both types of health professionals.
The sample sizes are calculated based on the mixed model (1) that explicitly takes into account the nesting of patients within health professionals. This model should also be used for analyzing the data once the trial has been executed. Ignoring the hierarchical nature of the data may result in underestimates of the standard errors of treatment effect sizes and hence inflated type I error rates. 41 The specific feature of model (1) is that it needs treatment indicators in its random part to account for heterogeneity.
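The consequence of ignoring the nesting can be illustrated with the familiar design effect for equally sized clusters, 1 + (n - 1) × ICC, applied to a single treatment arm. The sketch below uses the cognitive therapy arm of the original study (20 patients, four psychologists) and an intraclass correlation of 0.1, which is a hypothetical value chosen for illustration and not an estimate from the article. The function name `design_effect` is likewise an assumption.

```python
def design_effect(n_per_professional, icc):
    """Variance inflation of an arm mean due to nesting of patients
    within health professionals (equal numbers per professional)."""
    return 1 + (n_per_professional - 1) * icc


patients = 20            # cognitive therapy arm in the original study
psychologists = 4
n_per_psychologist = patients // psychologists  # 5 patients per psychologist
icc = 0.1                # hypothetical intraclass correlation

deff = design_effect(n_per_psychologist, icc)
effective_patients = patients / deff  # independent patients carrying the same information
se_ratio = deff ** 0.5                # correct SE divided by the naive SE

print(deff, effective_patients, se_ratio)
```

With these inputs the design effect is about 1.4, so the 20 patients carry roughly the information of 14 independent patients and a standard error that ignores nesting is too small by a factor of about 1.18.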
The mixed model allows the effect of medication versus placebo to vary across psychiatrists. Given that the design is double-blind, one may question whether such variation is plausible in all practical settings. Psychiatrists may vary with respect to the amount of emphasis they put on the importance of treatment adherence. As a result, patients’ treatment compliance, and hence treatment effect estimates in an intention-to-treat analysis, may vary across psychiatrists. Psychiatrists may also vary with respect to the amount of attention they pay to their patients and how well they are able to reassure them. Such attention and reassurance may be of importance in trials that treat a psychological disorder, such as anxiety. If such attention and reassurance strengthen the effect of medication, then between-psychiatrist variability in attention and reassurance may result in treatment effects that vary across psychiatrists, even in the case of double blinding. However, when the effect of treatment is physiological in nature, the effect of treatment is unlikely to vary. As an example, one can think of the effect of growth hormone versus placebo on final body height of adolescents with growth retardation. If there are plausible reasons to assume treatment variation is absent, then the model and Internet application can still be used by setting
The flow diagram in Figure 1 assumes random assignment in both steps. The order of these steps may also be reversed such that patients are first randomized to treatments and subsequently randomized to health professionals. Random allocation of patients to health professionals is important if confounding of therapist variation by patient characteristics is to be avoided. Random assignment is not always possible, for instance, when patients are recruited in real time and allocated to the next available therapist, or when it is practical or desirable to maintain pre-existing therapist–patient allocations. Non-random assignment would not change the data structure or the model, but it may affect the standard errors of intervention effect estimates. See also the section on internal validity in Walwyn and Roberts. 5
It should be noted that each psychologist treats
The optimal design may include a very low number of health professionals. In such cases, the variance components at the level of the health professional may be estimated with bias, which in turn may affect the significance of treatment effects. One may then consider alternatives to the multilevel model, such as the fixed-effects model. 44
The design is restricted to the case where there is one health professional delivering care to each patient but this is not always the case. There are situations in which patients receive therapy that consists of multiple sessions delivered by therapists of the same type, creating a multiple membership structure.35,45 Another example is an intervention that consists of different components, which are each delivered by therapists of different types, so patients are crossed by therapist. 45 Further levels are introduced when several therapists deliver a group treatment or when patients are nested within pre-existing groups, such as general practices or clinics, that are crossed by therapists. 46
To calculate the optimal sample sizes, the values of the total variances and intraclass correlations in each treatment need to be known a priori. These values are often unknown in the design phase of a trial and have to be replaced by an educated guess from experts’ opinions or findings in the literature. For cluster randomized trials, a large number of papers that list estimates of intraclass correlation coefficients have been published over the past 20 years. 47 Such papers should also be published for the design that is considered in this article.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
