Abstract

Introduction
The cluster randomised trial is commonly considered a relatively new research study design.1–3 Here we trace the idea of comparing interventions applied to groups of individuals back to a few very early reports, and follow the evolution of this idea to the modern-day cluster randomised trial. The cluster randomised trial has been defined as a comparative study in which the units randomised are pre-existing (natural or self-selected) groups whose members have an identifiable feature in common, and in which outcomes are measured in all, or a representative sample of, the individual members of the groups. 1 Summaries of the reports of many of the examples of cluster randomised trials published before the methodological review of this research design by Donner et al. 4 can be viewed in the James Lind Library (http://jameslindlibrary.org/topics/allocation-bias/cluster-allocation/; see Appendix for details of our literature search).
What is a cluster?
The groups used in cluster randomised trials vary widely and range in size from families to entire communities. The common feature shared by members of a cluster may be:
geographical, for example, villages, 5 communities 6 or administrative areas; 7
social, for example, families, households or religious congregations;8–11
educational or occupational, for example, schools or school classes;12–17
professional, for example, all students taught by a specific teacher, or patients treated by a specific clinician. 21
Why use cluster randomised trials?
Cluster randomised trials are well suited to, and are now commonly used for, evaluating public health, health policy and health system interventions. They are ideal for testing interventions when the decision (policy) about whether or not to implement the intervention will be taken on behalf of a group. Cluster randomised trials are also useful when the nature of the intervention carries a high risk of contamination, that is, when individuals randomised to different comparison groups are in frequent contact with one another and thus may be influenced (‘contaminated’), in either or both directions, by the alternative treatment(s). Contamination is likely to occur in comparisons of public health promotion interventions within the same community, and of different approaches to healthcare provided by the same clinician to patients under his or her care. In addition to these scientific reasons, cluster designs can also have practical advantages over individual randomisation because of lower implementation costs or administrative convenience.
Early examples of group allocation
The earliest mentions known to us of treatment comparisons in which the intervention was assigned to a group, rather than to an individual, are centuries old. In 1648, van Helmont proposed a trial of his new methods of treating febrile patients without purging and blood-letting, in which the participants would be put into groups and then randomised by ‘casting lots’ to decide which group would receive which of the treatments to be compared. 22 It is unlikely that this trial ever took place, but the idea of cluster randomisation is there.
In 1657, Starkey proposed a trial in defence of van Helmont’s treatment methods, in which groups of patients were to be assigned to be treated by Starkey (according to van Helmont’s methods), or by van Helmont’s critics. Starkey seems to have appreciated that the process of treatment allocation should be designed to prevent confounding by differences between the groups receiving different treatments. He suggested that patients first be divided into groups of 10. Starkey and his opponent should then alternately divide each 10 into two groups of five, allowing those who did not do the dividing to choose one of the groups of five patients. The ‘divider’ should then treat the remaining five patients. As in van Helmont’s proposed trial, the groups of patients did not exist prior to the trial but were created specifically for the trial. 23
Celli’s 1900 trial may be the first in which pre-existing groups were allocated to treatment – an important step towards the modern cluster randomised trial design.9,24 Celli studied whether mosquito netting reduced malaria in households of Italian railway workers. The households were selected (although not randomised) to receive or not receive the intervention. Neighbouring households were used as controls. This trial heralds one of the most common uses of cluster randomised trials today: the evaluation of infectious disease control methods and, in particular, of methods to prevent malaria.
The clinical trial reported by Amberson and his colleagues in 1931, challenging the use of gold for treating pulmonary tuberculosis, was an early trial using randomisation by a single coin toss to allocate two matched comparison groups, either to injections of a gold-containing treatment (sanocrysin), or to control injections of distilled water. In addition, patients and the investigators measuring the trial outcomes were blinded to the participants’ treatment allocation. 25 This trial has sometimes been considered well designed and conducted, but it falls short of the current standards for a cluster randomised trial; the clusters were created for the trial and only two clusters were randomised. 26
Many early cluster randomised trials in non-medical fields were school-based evaluations of educational interventions. Indeed, methodological discussion of the cluster randomised design appears to have begun in 1940 with Lindquist’s book on methods in education research in schools.27,28 Much of what Lindquist wrote, however, also applies to clinical and public health interventions.
Recent developments
Only sparse use of cluster randomised trials was evident before the 1980s. 29 However, the last half century has seen a steady increase in the number of cluster randomised trials published in the medical literature: from one a year in the 1960s, to seven in 1990, when Donner, Brown and Brasher published their methodological review of cluster randomised trials, 4 to over 120 in 2008.
Every pre-1960s cluster randomised trial of which we are aware tested some aspect of infectious disease prevention or treatment.4,30–32 In the 1970s, cluster randomised trials were used extensively for such evaluations, particularly in low-income countries.33–35 Cluster randomised trials were also recognised as being suitable for evaluating public health interventions aiming to change health behaviour, such as improving dental care, 36 promoting hand-washing 37 and attending for immunisation. 38
The risk of ‘contamination’ between comparison groups is high in studies evaluating screening interventions. In the 1980s two large-scale cluster randomised trials of screening interventions showed how this design can be used to reduce the influence of contamination on the effects of an intervention. A trial by Grant et al. 39 evaluated the effect of routine counting of fetal movement by pregnant women on the likelihood of antepartum stillbirth, and showed how the cluster randomised trial design can be useful for assessing the effects of interventions which would otherwise be compromised by a high risk of contamination. 39 In a Swedish screening mammography study published in 1985 by Tabár et al. 40 contamination among trial communities was reduced by offering screening to communities selected randomly from matched communities, separated by 200 km on average.
The use of clusters can make large-scale trials, such as trials of screening and other public health interventions, more practicable. 40 These features of cluster randomised trials – reduction of contamination, and practicability of very large-scale public health trials – are also well illustrated in a trial of the effect of vitamin A supplementation on childhood mortality, morbidity, and preschool growth.41–43
Other cluster randomised trials conducted around this time showed that the design is useful for evaluating the impact of multifaceted approaches to health improvement, for example, a trial of nutritional supplementation and maternal education in expectant mothers and infants at risk of malnutrition, 44 and a trial of breast cancer screening methods and the nurses who implemented them. 45
Schools have often been used in public health cluster randomised trials. They are convenient places to implement health education interventions relevant to children and adolescents, such as prevention of tobacco, alcohol and drug use; promotion of sexual health; and primary prevention of chronic disease through promotion of healthy eating and physical activity. Entire schools or classes within schools are ready-made clusters.46–50 School clusters have also been used to evaluate interventions aimed at helping children to become ‘health messengers’, as in a trial assessing whether hypertension education of children had an impact on the blood pressure of their parents. 51
Cluster randomised trials have become recognised as being valuable in evaluating many different types of health system interventions – healthcare delivery,52–54 governance, financial arrangements and implementation strategies. A cluster randomised trial reported by Vogt et al. in 1983 was used to compare methods to increase reporting of notifiable diseases by doctors. 55 This trial also illustrates an important group of cluster randomised trials in which clusters consist of patients treated by the same clinician. These trials are often called ‘professional cluster randomised trials’. Each clinician is randomly assigned to an intervention or a control group, and the intervention is targeted at the clinician, not at his or her patients. Such clinician-targeted interventions often aim to modify the behaviour of healthcare providers in some way, for example, by using clinical guidelines, training or decision support systems.56–59
In some such trials, clinicians implement the intervention without involving patients in the decision, for example, in reporting cases of a notifiable disease, 55 or arranging for medical assistants to screen for and manage patients with hypertension. 52 With other clinician-targeted interventions, the intervention is intended to impact on both clinician practices and on patient outcomes, for example, educational programmes to help physicians improve blood pressure control among their patients. 59
Through the 1990s, the number of published trials including cluster randomisation increased, 29 and the terms ‘group randomized’ (Murray 1998), ‘community randomized’,1,4 and even ‘place randomized’ were all used to describe cluster randomised trials. A BMJ series on statistics in 1997 and 1998 used the term ‘cluster randomised’.60,61 By the early 2000s, with the published extension of the CONSORT statement on reporting guidelines for cluster randomised trials 62 and several reviews of cluster randomised trials,35,63,64 the term ‘cluster randomised trial’ had become the most commonly used term for this design. The publication of important, large-scale, well-conducted cluster randomised trials in this century, such as those evaluating the effects of community groups on birth and other outcomes in poor rural populations,65–69 can be considered as a ‘coming of age’ of the cluster randomised trial design.
Current and future challenges
Study design
As experience with cluster randomised trials has increased over time, difficulties have become apparent. The design (especially with respect to blinding), analysis and conduct of cluster randomised trials are often more complicated than for individually randomised trials. Cluster randomised trials have been conducted with as few as one intervention and one control cluster, and insufficient numbers of clusters (inadequate sample sizes) are a persistent problem. We have not included cluster randomised trials with fewer than two clusters in each arm in the James Lind Library, or as examples in this article. Stratification and matching have been used to increase the comparability of the clusters, which helps increase precision with small numbers of clusters. While blinding of participants and outcome assessors is ideal in all randomised trials, it is often difficult or even impossible in cluster randomised trials. The units of randomisation and the units of observation may be different, and this affects informed consent, recruitment, sample sizes, randomisation and analysis.
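The matching described above can be illustrated with a minimal sketch. It assumes that clusters are paired on a single baseline measurement and that one member of each pair is then allocated to the intervention at random. The function name `matched_pair_randomise`, the village labels and the baseline values are all hypothetical and are not taken from any trial cited here.

```python
import random


def matched_pair_randomise(baseline, seed=None):
    """Pair clusters on a baseline value, then randomise within each pair.

    baseline: dict mapping a cluster name to a baseline measurement
              (hypothetical data; an even number of clusters is assumed).
    Returns a dict mapping each cluster name to "intervention" or "control".
    """
    if len(baseline) % 2 != 0:
        raise ValueError("matched pairs need an even number of clusters")
    rng = random.Random(seed)
    # Sort by baseline value so that neighbours in the list are similar.
    ordered = sorted(baseline, key=baseline.get)
    allocation = {}
    for i in range(0, len(ordered), 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # the random step: which member gets the intervention
        allocation[pair[0]] = "intervention"
        allocation[pair[1]] = "control"
    return allocation


# Hypothetical baseline mortality rates (per 1000) for six villages.
villages = {"A": 41.0, "B": 78.5, "C": 43.2, "D": 80.1, "E": 60.3, "F": 58.9}
allocation = matched_pair_randomise(villages, seed=2024)
```

With these illustrative values the pairs are (A, C), (F, E) and (B, D), so each arm receives one village with low, one with intermediate and one with high baseline mortality, whichever way the random allocation falls.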
Study analysis
Cluster randomised trials can be analysed in the same way as individually randomised trials if each cluster contributes a single summary value to the analysis, for example the average blood pressure among all patients of a randomised physician. Using all the individual data points in each cluster can increase the statistical power of a trial, but the effect of clustering must then be taken into account. As early as 1940, Lindquist recognised the need to account for clustering in the analysis of cluster randomised trials.27,28 In 1978, Cornfield pointed out the need for special consideration of the statistical features of cluster randomised trials in health research and, in particular, the need to account for between-cluster variation. 70 And as Donner and Klar pointed out in 2000, analysis must also take into consideration variation in cluster size, which is often substantial. 1
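A cluster-level analysis of the kind described above might look like the following sketch. It assumes an unweighted, equal-variance two-sample t-test applied to cluster means, which is only one of several possible summary analyses. The function name `cluster_level_t` and the blood pressure values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cluster_level_t(intervention, control):
    """Two-sample t statistic computed on cluster means.

    Each argument is a list of clusters; each cluster is a list of
    individual outcome values. Every cluster contributes one number
    (its mean), so clusters, not individuals, are the units of analysis.
    Returns (difference in means, t statistic, degrees of freedom).
    """
    m1 = [mean(c) for c in intervention]
    m0 = [mean(c) for c in control]
    n1, n0 = len(m1), len(m0)
    if n1 < 2 or n0 < 2:
        raise ValueError("need at least two clusters per arm")
    # Pooled variance of the cluster means (equal-variance t-test).
    pooled = ((n1 - 1) * variance(m1) + (n0 - 1) * variance(m0)) / (n1 + n0 - 2)
    diff = mean(m1) - mean(m0)
    se = sqrt(pooled * (1 / n1 + 1 / n0))
    return diff, diff / se, n1 + n0 - 2


# Hypothetical systolic blood pressures (mmHg), one inner list per physician.
intervention = [[138, 142, 135], [130, 128], [141, 139, 137, 143]]
control = [[148, 152], [145, 147, 143], [150, 154, 149]]
diff, t_stat, df = cluster_level_t(intervention, control)
```

The degrees of freedom depend on the number of clusters (here 3 + 3 - 2 = 4) and not on the number of patients, which is why trials with few clusters have little power however many individuals they include. Because every cluster is weighted equally, this simple version ignores variation in cluster size.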
Members of clusters are more likely to have similar outcomes than a randomly selected sample of individuals from the same population, particularly when members self-select into a cluster. The most commonly used measure of the degree of similarity among members of a cluster is the intra-class correlation coefficient. The larger the intra-class correlation coefficient, the larger the number of clusters of individuals needed to achieve comparable statistical power to a trial using individual randomisation. Analysing a cluster randomised trial without accounting for clustering yields a falsely low estimate of variance and hence inflates statistical significance. And as shown by Kramer et al., 71 loss of statistical power is even more dramatic when the outcome measurements are also clustered with treatment (‘double jeopardy’).
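The relationship between the intra-class correlation coefficient and the required sample size can be made concrete with the standard design effect, 1 + (m - 1) × ICC, where m is the number of individuals per cluster. The sketch below assumes clusters of equal size and a non-negative intra-class correlation coefficient. The function names and the numbers in the example are illustrative only.

```python
from math import ceil


def design_effect(cluster_size, icc):
    """Variance inflation factor 1 + (m - 1) * ICC for clusters of equal size m."""
    if cluster_size < 1 or not 0 <= icc <= 1:
        raise ValueError("cluster_size must be >= 1 and icc within [0, 1]")
    return 1 + (cluster_size - 1) * icc


def clusters_per_arm(n_individual, cluster_size, icc):
    """Clusters needed per arm to match an individually randomised trial
    that would need n_individual participants per arm."""
    inflated = n_individual * design_effect(cluster_size, icc)
    return ceil(inflated / cluster_size)


# Illustrative numbers: 200 participants per arm under individual
# randomisation, clusters of 20 people, ICC of 0.05.
needed = clusters_per_arm(200, 20, 0.05)
```

In this example the design effect is 1.95, so 20 clusters of 20 people (400 individuals) are needed per arm, compared with 10 clusters (200 individuals) if the intra-class correlation coefficient were zero.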
A recent study re-analysing the results from cluster randomised trials of health system interventions using time series methods shows that, if data from cluster randomised trials are analysed without taking account of trends over time, the findings may be misleading. 72 Fretheim and colleagues suggest adding time series approaches to the overall comparison of randomised groups, so as to gauge changes in effect of the intervention over time.
Reporting
The quality of reports of cluster randomised trials has been very variable. Some studies are reported simply as ‘randomised trials’, leaving readers unaware that the unit of randomisation is anything other than the individual. Specific key words that would identify cluster randomised trials are often not provided in abstracts, so full-text publications have to be retrieved to establish whether or not the study reported was a cluster randomised trial. Despite extension of the CONSORT statement on reporting guidelines for cluster randomised trials, 62 the titles or abstracts of 50% of cluster randomised trials still fail to indicate this. 73 Conversely, other papers claim in their titles to report cluster randomised trials but, on careful inspection, clearly do not. 29
Summing up
Cluster randomised trials have a long history in both educational and health research and, over the last several decades, have assumed an increasing role in rigorous evaluations of complex clinical, public health and health system interventions in which comparisons based on individual randomisation are likely to be ‘contaminated’ by contact among the individual participants. Cluster randomised trials can also help overcome the administrative barriers and economic costs inherent in contacting, recruiting and randomising large numbers of individuals. Major challenges include ensuring statistical power by recruiting adequate numbers of clusters, possible use of randomised crossover of clusters,74–76 minimising intra-cluster correlation of outcomes and measurements, and better reporting.
