Abstract

The detailed and exceptionally clear 1948 report of the British Medical Research Council's randomized trial of streptomycin for pulmonary tuberculosis is rightly regarded as a landmark in the history of clinical trials. 1 Of crucial importance, it describes how a treatment allocation schedule (based on random number tables) was concealed, thus preventing foreknowledge of allocations among those making decisions about patient participation. 2,3
Although the report of the streptomycin trial is rightly iconic, the attention it has attracted has led many historians to overlook earlier evidence relevant to the evolution of unbiased prospective allocation of patients to treatment comparison groups. This has led some of them to assume that random allocation to treatment comparison groups reflected the development of statistical theory by RA Fisher. 3,4 In fact, for half a century before the MRC trial and Fisher's writings, some medical practitioners wishing to evaluate the effects of treatments had used alternate allocation to assemble similar groups of patients, and so ensure that like would be compared with like. And these developments reflected an even earlier history during which some clinicians and others began to conceptualize what was needed for tests of treatments to be fair. 5–7
Appreciation of the need to compare like with like
More than a millennium ago, some clinicians appreciated that comparisons are needed to arrive at causal inferences about the effects of medical treatments. In the 9th century CE, the Persian physician Al-Razi (Rhazes) explained why he recommended that bloodletting be used to treat the symptoms of meningitis:
‘…I once saved one group [of patients] by it, while I intentionally neglected [to bleed] another group. By doing that, I wished to reach a conclusion.’ 8,9
Other people recognized centuries ago that, if treatment comparisons were going to be fair, like must be compared with like. Francesco Petrarch, in a letter to a fellow poet, wrote in 1364:
‘I solemnly affirm and believe, if a hundred or a thousand men of the
‘…to form a just comparison and calculate right in this case, the circumstances of the patients,
‘….I took twelve patients in the scurvy…
Introduction of methods to ensure that like will be compared with like
Methods to ensure that like will be compared with like in fair treatment comparisons were proposed at least as early as the 17th century. Reflecting a time-honoured device for ensuring fairness, 13 Van Helmont 14 and Starkey 15 proposed casting lots to decide which patients should be assigned to orthodox physicians (to be bled and purged), and which to their own, alternative treatments. A century later, Anton Mesmer challenged his orthodox physician detractors to cast lots to decide which patients should be treated by them, and which by him, using ‘animal magnetism’:
In order to avoid any later argument and all the questions that could be raised about differences in age, in temperament, in diseases, in their symptoms etc.
Some accounts of the use of unbiased treatment allocation appear early in the 19th century. In his 1816 Edinburgh doctoral thesis, Alexander Lesassier Hamilton reports having used rotation to allocate sick soldiers to different treatments at a base hospital in Elvas during the Peninsular War. 17,18 Patients were allocated either to his care; or to the care of a surgeon colleague who, like him, did not use bloodletting; or to a surgeon colleague who did use bleeding.
It seems reasonable to speculate that concern to compare like with like, and so to ‘avoid the imputation of selection’, explains the increasing use of alternate allocation to treatment comparison groups during the late 19th and early 20th centuries (in animals 23 as well as in humans). Writers in several countries emphasized the need to compare like with like. These included, for example, Jules Gavarret in France, 24,25 Elisha Bartlett in the USA, 25,26 William Guy in Britain, 27 and Alfred Ephraim in Germany (1890–1894). 28
A quotation from an 1877 Danish doctoral thesis on tracheotomy for diphtheria gives a flavour of the developing thinking about the grounds for causal inferences about the effects of treatments:
‘If any surgeon with material as large as chief physician Holmer could really take the decision, as a test, to
During the early decades of the 20th century, alternate allocation became increasingly common as a feature of research design, and was designated formally using specific terms in several languages. In 1902, in an article published in Muenchener Medizinische Wochenschrift referring to alternate allocation trials on treatments for plague in India, Dr G Polverini of the Institute of Experimental Pathology in Florence deemed ‘die alternative Methode’ the most appropriate ‘for assessing the healing power of a serum in humans’. 30 Six years later, one of the physicians responsible for the trials in India – Nasserwanji Hormusji Choksy – referred to the method they had been using as ‘the alternate case method’ and ‘rational alternation’. 31 In France at about the same time, Maurice Cousin 32 and his thesis supervisor Arnold Netter 33 referred to their use of ‘la méthode alternante’ in studies to assess ways of reducing serum sickness. In the USA, Jesse Bullowa 34 and Russell Cecil and Norman Plummer 35 referred to ‘alternation’ and to ‘the alternate case method’, respectively, in connection with their trials to assess the effects of serum treatment in pneumonia. And in Austria, Julius Wagner-Jauregg decided to ‘baptize’ the method ‘Simultanmethode’ in German after applying it in studies using fever to treat syphilis. 36
It is worth noting that this designation of alternation as a methodological principle by clinician researchers antedated Ronald Fisher's promotion of the theoretical statistical qualities of random allocation in The Design of Experiments. 37 Indeed, although there are examples of random allocation being used during the 1930s and early 1940s (see, for example, Doull; 38 Theobald; 39 Bell 40 ), use of the word ‘random’ to describe treatment allocation sometimes actually referred to alternation, 41 even in the writings of Austin Bradford Hill, the statistician most closely associated with the adoption of randomization in Britain. 42,2,3
Where was alternate allocation used, in whom, and to test which interventions?
Pre-1948, alternate allocation trials were done across the world. To date, we have found examples in Algeria, Austria, Australia, Britain, Denmark, Egypt, Finland, France, Germany, India, Italy, Malaya, Netherlands, Sudan, the USA, and Vietnam. Among these, a few programmes of alternate allocation trials stand out. Those done in India by Waldemar Haffkine and Nasserwanji Hormusji Choksy at the turn of the century on vaccines and treatments for plague and cholera are early examples of separate studies done within a series of planned controlled trials. 43–46 In the USA (and in New York and Boston in particular), Jesse Bullowa, William Park, Russell Cecil, Max Finland and others were responsible for a remarkable series of trials testing serum treatment for pneumonia during the third and fourth decades of the 20th century. 47 The only example of anything comparable in Britain appears to have been a cluster of trials done by Thomas Anderson and his colleagues at Ruchill Hospital in Glasgow in the late 1930s, to assess the effects of sulphonamides in a variety of infections. 48
Unsurprisingly, given the overwhelming importance of infectious diseases at the time, many alternate allocation trials were done to assess the effects of interventions to prevent or treat infections. The target infections included bacillary dysentery, cerebrospinal fever, cholera, the common cold, diphtheria, erysipelas, gonorrhoea, impetigo, infant diarrhoea, infectious hepatitis, influenza, malaria, mastitis, measles, meningococcal meningitis, plague, pneumonia, poliomyelitis, puerperal fever, scarlet fever, syphilis, tonsillitis, trichomoniasis, Tsutsugamushi disease, tuberculosis, typhoid fever, typhus, and whooping-cough. The interventions tested included antibiotics, antiseptics, diet, Eucalyptus oil, gamma globulin, physical therapies, proteins and amino acids, specific sera, sulphonamides and other drugs, ‘therapeutic malaria’, vaccines, and vitamins.
Alternate allocation trials were also used to assess the effects of nutritional and other interventions to promote health and growth: unpolished and polished rice for beri-beri; germinated beans compared with lemon juice for scurvy; vitamin B1 for polyneuritis in alcohol addicts; and vitamins, minerals, milk and ultraviolet light to promote child growth and development. In pregnancy and childbirth, alternate allocation was used in studies to assess the effects of micronutrients to prevent anaemia and toxaemia; salt for leg cramps; analgesics for pain in labour; perineal shaving and postpartum care of the perineum; ergot alkaloids to reduce postpartum haemorrhage; treatments for acute mastitis and deficient lactation and for preventing sore nipples; and the effects of knee-chest position and postural exercises on postpartum uterine retroversion.
‘The alternate case method’ was also used to challenge claims that surgery was an effective treatment for psychosis, and to put some ‘old wives’ treatments’ to the test: a Dr Middleton in Edinburgh reported that he had alternated tannic acid with ‘strong tea of the lumberjack variety’ 49 for treating scalds in children, with results suggesting that the preferences of ‘old wives’ were as likely to be valid as those of medical experts.
More research is needed to increase understanding of the reasons for the explosion of alternate allocation studies from the 1890s onwards. One explanation may have been the gradual adoption of probabilistic, statistical thinking by some physicians. 24,25,28,50 However, even Almroth Wright, who made a career out of dismissing the application of statistics to medicine in the early part of the 20th century, had started doing alternate allocation studies by the early 1910s. 51
What is clear is that, at least as early as the second decade of the 20th century, there were some very clear accounts of the principles that need to be observed when testing treatments. For example, in a paper entitled The crucial test of therapeutic evidence, which was based on an address given at the 1917 annual meeting of the American Medical Association, Torald Sollmann alluded to the unacceptability of biased under-reporting of commercial tests of drugs, and called for independent evaluations, using alternation to control allocation bias and blinding to reduce observer bias. 52 A study published by Adolf Bingel the following year provides a nice example of these two principles being applied in practice. 53–55
The gradual move from alternation to random allocation
It is clear that, contrary to a common assumption, 3 randomized trials did not suddenly fill a methodological vacuum beginning in 1948. Long before the concept of random allocation was introduced by statisticians, some doctors who wanted to compare preventive and therapeutic strategies recognized that comparison groups generated by alternate allocation would yield more credible evidence than comparison groups based on clinical decisions. There is some evidence of statistical expertise being brought to bear in a few of these early trials. For example, in 1912, a formal statistical test was applied to data from one of Choksy's many plague studies. 56 And during the 1920s, Louis Dublin, an actuary at the Metropolitan Life Insurance Company, seems likely to have been influential in the design and analysis of a series of methodologically sophisticated alternate allocation studies done to evaluate the effects of serum therapy for pneumonia. 47,57
So what led to the gradual move away from alternation to random allocation? The principal disadvantage of alternate allocation is that it usually means that those making decisions about who will participate in treatment comparisons have foreknowledge of upcoming allocations, and this sometimes leads them to undermine an allocation schedule that, in principle, should be unbiased.
In 1933, when assessing the reasons for baseline imbalances in a Medical Research Council trial of serum treatment for pneumonia, 58 Austin Bradford Hill learned how alternation could be subverted by those recruiting patients. 59 A dozen years later, Bradford Hill was one of the three-man team designing the MRC's randomized trial of streptomycin. One of the others was Philip D'Arcy Hart. In a trial that D'Arcy Hart had designed for the Medical Research Council in 1943, allocation had been by rotation to one of four groups – two antibiotic, and two placebo – with the specific purpose of preventing foreknowledge of treatment allocations. 60,61 Although one of the reasons that the streptomycin trial has become iconic is that the treatment allocation schedule was based on random number tables, 1 this was not for any esoteric statistical reason. 62 It was because successful concealment of allocation schedules and prevention of foreknowledge of upcoming allocations among clinicians entering patients in trials is more likely to be achieved with allocation schedules based on random numbers than with schedules using alternation. 2,3
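The contrast between the two kinds of schedule can be made concrete with a small simulation. The sketch below is illustrative only: the function names, the two arm labels and the use of simple (unrestricted) randomization are our assumptions, and it does not reproduce the procedure used in any of the trials discussed here. It models a recruiter who guesses that each patient will receive the arm that follows the previous patient's arm in the rotation, and measures how often that guess is right.

```python
import random


def alternate_schedule(n, arms=("A", "B")):
    """Strict alternation (or rotation): patient i receives arms[i % len(arms)]."""
    return [arms[i % len(arms)] for i in range(n)]


def random_schedule(n, arms=("A", "B"), seed=None):
    """Simple randomization: each patient's arm is drawn independently.

    Assumption: unrestricted random draws, chosen here only for illustration.
    """
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n)]


def foreknowledge_rate(schedule, arms=("A", "B")):
    """Proportion of allocations a recruiter predicts correctly when guessing
    that each patient gets the arm following the previous patient's arm."""
    if len(schedule) < 2:
        raise ValueError("need at least two allocations to make a prediction")
    hits = 0
    for previous, actual in zip(schedule, schedule[1:]):
        guess = arms[(arms.index(previous) + 1) % len(arms)]
        hits += guess == actual
    return hits / (len(schedule) - 1)


if __name__ == "__main__":
    print("alternation:", foreknowledge_rate(alternate_schedule(1000)))
    print("random:     ", foreknowledge_rate(random_schedule(1000, seed=0)))
```

Under alternation every upcoming allocation is predictable, whereas under simple randomization with two arms the same guessing rule is right only about half the time. Unpredictability of the sequence is only part of the safeguard, however: as the streptomycin report makes clear, the schedule also has to be concealed from those entering patients.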
The need to fill gaps in the history of controlled trials
Over most of the past two decades, our identification of pre-1948 reports of controlled trials using potentially unbiased treatment allocation schedules has been ‘opportunistic’. More recently, we have been able to use full text digital searches of the British Medical Journal, the Lancet, the Journal of the American Medical Association, the New England Journal of Medicine and the Proceedings of the Royal Society of Medicine, from the inceptions of the journals to 1947. In addition, a hand search of the Indian Medical Gazette from 1890 to 1910 was prompted by some of the important information about trials done in India at the turn of the 20th century. Table 1 (below) provides a summary of our findings as they stand currently.
Table 1. Pre-1948 reports of controlled trials using potentially unbiased treatment allocation schedules
The methods we have used to identify pre-1948 reports of controlled trials using potentially unbiased treatment allocation schedules are adequate to illustrate the use of this important element of trial design before the widespread adoption of randomization from the late 1940s onwards. However, the numbers in the Table are certainly minimum estimates of numerators, and they lack denominators to allow some estimate of the proportion of all articles on treatment evaluation which have had this feature of trial design. We invite readers to draw our attention to any other pre-1948 reports of trials using potentially unbiased treatment allocation schedules which are not currently included at
Medical historians have not given adequate attention to the use of unbiased treatment allocation before random allocation began to be adopted more widely from the middle of the 20th century onwards. Some relevant material exists in doctoral theses of which we are aware, but most of this relates to developments in Britain (
We have provided some tantalizing examples of relevant material published in Danish, French and German. Research funders and researchers in the countries where these languages are used need to recognize how important it is that they contribute to the investigation of an era of fundamental importance in the international development of fair tests of treatments. We hope that our findings will prompt interest in and support for research to document and understand the efforts made to develop reliable tests of treatments in a number of countries during the first half of the 20th century.
DECLARATIONS
Competing interests
None declared
Funding
None
Ethical approval
Not applicable
Guarantor
Iain Chalmers
Contributorship
All authors contributed to searches of the literature for eligible reports and preparation of the manuscript
Acknowledgements
We dedicate this article to the memory of Harry Marks, a generous adviser to the James Lind Library, and a leading and inspiring historian of the development of the randomized clinical trial, who died in 2011. We thank Ulrich Tröhler and Christian Gluud for translating material published in French, German and Danish; Rosie Wild and Jane Ferrie for independent hand searches of the Indian Medical Gazette; Patricia Atkinson, Rebecca Brice, and Olivia Clarke for clerical help; and Doug Altman, Mike Clarke, Christian Gluud, Iain Milne and Ulrich Tröhler for helpful comments on earlier drafts. Additional material for this article is available from The James Lind Library website: (
