Introduction
The UK Medical Research Council (MRC) had its 100th birthday in 2013, and its ‘Centenary Timeline’ (http://www.centenary.mrc.ac.uk/) contains the following brief reference to the Council’s role in the development of clinical trials:
MRC scientists developed what is today the gold standard for clinical trial design while testing streptomycin to treat pulmonary tuberculosis.
The website directs visitors to a British Medical Journal video made in 2009, in which Colin Blakemore, a neurophysiologist and former MRC Chief Executive, speaks to John Crofton (a pioneer of tuberculosis trials) and me about how randomisation and blinding in clinical trials have helped to generate reliable evidence to inform clinical practice.
Surprisingly, the MRC itself has been silent about the enduring value of its role in developing clinical trial methods and about the important methodological legacy left by the director of its Statistical Research Unit – Austin Bradford Hill. Stephen Lock, 1 a former editor of the British Medical Journal, has suggested that ‘the randomised controlled trial is a British invention’ and that Bradford Hill should have had a Nobel prize for his key role in helping to put medicine on a rational scientific footing.
Although the MRC draws attention to some of the influential randomised trials that it has funded, it does not really celebrate the key role it played in the 1950s in developing and applying clinical trial methods (see Appendix 1). The MRC’s achievements in this sphere had become clear by the 1970s, yet a two-volume, 700-page history of the MRC published in 1975 (made available on the Centenary website) assigns a mere five pages to ‘Clinical evaluation of remedies’ and fails to highlight the MRC’s role in developing scientifically more robust study designs. 2 This is rather like referring only to uses of the polymerase chain reaction without referring to the fundamental importance of developing the method itself. Furthermore, the important emergence of multicentre controlled trials is recognised in just two lines 3 of the 83-page report of a Wellcome Witness Seminar on clinical research in Britain between 1950 and 1980. 4
How might the apparent reluctance of the MRC (and others) to take credit for something so creditworthy be explained? Research by Desirée Cox-Maximov, 5 Ben Toth, 6 Martin Edwards 7 and Keith Williams 8 helps to explain the Council’s lack of interest in controlled clinical trials during the 1930s. More research is needed to gather relevant data and understand the MRC’s continuing lukewarm celebration of its role in developing clinical trial methods. I hope this article will help to prompt such research.
The MRC’s first multicentre controlled clinical trial – a bit of a shambles
Methods to improve the design of clinical trials took an important step forward with the adoption of alternate allocation schedules to create similar treatment comparison groups. Alternation began to be used in earnest at the beginning of the 20th century, 9 notably in research on plague and cholera in India.10–12 The BMJ’s response to the Plague Commission’s emphasis on this feature of research design suggests that some people at least felt that it was no longer necessary to keep repeating this obviously desirable feature of reliable clinical trials: [The Commission] lay the chief stress upon the fallacies resulting from an improper selection of control cases. Their lengthy and laboured criticisms on this matter are uninteresting. It is quite obvious that some of the statistics above quoted are unconvincing. The only important point the Commissioners bring out is that the one indisputable guarantee of the efficacy of a serum is its success when applied on the “alternate method,” every personal factor in the selection of cases being rigidly excluded. 13
Whatever the reasons may have been for apparently discounting the evidence from the United States, the MRC embarked on what turned out to be a poorly coordinated effort to recast initially separate initiatives in Aberdeen, Edinburgh, Glasgow and London as a multicentre trial, centrally controlled by the Council. In 1931, a year after the Wellcome Physiology Research Laboratory had made serum available to Edinburgh, Aberdeen and St Bartholomew’s Hospital in London, Walter Fletcher, secretary of the MRC, passed responsibility for organising and managing a multicentre trial to the newly formed Therapeutic Trials Committee,7,19 which had been convened to facilitate relations between drug companies, researchers and clinicians.
The first meeting of the Committee recognised that more data were needed and attempted to include a research group from Glasgow. It drew up a standard scheme of enquiry, which included standardised case and control selection and alternate allocation. It is not clear why the MRC recommended alternate allocation. There is no evidence that it did so on the basis of statistical advice, but there may have been some appreciation that having an alternate scheme at each trial centre would make it easier to combine or compare results. 6
The four centres were asked to submit their data to MRC staff, who forwarded them to Professor Thomas Elliott, director of the Medical Unit at University College Hospital in London. On 10 November 1933, Professor Elliott chaired a conference to discuss the results, and the MRC asked him to prepare a report for publication, taking into account the variety of interpretations of the data expressed by the investigators (papers at the Public Record Office FD1/2372. Serum treatment of pneumonia).
Some commentators questioned whether the research had actually been needed. As Dr John Cowan of Glasgow wrote in a letter to the Secretary of the MRC: On the facts available in U.S.A. and at home serum seems to me to be proved to be beneficial in [Type] I and probably proven in [Type] II. It should be available in consequence in ALL hospitals. Why have so many folk - in London and here too - fought shy of it? Why are not Barts etc all using it? The days of controls are no longer possible: it is not fair to them (John Cowan to FHK Green, 17 November 1933).
Nine cases of Type I pneumonia in the control series died, and only one in the treated. We are unable to explain these excellent results except: (1) on the grounds of the beneficial effects of serum, and (2) that these results are exaggerated by chance, owing to the small number of cases involved (LSP Davidson to A Landsborough Thompson, 24 November 1933).
I have told Professor Elliott that the evidence for the ‘miracle of Aberdeen’ appears to be unassailable and he has replied that, this being so, it must clearly be a case of go to Peebles for pleasure: go to Aberdeen if you get pneumonia (FHK Green to LSP Davidson, 28 November 1933).
Data in the report indicate that the alternate allocation scheme had not been rigorously applied. Whatever Bradford Hill’s report concluded, it must have been devastating: FHK Green, Secretary of the MRC, deemed it so damning that it was ‘to be kept, not only from public scrutiny, but even from the investigators themselves’.7,19
The published report of a flawed trial – unexpectedly very good in parts
Bradford Hill’s criticism of the study almost certainly (see below) led the MRC 22 to ask him to help draft the report of this unsatisfactory multicentre trial. Some aspects of the report reveal a methodological sophistication which was not evident among members of the Therapeutic Trials Committee. 6 In 1988, Jan Vandenbroucke 23 noted that the report contains ‘a beautiful discussion of selection and comparability of treatment groups’. The section entitled ‘Selection of Cases for Treatment’ notes that (i) the effects of some treatments are so dramatic and constant that carefully controlled research is not necessary; (ii) serum treatment for pneumonia is not such a treatment; (iii) trying to match patients treated with serum with control patients (to ensure that the two comparison groups were alike in all the respects that mattered) is impractical; and (iv) assigning cases alternately to either a serum group or a control group addressed the need to have two comparable groups of patients.
The opening paragraph of the section reads as follows: The good results of insulin on patients with diabetes or of liver treatment in pernicious anaemia are so constant that the trial of these remedies in a very few cases was enough to establish their value. With the antiserum treatment of lobar pneumonia the conditions are very different. The action of the serum is only that of a partial factor for good, and its influence may be overwhelmed by an infection that has been allowed several days to establish its dominance in the patient, or by other complicating factors that weaken the patient’s resistance. In order to measure precisely what this partial benefit may be it would be necessary to take two groups of cases of identical severity and initial history and compare the sickness and the fatality in each, the one being treated with serum and the other serving as a control. But this is impracticable, for very few cases, even of “Type 1” lobar pneumonia, are quite alike, and a sufficient number of similar cases could never be got together under one observer and under similar conditions. Some American workers have sought to avoid this difficulty by using a special system of ratings for the various harmful features of the disease, thus expressing each patient’s numerical value in reference to a common standard. Such differentiation seemed too intricate, and perhaps too much a matter of personal judgement, for the present inquiry. If a straightforward comparison of treated cases with controls, under the average conditions whereby patients succeed one another in the wards of a hospital, could not reveal any advantage for those treated by serum, then common sense would conclude that the use of this remedy should be disregarded in the routine of practical medicine. 
The method consequently agreed upon for London, Edinburgh and Aberdeen was that alternate cases of lobar pneumonia, taken simply in the order of their admission to hospital, should be used respectively for serum treatment and controls. So far as possible both were treated in the same wards and under the care of the same physicians. In the independent inquiry at Glasgow, however, the “serum” cases were treated in the Royal Infirmary, and a series of patients of the same social stratum, admitted during the same period to the Belvedere Isolation Hospital under the care of one physician, served as the control group. It is clear that there may be serious fallacies in any system which contrasts a group of serum treated patients with a control group drawn from a different stratum of the population, or with a control group in a previous year, when the severity of the prevailing pneumonia might have been different. Certain principles of selection were laid down so as to make the data derived from the centres homogeneous, and to exclude from the comparison patients in whom the serum could not be expected to have any effect. For the latter reason all patients admitted later than the fifth day of illness were excluded from the inquiry. Also all patients dying within twenty-four hours of admission to hospital were taken out of the series, though the evident severity of their illness would not have prevented their inclusion at first, either in the control or in the serum group. No case of pneumonia complicated by other obvious disease, such as gross nephritis, advanced heart disease, diabetes, etc., was accepted for either group. All forms diagnosed as bronchopneumonia were also excluded. That these limitations were desirable was agreed upon by all the workers at a preliminary conference on the subject. It will be appreciated, however, that, with such restrictions, it was difficult in three years to obtain fully adequate data for statistical purposes. 
Sex was disregarded, but the question of age was too important to be neglected. Table II from the present series illustrates afresh the well-known fact that the fatality of lobar pneumonia tends to be much greater over the age of 40 than in younger persons. The fortuitous inclusion of a few more elderly patients in one group than the other might influence unfairly the final figures for comparison. It was therefore decided to omit from the series all patients under the age of 20 and over the age of 60, and to classify the remainder into broad age groups. It will be noted that this plan still left altogether unregulated the chance scatter of distribution of patients with severe or mild pneumonia into either the serum or the control groups, and also of those for treatment early or relatively late in the progress of the disease. It was thought better not to attempt a deliberate sorting of cases in respect of mildness or severity, but to trust that the distortion of chance scatter would become almost negligible in a fairly large number of cases. Reference to a possible influence of the “severity factor” on the results is, however, made later in the report. Subject to the criteria mentioned above, patients at London and Aberdeen were placed in the groups for serum treatment, or for control, alternately in the order of their admission to hospital without selection as to age or severity. At Edinburgh the same general rules and criteria were observed, and there was no selection of cases for serum treatment. But in some wards of the General Infirmary serum was not used throughout the whole period of the inquiry, and consequently the patients from these wards overload the number of controls. In the other wards the alternate case plan was maintained to the end. At Glasgow the alternate case plan was not used, but patients in one hospital were treated with serum and those in another hospital served as controls. 
Hence it is only at Aberdeen and London that the serum treated cases equal the control cases in number. The variation in results at the different centres cannot be explained, but they show the difficulties in the way of accurately evaluating a treatment of this nature on the basis of small numbers of cases.
In 1988, I sent Bradford Hill a copy of Vandenbroucke’s article, and he responded in a letter to me as follows: Thank you for sending me the Dutch article on the history of the R.C.T. I am interested in his comment on the M.R.C. Therapeutic Trials Committee’s report on the serum treatment of lobar pneumonia which contains “a beautiful discussion of selection and comparability of treatment groups & that this came before the publication of Fisher’s Design of Experiments.” I feel certain that I wrote that para and I had learned from Pearson & Greenwood & Yule (vide the references No 21 & 22). I had applied that teaching to the M.R.C’s trial of a vaccine against whooping cough and was itching to apply it in the clinical field. Streptomycin provided the opportunity. Of course later I may have been influenced by Fisher but not very much - in fact in his famous ‘tea and milk’ experiment I think he was wrong [Austin Bradford Hill to Iain Chalmers, 7 August 1988].
The methodological legacy of the MRC’s first multicentre clinical trial
For a decade after its initial foray into multicentre clinical trials, the MRC made no further attempts at such studies. 24 Nevertheless, the lessons learned from this imperfectly conducted trial did pave the way for the methodologically robust trials that were to become a hallmark of the MRC’s work in the 1940s and 1950s.
The limitations of the MRC’s first multicentre clinical trial reflected the lack of relevant methodological experience among members of the MRC Therapeutic Trials Committee. As Ben Toth has noted: During its existence [the MRC Therapeutic Trials Committee] did not organise one rigorous comparative clinical trial, despite prima facie evidence of the problems of not doing so. None of the factors that were later to be recognised as vital to producing meaningful evaluations of therapies were advocated by the TTC. 6
The lessons from the trial of serum for pneumonia are nevertheless likely to have been very important in leading Bradford Hill and some others within the MRC to go on to design large, methodologically robust trials using concealed allocation schedules. Four years later, in discussing the planning and interpretation of experiments in the Lancet and in the first edition of his book Principles of Medical Statistics, Bradford Hill 30 states that the allocation of alternate cases to the treated and control groups ‘is often satisfactory’ because ‘in the long run (emphasis in the original) we can fairly rely upon this random allotment (my emphasis) of the patients to equalise in the two groups the distribution of other characteristics that may be important’.
This reference to alternate allocation as if it were random allocation was to continue for many years (see Appendix 1).31–33 Peter Armitage 34 has commented that Bradford Hill’s initial failure to distinguish clearly between alternation and randomisation was due partly to an underestimate of the danger of selection bias and partly to a feeling that alternation would be easier to swallow than randomisation. In an article published half a century later, Bradford Hill 35 wrote: … I was trying to persuade the doctors to come into controlled trials in the very simplest form and I might have scared them off … I thought it would be better to get doctors to walk first, before I tried to get them to run.
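The practical difference between alternation and random allocation can be illustrated with a minimal Python sketch. It is an illustration only, not a reconstruction of any MRC procedure: the function names, the two arm labels and the use of Python’s pseudo-random generator in place of printed random sampling numbers are all assumptions made for this example. The sketch shows that, under strict alternation, anyone who knows the previous allocation can predict the next one, which is what opens the door to selection bias; under simple random allocation the same guess is right only about half the time.

```python
import random

ARMS = ("serum", "control")  # hypothetical labels for the two comparison groups


def alternate_allocation(n_patients, first="serum"):
    """Allocate patients alternately, in order of admission."""
    start = ARMS.index(first)
    return [ARMS[(start + i) % 2] for i in range(n_patients)]


def random_allocation(n_patients, seed=None):
    """Allocate each patient independently at random (simple randomisation)."""
    rng = random.Random(seed)
    return [rng.choice(ARMS) for _ in range(n_patients)]


def predictable_fraction(schedule):
    """Fraction of allocations (after the first) guessed correctly by a
    recruiter who assumes the next arm is the opposite of the previous one."""
    pairs = list(zip(schedule, schedule[1:]))
    hits = sum(1 for prev, cur in pairs if cur != prev)
    return hits / len(pairs)


def size_difference(schedule):
    """Absolute difference between the sizes of the two groups."""
    return abs(schedule.count("serum") - schedule.count("control"))


if __name__ == "__main__":
    alt = alternate_allocation(100)
    rnd = random_allocation(100, seed=1)
    print("alternation predictable:", predictable_fraction(alt))
    print("random predictable:     ", predictable_fraction(rnd))
    print("alternation size gap:   ", size_difference(alt))
```

Under strict alternation at a single centre the two groups can never differ in size by more than one patient, so a larger gap is itself a sign that the schedule was not followed.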
Bradford Hill’s text implies that these differences reflect chance; but he may have suspected that they reflected biases introduced by failure to adhere strictly to the alternate allocation scheme. Indeed, in spite of his insistence that alternate allocation must be strictly applied, from the first edition of his book onwards, Bradford Hill 36 does not comment on the fact that the totals of 159 and 163 patients in the serum and control groups are clearly incompatible with strict alternate allocation.
The imperfectly conducted but carefully assessed and reported MRC multicentre trial of serum treatment for pneumonia seems likely to have played a key role in one of the most important methodological advances in the history of clinical trials.37–40 Steps were taken subsequently to conceal allocation schedules from those recruiting participants to prevent foreknowledge of treatment allocations.
The first multicentre MRC trial to do this was the trial of patulin for the common cold, which was designed and run by Philip D’Arcy Hart.41–43 As he said to me in an interview in 2003: Everyone had thought we would use alternation, and we thought we were very clever in setting up a scheme with two patulin groups and two placebo groups using letters to designate each of the four groups, then using rotation to allocate people to the different groups … We thought we were doing something completely new. We wanted to muddle people up. In fact we succeeded in muddling ourselves up. We didn’t always remember what the letters stood for. None of us was a statistician, but we felt that the patulin trial was the first decently controlled trial the MRC had done (Philip D’Arcy Hart, interview with Iain Chalmers, 2 May 2003).
International recognition of the MRC’s role in developing the science of clinical trials
Was Stephen Lock 1 justified in suggesting that ‘the randomised controlled trial is a British invention’? The streptomycin trial was certainly not the first trial to use allocation based on random numbers,50,51 but it has become iconic and is widely seen as ushering in a new age of clinical trials. The report of the streptomycin trial deserves its iconic status because it is exceptionally clearly written and describes the measures taken to prevent foreknowledge of allocations.
The trial also heralded the beginning of a substantial programme of clinical trials addressing a wide variety of questions (Appendix 1), organised under the aegis of one funder, and exploiting the new opportunities offered by the creation of a National Health Service. As far as I am aware, there are no examples in other countries of comparable trial development programmes with these features.52,53
The MRC did not deliver the randomised controlled trial to the world fully formed. The trial of serum treatments for pneumonia 22 and the patulin trial 41 provided invaluable learning. Each multicentre trial was designed and run under the aegis of a steering committee, which almost always included a member of staff from the MRC Statistical Research Unit – if not Austin Bradford Hill, then Peter Armitage, Richard Doll, John Knowelden, Donald Reid or Ian Sutherland. However, the learning continued throughout the 1950s. Appendix 1 shows how key methodological aspects of the MRC’s multicentre trials were reported, with descriptions ranging from less than 30 words to more than 300 words in length, and the names of representatives of the Statistical Research Unit who were members of trial planning committees. The quotations from the articles reveal that the distinction between random and alternate allocation and the need to conceal allocation schedules from those making decisions about trial recruitment were not always made as clear as they might have been.
An early example of international recognition of the MRC’s pioneering work in the design and management of clinical trials was Harvard University’s 1952 invitation to Bradford Hill 54 to speak about ‘The Clinical Trial’, and the New England Journal of Medicine’s decision to publish his talk prominently. International respect for Bradford Hill and other contributors to the MRC programme of clinical trials was made clear in 1959 when the Council for International Organizations of Medical Sciences (CIOMS), which had been established under the joint auspices of UNESCO and WHO, asked Bradford Hill to organise a conference on ‘Controlled Clinical Trials’. The meeting was held between 23 and 27 November 1959, in Vienna. Unfortunately, CIOMS does not have any documents relating to the meeting (Sev Fluss, email to IC, 2 August 2013), so it is unclear why Vienna was selected, or who, apart from the speakers, attended.
All those presenting papers were British doctors and statisticians who, together, brought a wide range of practical experiences of clinical trials to the conference. The conference covered general issues, including ethics; aspects of design, management and analysis; trials of surgical as well as medical interventions; and exemplar trials in acute infections, pulmonary tuberculosis, rheumatoid arthritis, coronary thrombosis and cancer. Given the methodological focus of this paper, Peter Armitage’s 55 account of ‘The construction of comparable groups’ is of particular relevance.
The meeting generated an unexpectedly large demand for the background papers prepared for it, so, the following year, these were published as books in English 56 – Controlled Clinical Trials, and in French 57 – Les essais thérapeutiques cliniques. These books might reasonably be regarded as the earliest textbooks about clinical trials. Two years later, Bradford Hill 58 drew on the practical experience that had been acquired between 1948 and 1960 in Statistical methods in clinical and preventive medicine. In that book, he was clearer about what was needed to avoid allocation bias: The appropriate cases having been accepted, they are allocated at random to one or another of the treatments under study – usually by the use of random sampling numbers ... So that the observer may not be influenced in his decision as to whether or not a patient should be brought into the trial … it is sometimes wise to deny him any prior knowledge of the treatment which the patient will receive in the event of acceptance. 58
I hope I have shown that the MRC’s ‘Centenary Timeline’ is wrong to suggest that ‘MRC scientists developed the randomized controlled trial design between 1940 and 1949’. The MRC’s contribution is far more substantial than this implies, and it was independent of the statistical reasons for using randomisation. It is high time that the Council made more of its achievement.
