Setting the scene
It is a challenge to give an account of the ‘history’ of something that I have been a part of in recent decades. If I write about something that took place 25 years ago, is that historical, when it feels so near in time to me? When I teach undergraduates and show an article that was published before they were born, is that history or part of the here and now? And, when I look back on lectures from early in my career that mentioned things from the previous century, they felt, and were, some distance in the past. Writing this in August 2015, the last century is only 15 years behind us, and we need to refer to the century before last when we think about the 1800s. In less than 90 years, the ‘present’ will be the last ‘century’ and current and upcoming decades will be ‘history’. That history is being made right now in relation to evidence synthesis. And the last couple of decades have seen developments that will come to be seen as pivotal.
In this essay, I try to capture some thoughts on the last hundred or more years, writing partly as a historian with a strong interest in how ideas evolve in parallel and independently, and also as someone who has been part of the history for 25 years and been fortunate to work with others who have been part of it for much longer. I will look at who and what came ‘early’ rather than engage in a competition to find who was ‘first’, and try to provide a framework to help readers to think about history when it is still being made around us. I will try to highlight how several key elements for the successful conduct and uptake of evidence synthesis to assess treatment effects came together over recent decades to produce the upsurge in this activity, and a step change about 20 years ago. I draw on examples from the James Lind Library (www.JamesLindLibrary.org) as well as other accounts of various aspects of the history of evidence synthesis in health and social care,1–6 including the influences of women in this history.7 This account should not itself be considered to be a ‘systematic review’. It is a collection of illustrative examples to describe the journey that evidence synthesis has taken over more than 100 years, and to highlight examples of how the quality of this research has changed over time.8–12 I am sure that examples have been missed, some of which may be particularly important, and I should welcome information on any such examples and suggestions for improvements.
What does it mean?
There are many terms used for evidence synthesis, just as there are many terms for ‘evidence’. This article focuses to a large extent on systematic reviews, in which a question is formulated, eligible studies are identified and appraised, and the findings are combined (sometimes mathematically) to summarise the effects, and perhaps to draw conclusions about the implications for future practice and research. The emphasis will be on research into the effects of interventions in health and social care, but it is important to note that there is a growing body of systematic reviews of other key areas for decision-making. 12 These include diagnostic accuracy and prognosis, and the use of evidence from other types of investigation including qualitative research, animal studies and modelling. This essay might, therefore, be considered to be a history of research synthesis with a focus on systematic reviews, and the important role played by a particular type of review: the Cochrane Review. The history of more statistical aspects of meta-analyses is dealt with in a companion article in the James Lind Library. 13
An illustration of how historical analyses have been transformed by the living history of modern developments is the work involved in the review of documents from the past; 30 years ago, someone wanting to know if a particular term had been used in 19th century medical journals would need to go to a library, take the journals from the shelves and work through them methodically. Now, we go online and run a search of the digitised archives in seconds. This makes it much easier for us to find today’s terms in the 19th century medical literature, but we still need to apply critical reasoning to consider whether the terms mean the same thing. This can make document review easier if a term was invented for the specific purpose of our interest and has no pre-history. This is the case with the term ‘meta-analysis’ but not with ‘systematic review’. However, early uses of the latter can provide insight into why we use it now, and how people did similar things in the past. To begin this journey, a search of the British Medical Journal digitised archive for the phrase ‘systematic review’ finds an article from 1867 discussing the recently published edited reports of St Bartholomew’s and St George’s hospitals, which notes:
Daunted by the difficulty of any systematic review of these collections of monographs, we shall only take a flying run through the pages; warning our readers, that they will do well to indemnify themselves by procuring the volumes for systematic perusal.14
Understanding the purpose of evidence syntheses, to understand why people do them
There are many reasons for doing evidence synthesis. These include the need to minimise bias by bringing together all of the available evidence on a particular topic, so that the emphasis is on the totality of the evidence and not merely a sample of the studies, highlighted because of their results. There is also a need to reduce the effects of the play of chance by increasing statistical power through the incorporation of as much data on the topic as possible. This, too, can be achieved by bringing together all of the available evidence, but it also requires that the data from that evidence can be combined mathematically, in meta-analyses. The history of the latter is dealt with partly in the companion article by Keith O’Rourke.13
Some of the reasons for doing evidence synthesis overlap, but some are mutually exclusive. Some have changed in emphasis over time. However, the following list helps to orientate any work that looks at why people have done, and continue to do, evidence syntheses. The examples that follow highlight some of these reasons, which help to explain why an evidence synthesis, rather than a single study or a haphazard collection of studies, became so important:
To organise a collection of the evidence;
To appraise the quality of the evidence;
To minimise bias, including avoiding undue emphasis on individual studies;
To compare and contrast similar studies;
To combine their findings, if possible and appropriate, to increase statistical power;
To improve access to the evidence;
To identify cost-effective interventions;
To design better studies in the future.
As a starting point for considering the scientific value of evidence synthesis, let’s go back to the 1880s and a presidential address to the British Association for the Advancement of Science by Lord Rayleigh 20 in Montreal. He said:
If, as is sometimes supposed, science consisted in nothing but the laborious accumulation of facts, it would soon come to a standstill, crushed, as it were, under its own weight. The suggestion of a new idea, or the detection of a law, supersedes much that has previously been a burden on the memory, and by introducing order and coherence facilitates the retention of the remainder in an available form. Two processes are thus at work side by side, the reception of new material and the digestion and assimilation of the old. One remark, however, should be made. The work which deserves, but I am afraid does not always receive, the most credit is that in which discovery and explanation go hand in hand, in which not only are new facts presented, but their relation to old ones is pointed out.
I look forward to such an organisation of the literary records of medicine that a puzzled worker in any part of the civilized world shall in an hour be able to gain a knowledge pertaining to a subject of the experience of every other man in the world.
James Lind: An early trial and early evidence synthesis
In his 1753 treatise on scurvy, not only did James Lind 25 describe his celebrated trial on scurvy but he also provided what the cover subtitle describes as a ‘Critical and Chronological View of what has been published on the subject’. He outlines the need for this with the words:
As it is no easy matter to root out prejudices … it became requisite to exhibit a full and impartial view of what had hitherto been published on the scurvy, and that in a chronological order, by which the sources of these mistakes may be detected. Indeed, before the subject could be set in a clear and proper light, it was necessary to remove a great deal of rubbish.
To take some illustrations from the 1970s: in 1971, Feldman 27 wrote that systematically reviewing and integrating research evidence ‘may be considered a type of research in its own right – one using a characteristic set of research techniques and methods’. In the same year, Light and Smith 28 noted that it was impossible to address some hypotheses other than through analysis of variations among related studies, and that valid information and insights could not be expected to result from this process if it depended on the usual, scientifically undisciplined approach to reviews. Eugene Garfield 29 drew attention to the importance of scientific review articles in advancing original research, showing how review articles had high citation rates and review journals had high impact factors. He proposed a new profession, ‘scientific reviewer’, and his Institute for Scientific Information went on to co-sponsor (with Annual Reviews Inc.) an annual award for ‘Excellence in Scientific Reviewing’, administered by the National Academy of Sciences.30
Mathematics, statistics and meta-analyses
One of the early examples of an evidence synthesis cited by Chalmers et al. 2 highlights how the use of statistical techniques helped to introduce scientific rigour to evidence synthesis. In the British Medical Journal of 5 November 1904, Karl Pearson, director of the Biometric Laboratory at University College London, pooled data from five studies of immunity and six studies of mortality among soldiers serving in India and South Africa to investigate the effects of a vaccine against typhoid. He calculated mean values across the two groups of studies, noting:
Many of the groups in the South African experience are far too small to allow of any definite opinion being formed at all, having regard to the size of the probable error involved. Accordingly, it was needful to group them into larger series. Even thus the material appears to be so heterogeneous, and the results so irregular, that it must be doubtful how much weight be attributed to the different results.31
The comparison of the statistics of more than one experiment suggests a counterpart: the combination of them for an estimate of total significance.32
My major interest currently is in what we have come to call – not for want of a less pretentious name – the meta-analysis of research. The term is a bit grand, but it is precise, and apt, and in the spirit of ‘metamathematics’, ‘meta-psychology’, and ‘meta-evaluation’. Meta-analysis refers to the analysis of analyses.33
The purpose of the present research has three parts: (1) to identify and collect all studies that tested the effects of counseling and psychotherapy; (2) to determine the magnitude of effect of the therapy in each study; and (3) to compare the effects of different types of therapy and relate the size of effect to the characteristics of the therapy (e.g., diagnosis of patient, training of therapist) and of the study. Meta-analysis, the integration of research through statistical analysis of the analyses of individual studies,33 was used to investigate the problem.34
Glass introduced an approach called meta-analysis in which the properties of several studies could be recorded in quantitative terms and descriptive statistics applied to derive an overall conclusion. Thus, reviewing the published works ceases to require the judgment of Solomon and becomes a quasiempirical procedure. We used the meta-analytic technique to review non-pharmacological treatments for hypertension.
One of the things that subsequently accompanied these statistical techniques was a new way to display the findings of the meta-analyses: a graph that is now sometimes called the forest plot. 1 This shows the results for each study as a single line of data and graphical image, with a symbol at the bottom to indicate the overall average. Freiman et al. 37 displayed the results of 71 ‘negative’ trials with horizontal lines for the confidence interval for each study and a mark to show the point estimate.
Lewis 38 produced something similar to display a meta-analysis of the effects of beta blockers on mortality. The Antiplatelet Trialists’ Collaboration 39 published what would now be widely recognised as a forest plot in a systematic review of the prevention of vascular disease by antiplatelet therapy. This used squares of different sizes to show the weight of each study in the meta-analysis and the point estimates for the odds ratio from each trial, with the associated confidence intervals running through these. A rhombus at the bottom of the plot showed the overall average, with its width representing the confidence interval.39
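As a rough illustration of the numbers that sit behind such a plot, the following Python sketch computes an odds ratio, a 95% confidence interval and a relative weight for each trial, and then a pooled estimate. It rests on several assumptions that are not taken from the sources discussed here: the trial counts are invented, the function names `log_odds_ratio` and `pool_fixed_effect` are hypothetical, and the pooling uses a simple inverse-variance fixed-effect method on the log odds ratio, chosen for brevity. It is not necessarily the method used by the Antiplatelet Trialists’ Collaboration, and it does not handle trials with a zero cell.

```python
import math


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Return (log odds ratio, standard error) for one two-arm trial.

    Assumes every cell of the 2x2 table is non-zero.
    """
    a, b = events_t, n_t - events_t   # treated: events, non-events
    c, d = events_c, n_c - events_c   # control: events, non-events
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def pool_fixed_effect(trials, z=1.96):
    """Inverse-variance fixed-effect pooling of log odds ratios.

    `trials` is a list of (name, events_treated, n_treated,
    events_control, n_control) tuples.
    """
    studies = []
    for name, events_t, n_t, events_c, n_c in trials:
        log_or, se = log_odds_ratio(events_t, n_t, events_c, n_c)
        studies.append({"name": name, "log_or": log_or, "se": se,
                        "weight": 1.0 / se ** 2})

    total_weight = sum(s["weight"] for s in studies)
    pooled_log_or = sum(s["weight"] * s["log_or"] for s in studies) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    for s in studies:
        s["or"] = math.exp(s["log_or"])
        s["ci"] = (math.exp(s["log_or"] - z * s["se"]),
                   math.exp(s["log_or"] + z * s["se"]))
        # Relative weight: the quantity a forest plot shows as square size.
        s["weight_share"] = s["weight"] / total_weight

    return {
        "studies": studies,
        "pooled_or": math.exp(pooled_log_or),
        "pooled_ci": (math.exp(pooled_log_or - z * pooled_se),
                      math.exp(pooled_log_or + z * pooled_se)),
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example_trials = [
        ("Trial A", 15, 100, 20, 100),
        ("Trial B", 30, 250, 45, 250),
        ("Trial C", 8, 60, 9, 60),
    ]
    result = pool_fixed_effect(example_trials)
    for s in result["studies"]:
        print(f"{s['name']}: OR {s['or']:.2f} "
              f"({s['ci'][0]:.2f} to {s['ci'][1]:.2f}), "
              f"weight {100 * s['weight_share']:.0f}%")
    low, high = result["pooled_ci"]
    print(f"Pooled: OR {result['pooled_or']:.2f} ({low:.2f} to {high:.2f})")
```

In this sketch, larger trials receive larger weights because their standard errors are smaller, and the pooled confidence interval is narrower than that of any single trial, which is the gain in statistical power that the plot makes visible.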
Systematic reviews as we know them today
In the month before Glass used the term ‘meta-analysis’ at the American Educational Research Association meeting, Shaikh et al. 40 published their article called ‘A systematic review of the literature on evaluative studies on tonsillectomy and adenoidectomy’. They outline their purpose as being:
to review the English language literature pertaining to evaluation of [tonsillectomy and adenoidectomy] with a particular emphasis on an assessment of the scientific merit of studies which have attempted to determine the efficacy of this procedure. Aside from the high cost and lack of clear cut evidence of therapeutic efficacy, there is morbidity and mortality associated with tonsillectomy and adenoidectomy. … In view of the cost, financial and human, as well as the lack of evidence clearly supporting the continued performance of this procedure, it is suggested that a prospective, properly randomized controlled study be undertaken and that the methodologic pitfalls annotated in our review be guarded against. … In this era of escalating health care costs, society can only afford therapies which have been demonstrated to be of benefit.40
The routine use of postoperative irradiation in early breast cancer must be seriously questioned. Survival data argue against its use, despite the local effect on recurrence rates. If the routine use of prophylactic local radiotherapy after radical mastectomy were stopped, survival might increase and resources might be saved.
These are minor and insignificant differences, but in most studies the severity of symptoms was significantly worse in the patients who received the placebo. … All differences in severity and duration were eliminated by analyzing only the data from those who did not know which drug they were taking. Since there are no data on the long-term toxicity of ascorbic acid when given in doses of 1g or more per day, it is concluded that the minor benefits of questionable validity are not worth the potential risk, no matter how small that might be.
Collaboration and the 1980s
The following example from the start of the 1970s introduces the concept of the collaborative overview, in which researchers share their data. This need for researchers to collaborate to ensure progress and reduce waste 44 had been highlighted in the 1950s by Kety.52 This approach to research synthesis became more common during the following decade. In 1970, in an early example of an individual participant data meta-analysis,53 the International Anticoagulant Review Group 54 collected centrally and analysed original records for nearly 2500 patients from 9 of 10 identified trials to assess the effects of anticoagulant therapies after myocardial infarction. They wrote:
Although we recognised that the best solution would be a new collaborative controlled trial in a large number of patients, we decided that this was, at that time, quite impracticable. As a potentially useful and simple alternative we agreed on a systematic review of the data on individual patients pooled from all the adequately controlled trials that had been published recently.
Since the future treatment of many women might be importantly affected by this – or a further – overview of all available trials those meeting agreed to explore the possibility of extending their collaboration to include the central review of individual patient data.57
The spirit of collaboration to resolve uncertainties in healthcare in the 1980s extended beyond the establishment of groups of researchers willing to share individual participant data for collaborative meta-analyses. A notable example is the considerable international collaboration that led to the preparation of a large collection of systematic reviews of controlled trials relevant to perinatal care,63,64 and the use of electronic media to update and correct the reviews when necessary.65 Looking back two decades later, Daniel Fox 5 wrote:
The influence … on policy was mainly a result of … powerful blending of the rhetoric of scientific and polemical discourse, especially but not exclusively in ECPC; a growing constituency for systematic reviews as a source of ‘evidence-based’ health care among clinicians, journalists, and consumers in many countries; and recognition by significant policymakers who allocate resources to and within the health sector that systematic reviews could contribute to making health care more effective and to containing the growth of costs.
Cochrane Collaboration
Towards the end of the 1970s, in what might be considered to be a rallying call for evidence synthesis,66 Archie Cochrane had written:
It is surely a great criticism of our profession that we have not organised a critical summary, by speciality or subspeciality, adapted periodically, of all relevant randomised controlled trials.67
The systematic review of the randomised trials of obstetric practice that is presented in this book is a new achievement. It represents a real milestone in the history of randomised trials and in the evaluation of care, and I hope that it will be widely copied by other medical specialties.68
In 1995, the Collaboration’s publishing partner, Update Software, released the first issue of the Cochrane Database of Systematic Reviews.3 From 50 full Cochrane reviews in that first year, the number has grown to more than 6000 in 2015. The history of evidence synthesis took another major step in 1998, when the Database went onto the internet and, now, in its partnership with Wiley-Blackwell, the Collaboration publishes the full collection of reviews in the Cochrane Library online, with new and updated reviews appearing every few hours, rather than in quarterly or monthly bundles (www.cochranelibrary.com). The Collaboration itself has also grown considerably, from 77 people at the first Cochrane Colloquium in October 1993 to more than 30,000 in more than 100 countries (www.cochrane.org).70
Growth
Although the Cochrane Collaboration remains the world’s largest single producer of systematic reviews, its output now accounts for only a small minority of the global output of evidence syntheses. Moher et al. 12 estimated that Cochrane reviews made up approximately 500 of the 2500 systematic reviews published each year. More recently, Bastian et al. 4 used a variety of search strategies to show how steady growth in the number of evidence syntheses from the 1990s had transformed into a surge in recent years. Their graph clearly shows this, and it is important to note that what, at first sight, might look like a cumulative count of the number of systematic reviews found by the different types of search is actually the count for articles published in each single year, showing that, for non-Cochrane reviews in particular, each year saw more publications than the previous year. They estimated that 4000 reviews were being published annually by 2010 and predicted that this would continue to grow. This has been the case, and a search of PubMed in April 2015 finds 6313 articles published in 2014, using the Publication Type term meta-analysis. There are many more to come. For example, the international, prospective register of systematic reviews, PROSPERO, established in 2011,71 is likely to surpass 10,000 records by the end of 2015 (www.crd.york.ac.uk/PROSPERO).
When the present and future have become history
I conclude by thinking forward to the next century. How will evidence synthesis in our current decades be viewed? What will be regarded as pivotal moments, step changes or gradual evolution? Some candidates that historians of the future might look to are:
the increased use of prospective registries of trials to make it easier to find what trials have been done;72
increased automation of the systematic review process;73
greater access to the data from clinical trials 62 and its use in individual participant data meta-analyses;74
greater use of material submitted to drug regulators;75
the use of new statistical techniques such as network meta-analyses;76,77
use of meta-epidemiology to improve the design and conduct of new studies;78
use of systematic reviews of animal research to inform research in humans;79,80
improvements in ways to summarise reviews and make them more accessible;81–83
the use of core outcome sets;84
the conduct of empirical research into the methods for doing research and reviews of these studies;85
and perhaps, most importantly, even more recognition of the need for and benefits of systematic reviews as a way to justify and interpret new trials, and reduce waste.44
The past hundred or more years have seen several developments in the science and practice of evidence synthesis. The last 20–30 years have seen important step changes in the numbers of these syntheses, and in the techniques to prepare and maintain them. The underpinning scientific rationale continues to resonate with the words of Lord Rayleigh. 20 The practical benefits of making it easier for people to make well-informed decisions and choices mean that Gould’s 21 vision of much improved access to knowledge and Kety’s 52 hope for greater collaboration among researchers may have been achieved.
