The beginning of medical statistics
The burgeoning support for evidence-based medicine in the 21st century has made clear why physicians must depend on quantitative data generated by clinical trials for decisions about treatment. The value to them of data reported from trials depends in part on the adequacy of a trial's design and on how carefully that design was carried out. In addition, if the conclusions reached in a trial are to be reliable, they must be justified by appropriate statistical assessment of the data reported. Today's ubiquity of statistical analysis of data from clinical trials developed mainly during the second half of the 20th century. But the concept of needing numerical data to justify conclusions about treatments goes back at least three centuries.
An early major attempt to assess with quantitative data the validity of a medical treatment came in the first half of the 18th century, when James Jurin 1 and other English physicians gathered data on differences in mortality from smallpox inoculation as a preventive treatment compared with mortality from naturally-acquired smallpox. 2, 3, 4 Jurin's judgments, however, came simply from inspecting those mortality data and not from critical statistical analysis of the validity of conclusions that might be drawn from them; such statistical methods had not surfaced for applications in medicine. His use of such mortality data can be described as descriptive statistics – crudely put, ‘simply eyeballing the data’.
Jurin's simple statistical assessment of quantitative data was only one of an increasing number of such assessments in 18th century Britain. 5 But these judgments relied, in essence, on comparing mortality or morbidity proportions in groups of treated and untreated persons. Even the concept of considering the value of data for judgments began to be entertained. For example, in 1785 Gilbert Blane 6 noted that:
‘There is … a great difficulty attending all practical inquiries in
medicine; for in order to ascertain truth, in a manner that is satisfactory to
a mind habituated to chaste investigation, there must be a series of patient
and attentive observations upon a great number of cases, and the different
trials must be varied, weighed, and compared, in order to form a proper
estimate of the real efficacy of different remedies and modes of
treatment.’
Blane's ‘weighed’ suggests the concept of critical statistical assessment, but there is nothing elsewhere in his book to suggest specifically how this should be carried out. There is a hint in Blane's ‘compared’ of the concept of looking for bias, but he does not explain his use of the word.
Relatively simple assessments of treatment data continued into the early 19th century. The most famous among these were the analyses of Pierre-Charles-Alexandre Louis (1787–1872), an eminent clinician in Paris. He used data comparing mortality of patients treated relatively early with blood-letting and others in whom treatment was delayed 7, 8 to judge the efficacy of the treatment. Louis's reports stimulated advocacy of la méthode numérique (the numeric method) for formal judgments on the efficacy of treatments, rather than simply accepting physicians' opinions. His reports provoked intense debates in Paris in the Académie des Sciences in 1835 and in the Académie de Médecine in 1837, one faction lauding the numerical method as a scientific advance and an opposing faction lauding the central importance of a physician's judgment based on the experience of applying a treatment to a particular patient.
The origins of Jules Gavarret's interest in medical statistics
The debates in the Académies did not include any data on the precision or reproducibility of the mortality rates reported by Louis. There had already been suggestions by the French mathematicians Pierre-Simon Laplace (1749–1827) 9 and Siméon-Denis Poisson (1781–1840) 10 that le calcul des probabilités (probability calculation) could be applied to judgments on quantitative data in medicine. Such calculations had already been applied in various ways – to gambling decisions, insurance risks, demographic data, juridical questions and astronomical data, amongst others – but not to judgments on medical treatment. Laplace 11 noted in his Essai Philosophique sur les Probabilités (Philosophical Essay on Probabilities) its potential for use in medicine:
‘The probability calculus can make one appreciate the advantages and
disadvantages of the methods used in the speculative sciences. Thus, to
discover the best treatment to use in curing a disease, it is sufficient to
test each treatment on the same number of patients, while keeping all [other]
circumstances perfectly similar. The superiority of the most beneficial
treatment will become more and more evident as this number is increased, and
the calculus will yield the corresponding probability of its benefit and of the
ratio by which it is greater than the others.’
Jules Gavarret, a young physician who had not yet established the reputation he later attained in Paris medicine, attended the debates in the Académies and applied the probability calculation to Louis's data to judge the validity of his conclusions on blood-letting. Gavarret's work surfaced in 1840 in his Principes Généraux de Statistique Médicale ou Développement des Règles Qui Doivent Présider à Son Emploi (General Principles of Medical Statistics, or Development of Rules That Should Govern Their Use). 12 It must be noted here that the meaning elsewhere of ‘statistics’ in medicine of this period was narrower than that of today. One of today's dictionaries of scientific terms 13 defines ‘statistics’ thus:
‘A discipline dealing with methods of obtaining data, analyzing and
summarizing it, and drawing inferences from data samples by the use of
probability theory.’
In Gavarret's time, ‘statistics’ in medicine represented the first half of this definition and not the ‘inferences’ in the second half. In the case of a datum on a mortality rate, an inferential statistical analysis of the data could permit one to judge the probable range of mortality data that would be found with re-sampling of a patient population. As will be seen below, Gavarret's book appears to be the first and pioneering work on how to apply inferential statistics to therapeutic data for critical judgments on the value of therapies. His book has never been translated into English for publication, and awareness of his concepts has been largely limited to historians of medicine and statistics. My translations are not highly literal but are cast, I hope, in today's scientific idiom in English. Parts of the quotations within ‘square brackets’ ([…]) are my judgments on proper explanatory enlargements of Gavarret's own text.
Who was Gavarret and what propelled him to write his Principes?
Louis-Dominique-Jules Gavarret 14, 15 was born 28 January 1809 in Astaffort, Lot-et-Garonne, a town roughly halfway between Bordeaux and Toulouse. In 1829 he entered l'École Polytechnique in Paris and then in late 1831 went into military service as an artillery officer (‘un sous-lieutenant d'artillerie’). At the beginning of 1833 he resigned his commission to begin medical studies with the already eminent Gabriel Andral (1797–1876). 16 Gavarret's collaborative research with Andral centred on chemical studies of blood and respiration, and they helped to establish clinical chemistry and haematology as clinical sciences. Andral was among the supporters of Louis and his advocacy of the numerical method for judgments on treatments, but Andral apparently did not stimulate Gavarret's statistical inquiry into Louis's data. In his preface to Principes, Gavarret describes (pp. ix–xi) the stimulus:
‘At the time I began my medical studies, one heard everywhere of
propositions mathematically proved, of laws mathematically established. From
the professor and academician to the lowliest student, everyone spoke the same
way. One had to say, truly, that thanks to the rigor of the new methods of
investigation, nothing could stop the progress of medicine. However, on
considering this more closely, one was not slow to see that if the great
studies of pathological anatomy had made and were making clear every day the
great service of research in establishing the site, and defining diagnostic
criteria, of diseases, therapy was far from making such rapid progress … In the
midst of so many tireless and meritorious students [of new therapies], several
distinguished men perseveringly fought for adopting the use of statistics in
medicine. [Note that here Gavarret is using statistics solely in the
sense of numerical data, what we can call descriptive statistics, and not in the
sense of today's inferential statistics.] It was, they said, the sole
means of collecting the current results of therapies. With no aim but looking
for the truth, I gave myself up to a serious study of works published on this
subject; I followed with the greatest care the arguments made in the medical
journals, and despite all my efforts, it was impossible at first to understand
what seemed to be attached to this discussion. For finally, in the way in which
the question was posed and seen, was it seen as debating for one side or the
other? Only to know if one replaced by numeric reports the words often, rarely,
in most cases, et cetera, et cetera? The numerical method, considered from this
narrow point of view, could be taken to mean a simple reform of language, but
it was impossible to see it as a question of scientific method and philosophic
principle.’
Then Gavarret describes the debate in the Académie des Sciences in October 1835 that
stimulated his interest in applying probability calculation to therapeutic data.
‘From the beginning the field of discussion was widened; instead of
considering what up to then was called the numeric method, there was discussion
of the possibility of applying probability calculation to therapeutic research.
The reporter, M. Double, opposed the possibility of this application. Navier,
in an outstanding discourse, treated the principal points of the topic with the
greatest lucidity and closed with conclusions favourable to this kind of
calculation.’
Claude-Louis-Marie-Henri Navier (1785–1836) 17 was a highly regarded mathematician and engineer who had been elected to the Académie des Sciences in 1824. He had been a member of the faculty of the École Polytechnique, which Gavarret attended before the start of his short military career, and Gavarret probably regarded him as a speaker with real authority (pp xi, xii).
‘From then on I had the conviction that the question of medical statistics
was not a trivial one but right off a question exciting one's further interest.
Navier's discourse made clear his grasp of the subject and the judgments one
could draw from the use of the principles of probability calculation in
therapeutic research.’
On the following page, Gavarret makes clear his dependence on the mathematician Poisson
for applying the probability calculation to medical data.
‘The sources from which I have drawn the principles I develop on this
subject are the course of M. Poisson [in the École Polytechnique?] and his
fine work on the Probabilité des Jugements.’
Gavarret goes on in his Preface (p xv) to justify further his writing of
Principes. Here is his central stimulus.
‘A final reason that got me to prepare this book is the complete lack of
any work on the use of calculation in medicine. One has been able, it is true,
to properly speak of “statistics”, of “probability calculation”, of “numeric
method”, et cetera, but no one has had the idea of preparing a treatise on this
material. And, nevertheless, is it not indispensable that physicians, strangers
for the most part to the study of higher mathematics, have at their disposal,
stripped of all algebraic formulas, the fundamental principles on which depend
all investigations with medical statistics? I thus tried to do nothing less
than fill this important void up to now in [medical] science.’
The Preface is followed by four Chapters. Following these are six Notes providing the calculation methods and results that have been summarized and discussed in the preceding two chapters. Chapters I and II discuss general principles underlying and justifying Gavarret's more specific discussion in Chapters III and IV, such as the inadequacy of logic for coming to conclusions about results in medicine, the influence of variables on determining ‘facts’ in medicine, and other similar considerations.
Gavarret's definition of the conditions for reliable statistical analysis
Gavarret's Chapter III (p 100), Application of the Law of Large Numbers to
Therapeutic Research, opens with an echo of Laplace's ‘[testing] each
treatment on the same number of patients, while keeping all [other] circumstances
perfectly similar’.
‘Observations, to be legitimately added up, do not have to be identical,
but only related to phenomena whose manifestation was due to the effect of any
cause among a group of possible invariable causes during the whole duration of
the trial … It is thus the group of all possible causes of death and of cure
that affect the patients that one has to make invariable to be able to regard a
medical statistic as containing homogeneous qualities.’
He then specifies (pp 110–2) the five principles that define ‘the sources of all
possible causes of death and cure that affect a patient with a known disease and treated
with a medication’.
‘1. The individual conditions … All the circumstances that relate to the age and the sex of the patient, to his temperament, and to his constitution, to the diseases he has already had, to the state of his health in which the present affliction presented …
2. The state of health preceding the development of the illness … The profession, the social position, the life style of the patients, the condition of ventilation, of nourishment in which they regularly find themselves, the moral influences that might have affected them.
3. The hygienic conditions during treatment … The healthiness of the place in which the patient was cared for, the moral influences that could have affected him during the duration of the illness, and the exactness with which the orders of the physician were carried out.
4. The illness itself … All the causes that relate to the nature of the illness, to the extent and severity of the organic lesions, to its influence on all the body's economy, to the time between the onset of the illness and the beginning of treatment, to the various complications that could develop during the course of observation.
5. The therapeutic method used … Not only such and such medication, but all the means that make up the treatment of the patient. The dose used could vary depending on the particulars of the cases.’
A few pages further on (pp 116–7) he defines the conditions needed during a trial for
proper statistical judgments.
‘So that a statistic can be considered composed of similar facts, and that therefore its information can enable us to measure the value of a medication, the observer has to hold to the following conditions:
1. The patients have to be drawn exclusively from the same locality and from the same classes of the population.
2. The experienced illness has to have a precise diagnosis and perfect definition. It has to be nosologically well delineated and separate from the illnesses resembling it most in this group …
3. The statistic within the makeup of the illness considered to be specific has to contain the precise indication of the number of cases within each of its varieties.
4. The medication tried has to be clearly formulated, as well as its main modifications for each of the varieties of the illness.
5. The medical statistician has to be competent.’
Surely it is clear from these last two excerpts that Gavarret is setting very high standards for the design of trials and the quality of the data that will result from a trial. But it must be kept in mind that Gavarret did not necessarily have in mind prospectively-initiated trials but, perhaps, simply case collections for comparisons of a treatment and no treatment, or of two different treatments. They are standards that some investigators years later attempted to meet in case-control or cohort studies. The practical difficulties of meeting such standards eventually led to the development of design methods that rely, instead, on adequately randomized allocation of patients to one of two or more arms of a prospectively-initiated trial, so that influences on the outcome variable studied, other than the treatment itself, would be adequately distributed between, or among, trial arms. The fidelity of the randomization could then be judged by comparing potentially significant prognostic variables, other than the treatment(s) under study, in the patients in the different arms of the trial. These methods were not fully developed for medicine until the second half of the twentieth century. I have found no evidence in Principes suggesting that Gavarret anticipated the need for alternation or rotation of patients among the treatments being compared. There is, however, a suggestion that he anticipated the need for randomization in selection of patients for treatment in each arm of a treatment regimen. This can be seen below in the quotation from Principes, page 156.
Gavarret applies Poisson's probability calculation to Louis's mortality data
In his Chapter III, to demonstrate the weakness of ‘the numerical method’ when the probable correctness of a datum from a numerical summary of outcomes is not known (p 141), Gavarret applies le calcul des probabilités (probability calculation) of Poisson to data from Louis on ‘cures’ and ‘deaths’ in 140 patients treated with blood-letting.
‘To finally finish with numerical reports considered as a measure of the
influence exercised by a medication, look at what errors have been recently
produced by the physicians recommending the use of statistics. We will be
satisfied with one example of such and we will allow ourselves to select those
of M. Louis, which represent the largest number of observations. This skilful
observer, in his research on typhoid fever, has tried to classify the treatment
of this disease in carrying out the most detailed analysis of 140 cases of this
disease. The observed subjects are divided thus:
52 died
88 cured
140 total patients
Thus the mean mortality is, in these cases, equal to 0.37143.
This is to say, in taking this datum as the average measure of the cure
used, one has to take as shown that 37,143 persons among 100,000 patients died
or approximately 37 of 100 patients. [In today's terms, 37%.]
If, with the help of the principles of the law of large numbers, we seek to
determine the extent of possible error that may weaken such a conclusion, we
find it equal to 0.11550.
Thus all that we have learned from the work of M. Louis, in reality, is
that under the influence of the curative means used in his 140 observations the
number of deaths must vary between 48,693 and 25,593 per 100,000 patients, or
approximately between 49 and 26 per 100 patients.’
Here Gavarret tells us that we cannot take 37% [0.37143] to necessarily be the true mortality with this treatment. With a sample of 140 patients, the true mortality could be, with a probability of just over 99.9 to 1, as high as 49% or as low as 26%, depending on which 140 patients made up the sample. He goes on to demonstrate that, as the number of patients sampled goes up, the range of probably correct values for mortality narrows. He used the term ‘les limites d'oscillation’ (limits of oscillation) when referring to this range of calculated values.
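The arithmetic behind these figures can be retraced in a few lines. The sketch below is illustrative only: the text above quotes the results but not the formula, so the expression used, 2·sqrt(2·m·n/μ³) with m deaths, n cures and μ = m + n patients, is an assumption, chosen because it reproduces the quoted 0.11550. The function and variable names are hypothetical. The coverage figure is what a normal approximation gives for that assumed expression; it is not taken from the text.

```python
import math


def limits_of_oscillation(deaths, cured):
    """Return (observed mortality, limit of possible error).

    ASSUMPTION: limit = 2 * sqrt(2 * m * n / mu**3), with m deaths, n cures
    and mu = m + n patients. This formula is not stated in the text; it is
    used because it reproduces the quoted value of 0.11550.
    """
    total = deaths + cured
    mortality = deaths / total
    limit = 2 * math.sqrt(2 * deaths * cured / total**3)
    return mortality, limit


# Louis's typhoid fever series: 52 died, 88 cured, 140 patients in all.
mortality, limit = limits_of_oscillation(52, 88)
lower, upper = mortality - limit, mortality + limit
print(f"mean mortality        : {mortality:.5f}")
print(f"limit of error        : {limit:.5f}")
print(f"limits of oscillation : {lower:.5f} to {upper:.5f}")

# The assumed limit equals 2*sqrt(2) standard errors of a proportion.
# Under a normal approximation that half-width covers erf(2) of the
# sampling distribution.
coverage = math.erf(2.0)
print(f"normal-approximation coverage: {coverage:.4f}")

# Same observed proportion, more patients: the range narrows as the
# number of observations grows (it halves when the sample quadruples).
narrowing = []
for scale in (1, 4, 16):
    _, scaled_limit = limits_of_oscillation(52 * scale, 88 * scale)
    narrowing.append((140 * scale, scaled_limit))
    print(f"{140 * scale:5d} patients -> limit of error {scaled_limit:.5f}")
```

Because the assumed limit shrinks with the square root of the number of patients, a fourfold larger series is needed to halve the range.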
Judging differences in the effects of two different medications
In the closing ten pages of Chapter III, Gavarret shows (pp 156–7) how calculating ‘the
limit of possible errors’ enables one to judge whether a difference between two average
mortality rates in two groups of patients – each group having received a different
treatment – probably represents a true differential effect between the treatments.
‘Suppose that in an epidemic, 500 patients chosen at random have been
assigned to a medication, and 500 others also chosen at random to a different
treatment, one were to obtain the following results:
The difference between the two mortalities is thus: 10,000 of 100,000
patients
In thus following the logic of M. Louis, one concludes from it that the
first medication is preferable to the second.
To estimate the true value of this difference, in calculating the limit of
possible errors in this case, we will find it equal to 7,694 of 100,000
patients.
The difference between the mortalities found is greater in this a
posteriori conclusion, so thus we must recognize that in reality the first
medication is superior to the second.’
The mathematical details of these calculations are in Note D of Principes (pp 287–8).
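The comparison in the quotation can be sketched in the same way. Two things here are assumptions rather than statements from the text. First, the limit for a difference is taken as 2·sqrt(2·m·n/μ³ + 2·m′·n′/μ′³), a two-group form that is not quoted above. Second, the death counts (100 and 150 among 500 patients each) are hypothetical: the quotation's table of results is not reproduced above, and these counts are used only because they give the quoted difference of 10,000 per 100,000 and the quoted limit of 7,694 per 100,000. All names are illustrative.

```python
import math


def limit_for_difference(deaths_a, total_a, deaths_b, total_b):
    """Limit of possible error for a difference between two mortalities.

    ASSUMPTION: 2 * sqrt(2*m*n/mu**3 + 2*m'*n'/mu'**3), where m and m' are
    deaths, n and n' are cures, and mu and mu' are group sizes. The text
    does not give this formula.
    """
    term_a = 2 * deaths_a * (total_a - deaths_a) / total_a**3
    term_b = 2 * deaths_b * (total_b - deaths_b) / total_b**3
    return 2 * math.sqrt(term_a + term_b)


# HYPOTHETICAL counts: chosen to match the quoted difference and limit,
# not taken from the text.
patients_per_group = 500
deaths_first, deaths_second = 100, 150

difference = (deaths_second - deaths_first) / patients_per_group
limit = limit_for_difference(
    deaths_first, patients_per_group, deaths_second, patients_per_group
)

print(f"difference in mortality : {difference * 100_000:,.0f} per 100,000")
print(f"limit of possible error : {limit * 100_000:,.0f} per 100,000")

# The decision rule in the quotation: a difference counts as real only
# if it exceeds the limit of possible error.
difference_is_real = difference > limit
print("difference exceeds the limit:", difference_is_real)
```

With the same two mortality proportions but only 50 patients per group, the assumed limit would be roughly 24,000 per 100,000, and the same 10,000 difference would fall inside it.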
Gavarret's summary of his views
The closing chapter of Principes takes up application of the
probability calculation (determining the limits of possible errors) to medical
demographic, non-therapeutic questions and need not be considered here. It is followed,
in addition to the technical Notes mentioned above, by Gavarret's General Conclusions
(pp 245–8), the first seven of which are relevant to his views on judging the effects of
therapies.
‘If we now take a quick look at all of the considerations we have developed
in the course of this work, we are led to put forth the following propositions
as definitely demonstrated.
PROPOSITION I
The rules of logic are inadequate for judging the effect of a given
medication in an equally given disease and for classifying the
medications recommended for this same disease in the order according to
their effects.
PROPOSITION II
The principles of the law of large numbers are strictly applicable
to therapeutic research and are solely able to solve these two important
problems.
PROPOSITION III
The mean mortality, as provided by statistics, is never the exact
and precise representation of the effect of the treatment tried but
approaches it as the number of observations is increased.
PROPOSITION IV
A therapeutic law ensuing from the comparison of a small number of
observations may be so far from the truth that it merits no degree of
confidence in any case whatsoever.
PROPOSITION V
A therapeutic law can never be absolute; its application can always
oscillate between certain limits which are all the narrower the more the
collected observations are multiplied and which one can determine with
the aid of the numbers constituting the statistics that have provided the
law.
PROPOSITION VI
To be able to decide in favour of one treatment over another, it is
not enough that the method yields better results but that the difference
found must also exceed a certain limit, the value of which is a function
of the number of observations.
PROPOSITION VII
Any difference between the obtained results that is below this
limit, while this limit decreases as the number of observations
increases, must be disregarded and deemed void.’
The remaining three propositions summarize Gavarret's conclusions in Chapter IV.
The reception and fate of Gavarret's Principes
Gavarret's book got wide attention in Europe and some notice in the USA. 18 The one American visibly impressed with Gavarret's thinking was Elisha Bartlett (1804–1855). 19 Bartlett had spent many months observing medicine in Paris after his graduation in 1826 and apparently read French without difficulty. In his 1844 An Essay on the Philosophy of Medical Science 20, 21 he gives seven pages to summarizing, with clear approval, Gavarret's principles needed for the collection of satisfactory data on treatments and the need for large numbers of cases in trials.
‘… I shall enter into a somewhat detailed exposition of the subject before
us … the treatment of disease; for the materials of which I am almost entirely
indebted to the admirable treatise of M. Gavarret, on Medical
Statistics.’ 20
European views of Gavarret's Principes through the remainder of the 19th century differed widely, ranging from approval with advocacy of Gavarret's views and procedures, to complaints that they did not contribute to ‘science’. 17 Yet by the end of the nineteenth century, Gavarret's application of the probability calculation for inferential statistical judgments on treatments seems to have sunk out of sight. It seems to have been unknown to (or unacknowledged by) the early twentieth century founders of the principles on which medical statistical concepts and methods now stand; for example, such men as William Sealy Gosset (‘Student’) (1876–1937), Ronald Aylmer Fisher (1890–1962), Jerzy Neyman (1894–1981) and Austin Bradford Hill (1897–1991). In particular, the 1934 paper in which Neyman 22 advanced the concept of confidence intervals and coined the term does not mention Gavarret and his use of the calculation of limits of possible error.
‘The form of this solution [of the problem of finding ‘the distribution of
certain characters in repeated samples’] consists in determining certain
intervals, which I propose to call confidence intervals … in which we may
assume are contained the values of the estimated characters of the
population.’
It seems fair to say that the concept in Gavarret's pioneering use of the probability calculation for estimation of the limits of possible error (limits of oscillation) for inferential judgments on data on treatment resurfaced, more than a century after publication of his Principes, in the form of today's closely related and widely applied confidence intervals. Gavarret's statistical work has been well represented in late 20th-century histories of statistics. 23 Both David Lilienfeld 24 and Alvan Feinstein 25 specifically refer to his probability calculation as producing the equivalent of today's confidence interval.
It is important to emphasize, however, that Gavarret's book was not ‘pioneering’ in the sense of offering an innovation in statistical concepts and methods. But it does appear to be ‘pioneering’ in that, as far as I know, it was the first application of a method in inferential statistics to a medical question about the efficacy of a treatment. As noted above, the probability calculation had been applied in other fields but not, apparently, to questions in clinical medicine.
The most prominent of early advocates of a wider use of confidence intervals in reporting medical research was Kenneth Rothman, 26 founding editor of the journal Epidemiology. The strongest advocates among English-language clinical journals were the Annals of Internal Medicine in the USA and the British Medical Journal in the UK, with their publication in the 1980s and early 1990s of a number of articles of advocacy. 27 In general, the position of the advocates was that the confidence interval gave more information on the reported variables than the then much more widely used hypothesis testing and resulting P values.
The remainder of Gavarret's life
After publication of Principes, Gavarret did not return to work on medical statistics. He continued work with Andral on blood chemistry and respiratory physiology. His later work covered many topics in biophysics and physiology, including acoustic and phonation phenomena, heat production and vision, and his productivity is manifested in the 85 references to his publications in the Index-Catalogue of the Library of the Surgeon General's Office. 28 Gavarret developed a prominent place in Parisian medicine and medical education, serving for a term as president of l'Académie de Médecine. His scientific eminence was recognized in 1847 with the decoration of the Legion of Honor, and fully marked in 1886 with his appointment as a Commander of the Legion. He died in 1890, four years later, aged 81.
Conclusion
Why did the view of what was needed in critically-judged numerical data for conclusions on the value of treatments – a view exemplified by Gavarret's Principes – take over a century to be applied for estimating the value or lack of value of a treatment? The answer is probably complex. Clearly, innovations in treatment, aside from surgery, were scarce through the nineteenth century until the years around the beginning of the twentieth century. Established treatments were apparently generally accepted as justified during a continuing reign of authoritarianism in medicine: what the professor says is right and need not be challenged by running trials to re-examine a treatment's efficacy. Perhaps such trials emerged only when economic pressures and the growth of complexity in treatment possibilities began to tell us that we had better be sure of the value of what we do in medicine. There may be other reasons yet to be teased out by historians.