Abstract

Two comments from outside and inside Germany
The contributions made by German writers discussed in the previous article in this series had not passed unnoticed. A conscientious young doctor and medical historian sensed that issues of evaluation and probability were in the air:
A young German insider undertook an overview of the methods available in clinical research, particularly in therapeutics.
As Oesterlen and Schweig had done some 30 years previously, Martius began by clarifying the confused terminology. He analysed the French and German works (Radicke was not mentioned!) by placing them in the wider historical context of theories of cognition: the term ‘induction’, he wrote, was often used without specifying whether ‘logical’, ‘numerical’ or ‘experimental’ induction was meant. He understood numerical induction as being based on statistics and the calculus of probabilities. For, although they had been developed separately, statistics and the calculus of probabilities could be summarised under the common term ‘numerical method’. Indeed, they complemented each other necessarily and happily […]. The calculus of probabilities needs for its application materials collected according to the strict rules of statistics and the latter, without the calculus, would not […] always find its critical utilization and the elaboration of which it is capable (Transl. from Martius, 1881, p. 349).
As to the fundamentals on which the true ‘numerical method’ rested, Martius identified some open questions. He wrote that Gavarret’s famous probability ratio of 212/213 (99.5%) had been chosen arbitrarily on the basis of Poisson’s formulas, as had Hirschberg’s simplification fixing a ratio of 9/10. This showed the arbitrariness and unreliability of such ratios and of the whole procedure: which of these haphazardly proposed probabilities actually excluded chance? Here was Martius’s answer: To remedy this undeniable drawback, Liebermeister now intends, by dropping Poisson’s formulas completely and departing from other preconditions, to develop new formulas that can serve to calculate, with certitude and precision, the degree of probability with which the hazard is excluded. And this for any […] observational material, be it ever so small […provided] the comparability of the cases, this eternal crux of all statistical data collections, can be demonstrated (Transl. from Martius, 1881, pp. 375–376). … they completely neglect the law of large numbers, and they offer nothing but the reflection, expressed in numbers of probability, that when one ignores the nature of the process in course the best thing to do is, faute de mieux, to stick to true, existing successes (Transl. from Martius, 1881, p. 376). the highest and most consummate level of all research methods usable in therapeutics. Rather it is and remains a makeshift, albeit a very important one […], that is undoubtedly worth an even deeper foundation and more extensive application (Transl. from Martius, 1878, p. 1185).
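To make concrete what the 212/213 figure stood for, the sketch below assumes the form in which Gavarret’s rule is usually reported in secondary accounts: for a series of μ cases with m of one outcome and n of the other, the observed proportion m/μ was held reliable within ±2√(2mn/μ³), which is 2√2 standard errors; under Poisson’s normal approximation that bound has probability erf(2) ≈ 0.9953, close to 212/213. The two-series comparison rule and the function names `gavarret_limit` and `differ_significantly` are assumptions for illustration, and the counts are hypothetical, not data from the sources discussed here.

```python
import math


def gavarret_limit(m: int, n: int) -> float:
    """Half-width of the 'limit of oscillation' for the proportion m/(m+n).

    Assumes the commonly reported form 2*sqrt(2*m*n/mu**3), with mu = m + n.
    """
    mu = m + n
    return 2 * math.sqrt(2 * m * n / mu**3)


def differ_significantly(m1: int, n1: int, m2: int, n2: int) -> bool:
    """Hypothetical helper comparing two series (e.g. treated vs untreated).

    Assumes the rule: the difference of the two proportions must exceed
    2*sqrt(2*m1*n1/mu1**3 + 2*m2*n2/mu2**3).
    """
    mu1, mu2 = m1 + n1, m2 + n2
    diff = abs(m1 / mu1 - m2 / mu2)
    limit = 2 * math.sqrt(2 * m1 * n1 / mu1**3 + 2 * m2 * n2 / mu2**3)
    return diff > limit


# A bound of 2*sqrt(2) standard errors has probability erf(2) under the normal approximation.
print(f"erf(2) = {math.erf(2):.5f}, 212/213 = {212 / 213:.5f}")  # erf(2) = 0.99532, 212/213 = 0.99531

# Hypothetical series: 20 deaths among 100 patients.
print(f"0.20 +/- {gavarret_limit(20, 80):.3f}")  # 0.20 +/- 0.113

# Hypothetical comparison: 20/100 vs 35/100 deaths does not pass the criterion.
print(differ_significantly(20, 80, 35, 65))  # False
```

The last line shows why such criteria frustrated clinicians with small series: under this rule, a mortality difference of 20% versus 35% in two series of 100 patients would still not count as established, which is part of what made the choice of the probability threshold so consequential.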
Reading these two testimonies, one realises that formal, mathematical probabilistic reasoning had clearly made an impact on their authors. On the practical side, however, the consequences were limited, while on the theoretical side new requirements for scientific evaluation were identified for the new century.
Towards the fin-de-siècle
Unsystematically compiled statistics continued to be worked up and interpreted in the manner of shopkeepers, without the additional calculation of probabilities that Martius had called for. Diagnoses must have been made using the same diagnostic methods, which is particularly difficult when cases are assembled from various sources. Adherence to treatments must be strictly observed. Trials of treatment should be conducted over a sufficient duration.
Moreover, the call for untreated cases for the equally necessary comparisons was not new, but in practice such cases were difficult to find. Withholding treatment might seem inhumane, but it could be justified because most treatments had not actually been shown to be useful. ‘Non-adherence [to these precepts] was being seen every day and led to delusive therapeutic-statistical conclusions’ (p. 715), but the precepts were as difficult to fulfil as they were indispensable.
Ephraim concluded that those who deemed these requirements insurmountable must be aware that they were renouncing trustworthy therapeutic knowledge. He noted that ‘to substantiate the efficacy of mercury in syphilis, of quinine in malaria …, one might perhaps not need statistics’ (p. 711). Reliable identification of less dramatic treatment effects, however, could only be assured by statistical research. Yet he did not mention the calculation of probabilities as a complementary method of evaluation, thus once more neglecting terminological precision.
Like Ephraim, Martius and many others before them, Rosenbach criticised the misuse of therapeutic statistics: Although everybody now knows that small numbers prove absolutely nothing, although everyone knows […Poisson’s] law of large numbers, yet people preferentially use small numbers, and even many of those who with aplomb only exploit large numbers are in error about their bearing in that it is not the large numbers as such that matter, but the circumstances [over time] in which they are generated (Transl. from Rosenbach, 1896, p. 913).
As a consequence, statisticians’ concentration on the Genesungsquotient (recovery rate) was misleading, since both numerator (number of cures) and denominator (number of diseased) were often based on variable and subjective criteria. In short, these statistics served only to reinforce preconceived opinions and frequently, when there was no comparison group, to fall into the trap of the post hoc ergo propter hoc fallacy (pp. 912–913).
As a new element, he drew attention to mechanisms of bias, namely:
The selection of cases by enthusiasts who, with what they refer to cynically as ‘scientific thoroughness’, eliminate all unsuitable cases so that, under the new method, deaths must, in reality, no longer occur. (That they still happen is, by the way,… always the fault of unhappy external circumstances, never imputable to the procedure …. Or it is the impossibility of using the panacea sufficiently promptly). The historical insight that this procedure of unevenly distributing light [on successful cases] and shadow [on failures] has repeated itself in the history of medicine countless times, and it never loses its impression on credulous minds who do not want, or are unable, to understand that highly astounding results can be brought about by the simple ‘sleight-of-hand’ (legerdemain) of a new scientific definition (pp. 912–913).
A historical comparison was only admissible if the forms of an epidemic remained essentially unchanged over the years. In the case of diphtheria, for example, where Behring’s serum therapy had been introduced in 1893, he demanded ‘that one should try once again to obtain a large series of observations [of patients treated] without serum-therapy’ over many years (Transl. from Rosenbach, 1899, p. 256).
Of course, many people did not understand or like Rosenbach’s method-based objections. Perhaps they did not like him either: had he not, while still an aspiring Privatdozent, written quite aggressively in a book on the Foundations, Duties and Limits of Therapeutics (Grundlagen, Aufgaben und Grenzen der Therapie, Rosenbach, 1891): Statistics – what would they not have sanctioned in the hands of able arrangers. [And later] The history of medicine furnishes enough examples of friends and foes fighting with equal obstinacy and equal certainty for a dogma established on the basis of such contradictory [statistical] results (Transl. from Rosenbach, 1891, pp. 66, 183). always to treat one case with and the next without the promoted medicine, whether the medicine is tested in all places at the same time, or in different places one after the other. And of course, this holds not only for the treatment of diphtheria (Transl. from Rosenbach, 1905, p. 233).
Germany by 1900
According to Petersen and many of his contemporaries (and later historians), the overall situation in Germany in the 19th century may have been similar in practice to that in France and Britain: where statistics were considered at all, the actuarial method of counting and statistical analysis prevailed. However, Louis’s numerical method as applied to the evaluation of therapies was also criticised constructively from the very beginning, and efforts were made to improve it. In the end, selection bias, making results look positive by changing criteria, and the quantification of preconceived ideas were decried as misuses (Rosenbach, 1896). The notion of probability was understood by many clinicians, and a few of them actually struggled to apply formal mathematical probability and its consequences throughout the second half of the 19th century. The influence of Poisson and particularly of his medical pupil, Gavarret, was pivotal.
Footnotes
Declarations
Acknowledgements
My heartfelt thanks to: Iain Chalmers, without whose unflinching encouragement, gentle whip, intellectual and indispensable practical help over the years, I would neither have begun nor ever finished this work; Thomas Schlich, who critically and helpfully read all previous versions; Robert Matthews, whose help with mathematical matters was very welcome; Brigitte Wanner and Christian Wyniger of the Institute of Social and Preventive Medicine, Bern, who helped me, together with Patricia Atkinson, Oxford, with ever so many IT technicalities; my wife, Marie Claude, whose patient love is not probably, but absolutely true.
Provenance
Invited contribution from the James Lind Library.
Supplementary file
The reference listed below is chosen as essential to the reading of the article. However, the full list of primary and secondary references is available online both on the journal’s website as supplementary material, and with the original publication at
. Except when otherwise mentioned, translations into English are the author’s own.
