Introduction
Arising as a concept for differences in astronomers’ observations in early 19th-century Europe, the ‘personal equation’ is a crucial piece of the pre-history of what would later be technically termed ‘observer bias.’ The term ‘personal equation’ spread into a variety of fields, including medicine, where it was used widely and variously from the late 19th century to the middle of the 20th century. 1 We have elsewhere described the complexities of the use of the term in Anglo-American medicine between the mid-19th and mid-20th centuries, which reflected evolving concerns over the perceived art and science of medicine. 2
A principal use of the term ‘personal equation’ reflected concern about observer bias. It thus serves as a useful marker for examining the variety of methods invoked to reduce or remove bias and so promote fair assessments. Medical professionals adopted the ‘personal equation’ term to denote such bias in many types of observations and in many different facets of medicine. These included assessments of symptoms and physical examinations, laboratory data, emerging technologies (such as X-ray), diagnosis and classification of diseases, and estimates of therapeutic effects.
The sources of observer bias associated with the ‘personal equation’ were manifold, as John Shaw Billings suggested in 1886:
Almost all men suppose they think scientifically upon all subjects; but, as a matter of fact, the number of persons who are so free from personal equation due to heredity, to early associations, to emotions of various kinds, or to temporary disorder of the digestive or nervous machinery that their mental vision is at all time achromatic and not astigmatic, is very small indeed. (Billings, 3 p. 561)
Observers – Numbers and arrangement
Controlling the number and arrangement of observers was an oft-proposed method of limiting the personal equation, though authors differed about how this could be done. Some argued in favour of limiting observations to those of a single observer. While this may be counter-intuitive to 21st-century readers, many authors claimed that having multiple observers risked mixing multiple ‘personal equations’, compounding the sources of variation in the observations and thereby making it difficult to extract meaningful knowledge. In a study of the Wassermann test in a maternity hospital, for example, one author tried to reassure his readers of the integrity of his data by stating that ‘eighty-seven per cent of the laboratory work was performed by the same technician, thus largely eliminating the personal equation’ (Belding and Adams, 4 p. 816).
By contrast, others argued in favour of using multiple observers to limit the impact of individual personal equations. This could take the form of observers of equal skill or status crosschecking their observations and then reaching a consensus or deferring to an authoritative observer. One group of authors, for example, claimed in their study of diphtheria that ‘the personal equation has been eliminated by three persons making the examinations with checking of results’ (Geiger et al., 5 p. 645).
Another researcher, who had examined an association between the differences in blood pressure readings between different arms and aortic aneurysm, tried to eliminate ‘as far as possible’ his ‘personal equation’ through cross-checking his diagnoses of aneurysm with assessments made by other clinicians (Williamson, 6 p. 1516). Other researchers used a more hierarchical approach, as when one author sought to bolster his results by stating that ‘in order to remove the personal equation, Dr. P. Challis Bartlett, who for three years was superintendent of the Rutland State Sanatorium, has kindly gone over the records’ (Pratt, 7 p. 15).
A variation on this theme entailed comparing or combining results gathered independently by different observers. In a study of body posture and body mechanics among first-year students at Harvard, for example, Lloyd Brown noted that, among physicians placing students into one of four graded categories, ‘the grading … was remarkably uniform and, while there was undoubtedly individual variation, the factor of personal equation seems to have been very slight’ (Brown, 8 p. 653). Such an approach could extend to a hope that individual variation would be diluted by still more observers. At the end of the 19th century, this ethos underpinned efforts at large-scale, medical society-driven ‘collective investigations’. 9
Along these lines, one contributor had addressed the Colorado State Medical Society in 1889 about collective investigations of the effects of climate on tuberculosis:
To relieve [the investigations] from the element of the personal equation which an individual’s writing must always bear, this Society voted last year to entrust a consideration of this question to a ‘Committee of Collective Investigation’, which should have power to solicit reports from individual members of this Society. (Fisk, 10 p. 173)
Standardisation and emerging technologies
Many medical authors claimed that standardising methods of data acquisition could reduce the effects of personal equations. Such standardisation, reflecting 19th-century aspirations towards a ‘mechanical objectivity’, 11 could cover the sequence and timing of laboratory steps, classification schemes, and procedural rules. Thus, while discussing leucocytosis as an indicator of pneumonia, Richard Cabot noted that ‘in order that the influence of the personal equation might be as nearly as possible the same in all cases, an exactly identical technique [of drawing and preparing the blood and enumerating the cells] was used in all’ (Cabot, 12 p. 117). To support the rigour of standardisation, authors could also hold that training and experience in particular methods further limited the effects of the personal equation (Anon, 13 p. 79).
The advent of new technologies was frequently championed as a means to check the personal equation. In an 1881 address, Billings referred to this hope for medical devices when he stated that:
the balance and the galvanometer, the microscope and the pendulum, the camera, the sphygmograph and the thermometer are some of the means by which investigators, at the bedside and in the laboratory, are seeking to obtain records which shall be independent of their own sensations or personal equations; which shall be taken and used as expressing not opinions, but facts. (Billings, 14 p. 270)
Nevertheless, many also recognised that interpretation of the outputs of medical devices, ranging from sphygmomanometers to X-rays to electrocardiograms, was not immune to the influence of the personal equation. As late as 1947, a JAMA editorialist commenting on inter-individual and intra-individual variation in the reading of chest X-rays continued to point to the importance of the ‘“personal equation” in the interpretation of a chest roentgenogram.’ In line with the implementation of blinded chest X-ray assessments in the MRC trial of streptomycin at the same time, 20 he warned that ‘there has been a tendency to assume that roentgenology is an exact science and that the objectivity of the medium defied error. Complacency has been a consequence of such assumption’ (Anon, 21 pp. 399–400).
Blinding
Seemingly independent of one another yet each invoking the personal equation, several authors on both sides of the Atlantic turned to a range of methods that would later come to be termed ‘blinding’ (sometimes ‘masking’). In attempting to offset suggestion and bias, they carried forward variants of a methodology that had been periodically invoked for centuries. 22,23 Some researchers invoking the ‘personal equation’ blinded themselves to patient identifiers or conditions. In 1911, for example, authors seeking to assess the different forms of leukocytes in pulmonary tuberculosis attempted ‘to eliminate the personal equation as much as possible’ by requiring that the ‘one who examined the blood knew nothing about the patients, or what they were getting, or how they were affected, or when they began or ended treatment’ (Solis-Cohen and Strickler, 24 pp. 564–565). Analogously, blinding was also proposed within medical education. In France, a new policy was implemented whereby ‘the examiner [would be made] ignorant of the identity of the examinee’, thus limiting the effects of the personal equation during grading (Anon, 25 p. 809).
Researchers used several measures in attempts to blind themselves to influences on the measurements they were making in real time. Investigators examining the diurnal variation in the haemoglobin content of blood used a Duboscq colorimeter because:
it leaves the observer in absolute ignorance of the numerical reading until he has finally matched the colour [to the comparison solution], and therefore eliminates the personal equation, a factor of the greatest importance where minute changes have to be ascertained. (Dreyer et al., 26 p. 589)
Others would similarly blind themselves and their patients to the results of previous measurements. In a study assessing the frequency of diseases in different populations, the tabulator took ‘great pains … to avoid errors due to the personal equation,’ by remaining blinded to the project’s results until all of the data had been collected.
It was thus impossible to form any estimate of how [the tabulated results] were coming out until the research was finished and the totals were added up. It was thus impossible for the observer to push or bend the figures in the direction of any theory of his own. (Cabot, 12 p. 117)
Researchers also used blinding methods to remove the personal equation from attempts to settle academic disputes. In an assessment of the accuracy of percussion of the heart as a measurement of the Nauheim (bath) treatment of heart disease, a critical author encouraged his reader to demonstrate to himself that the personal equation affected heart percussion, instructing him to ‘blindfold himself and make out upon a given case the upper limit of relative cardiac dulness [sic], marking it upon the surface of the chest with an aniline pencil’ and then repeat the process, upon which he would find that ‘the result is a series of lines at short distances from each other upon the chest, some of them intersecting others’ (Herschell, 30 pp. 413–414).
Authors also considered blinding patients and/or researchers to limit the personal equation in assessments of therapy. Invoking patient blinding, one researcher argued that ‘to properly test a drug or method of treatment it is well to give no intimation of the effects expected’ (Anon, 31 p. 86) because patients, with their own subjective personal equations, could be ‘very impressionable and amenable to suggestive therapeutics.’ Invoking researcher blinding, in 1913, Michigan’s AW Hewlett noted specifically with respect to therapeutic trials:
The personal equations of different observers, the tendency to bias, differences in the modes of administration, in the doses employed, and in the cases selected for treatment, all tend to obscure the significance of reported results. In order to obtain trustworthy data, it is necessary that a considerable number of observations on patients should be made under considerations which eliminate personal bias and reduce to minimum the errors inherent in statistics. (Hewlett, 32 pp. 319–321)
Control groups and random allocation
As the Hewlett example suggests, in addition to blinding, certain authors suggested or employed methods that separated participants into control groups to limit the personal equation in the rendering of comparisons and assessments of causality or efficacy. In an evaluation of tuberculosis statistics, an author invoking ‘the statistical method’ advocated ‘isolating and recording control cases’ to ‘eliminate to some extent the “personal equation” of the observer’ and so better characterise the course of the disease (Clark, 33 p. 1693). Control groups were also invoked in this sense to assess therapeutic effects. In a discussion of antistreptococcic serum, for example, one investigator, critical of the current state of research on the topic and of the degree to which the ‘personal equations’ of investigators had gone unchecked, argued that investigators should ‘compare long series of cases with and without the given treatment under otherwise like surroundings’ (Cotton, 34 p. 107). Another investigator invoking the ‘personal equation’ employed control cases (not alternated, it seems) to assess the effectiveness of several vaccines against post-surgical sepsis (Goadby, 35 pp. 589–592).
Control groups could also be created by those referencing the personal equation through the systematic, prospective random or alternate allocation of patients to treatment and non-treatment groups. Investigators invoked the personal equation in the very first line of their report detailing the effects of ‘convalescent serum in the treatment of preparalytic poliomyelitis.’ They designed their study to limit the personal equation by treating alternate patients with the serum; however, because family physicians frequently demanded that serum be used, many more patients were treated than not (Fischer, 36 p. 482).
In a discussion of research about the effectiveness of out-patient medical care, another author held that the only way to answer the question scientifically and eliminate the personal equation was to ‘make a definite study of a number of individual patients selected at random’ (Davis, 37 p. 916). The Hewlett study cited above offset both the ‘personal equation’ as observer bias and the ‘personal equation’ as ‘cases selected for treatment’ and ‘as modes of administration’ of remedy, which is to say the variability of patients and their treatments. It offset observer bias by blinding clinician-evaluators as to which remedy they had in fact employed in each case, while the variability of the patients and their treatments was offset by the random allocation of patients to various treatment groups, as each investigator was given five boxes with one of the two remedies being studied, and five with the other. As we have noted previously, 2 Hewlett’s 32 use of the term personal equation and the actions associated with it served as a bridge to 20th-century attempts to add blinding to random allocation as key features of fair comparisons in assessing the effects of treatment.
Conclusion
Methodologies to curtail observer bias and ensure fair comparisons are cornerstones of 21st-century medicine. Therapeutic assessments rely upon random allocation to comparison groups and blinded outcome assessment. The 1948 British Medical Research Council’s trial of streptomycin 20 is frequently considered a watershed in medical research study design, but as several authors have previously noted, each of the methods the trial employed has a history of its own that predates the landmark study. 38 Attempts to limit the distorting effects of the personal equation are an important part of this rich history. Nevertheless, it would be a mistake to understand attempts to curtail the personal equation solely in a teleological fashion in which authors gradually anticipated the methods in the British Medical Research Council’s report and 20th-century medical practices more broadly. Instead, attempts to limit the personal equation as observer bias were eclectic, both temporally and methodologically. In this way, responses to the personal equation reflect the United States and British medical communities being in flux across the late 19th and early 20th centuries, striving for scientific objectivity but still lacking a consensus about how to reach that goal.
Declarations
Acknowledgements
The authors thank Andrew Turner for his valuable research and insights during the construction of their prior paper on the history of the personal equation, and thank Sir Iain Chalmers for his ongoing and thoughtful feedback on the paper and the topic.
Provenance
Invited article from the James Lind Library.
