Unbiased Histological Examinations in Toxicological Experiments (or, the Informed Leading the Blinded Examination)

Abstract

Keywords

biostatistics; histopathology; statistical analysis; toxicologic pathology

Introduction

The unbiased examination of toxicological histological material has long been a contentious topic. A major reason is that pathologists are unwilling to examine histopathological material blind to treatment and consequently cannot claim to have unbiased data. Probably the most direct expression of the views on this subject was given in Comment on Toxicology (vol. II, no. 2, 1988), in which the editors (R. W. Leader and B. W. Wagner) devoted the entire issue to invited position statements from knowledgeable individuals and groups who held widely differing views. Dodd (1988) stated the view commonly held by those with pathology training: that a blind examination is unwise for a wide variety of reasons. The Guidelines for Best Practice have favored this position (Crissman et al. 2004; Ward et al. 1995) without ruling out blind or masked examination. Temple et al. (1988) held the opposite position: that without a blind examination, any results are potentially biased.

This article leads to the conclusion that this is not an either-or situation. Neither a blind examination nor one informed of treatment (informed or unblind) is absolutely better; they are complementary. Each has a different and important role as part of a full examination of experimental histological material. This article attempts a Hegelian synthesis, starting with these two extreme antitheses:

only an informed examination is valid

vs.

only the results of a blind examination are unbiased and so meaningful.

This synthesis completely removes the defects perceived in a purely blind examination, produces data free of observer bias, and has the additional advantage that any results will be amenable to formal probabilistic analysis. This conforms with the previously stated view that toxicological histopathology investigations should be reported to experimental norms rather than diagnostic practice (Holland 2010).

It would be sensible to debate the issues raised here without any changes to the current regulatory regime. These aspects of methodology are anything but obvious, and time is needed to work out fully the implications of any possible changes. If a consensus emerges at variance with current regulatory practice, then the regulatory implications can be addressed as a separate issue.

Unfortunately, the terminology is confused, with different groups using the same terms with specialist definitions. In everyday English, “bias” refers to a prejudice or a one-sided inclination of mind. To remove this pejorative overtone, one has to qualify bias as “subconscious bias” or “justifiable bias.” Experimentally, “bias” simply means systematic deviation from the true value. A balance that is not tared will give biased measurements because it is not measuring from true zero. This does not imply that the balance has prejudicial motives. Measurements, statistics, and even whole tests and experimental designs can all be biased. If a study has plausible bias in it (and it is not necessary to prove bias or even the direction of any bias), then the validity of the results could be seriously, even fatally, undermined. This latter, much broader sense is the one used here. The subject of bias in toxicology has been addressed in general terms by Wansdall, Hansson, and Ruden (2007). This article concentrates on countering potential bias in experimental work in toxicological histopathology.

Statement of the Problem

It is good experimental practice to have data that are demonstrably independent of the opinions, views, or prejudices of the individuals who gather them, that is, unbiased data (using either definition of bias). The standard experimental method of gathering unbiased subjective data is to have those gathering the data “blind” (synonym, “masked”) as to the identity and treatment of the material that they are assessing. They have no knowledge of the treatment, so they cannot be influenced by any bias, even an unconscious one.

One pair of peer-reviewed articles reporting toxicologic histopathologic results shows unequivocally that knowledge of treatment group can very substantially bias the results. This compelling example comes from two readings of the same set of slides from the National Cancer Institute (NCI) carcinogenicity studies of malathion and malaoxon, both published in the journal Environmental Research (Huff et al. 1985; Reuber 1985). One group explicitly stated that they read the slides blind to treatment (Huff et al. 1985); the other author makes no claim to blind reading (Reuber 1985). The differences are worrying to anybody concerned about the reliability of toxicological pathology methodology. The blind reading found no tumors to be treatment-related. The reading made with knowledge of treatment found 23 different tumor types to be treatment-related, many without a no-observed-effect level. With non-tumor findings the reverse could be the case: the blind reading showed a substantial treatment effect that the treatment-informed examination failed to detect so clearly. In this instance, compared with a blind reading, knowledge of treatment had clearly biased the results in favor of showing a tumor effect and hiding non-tumor toxic effects. Table 1 indicates the sort of differences found in the data from a single organ.

Table 1. Findings from the same set of slides of a Fischer 344 rat malathion/malaoxon carcinogenicity study, n = c. 50.

Finding	Method	Control group	Low dose	High dose
Forestomach ulcers	Blind	2	8	17
Forestomach ulcers	Informed	2	7	9
Forestomach benign tumors	Blind	0	1	0
Forestomach benign tumors	Informed	2	14	29

The monographs reporting the original NCI staff reading of the slides from the studies (NCI 1979a, 1979b) generally agree with Huff et al. (1985). It is not possible to discern from the articles what differences in diagnostic criteria existed between the two readings. However, whatever the differences in criteria, it is difficult to see how any two sets of criteria, each applied consistently, could by chance alone lead one reading to find no trends and the other to find treatment-related increasing trends so commonly. Bias (systematic variation from the true values) must be present in one, or possibly both, sets of data.
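The contrast in trends between the two readings can be made concrete with a dose-trend test on the Table 1 counts. The sketch below uses a Cochran-Armitage trend test with a normal approximation; this test is chosen here for illustration and is not stated to be the method used in either of the cited articles. It assumes exactly 50 animals per group (the table gives only n = c. 50) and equally spaced dose scores of 0, 1, and 2; the function and variable names are hypothetical.

```python
import math

def cochran_armitage_trend(responders, group_sizes, scores):
    """Return (z, one_sided_p) for an increasing trend in incidence across dose groups."""
    total_n = sum(group_sizes)
    p_bar = sum(responders) / total_n  # pooled incidence under the null hypothesis
    # Score-weighted excess of observed responders over the pooled expectation.
    t = sum(d * (r - n * p_bar) for d, r, n in zip(scores, responders, group_sizes))
    sum_nd = sum(n * d for n, d in zip(group_sizes, scores))
    sum_nd2 = sum(n * d * d for n, d in zip(group_sizes, scores))
    variance = p_bar * (1 - p_bar) * (sum_nd2 - sum_nd ** 2 / total_n)
    if variance <= 0:
        return 0.0, 0.5  # no variation in response or in dose scores: nothing to test
    z = t / math.sqrt(variance)
    return z, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability

# Counts from Table 1, in the order control, low dose, high dose.
TABLE_1 = {
    ("Forestomach ulcers", "Blind"): [2, 8, 17],
    ("Forestomach ulcers", "Informed"): [2, 7, 9],
    ("Forestomach benign tumors", "Blind"): [0, 1, 0],
    ("Forestomach benign tumors", "Informed"): [2, 14, 29],
}
GROUP_SIZES = [50, 50, 50]  # assumption: the table states only n = c. 50
DOSE_SCORES = [0, 1, 2]     # assumption: equally spaced ordinal dose scores

def trend_summary():
    """Apply the trend test to every row of Table 1."""
    return {
        row: cochran_armitage_trend(counts, GROUP_SIZES, DOSE_SCORES)
        for row, counts in TABLE_1.items()
    }

if __name__ == "__main__":
    for (finding, method), (z, p) in trend_summary().items():
        print(f"{finding:26s} {method:9s} z = {z:6.2f}  one-sided p = {p:.4g}")
```

Under these assumptions the blind tumor counts give a trend statistic of zero, whereas the informed tumor counts give a z value above 5; for ulcers the ordering is reversed, with the blind reading showing the stronger trend. The exact figures depend on the assumed group sizes and scores.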

To define terms more precisely: “blind” is often the shortened form of “blind to treatment.” Simple “blind to treatment” is inadequate in toxicological histological experiments. If a pathologist knows that animals or their tissues are in a group together but not what treatment that group received, then changes in one animal in the group may betray that this is the treated group, so the pathologist will be potentially biased in any assessment of the other animals in the group. Even “blind to group” (an examination conducted without any group information) is inadequate, because changes in one organ may betray a treatment effect in that animal that potentially affects the assessment of all the other organs of that animal. So to achieve a genuinely blind histological examination, it has to be “blind to animal,” in which the organs are examined individually, in complete isolation from any other findings in that animal from anatomic, clinical, chemical, necropsy, or any other technique.

So to achieve a blind examination, pathologists are required to perform their examination tissue by tissue, blind to the animal, blind to the group, blind to the treatment, blind to any other information. How are they to detect treatment effects?

One recourse open to them is to take their personal assessment of normal, and record their “blind to animal” findings against that personal standard. However, the individual’s perceptions of normal are a potent source of bias in themselves. If the perception of normal is too wide (so the boundaries of normal variation are inappropriately set), then small but real treatment changes will be systematically underreported, so the data will be biased. Simple blind examination swaps bias due to knowledge of treatment for bias due to different perceptions of normal.

It may be a surprise to find that a genuinely unprejudiced blind examination can still be irredeemably biased in other respects. Consider an examination by someone totally ignorant of pathology, who systematically fails to note true findings that exist and consistently records normal but variable features as pathological findings. This produces data that vary systematically from the true values in both directions, which is clearly biased data. Potential bias is an extremely subtle source of data distortion that calls for a detailed understanding of the practicalities of data collection.

The other possible approach is to record everything that might conceivably change with treatment against ordinal grades that include normal variation. This creates insuperable practical problems: the total number of findings from a study would be enormous (several hundreds or even thousands for each tissue), diagnostic drift would be difficult to control, and combinations of findings would need to be considered, so a combinatorial explosion of possible treatment-related effects would occur. Without huge groups of animals, it would be impossible to separate the enormous number of chance associations with treatment from any real toxic effects. A more detailed examination of the intractable problems with a “record it all” approach is in Holland (2010).
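The scale of the chance-association problem can be illustrated with simple arithmetic. The sketch below assumes that every recorded finding is tested separately at the same significance level, that none has a real treatment effect, and that the tests are independent; these are simplifying assumptions made for illustration and are not taken from the cited work. The function name and the significance level of 0.05 are likewise hypothetical choices.

```python
def chance_associations(n_findings, alpha=0.05):
    """Expected number of false positives, and the chance of at least one,
    when n_findings independent findings with no real effect are each tested at alpha."""
    expected_false_positives = alpha * n_findings
    prob_at_least_one = 1 - (1 - alpha) ** n_findings
    return expected_false_positives, prob_at_least_one

if __name__ == "__main__":
    for n in (10, 100, 1000):
        expected, prob = chance_associations(n)
        print(f"{n:5d} findings: about {expected:.1f} expected by chance, "
              f"P(at least one) = {prob:.3f}")
```

With hundreds of findings per tissue, a spurious "significant" association becomes close to certain under these assumptions, before any combinations of findings are even considered.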

Furthermore, this “blind to animal” approach is at complete variance with fundamental principles instilled in the training that pathologists receive. There, the emphasis is on the integration of findings within organ systems (e.g., within the gastrointestinal tract), across organ systems (e.g., the mineralizing effects of renal failure), and with gross findings, organ weights, clinical findings, hematology, and clinical chemistry, to limit the list to only the common sources of correlation. It is poor practice to fail to demonstrate the integrity of the data by neglecting to explicitly correlate gross and microscopic findings. Understandably, pathologists have always been extremely reluctant to perform what their training has taught them to be negligent examinations.

Combining the need to assess treatment effects without bias with the need to get analyzable data, it is clear that simple direct “blind to animal” examination is an impractical approach. Asking pathologists to assess the material from a study blind to treatment, group, and animal either introduces a different potent source of bias or creates a huge data set that defies analysis. A more nuanced approach is required.

The Synthesis

The synthesis of the two antitheses comes from noting that the histological examination falls naturally into two phases. In the first part, the requirement is to find any putative treatment effects. In the second part, the emphasis is on clearly demonstrating whether or not the possible treatment effects found in the first part of the examination are significantly associated with treatment. The methods by which a pathologist identifies a putative novel toxic effect can be very different from the methods by which he or she subsequently shows convincingly that it is a real, unbiased, experimentally demonstrated result, and both parts of the examination can be simultaneously experimentally valid.

So the initial part of the histological examination fulfills the function of “go-look-see,” to find out whether there is anything in the histological material that might be treatment related. This can be done in full knowledge of the animal and its treatment, the findings in other organs, the gross findings, organ weights, clinical chemistry, hematology, the pharmacology of the test substance, results of earlier studies in the same or different species, or any and all other information that might improve the chance of detecting a putative effect and reduce the chance of missing a treatment effect. The controls can be explicitly used as a standard against which the treated animals are knowingly compared and measured. It is very similar to the process in which pathologists are trained in their early induction into the art of diagnosis.

The result of this initial examination is a limited list of specified lesions in specified organs for which there is a prima facie case for treatment-relatedness.

The second part of the examination takes place completely blind to animal. Each organ is treated as a completely separate entity, in which only the change putatively associated with treatment that was identified in the initial examination is assessed. The data from this second examination are collected in such a form that a rational analysis of them can be made. Interestingly, this is actually the current common practice (Holland 1996, 2001, 2005; Holland and Holland 2011).
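The bookkeeping for such a second, blind-to-animal reading can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure described in the article: it assumes one slide per animal per target organ, uses opaque codes assigned after shuffling, and keeps the decoding key separate from the worklist given to the pathologist. The function name, code format, animal identifiers, and example lesions are all hypothetical.

```python
import random

def build_blind_worklist(animals, targets, seed=None):
    """Build a shuffled, coded worklist for the blind reading.

    animals: dict mapping animal id -> treatment group.
    targets: list of (organ, lesion) pairs identified in the informed examination.
    Returns (worklist, key); only the worklist goes to the reading pathologist.
    """
    rng = random.Random(seed)
    slides = [(animal_id, organ, lesion)
              for animal_id in animals
              for organ, lesion in targets]
    rng.shuffle(slides)  # reading order carries no animal or group information
    worklist, key = [], {}
    for index, (animal_id, organ, lesion) in enumerate(slides, start=1):
        code = f"S{index:04d}"  # codes are assigned after shuffling
        worklist.append({"code": code, "organ": organ, "lesion": lesion})
        key[code] = {"animal": animal_id, "group": animals[animal_id]}
    return worklist, key

if __name__ == "__main__":
    animals = {"A01": "control", "A02": "low dose", "A03": "high dose"}
    targets = [("stomach", "forestomach ulcer"),
               ("liver", "centrilobular hepatocellular hypertrophy")]
    worklist, key = build_blind_worklist(animals, targets, seed=2011)
    for item in worklist:
        print(item)
```

Each worklist entry names only the organ and the single lesion to be graded, so the reader sees neither the animal nor the group; the key is needed only when the grades are decoded for analysis.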

It is an easy task to include both the “go-look-see” method and its data and also the specific blind examination method, data and analyses, in the study report. Then there is a comprehensive experimental report of the methods, general findings, unbiased data, formal analyses of the unbiased data, and the results.

This division of the data processes into two parts has an exact parallel in statistical data analysis. In statistics, the process of reaching results is also commonly divided into two phases. Initially, exploratory data analysis handles the data in graphical and summary form so that hypotheses about it can be generated (originating from Tukey 1977). In the second part, confirmatory data analysis and mathematically proven tests are rigorously applied to tightly defined hypotheses (classic inferential statistics).

Unresolved Problems

A problem arises if two unrelated changes occur in one organ (say, cholangitis and centrilobular hepatocellular hypertrophy in the liver). Because the bile tract cannot be examined in isolation from the liver parenchyma, it is impossible to do one lesion’s examination unbiased by the change in the other. So setting an unbiased no effect level for cholangitis may be impossible, even if it exists, when the hypertrophy affects all treated groups. A cursory examination of any material will immediately show which is hypertrophied (i.e., treated) and which is control—so unbiased bile duct data may be impossible to gather. Pharmacological effects commonly frustrate a blind examination of toxic lesions.

Some treatments are automatically self-disclosing. When ink jet printers were being first developed, a wide range of novel pigments were tested for toxicity. Some dyes were colorful, toxicologically bland, and systemically well absorbed. A casual glance at the live animals, their fresh or fixed tissues, or the slides from them would disclose which were the control animals—they were the only ones not some garish hue. Blind examination of toxic lesions is clearly not easy under these circumstances.

Simple practical problems are more amenable to experimental resolution. Can the pathologist who does the initial identification of putative lesions also do the blind examination without introducing unacceptable recall bias? If so, does she or he have to leave an interval so that she or he can no longer identify individual sections as coming from known animals? Or should a pathologist completely naive to the slides do the blind examinations (for instance, the peer-reviewing pathologist)?

A further large but separate problem is that this particular two-part method of avoiding bias is inapplicable both to the gross data generated at necropsy and to the procedures used during the preparation of the slides. Because slides are permanent, they can be reexamined blind. A necropsy cannot be run twice; the findings must be made promptly so the tissues can be fixed well. If the study material is put onto a staining machine in group sequence, then the gradual changes in staining strength as the stain and washes are exhausted (or sudden changes when the reagents are renewed) are a potential cause of bias. However, there are different approaches that can be applied to generate unbiased histological material and data at necropsy (Holland and Holland 2011).
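One generic way to keep reagent exhaustion from lining up with treatment group is to randomize the processing order in blocks that each contain one animal from every group. The sketch below illustrates that general idea only; it is not claimed to be the approach of the cited reference, and the function name and animal identifiers are hypothetical.

```python
import random

def staining_order(animals_by_group, seed=None):
    """Return a processing order built from randomized blocks.

    animals_by_group: dict mapping group name -> list of animal ids.
    Each block holds one animal from every group that still has animals left,
    in random order, so gradual stain drift is spread across all groups.
    """
    rng = random.Random(seed)
    # Randomize which animal of each group falls into which block.
    shuffled = {group: rng.sample(ids, len(ids))
                for group, ids in animals_by_group.items()}
    longest = max((len(ids) for ids in shuffled.values()), default=0)
    order = []
    for i in range(longest):
        block = [ids[i] for ids in shuffled.values() if i < len(ids)]
        rng.shuffle(block)  # randomize group order within the block
        order.extend(block)
    return order

if __name__ == "__main__":
    groups = {
        "control": ["C1", "C2", "C3"],
        "low dose": ["L1", "L2", "L3"],
        "high dose": ["H1", "H2", "H3"],
    }
    print(staining_order(groups, seed=39))
```

With equal group sizes, every consecutive run of slides equal in length to the number of groups contains each group exactly once, so no group is systematically stained with fresher or weaker reagents.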

The role of the peer reviewer is not defined in this scheme. In scientific journals and regulatory reports, the peer-review process is completely hidden from the reader. In the process above, the reviewer could do the blind examination of any putative treatment effects without any possibility of recall bias, because he or she need not have seen the slides before doing the blind examination. This would also have the desirable effect that the study pathologist and peer reviewer could never disagree over any positive result—the study pathologist found the lesion in the first place, and the reviewer provided the unbiased evidence for its relationship to treatment.

Conclusion

Both informed and blind examinations of experimental histological material are valid methods. Each achieves a different end. To find possible treatment effects, an informed examination is appropriate. To generate data free from bias that can be meaningfully tested by firmly framed hypotheses with mathematically rigorous tests, a blind examination is required.

Footnotes

Acknowledgments

This work is drawn from a Fellowship Thesis of the Royal College of Veterinary Surgeons. Tom Holland owes a great debt to his two supervisors, Mr. Peter Lee and Dr. John Glaister. We gratefully thank Prof. John Foster for his encouragement and help with the article.

The author(s) declared no potential conflicts of interest with respect to the authorship and/or publication of this article. The author(s) received no financial support for the research and/or authorship of this article.

References

Crissman, J. W., Goodman, D. G., Hildebrandt, P. K., Maronpot, R. R., Prater, D. A., Riley, J. H., Seaman, W. J., and Thake, D. C. (2004). Best practice guidelines: toxicologic histopathology. Tox Path 32, 126–31.

Dodd, D. C. (1988). Blind slide reading or the uninformed versus the informed pathologist. Comment on Toxicol 2, 81–91.

Holland (1996). An investigation of discriminant methods used in the pathological examination of rodent toxicological studies. MSc thesis, Sheffield Hallam University Library.

Holland (2001). A survey of the discriminant methods used in toxicological histopathology. Tox Path 29, 269–73.

Holland (2005). The comparative power of the discriminant methods used in toxicological pathology. Tox Path 33, 490–4.

Holland (2010). A study of methods used in toxicological pathology. Fellowship thesis of the Royal College of Veterinary Surgeons, RCVS Trust Library.

Holland (2011). Analysis of Unbiased Histopathology Data from Rodent Toxicity Studies (or ‘Are these groups different enough to ascribe it to treatment’). Tox Path 39 (4), in press.

Huff, J. E., Bates, Eustic, S. L., Haseman, J. K., and McConnell, E. E. (1985). Malathion and Malaoxon: histopathological re-examination of the National Cancer Institute carcinogenicity studies. Env Res 37, 154–74.

National Cancer Institute (1979a). Bioassay of Malaoxon for Possible Carcinogenicity, CAS No. 1634-78-2, Technical Report Series, No. 135. Natl. Cancer Institute, Bethesda, MD.

National Cancer Institute (1979b). Bioassay of Malathion for Possible Carcinogenicity, CAS No. 121-75-5, Technical Report Series, No. 192. Natl. Cancer Institute, Bethesda, MD.

Reuber, J. A. (1985). Carcinogenicity and toxicity of Malathion and Malaoxon. Env Res 37, 119–53.

Temple, Fairweather, W. R., Glocklin, V. C., and O’Neill, R. T. (1988). The case for blinded slide reading. Comment on Toxicol 2, 99–109.

Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.

Wansdall, Hansson, S. O., and Ruden (2007). Bias in toxicology. Arch Toxicol 81, 605–17.

Ward, J. M., Hardisty, J. F., Hailey, J. R., and Streett, C. S. (1995). Peer review in toxicologic pathology. Tox Path 23, 226–34.