Regulatory Forum Opinion Piece: Blind Reading of Histopathology Slides in General Toxicology Studies

Abstract

With the intention of reducing bias, a recent European Food Safety Authority draft guidance document included a recommendation for blinded evaluation of histopathology slides in general toxicology studies (EFSA 2011). Although blinding as to treatment status reduces bias in many types of scientific experiment and is sometimes also appropriate in toxicologic pathology (Holland and Holland 2011), it is most unlikely to help achieve the overall goal of improved human safety when used for routine histopathology evaluation of tissues in general toxicology studies. This is the case because (1) blinding is not applicable to the inductive reasoning process used to identify test article effects in the tissues and would dramatically reduce the chances of these being successfully identified; and (2) in any case, the bias that would be reduced by blinding is actually a bias favoring diagnosis of a toxicological hazard and a conservative safety evaluation, which is appropriate in this context. Other unintended consequences of blinding histopathology evaluation include reductions in sensitivity for a variety of additional reasons and increased subjectivity of the pathology data.

Keywords

histopathology; preclinical safety assessment/risk management; regulatory affairs; risk assessment; toxicologic pathology

Key Points

Blinding is not applicable to a scientific investigation for which the potential outcomes are not defined in advance and there is no specific hypothesis to test. Since in toxicology studies the task of the pathologist is to identify toxic end points from an unlimited number of possibilities, removing the pathologist’s ability to identify control and treated animals will hamper identification of all but the most florid effects of treatment.

For many findings, there is no discrete cutoff between affected and unaffected animals, so it is not possible to state whether a toxicology finding is present or absent without concurrent reference to the study controls; in blinded studies, only changes considered to be clearly outside of any normal range could be recorded, reducing the sensitivity of the study.

The subjectivity of pathology data would be increased, since without knowledge of the identity of the study controls, diagnosis would depend on the personal opinion of the pathologist as to what is “normal” and what might potentially be a treatment-related change. Comparison with concurrent controls as a standard basis for initial diagnosis would be eliminated.

The type of bias that is corrected by blinding is the bias of an investigator toward demonstrating an expected outcome (in this case, a toxic effect of treatment). It is not clear how removing bias in favor of showing a safety hazard would benefit public safety.

Blinding is administratively burdensome and would increase the cost, time, and likelihood of mistakes when coding and decoding animal identities, potentially compromising the data and inflating the cost of drug development. It would also delay the detection of any adverse test article–related effects in longer-term studies such as carcinogenicity studies, in which histopathology is evaluated on an ongoing basis concurrently with human exposure.

Introduction

The recent EFSA draft guidance (2011) contains the following recommendation for the conduct of toxicology studies:

Staff should, where possible, be “blinded” to the experimental treatment. . . . This is particularly important if there is any subjective element to assessing experimental outcomes. For example, pathologists should be blind to the treatment group when assessing histological slides.

Similar suggestions recommending blind reading of histopathology (where the pathologist is unaware of the treatment status of the animals) have been made in various contexts, both recently and in the past. These suggestions appear to be based on theoretical considerations, since no evidence has been offered to suggest that the histopathology data from unblinded toxicology studies are inaccurate or otherwise inadequate for safety assessment. Since it has been some time since there has been any explanation in print of the reasons why the theoretical rationale for blinding investigators to treatment status is not applicable to histopathology in general toxicology studies, the following summary has been compiled. It reflects the personal opinions of the authors; we welcome counter-opinions as submissions to this Regulatory Forum section of the journal.

Blinding Is Not Applicable When There Is No Specific Hypothesis to Test

In a routine toxicology study, the pathologist is asked to generate hypotheses about the potential effects of the test article on the tissues examined from an unlimited number of possibilities. There may be many, one, or no effects at all; the potential histopathology end points are not identified in advance. This lack of predefined end points is inherently different from, say, a clinical efficacy trial, in which one or a few hypotheses have already been generated (e.g., this drug reduces blood pressure), and the end points to be measured have been identified. Because there is a specific question being asked, the latter example could be investigated using blinded data evaluation. In contrast, blinded data evaluation could not be used to generate the original hypothesis from among any number of possible effects of that particular molecule on humans or animals.

These examples illustrate the difference between deductive reasoning, in which a single hypothesis (e.g., a predetermined histopathology change being related to treatment) is tested, and inductive reasoning, or “Bayesian inference,” in which the evidence is gathered in order to generate one or more hypotheses (in this case, the tissues are examined to identify any consistent differences between treated and control animals that could represent effects of treatment). With inductive reasoning, the more information that the scientist has to formulate his hypothesis, the more accurate it is likely to be. Clearly, blinding should not be used in this context, because the key piece of information that the pathologist needs to form a hypothesis about test article effects is the treatment status of the animals. Once identified, of course, any one of the putative test article–related tissue changes could potentially be measured objectively and/or the data evaluated in a blinded fashion (Holland and Holland 2011). In summary, blinding is appropriate only where there is already a specific hypothesis to test. When the pathologist evaluates the tissues in a toxicology study, there is no specific hypothesis, so blinding serves no purpose and actually hampers the pathologist in his task, which is to generate hypotheses about potential toxic effects of the test article. Note that depending on the subtlety of the putative finding, the pathologist will frequently re-read slides in a blinded fashion in order to test any particular hypothesis that he formulates while evaluating the tissues. In this way, the pathologist both generates and tests hypotheses to produce conclusions as to toxic effects of the test article on the tissues.

Hypothesis generation using blinded data is by definition almost impossible unless the effects are extremely obvious: consider a situation in which hypotheses about the potential efficacy of a molecule for treating any human disease whatsoever had to be generated using blinded information.

Blinding Results in Reduced Sensitivity for Any Findings Where There Is a Continuum in Morphology between Normal and Test Article–Related Effects

For most tissue features, there is no discrete distinction between normal and abnormal, but instead a continuous spectrum of possible morphologies between a normal animal and one that has a test article–related change. Some examples include changes in cell size, cell density, epithelial height, proportion of apoptotic cells, and presence of inflammatory cells. In addition, with new drug mechanisms constantly appearing, novel tissue changes that have not been encountered in either toxicology studies or natural disease states can occur. If the pathologist is unaware of the treatment status of the animals, there is a tendency for only those changes familiar to the pathologist and clearly outside the “normal range” to be considered for diagnosis, particularly if those potential treatment effects are subtle. Sensitivity would be drastically reduced by blinded evaluation because many toxicological findings, if not the majority, include at least some cases in which the change may be within the range encountered in untreated animals, and these findings would be lost in blinded evaluation. The likelihood is that blinded histopathological evaluation would ultimately result in identification of only those lesions severe enough to cause a disease state (as in diagnostic pathology). Subtle tissue changes that might precede a loss of function would potentially be lost.

This loss of sensitivity would also hamper identification of a dose response. Without knowledge of the study control range for a putative test article–related tissue change, the range of severity grades used will necessarily be narrower and not adapted to the range presented to the pathologist in that particular study. In addition, the potential for “diagnostic drift” is increased: a potential treatment-related change is identified, but the pathologist cannot reference the available control population to ensure that he or she is not becoming more or less sensitive as he or she becomes familiar with the change.
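The sensitivity argument above can be illustrated with a small numerical sketch. It is not taken from any study: the measurements, the group size, and the fixed "prior" upper limit of normal are all hypothetical values invented for illustration, and the function names are our own. The sketch contrasts flagging individual animals against a fixed threshold (as a blinded reader would have to) with comparing treated animals against their concurrent controls.

```python
from statistics import mean

# Hypothetical epithelial-height measurements (arbitrary units), invented
# purely for illustration; they do not come from any real study.
CONTROLS = [9.0, 9.5, 10.0, 10.0, 10.5, 11.0, 9.8, 10.2, 10.4, 9.6]
TREATED = [10.8, 11.2, 11.5, 11.0, 11.9, 12.3, 11.4, 10.9, 11.7, 11.3]

# Hypothetical upper limit of "normal" based on a reader's prior experience
# rather than on this study's controls (an assumption of this sketch).
PRIOR_UPPER_LIMIT = 12.0


def blinded_flags(values, upper_limit):
    """Count animals whose value is clearly above a fixed prior limit."""
    return sum(v > upper_limit for v in values)


def exceed_concurrent_controls(treated, controls):
    """Count treated animals above the highest concurrent control value."""
    highest_control = max(controls)
    return sum(v > highest_control for v in treated)


def group_shift(treated, controls):
    """Difference between the treated and control group means."""
    return mean(treated) - mean(controls)


if __name__ == "__main__":
    print("Flagged against prior limit:", blinded_flags(TREATED, PRIOR_UPPER_LIMIT))
    print("Above concurrent controls:", exceed_concurrent_controls(TREATED, CONTROLS))
    print("Shift in group mean:", round(group_shift(TREATED, CONTROLS), 2))
```

With these invented numbers, only one of ten treated animals lies outside the fixed prior limit, whereas seven of ten lie above every concurrent control and the group mean is shifted by about 1.4 units. Most of the individual treated values would pass as "within normal limits" if read in isolation, which is the situation described in the text.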

Blinding Increases Subjectivity, Since Comparison with Concurrent Controls as a Standard Basis for Initial Diagnosis Would Be Eliminated

If the pathologist is not aware of which animals have been treated and which have not, the best he or she can do is compare the tissue in front of him or her with a hypothetical range of “imaginary” controls based on the individual’s prior experience, which will obviously vary enormously among different pathologists. Thus, instead of the tissues of treated animals being compared with the study controls, the threshold for diagnosis is set by the individual operator, which leads to much greater subjectivity in diagnosis (in an area that is already more subjective than most). The broad objective of the study, which is to compare treated animals with concurrent controls that are identically managed, is not achieved when the pathologist is blinded as to the identity of the controls. Instead, all animals are compared with a standard known only to the pathologist. This issue is particularly significant for certain tissues such as lymphoid organs. The thymus is a good example: its appearance is affected not only by the exact age of the animals but also by stress caused by factors independent of the test article, such as elaborate or frequent dosing procedures or a particular vehicle. Lack of an identified, study-specific control population leaves the pathologist with no benchmark for choosing a threshold for diagnosing changes in this organ, and hence for determining whether the test article potentially has adverse effects on the immune system.

Is Removing Bias in Favor of Showing a Safety Hazard Really Desirable?

Blinding is generally done to prevent bias toward fulfilling the expectations of the operator based on knowledge of the treatment (Kaptchuk 2003). The pathologist’s expectation would be that treated animals in a toxicology study would show toxic changes, so if blinding reduces bias, by design it reduces the chances of identifying a safety hazard. This is something that should be carefully considered by regulators, because this “bias” potentially compensates for low sensitivity of toxicology studies in which animal numbers are limited. Of course, one benefit of blinding is that any bias toward overdiagnosing random biological variation as a test article–related finding would be reduced and might prevent some drugs from being dropped or having their development slowed by overconservative histopathology evaluation. However, the risk can easily be reduced by informal (or formal) blinding once a potential treatment-related finding is identified (Holland and Holland 2011)—something practiced by most if not all toxicologic pathologists in certain circumstances. Regardless, in most cases, toxicology programs consist of several successive studies of increasing duration, and any overdiagnosis in a particular study is corrected in subsequent studies in that species (in which the finding is not repeated).

Other Practical Issues

Practical issues with blinding of histopathology slides that need to be mentioned only briefly include the administrative burden and increased likelihood of errors inherent in coding and decoding animal identities in studies that involve many thousands of tissues from large numbers of animals, and the fact that very obvious toxic and/or pharmacological effects frequently alert the pathologist to the treatment status of the animals and thus undermine the original purpose of blinding as a means of reducing bias. Another practical issue is that without knowledge of the identity of the study control population, the pathologist is not continuously refreshed with each study as to the normal range, and so his reference point for designating tissues as “normal” is not constantly updated as it is currently with unblinded slide evaluation. Furthermore, in longer-term studies such as six-month, nine-month, and carcinogenicity studies, in which histopathology evaluation takes place over a period during which humans are concurrently exposed to the test article, any test article–related findings of concern would become apparent much later than in an unblinded study and the public would potentially be put at unnecessary risk.
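To make the coding and decoding step concrete, the following is a minimal sketch of the bookkeeping that a blinded read would require. It is an illustration only, not a description of any laboratory's actual procedure: the animal identifiers, the slide-code format, and the function names are all hypothetical.

```python
import random


def make_blinding_key(animal_ids, seed=None):
    """Assign each animal a shuffled slide code; return {animal_id: code}."""
    rng = random.Random(seed)
    # Hypothetical code format: "S0001", "S0002", ...
    codes = [f"S{n:04d}" for n in range(1, len(animal_ids) + 1)]
    rng.shuffle(codes)
    return dict(zip(animal_ids, codes))


def decode(key, code):
    """Map a slide code back to the animal it was assigned to."""
    reverse = {c: animal for animal, c in key.items()}
    return reverse[code]


if __name__ == "__main__":
    # Hypothetical animal identifiers: group number, then animal number.
    animals = ["1-001", "1-002", "2-001", "2-002"]
    key = make_blinding_key(animals, seed=1)
    for animal, code in key.items():
        # Every finding recorded against a code must be decoded correctly.
        assert decode(key, code) == animal
    print(key)
```

Even in this toy form, every finding recorded against a code depends on the key being created, stored, and applied without error. In a study with many thousands of tissues, each of those steps is an additional opportunity for a mistake.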

Footnotes

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

EFSA Scientific Committee. (2011). Draft for public consultation: Scientific Opinion. EFSA guidance on repeated-dose 90-day oral toxicity study on whole food/feed in rodents [45 pp.]. Available online: www.efsa.europa.eu

Holland and Holland. (2011). Analysis of unbiased histopathology data from rodent toxicology studies (or, are these groups different enough to ascribe it to treatment?). Toxicol Pathol 39, 569–75.

Kaptchuk, T. J. (2003). Effect of interpretive bias on research evidence. BMJ 326, 1453–55.