Abstract
Both the Society of Toxicologic Pathology (STP) and the U.S. Food and Drug Administration (FDA) have released documents discussing histopathology methods in biomarker qualification studies. These documents appear to disagree on two critical and controversial aspects of methodology: blinding of pathologists and binning of data (Burkhardt et al. 2011; U.S. FDA 2011). Upon closer examination, however, both documents propose that blinded evaluation of biomarker studies is appropriate under similar strict criteria. They do differ in their recommendations on the binning of data (i.e., individual binning of all changes vs. common binning of changes observed in control animals), seemingly based on different perceptions of study objectives and the role of the pathologist. This article offers a personal opinion on blinded evaluations and data binning in the context of biomarker qualification studies.
Keywords
Toxicologic Pathology recently published an article (Rouse et al. 2014; doi:10.1177/0192623314562072) based on research from the Food and Drug Administration’s (FDA) Division of Applied Regulatory Science laboratories at the Federal Research Center White Oak Campus in Silver Spring, Maryland. This research was intended to examine histopathology methods relative to biomarker assessment. We concluded that (1) interpathologist variability could be large enough to significantly impact biomarker assessment but that alignment measures, such as using well-understood and defined lesions, standard lexicons, and peer review, could mitigate this variability; (2) intrapathologist variability is relatively small and nonsignificant; and (3) generally, study variability and outcome were not significantly impacted by whether evaluations were conducted blinded or open. Prior to designing these biomarker experiments, opinions were also sought from the academic, regulatory, and pharmaceutical pathology communities. These opinions were strongly held and sometimes opposing. Some of those consulted felt this avenue of investigation could strengthen pathology reviews, while others believed the proposed work should not go forward because of the potential consequences for pathology practiced in a regulatory context. This first small effort does not strengthen or change current pathology practices; it simply provides preliminary quantitative information about the impact of pathologists and pathology methods on biomarker assessment. Although the FDA and the Society of Toxicologic Pathology (STP) agree on the need for improved consistency in the pathology methods applied in biomarker qualification, the two groups differ in some of the essential details for achieving consistency in this context.
The difference in approaches is reflected primarily in recommendations on the level of data available to the pathologist during evaluations (blinding status) and the handling of changes observed in control animals (data filtering). I would posit that these approaches are not as contradictory as they may appear and the differences that do exist arise from differing perspectives as to the “purpose” of the study, the role of the pathologist, and subsequently what is “fit for purpose” as a methodology. Just as “context of use” is a critical delimiter in FDA biomarker qualification (Woodcock et al. 2011), the intended use for pathology data is essential to defining the optimum, or fit for purpose, methodology.
In an attempt to avoid confusion in terminology, the following definitions were used in the current article. Blinded evaluation was used to refer to evaluations done without knowledge of exposure group (including control) or of any associated biomarker data. Evaluation with knowledge of species, strain, and gender without exposure and biomarker data was still considered blinded. Individual binning designated an individual counting category for each morphological change from normal histology and then counting incidence of each change in all animals regardless of exposure group. Common binning referred to the practice of placing morphological changes in control animals (background changes) in a single category typically designated zero or no change. This practice essentially sets the threshold for recording changes beyond those described in controls and for which other individual bins or categories are created.
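The two binning definitions above can be illustrated with a minimal Python sketch. It is an illustration only, not a method taken from either the STP or the FDA document: the animal identifiers, group labels, and lesion names are hypothetical, severity grades are ignored, and it assumes that under common binning any type of change seen in at least one control animal is treated as background in every animal.

```python
from collections import Counter

# Hypothetical findings: (animal_id, exposure_group, morphological_change).
FINDINGS = [
    ("A1", "control", "tubular basophilia"),
    ("A2", "control", "mononuclear infiltrate"),
    ("A3", "treated", "tubular basophilia"),
    ("A3", "treated", "tubular necrosis"),
    ("A4", "treated", "tubular necrosis"),
]


def individual_binning(findings):
    """Give every change its own bin and count animals per bin,
    regardless of exposure group."""
    animal_bins = {(animal, change) for animal, _group, change in findings}
    return Counter(change for _animal, change in animal_bins)


def common_binning(findings, control_group="control", background_bin="no change"):
    """Pool every type of change seen in controls into one background bin
    (assumption: such changes count as background in all animals)."""
    background = {
        change for _animal, group, change in findings if group == control_group
    }
    animal_bins = {
        (animal, background_bin if change in background else change)
        for animal, _group, change in findings
    }
    return Counter(bin_name for _animal, bin_name in animal_bins)


if __name__ == "__main__":
    print("individual:", dict(individual_binning(FINDINGS)))
    print("common:    ", dict(common_binning(FINDINGS)))
```

In this toy data set, individual binning retains a separate count for each change, whereas common binning leaves only the change that exceeds the control background as its own category.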
A relatively unsophisticated view is that safety study pathology is intended to identify the impact of an agent on the existing status of an organism, whereas biomarker qualification pathology provides a basic link between tissue morphology and a biomarker. In each case, the objective of the work should inform the methods used. Safety study pathology has ample history and an able community of professionals who have carefully considered how to best identify toxic injury especially when unknown agents or agents with unknown effects are involved. Methods have been developed and discussed and best methods published (Crissman et al. 2004). In safety studies, pathologists typically make their initial observations with knowledge of exposure or dosing groups and sometimes with additional biomarker and/or other meta-data as well. This knowledge is used to enhance the sensitivity of the evaluation by distinguishing changes due to agent exposure from changes for other reasons (homeostatic, spontaneous, concurrent disease, etc.). Because the objective is to identify changes caused by an agent above and beyond that occurring for other reasons, filtering the data so that all changes observed in controls are combined into a single counting bin sets the threshold for identification of toxicity. In the context of toxicity, these methods are not only appropriate but perhaps optimal.
In contrast, studies designed specifically to assess biomarker performance on the path to formal qualification as a drug development tool are fewer, and the optimal methodology to accomplish this is only now being discussed and refined. However, the ultimate objective of a biomarker qualification study is not to accurately identify agent related changes but to link observed deviation from normal histology (including any spontaneous changes) to quantitative measures of a biomarker. Thus, in order to distinguish normal histology from altered histology, the pathologist requires “normal” animals to review. However, the identification of a deviation from normal morphology is the sole interpretation required (although rarely an obvious or simple interpretation). This approach relies upon the individual binning and complete accounting of all observed changes that are then followed by statistical assessment of the complete data set with the ultimate goal of correlating these observations (preferably, quantitative observations) to quantitative measures of a biomarker. This statistical analysis will be the basis of interpretation for the relevance (or irrelevance) of observed changes to biomarker response.
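As one possible illustration of the statistical step described above, the following Python sketch computes a Spearman rank correlation between histopathology severity grades and biomarker concentrations. The choice of Spearman correlation is an assumption made for this example; neither the FDA guidance nor the STP document is quoted here as prescribing it, and the grades and biomarker values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-animal data: severity grade (0 = normal) and biomarker level.
    severity_grade = [0, 0, 1, 2, 3]
    biomarker_level = [1.0, 1.2, 2.5, 4.0, 9.1]
    print(round(spearman(severity_grade, biomarker_level), 3))
```

A rank-based measure is used in this sketch because severity grades are ordinal; a real qualification study would select its statistical methods to suit its own design and data.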
STP’s best practices for safety biomarker qualification studies (Burkhardt et al. 2011) are specifically for “safety” biomarkers, a critical fact in understanding the STP perspective. This document supports use of safety study best practices in biomarker qualification studies, including non-blinded evaluations and data filtering (common binning of changes in control animals as a threshold for interpretation). In contrast, FDA’s draft guidance on histopathology methods in biomarker qualification (U.S. FDA 2011) is not restricted to “safety” biomarkers but states a preference for blinded evaluations and complete reporting of all morphological changes in separate bins regardless of suspected source of change (toxic, spontaneous, disease, etc.). Interestingly, the STP document does indicate that blinded evaluations are appropriate if criteria have been well defined for specific end points and/or there is an attempt to provide quantitative pathology information as continuous data, and it then goes on to list the conditions that are required to support blinded evaluations. Close examination reveals that the FDA guidance criteria for biomarker qualification studies mirror STP’s required conditions for appropriate use of blinded evaluation. Complete alignment of these criteria would resolve some conflict over methodology. By recommending that all biomarker qualification studies reasonably meet STP’s listed criteria (essentially, the current recommendation), FDA could assure STP agreement that blinded evaluations are appropriate and acceptable for these studies. As it stands, the difference between the organizations in regard to blinded evaluations appears to be less one of substance and more one of enthusiasm. STP believes blinded evaluation is appropriate within specific confines, whereas the FDA prefers blinded evaluations within those same confines. The documents of the STP and FDA as well as the findings of Rouse et al. (2014) collectively suggest that knowledge bias should not be a major impediment to consensus.
Data filtering or binning is a more salient issue representing differences in perception of the role of toxic agents and of pathologists in biomarker studies. STP best practices recommendations assume the following: (1) the qualification is for a safety biomarker; therefore, etiology of injury in biomarker qualification studies is relevant, thus, injury beyond control has meaning; and (2) an interpretive role for the pathologist similar to their role in safety studies, although justification for this role in biomarker qualification studies is not provided. The FDA guidance assumes the following: (1) the qualification is for a biomarker of change regardless of etiology (toxic, spontaneous, disease, etc.), and therefore, change beyond control is not relevant as long as there is detailed accounting of all changes; (2) the pathologist’s sole interpretation is whether there is a deviation from normal histology and quantifying the degree of that deviation using standard terminology. The STP document addresses only a subset of the potential biomarkers covered by the FDA guidance. Nevertheless, the more general FDA guidance should be consistent and applicable to the more specific case of safety biomarker qualification. Perhaps once again, the documents do not differ as much as may be perceived.
One factor impacted by the different approaches is the definition of the normal range. In the STP approach, the pathologist determines the normal range of morphology, and the biomarker values associated with that normal morphology then define the biomarker’s normal range. Conversely, in the FDA approach, individually binning all observed changes makes the pathology assessment yield more quantitative, continuous data. Data clustering and/or other statistical methods will determine the relevance of different changes to biomarker responses, including the contribution (or lack of contribution) of spontaneous changes to biomarker quantification. These statistical methods will define the normal range. Both are valid approaches requiring the unique skills of the pathologist to identify the presence and severity of morphological change. However, only the STP approach relies on the pathologist’s interpretation of the meaning of these changes. Our experience with pathologists, blinded evaluations, and more complex pathology data sets in the recently reported study suggests that both approaches are likely to yield very similar results. As novel biomarkers increase in sensitivity and complexity and our knowledge of the interplay between normal and nonnormal expands, it is uncertain whether this equivalence will persist. However, the desire for increasingly quantitative data sets from pathology evaluations will grow as the ability to produce them is demonstrated.
Just as the study objective defines the approach in other aspects (molecular biology methods, bioanalytical methods, experimental design, etc.) of experimental research, so it should be with histopathology methods. Our preliminary data are consistent with this view and it is further supported by the extreme similarity of the two supposedly contradictory views of STP and FDA regarding biomarker qualification studies. Ultimately, histopathology methods should be “fit for purpose” and not uniform methods universally applied across studies with varying objectives. STP’s safety study best practices currently represent the best “fit for purpose” histopathology methodology available for safety studies. However, the FDA draft guidance for the use of histopathology in biomarker qualification studies describes the best “fit for purpose” histopathology methodology relative to biomarker evaluation. It is important to recognize that, within the concept of a well-defined biomarker qualification study, STP and FDA recommendations on blinded evaluations align fairly well, albeit reluctantly. Further, it is important to reemphasize that safety study pathology and biomarker qualification pathology have different objectives and represent two different “purposes” where the methods employed should be “fit for purpose.” Recognition of the different objectives for these pathology evaluations defines justifiable differences in filtering and binning methodology, each fit for its purpose. Further discussion is required around approaches to data filtering and binning for biomarker qualification purposes. For the present, pathologists and their clients need to clearly define objectives for pathology that are “fit for purpose” within biomarker studies and thereby describe the essential role of the pathologist in these studies.
Footnotes
Acknowledgments
I wanted to publicly express my gratitude to the pathologists who volunteered their efforts to the project reported in
. To their credit, they generously applied their knowledge and opinions in spite of having no input to the experimental design including the histopathology methods applied. Although not agreeing with all of the conclusions and statements in the initial draft article, they supplied very valuable input to the article keeping it much more focused, factually consistent, and relevant to pathologists than it might otherwise have been. The reviewers from Toxicologic Pathology also did a phenomenal job in improving the reporting of this work through their very constructive and insightful comments and I want to express my appreciation to them as well.
Author Contribution
RR contributed to conception or design; data acquisition, analysis, or interpretation; drafting the article; and critically revising the article. RR gave final approval and agreed to be accountable for all aspects of work in ensuring that questions relating to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) received no financial support for the research, authorship, and/or publication of this article.
*
This is an opinion article submitted to the Regulatory Forum and does not constitute an official position of the Society of Toxicologic Pathology or the journal Toxicologic Pathology. The Regulatory Forum is designed to stimulate broad discussion of topics relevant to regulatory issues in Toxicologic pathology. Readers of Toxicologic Pathology are encouraged to send their thoughts on these articles or ideas for new topics to
