Abstract

Despite the number of years that toxicologic pathology has existed as a functional scientific discipline, there has been surprisingly insufficient guidance provided for the assessment of histopathology data in situations where the results are quantitatively equivocal. That is why I read with great interest the recent invited review article in Toxicologic Pathology titled “Analysis of Unbiased Histopathology Data from Rodent Toxicity Studies (or, Are These Groups Different Enough to Ascribe to Treatment?)” by Holland and Holland (2011). This well-conceived and well-written treatise does an admirable job of describing and weighing various methodologies for determining whether differences between control and compound-treated groups are related to compound administration. Although I concur with many of the authors’ points, I feel it is necessary to offer a different perspective, by reframing the discussion in a context that I believe is more consistent with the contemporary practice of toxicologic pathology. As a nonstatistician, I proceed on the assumption that the authors’ statistical arguments are accurate and do not require further discussion.
Holland and Holland present advantages and limitations of several methods that can be used to assess the relative severity of histopathologic changes in toxicologic bioassays. These include the “Ordering Method,” in which lesions are ranked by severity; the “Score Method,” in which lesions are graded into ordinal classes (e.g., minimal, mild, moderate, marked); and three other approaches that the authors have termed the “Affected,” “Pair-contrast,” and “Outside-control” methods. In the discussion section of the article, the authors contend that the Ordering Method “has overwhelming advantages compared to all other methods,” including the Score Method that has traditionally been used in toxicologic pathology for lesion severity assessment. The implication that the Ordering (ranking) Method is somehow superior to the Score Method for this type of assessment is not novel, but it is a bit misguided in that it suggests that these are competing methodologies, when in actual practice they are complementary, as I hope to demonstrate.
For most of the toxicologic pathologists I know, the histopathologic evaluation of a toxicologic bioassay is typically conducted as a two-stage process, the components of which could be termed the “identification stage” and the “confirmation stage.” The identification stage concerns the process by which potential treatment-related findings are initially discovered, whereas the confirmation stage is used to ensure the validity of treatment-related associations. In accordance with best practice guidelines (Crissman et al. 2004), the identification stage is traditionally performed as an unblinded examination, to minimize the chance that subtle treatment-related findings will be missed. Holland and Holland briefly acknowledge the existence of the identification stage in a list of assumptions that they provide as caveats for their discussions. Specifically, assumption #3 from this list states that “the feature of interest is known before the unbiased examination takes place, so there is no need to find the difference between groups during the examination. This requires a prior examination of the material to identify what is putatively related to treatment and what is not.” It should be emphasized, however, that the identification stage is not a trivial phase of the evaluation, because it represents the juncture at which treatment-related findings are either detected or missed entirely. The identification stage is the phase that comes to mind when a pathologist talks about “reading” a study. Procedures used for severity grading during the identification stage are often prescribed in a laboratory’s standard operating procedures (SOPs) and are characteristically described in the methods sections of pathology reports.
Conversely, the confirmation stage is typically not addressed in SOPs or in pathology reports; instead, it is usually conducted as an informal “check” by the pathologist to guard against the possibility of reporting false positive (or false negative) results.
Although Holland and Holland do not state this explicitly, in reality, their article addresses only the confirmation stage of toxicologic histopathology assessments. It should be recognized, however, that a confirmation stage may not be required for every evaluation. Examples include studies in which there are no findings that are potentially treatment-related (negative studies), or studies in which the evidence of treatment-relatedness is overwhelmingly obvious (e.g., when a particular finding occurs at a high frequency in treated animals exclusively). Conversely, confirmation steps are necessary, and should be standard procedure, when the treatment-relatedness of a finding is uncertain, when criteria used for severity grading are especially nuanced, or, as the authors indicate, when it is desirable to determine a no effect level (NOEL) at the lower, less conclusive end of the findings spectrum. Holland and Holland are concerned that the confirmation stage of the evaluation be conducted in the most objective and unbiased manner that is practicable, and that particular concern seems entirely reasonable and appropriate.
The authors accurately state that the Score Method is by far the most popular approach for assessing the severity of morphologic changes in histopathologic evaluations. What the authors fail to make clear is that the Score Method is the most popular approach used during the identification stage of slide assessment, whereas other methods (such as the Ordering Method) are commonly used during the confirmation stage. The popularity of the Score Method for lesion severity grading during the identification stage is largely deserved, because it is a highly efficient procedure, and the results that are generated by this method can be readily understood by pathologists and nonpathologists alike. Additional advantages of the Score Method that were not mentioned by Holland and Holland include the ability to easily summarize the data in table form, and the ability to communicate impressions of absolute and/or relative lesion severity through the use of terms such as minimal, mild, moderate, marked, and severe. The Score Method is not only entrenched in the psyche of toxicologic pathologists; it is also the method to which many nonpathologists and regulatory agency officials have long become accustomed.
By contrast, the Ordering Method is an excessively inefficient and cumbersome approach for the identification stage of slide examination, because only a single type of morphologic finding can be assessed at a time. Since most tissues can have multiple simultaneous observations (e.g., hepatocellular hypertrophy, hepatocellular vacuolation, hepatocellular necrosis, bile pigment accumulation, periportal fibrosis, etc.), the ranking of numerous, independent findings during the initial slide evaluation is a prohibitively time-consuming process. On the other hand, the Ordering Method appears to be an excellent approach for verifying treatment-relatedness during the confirmation stage, as long as the treatment group sizes are not excessively large. Slide ranking is feasible at this secondary stage of the evaluation because the number of potential treatment-related findings in a study is usually far fewer than the number of initial histopathologic observations. For some types of findings, for example those evident at low magnification, use of the Ordering Method may be further facilitated by comparing photomicrographic digital images of scanned slides as opposed to the actual glass slides.
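To make the ranking idea concrete, the following is a minimal sketch of how a blinded severity ordering of slides could be compared between groups once the slide labels are unmasked. The exact permutation rank-sum test used here is an illustrative assumption, not necessarily the statistical procedure that Holland and Holland recommend, and the function name and slide identifiers are hypothetical. The enumeration is practical only for small groups, which is consistent with the point above that slide ranking suits group sizes that are not excessively large.

```python
from itertools import combinations


def rank_sum_exact_p(ranking, treated_ids):
    """Exact one-sided rank-sum test for a severity ordering (hypothetical helper).

    ranking     -- slide IDs ordered from least to most severe, no ties
    treated_ids -- IDs of the slides that belong to the treated group
    Returns (rank sum of treated slides, one-sided exact p-value).
    """
    # Assign rank 1 to the least severe slide and rank N to the most severe.
    ranks = {slide: pos for pos, slide in enumerate(ranking, start=1)}
    observed = sum(ranks[s] for s in treated_ids)

    n, k = len(ranking), len(treated_ids)
    # Under the null hypothesis, every choice of k ranks out of n is equally
    # likely for the treated group, so enumerate all of them.
    all_sums = [sum(c) for c in combinations(range(1, n + 1), k)]
    as_extreme = sum(1 for s in all_sums if s >= observed)
    return observed, as_extreme / len(all_sums)


# Six slides ranked blind; the three treated animals occupy the top ranks.
ordering = ["C2", "C1", "C3", "T1", "T3", "T2"]
print(rank_sum_exact_p(ordering, ["T1", "T2", "T3"]))  # (15, 0.05)
```

Because the treated rank sum reflects both how many treated slides rank high and how high they rank, a single number captures prevalence and severity together.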
In reality, pathologists often use the Ordering Method, Affected Method, or the Outside-control Method (although they may never have heard of these terms) as part of an informal slide review process performed during the confirmation stage. A typical tactic is for the pathologist to turn the glass slides over, mix them thoroughly, and then either rank the slides as in the Ordering Method, or bin them into categories (such as positive, negative, or +/–) as in the Affected or Outside-control Methods. To minimize evaluator bias, the slide labels are masked (blinded) during this phase of the evaluation so that the treatment group status of individual animals is unknown. The results of this secondary evaluation are not usually subjected to statistical testing, as advocated by Holland and Holland, although they certainly could be if the treatment-relatedness of findings remains uncertain. The choice of which method to use during the confirmation stage (e.g., Ordering Method, Affected Method, or Outside-control Method) is often dictated intuitively by the type of morphologic finding to be tested. For example, findings that tend to be continuously variable, such as hepatocellular hypertrophy, seem to be best handled using the Ordering Method, the elegance and power of which stem from the fact that it combines lesion prevalence (incidence) and severity into a single assessment. Alternatively, findings that are more categorical, such as periportal fibrosis (which is usually not present to a substantial degree in the livers of control animals), may be handled more efficiently by using the Affected or Outside-control Methods, which are primarily based on lesion prevalence.
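The masked, shuffled review described above, followed by binning into affected and unaffected slides, could be sketched as follows. This is only an illustration under stated assumptions: the coding scheme, the function names, and the use of a one-sided Fisher exact test on the unblinded counts are choices made for this example and are not taken from Holland and Holland or from any laboratory SOP.

```python
import random
from math import comb


def blind_and_shuffle(groups, seed=None):
    """Mask slide identity: return neutral codes in reading order plus the key.

    groups -- dict mapping animal ID to treatment group (hypothetical layout)
    """
    rng = random.Random(seed)
    animals = list(groups)
    rng.shuffle(animals)  # equivalent to mixing the face-down slides
    key = {f"S{i:02d}": animal for i, animal in enumerate(animals, start=1)}
    return list(key), key


def tally_affected(calls, key, groups):
    """Unblind the pathologist's calls and count (affected, total) per group."""
    counts = {}
    for code, affected in calls.items():
        group = groups[key[code]]
        hit, total = counts.get(group, (0, 0))
        counts[group] = (hit + int(affected), total + 1)
    return counts


def fisher_one_sided(aff_treated, n_treated, aff_control, n_control):
    """Probability of at least aff_treated affected treated animals under the
    hypergeometric null (one-sided Fisher exact test)."""
    total_aff = aff_treated + aff_control
    n = n_treated + n_control
    upper = min(total_aff, n_treated)
    favorable = sum(
        comb(total_aff, a) * comb(n - total_aff, n_treated - a)
        for a in range(aff_treated, upper + 1)
    )
    return favorable / comb(n, n_treated)
```

In use, the pathologist would record a positive or negative call against each masked code, and only afterward would `tally_affected` reunite the calls with the treatment groups so that the resulting counts could be passed to `fisher_one_sided`. Equivocal (+/–) calls are not handled in this sketch and would need an explicit rule before any test is applied.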
Although distinctions between the procedures advocated by Holland and Holland and the current practice of toxicologic pathology may seem more definitional than substantive, it is not my intent to quibble over semantics. Instead, my primary concern is that less-than-fully-informed persons or agencies will seize upon the Holland and Holland article as evidence that procedures currently used by most toxicologic pathologists for severity scoring are intrinsically inappropriate or inadequate and, accordingly, that a major paradigm shift is warranted. For reasons previously stated, I do not believe that such conclusions are justified. In effect, pathologists who use the Score Method for the identification stage of slide examination and the Ordering Method for the confirmation stage are already following the course recommended by Holland and Holland. In order to make the slide evaluation process more transparent, however, it might be helpful if pathologists formally acknowledged procedures that they use to confirm treatment-related findings by documenting such steps in the methods sections of their pathology reports.
Holland and Holland have provided invaluable information on the relative merits of various methods that can be used for substantiating, or refuting, treatment-associated effects that are equivocal. The only issue that I have with their article concerns the portrayal of the Score Method and the Ordering Method as competing, rather than complementary, approaches, and the implicit suggestion that current procedures used for lesion severity scoring are necessarily deficient. The most productive application of the information imparted by Holland and Holland may be to remind new and established pathologists that unbiased, statistically verifiable methods are available for confirming treatment-related findings, and to encourage the standard use of such techniques in the interpretation of histopathology data.
