Abstract
This article describes the Society of Toxicologic Pathology’s (STP) five recommended (“best”) practices for appropriate use of informed (non-blinded) versus masked (blinded) microscopic evaluation in animal toxicity studies intended for regulatory review. (1) Informed microscopic evaluation is the default approach for animal toxicity studies. (2) Masked microscopic evaluation has merit for confirming preliminary diagnoses for target organs and/or defining thresholds (“no observed adverse effect level” and similar values) identified during an initial informed evaluation, addressing focused hypotheses, or satisfying guidance or requests from regulatory agencies. (3) If used as the approach for an animal toxicity study to investigate a specific research question, masking of the initial microscopic evaluation should be limited to withholding only information about the group (control or test article-treated) and dose equivalents. (4) The decision regarding whether or not to perform a masked microscopic evaluation is best made by a toxicologic pathologist with relevant experience. (5) Pathology peer review, performed to verify the microscopic diagnoses and interpretations by the study pathologist, should use an informed evaluation approach. The STP maintains that implementing these five best practices has consistently delivered, and will continue to deliver, robust microscopic data with high sensitivity for animal toxicity studies intended for regulatory review. Consequently, when conducting animal toxicity studies, the advantages of informed microscopic evaluation for maximizing sensitivity outweigh the perceived advantages of minimizing bias through masked microscopic examination.
This recommended (“best”) practices paper is a product of a Society of Toxicologic Pathology (STP) Working Group commissioned by the Scientific and Regulatory Policy Committee (SRPC) of the STP. The recommendations have been reviewed and approved by the SRPC and Executive Committee of the STP as well as the entire STP membership. The recommendations also have been reviewed and endorsed by the American College of Veterinary Pathologists (ACVP), British Society of Toxicological Pathology (BSTP), European Society of Toxicologic Pathology (ESTP), Japanese Society of Toxicologic Pathology (JSTP), Société Française de Pathologie Toxicologique (SFPT), and Society of Toxicologic Pathology–India (STP–I). The opinions expressed in this paper solely represent those of the authors and should not be construed as official views or policies of the authors’ institutions, including the U.S. Food and Drug Administration (FDA).
Introduction
Many types of pathology data (e.g., organ weights, hematology, and clinical chemistry values) can be quantified using calibrated instruments and compared against control samples with known values. In contrast, microscopic¹ (or histopathologic²) diagnoses and their associated severity grades, which pathologists assign by examining tissue sections mounted on glass slides or scanned to produce digital images (e.g., photomicrographs or whole slide images [WSIs]³), constitute qualitative or semi-quantitative data. These data reflect professional judgments about microscopically observed normal versus abnormal tissue features that are not subject to exact quantification. Such expert judgments are founded on a comprehensive core knowledge base shared by all pathologists (i.e., a multiyear biomedical and/or comparative biology education), which is then shaped by each individual’s subsequent professional experience gained through mentored training (e.g., during a formal residency and/or while engaged in the practice of pathology) and regular continuing education.2,4–7
In microscopic evaluations, the pathologist performs an initial (preliminary) microscopic assessment followed by further review as warranted to establish each final qualitative diagnosis and any associated semi-quantitative severity grade. This process of iterative (i.e., stepwise) diagnostic refinement is an essential element of microscopic evaluation for pathologists in all professional settings, including medical8–10 or veterinary medical11,12 diagnostic pathology practice, experimental pathology research,13 and safety assessment (toxicologic pathology).12,14
Informed professional judgment by a pathologist is indispensable in generating microscopic diagnoses. Such expert judgments inherently carry the potential for bias, either conscious or unconscious, which some perceive as a confounding factor when attempting to generate high-quality histopathology data. Bias in different scientific settings may take various forms,15–18 some of which are unavoidable in the course of a microscopic evaluation.15,19–21 Consequently, the scientific community, including pathologists, has weighed two questions with respect to microscopic evaluations: (1) “How is bias minimized?” and (2) “How is sensitivity maximized?” The decision regarding which of these two questions is more relevant determines the subsequent steps needed to control bias while ensuring the greatest possible quality and sensitivity of the final microscopic data set.
The optimal approach to ensure integrity of microscopic data generated during animal toxicity studies intended for regulatory review has been debated for decades.22–28 Scientists from nonpathology disciplines often assert that completely masked (“blinded”) microscopic evaluation (i.e., where information is withheld from the pathologist until the microscopic assessment has been completed) is the best means for minimizing all bias.19,22,24,29–34 This perspective is based on the assumption that any bias in generating data, including assigning microscopic diagnoses and severity grades, is undesirable. In contrast, most toxicologic pathologists advocate for an informed (“non-blinded”) initial microscopic evaluation (i.e., where the pathologist has full access to all of an animal’s treatment/exposure and dose information to provide maximal context in generating, refining, and interpreting diagnoses) of all tissues as the optimal practice for generating high-quality microscopic data during animal toxicity studies.23,27,28,35–43 This view is founded on the assumption that, when properly employed, contextual knowledge improves diagnostic sensitivity when assessing product safety in animal studies. Historically, these two positions have been addressed through periodic papers advocating for only one or the other viewpoint. Hence, a clear need existed to develop recommended practices that sensitively detect test article–related findings while minimizing potential bias in the assessment of animal toxicity studies.
For this reason, the Society of Toxicologic Pathology (STP) directed its Scientific and Regulatory Policy Committee (SRPC) to assemble an international Working Group (composed of members from Asia, Europe, and North America with 10-35 years of toxicologic pathology experience gained in multiple practice settings: academia; government; consulting; contract research organizations; industry [for agrochemical, biopharmaceutical, cell and gene therapy, and medical device products]; and regulatory agencies) to develop specific recommendations regarding optimal (“best”) practices for choosing, using, and communicating the appropriate approach for microscopic evaluation of animal toxicity studies intended for regulatory review. The Working Group’s charter had four specific objectives. The first objective was to explore the two sides of the debate. The second objective was to formulate “best practice” recommendations regarding when and how to employ informed (non-blinded) versus masked (blinded) microscopic assessments. The third objective was to define whether performance of a masked microscopic evaluation should be documented for animal toxicity studies, and when and how to do so. The final objective was to document current regulatory perspectives of informed versus masked approaches for generating microscopic data as well as possible regulatory concerns that might impact their acceptance in the future.
The recommendations and discussion points below were formulated based on the collective experiences of the Working Group members and a detailed survey of current industry practices.43 Subsequently, extensive input from members of the STP, several other societies of pathology, and many scientists (including nonpathologists) from numerous academic, consulting, contract research, industrial, and regulatory institutions around the world was received during a 30-day public comment period in the fourth quarter of calendar year 2021. These perspectives were considered in establishing the final “best practice” recommendations reported here.
Definitions Relevant to the Informed Versus Masked Evaluation Discussion
Multiple terms are used in the scientific literature relevant to discussing informed versus masked microscopic evaluation. For these best practice recommendations, the Working Group has employed the following definitions for key terms (highlighted in this section) throughout this article.
A
A
A
As introduced above, two types of masked microscopic evaluation may be performed: formal and informal. A
An
Recommended (“Best”) Practices for Choosing Between Informed Versus Masked Microscopic Evaluation for Animal Toxicity Studies
The decision regarding which approach to employ for the initial microscopic evaluation (informed versus masked) is a critical consideration when designing an animal toxicity study. This section presents two recommended practices to guide the choice between an informed and a masked approach, discussing the key factors that favor one option over the other.
Recommendation 1: Informed microscopic evaluation is the default approach for animal toxicity studies.
For decades, informed microscopic evaluation has been the standard practice for animal toxicity studies intended for regulatory review.23,27,28,35–43 Routine animal toxicity studies are screening studies whose primary objective is to characterize the toxicity profile of a test article to support product development; they are not conducted to investigate a focused (e.g., mechanistic) hypothesis. Given this screening objective, informed microscopic evaluation is the most appropriate means for generating data from routine animal toxicity studies, because test article–related effects are detected with greater sensitivity when foreknowledge of the treatment and dose can be used to set diagnostic criteria and thresholds relative to spontaneous (incidental) findings in concurrent control animals.28,39,41,43
Informed microscopic evaluation to identify and characterize toxicity offers several advantages for toxicity screening that are unattainable when using a masked approach. First and foremost, informed microscopic evaluation discriminates test article–related findings (the “signal”) from incidental background changes (“noise”) with high sensitivity and specificity.39–41,43,46 The pathologist’s ability to discriminate signal from noise in screening studies relies on two threshold-setting steps, both dependent on access to tissue sections from relevant (generally concurrent) control animals. The essential first part of the microscopic evaluation is a preanalytical calibration in which the pathologist constructs a mental map of “normal” tissue architecture for each tissue of the test system (where a “test system” is the combination of an animal species with its biologically pertinent traits like stock/strain, sex, age, etc.). This calibration is essential because biological endpoints, including the features of many tissues (e.g., cytoplasmic vacuolation in hepatocytes and renal tubular epithelium, numbers and sizes of germinal centers in lymphoid organs, presence of infiltrating leukocytes), differ across a spectrum of “within normal limits” appearances; failure to perform this preanalytical calibration step impedes or prevents the interpretation of both frequent findings with marginally altered incidences and very rare findings.44,57 As the microscopic evaluation progresses, the pathologist performs side-by-side comparisons between tissue sections from treated and control animals as the second essential part to distinguish subtle test article–related effects. These two threshold-setting steps are most effective when the map of “normal” tissue architecture is built from concurrent control tissues that remain available for real-time comparison both before (i.e., for calibration) and during the microscopic evaluation.
In fact, both preanalytical calibration and side-by-side comparison are essential because the types and severities of incidental findings differ among animal species; among stocks/strains/breeds for a given species; by individual factors such as sex, age, and body weight; and across animal vendors (including among different facilities for multisite companies).57,58 Furthermore, the type and/or severity of incidental background findings can shift over time.44,57,58 Therefore, reference to historical control data in the absence of concurrent control data typically is an insufficient means for performing the calibration and comparison steps during GLP-compliant studies. However, on a case-by-case basis, animal toxicity studies may be planned with no concurrent controls to accomplish additional objectives. For example, non-GLP dose range-finding and exploratory studies conducted with nonrodents may be designed without concurrent controls as one means for reducing animal use, with the knowledge that historical control data will be available for other comparable animal studies (i.e., virtual control groups).59,60
Threshold-setting as described above substantially reduces the chance that incidental background changes (“noise”) will be diagnosed as test article–related findings, and thus improves the sensitivity and specificity of the histopathology data set.40 Excluding this noise also simplifies the data set, which in turn decreases the resources needed to compile, audit, interpret, and clearly communicate the data.39,40 These additional benefits enhance the accuracy of the final histopathology data.
In contrast, masked evaluation of animal toxicity studies is inappropriate as a routine approach because it reduces the sensitivity and specificity of microscopic analysis. This disadvantage is inherent to all masked evaluations, including the “masked to treatment” and “masked to all” approaches defined above, for the following reasons. First and foremost, masked microscopic evaluations do not permit the essential threshold-setting tasks mentioned above (preanalytical calibration and side-by-side comparison of treated and concurrent control animals), thereby producing data sets in which the detection of all but the most exaggerated tissue changes is difficult or impossible.39,40,46 Second, because a pathologist working without these thresholds must record every unique morphologic variation, a masked evaluation leads to overdiagnosis; this greatly increases the time required to perform the microscopic analysis and yields extremely complex data tables that often conceal modest changes and require significantly more effort and time to interpret, audit, and communicate.39,40,46 The difficulty of discerning subtle morphologic distinctions in such complex data tables also may decrease sensitivity, prevent the identification of potential target organs, and/or artificially increase a threshold value (e.g., NOEL, NOAEL, or equivalent).
The “masked to all” approach (i.e., withholding all metadata from the study pathologist before and during the microscopic evaluation) is always inappropriate for initial evaluation of animal toxicity studies. Certain metadata are recognized as essential to effective microscopic evaluation and thus should be available to the pathologist before the microscopic assessment begins, including gross (macroscopic) findings, organ weights, and clinical pathology values.61,62 In fact, key ancillary pathology metadata are specifically noted in regulations and regulatory guidance documents as necessary information for the study pathologist during microscopic evaluation. For example, existing GLP regulations by the US Environmental Protection Agency (EPA) and the US Food and Drug Administration (FDA) state that “[r]ecords of gross findings for a specimen from postmortem observations shall (EPA) / should (FDA) be available to a pathologist when examining that specimen microscopically.”63,64 Availability of these additional pathology data before beginning the microscopic evaluation is crucial as they provide the pathologist with insight that enables the identification of target organs and/or cell populations during the microscopic analysis.
Preliminary diagnoses generated by initial informed microscopic evaluation often are confirmed by an informal masked review of a subset of the tissue sections before the pathologist finalizes them. This verification step typically focuses on selected organs and/or findings and is performed by, and at the discretion of, the study pathologist.41,43 Review of preliminary microscopic data in this manner is not required under GLP regulations63–65 but is performed by study pathologists when deemed necessary, as an additional quality control step in the iterative process of generating final microscopic diagnoses and severity grades.43
Recommendation 2: Masked microscopic evaluation has merit for confirming preliminary diagnoses for target organs and/or defining thresholds (“no observed adverse effect level” and similar values) identified during an initial informed evaluation, addressing focused hypotheses, or satisfying guidance or requests from regulatory agencies.
In biomedical research using animals, masked microscopic evaluation may be a useful tool for certain questions. Some of these questions (see below) are relevant to animal toxicity studies. Therefore, the decision regarding whether to use masked versus informed microscopic evaluation depends on the purpose of the study.
A. Informal (Ad Hoc) Masked Microscopic Evaluation
As mentioned above, the most common use for masked microscopic evaluation in animal toxicity studies is informal post hoc verification of preliminary findings following an initial informed analysis. A recent (2019) survey of 83 institutions representing 589 toxicologic pathologists indicated that 97% of responding toxicologic pathologists worldwide perform this informal post hoc review, as warranted, as a component of the iterative process of pathology data generation.43 This practice is only one tool by which the pathologist ensures diagnostic accuracy. During the initial informed evaluation, the study pathologist examines tissue sections to generate a preliminary list of diagnoses and severity grades. If the study pathologist considers it necessary, the study pathologist then performs a post hoc masked microscopic review of those organs that were identified during the initial informed examination as potential targets for test article activity. This masked follow-up step is performed at the study pathologist’s discretion to accomplish a specific purpose, such as confirming or refining incidences and/or severities of preliminary diagnoses or substantiating the relationship of subtle changes to test article treatment (e.g., to define the NOEL, NOAEL, or equivalent threshold). For this informal post hoc review, the study pathologist performs a masked review of one or more tissues and sorts the slides according to the severity of a finding that is possibly related to test article treatment, applying the severity-grade criteria that were developed during the initial informed evaluation. To accomplish this task, the pathologist chooses not to view the treatment and dose information on the uncoded slide labels. After the informal review is completed, the study pathologist views the labels.
The use of informal post hoc masked microscopic review when deemed necessary by the pathologist is an important component of the iterative diagnostic process that hones the accuracy and sensitivity of pathology raw data.12,13,20,23,25–27,35,37–39,41,43,46,66–69 Because this review is an intrinsic part of the pathologist’s routine diagnostic process rather than a separate, protocol-specified procedure, it is not documented in the study protocol or pathology report.
B. Formal (Designed) Masked Microscopic Evaluation
As stated above, whether a formal masked microscopic evaluation is appropriate for an animal toxicity study depends on the study objective. Masked microscopic evaluation generally is inappropriate for animal toxicity studies conducted for safety assessment because their main objective is to identify and characterize test article–related findings with maximal sensitivity. However, some regulatory guidance recommends a masked microscopic evaluation of any target organ identified during the initial informed examination. For example, in screening for neurotoxicity, both the EPA70 and the Organisation for Economic Co-operation and Development (OECD)71 recommend a stepwise examination in which tissues from control and high-dose animals are first evaluated using an informed approach. If no structural findings are observed in this initial analysis, the tissues from animals of other dose groups need not be assessed. If findings are observed in high-dose animals, the regulatory guidance is that target tissues from all dose groups “should be coded and examined in random order without knowledge of the code” to determine the frequency and severity of diagnoses. The usual industry practice is to perform these coded assessments as formal masked analyses with appropriate documentation in advance (i.e., in the study protocol or a subsequent protocol amendment).
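The coding step of this stepwise approach can be illustrated with a minimal Python sketch. It assumes that the informed control versus high-dose read has already been completed and summarized as a yes/no flag, and it uses a seeded shuffle to produce a coded reading order plus a key that is withheld from the pathologist until decoding. The names `Slide`, `slides_for_coded_review`, and `code_slides`, and the `S001` code format, are hypothetical illustrations; they are not taken from EPA or OECD guidance or from any laboratory software.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Slide:
    # Hypothetical slide record; field names are illustrative only.
    animal_id: str
    group: str   # e.g., "control", "low", "mid", "high"
    tissue: str


def slides_for_coded_review(slides, high_dose_findings, target_tissue):
    """Select slides for the coded read (assumed logic, not guidance text).

    The informed read of control and high-dose tissues (not modeled here)
    is summarized by high_dose_findings. If it found no structural change,
    lower-dose groups need not be examined; otherwise the target tissue
    from every dose group is selected for the coded (masked) read.
    """
    if not high_dose_findings:
        return []
    return [s for s in slides if s.tissue == target_tissue]


def code_slides(slides, seed=None):
    """Shuffle slides and assign opaque codes.

    Returns (reading_order, key): the pathologist reads the codes in
    reading_order, while key (code -> Slide) is held back until decoding.
    """
    rng = random.Random(seed)
    shuffled = list(slides)
    rng.shuffle(shuffled)
    # Codes are assigned after shuffling, so neither a code nor its
    # position in the reading order reveals the group or dose.
    key = {f"S{i:03d}": s for i, s in enumerate(shuffled, start=1)}
    reading_order = list(key)
    return reading_order, key


# Hypothetical example: brain slides from four dose groups, three animals each.
example = [
    Slide(f"A{i}", g, "brain")
    for i, g in enumerate(["control", "low", "mid", "high"] * 3)
]
order, key = code_slides(slides_for_coded_review(example, True, "brain"), seed=7)
print(order[:3])  # codes only; group labels remain in `key`
```

Seeding the shuffle makes the reading order reproducible for audit purposes; who holds the key, and how it is stored, would be set by the study protocol and facility procedures rather than by code like this.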
Formal masked microscopic evaluations are appropriate for certain animal toxicity studies, and this option may be the initial choice in several scenarios.
1. Formal masked microscopic evaluation is used as the initial analytical approach in investigational toxicity studies designed to test a focused hypothesis or explore mechanisms for a test article–related finding that has been identified previously.23,68,72
2. An initial masked microscopic evaluation may be the preferred means for investigational toxicity studies for which existing regulatory guidance specifically recommends that semi-quantitative microscopic data and quantitative measurements be acquired in a blinded fashion. For instance, an initial masked microscopic analysis is advised as a (nonbinding) recommendation in two FDA guidance documents, one for product development under the Animal Rule73 and another for qualification of novel biomarkers.74
a. When conducted under the Animal Rule, the fundamental objective of animal studies is to provide a set of clinical trial-like data in situations where a human clinical trial is not feasible or would be unethical. This guidance applies specifically to “animal efficacy studies and . . . PK [pharmacokinetic] and/or PD [pharmacodynamic] studies” rather than GLP-compliant nonclinical safety studies, and it states that “[a]ll personnel responsible for the collection, assessment, or interpretation of data” (including those responsible for the “necropsy, gross pathology, and histopathology data”) should be blinded.73 This wording is problematic for pathologists because interpretation of microscopic findings is not feasible until the data set has been decoded, while acquisition of microscopic data is impeded in the absence of ancillary pathology data (e.g., gross lesions, organ weights, clinical pathology values). Therefore, the protocol must clearly indicate the study phase in which the data blinding is lifted for the purpose of data interpretation.
b. For confirmatory biomarker qualification studies, the main study objective is to formally test the hypothesis that a shift in the expression (amount and/or distribution) of a target in cells or tissue sections is a relevant indicator of normal or abnormal biologic processes and/or of the changes induced in these processes by a therapeutic intervention. Accordingly, the pathologist often may be masked to metadata that are specific to the biomarker being qualified (e.g., results from positive-control biomarkers, tissue sampling times).74
3. With respect to quantitative pathology data (e.g., morphometric measurements made from highly homologous tissue sections), regulatory guidance for safety animal toxicity studies typically recommends that measurements be captured in a blinded fashion.75,76
4. Similarly, a global guideline (Topic GL–44) formulated by the International Cooperation on Harmonisation of Technical Requirements for Registration of Veterinary Medicinal Products (VICH) for target animal safety studies of investigational veterinary vaccines suggests that the pathologist performing the microscopic evaluation be blinded to the treatment groups.77,78
Guidance documents generally state that sponsors may propose alternative study designs with appropriate scientific justification. Based on points raised in this “best practice” document, the STP position for animal toxicity studies is that sponsors should routinely propose an informed histopathology evaluation, including the situations listed here where guidance suggests an initial masked analysis.
A fundamental principle for all microscopic data sets acquired using a masked evaluation, whether using an informal or formal approach, is that data interpretation is performed only after the data have been decoded (i.e., the tissue diagnoses have been linked to the individual subject metadata [treatment, dose level, etc.]). Valid interpretation by the study pathologist cannot occur unless data are decoded. Data decoding may be undertaken by either the study pathologist or an independent third party. For data acquired by formal masked microscopic evaluation, the decoding procedure should be stated in the study protocol (or protocol amendment). On occasion, microscopic diagnoses made using a formal masked microscopic evaluation are refined by the study pathologist after decoding has been completed. In such cases, any diagnostic adjustments will be tracked in the audit trail for the study.
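To make the decode-before-interpret principle concrete, the following Python sketch links blind reads to subject metadata only once every coded slide has been read, and then tabulates incidence by group. All names (`decode`, `incidence_by_group`, and the record fields) are hypothetical, and the sketch assumes one coded slide per animal for the tissue of interest; it is not drawn from any regulation or pathology data system.

```python
from collections import defaultdict


def decode(reads_by_code, key):
    """Link masked reads to subject metadata (hypothetical helper).

    reads_by_code: code -> (diagnosis, severity_grade) recorded blind.
    key: code -> {"animal_id": ..., "group": ...}, withheld until now.
    Refuses to decode while any coded slide is still unread, so that
    interpretation cannot begin before the masked read is complete.
    """
    unread = sorted(set(key) - set(reads_by_code))
    if unread:
        raise ValueError(f"masked read incomplete; unread codes: {unread}")
    unknown = sorted(set(reads_by_code) - set(key))
    if unknown:
        raise ValueError(f"codes not in key: {unknown}")
    return [
        {"code": code, **key[code], "diagnosis": dx, "grade": grade}
        for code, (dx, grade) in sorted(reads_by_code.items())
    ]


def incidence_by_group(records, diagnosis):
    """Return {group: (animals with diagnosis, animals examined)}."""
    counts = defaultdict(lambda: [0, 0])
    for r in records:
        counts[r["group"]][1] += 1
        if r["diagnosis"] == diagnosis:
            counts[r["group"]][0] += 1
    return {group: tuple(c) for group, c in counts.items()}


# Hypothetical example: two coded liver slides.
key = {
    "S001": {"animal_id": "A2", "group": "high"},
    "S002": {"animal_id": "A1", "group": "control"},
}
reads = {
    "S001": ("hepatocellular hypertrophy", 2),
    "S002": ("within normal limits", 0),
}
records = decode(reads, key)
print(incidence_by_group(records, "hepatocellular hypertrophy"))
# {'high': (1, 1), 'control': (0, 1)}
```

Any refinement of a diagnosis made after decoding would then be recorded in the study audit trail, which this sketch does not model.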
Recommended (“Best”) Practices for Using Masked Microscopic Evaluation for Animal Toxicity Studies
Several further design decisions need to be considered on those occasions in which an animal toxicity study intended for regulatory review will include a masked microscopic evaluation. This section provides three recommended (best) practices that the STP believes are optimally suited to ensure that the microscopic examination provides accurate and sensitive data.
Recommendation 3: If used as the approach for an animal toxicity study to investigate a specific research question, masking of the initial (first) microscopic evaluation should be limited to withholding only information about the group (control or test article–treated) and dose equivalents.
If a masked microscopic evaluation is used for an animal toxicity study, the Working Group asserts that the degree of masking should be limited to the identity of the test article and dose (i.e., “masked to treatment” only). In this context, “masked to treatment only” is the necessary masking choice because microscopic diagnoses and interpretations by the study pathologist are informed by and integrated with other pathology metadata, including gross findings, organ weights, and clinical pathology values. In addition, the Working Group contends that masking is appropriate as the initial approach for microscopic evaluation only if the spectrum of incidental background findings is known and if key diagnostic terminology and predetermined, well-characterized grading criteria exist for use with an established animal model.2,79–81
For special studies where a masked microscopic evaluation is promoted in current guidance documents (e.g., biomarker qualification), several recommendations have been proposed to make concurrent control tissues available for informed microscopic examination before a masked analysis of all tissues from all groups. On a case-by-case basis, options might include (1) preparing an extra set of slides from some or all control blocks22,74 or (2) having a separate satellite concurrent control group used to define the baseline range of findings but for which data will not be included in the final report.74 These approaches are not implemented routinely during animal toxicity studies, for several reasons. Both options would lengthen the study timeline (and cost) because extra tissue sections would have to be processed and evaluated; the second option would increase animal use, which runs counter to 3Rs (reduce, refine, replace) initiatives82,83; and neither option provides the possibility of referring to the full range of control findings during the course of a study. These design variants acknowledge the fact that accurate microscopic diagnoses can be made only if the pathologist has foreknowledge of all relevant data related to an animal’s particular biological status, as is the standard for diagnostic pathology practice.8,9,11
Recommendation 4: The decision regarding whether or not to perform a masked microscopic evaluation is best made by a toxicologic pathologist with relevant experience.
The pathologist is the member of the study team who is best positioned by education and relevant professional (scientific and methodological) experience to make recommendations/decisions regarding whether or not the microscopic data set for an animal toxicity study will be made more robust by employing a masked analytical approach. For those studies in which a formal masked microscopic evaluation is being considered, the pathologist should be consulted in the design phase of the study regarding whether or not a masked initial assessment is useful to address the study objectives while ensuring data accuracy and sensitivity.
Recommendation 5: Pathology peer review, performed to verify the microscopic diagnoses and interpretations by the study pathologist, should use an informed evaluation approach.
Pathology diagnoses and interpretations for animal toxicity studies intended for regulatory review often are verified by one or more additional pathologists.84 For this purpose, the reviewing pathologist examines a portion of the material previously evaluated by the original study pathologist. The reviewing pathologist confirms that the study pathologist’s diagnoses conform to accepted diagnostic terminology and are used in a consistent manner, and that the interpretation is supported by the data. These procedures usually are formal (i.e., documented in the study protocol or protocol amendment) and take one of two forms: a pathology peer review by one additional pathologist40,84,85 or a pathology working group (PWG) comprising multiple pathologists.40,50,84 Peer review of microscopic data by these means is not obligatory under GLP regulations.63–65
Because the primary purpose of a pathology peer review is not to produce new microscopic data but to confirm the accuracy and consistency of diagnoses and interpretation generated by the study pathologist, the peer review pathologist should have access to all metadata provided to the study pathologist and all diagnoses and interpretations being proposed for inclusion in the final microscopic data set as communicated in the draft pathology report. For this reason, an informed microscopic evaluation by the peer review pathologist is the default approach for pathology peer reviews of animal toxicity studies.84,86 Similar to the study pathologist, a peer review pathologist may choose to perform an informal post hoc masked microscopic evaluation (as described above for recommendation 2) to clarify their diagnoses and/or interpretations.
In contrast, PWGs are convened on a case-by-case basis to answer a specific question and/or generate new pathology data. Depending on the objective for which it was convened, a PWG may perform either an informed or a masked slide (or image) evaluation.40,50
Discussion
Histopathologic diagnoses from animal toxicity studies intended for regulatory review provide significant value for assessing the safety of test articles. Depending on the study objectives, microscopic evaluation of animal toxicity studies may involve (1) an informed examination of all tissues with no subsequent informal masked review for diagnostic refinement, (2) an initial informed examination of all tissues followed by an informal masked review of selected tissues to refine diagnoses, or (3) a formal (protocol-driven) masked examination of all tissues. The STP assembled a Working Group to develop recommendations regarding the optimal use of informed versus masked microscopic evaluation during animal toxicity studies. The Working Group undertook this effort by fulfilling four specific objectives.
The first objective was to assess differences of opinion among scientists regarding the proper approach to performing a microscopic evaluation for animal toxicity studies. The Working Group addressed this objective by conducting a detailed survey of current global practices on this topic43 and by reviewing the relevant scientific literature. Two principal perspectives exist regarding the approach, informed versus masked, for microscopic evaluation in animal studies. Scientists not directly involved in microscopic analysis cite masked evaluation as an important technique to minimize diagnostic bias in all studies using histopathology, including safety studies designed to assess toxicity. Toxicologic pathologists affirm that informed evaluation is necessary for animal toxicity studies (whether GLP or non-GLP) to maximize sensitivity when identifying and characterizing potential test article–related effects in such screening bioassays. Diagnostic differences have been reported for microscopic data generated in toxicity studies when a single pathologist has viewed the same study materials at different times using informed versus masked approaches87,88 or when multiple pathologists have viewed the same tissue sections independently (compare reference nos. 89 vs 88 and 90). Such minor differences are expected in all interpretive medical sciences (including the clinical practice of medicine) and typically do not affect the overall conclusions. Pathologists agree more closely when identifying a pathological process (e.g., discriminating normal tissues from altered [inflamed, necrotic, neoplastic, etc.] tissues) than when grading the relative severity of the change.88 The continued evolution of harmonized diagnostic nomenclature,45,89,90 coupled with the use of reference images for well-defined lesions in toxicologic pathology,45,90-92 is improving interpathologist alignment in lesion terminology and severity grades.
Masked microscopic evaluation, in contrast, reduces diagnostic sensitivity without a measurable improvement in diagnostic accuracy in routine toxicity studies.37,39,41,88 Moreover, the toxicologic pathology community has demonstrated over many decades that informed microscopic evaluation of animal toxicity studies sustains high-level performance in safety evaluation and consistently generates accurate and sensitive microscopic data.43
The second objective was to define a set of “best practice” recommendations that address the optimal design of microscopic evaluations for animal toxicity studies intended for regulatory review. The Working Group defined five best practices based on the scientific consensus from various sources informing the toxicologic pathology field (e.g., published literature, comments by members of STP and other global societies of toxicologic pathology,43 and professional experiences of Working Group members). The primary conclusion was that informed microscopic evaluation is the default approach for safety animal toxicity studies. The five practices in this article describe the approaches (informed vs masked) for microscopic evaluation currently applied globally by the toxicologic pathology community.43 These best practices have proven to reliably generate accurate and sensitive data over the past several decades.23,27,28,35-43 In certain cases, masked pathology evaluation might be suitable to address specific objectives (e.g., emphasis on limiting bias when testing a hypothesis in an investigational study of a potential mechanism of toxicity), but such situations are driven by different primary considerations than safety animal toxicity studies (where the emphasis is on maximizing sensitivity for identifying test article–related findings). Similarly, safety animal toxicity studies occasionally may use a masked approach for the pathology evaluation based on particular guidance or requests from regulatory agencies. Notably, in such cases, published regulatory guidance states that exceptions to a masked pathology evaluation may be proposed at the discretion of the sponsor with appropriate scientific justification.73,74,77,78 Accordingly, the Working Group recommends that the routine study design for safety animal toxicity studies incorporate an informed pathology evaluation to maximize detection of potential test article–related findings.
The third objective was to consider the need for, and best means of, documenting when a masked (informal or formal) microscopic evaluation was conducted during an animal toxicity study intended for regulatory review. The Working Group debated this point extensively and consequently developed separate recommendations for two specific situations: (1) post hoc informal (non-protocol-driven) masked review performed at the discretion of the study pathologist to confirm or refine their microscopic diagnoses made during an initial informed assessment, and (2) formal (protocol-driven) masked evaluation as the initial (and only) analytical approach. In the first scenario, the Working Group consensus was that no documentation is needed in the pathology report, because post hoc informal masked review performed while generating histopathology raw data is considered part of the iterative diagnostic process. For the second situation, the Working Group advocates that when a formal masked microscopic evaluation is implemented as the analytical approach, it is documented initially in the study protocol (or protocol amendment) and then acknowledged in the pathology report.
The fourth and final objective was to assess regulatory considerations with respect to selecting informed versus masked microscopic evaluation for animal toxicity studies. In general, regulations and guidance provided by regulatory agencies for animal toxicity studies do not define a specific approach for the microscopic evaluation. Therefore, the choice between informed and masked evaluation depends on the scientific question(s) to be investigated and should be made by qualified personnel (e.g., a toxicologic pathologist familiar with the appropriate scientific endpoints, in consultation with the study director). In a few instances, regulatory guidance addressing specific investigational objectives (e.g., the Animal Rule73 and hypothesis-driven biomarker qualification74) recommends a blinded approach for the initial microscopic evaluation. The Working Group notes that the five best practices outlined in this article will aid sponsors in designing and conducting appropriate microscopic evaluations for animal studies using sensitive methods based on scientifically sound principles.
Conclusion
In summary, the advantages of informed microscopic evaluation of animal toxicity studies intended for regulatory review far outweigh the hypothetical advantages of masked examination as the initial approach.23,39,40,43 Specifically, toxicologic pathologists generally acknowledge that “[b]linding is not applicable to a scientific investigation for which the potential outcomes are not defined in advance and there is no specific hypothesis to test.”39 Blinding therefore is not advocated for animal toxicity studies, where assessing safety with maximal sensitivity is not driven by a focused hypothesis.43
At the time of publication, these five best practices for appropriate use of informed versus masked microscopic evaluation in animal toxicity studies intended for regulatory review have been endorsed by multiple societies of pathology around the world, starting with the STP and then followed by the American College of Veterinary Pathologists (ACVP), British Society of Toxicological Pathology (BSTP), European Society of Toxicologic Pathology (ESTP), Japanese Society of Toxicologic Pathology (JSTP), Société Française de Pathologie Toxicologique (SFPT), and Society of Toxicologic Pathology–India (STP–I). First, informed microscopic analysis is the default approach for microscopic evaluation of animal toxicity studies. Second, informal post hoc masked microscopic evaluation may be useful in toxicity studies to address specific questions such as confirming preliminary diagnoses and/or severity grades for target organs and/or defining thresholds (e.g., NOAEL) identified during an initial informed evaluation. Formal masking of the microscopic evaluation should be restricted to investigational toxicity experiments or to studies performed to satisfy guidance or specific requests from regulatory agencies. Third, if used as an approach for an animal toxicity study, masking of the initial microscopic evaluation should be limited to withholding information about the group (control or test article–treated) and dose. Fourth, the decision regarding whether or not to perform a masked microscopic evaluation is best made by the study pathologist. Finally, pathology peer review should use an informed evaluation approach. The consensus of global societies of toxicologic pathology and their members, based on decades of sustained, reliable performance in safety evaluation,25,28,39 is that these five best practices reliably deliver the most accurate and sensitive histopathology data for animal toxicity studies.41,43
Footnotes
Acknowledgements
The authors wish to thank their many pathologist colleagues who provided feedback regarding these recommendations and specifically acknowledge Dr David Herr and several other scientists (biostatisticians and regulatory scientists who had to remain anonymous for professional reasons) for their additional insights.
Author Contribution
The analyses, conclusions, and opinions expressed in this article are solely those of the authors. All authors participated in the discussions involved with formulation and organization of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
