Abstract
A number of issues may arise during the conduct of a study which can complicate interpretation of in vitro and in vivo datasets. Speakers discussed the implications of differing interpretations and how to avoid complicating factors during study planning and execution. Consideration needs to be given to study design factors including defining objectives, consideration of expected pharmacological effects, dose selection and drug kinetics, species used, and vehicle selection. In addition, the effects of vivarium temperature on various endpoints, how to control variables affecting clinical pathology, and how early death animals, common background findings, and artifacts can affect histopathology interpretation all play into the final interpretation of study data.
A number of issues may arise during the conduct of a study which can complicate interpretation of in vitro and in vivo data sets. Speakers at this symposium discussed the implications of differing interpretations and how to avoid complicating factors.
The first speaker, Laura Dill Morton (Aclairo Pharmaceutical Development Group, Vienna, Virginia), presented a talk entitled “Challenges in Study Interpretation—Study Design Factors.” She reviewed the differences among goals (which are high-level), objectives (which are specific and measurable), strategies, and tactics, and the role of each in creating study designs that meet program objectives. Goals (what is to be accomplished, such as registering compound X for the treatment of condition Y) must be distinguished from objectives (simple and measurable steps toward goals, such as “conduct toxicology studies to characterize the toxicity of compound X”).
It is entirely possible to conduct toxicology studies that are technically compliant yet fail to meet their objectives and are therefore meaningless. Studies may be confounded when not all of the pertinent influences have been measured or accounted for; studies may fail when one or more of the key assumptions are not met (eg, if a positive control fails to produce the expected effect). Although this is the classic definition of a failed study, any study that does not fulfill its objectives can be considered failed.
Designing high-quality toxicology studies starts with clearly stated study objectives that are aligned with program objectives. The key questions that the team needs to answer should be identified, such as “What is the relationship between exposure and toxicity?” or “Are the toxic effects reversible?” The team should take the time to evaluate prior publications about the pharmacology/target of the test article so that expected pharmacologic effects can be factored into the study design. This analysis should also consider the kinetics of pharmacologic responses and whether and why they might not be sustained throughout the study, since this may affect the timing of sample collection. Reasons why effects might not be sustained include receptor changes (shedding, internalization, or downregulation), physiologic compensation, or immunogenicity.
Anticipated pharmacologic effects may also drive the specific end points in the study, depending on the organ system where these effects are expected. Some end points that are not standard in toxicology studies, such as intraocular pressure, or nerve conduction velocity, may be appropriate. If these end points are included in the study, it is important to select a laboratory that is experienced and capable of executing and interpreting these data.
In addition to pharmacologic kinetics, the study design must consider drug kinetics. Effects that are Cmax related may be missed or underestimated if they are evaluated at a time when exposure is low, such as prior to the administration of the next dose. Conversely, some drug-related effects, such as antidrug antibodies, may be masked by high drug concentrations. Drug kinetics also influence the selection of reversibility duration and the frequency of dosing. Drug exposure should cover the dosing interval, and the frequency of dosing should support the clinical dosing interval, though it need not perfectly replicate it.
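To illustrate why sampling time matters for effects related to peak exposure, the following minimal sketch estimates how much of the peak concentration remains at the end of a dosing interval. It assumes a hypothetical one-compartment model with an intravenous bolus and first-order elimination, and it ignores accumulation. The function name, half-life, and dosing interval are illustrative assumptions, not values from the talk.

```python
import math


def fraction_of_peak_remaining(hours_after_dose: float, half_life_h: float) -> float:
    """Fraction of the peak concentration remaining at a given time.

    Assumes first-order elimination from a single compartment after an
    IV bolus (an illustrative simplification, not a validated PK model).
    """
    if half_life_h <= 0:
        raise ValueError("half_life_h must be positive")
    elimination_rate = math.log(2) / half_life_h
    return math.exp(-elimination_rate * hours_after_dose)


# Hypothetical compound: 4 h half-life, dosed once every 24 h.
trough_fraction = fraction_of_peak_remaining(24.0, 4.0)
print(f"Fraction of peak remaining at trough: {trough_fraction:.4f}")
```

Under these assumptions, six half-lives elapse before the next dose, so a sample drawn just before dosing reflects under 2% of peak exposure. An effect driven by peak concentration could easily go undetected at that time point.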
Dose selection is probably the most critical element of toxicology study design and should be rigorously scrutinized. Writing a clear justification of the dose, with specifics including magnitude and frequency of toxicologic effects driving the limitation of the dose, can provide a valuable “reality check.” This is necessary to avoid setting the dose too low based on effects observed transiently or in only a small number of animals. Such careful documentation provides a valuable memory aid and can be useful to validate subsequent studies if previously observed toxicity does not recur.
The other critical element for a toxicology study is the selection of the test system. If the test article is not pharmacologically active in the chosen test system, or key metabolites are not generated in the test system, the resulting data may be completely inadequate to support human testing. Specific data should be cited to support the choice of the toxicology species. Finally, the interaction of the test system and the pharmacologic class should be considered. Some mechanisms are highly dependent on a diseased or at least abnormal physiologic state, such as obesity, and may be difficult or impossible to evaluate in normal animals. This may result in toxicology studies in normal animals underpredicting potential toxic effects in patients. In such cases, it may be helpful to conduct primary or supplemental toxicology studies in disease models (eg, transgenic or knockout models). Toxicology studies, particularly those conducted in unusual or fragile models, should also include sufficient numbers of animals to meet study objectives while balancing the need to minimize animal use. As such, pilot information about feasibility and likely mortality may provide information to justify animal numbers in a definitive study and avoid the necessity of restarting a study.
The comparisons that will be made in the toxicology study should be considered when both designing the study and gathering data. If the study is using a vehicle that has toxicologic or immunostimulatory effects of its own, a separate nonvehicle control group (such as a saline or untreated control) may be necessary to separate out what is due to vehicle and what is due to test article. For example, in gene therapy studies, the control used is often only diluent, rather than vector that does not express the transgene. This can make it difficult to separate the immunologic consequences of the vector from those of the transgene. It is also important to consider the degree of interindividual variability when establishing the size of groups and the amount of prestudy data that will be collected.
In summary, there are many ways to miss effects in toxicology studies. It is possible to conduct studies that are compliant with standard operating procedures and good laboratory practices yet are scientifically inadequate. This can be avoided, or at least minimized, by thinking carefully about every aspect of the study design, considering data already available for the test article and target, and not accepting defaults built into protocol templates.
Martin Sanders (Independent Consultant, Waterford, Connecticut) presented a talk entitled “The Effect of Temperature on Functional Endpoints in Preclinical Studies.” He presented examples from safety pharmacology, toxicology, and murine animal model studies on how changes in body temperature (BT), whether drug or environmentally induced (housing conditions), can alter the functional end points under study. 1-6 There is a considerable literature on how temperature affects murine biology, summarized as follows: Mice are usually housed at vivarium temperatures of 20°C to 22°C, which is below the animal’s thermoneutral zone of 30°C to 32°C. 7 The choice of the lower temperature supports the comfort of the technical staff working in the facility rather than being optimized for the mice housed in the room. To maintain the same BT under cold conditions as they do under thermoneutral conditions, mice utilize metabolic adaptations that consist of increased metabolic rate, heart rate, activation of the sympathetic nervous system, and increased glucocorticoid production. As a result, there is immune system dampening that can lead to altered responses in a variety of experimental models (eg, infection, cancer, atherosclerosis). An argument can be made that the clinical translation and reproducibility of experimental mouse model data will improve if this one adjustment to a warmer housing environment is made.
From a safety pharmacology perspective, assessing whether treatment with a compound affects the QT interval is a key end point measured in cardiovascular studies. In most cases, drug-induced QT interval prolongation results from a compound’s inhibition of human ether-a-go-go-related gene (hERG) current (rapid delayed rectifier [IKr] channel). Several years ago, Martin Sanders’ department tested a central nervous system penetrant compound in the telemetered minipig that produced significant dose-related increases in QTc, which were shown not to be due to interaction with cardiac ion channels. 8 The compound also produced a dose-dependent drop in BT, and a follow-up literature search identified several examples showing an increase in QTc of about 18 milliseconds per degree of BT lowering. 9,10 They confirmed this finding by examining the diurnal changes in BT and QTc and by treating the animals with dihydrocapsaicin, another agent shown to lower BT. 11
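The relationship reported above lends itself to a simple worked example. The sketch below assumes that the QTc change is linear in BT at roughly 18 milliseconds per degree, the approximate slope quoted in the text. The linearity, the function names, and the example values are illustrative assumptions and do not represent the analysis performed in the cited study.

```python
QTC_MS_PER_DEGREE = 18.0  # approximate slope quoted in the text


def expected_qtc_increase_ms(bt_drop_degrees: float,
                             slope_ms_per_degree: float = QTC_MS_PER_DEGREE) -> float:
    """QTc increase expected from body temperature lowering alone,
    assuming a linear relationship (an illustrative assumption)."""
    return slope_ms_per_degree * bt_drop_degrees


def temperature_adjusted_qtc_change_ms(observed_qtc_change_ms: float,
                                       bt_drop_degrees: float,
                                       slope_ms_per_degree: float = QTC_MS_PER_DEGREE) -> float:
    """Portion of an observed QTc change not explained by the BT drop."""
    return observed_qtc_change_ms - expected_qtc_increase_ms(
        bt_drop_degrees, slope_ms_per_degree
    )


# Hypothetical observation: QTc rises by 40 ms while BT falls by 2 degrees.
residual_ms = temperature_adjusted_qtc_change_ms(40.0, 2.0)
print(f"QTc change not explained by BT: {residual_ms:.1f} ms")
```

In this hypothetical case, 36 of the 40 milliseconds would be attributable to the fall in BT, which leaves little to attribute to a direct effect on cardiac ion channels.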
Aside from drug-induced changes in BT, most of the literature examples focus on how environmental temperature affects biological responses. Although the mouse is the most common model studied under these conditions, there is one particularly interesting finding from a Novartis primate toxicology study testing a dipeptidyl peptidase (DPP) 4 inhibitor. 12 In this example, high-dose vildagliptin administered over a 3-week period produced necrosis of the tail and ears. However, in a repeat 3-week study, the room temperature was raised from the typical housing conditions of 68°F-75°F (20°C-21.1°C) to 81°F-88°F (27.2°C-31.1°C), more typical of the cynomolgus monkey’s natural climate. Under this condition, skin lesions did not develop. The DPP-4 inhibition leads to an enhanced role of neuropeptide Y as a vasoconstrictive factor.
Functionally, the typical ∼600 bpm heart rate in mice studied at room temperature was attributed to the sympathetic nervous system dominating control. Swoap et al in 2008 13 showed that under thermoneutral conditions (30°C), heart rate fell to ∼350 bpm, and using both β adrenergic-less and muscarinic M2 KO mice showed that under thermoneutral temperatures, parasympathetic tone is the major factor controlling heart rate, similar to what is seen in humans. Another functional end point affected by temperature is the tail flick latency or paw withdrawal response. 14,15 This nociceptive test, in which the reaction time between application of heat and tail/paw withdrawal is typically used as a pain index in a test for analgesia, has been shown to be sensitive to ambient temperature, mostly due to the direct relationship between nerve conduction velocity and temperature. Improved reproducibility with less variability is possible when ambient temperature is well controlled and considered as an experimental variable.
From an animal model perspective, it has been shown that cold mice, housed at temperatures below their thermoneutral zone, have suppressed immune responses that affect study results from oncology, infection, and metabolic disease studies. In oncology models, 16,17 tumors grew more slowly and spontaneous lung metastases were significantly reduced at thermoneutral temperatures compared to standard vivarium room temperatures. In addition, there were greater numbers of activated CD8+ T cells associated with tumors and draining lymph nodes. In contrast, at room temperatures, there were higher numbers of myeloid-derived suppressor cells present. In an infection model with Francisella tularensis, 18 live strain vaccinated mice at thermoneutral temperatures displayed elevated antigen-specific T-cell responses and survived an intranasal challenge that was fatal to immunized mice at room temperatures. Finally, in metabolic disease models of nonalcoholic fatty liver disease 19 and atherosclerosis, 20 thermoneutral housing augmented a pro-inflammatory immune response and exacerbated high-fat diet-induced liver disease due to the lower production of corticosterone. The thermoneutral housing also resulted in heightened intestinal permeability and intestinal microbiome dysbiosis, which better mirrors the human condition. In a Western diet model of atherosclerosis, wild-type mice housed at room temperature are typically resistant to disease development. Under thermoneutral conditions, there is the initiation of atherosclerosis, altered lipid profiles, and increased aortic plaque size. The latter was accompanied by increased levels of aortic and white adipose tissue inflammation and increased circulating immune cell expression.
From the examples above, with the goal of improving the translation and reproducibility of preclinical models, researchers are encouraged to investigate the effect of housing temperature on their study outcomes.
Bill Reagan (Pfizer, Groton, Connecticut) followed with a talk entitled “How Do We Control the Variability of Clinical Pathology Data to Optimize Interpretation.” He reviewed the many variables that need to be considered when assessing clinical pathology data in preclinical toxicity studies, which raise questions such as: Were the changes consistent over time? Were the animals fasted? What was the route of administration of the test compound? Were the changes similar across dose groups? 21 After considering these and many more variables, the main question that should be answered when evaluating clinical pathology data remains: Are the changes test article related or not? Better understanding and minimizing preanalytical, analytical, and postanalytical variables will help optimize the interpretation of the clinical pathology data and lead to an accurate identification of true test article–related effects.
Preanalytical variables are the most common ones to influence the outcome of the studies, especially in rodent studies. Major preanalytical categories to consider are husbandry practices, restraint techniques, fasting status, blood collection technique and timing, sample handling/storage, and animal age-related changes. The effect of stress that occurs with husbandry practices as well as restraint techniques for bleeding animals can have a profound effect on hematology and clinical chemistry parameters. Just taking mice for an elevator ride can lead to decreases in blood lymphocytes and thymic weights. 22 Restraining mice 23 and rats 24 by holding the body versus the tail can lead to higher liver and skeletal muscle enzymes, respectively. Chair restraint of monkeys can have profound effects, including increasing heart rate and blood pressure, which can induce focal areas of ischemia, leading to cardiac necrosis and associated cardiac troponin I release. 25 Chair restraint and its clinical sequelae can confound the interpretation of cardiac troponin results in preclinical toxicity studies. Are these changes due to the stress of restraint or directly associated with a cardiotoxic effect of the test article? It was demonstrated how to minimize this variable by collecting samples to assess cardiac troponin I away from serial sampling time points. In addition, examples were given on how fasting, anesthesia, blood collection techniques, timing, and site of blood collection can have direct effects on many of the clinical pathology parameters. 26-28 Dr Reagan also illustrated the effect of age of the rodent as another variable to consider when assessing clinical pathology data. The example used was a common age-related cardiomyopathy that occurs in rats, which results in multifocal areas of necrosis/mononuclear infiltrate. The exact pathogenesis is not clear, but these rodents may also have increased blood cardiac troponin concentrations. 29
Analytic variables, including validation of assays and interfering substances, were also discussed. One of the most crucial steps in validating an immunoassay is to determine the cross-reactivity of the analyte of interest in the species of interest using biologically relevant material. This was illustrated by demonstrating the range of immunoreactivity that occurs in rat samples with assays developed to detect human cardiac troponin. 30 It is crucial to show there is good immunoreactivity before proceeding with implementation of an assay. Icteric, lipemic, and/or hemolyzed samples can have an effect on the measurement of analytes. It was demonstrated in rats and mice that marked hemolysis can increase aspartate aminotransferase and potassium and decrease alkaline phosphatase. Although not often considered, test articles that are administered to the test species can also interfere with the proper analysis of analytes. It was illustrated how a compound caused marked decreases (both in vitro and in vivo) in alanine aminotransferase, which could interfere with the detection of hepatotoxicity in patients treated with this drug (Figure 1).

Figure 1. In vitro inhibition of alanine aminotransferase (ALT) by compound A. No effects on aspartate aminotransferase (AST) or glutamate dehydrogenase (GLDH) are noted.
Postanalytical variables are those that occur during or after the data verification phase. Examples include inaccurate data review, incorrect transmission of the data to the end user, and erroneous data interpretation. Quality laboratories have electronic interfaces between the instrumentation and the laboratory information reporting system, as well as procedures in place to review electronic and manually entered data to prevent errors. Having qualified personnel with the appropriate training should result in accurate data interpretation.
Overall, good study design (keeping many of the variables constant, including restraint, fasting, and age of the animals), consistent husbandry and blood collection techniques, and minimization of analytic variables should lead to the production of high-quality clinical pathology data that can be interpreted accurately.
Finally, Dr Torrie A. Crabbs from EPL, Inc presented a talk on “Tissue Collection Quandaries, Artifacts, Early Deaths, and Other Confounders of Pathologic Interpretation.” This talk focused on a variety of factors that can obscure pathologic interpretation of histologic findings: early death animals, tissue collection and processing artifacts, common spontaneous background findings, and a variety of other conundrums.
Early death animals include animals that are either found dead or are sacrificed in moribund condition during the course of a study. These deaths can result from direct or indirect toxicity of the test article, a procedural accident such as a gavage error, trauma, old age, or infectious agents; however, in many cases, the cause is unknown. Two main confounders associated with early death animals are a lack of age-matched controls and an increased risk of diagnostic drift. This is especially true in larger carcinogenicity studies where early death animals are often read immediately, potentially creating a long delay between reading tissues from early death and terminal sacrifice animals.
While histologic features of organs from young adults and aged laboratory animals are well known by toxicologic pathologists, less is known about younger animals, which is important because postnatal histologic maturation occurs in some organs. These immature histologic features must not be confused with chemical- or drug-related effects. One way to address this issue is to include additional control animals at the beginning of the study that can be sacrificed during the course of the study as needed, though this is often not feasible. In addition, there are now numerous publications that address the postnatal histologic development of many tissues in laboratory animals. 31-33
Dr Crabbs presented several examples in rodents and canines that demonstrated how crucial knowledge of postnatal development is to differentiate normal physiologic changes from toxic lesions. For example, in the testes, there are numerous histologic findings commonly indicative of toxicity that are normal during certain stages of development. These include multinucleated germ cells, apoptosis, exfoliated germ cells, and hypospermatogenic tubules. 31,34,35 Therefore, if a pathologist encounters small testes with low numbers of spermatogenic cells, the first thing that needs to be determined is the age/developmental stage of the animal at the time of death. Features of apoptosis and missing cell types in a prepubertal animal likely indicate a normal physiological change. On the other hand, if these features are present in a sexually mature animal, a xenobiotic-associated change must be considered. If there are features of sexual immaturity, such as the presence of rosettes, 32 in an otherwise mature animal, a possible endocrine disruption event must be considered.
Diagnostic drift is a variation in the application of diagnostic terminology and/or criteria over time. It is a major problem faced by all toxicologic pathologists, especially in larger toxicity/carcinogenicity studies that take months to read. As mentioned previously, early death animals are often reviewed as they die; therefore, there can be long intervals between the reading of early death animals as compared to terminal sacrifice animals. In addition, only target organs are often examined in lower dose groups, which can create a delay between the original examination of the control and high-dose tissues as compared to the lower dose groups. All of these delays can result in the application of slightly different diagnostic criteria for specific changes. In addition, multiple terms can be used to describe the same or similar lesions. Maintaining consistency over time can be a challenge, particularly with regard to the application of thresholds and the use of severity grades. All of this can result in the creation or masking of treatment-related effects. In order to help minimize diagnostic drift, it is important to establish a “diagnostic dictionary” at the beginning of the study, which describes the morphologic criteria for each finding. This document can then be reviewed upon the completion of the study to look for duplications of terms, inappropriate terminology, and terminology that is inconsistent with previous studies for similar compounds. Another method to reduce diagnostic drift is to review a subset of the controls, higher dose and lower dose groups, and then repeat. Although diagnostic drift can still occur with this method, it will be spread out relatively evenly across all groups. Some pathologists will perform a blind read of target tissues once the study is complete. This blind read will include all animals, including early death animals. 
This method is especially helpful for subtle changes and helps to assure a test article–related change is real and that diagnostic criteria were consistently applied across the study. Most importantly, peer review should be incorporated whenever possible. Peer review adds validity to the accuracy of the findings because it provides a targeted review of treatment-related lesions over a short period of time. Peer review also helps to assure all treatment-related effects were properly identified, confirms lesions were consistently diagnosed, and helps to ensure that the most current terminology is used.
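The approach of reading a subset of each group and then repeating can be sketched as a simple interleaved reading schedule. This is a minimal illustration under stated assumptions: the group names, animal identifiers, batch size, and function name are hypothetical, and the sketch is not a procedure described by the speaker.

```python
from itertools import zip_longest


def interleaved_reading_order(animals_by_group: dict[str, list[str]],
                              batch_size: int) -> list[str]:
    """Return a reading order that cycles through the dose groups in batches.

    Each pass reads one batch from every group, so any drift in diagnostic
    criteria over time is spread across groups rather than concentrated
    in a single group.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches_per_group = [
        [animal_ids[i:i + batch_size] for i in range(0, len(animal_ids), batch_size)]
        for animal_ids in animals_by_group.values()
    ]
    reading_order: list[str] = []
    # zip_longest tolerates groups of unequal size (eg, after early deaths).
    for batches_in_pass in zip_longest(*batches_per_group, fillvalue=[]):
        for batch in batches_in_pass:
            reading_order.extend(batch)
    return reading_order


# Hypothetical study with four animals in each of three groups.
groups = {
    "control": ["C1", "C2", "C3", "C4"],
    "high_dose": ["H1", "H2", "H3", "H4"],
    "low_dose": ["L1", "L2", "L3", "L4"],
}
reading_order = interleaved_reading_order(groups, batch_size=2)
print(reading_order)
```

With a batch size of 2, the pathologist would read two controls, two high-dose animals, and two low-dose animals before returning to the controls, so no group is read entirely at the start or the end of the evaluation.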
There are several artifactual changes that can occur during tissue collection and slide processing that can appear to be treatment-related morphologic changes. Proper knowledge of these artifacts is critical for the pathologist to differentiate the two. Proper training of technicians is also necessary as this will reduce the likelihood of creating these artifacts in the first place. Tissue artifacts typically fall into one of 3 major classes: fixation or handling artifacts, processing artifacts, and sectioning artifacts. Several types of artifacts were discussed.
In the lung, a common fixation artifact occurs if the lung is not properly inflated during fixation. It results in a partially collapsed lung that must histologically be differentiated from interstitial inflammation. Aggressive inflation or overinflation of the lung can result in separation of perivascular tissues, which can be mistaken for perivascular edema; however, true edema often contains a small amount of eosinophilic proteinaceous fluid within the expanded perivascular tissue that is lacking in the artifact.
For most tissues, formalin is the fixative of choice; however, this is not the case for all tissues, such as the testes and the eyes. To maintain the structural integrity of the testes, they need to be fixed whole. Therefore, a more rapidly penetrating/stronger fixative than formalin is required, such as Bouin’s or Davidson’s. Regardless of the fixative used, artifacts will still be present, but they will be drastically reduced with the use of Bouin’s or Davidson’s. Fixation in formalin results in shrinkage of the germ cells away from one another, producing a range of artifacts that can be mistaken for cytoplasmic vacuolation, cell exfoliation, and degeneration. 36
In addition to artifacts created due to fixation, rough/improper handling of tissues during necropsy can disrupt normal architecture. For example, stretching nervous tissue during necropsy removal can result in vacuolization of the neuropil. 36 Dark neurons, another common artifact that results from improper handling of the brain, 37 must be distinguished from eosinophilic neurons (“dead reds”), which are indicative of acute neuronal degeneration. Improperly used forceps prior to fixation can result in pinching or crushing of the tissue. A prolonged death to prosection interval has been associated histologically with hepatocellular vacuolation. 38 This vacuolation is due to plasma influx and must be differentiated from true vacuolation, due to lipid accumulation.
Artifacts can also occur during the processing and sectioning of fixed tissues to slides. Cytoplasmic vacuolization of the brain, especially the white matter, has been demonstrated to occur when there is prolonged holding of fixed tissue in alcohol, prior to processing. This can occur when tissues are placed in an automated processor on Friday evening and held in alcohol till processing begins on Sunday evening. 39 Air bubbles entrapped between the tissue and the water bath create small round basophilic regions that must be differentiated from basophilic foci in the liver. 36
During the histopathologic evaluation of tissues, pathologists encounter numerous confounders. In addition to distinguishing immature tissue and fixation/processing artifacts from real toxicant-induced changes, it is critical that pathologists have adequate knowledge on strain and species variations, sexual dimorphism, and common spontaneous background findings.
Background findings can be congenital or hereditary, age related, associated with infectious agents, secondary to trauma, or due to artifacts. Cysts in certain tissues, such as the thyroid, parathyroid, and pituitary, are common congenital findings that are often not diagnosed because they do not increase in incidence with treatment. However, cysts in other tissues, such as the liver and kidney, can increase following treatment and therefore are commonly recorded when present. Age-related changes include normal aging changes, such as thymic involution; degenerative conditions, such as chronic progressive nephropathy, polyarteritis, and cardiomyopathy; and spontaneous/familial tumors. Many of these age-related changes can be exacerbated by stress and/or treatment, further confounding interpretation. The presence of histologic findings secondary to infectious agents, such as the distinct lymphoplasmacytic perivascular interstitial pneumonia present in rats infected with Pneumocystis carinii, 40 must be differentiated from test article–associated findings. Trauma can include not only fractures and bite wounds due to fighting but also be accidentally inflicted during test article administration. During gavage studies, material can be inadvertently instilled directly into the trachea and/or lung or can be accidentally aspirated during removal of the gavage needle. Aspiration and/or reflux can result in lesions at various levels of the respiratory tract that do not necessarily represent a systemic toxicity.
Dr Crabbs concluded her talk by briefly addressing issues that arise when findings appear to be treatment related but are not (ie, there is an increased incidence of spontaneous findings in dosed groups as compared to controls). This is extremely troubling in smaller studies, where there are fewer animals per dose group and statistical interpretation is difficult. These increases can result from improper sampling/stratification of samples, infectious agents masquerading as toxic agents, a lack of historical control data, inexperience, or just bad luck. Improper sampling/stratification can result in a clustering of findings in one dose group. For example, if one histology technician processed all the thyroid glands for the controls and trimmed them transversely and another technician processed all the thyroid glands for the high-dose group but trimmed them longitudinally, there could be a false increase in recorded C-cell lesions merely due to the difference in sectioning. Another example involves the interval between fasting and necropsy. Rodents are nocturnal animals, and therefore, their glycogen levels are highest in the evening. Significant variations in the length of time between fasting and necropsy can cause significant histological variation; this must be distinguished from a pathologic process. A final issue of concern that was addressed was that of tumors in young animals. The presence of these can be extremely troubling, especially if they only occur in a dosed group. Tumors in young animals often occur at extremely low incidences and historical control data are sparse; therefore, interpretation can be difficult unless the tumors have been demonstrated to be familial.
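The difficulty of statistical interpretation in small groups can be illustrated with a short worked example. The sketch below computes a one-sided Fisher exact probability for the incidence of a finding in a dosed group versus controls. The choice of test, the function name, and the incidences are illustrative assumptions and were not part of the talk.

```python
from math import comb


def one_sided_fisher_p(affected_dosed: int, n_dosed: int,
                       affected_control: int, n_control: int) -> float:
    """Probability of seeing at least `affected_dosed` affected animals in
    the dosed group by chance alone, with the table margins held fixed
    (hypergeometric upper tail)."""
    total_affected = affected_dosed + affected_control
    total_animals = n_dosed + n_control
    all_allocations = comb(total_animals, total_affected)
    most_possible_in_dosed = min(total_affected, n_dosed)
    at_least_as_extreme = sum(
        comb(n_dosed, k) * comb(n_control, total_affected - k)
        for k in range(affected_dosed, most_possible_in_dosed + 1)
    )
    return at_least_as_extreme / all_allocations


# Hypothetical small study: finding present in 3 of 5 dosed animals, 0 of 5 controls.
p_three_of_five = one_sided_fisher_p(3, 5, 0, 5)
# Same group size, but every dosed animal is affected.
p_five_of_five = one_sided_fisher_p(5, 5, 0, 5)
print(f"3/5 vs 0/5: p = {p_three_of_five:.3f}")
print(f"5/5 vs 0/5: p = {p_five_of_five:.3f}")
```

With 5 animals per group, an incidence of 3 of 5 versus 0 of 5 gives a one-sided probability of about 0.083, which does not reach the conventional 0.05 threshold. At this group size, a difference that looks striking cannot be separated from chance on statistical grounds alone, which is why historical control data and experienced judgment carry so much weight.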
In conclusion, pathologists encounter numerous issues that can confound interpretation of a study. Consistency and proper training at all levels of the process are key. Thorough knowledge of some of the more common types and causes of confounders is needed to aid pathologists in differentiating artifacts and incidental findings from test article–related effects. In addition, peer review should be incorporated whenever possible to help validate the accuracy of any study.
Footnotes
Author Contributions
K. Funk and M. McVean contributed to conception and design and critically revised the manuscript. L. Morton, M. Sanders, W. Reagan, and T. Crabbs contributed to acquisition, analysis, and interpretation, drafted the manuscript, and critically revised the manuscript. All authors gave final approval and agree to be accountable for all aspects of work ensuring integrity and accuracy.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
