Abstract
Standard components of nonclinical toxicity testing for novel pharmaceuticals include clinical and anatomic pathology, as well as separate evaluation of effects on reproduction and development to inform clinical development and labeling. General study designs in regulatory guidances do not specifically mandate use of pathology or reproductive end points across all study types; thus, inclusion and use of these end points are variable. The Scientific and Regulatory Policy Committee of the Society of Toxicologic Pathology (STP) formed a Working Group to assess the current guidelines and practices on the use of reproductive, anatomic pathology, and clinical pathology end points in general, reproductive, and developmental toxicology studies. The Working Group constructed a survey sent to pathologists and reproductive toxicologists, and responses from participating organizations were collected through the STP for evaluation by the Working Group. The regulatory context, relevant survey results, and collective experience of the Working Group are discussed and provide the basis of each assessment by study type. Overall, the current practice of including specific end points on a case-by-case basis is considered appropriate. Points to consider are summarized for inclusion of reproductive end points in general toxicity studies and for the informed use of pathology end points in reproductive and developmental toxicity studies.
Background and Introduction
The International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) has developed several nonclinical safety guidelines, including those for development of drugs (ICH M3(R2) 2009a), drugs for use in advanced cancer (ICH S9 2009b), and biopharmaceuticals (ICH S6(R1) 2011), all of which acknowledge assessment of the reproductive tract and/or reproductive function as an expected component of the nonclinical safety assessment. The ICH S5(R2), which is currently under revision, provides further specific guidance for assessment of reproductive function, as well as detection of maternal and developmental toxicity with exposures during pregnancy (2005). However, the role of pathology end points in these assessments is less clear; specific guidance for inclusion of histologic end points in reproductive toxicity studies is limited, and there are no definitive requirements for the inclusion of clinical pathology end points in reproductive or juvenile toxicity studies or for the addition of reproductive toxicity end points to general toxicity studies in pharmaceutical development. For the agrochemical and food industry, there are more detailed guidelines with regard to collecting organ weights and tissue evaluation in general toxicity studies (U.S. Environmental Protection Agency 1998; Center for Food Safety and Applied Nutrition [CFSAN] 2000); however, since these guidelines do not address pharmaceutical development, the available best practices will be described and referenced. The combined experience of pathologists and reproductive toxicologists working in nonclinical safety assessment supports the assertion that, on a case-by-case basis, additional histologic and clinical pathology end points may be useful to include in reproductive toxicity studies, and likewise reproductive end points in general toxicity testing.
As with inclusion of behavioral and functional assessments in repeat-dose toxicity studies (Redfern et al. 2013), inclusion of reproductive toxicity or pathology end points in studies that otherwise do not require them can be viewed as part of an overall strategy to potentially reduce animal use and candidate drug attrition and increase understanding of preclinical–clinical translation. Additionally, major health authorities require consideration of nonclinical juvenile toxicity evaluation when appropriate in support of pediatric use. Currently, region-specific guidance documents from the U.S. Food and Drug Administration (FDA 2006), European Medicines Agency (EMA 2008), and the Japanese Ministry of Health Labor and Welfare (MHLW 2012) leave the juvenile toxicity strategy, including the choice of pathology end points, flexible so that case-by-case approaches can be scientifically justified. The Organization for Economic Cooperation and Development also describes an extended 1-generation test (Test 443; 2011). In addition to these regional guidance documents, development of an international harmonized guidance for nonclinical studies supporting pediatrics, the ICH S11, has been initiated.
In an effort to understand the current regulatory context and practices regarding the use of reproductive and pathology end points for the assessment of reproductive, developmental, and juvenile toxicity in pharmaceutical drug development, the Scientific and Regulatory Policy Committee of the Society of Toxicologic Pathology (STP) formed a Working Group. The purpose of this Working Group was to gain a broadly informed view of these activities, and thus the group included participants who are diplomates or members of the STP, American Board of Toxicology, American College of Veterinary Pathologists, American Society of Veterinary Clinical Pathologists, Society of Toxicology, Teratology Society, and the International Life Sciences Institute Health and Environmental Sciences Institute Developmental and Reproductive Toxicity (DART) Technical Committee.
The Working Group reviewed current practices for the inclusion of reproductive end points in general toxicity studies and pathology end points in reproductive and juvenile toxicity studies, in part through development of a survey of current practices used by pathologists and reproductive toxicologists. This survey was conducted through the STP, and responses for each participating organization were collected and tabulated for analysis by the Working Group. One consequence of requesting combined responses from each organization is that there was no direct enumeration of the total number of individuals who contributed to the results overall, or the absolute number of studies reported. Some organizations gathered input from several individuals for their responses, whereas a single individual may have represented other organizations. However, useful information was evident in the numerical responses for defining areas of (a) common practice, (b) variable or inconsistent practices, and (c) lack of cross-industry experience or consensus.
There were approximately 100 responding organizations or individuals overall, representing contributions from industry, contract research, consulting, government, and academia. Regions represented were global, with the majority of responses coming from North America and Europe, but also from Japan, Korea, and Africa. Approximately 80% included input from their pathology function, and 65% included input from their reproductive toxicology function. Not all respondents answered all questions, and in some instances the comments reflected a lack of direct experience in the subject area queried or responses were inadequately informative to draw conclusions. Overall, most questions were answered by about half of the participating organizations or individuals. The results were compiled centrally by the STP; responding organizations or individuals were anonymous to the Working Group. Both the numerical results and comments provided by respondents were comprehensively reviewed, and results were summarized in tabular form according to study type (DART, juvenile, or general toxicity study; see Tables 1–6). In many instances, the interpretation of results was strengthened by the comments from contributors, which are also discussed.
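The tabulation convention used in the survey tables (pooling two conditional response categories into "Maybe" and highlighting the response "bucket" with the greatest percentage) can be sketched as follows. This is only an illustrative sketch, not the procedure used by the STP: the "Yes" and "No" labels and the example counts are hypothetical, and only the two pooled category names are taken from the table footnotes. Percentages are computed over the respondents who answered a given question, since not all respondents answered all questions.

```python
from collections import Counter

# The two conditional categories pooled into "Maybe" (names from the table footnotes).
POOLED = {
    "triggered by previous observations",
    "potential consequences of pharmacological target modulation",
}


def tabulate(responses):
    """Return (percentage per bucket, bucket with the most responses) for one parameter."""
    if not responses:
        raise ValueError("no responses were given for this parameter")
    buckets = Counter("Maybe" if r in POOLED else r for r in responses)
    total = sum(buckets.values())
    percentages = {bucket: round(100 * n / total) for bucket, n in buckets.items()}
    modal_bucket = max(buckets, key=buckets.get)
    return percentages, modal_bucket


# Hypothetical responses to a single survey parameter (14 respondents answered).
example = (
    ["No"] * 5
    + ["triggered by previous observations"] * 4
    + ["potential consequences of pharmacological target modulation"] * 3
    + ["Yes"] * 2
)
percentages, modal_bucket = tabulate(example)
print(percentages, modal_bucket)
```

In this hypothetical case neither conditional category alone outnumbers "No", but the pooled "Maybe" bucket does, which shows how pooling can change which bucket is highlighted.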
Table 1. Survey Responses on the Incorporation of Developmental and Reproductive Toxicity End Points in General Toxicity Studies.
Note. Boldface denotes the “bucket” with the greatest percentage of responses for the parameter.
ᵃ “Maybe” represents both “triggered by previous observations” and “potential consequences of pharmacological target modulation” (combining these was based on feedback from the respondents’ comments).
ᵇ Both males and females.
Table 2. Inclusion of Reproductive/Endocrine Organ Weights in General Toxicity Studies.
Note. Boldface denotes the “bucket” with the greatest percentage of responses for the parameter.
ᵃ “Maybe” represents both “triggered by previous observations” and “potential consequences of pharmacological target modulation” (combining these was based on feedback from the respondents’ comments).
ᵇ Both males and females.
Table 3. Recording of Sexual Maturity in Microscopic Observations Data for General Toxicity Studies.
Note. Boldface denotes the “bucket” with the greatest percentage of responses for the parameter.
ᵃ “Some” is a composite of categories, including “specific study types,” “based on pharmacology/target,” “triggered by previous findings,” and “only if the nonhuman primate (NHP) is the only toxicology species used.”
Table 4. Confidence of Pathology Group in Histologic Evaluation of Sexual Maturity.
Note. Boldface denotes the “bucket” with the greatest percentage of responses for the parameter.
ᵃ “Majority” indicates most pathologists of the responding organization.
Table 5. Survey Responses about Incorporation of General Toxicity End Points in Developmental and Reproductive Toxicity Studies.
Note. Boldface denotes the “bucket” with the greatest percentage of responses for the parameter. EFD = embryofetal development; NHP = nonhuman primate; PPND = pre- and postnatal development.
ᵃ “Maybe” represents both “triggered by previous observations” and “potential consequences of pharmacological target modulation” (combining these was based on feedback from the respondents’ comments).
Table 6. Survey Responses about Incorporation of General Toxicity End Points in Juvenile Toxicity Studies.
Note. Boldface denotes the “bucket” with the greatest percentage of responses for the parameter.
ᵃ “Maybe” represents both “triggered by previous observations” and “potential consequences of pharmacological target modulation” (combining these was based on feedback from the respondents’ comments).
Survey results were used as a basis for understanding current industry practices. The Working Group then developed an integrated assessment based on these practices, as well as current regulatory guidelines, review of the scientific literature, and personal experience. A summary of points to consider is provided, acknowledging that inclusion of end points is often contingent on program context, such as mechanism of action, target distribution, intended patient population, and expected or established toxicity. Where best practices are already established, these specific recommendations are referenced. The integrated assessment by the Working Group is summarized by study type in Tables 7 and 8.
Table 7. Working Group Integrated Assessment of Current Practices and Considerations for the Inclusion of Reproductive End Points in General and Juvenile Toxicity Studies for Pharmaceutical Drug Development.
Note. NHP = nonhuman primate.
ᵃ “Complete” refers to clinical pathology, organ weight, and anatomic pathology end points commonly included in general toxicity studies.
ᵇ Recording may be limited to reproductive immaturity, with reproductive maturity being the default as long as this is stated in the report.
ᶜ “For cause” indicates that these would only be added when prior data or pharmacology and mechanism suggest a potential effect.
Table 8. Working Group Integrated Assessment of Current Practices and Considerations for the Inclusion of Pathology End Points in Developmental and Reproductive Toxicity Studies for Pharmaceutical Drug Development.
Note. Controls should be utilized when evaluating any end points. Dogs are generally not utilized for fertility, embryofetal, or pre- and postnatal evaluations.
ᵃ Clinical pathology: “standard” refers to end points commonly tested in general toxicology studies; “targeted” refers to limited end points used to evaluate efficacy markers, pharmacologic end points, or specific toxicity concerns.
ᵇ Histopathology: “complete” refers to tissues commonly evaluated in general toxicology studies; “targeted” refers to limited tissues used to evaluate efficacy markers, pharmacologic end points, or specific toxicity concerns.
ᶜ When fertility end points in nonhuman primate (NHP) are assessed as part of a general toxicity study, a standard panel of organ weights, including reproductive tissues, is typically included.
The current regional and harmonized guidelines are not particularly specific with regard to clinical or anatomic pathology end points in reproductive or juvenile toxicity studies or reproductive end points in general toxicity studies. Rather, these guidance documents provide a fair amount of flexibility in approach with an emphasis on justifying the scientific rationale for each program. This flexibility was echoed in many survey responses, which favored the addition of “nonstandard” assessments on a “for cause” basis for both general and reproductive/developmental studies. Specific considerations are outlined below for general toxicity studies, DART studies, and juvenile toxicity studies. The scope of the survey was limited to species most frequently and broadly used for general toxicity testing (rat, dog, and nonhuman primate [NHP]), although it is acknowledged that the rabbit, not the dog, is more commonly used in DART testing. Many principles for rodents may also apply to rabbits for study designs in which they are used; therefore, they will not be discussed separately. Finally, there is a brief discussion of considerations and challenges when the NHP is the only pharmacologically relevant species, such as with many highly specific biopharmaceuticals.
Inclusion of Reproductive End Points in General Toxicity Studies
Evaluation of Antemortem Reproductive End Points in General Toxicity Studies
In the survey of toxicologic pathologists and reproductive toxicologists conducted by this Working Group, the majority of respondents agreed that they do not routinely include antemortem reproductive assessments in their general toxicity studies. Survey results for antemortem end points are summarized in Table 1. For the male reproductive system, end points queried included semen (dog/NHP only) or terminal sperm analysis (rat), testicular volume, hormone analysis, and/or mating trials. For the female reproductive system, hormone analysis, in-life cyclicity, and/or mating trials were queried.
Hormone evaluation was the most frequent antemortem assessment added to general toxicity studies for both males and females across species; this was followed by semen/sperm evaluation in males (dogs/NHP only) and in-life cyclicity evaluation in females.
Survey respondents included hormone assessments most often (60–63%) when pharmacology or existing data of the test item suggested a potential issue. In nonrodent species, proper assessment is not typically possible without an altered study design due to long and variable cycle lengths requiring frequent sampling in females, and due to hormone pulsatility in males. For all species, the evaluation of hormones should take into consideration the necessary blood volume, sampling frequency and circadian rhythm for the timing of blood collections, the inherent variability of hormones, the ovarian cycle in nonsynchronized females, and the number of animals required to detect a difference related to the test substance. The number of animals needed to detect a difference in hormones is often severely underestimated. This topic has recently been addressed in several publications (Stanislaus et al. 2012; Chapin and Creasy 2012; Andersson et al. 2013).
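The observation that the required number of animals is often underestimated can be illustrated with a standard two-group sample-size approximation. The sketch below is not taken from the cited publications. It assumes a two-sided comparison of group means under a normal approximation, and the coefficient of variation (CV) and the size of the change to be detected are hypothetical values chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(cv, relative_difference, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison of means.

    Normal approximation:
        n = 2 * ((z_(1 - alpha/2) + z_power) * cv / relative_difference) ** 2
    Both cv and relative_difference are expressed as fractions of the control
    mean (hypothetical inputs, not values from the survey or the cited papers).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * cv / relative_difference) ** 2
    return ceil(n)


# Hypothetical scenario: hormone CV of 50%, aiming to detect a 30% change.
print(animals_per_group(cv=0.5, relative_difference=0.3))
# Doubling the variability roughly quadruples the requirement.
print(animals_per_group(cv=1.0, relative_difference=0.3))
```

Under these assumed inputs, the estimate exceeds 40 animals per group, which is well above the group sizes commonly used in general toxicity studies, particularly in nonrodent species. A formal design would account for the actual distribution of the hormone, repeated sampling, and cycle stage.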
In dogs and NHP, serial assessment of semen may be added to a chronic or subchronic study design, and testicular volume may be added as a longitudinal measure of potential effects on testis size. In rodent studies, serial assessment is not possible since the approach is typically limited to sperm collected from the cauda epididymis or vas deferens at the time of necropsy. Based on Working Group experience, mating trials are infrequently added, but can obviate the need for a stand-alone study of fertility when included for males in chronic rodent studies (Mitchard, Jarvis, and Stewart 2012). As an example, although unusual, a mating trial added to a 6-month rat study detected a functional effect on fertility in the absence of testicular histopathology and demonstrated increased sensitivity relative to the standard 2- to 4-week premating period used in fertility studies (Powles-Glover, Mitchard, and Stewart 2015). Finally, longitudinal monitoring of cycling is feasible by vaginal swab examination for evidence of menses in NHP and lavage for vaginal cytology in rodents but is infrequently applied in the general toxicity study setting.
Thus, overall it is recognized that, on a case-by-case basis, directed assessments of the male or female reproductive systems can be included in general toxicity studies. These may be prompted by the specific pharmacology or pharmacologic targets or based on previous study findings. However, the potential impact on other study end points and the limitations of the general toxicity study environment should be carefully considered for successful implementation.
Evaluation of Reproductive Organ Weights in General Toxicity Studies
Organ weight evaluations are routinely included in general toxicity studies. In this survey, almost all (93–98%) respondents routinely weighed the testes in general toxicity studies conducted in rodents, dogs, and NHP; ovaries (83–91%), prostate (80–87%), and pituitary gland (71–73%) were also commonly weighed in all species. The epididymides were weighed commonly in rats but less consistently in dogs or NHP. The seminal vesicles were infrequently and inconsistently weighed across species, and in the rodent, they were sometimes collected with the prostate. The uterus was weighed by approximately 50% of respondents across species. Survey responses for organ weight data in general toxicity studies are summarized in Table 2.
For tissues related to the reproductive system, the STP has recommended weighing the pituitary gland in both sexes for all species except mice, testes in all species, and epididymides and prostate in rats and in other rodent and nonrodent species on a case-by-case basis (Sellers et al. 2007). Seminal vesicle weights in rodents generally provide information similar to that afforded by the prostate weight assessment (Sellers et al. 2007). The STP has recommended that female reproductive organ weights, including ovary and uterus weights, should be collected on a case-by-case basis in rodents and nonrodent species and interpreted cautiously due to high interindividual variability with normal cycling (Sellers et al. 2007). The Working Group is in general agreement with this position, while also acknowledging that ovarian weights from rodent studies with animals of comparable age can be useful in detecting test item effects, especially when interpreted with other data such as histopathology or hormone levels. This is consistent with the survey responses indicating that ovarian weights are typically recorded, and uterine weights are less consistently determined.
Organ weight changes in accessory sex glands should be interpreted in light of corroborating data. Fluid loss from the seminal vesicles can alter organ weights significantly, so if this end point is considered necessary, necropsy techniques that prevent leakage may improve the consistency of results across individuals. When seminal vesicle weights or ventral prostate weights are collected carefully, either individually or as part of the accessory sex gland unit (prostate and seminal vesicles), they can offer valuable information because they are sensitive indicators of low testosterone (O’Connor et al. 2000). Although these accessory reproductive organ weights vary with body weight, others, such as testis weight, are not consistently affected (Marty, Johnson, and Carney 2003; O’Connor et al. 2000). While lower accessory sex gland weights may occur as a result of test item–related changes such as hormonal perturbations (e.g., low testosterone), these changes may also occur due to nonspecific factors such as stress or decreased food consumption/body weight (Everds et al. 2013). Thus, caution should be taken to avoid overinterpretation of accessory sex gland organ weight data in isolation.
It should also be noted that sexual maturity, especially in dogs and NHP, has a substantial impact upon male and female reproductive system organ weights. Because of inherent variability between individuals, whether mature or immature, reproductive organ weights in large animals are most valuable when correlated with histologic findings and evaluation of maturity.
Evaluation of Reproductive Histopathology in General Toxicity Studies
Male and female reproductive system tissues are routinely included in the standard (core) protocol-required tissues for histopathologic examination in general toxicity and carcinogenicity studies, as per regulatory expectations. The STP has recommended that this core tissue list should include the pituitary, testes, epididymides, prostate, seminal vesicles, ovaries, uterus, and vagina (Bregman et al. 2003). This recommendation was made for all repeat-dose toxicity and carcinogenicity studies involving all drug classes, routes of administration, study durations, and laboratory animal species (Bregman et al. 2003).
Routine histopathologic examination of the testes should be performed in a stage-aware manner without quantitative staging of spermatogenesis (Creasy 1997; Lanning et al. 2002). Histopathologic examination of the testes should always be accompanied by evaluation of the epididymides. Organ weight changes in either the testes or epididymides, or histopathologic changes in the epididymides (such as reduced sperm or differences in the amount/morphology of luminal cellular debris), should be correlated to histologic findings in the testis.
Likewise, routine histopathologic examination of both ovaries for potential ovarian toxicity should be performed using a 2-tier approach (Regan et al. 2005). This approach uses qualitative examination of both ovaries in conjunction with microscopic examination of other female reproductive tissues, organ weight evaluation, and awareness of estrous/menstrual cycle stage as a first tier. If warranted based on the qualitative assessment, this may be followed by additional second-tier assessments or studies to include monitoring of estrous/menstrual cyclicity, hormones, and/or ovarian follicle counting, although follicle counts are not a routine assessment for pharmaceutical development (Regan et al. 2005).
Sexual Maturation
The evaluation of sexually immature animals does not provide an accurate assessment of the potential for toxic effects on the adult reproductive system. While most general toxicity studies in rodent species use animals that are sexually mature at termination, it is not uncommon for dogs and NHPs used in general toxicity studies to be sexually immature. Thus, caution must be used in assessing the risk for effects on spermatogenesis, cyclicity, or other aspects of reproductive morphology and function in large animal studies. Documentation of sexual maturity provides additional context for review and interpretation of the study data. In particular, studies intended to evaluate potential effects on fertility require mature animals, while immature animals may support pediatric use; these are discussed further in subsequent sections.
Sexual immaturity is often recorded in the histopathology data for dogs and NHP when present, but histologic evidence of maturity is not routinely recorded. The survey responses highlight these current practices (Table 3) and confirm the more frequent documentation of immaturity for males than females. Survey comments also indicated that recording sexual maturity, while not standard, can be important for interpretation of findings and hazard identification or in the development of pediatric indications (see discussion on pediatrics below). Of the 43 comments provided by respondents, about half clearly supported documenting immaturity, maturity, or both, with the remainder mostly neutral. There were 4 comments that specifically noted that recording immaturity had been helpful in avoiding additional studies in juvenile animals, but there were also 5 comments that indicated that inclusion of sexual maturity status had contributed to a program delay or request for additional nonclinical studies. Although further context was not provided for these responses, the current regulatory environment supports inclusion of this information because it enables appropriate risk assessment for the target population.
Based on the survey responses, most pathologists are able to discern sexual maturity (Table 4), acknowledging that it can be difficult to definitively categorize some adolescents. This is especially true for pubertal females, where ovarian and uterine morphology can be similar to that of young adult females in the earliest preovulatory phases of their cycle. This is further complicated in NHP by the wide variability in age at onset of regular menses.
When maturation status was documented, most respondents preferred the use of the terms “immature” and “mature” with infrequent use of alternate terms such as “prepubertal,” “peripubertal,” “adolescent,” or “adult” in all species. The histologic features used most frequently to confirm sexual maturity in males of all species were the presence of spermatogenesis/spermiogenesis in the testis and sperm content in the cauda epididymis. Accessory sex organ epithelium and/or secretions were not considered good indicators of sexual maturation. The histologic features used most frequently to confirm sexual maturity in females of all species were the presence of corpora lutea and evidence of cyclicity. Mammary gland formation occurs early during puberty and therefore was generally not considered a good indicator of full sexual maturity. Age, body weight, and organ weights in both males and females, as well as cyclicity in females, were commonly used as nonhistological criteria in the assessment of sexual maturity.
While the Working Group is in general agreement with survey respondents regarding the practices of assessing sexual maturity based on histology of reproductive tissues in dogs and NHP, age and body weight are not considered robust indicators of maturity in these species. Both Working Group experience and survey responses indicate that the omission of information documenting maturity status in the study record may undermine interpretation of other data, such as testicular and ovarian weights, and can lead to inconsistent assumptions regarding potential effects on spermatogenesis or cyclicity. For these reasons, documentation of sexual maturity (whether immature or mature) should be considered. Although not routinely included in test system characterization for toxicity studies, dentition, physeal closure, or other objective measures to confirm age and/or maturity could also be considered (Bowen and Koch 1970; Smith, Crummett, and Brandt 1994; Kilborn, Trudel, and Uhthoff 2002).
Recognizing and distinguishing test item–related findings from spontaneous or age-related changes in immature or maturing animals, especially dogs and NHP, is an important skill for pathologists reviewing general toxicity studies. This can be a challenging exercise, even for pathologists experienced in the evaluation of the reproductive tract, due to the small number of animals used and the high variability among individuals. Publications describing testicular development (Picut, Remick, et al. 2015; Haruyama et al. 2012; Campion et al. 2013), common spontaneous histologic features of the testes during the process of sexual maturation (Goedken, Kerlin, and Morton 2008; Rehm 2000; Sato, Doi, Kanno, et al. 2012; Sato, Doi, Wako, et al. 2012; Thuilliez et al. 2014), and ovarian development (Picut, Dixon, et al. 2015; Picut et al. 2014) represent excellent references for this topic.
Specific Considerations for Evaluation of the NHP Reproductive System in General Toxicity Studies
NHPs are often the only pharmacologically relevant species for nonclinical safety testing of highly targeted biotherapeutics, and the cynomolgus monkey is the species most often evaluated. While the reproductive physiology and endocrinology of NHP and humans are similar (Weinbauer et al. 2008), the use of NHP imposes some practical limitations on fertility testing. For example, NHPs usually have single births, relatively low conception rates, and high rates of fetal loss, which would necessitate the use of unacceptably large numbers of animals (Martin and Weinbauer 2010). In recognition of these issues, the addendum to ICH S6(R1) formalized the acceptance of the use of organ weights and histopathologic examination of the reproductive tract in at least 1 study of at least 3 months of duration using mature NHP to support the evaluation of potential effects on fertility (2011). This has led to a need for assessment of maturity as a component of prestudy screening, usually via assessment of sperm or cyclicity, and for toxicologic pathologists to recognize morphologic features enabling them to verify that animals exposed to a pharmaceutical candidate are sexually mature at the end of the study. As indicated in Table 3, a majority (71%) of respondents indicated that maturity status is recorded for male NHP, although fewer (43%) reported routine recording of maturity status in females. In males, criteria for evaluation included histologic assessment of spermatogenesis, spermiogenesis, and/or presence of sperm in the epididymides. Survey respondents most commonly used presence of a corpus luteum, indicating that ovulation had occurred, as evidence of sexual maturity in females. Some caution is needed when a single corpus luteum or corpus albicans is identified in a young female NHP, as isolated instances of ovulation may occur prior to the onset of regular ovulation and menses (Beckman and Feuston 2003; Cline et al. 2008; Buse et al. 2003). 
When necessary, a weight of evidence approach using observations of serial menses prior to study initiation may be warranted. One approach is to require at least 2 cycles of at least 2 days of duration and at least 20 days apart when selecting mature females (Weinbauer et al. 2015).
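The prestudy selection rule described above can be expressed as a simple check on observed menses records. The sketch below is illustrative only and assumes one particular reading of the rule: at least 2 menses episodes, each lasting at least 2 days, with onsets at least 20 days apart. The `MensesEpisode` record and the function name are hypothetical and are not drawn from Weinbauer et al. (2015).

```python
from dataclasses import dataclass


@dataclass
class MensesEpisode:
    start_day: int      # prestudy day on which bleeding was first observed
    duration_days: int  # consecutive days with observed bleeding


def meets_maturity_criterion(episodes, min_episodes=2, min_duration=2, min_gap=20):
    """Return True if the prestudy menses records satisfy the assumed selection rule.

    Only episodes lasting at least `min_duration` days qualify, and each counted
    onset must fall at least `min_gap` days after the previously counted onset.
    """
    qualifying = sorted(
        (e for e in episodes if e.duration_days >= min_duration),
        key=lambda e: e.start_day,
    )
    count = 0
    last_start = None
    for episode in qualifying:
        if last_start is None or episode.start_day - last_start >= min_gap:
            count += 1
            last_start = episode.start_day
    return count >= min_episodes


# Hypothetical records for three candidate females.
regular = [MensesEpisode(3, 3), MensesEpisode(31, 2)]
too_close = [MensesEpisode(3, 3), MensesEpisode(12, 2)]
too_short = [MensesEpisode(3, 1), MensesEpisode(31, 3)]
print(meets_maturity_criterion(regular))
print(meets_maturity_criterion(too_close))
print(meets_maturity_criterion(too_short))
```

Such a check addresses only the menses records. As noted above, it would be one element of a weight of evidence approach and would not replace histologic confirmation of maturity at the end of the study.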
Histologic evaluation can be useful to demonstrate an absence of effects on the structural integrity of the reproductive tract in both sexes and the synchronicity of cyclical features across reproductive tissues in individual females. However, these end points do not directly assess reproductive function as compared to a number of end points that more directly evaluate potential impact on fertility, such as sperm parameters, fertilization, or implantation. In recognition of this, the addendum to ICH S6(R1) also directs that, when pharmacology or prior findings with the candidate dictate, the inclusion of additional specialized antemortem assessments such as reproductive hormones, cycling patterns, and sperm evaluations may be desirable in repeat-dose toxicity studies (2011). Although these additional evaluations may be useful to more fully characterize an effect, changes in the male antemortem end points in isolation are not expected to be more sensitive indicators of toxicity than histopathology (Cappon et al. 2013). Likewise, inherent variability in cycle stage and progression, along with small group sizes, limits the utility of monitoring cycling in female NHP (Bussiere et al. 2013).
While microscopic evaluation is an important criterion for verification of maturity, importance should also be given to prestudy selection of NHP for inclusion in general toxicity studies of ≥3 months of duration. In females, the occurrence of regular menses is good evidence of sexual maturity (Weinbauer et al. 2008), and survey respondents reported using age and evidence of cyclicity most frequently to select mature females. However, many mature female NHPs do not cycle regularly or have irregular or anovulatory cycles due to stress and hierarchical status. For males, respondents reported using age most often for selecting sexually mature animals (data not shown). However, age is not entirely dependable as a marker of sexual maturity (Luetjens and Weinbauer 2012; Vogel 2000). When NHPs are the only pharmacologically relevant species for a highly targeted biotherapeutic, and there is a need to ensure that sexually mature animals are selected for the study, evaluation of an ejaculate for the presence of sperm prior to treatment is a highly reliable functional end point (Luetjens and Weinbauer 2012). Consideration should also be given to the many social and management factors that have been shown to impact the reproductive system in both sexes and thus may complicate interpretation (Weinbauer et al. 2008; Niehoff, Bergmann, and Weinbauer 2010; Cline et al. 2008; Bussiere et al. 2013).
Histologic Staging of the Female Reproductive Cycle
There is no regulatory guidance or requirement for histologic staging of the reproductive cycle in mature female animals in general toxicity studies. The survey did not query details of staging; most respondents do not routinely record cyclicity, although it is considered for cause in the rodent and NHP (Table 1). As with evaluation of any organ or organ system, the pathologist must understand the underlying species-specific physiology to properly interpret apparent differences in morphology. This is particularly important for the female reproductive tract, since the morphology is highly synchronized and extremely sensitive to hormonal perturbation. Normal cyclicity can be disrupted by environmental factors such as lighting and social hierarchy, or by secondary factors such as stress and body weight changes, and these in turn will affect the morphologic appearance of the various portions of the reproductive tract (Keane et al. 2015; Bussiere et al. 2013; Everds et al. 2013; Weinbauer et al. 2008).
For rodent general toxicity studies, the morphology of each of the reproductive tract organs (ovaries, uterus, cervix, and vagina) as well as mammary gland and endocrine tissues should be assessed to identify possible changes due to a potential test item effect or hormonal perturbations. The microscopic evaluation of all reproductive tissues should include an awareness of the normal variation in morphology associated with stages of the estrous cycle (Dixon et al. 2014). Recognition of the morphologies in these organs that are associated with synchronous progression through the estrous cycle is important, and discrepancies within animals may indicate an abnormality. The rapid estrous cycle (4–5 days) in the typical rodent used in general toxicity studies may enable detection of alterations even in studies as short as 2 weeks of duration (Sanbuissho et al. 2009), although reproductive senescence may confound assessment in studies longer than 3 months of duration (Ishii et al. 2012; LeFevre and McClintock 1988; Sone et al. 2007). When alterations are suspected based on morphologic examination of the reproductive tract, documenting stages of the estrous cycle may suggest potential mechanisms (central vs. peripheral) underlying the observed changes. However, differences in the incidence of cycle stages across groups based only on vaginal histology may represent normal variation and should not be used in isolation as evidence of a test item–related effect. Evaluation may include consideration of other data from the current or previous studies, such as the morphology of other reproductive tissues, changes in organ weights, effects on reproduction or fertility, values of hormones, and/or alterations in cyclicity based on vaginal cytology, when these additional data are available.
The Working Group supports the International Harmonization of Nomenclature and Diagnostic Criteria recommendation that routine documentation of stage of the estrous cycle in rodents is not necessary in general toxicity studies (Dixon et al. 2014); however, evaluating the cycle stage is a routine part of the microscopic evaluation and has been described in detail for the rat (Westwood 2008), dog (Rehm, Stanislaus, and Williams 2007), and NHP (Attia 1998; van Esch et al. 2008). If the pathologist chooses to record the cycle stage in a standard toxicity study, the information should not be considered in isolation as evidence of a test effect unless accompanied by other correlative or supporting data. Notably, there are circumstances, such as evidence of effects on fertility, differences in organ weights, or apparent changes in reproductive morphology, in which estrous cyclicity data would provide additional weight of evidence for assessments or clarification of findings observed. In such circumstances, the pathologist should consider documentation of cycle stage.
In dogs and NHPs, detecting effects on estrous/menstrual cyclicity from the microscopic examination is more problematic because of their long cycle duration and low numbers of animals per group. The laboratory beagle under controlled environmental conditions is monoestrous, with long periods of diestrus/anestrus (Rehm et al. 2007), so often the females will be in diestrus/anestrus and will not have ovulated during the dosing period of short-term toxicity studies. Although adult cynomolgus monkeys ideally have regular menstrual cycles lasting approximately 28 days (Weinbauer et al. 2008), cycle irregularities are frequent. Social hierarchy plays an important part in menstrual cyclicity, and both adults and adolescents often have anovulatory cycles, further complicating the assessment in NHP (Cline et al. 2008). Assessment of cyclicity during the dosing phase using vaginal cytology (dog) or vaginal swabs for evidence of menses (NHP), and potentially hormone evaluation, may be important in helping to characterize the onset and progression of variations in morphology detected histologically. However, attempting to use stage alone may present interpretive difficulties in determining test item–related effects (Bussiere et al. 2013). Therefore, the considerations for rodents apply to dogs and NHPs as well: evaluate female reproductive tissues with cycle stage awareness, but routine recording may not be warranted in the absence of apparent changes in the reproductive tract, expected pharmacology, or other findings in the current or previous studies. Likewise, if recorded, the information should not be interpreted as an end point in isolation.
Evaluation of the Mammary Gland in General Toxicity Studies
The female mammary gland is one of the core tissues recommended for examination by the STP in nonclinical toxicity studies (Bregman et al. 2003). In both males and females, it can serve as a sensitive end point for test item–related changes in hormone levels or receptor signaling in the hypothalamic–pituitary–gonadal axis (Lucas et al. 2007). In female rodents, the mammary gland offers little utility in assessing sexual maturity as the histology of the gland is similar in prepubertal and sexually mature virgin animals (Russo, Tewari, and Russo 1989). Cycle-dependent changes in glandular histology have been reported in the rat (Schedin, Mitrenga, and Kaeck 2000); however, the differences in appearance are subtle and of little practical aid in differentiating phases of the estrous cycle in standard nonclinical safety studies. In comparison, the histology of the mammary gland in female dogs and NHPs may provide corroborative information to differentiate mature and immature animals (Cline and Wood 2008; Rehm et al. 2007; Harleman and Foley 2001); in fact, the mammary gland is one of the first tissues to undergo peripubertal developmental changes in NHP, including transient pubertal gynecomastia in males (Cline and Wood 2008). In mature dogs, the appearance of the mammary gland changes predictably throughout the estrous cycle (Chandra, Cline, and Adler 2010), but its morphology in NHP changes little during the menstrual cycle (Stute et al. 2004). In males, the mammary gland has limited development in most species; the rat is an exception, with lobuloalveolar development occurring under the influence of androgens and tubuloalveolar development occurring under the influence of antiandrogens or prolactin (Lucas et al. 2007; Cardy 1991). While not always included in the protocol-specified tissue list for male rats, mammary gland is often present in sections of skin, and findings should be recorded if present.
Although mammary gland morphology does reflect endocrine influence, the majority of respondents in the current survey indicated that they generally do not use this tissue to document or confirm sexual maturity (data not shown). In females, corpus luteum formation and evidence of estrous/menstrual cyclicity were generally given greater importance in determination of sexual maturity. For assessment of cyclicity, only about half of the respondents indicated that mammary gland is useful, mostly in dogs, and it was not cited as a criterion for evaluation as often as were the ovaries, uterus, and vagina.
In summary, review of the mammary gland in general toxicity studies provides some limited information for assessment of maturity and may be helpful in assessing synchrony with other reproductive tissues in individual animals. A major value in assessing this tissue lies in its sensitivity to effects on reproductive hormonal balance. Thus, evaluation of the mammary gland, especially in the context of findings in the reproductive tract, may provide corroborative or mechanistic information in the evaluation of disruption of reproductive hormone balance. In addition, it may augment the assessment of sexual maturity in the dog. Overall, the common industry practices are consistent with the literature and assessment by the Working Group.
Summary of Working Group Considerations for Incorporation of Reproductive End Points in General Toxicity Studies
Consistent with general practice and published best practices, most general toxicity studies include a gross and histologic assessment of the male and female reproductive tract. The survey results largely confirmed, and the Working Group does not suggest changing, these established practices. In general, survey respondents indicated that specialized reproductive assessments are not routinely included in general toxicity studies but may be added for cause. The Working Group agrees with the STP recommendation of “stage aware” assessment of both male and female reproductive tissues in mature animals (Creasy 1997; Dixon et al. 2014; Lanning et al. 2002). Additional reproductive end points in mature animals, including sperm assessments, estrous or menstrual cycle stage, hormone evaluations, and mating trials, may be considered for cause but are not routinely included. Routine evaluation of mammary gland morphology in both sexes may also add to the interpretation of findings in reproductive tissues, particularly in females. Based on the survey results and Working Group experience, documentation of the stage of male and female sexual maturity should be considered for nonrodents (dog and NHP). Finally, findings in the reproductive tract should be interpreted cautiously in the context of potentially confounding factors, including stress or body weight loss. These integrated considerations are summarized in Table 7 for general toxicity studies.
Inclusion of Pathology End Points in Reproductive Toxicity Studies
Pathology End Points in DART Studies
Routine DART studies are not intended to support intentional pharmaceutical use during pregnancy but rather to identify effects on development and reproduction in animals for informing human risk assessment. There are few regulatory guidelines governing inclusion of clinical and anatomic pathology end points in reproductive toxicity studies. The current ICH S5(R2) guideline on detection of toxicity to reproduction for medicinal products and toxicity to male fertility indicates that observations should include the preservation and possible histologic evaluation of organs from adults with macroscopic findings and appropriately matched tissue from control animals in reproductive toxicity studies, including fertility and early embryonic development studies, embryofetal development (EFD) studies, and pre- and postnatal development (PPND) studies. For EFD studies, macroscopic observations should also include evaluation of fetuses for birth defects and evaluation of the placenta. For fertility studies, testes, epididymides, prostate gland, ovaries, and uteri should be preserved from all animals for possible histologic evaluation on a case-by-case basis but can then be discarded after completion and reporting of the study. In ICH S5(R2), organ weights are not specifically required unless there were reproductive organ weight effects in general toxicity studies or evaluation of reproductive organ weights at the dose levels tested was not done previously. ICH S5(R2) states that some minimal maternal toxicity is expected to be induced in the high-dose group of EFD toxicity studies unless a limit dose of 1 g/kg or saturation of exposure is reached. Although it is not clear whether it is necessary to reproduce expected toxicities in the high-dose dams in an EFD study, clinical and/or anatomic pathology findings from the general toxicity program can contribute to the justification of high-dose selection for these studies.
Additional considerations for high-dose justification of EFD toxicity studies that are under discussion as part of the forthcoming revision of the ICH S5 guideline include adequate exposure margin relative to clinical use and/or maximum pharmacodynamic effect. For DART studies with biotherapeutics, the ICH S6(R1) emphasizes the need to limit studies to pharmacologically relevant species and further acknowledges that reproductive studies may not be useful or necessary in cases where there are no relevant nonclinical species for toxicity testing. Overall, the ICH S6(R1) outlines differences from the ICH S5(R2) expectations for reproductive assessments when NHPs are a single relevant species. For example, in most cases, fertility assessments rely on histopathologic assessment of reproductive tissues rather than a breeding end point. The ICH S6(R1) guideline does not include detailed recommendations for specific end points to include when studies are conducted, other than to state that specialized assessments of menstrual cyclicity, sperm analysis, and hormone evaluations be considered if there is a specific cause for concern. Thus, it is common practice to derive NHP fertility information from the general toxicity program with inclusion of mature monkeys in at least 1 study of at least 3 months of duration.
The survey results regarding pathology end points on DART studies included responses to both general questions and study type–specific questions, and results are summarized in Table 5. Overall, when pathology end points (clinical pathology, organ weights, and/or anatomic pathology) were added to DART studies either as routine or case by case, approximately half of the survey respondents indicated that the addition of these specific end points impacted the risk assessment (survey responses were split 53% yes, 47% no). Some additions may have been at the request of regulators, but some respondents indicated that they were helpful in clarifying maternal versus fetal toxicity, evaluating differences in susceptibility (i.e., adult vs. fetus/offspring), justifying dose selections, and/or providing additional weight of evidence for assessments or clarification of findings observed in general toxicity studies. Specific end points added on a case-by-case basis were viewed as generally beneficial, and pharmacodynamic end points in studies with biotherapeutics were considered essential by some respondents to help define a persistence of effect or clarify if findings were a primary or secondary effect of maternal exposure. Concerns with adding pathology end points included difficulty in interpreting the relationship to the test item due to limited appropriate historical control data based on reproductive phase (e.g., gestation, lactation) and identification of findings not seen in general toxicity studies. However, the stated benefits gained (ability to characterize findings by identifying potential mechanisms, clarifying adult vs. infant toxicity or toxicity vs. pharmacology, and determining maternal toxicity vs. direct pharmacology effect on offspring) were considered to be a justification to include additional targeted end points in these studies on a case-by-case basis. 
For example, histology and immunohistochemistry of lymphoid tissues of offspring are often assessed in studies with immunomodulatory drugs (Auyeung-Kim et al. 2009; Vaidyanathan et al. 2011; Martin, Oneda, and Treacy 2007).
Pathology End Points in Male and Female Fertility Studies
The speculative nature of the survey responses and comments confirmed that there is very limited experience with functional dog fertility studies (Daurio et al. 1987; Youssef et al. 1999); therefore, the survey data are not presented here. There is also limited experience across respondents for functional NHP fertility studies; survey data for NHP fertility studies should be interpreted acknowledging that few organizations have firsthand experience with this study type. Although experience with functional fertility studies in dogs and NHPs is limited, evaluation of the male and female reproductive systems in sexually mature animals may offer valuable insights into potential effects on fertility and in some cases may be the best available nonclinical model for evaluation.
Survey comments indicated that clinical pathology is not typically added to fertility studies without scientific justification (e.g., previous findings, pharmacology/target). Although adding clinical pathology is not the default approach, the rationale for doing so appears similar across species: 48 to 57% of respondents would trigger it for cause across species, whereas 27 to 30% of respondents never perform these evaluations (Table 5).
Of all the pathology end points considered across the study types, addition of organ weights to dedicated fertility studies was the only one done by default by the majority of survey respondents, 58 to 74% in rodents and 50 to 58% in NHP (Table 5). Survey data and comments indicated that reproductive tissue organ weights, especially testis weights in males, are often collected in fertility studies despite the absence of a regulatory requirement to include them. However, if a fertility evaluation is performed as part of a chronic toxicity study in mature animals (e.g., when NHP is the only relevant species), then all standard organs for evaluation of general toxicity, including reproductive tissues, would be weighed as part of the necropsy assessment.
No clear trends in the survey data were observed for adding histopathology to fertility studies. For rodents, some respondents indicated that it was included by default (35–42%) or conducted for cause (48%) based on findings (such as fertility and/or organ weight effects; Table 5). In the experience of the Working Group, a gross necropsy including reproductive organ weights is often performed, with preservation of reproductive tissues for possible future histologic evaluation. Thus, potential effects on reproduction can be further characterized in specific animals if necessary, but this is not done routinely, especially if reproductive tissue histopathology data are already available from previously conducted repeat-dose toxicity studies at the same dose range. For NHPs, the survey responses were mixed, but the most common response was to include histopathology (44–48%); this is consistent with histopathology being standard on repeat-dose studies and often used as a primary end point in NHP for nonclinical assessment of potential effects on reproductive tissues that could impact fertility. In summary, the practice is generally consistent with current guidance documents. Inclusion of organ weights and histology is relatively common, even though not currently required for stand-alone fertility studies.
Pathology End Points in EFD Toxicity Studies
Survey results supported the perception that there is very limited experience with dog EFD studies (Holson et al. 2015), as this is not a standard species for this study type; thus, the survey data for this species are not presented here. There is also limited experience across respondents with dedicated NHP EFD studies, although these have been more common in the biotechnology industry, especially prior to the most recent guideline (ICH S6[R1]).
Of responding organizations, 30% routinely added pathology end points (such as clinical pathology, organ weights, and/or histopathology) to define maternal toxicity in EFD studies whereas 48% stated that they would add them if no other signs of maternal toxicity were anticipated (data not shown). The survey did not distinguish between dose range–finding studies, where additional end points to characterize maternal toxicity may be more common, versus pivotal studies, where end points would typically only be added for cause. One respondent indicated that collecting these or specific end points (both maternal and fetal) could help differentiate whether fetal effects were secondary to maternal toxicity or direct (primary) fetal effects. Most NHP EFD studies are likely performed to support biopharmaceutical development, and studies to support those programs may not use maternal toxicity to justify the high dose, but rather exposure and/or pharmacodynamic effects. In this situation, there is limited utility in adding maternal pathology end points on these studies other than to characterize pregnancy-specific pharmacologic effects on the dam. Beyond dose rationale, 47% of respondents indicated that they use general toxicity end points on a case-by-case basis to understand potential pharmacodynamic effects on development.
Survey data confirm that clinical pathology is not typically performed in EFD studies (maternal or fetal) without scientific justification, but when it is triggered it is primarily for maternal evaluation. In dams, clinical pathology is rarely (14–16% of respondents) done by default, and often (27–32% of respondents) is not included at all (Table 5). The high response in the “maybe” category (52–59%), in which inclusion is triggered by pharmacology/target or previous findings, is consistent with the previously discussed 30% of respondents indicating that they would add clinical pathology end points to help characterize maternal toxicity. Only one organization appears to routinely perform clinical pathology assessments on fetuses (rodent only) from EFD studies, and the majority of respondents (64–71%) do not collect fetal clinical pathology for any species. The high response rate for not collecting these data from fetuses may appear inconsistent with the 47% of respondents that indicated they would add general toxicity end points on a case-by-case basis to understand potential effects on development. However, this may reflect the difficulty of interrogating the mechanism in these complex studies, as well as a lack of direct application of these data to inform clinical use or monitoring. There can also be technical/practical limitations on the amount of information that can be obtained from fetal specimens as part of an EFD toxicity study. Based on Working Group experience, if clinical pathology end points from rodent fetuses are deemed useful, it is advisable to collect blood from satellite animals to avoid damaging the specimens for fetal evaluations and to pool samples across the litter if necessary to obtain sufficient blood volumes. Another approach is to collect these samples from a pilot study rather than the definitive study where the fetal exam is the primary outcome.
Sampling earlier gestation age specimens is challenging, as blood volumes are more limited and use of tissue or whole fetal homogenates can be hard to interpret relative to maternal clinical pathology.
Many (76%) respondents indicated that the apparent challenges of interpreting clinical pathology end points in gestating and lactating females (lack of historical data/familiarity) did not impede the interpretation of these end points. Some respondents indicated that if clinical pathology end points were added to a study, it was done in a deliberate fashion and the study design was modified appropriately. In general, the survey comments indicated that opinions on this were split with some respondents stating that concurrent controls have been sufficient, but for others, the lack of gestation age-appropriate historical control data was problematic. At least one respondent stated that maternal clinical pathology is not helpful in EFD studies as it can confuse and complicate regulatory applications, because the no observed effect level may be different when compared to general toxicity studies. Clearly, some maternal clinical pathology end points (such as serum proteins, lipids, and some leukocyte subsets) change over the course of pregnancy (Honda et al. 2008); understanding these changes and their normal variability in specific species is important to appropriately interpret test item–related changes in these end points. In addition, close attention must be paid to the physiologic requirements of the dam so as to avoid undue stress on the animal by added clinical pathology end points and interference with the interpretation of the standard EFD end points.
In the majority of cases, maternal organ weights are either not routinely collected in rodent EFD studies (40%) or triggered on a case-by-case basis (40%; Table 5). Although the survey question did not explicitly exclude gravid uterine weights, in Working Group experience they are typically collected as a reproductive end point for these studies, as are individual fetal weights. Sixty-three percent of respondents do not routinely collect rodent fetal organ weights. In NHP, maternal organ weights are either typically not collected (36%) or they are triggered on a case-by-case basis (48%). In comparison with the infrequent collection of organ weights from rodent fetuses, only 36% responded that they do not collect NHP fetal organ weights (Table 5).
The survey indicates that greater than 50% of respondents would potentially include maternal or fetal histopathology, but it was unclear how often that might actually happen across studies or programs within an organization. Targeted review was considered useful by some respondents, but some (22–47%) indicated that they do not perform maternal or fetal histopathology analyses (Table 5). A low percentage of respondents routinely include histopathology for dams (12–19%) or fetuses (5–12%) in EFD studies (Table 5). Seventy-eight percent of respondents indicated that the challenges of interpreting anatomic pathology in fetuses/pups/infants (such as lack of historical data/familiarity with conduct of these studies) did not impede interpretation. Survey comments consistently indicated that interpretation of microscopic evaluation can be difficult due to low-incidence findings, limited/no historical data on specific (developmental) age of death, differentiating direct or indirect (maternal) fetal changes, differentiating developmental morphology from pathological changes, and confounding of histopathology evaluation by test item effects on growth (developmental delay).
Pathology End Points in PPND Toxicity Studies
The results of the survey on inclusion of pathology end points in rodent and NHP PPND studies are much like those for EFD studies (Table 5); although not considered routine, clinical pathology, organ weight, and histopathology data are incorporated into the maternal and/or fetal aspects of these studies when deemed appropriate based on pharmacology and/or results of previous studies. There are not many recently published examples of rodent PPND studies in the literature, but a recently published compilation of NHP developmental toxicity studies provides examples of PPND studies that included pathology end points (Weinbauer et al. 2015).
The survey data indicated that there is very limited to no experience with dog PPND studies, as this is not a standard species for this study type; thus, survey data for this species are not presented here.
Forty-seven percent of respondents indicated that they add pathology end points (such as clinical pathology, organ weights, and/or histopathology) on a case-by-case basis to understand potential pharmacodynamic effects on development, but 37% of the respondents stated that they do not utilize these end points in this manner.
Survey data confirm that clinical pathology is not typically conducted in rodent PPND studies without scientific justification, although for NHP PPND studies, responses were fairly evenly split between organizations that include clinical pathology as a default, only for cause, or not at all. For rodents, it appears that 33% of organizations do not include clinical pathology of the dams, while 56% of respondents would include it for cause. Most organizations either do not perform clinical pathology on rodent pups (56%) or would trigger inclusion of clinical pathology based on previous findings or on pharmacology/target (38%; Table 5). For NHP, 54% of respondents would include clinical pathology of the dams for cause, and another 27% of respondents routinely conduct clinical pathology evaluations on dams, whereas for NHP infants, the responses were fairly evenly divided across respondents that routinely include, only for cause, or not at all (Table 5).
As described for EFD studies, the majority of respondents stated that the challenges of interpreting clinical pathology end points in gestating and lactating females (such as lack of historical data/familiarity with conduct of these studies) did not impede the interpretation of these end points when controlled appropriately in the context of normal pregnancy and lactation-related changes. Most (82%) of the respondents also indicated that the apparent challenges of interpreting clinical pathology end points in infants/pups (lack of historical data/familiarity) did not impact the interpretation, and 77% responded that there was no negative impact of including clinical pathology end points in their studies (data not shown). It appears that concurrent controls can be sufficient, but respondent comments also indicated that a comprehensive review of rodent offspring hematology historical control values would be useful. As with other situations with limited historical control data and knowledge, it is not uncommon to default to a conservative interpretation that an equivocal effect may be test item related. As discussed with fetal end points in EFD studies, evaluating these end points in offspring, particularly if utilizing a pharmacodynamic biomarker, can facilitate correlation to maternal effects and help determine whether effects are direct or indirect (maternal).
Maternal organ weights are not a standard end point for PPND studies but may be included for cause. Specifically, in rodents and NHPs, organ weights are not typically recorded for dams by 42 to 46% of respondents, but 31 to 32% indicated that they would include these if indicated based on prior findings or mechanism of action (Table 5). In offspring, respondents generally do not record organ weights in rodents (42%), but in NHP, they are either routinely collected (37% of respondents) or included for cause (37% of respondents; Table 5).
The survey indicated that approximately 50% of respondents would trigger maternal or fetal histopathology for individual programs, but it was unclear from the survey how often that might occur. For dams, histopathology is rarely performed (8–17%) or not done (36–39%); however, it may be included for cause by 47 to 54% of respondents (Table 5). For rodent offspring, histopathology is not routinely performed (28% of respondents listed “never”), but 53% of respondents perform histopathology on a case-by-case basis (Table 5). For instance, for a drug with concerns about renal development, evaluation of kidneys from the PPND study from culled neonatal and weanling rat pups could provide information about a potential effect on nephrogenesis as, unlike humans, significant renal development occurs during the first week of postnatal life in rats (Zoetis and Hurtt 2003a). Histopathology for NHP offspring is a relatively common (32%) default practice, and an additional 48% of respondents would include it on a case-by-case basis (Table 5). As mentioned earlier for fetal histopathology in EFD studies, most (78%) of the respondents indicated that, although there were challenges with interpreting histopathology in pups/infants (low-incidence findings, limited/no historical data on specific age of death, differentiation of direct or indirect changes, differentiating developmental dysmorphology from pathological changes, and confounding effects on growth), this rarely resulted in a negative impact on the study interpretation.
Summary of Working Group Considerations for Incorporation of Pathology End Points in Reproductive Toxicity Studies
In the Working Group’s experience, the routine addition of clinical pathology, organ weight, and/or anatomic pathology end points is not expected to change the human risk assessment for effects on development and reproduction. However, the authors’ experience, confirmed by the survey results, indicates that adding selected end points to confirm anticipated pharmacodynamic effects or maternal and/or target organ toxicity may be useful to the interpretation of the study. In some cases, periodic blood samples and/or tissues collected at necropsy could be stored for potential future analysis. However, if unexpected toxicities occur in a DART study, it may be more appropriate to conduct a separate study specifically designed to investigate the unexpected finding. While the guidelines do not require recording of reproductive organ weights (testes, epididymides, and accessory sex glands appropriate to the species) from males in fertility studies, it is a common practice that the Working Group supports because these weights are accurate, easily recorded, cannot be collected retrospectively, and can offer unique insight into the androgen status of animals in the study. Male reproductive tissues should be evaluated histologically, especially if not already assessed in mature males in completed general toxicity studies or if there is antemortem evidence of an effect. Performing standard clinical pathology as a default analysis for rodent fertility studies is not deemed useful; data from repeat-dose toxicity studies can be relied upon instead. For rodent EFD and PPND studies, it is not typical to include maternal or fetal clinical pathology, organ weights, or histopathology as a default practice, but this could be considered on a case-by-case basis for the reasons described above.
One general exception might be the investigation of target-specific changes in clinical pathology, organ weights, or histopathology in dams, fetuses, or offspring to confirm dose or pharmacodynamic effects. When NHP studies are terminal, collection of anatomic pathology data should be considered in the study design. Another case-by-case exception might be the addition of maternal clinical pathology in NHP if maternal toxicity is observed or expected, or to confirm relevant pharmacodynamic effects if no toxicity is expected or observed. These integrated considerations are summarized in Table 8 for DART Studies.
Pathology End Points in Nonclinical Studies Supporting Pediatrics
Pathology End Points in Juvenile Toxicity Studies
The FDA Guidance for Industry, Nonclinical Safety Evaluation of Pediatric Drug Products (FDA 2006) indicates that tissues/organs in the test species should be macroscopically and microscopically evaluated, with particular regard to those undergoing development at the intended time of treatment. Organs to be given emphasis due to ongoing postnatal development include the brain, kidneys, and lungs and the immune, reproductive, endocrine, skeletal, and gastrointestinal systems. This guidance also includes several tables with estimated timing for completion of developmental processes, such as completion of nephrogenesis, or events, such as puberty (FDA 2006). The EMA Committee for Human Medicinal Products Guideline on the need for nonclinical testing in juvenile animals on human pharmaceuticals for pediatric indications (2008) further acknowledges that clinical pathology determinations can also be useful, but they may be limited by the technical feasibility of obtaining adequate samples for analysis, particularly in the case of juvenile rodents. The Japanese MHLW guideline for nonclinical safety studies in juvenile animals for pediatric drugs stipulates that study end points should be selected with reference to toxicity target organs in mature animals and with the objective of adequate evaluation of effects on the development of organs/functions, although no specific end points are described (2012). All of these guidance documents suggest that studies in juvenile animals should be conducted infrequently, and a recent review suggests that the percentage of pediatric investigation plans supported by a juvenile toxicity study has remained fairly stable (Hurtt and Engel 2015). 
However, requests for and conduct of these studies are rising along with the number of pediatric programs overall, and, at present, there is no consistent message regarding the value of the data generated in these studies for improving the design, conduct, and interpretation of safety data from subsequent clinical studies in children (Rose 2011, 2014; Leconte et al. 2011; Anderson et al. 2009; Bailey and Marien 2009, 2011). In response, the ICH has endorsed the development of a new harmonized guideline (ICH S11) for nonclinical safety assessment to support pediatric drug development.
Studies in juvenile animals of any species can be quite complex and logistically challenging (Barrow, Barbellion, and Stadler 2011; Cappon et al. 2009). Each end point may require a separate cohort of animals, so these studies may involve a large number of animals. Furthermore, each species grows and develops at a different rate, and it may be difficult to discriminate between a cumulative toxicity and one due to exposure during a particular developmental window of susceptibility. Although there are general metrics for age multiples in the development of the central nervous system (CNS) and reproductive systems, these do not necessarily hold up against other organ systems such as the gastrointestinal tract, renal system, and pulmonary system (Morford et al. 2004). Toxicity may reflect organ system interdependencies, especially those contributing to absorption, distribution, metabolism, and excretion of pharmaceuticals (Hines and McCarver 2002; McCarver and Hines 2002; Blake et al. 2005). Finally, although historically pre- and peripubertal macaques and dogs have been used for general toxicity testing, the background pathology and normal tissue histology from very young animals in these species are not well described as compared to those from the older animals typically used in toxicity testing. With regard to clinical pathology end points, appropriateness for the age of the animal, the timing of collection, collection intervals, and need for fasting should also be considered, as well as the potential need for adapting assays or methods for neonatal and juvenile animals. For example, in a juvenile rat study starting with pups prior to weaning, sufficient blood volume may not be available to include a standard clinical pathology evaluation. In such a case, a limited directed terminal clinical pathology evaluation may be sufficient for appropriate risk assessment. 
If longitudinal clinical pathology assessment is desired, the use of satellite animals may be warranted to avoid compromising the normal growth physiology of the main study animals. Furthermore, there have been few publications and little direct experience to drive specific recommendations regarding the routine inclusion of clinical pathology end points in juvenile animals with the objective of consistently providing interpretable results.
A review of survey responses for inclusion of general toxicity pathology end points in juvenile toxicity studies revealed, to some extent, a lack of experience in these studies and/or willingness to comment specifically. The number of survey responses for adding general toxicity end points to juvenile toxicity studies ranged from 23 to 38 of the approximately 100 total participating organizations, with several comments indicating a lack of specific experience with these studies. However, for the organizations that did respond, the responses were fairly consistent. For all species, when dedicated juvenile toxicity studies are conducted, there is a general willingness to consider inclusion of pathology end points as either a routine process or based on pharmacology/target or findings. The specific distribution of responses is indicated in Table 6.
For both histopathology and clinical pathology end points, there was some concern among respondents regarding a relative lack of historical control data in juvenile animals. For anatomic pathology, of the 40 responses, 31 (78%) indicated no negative impact, while 9 (23%) responded that, yes, the challenges of interpreting anatomic pathology in fetuses/pups/infants had a negative impact. For clinical pathology end points, with 34 responses, most indicated that a lack of historical data did not prevent interpretation of clinical pathology end points (82%) and had no negative impact on studies (77%). The remaining responses indicated that interpretation was compromised (18%) or studies negatively impacted (23%); therefore, this remains an area where additional experience or education could be helpful. However, as with results from the DART study queries, the comments reflected a general willingness to accept this uncertainty, at least when there are available data from concurrent control animals. In addition, for rodent studies, the first scheduled necropsy often occurs when the rats and mice are at least 6 to 8 weeks of age. Although they will not have all reached full sexual maturity at this time, it is an age range for which relevant historical data for both anatomic and clinical pathology exist and can provide additional context for interpretation (Picut, Dixon, et al. 2015; Picut et al. 2014; Picut, Remick, et al. 2015; Campion et al. 2013).
In the Working Group’s experience, one important “exception” occurs when pathology end points are collected from individual animals at unscheduled intervals. In these situations, the lack of concurrent and historical control data may well preclude a confident interpretation in any species. Reference data for juvenile animals are available and can help provide context for more common excursions from normal adult profiles, but may not cover the full spectrum of variability seen during development (Jordan et al. 2014). This is particularly challenging for NHPs and dogs, where small group sizes and high variability for many end points can make interpretation of specific test item–related effects difficult. In rodents, one study design option is to include an additional small group of pups to be kept in the same environment and euthanized only if needed as age-matched tissue controls for unscheduled necropsies during the study.
Survey respondents also indicated that some general toxicity study end points (such as calcium, glucose, and cholesterol) are being used to understand potential pharmacodynamic effects, especially for metabolically active compounds. Although this question was not specific only to studies in juvenile animals, such studies fell within its scope. The majority (63%) of respondents would include these end points routinely or for cause, with the remaining 37% indicating that they do not add such end points. There were 20 comments for this question, likely directly related to the survey request to explain case-by-case use. Of the 20 comments, 18 indicated a positive experience and 2 indicated a lack of experience but willingness to consider if appropriate. No comments described a negative experience due to inclusion of pharmacodynamic end points.
Pediatric Relevant Pathology End Points in General Toxicity Studies
Most general toxicity studies are conducted to support clinical studies in adult healthy volunteers or patients. However, the animals used are often relatively young and still in an active growth phase. When studies include skeletally or sexually immature animals, pharmacologic effects on general growth and bone elongation may be detected (Stahlmann 2003; Ryan et al. 1999; Hall and Vallera 2006).
Based on the age of the animals at study start for rodent general toxicity studies, most are expected to be developmentally mature prior to initiation of the dosing phase. It is recognized that it can be difficult or impossible to determine test item–related effects on reproductive tissues, especially in males, prior to maturity. By 10 weeks of age, all rats should be sexually mature with complete spermatogenesis in males and regular estrous cycles in females. However, the long-bone physes remain open and general growth continues for several months (Zoetis et al. 2003). Thus, assessments of body weight gain during the in-life phase and histologic assessment of long bones can be very useful in determining potential general effects on growth and development.
If histologic data from general toxicity studies in dogs or NHP will be used to support pediatric use, an assessment of both sexual and skeletal maturity should be considered for all animals on the study. This would be intended as a part of characterization of the test system but not as representing a pathology “finding.” Although assessment of skeletal maturity was not specifically queried with this survey, it is generally feasible, requiring only standard sections of long bone (Kilborn, Trudel, and Uhthoff 2002). The Working Group also refers readers to the discussion of sexual maturity presented previously in this document.
Pediatric Relevant Pathology End Points in DART Studies
For the most part, data from fertility and EFD studies are not intended or used to directly support prepubertal pediatric patients, but these studies may generally contribute to the identification of developmentally sensitive targets. In contrast, the late gestation or early lactation exposures of PPND studies may provide a “worst-case” scenario for potential effects in the youngest patients. These studies may be particularly useful in supporting pediatrics with molecules that have a long half-life. It is also critical to understand placental and lactation transport, and exposure in the offspring following dose administration to pregnant dams.
The survey results for PPND studies are presented in Table 5 and discussed in detail in the DART section. Responses and comments indicated no dominant position for inclusion of pathology end points for offspring and acknowledged a general lack of experience with dog PPND studies. In rodents, substantial development, including functional maturation of the heart (Hew and Keller 2003), lungs (Zoetis and Hurtt 2003b), digestive tract (Walthall et al. 2005), kidneys (Zoetis and Hurtt 2003a), and nervous system (Watson et al. 2006; Wood, Beyer, and Cappon 2003), occurs in the immediate postnatal/preweaning period. Therefore, exposing the rodent fetus during gestation is not truly reflective of perinatal dosing in humans. In addition to gestational exposure, drugs that are present in milk can result in postnatal exposure of offspring. In fact, due to limited proteolytic digestion for the first 2 weeks postnatally in rats, oral delivery through milk leading to systemic exposure is possible even for some parenteral drugs such as protein therapeutics (Cooper et al. 2014; Halliday 1955; Mackenzie, Morris, and Morris 1983). Thus, although inclusion of pathology of rodent pups in a PPND study has not been a standard end point, it has been used under limited circumstances (Spence et al. 1995) and could be considered where appropriate, possibly including testing of drugs in support of potential use in premature or neonatal infants.
Compared to rodents, dogs, and humans, NHPs are developmentally and behaviorally precocious at birth, and some prenatal exposure may be appropriate to mimic the peri/neonatal period in humans. For example, the lungs (Zoetis and Hurtt 2003b), digestive tract (Walthall et al. 2005), kidneys (Zoetis and Hurtt 2003a), and bones (Zoetis et al. 2003; Hew and Keller 2003) are relatively well developed at birth, and the functional abilities of the CNS are more advanced than those of humans at birth (Wood, Beyer, and Cappon 2003). For products such as Fc-containing biotherapeutics, which are actively transported across the placenta in late gestation and have a long half-life, exposure in the offspring may be substantial and sustained well into the postnatal period (Moffat et al. 2014). Although there is some risk of biological adaptation in utero, a general understanding of pharmacologic and toxicologic effects on offspring after dosing dams during pregnancy can be useful for pediatric risk assessment.
Although both standard DART studies and dedicated juvenile toxicity studies evaluate potential effects on development, the former are based on maternal administration and the latter utilize direct postnatal administration. Because juvenile toxicity studies are intended to support pharmaceutical use, standard pathology end points, typical of a general toxicity study, are commonly included. In studies with maternal administration, the focus is more on pregnancy outcomes and general growth and development to inform the human reproductive risk assessment. Depending on the scope of the nonclinical testing program for a molecule, the inclusion of anatomic pathology end points for offspring in a PPND study to address potential developmental impact can be critical if data from the study are also intended to support pediatric use (Morford et al. 2011).
Summary of Working Group Considerations for Incorporation of Pathology End Points in Nonclinical Studies Supporting Pediatrics
In summary, information from the general toxicology program and/or from the DART testing can avoid unnecessarily duplicative studies in juvenile animals to support pediatrics. There are also technical and feasibility challenges in designing studies to support drugs intended for use in neonates and infants. As with many aspects of drug development, understanding the species-specific underlying developmental physiology is critical to accomplish this aim. The current developmental toxicity literature often focuses on functional processes, but these may not result in developmentally critical stages recognized by histologic end points from animals of different ages. The nonclinical assessment of juvenile toxicity may consider information from many sources: the clinical toxicity program in adults (if available), the mechanism of action, the nonclinical toxicity profile, and developmental signals from the combined human and animal experience in juveniles. The features of each individual program will dictate areas of specific concern and may require additional investigations or end points. Examples might be immunologic function assessments or expanded neuropathology, with the caveat that there will still be individual animal variability and potentially limited historical control data. In this setting, caution should be used in interpreting individual end points in isolation. These integrated Working Group considerations are summarized in Table 7 for juvenile toxicity studies and in Table 8 for DART Studies.
Conclusions
The SRPC of the STP established a Working Group to review the regulatory context and practices for pathology end points in reproductive and juvenile toxicity studies and reproductive end points in general toxicity studies. The Working Group performed a survey of pathologists and reproductive toxicologists to understand the range of current practices. The results of the survey, collective expertise, and available literature and published guidance documents were used to provide considerations regarding reproductive toxicology and pathology. These are described in the respective sections and are summarized in Table 7 for reproductive end points added to general and juvenile toxicity studies and in Table 8 for pathology end points added to DART studies. In summary, the standard end points in general, reproductive, and developmental toxicity studies are aligned with regulatory expectations for the evaluation of reproductive or juvenile toxicity, while additional nonstandard end points can be included when scientifically warranted and may serve to enable earlier identification or clarification of reproductive toxicities.
Footnotes
Acknowledgments
The authors thank the contributors to the survey responses received by the STP to understand current practices in utilizing reproductive end points in toxicity studies and pathology end points in developmental and reproductive toxicity studies. The authors also thank the SRPC; the Reproductive Pathology Interest Group of the STP; the Regulatory Affairs Committee of the American College of Veterinary Clinical Pathologists; and Drs. Darlene Dixon, Justin Vidal, Mark Cline, and Robert Chapin for their detailed technical review of this manuscript.
Author Contributions
Authors contributed to conception or design (WH, MA, CB, ME, MM, JO, KR, AR, VS, KT, CT, MY, LT); data acquisition, analysis, or interpretation (WH, MA, CB, ME, MM, JO, KR, AR, VS, KT, CT, MY, LT); drafting the manuscript (WH, CB, ME, MM, KR, AR, VS, LT); and critically revising the manuscript (WH, MA, CB, ME, MM, JO, KR, AR, VS, KT, CT, MY, LT). All authors gave final approval and agreed to be accountable for all aspects of work in ensuring that questions relating to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Authors’ Note
This review is a product of a Society of Toxicologic Pathology (STP) Working Group commissioned by the Scientific and Regulatory Policy Committee (SRPC) of the STP. It has been reviewed and approved by the SRPC and Executive Committee (EC) of the STP. The article does not represent a formal best practice recommendation of the Society but provides expert guidance on key principles to consider in conducting regulated toxicity studies. The views expressed in this article are those of the authors and do not necessarily represent the policies, positions, or opinions of their respective agencies and organizations. Readers of Toxicologic Pathology are encouraged to send their thoughts on these articles or ideas for new topics to the editor.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences.
