Abstract
Toxicogenomics is considered a valuable tool for reducing pharmaceutical candidate attrition by facilitating earlier identification, prediction and understanding of toxicities. A retrospective evaluation of 3 years of routine transcriptional profiling in non-clinical safety studies was undertaken to assess the utility of toxicogenomics in drug safety assessment. Based on the analysis of studies with 33 compounds, marked global transcriptional changes (>4% of transcripts at p < 0.01) were shown to be a robust biomarker for dosages considered to be toxic. In general, the correlation between transcription and histopathology was inconsistent, most likely due to differences in sensitivity to focal microscopic lesions, to secondary effects, and to events that precede structural tissue changes. For 60% of toxicities investigated with multiple time-point data, transcriptional changes were observed prior to changes in traditional study endpoints. Candidate transcriptional markers of pharmacologic effects were detected in 40% of targets profiled. Mechanistic classification of toxicity was obtained for 30% of targets. Furthermore, comparison of the data to compendia of transcriptional changes provided assessments of the specificity of transcriptional responses. Overall, our experience suggests that toxicogenomics has contributed to a greater understanding of mechanisms of toxicity and to reduced drug attrition through empiric analyses in which safety assessment combines toxicogenomic and traditional evaluations.
Introduction
Pipeline drug candidate attrition is a major cause of economic loss in the pharmaceutical industry. Static or worsening attrition rates have been cited as evidence of technologic or scientific stagnation in the pharmaceutical industry, a concern to which the FDA recently responded with its Critical Path Initiative (US Food and Drug Administration, 2004). An analysis of Bristol-Myers Squibb (BMS) historical non-clinical drug candidate attrition over the past 10 years (Table 1) suggests that several of the most significant causes of attrition due to toxicity would not be meaningfully addressed by a systems biology technology (whether transcriptomic, metabonomic, or proteomic). These include attrition due to adverse cardiovascular effects (typically ion-channel based), reactive metabolites, immune-mediated adverse events, and teratologic effects.
These causes of attrition may vary somewhat from company to company. However, there remain significant categories of attrition that could potentially be impacted by systems biology. Important examples include the identification and characterization of pharmaceutical target-mediated toxicity and certain target organ toxicities (liver, genitourinary, endocrine, nuclear receptor-driven). In addition, the investigation of toxicities that are not causes of attrition but that require a mechanistic evaluation and risk-assessment perspective may be accelerated by toxicogenomic relative to traditional methods. A number of recent reviews and books describe the promises and challenges associated with toxicogenomics (e.g., Burczynski, 2003; Lu et al., 2005; Salter, 2005; Boverhof and Zacharewski, 2006).
A consensus is emerging that progress in toxicogenomic research is occurring but is incremental, owing to the enormity of the undertaking with respect to standards development and software and data infrastructure requirements, coupled with the large number of studies needed to validate the approach. Reviews most often focus on the promise of genomic advancements with little mention of the practical constraints or the results obtainable with routine use. This omission likely reflects the difficulty of integrating toxicogenomic data into the drug discovery process due to technological, organizational, and economic limitations. Furthermore, concerns about regulatory approaches to data evaluation and requirements for reporting of transcriptional data may also inhibit greater penetration in industry. At the time of this submission, no reports were identified describing the kinds of utility observed in actual practice with the routine and systematic application of toxicogenomics in pharmaceutical R&D.
Complicating even incremental progress in the practice of toxicogenomics in drug discovery is the diversity of approaches, desired contributions, and available study data. There is currently no generic analysis approach or study design that can deliver for all toxicities all desired marker types (e.g., validated diagnostic, mechanistic, predictive, species-specific, or pan-species toxicity markers). There is significant variation across the toxicogenomics-user community relative to the commitment to incorporate and evaluate the technology. For example, within the pharmaceutical industry, usage spans across limited evaluation with tool compounds, to use in the in vitro secondary screens of compounds, up to full incorporation into nonclinical safety assessment during development. The resourcing, acceptance, and understanding of toxicogenomic approaches is also diverse within regulatory and academic institutions.
An additional consideration in the gradual versus accelerated increase in the impact of toxicogenomics is that our experience has shown the transcriptional profiles revealed by toxicants frequently do not allow ready classification within known mechanisms or phenotypes of toxicity. Limiting here is our nascent understanding of the mechanisms of many chemically induced derangements that lead to adverse changes in organisms. Examples of such effects include complex interactions with kinase-regulated pathways, disturbances of diurnal rhythmicity, the downstream effects of and cross-talk between nuclear hormone receptors, and the nonspecific manifestation of “generalized or severe systemic toxicity” or inanition, frequently obscuring mechanistically specific or primary transcriptional signatures.
Concerns over regulatory use of data remain even though the United States Food and Drug Administration’s Pharmacogenomics Data Submissions Guidance (US Food and Drug Administration, 2005) has encouraged the exploratory use of transcriptional profiling in drug discovery toxicology studies by providing nonbinding recommendations as to when data should be submitted to the FDA. For transcriptomic studies related to an investigational new drug (IND), the guidance indicates that data submission is required if any of the following criteria are met:
“The test results are used for making decisions pertaining to a specific clinical trial, or in an animal trial used to support safety.
A sponsor is using the test results to support scientific arguments pertaining to, for example, the pharmacologic mechanism of action, the selection of drug dosing and dosing schedule, or the safety and effectiveness of a drug.
Test results constitute a known valid biomarker for physiologic, pathophysiologic, pharmacologic, toxicologic, or clinical states or outcomes in humans, or the test is a known valid biomarker for a safety outcome in animal studies.”
Currently in nonclinical drug safety practice, the second criterion noted above is the most likely of the 3 to be satisfied. This criterion, however, can be a troubling one from a regulator’s perspective due to potential variability in how drug sponsors evaluate data. Different sponsors may reach significantly differing conclusions from the same transcriptional data sets based on experience or access to sufficient compendial data to ascribe specificity to conclusions.
The most difficult criterion stipulated by the USFDA to satisfy is the use of known, valid transcriptional biomarkers, as the conditions for validation listed in the guidance are generally unmet for transcriptomic data: “For the purposes of this guidance, a pharmacogenomic test result may be considered a valid biomarker if (1) it is measured in an analytical test system with well-established performance characteristics and (2) there is an established scientific framework or body of evidence that elucidates the physiologic, pharmacologic, toxicologic, or clinical significance of the test results.”
The pharmacogenomics data submission guidance encourages toxicogenomic data generation and use from exploratory, non-GLP studies, but simultaneously creates a variety of issues and many potential opportunities. There is a clear need to develop practical experience in generating and evaluating toxicogenomic data to help guide drug safety research and discussion. This experience-sharing may help to facilitate the broader application of transcriptional profiling as an adjunct to toxicology studies, and provide opportunities to reduce animal and human testing, select safer drugs and ultimately, reduce compound attrition due to toxicity.
The goal of the present work was to review and analyze the overall utility of transcriptional profiling in the drug discovery and development process. This retrospective analysis is restricted to the past 3 years, during which BMS has pursued the routine use of transcriptional profiling in non-clinical safety studies with data evaluation aided by reference to a commercial compendium. Although representing a modest data set (26 safety studies with 33 compounds, including a subset of 3 years of active pipeline compounds), the results help to clarify the general characteristics and potential impact of transcriptional profiling in nonclinical safety assessment.
Approaches to Toxicogenomics
Figure 1 depicts points in the nonclinical pharmaceutical pipeline where toxicogenomic studies could potentially be performed, including cases where the early identification of candidate pharmacologic or toxicity markers must be weighed against the probable utility for later-stage compounds and the extra effort involved (e.g., transcriptomic profiling of in vitro assays or of single-dose toxicokinetic and tolerability studies). The efforts discussed here are heavily weighted towards multiple-dose in vivo studies where traditional pathology data are concurrently available or available from prior studies at similar dose levels. Data are typically from samples collected at the end of a safety study, but other approaches could improve assessments of predictive or pharmacologic markers (including satellite collection of data at early time points or nearer peak drug levels).
The BMS database of toxicogenomic studies comprises results from 2 types of studies: those in which transcriptional profiling was added to non-GLP repeat-dose toxicology studies, and special studies directed toward identifying toxic mechanisms. Half of the in vivo studies performed have been focused mechanistic studies to investigate toxicities identified in prior toxicity studies, and these typically do not include the full range of traditional study endpoints. Such studies generally involve multiple compounds, time points, species and tissues, thereby allowing a focused investigation of the relationship of transcription to toxicity and providing an opportunity to assess whether transcriptional changes precede detectable safety liabilities.
The motivation for performing mechanistic transcriptomic studies is varied but has included: (1) the limitation of traditional assays in satisfactorily characterizing toxicity, (2) the limitation of traditional approaches for hypothesis generation relative to mechanism of toxicity, (3) existence of transcriptional experience in the literature or elsewhere that appears relevant to the observed toxicity, (4) a specific request by a regulatory authority to generate the data, or (5) the ease and speed with which a transcriptomic assay can be performed compared to the pursuit of other mechanistic approaches.
The remaining studies include the routine addition of transcriptional profiling to late discovery non-GLP safety studies, treating transcriptomics as an additional endpoint in the panel of assays performed at the end of a study. The single time-point sampling precludes the use of these studies to investigate the predictability of markers relative to the time-course of appearance of toxicity. The liver is the standard tissue analyzed, with additional tissues profiled if there are other recognized or suspected target tissues. This profiling approach has avoided the assaying of tissues with more than minimal histopathological findings, under the presumption that transcriptional evaluation of overt lesions would be less mechanistically informative and unlikely to supersede the utility of histopathology. Additional tissues saved but not evaluated from such studies form a useful resource for potential future research.
Results from both routine and mechanistic studies are discussed herein, with a greater focus on the routine studies where transcriptional profiling results are phenotypically anchored to traditional endpoints within the same study. For perspective, Table 2 outlines the work flow used when transcriptomics is added as an additional assay to an already planned study, whereas Table 3 outlines the approach when transcriptomics drives a mechanistic study.
As indicated in the flow of Table 2, a limited number of tissues and dose groups are selected for expression profiling from a safety study based on known target pharmacology or prior pathology findings. In addition, routine transcriptional profiling of liver may predict hepatic and extra-hepatic toxicity that occurs in longer-term studies or at higher dose levels. This approach reduces opportunities for interpretation of transcriptional findings in isolation from traditional endpoint data. Given the general paucity of validation of transcriptional markers, in cases where transcriptional profiles differ from controls while traditional endpoint data do not (including in longer-term studies or at higher doses), the transcriptional findings are considered of no significance to drug safety. This approach is aligned with that used for traditional data; for example, in the absence of hepatocellular alterations, pathologists generally consider a 1.5-fold increase in mean serum alanine aminotransferase (ALT) and aspartate aminotransferase (AST) activities compared to controls to be of no toxicologic significance.
When transcriptional profiling is added to a standard toxicity study design, a process must be established to ensure a readily executable work flow with timely conclusions. Table 4 provides such a work flow, with a question being asked at each step and an example technique listed to answer the question. The work flow involves several categories of activity: assessment of data quality (steps 1–4, 9); detection of treatment-related signals in the data (steps 3, 5–7); characterization of detected signals relative to predefined biological processes such as known markers or predefined pathway maps (steps 7–9); characterization of detected signals relative to the responses in a compendium, including assessment of specificity (steps 10–12); and interpretation of findings relative to drug safety and efficacy (step 13).
In practice, the work flow can be suspended if a study fails quality control, if the level of transcriptional change detected is very low (comparable to that expected by chance) or very high (confounding the evaluation of specific responses or assessment of etiology). Also, a step may be self-limiting when markers are not seen consistently regulated across tissues or dose groups or no significant enrichment of regulated genes relative to predefined groups of genes is detected.
Though quality control is the first step of the work flow, validated relationships between RNA and microarray quality metrics and the ability to accurately detect biological signals do not exist; therefore, work flows typically rely on judging data quality relative to historical norms or to a known standard. For example, for RNA quality it is possible to judge electrophoretic profiles relative to those produced by intact vs. intentionally degraded samples in order to establish a standard for RNA acceptance that can be followed and documented. However, the standard is usually arbitrary, as the consequences of various degrees of RNA degradation on signal detection by microarrays are typically unknown and depend on the strength of the signal to be detected. In general, the use of historical norms to judge data is a powerful method to identify extreme cases across the broad range of processing errors that may be encountered in microarray studies.
However, this approach cannot identify smaller deviations, such as inadvertent sample switches within a dose group, or distinguish stress-related transcriptional change caused by overly lengthy tissue collection times from stress-related change due to toxicity. Corrective measures for these kinds of subtle changes are, in theory, possible, such as microarray-based sample identification from endogenous or introduced markers, or defining sample collection standards based on studies that relate the effect of protocols on microarray measurements. In practice, where quality standards remain unclear or arbitrary, the best defense against these uncertainties is reliance on traditional experimental approaches to study design, including the use of control groups, biological replicates, blocking, and randomization, or elimination of suspect data sets from evaluation.
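As an illustration of judging data quality against historical norms, the following sketch flags samples whose quality metric deviates markedly from previously accepted runs. The metric values, sample identifiers, and the 3-standard-deviation cutoff are illustrative assumptions, not an established acceptance criterion.

```python
from statistics import mean, stdev

def flag_outliers(historical, current, z_cutoff=3.0):
    """Flag samples whose QC metric deviates from historical norms.

    `historical` is a list of the metric (e.g., an RNA integrity score)
    from previously accepted runs; `current` maps sample IDs to their
    measured values. Returns the IDs exceeding the cutoff.
    """
    mu, sd = mean(historical), stdev(historical)
    return [sid for sid, x in current.items() if abs(x - mu) > z_cutoff * sd]

# Illustrative values: a degraded sample stands out against the norms.
norms = [8.9, 9.1, 9.0, 8.8, 9.2, 9.0, 8.7, 9.1]
samples = {"rat_01": 9.0, "rat_02": 8.8, "rat_03": 4.1}
print(flag_outliers(norms, samples))
```

Such a screen catches gross failures but, as noted above, cannot detect subtler problems such as sample switches, which require orthogonal checks.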
A general feature of the work flow of Table 4 is the avoidance of examination of specific genes until late in the work flow. Individual gene analysis is deferred as long as possible to avoid early investigator bias towards small, familiar subsets of genes, and also the tremendous work involved in any gene-by-gene exploration across a multiplicity of tissues, times, dose groups, and individual animals. Analyses at the global microarray level and with predefined gene sets avoid such bias and may summarize large amounts of data (e.g., principal components analysis), thereby focusing or limiting subsequent analyses. Overall, a work flow is meant to aid the investigator in developing information for interpretation fairly, quickly, and thoroughly. At the end of the work flow, however, the developed data must be interpreted relative to drug safety and efficacy, and this interpretation is likely to be influenced by the judgment and experience of the investigator.
Although suboptimal study design or limited time for data analysis may restrict the full utility of transcriptional data interpretation, a number of questions of potential utility to safety assessment can be asked of the data that generally involve a limited degree of analytical effort. In this review, examples seen in practice relating to the following questions are discussed:
Are transcriptional changes with drug treatment approaching the noise level of detection or are they suggestive of pathology?
Is there evidence of pharmacologic activity in the species evaluated, such that the study findings may be understood relative to minimum efficacious exposures?
Is there a dose-related increase in transcriptional change and a no effect level for transcription?
Are there target tissues and species for transcriptional change (whether pharmacology or toxicity driven)?
Are transcriptional changes potentially of non-specific origin, such as the result of generalized (severe, systemic) toxicity?
Can observed transcriptional changes be classified relative to those associated with known toxicities or tool compounds such that mechanisms are ruled in or ruled out?
Do significant transcriptional changes precede the detection of pathology by traditional endpoints?
Are transcriptional changes specific to certain toxicities, pharmacologies, tissues, or study designs as determined by comparison to a standardized database of drug-induced transcriptional changes?
Answers to these questions were derived from the limited initial set of studies undertaken within the previous 3 years at BMS; summary conclusions from the data appear next. The summary data, together with specific examples, provide an early view of the process of validation and use of transcriptomic data in routine drug safety studies. Table 5 describes the number and kinds of studies and data that form the basis of this article. A synopsis of the methods used for microarray processing and analysis appears in Appendix 1.
Global Transcriptional Drug Effects
Drug candidates can display a range of global transcriptional response with some studies showing near background level response (i.e., <1.5% transcripts change at p < 0.01) while others show profound global change (>10% transcripts change at p < 0.01). Approximately 1% of transcripts are expected to change by chance at a significance level defined at p < 0.01. Figure 2 indicates the percent of transcripts observed changing at a t-test p < 0.01 across the 33 compounds profiled with each comparison representing treated vs. control within a dose level, tissue type, sex, and time point.
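The per-transcript screen behind these percentages can be sketched as follows. The simulation below generates null data (no drug effect) and counts transcripts passing p < 0.01; all values are simulated rather than study data, and the normal approximation to the t-test p-value is a simplification that is anti-conservative at the small group sizes typical of toxicology studies.

```python
import math
import random

def welch_t(a, b):
    """Two-sample Welch t statistic."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def approx_p(t):
    """Two-sided p-value from a normal approximation to the t statistic
    (anti-conservative at small n; a real analysis would use the t
    distribution, e.g., scipy.stats.ttest_ind)."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))

random.seed(0)
n_transcripts, n_per_group = 2000, 5
hits = 0
for _ in range(n_transcripts):  # null simulation: no true drug effect
    control = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
    if approx_p(welch_t(control, treated)) < 0.01:
        hits += 1
frac = hits / n_transcripts
print(f"fraction 'changing' at p < 0.01 under the null: {frac:.3f}")
```

A fraction near the nominal level corresponds to the background response described above; fractions far above it suggest genuine treatment-related change.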
The tissues tested displayed a significant capacity for transcriptional response to drug treatment. In particular, using p < 0.01, a range of 1 to 20% of transcripts changing was noted in liver, while those in brain ranged from 1 to 9.5%. Although drug-related transcriptional variation was seen in all tissues examined, the data set is too limited to determine whether in response to equivalent drug exposures some tissues (e.g., the liver) are on average more transcriptionally responsive than others (e.g., skeletal muscle). The majority of studies demonstrated near background levels of transcriptional change, consistent with either high pharmacologic selectivity of contemporary drug molecules and/or the absence of the drug target in the preclinical species tested, e.g., virologics or known human-specific targets.
Figure 3 indicates how transcription in drug-treated vs. control (t-tests across all 33 compounds, 82 comparisons in all) related to the presence or absence of drug-related pathology. Cases involving rodent liver hypertrophy are noted due to the commonality of enzyme induction and related changes in the rodent. It is evident that dose groups with high percentages of transcripts changing at p < 0.01 are more likely to have pathology (93% of cases with >3% of transcripts changing, excluding CYP induction-associated liver hypertrophy), whereas lower levels of global transcriptional change do not guarantee the absence of pathology (50% of cases with <3% of transcripts changing were at dose levels producing or known to produce pathology). The association of high percentages of transcripts changing with pathology in a dose group was observed across multiple tissue types (liver, brain, retina, vagina, testis, and stomach) and often in the absence of concurrent histopathologic changes in the transcriptional target tissue.
The results indicate that the degree of global transcriptional change is less well-associated with histopathology findings than with the general presence of pathology in a dose group. Approximately 60% of samples with histopathologic findings had fewer than 3% of transcripts changing at p < 0.01. Conversely, approximately 60% of samples with greater than 3% of transcripts changing at p < 0.01 did not have histopathologic findings. These data are consistent with observations that significant transcriptional changes in an organ may be observed prior to histopathologic alterations (e.g., 6 hours post-dose) or from lesions occurring elsewhere in an animal, such as the induction of acute phase protein expression in liver secondary to intestinal or dermal inflammation and/or necrosis.
Although histopathology is generally assumed to be the gold standard for describing the phenotype of a tissue, this retrospective evaluation does not support a more specific relationship of histopathology with gene expression than that of other toxicology study endpoints such as clinical signs or serum chemistry. However, even though transcription and histopathology measure distinct processes, there remains value in anchoring transcription to histopathologic findings and other pathology endpoints to better characterize the timing and mechanism of pathology. Over time it is expected that several of these endpoints, or combinations of them, will drive decisions on drug safety. For example, it is generally recognized that the combined evaluation of ALT, AST, alkaline phosphatase (ALP), and gamma-glutamyl transferase is superior in sensitivity and specificity for detecting liver pathology to evaluation of any of these endpoints individually.
In addition to differential response over time and to distal events, the lack of a clear specific relationship of histopathology to gene transcription arises from the differential sensitivity of these assays to events within a target tissue. Significant histopathologic findings may involve lesions in which the percentage of overtly affected cells is low, e.g., scattered, single-cell hepatocellular degeneration, Kupffer cell rather than hepatocellular target toxicity, diffuse leukocytic inflammatory foci in the liver, scattered degeneration of individual skeletal myofibers, or slight vacuolation in myofibers without other tissue changes (see Figure 4). Three separate cases of multifocal inflammatory infiltrates in the liver were evaluated for which transcriptional profiling did not corroborate either the presence of a lesion with potential to progress to cirrhosis or potential mechanisms underlying this pathology. It is possible that informative transcriptional changes involving few genes are present in these data sets, but exhaustive searching for such changes may be hard to justify when they are readily detected by histopathology or other more facile endpoints. Multifocal and locally extensive lesions are statistically more likely to detectably impact the global transcriptome than lesions described as focal. These examples underscore the need to integrate histopathology findings with transcriptional profiling data in nonclinical studies.
Overall, results with pharmaceutical drug candidates suggest that the degree of global transcriptional change relates to toxicity findings and has corroborative utility for safety assessment. As transcriptional changes may precede other histopathologically detectable events, they are potentially attractive biomarkers. The global marker content also provides a valuable early view as to whether more detailed analysis of transcriptional profiles is likely to have utility. Background levels of transcriptional change can suggest that detailed transcriptional profiling analysis has little value, while high percentages of the genome changing may point to multiple primary and secondary processes. Such complexity generally precludes facile hypothesis generation and may indicate a high potential for off-target activity in a drug candidate.
Pharmacologic Markers
When the drug target was known to be present in the species evaluated, candidate transcriptional pharmacologic and/or efficacy markers were readily identified in drug safety studies for approximately half of the drug targets (excluding human- or pathogen-specific targets), frequently in the absence of any other efficacy biomarker determinations. Efficacy biomarkers were found for 7 of the 33 compounds, corresponding to 7 of 16 drug targets present in the species evaluated. These pharmacologic markers include combinations of known pharmacologic markers, direct modulators of the drug target, target pathway members, or biologically compelling patterns of gene expression consistent with the anticipated pharmacology of the target.
Candidate pharmacologic biomarkers were often the most profoundly altered transcripts and were modulated across tissues, sexes, and dose groups. These marker genes included components of the target pathway that responded to drug exposure as predicted from in vitro and in vivo efficacy data. A challenge in pharmacologic marker identification is the recognition of adaptive transcriptional changes within the same pathway as a result of drug effect on the principal molecular target, e.g., adaptive up-regulation of upstream transcripts for the protein target vs. downstream transcriptional consequences of target inhibition. Such differential regulation often precludes simple scoring of a pharmacologic pathway effect as consistently up- or down-regulated; instead, the totality of changes is used to interpret the presence of a pharmacologic effect.
Figure 5 depicts the transcriptional response following treatment with the pro-inflammatory E. coli lipopolysaccharide, and following doses of two anti-inflammatory treatments that inhibit 2 different target proteins participating in inflammatory biology. The results show that common downstream markers of inflammation are affected, as expected from the treatments. Also indicated in the Figure is the detection of pan-tissue pharmacology and confirmation of a predicted exposure-efficacy relationship in the species evaluated (target potency differed from human). This data set exemplifies a finding often confusing to those with limited transcriptional profiling experience, specifically the appearance and strong regulation of biomarkers in unexpected tissues such as the Cyp1A1 regulation seen in Figure 5 in the heart and skeletal muscle following an anti-inflammatory drug treatment. On closer review, a relationship between Cyp1A1 down-regulation and inflammation has been described (Abdulla and Renton, 2005).
Much of our teaching leads us to view changes in Cyp1A1 as largely restricted to tissues with Phase I metabolizing capacity and as pathognomonic for aryl hydrocarbon receptor (AhR) interactions. The strong bias of our pre-existing knowledge base can readily lead one to focus on minor changes or to dismiss potentially important transcriptional events outside one’s own historic context.
Dose-Response of Transcription
For 7 compounds, tissues were profiled at 2 or more dose levels, and a clear dose-related increase in transcription of profiled tissues was seen for 6 of the 7 compounds (Figure 6). In many cases, profound, dose-dependent changes in transcription were associated with the transition to an adverse effect level. In the most extreme case, the number of transcripts changing (p < 0.01) increased from 1% at a nontoxic dose to 9% at a toxic dose, corroborating the frequently observed steep or threshold-driven dose-toxicity response relationship.
Dramatic dose-related transcriptional changes may render more transparent the interplay between pharmacologic effect and toxicity, or among multiple toxicities. The interplay between pharmacology and toxicity can lead to an inverse dose-response in the transcriptional regulation of a pathway or set of transcripts, as with anti-proliferative pharmacologic effects vs. xenobiotic-induced liver hypertrophy, or as with inflammatory effects at a low dose being masked by cytotoxicity to immune cells at a higher dose. In several rat toxicity studies, the cell cycle regulator cyclin-dependent kinase inhibitor 1A (p21) was shown to be transcriptionally repressed across a broad class of growth-inhibitory treatments. However, a growth-inhibitory compound that repressed p21 by 7-fold at a low dose associated with little toxicity had no effect on p21 at a 6-fold higher dose that caused marked elevations in plasma AST, ALT, and lactate dehydrogenase (LDH) and microscopic evidence of periportal hypertrophy and centrilobular apoptotic necrosis. Experience and comparison to historical norms are required to interpret such potentially confounded findings accurately.
An important consideration in interpretation is that transcriptional findings may be of little significance when they result from non-specific changes driven by moribundity. We observed such nonspecific effects for 4 of 33 compounds evaluated. Although well primed not to overinterpret agonal changes in traditional toxicology and toxicokinetic endpoints, pathologists and toxicologists must learn anew not to overinterpret such agonal transcriptional events.
One example of a relatively common transcriptional finding associated with general toxicity and marked reduction in weight gain is a significant up-regulation of cholesterol biosynthesis transcripts in rat liver. This is commonly recapitulated in clinical chemistry profiles of rats given high doses of xenobiotics (Car et al., 2006). For illustration, 2 chemically unrelated BMS compounds (tyrosine kinase receptor and G protein-coupled receptor targets) caused general systemic toxicity in rats, including a marked loss in body weight. In both cases, highly significant up-regulation of cholesterol-biosynthesis pathway transcripts occurred (p < 1 × 10−26 using microarray markers with p < 0.01, at least 2-fold change, and markers assigned by Affymetrix to the cholesterol biosynthesis category).
Figure 7 shows highly correlated changes for 20 cholesterol biosynthesis-related markers that were up-regulated by both drugs at p < 0.01. In both studies, there was at least a 10-fold increase in rat hepatic hmgcr (3-hydroxy-3-methylglutaryl-Coenzyme A reductase), hsd17b7 (hydroxysteroid (17-beta) dehydrogenase 7), and idi1 (isopentenyl-diphosphate delta isomerase 1). Pending advances in toxicogenomic practice that better discriminate among multiple concurrent effects, profiling of samples from dose groups showing marked toxicity may serve better to characterize the toxicity than to reveal changes informative as to mechanism.
Evaluation of Transcriptional Data for Mechanism of Toxicity
Assessment of transcriptional profiles can provide additional support for interpretation of traditional endpoints by adding the potential underlying molecular basis for the finding. For instance, for 2 drug targets examined in our studies, transcriptional changes were strongly confirmatory of the specificity of target organ toxicity (skeletal vs. cardiac myopathy) and species specificity (for skeletal myopathy in dog vs. rat).
One of the most consistent contributions of transcriptional data to safety assessment has been to help classify findings relative to known mechanisms of toxicity (for 30% of pipeline targets; 7 of 33 compounds evaluated). Using this approach, similarity or dissimilarity of expression changes to those produced by reference toxicants is taken as evidence to rule a mechanism of toxicity in or out (e.g., Waring et al., 2001). Figure 8 demonstrates 2 example cases: (1) one where the pattern of transcriptional change is qualitatively similar to that of a known reference toxicant of rodent liver (top panel), and (2) one where the pattern of transcriptional change is qualitatively dissimilar to that of a known reference toxicant of rat testis (bottom panel). Additionally, transcriptional results generated by a study compound may be compared to published data (Ellinger-Ziegelbauer et al., 2004) or to compendium data (Iconix DrugMatrix; Figure 9).
In studies that utilize toxicogenomics to investigate mechanisms of toxicity, study design parameters such as dose selection and postdose sampling time points must be chosen with great care. Even with thoughtful design, our experience suggests that transcriptional data are more likely to contribute to ruling out rather than ruling in candidate mechanistic hypotheses, consistent with the limited number of toxicities that have been characterized transcriptionally relative to the variety and novelty of toxicities encountered in the practice of drug discovery and development.
Predictive Markers of Toxicity
The utility of transcriptomics for predicting toxicity requires that transcriptional changes occur prior to the otherwise detectable onset of toxicity. In a minority of studies (8 of the 33 compounds), target tissues were profiled at dose levels and times known to precede a previously identified toxicity, allowing this predictive utility to be assessed. Of these 8 toxicities, 5 were associated with robust transcriptional changes (>4% of transcripts at p < 0.01) at times preceding changes in traditional endpoints. Of the 3 toxicities not associated with early, robust transcriptional changes, one altered some specific markers indicative of the toxicity (e.g., serum amyloid A1 up-regulation with a dog liver inflammatory lesion), whereas there was no transcriptional evidence of toxicity in the remaining 2 cases.
In the cases where robust transcriptional changes were evident, transcriptional data, if predictive, would correspond to identifying toxicities in study durations of hours vs. days for 1 toxicity, hours vs. weeks for 3 toxicities, and a month vs. several months for 1 toxicity. In actual practice, however, the markers were not developed for predictivity but were used to obtain additional mechanistic insights.
Unfortunately, however, in the cases outlined here, the target tissues associated with the observed robust early transcriptional changes (testis, vagina, retina, and cerebral cortex) are unlikely to be included in the standard tissue collection paradigm of a commercial compendium. When focused on previously identified pathologies, toxicogenomics was employed successfully to generate candidate predictive markers. With the limited experience available, it is too early to determine whether some tissues are in general more likely than others to yield predictive transcriptomic markers.
The results collected also allow the identification of predictive markers of toxicity from changes in nontarget tissues. In 3 studies, compound effects on skin or skin and intestine resulted in acute phase transcriptional signatures in the liver. Although indirect transcriptional effects may aid the detection of distal toxicities, they also confound recognition of subtle toxicities in a target tissue as illustrated by a case in which the acute phase response from skin toxicity and minimal hepatocellular degeneration were both present.
Based on our experience, significant transcriptional signals can appear prior to visible pathology. In practice, the work required to evaluate and develop these markers as general predictive or screening tools is usually not justified due to the low frequency with which any particular toxicity is encountered, the opportunity cost vs. work on ongoing programs, and the broad array of studies required for validation. The utility of these early time point changes has been for additional mechanistic information, which though generally not definitive adds to the understanding of the toxicity. For select toxicities with broad or frequent impact and suboptimal existing biomarkers (e.g., biliary, renal, muscle toxicity), validating predictive biomarkers is currently being pursued by consortia efforts such as those underway at International Life Sciences Institute (ILSI, 2006) and the Critical Path Institute (C-Path, 2006).
Use of Compendia
Given the multiplicity of analytes statistically scored in a transcriptional profiling analysis (typically up to 30,000 on current microarrays), it is likely that some false-positive markers, candidate genes that appear to be significantly associated with a toxicity or pathology, will be observed. The potential for false associations between measurements is exacerbated by what is typically referred to as the multiple testing problem: at the confidence levels used for more traditional laboratory assays (such as p < 0.01), large numbers of potentially false-positive gene changes are expected (Norris and Kahn, 2006). Even with perfect accuracy of array measurement, however, detected transcriptional changes may not have a clear causal or mechanistic relationship to pathology or efficacy.
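The scale of the multiple testing problem is easy to quantify: screening 30,000 transcripts at p < 0.01 yields roughly 300 apparent "changes" by chance alone, even with no drug effect at all. A minimal simulation of that null expectation:

```python
import numpy as np

# Under the null hypothesis, each transcript has probability `alpha` of
# crossing the significance threshold by chance, so a 30,000-transcript
# chip screened at p < 0.01 is expected to yield ~300 false positives.
n_transcripts, alpha = 30_000, 0.01
expected_false_positives = n_transcripts * alpha
print(expected_false_positives)  # 300.0

# Simulated null experiment: p-values are uniform when nothing is changing;
# count how many fall below the threshold anyway.
rng = np.random.default_rng(1)
null_pvals = rng.uniform(size=n_transcripts)
hits = int(np.sum(null_pvals < alpha))
print(hits)  # close to 300 by chance alone
```

This is why a handful of "significant" transcripts from a single study carries little weight on its own, and why the compendium-based specificity checks described below are needed.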
A large compendium or database of transcriptional responses to a wide variety of drugs and toxicants can play a valuable role in efficiently gauging the specificity of a transcriptional response with respect to other endpoints or study features. The compendium can provide valuable perspective whether it is built principally of microarray studies and associated study findings (Hayes et al., 2005; Iconix DrugMatrix) or combines lower throughput studies from the literature with microarray data (Mattingly et al., 2006). Judging the specificity of transcriptional changes relative to pathology is performed through direct comparison of study data against the compendium, via a small set of markers derived from the compendium, or through scoring of study data against classifiers derived from the compendium. The utility of scoring systems derived from a compendium depends on their sensitivity and specificity. Practical experience to date with the current generation of approaches and data suggests that false negatives dominate scores of transcriptionally quiet data and false positives dominate scores when responses involve robust transcriptional changes.
One example of the use of a compendium at the individual gene level involves our experience comparing transcriptional profiles of a compound that produced hepatic carcinomas in long-term rat studies against those of 2 negative control compounds. Hundreds of transcripts were observed to change in rat liver across all profiled compounds, with few transcripts identified as specific to the toxic compound. One of the unique transcripts, growth differentiation factor 15 (gdf15), a gene reported to play a role in neoplastic transformation (Li et al., 2000), changed only with the test compound of interest, raising the possibility of an association with the carcinogenic outcome. However, both the Comparative Toxicogenomics Database (CTD; Mattingly et al., 2006) and Iconix DrugMatrix indicate that gdf15 is transcriptionally altered by a multiplicity of treatments.
In Iconix DrugMatrix, 16% of transcriptional profiles in rat liver showed drug-related changes in gdf15 at p < 0.01, including several innocuous compounds. The CTD also cites modulation of gdf15 in literature references across a nonspecific set of drugs. The data from both resources suggest the gdf15 finding was not mechanistically specific or useful in providing a risk-assessment for nongenotoxic carcinogenicity. This kind of comparative assessment for marker specificity can be rapidly performed (minutes) and can provide helpful perspective prior to the design of further marker validation studies.
In another study, the most up-regulated transcript following drug treatment, calgranulin B, was suspected to be a pharmacologically specific marker of an anti-inflammatory effect. Iconix DrugMatrix data suggested that this transcript was associated with inflammation-modulating treatments vs. non-modulating treatments with a specificity of 83%. The literature, including that contained in the CTD, provides mechanistic but little transcriptional evidence for the association of calgranulin B transcriptional regulation with inflammation-modifying treatments. In this instance, the CTD data are similar to data summarized in pathway tools (e.g., Ingenuity Pathway Analysis, Ingenuity Systems Inc., Redwood City, CA). If augmented with the empirical transcriptional data of compendia containing toxicology studies, such tools would likely be of more value in interpreting these data. P-value-based pathway scoring of transcriptional data should become more sensitive as pathways are enriched in genes known to be transcriptionally modulated by drug treatments or toxicities and depleted of genes that, though functionally related, are transcriptionally unchanged by treatments or toxicities.
These examples illustrate how a compendium can facilitate interpreting transcriptional changes associated with known mechanisms or defined phenomena. A compendium can, however, also bring mechanistic context by using comparisons to detect associations between study features as illustrated by Figure 10. This figure illustrates how the similarity with a query expression profile retrieves study attributes from the database. The database can be used to highlight study attributes common to the study underlying the query transcriptional profile and those underlying the similar profiles (e.g., across binding assay findings, clinical signs, toxicities, drug target, clinical pathology).
One powerful approach to the use of a compendium is to score the probability that the study features associated with similar expression profiles from a compendium would occur by chance. For example, if among the most similar expression profiles the majority were associated with studies where glucose elevation occurred while only 2% of profiles across the compendium were associated with glucose elevation, then the transcriptional similarity is likely to be related to hyperglycemic metabolic alterations. This kind of automated p-value based scoring capability can also facilitate analysis by attributing transcriptional changes to trivial causes such as dosing method. For example, Table 6 shows the 18 most similar expression profiles in Iconix DrugMatrix to a query expression profile as judged by Pearson correlation across all genes.
Inspection of the list does not yield an obvious reason for this particular set of profiles to have resulted from the query. However, automated methods can refer to the study designs underlying the transcriptional profiles of the compendium and determine that 13 of the 18 profiles were from studies with subcutaneous dosing, while overall less than 5% of compendium profiles were associated with subcutaneously dosed studies. This level of enrichment in a list of compendium profiles is unlikely by chance (p < 1 × 10−13) and suggests that the global transcriptional changes were not of toxicologic significance. These p-value-based scores are produced using the same hypergeometric scoring method (Hosack et al., 2003) used to evaluate microarray results with respect to groupings of genes into pathways or ontologies. Whether for genes or studies, statistical overrepresentation of a category (e.g., cholesterol biosynthesis or subcutaneous dosing) becomes impossible as the number of items in the category approaches the total (e.g., all genes on the chip or all studies in the database).
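The hypergeometric enrichment score for the subcutaneous-dosing example can be reproduced in a few lines. The compendium totals below are hypothetical (the text states only that fewer than 5% of profiles came from subcutaneously dosed studies); only the 13-of-18 match list is taken from the example.

```python
from scipy.stats import hypergeom

# Hypothetical compendium composition, chosen so that 5% of
# profiles are from subcutaneously dosed studies.
total_profiles = 2000   # profiles in the compendium (assumed)
subcut_profiles = 100   # of which 5% were subcutaneously dosed (assumed)
match_list = 18         # most-similar profiles returned by the query
subcut_in_list = 13     # subcutaneously dosed profiles in that list

# P(>= 13 subcutaneous profiles among 18 drawn at random from the
# compendium), i.e., the survival function of the hypergeometric.
p = hypergeom.sf(subcut_in_list - 1, total_profiles, subcut_profiles, match_list)
print(f"enrichment p-value: {p:.1e}")
```

With these assumed totals the score lands below the p < 1 × 10−13 cited in the text, illustrating how strongly a 13/18 draw from a 5% category deviates from chance. The same `hypergeom.sf` call scores gene-to-pathway overrepresentation when the "categories" are pathways rather than dosing routes.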
Experience has shown that interpretation or classification of unknowns by comparison to historical toxicogenomic studies or compendia may be limited by the design of the studies in those reference data sets. For example, extrapolative interpretation may be difficult when compendia comprise studies with few dose levels, are enriched for studies with much lower or higher degrees of toxicity than the toxicity in question, employ sampling at time points significantly different from those in question, or use an inappropriate sex (most databases are composed of male data). Given the contextual biases of individual studies or even of entire compendia, challenging candidate biomarkers and classifications with high-resolution dose-response data is a critical part of the validation of predictive and diagnostic markers of drug safety.
Quality Control of Expression Studies
Quality control for microarray data starts with the quality of the RNA sample isolated for subsequent analysis and includes all steps involved in the generation of the microarray data. Evaluation of inter-chip and inter-day variation is also critical. In several of our studies (5 of 26), poor RNA quality was detected, with 3 studies having uniformly poor RNA. For these 3 studies, poor RNA quality was attributed to inappropriate sample collection or storage, caused either by delayed tissue preservation at necropsy or by improper shipment from a contract research organization. The additional workload of collecting samples for both transcriptional profiling and traditional study endpoints was considered an important factor in these failures (failures did not occur in studies whose primary endpoint was transcription).
In our laboratories and others, the transcriptomic evaluation of rodent skin in toxicology studies has repeatedly proven difficult. Careful dissection is required to consistently avoid adnexa, mammary tissue, and subcuticular fat. Across all tissues, the most important considerations in maintaining high quality are to reduce the time to tissue preservation after euthanasia and to ensure consistency of dissection across animals of all groups, particularly when multiple prosectors are involved. Results from one study suggested that tissue preservation methods designed to stabilize RNA can produce excellent quality RNA, but microarray analysis suggested that biological stress responses continued in these samples post-collection relative to paired samples flash frozen in liquid nitrogen.
As discussed under the example work flow, there are no generally accepted standardized criteria for deciding when RNA is of inadequate quality. Poor RNA quality is suggested by RNA electrophoretic profiles with poor 28S-to-18S ribosomal peak ratios or multiple early peaks, and by dramatically reduced microarray-to-microarray correlation coefficients, where typical intra-dose-group correlation coefficients of 98% fall below 90%. Our approach to determining RNA quality and transcriptional data integrity combines visual evaluation of RNA electrophoretic profiles with chip-to-chip correlation and is illustrated in Figure 11.
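The chip-to-chip correlation screen described above can be sketched as a simple check: compute pairwise Pearson correlations within a dose group and flag any chip whose typical correlation with its peers falls below the ~0.90 threshold the text cites. The function name, group sizes, and simulated data are illustrative only.

```python
import numpy as np

def flag_low_correlation_chips(expr, threshold=0.90):
    """Return indices of chips whose median Pearson correlation with the
    other chips in the same dose group falls below `threshold`. Typical
    intra-group correlations run ~0.98; degraded RNA drops below ~0.90."""
    corr = np.corrcoef(expr)          # chips x chips correlation matrix
    np.fill_diagonal(corr, np.nan)    # ignore each chip's self-correlation
    median_r = np.nanmedian(corr, axis=1)
    return [i for i, r in enumerate(median_r) if r < threshold]

# Simulated dose group: 3 good chips sharing a common expression profile
# plus one chip whose signal is uncorrelated noise (degraded RNA).
rng = np.random.default_rng(2)
profile = rng.normal(5.0, 2.0, size=2000)
chips = np.array([profile + rng.normal(0, 0.3, 2000) for _ in range(4)])
chips[3] = rng.normal(5.0, 2.0, 2000)  # chip 3: degraded sample

print(flag_low_correlation_chips(chips))  # [3]
```

Using the median (rather than the mean) of a chip's correlations keeps one bad chip from dragging down the scores of its good neighbors.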
When inadequate RNA quality and resulting chip quality is suspected, a conservative approach is to discard all data from a dose group-tissue combination if any individual sample is considered of poor quality. To avoid potential bias, if failures span multiple dose group-tissue combinations an entire tissue or potentially entire study is discarded rather than excluding a series of individual samples.
Sex-Specific Markers
In 30% of studies, both male and female transcriptional profiles were included. Candidate sex-specific transcriptional differences were identified in all cases and included many sex-specific transcriptional changes beyond those normally expected (such as α2u-globulin in male rats). In 2 studies, markers identified and selected based on microarray profiling of high-dose male rats did not uniformly generalize to females: in 1 study, 30% of the set of markers shown to be dose-responsive in males did not appear dose-responsive in females, even though drug exposure and toxicity findings were similar between the sexes. A weakness of these early data is that they consist primarily of results from small group sizes, single dose levels, and microarrays (vs. RT-PCR or other more focused methods) and, therefore, may detect associations of transcription with sex simply due to the multiplicity of genes assayed. Although our experience is limited, these early data suggest that sex-specific markers are a potentially important feature of toxicogenomic work, and marker validation will either benefit from inclusion of both sexes or could be undertaken in a sex-specific fashion. In general, internal company and commercially available compendia have been constructed largely with males to limit expensive, potentially redundant profiling. In our experience, the ability to extrapolate transcriptional outcomes between sexes is limited and should not be assumed in any predictive or mechanistic work, or in the validation of biomarkers.
Applying Toxicogenomics to Safety Assessment
The results of 3 years of routine use of toxicogenomics in early drug discovery suggest that transcriptional findings have utility for drug safety assessment, with contributions occurring relatively frequently with respect to mechanistic classification, to the weight of evidence of findings alongside traditional endpoint data, and potentially to the use of the entire measured transcriptome as a marker of adverse drug effect. With the application of toxicogenomics at the transition into pre-IND studies, the cost of performing the relatively few studies and the frequency of informative outcomes appear favorably balanced against the costs incurred if perspectives are not fully developed prior to IND.
Transcriptomics was not found to be overly sensitive relative to the currently accepted gold standards of toxicology, such as histopathology and clinical chemistry. The studies to date have found no clear cases where profound global transcriptional changes were detected in the absence of a drug effect as measured by other endpoints, although changes in traditional endpoints were sometimes observed only at later times. Furthermore, results have shown multiple cases where transcriptomics is no more, or even less, sensitive than the traditional endpoint of histopathology, particularly when histopathological changes are focal or multifocal in nature.
Overall, the data collected show that transcriptional profiles represent an endpoint that often overlaps with, but is at times distinct from, traditional endpoints. A major challenge is the integration of transcriptomic results with traditional findings within a study and across prior studies, such as comparisons to compendia. Currently, the impact of toxicogenomics compendia or published work derives largely from gauging the specificity of individual transcriptional markers, sets of markers, or even the entire measured transcriptome. Global methods for sensitively, specifically, and quantitatively scoring transcriptional profiles as markers of toxicity are still in their infancy. Furthermore, even in tissues where data are abundant, such as liver, there is little high-resolution dose-response information or data from longer-term studies, both of which are necessary to find specific predictive or diagnostic markers. This is particularly true when transcriptional data are obtained from a study with a single dose level and tissues collected at a single time point.
It is generally recognized that pharmacologic effects, histopathologic effects, and effects on liver function are likely to have different relationships to drug plasma concentration and different times of onset. Thus, a study reporting transcriptional changes at a single dose level and a single time point would not be able to discriminate among markers that were predictive or diagnostic of pharmacologic, functional or histopathologic effects.
Perhaps not surprisingly, the ability to extrapolate transcriptional outcomes between sexes appears to be severely limited and should not be assumed in any predictive or mechanistic work, in the validation of biomarkers, or in the use of sex-specific data compendia. Based on this limited retrospective, transcriptional data are more likely to contribute to ruling out rather than ruling in candidate mechanistic hypotheses, consistent with the limited number of toxicities that have been characterized transcriptionally relative to the variety and novelty of toxicities encountered in the practice of drug discovery and development. This acknowledges that the state-of-the-art in toxicologic pathology is reactive at best, rather than predictive.
The distinct, case-dependent relationship between transcriptomic and traditional findings, together with the current state of automated transcriptomic scoring methods, has motivated us to interpret toxicogenomic study results much as those of traditional pathology or toxicology are interpreted, where the reporting of study findings relative to pathology, artifact, pre-existing conditions, or drug dose relies on experience and consultation with relevant experts. Like more traditional endpoints, toxicogenomic data may ultimately lead to recognized, accepted interpretations relative to understanding mechanisms of toxicity. However, the current state-of-the-art relies on personnel with pathology and toxicology backgrounds learning toxicogenomic tools and then building their experience by analyzing multiple toxicogenomic studies.
Moving beyond the current state-of-the-art in toxicogenomics relies on technologists continuing to build essential infrastructure and on enabling more toxicologists and pathologists to master these tools while contributing to defining and refining the emerging discipline. Software solutions placed into the hands of toxicologists and pathologists will facilitate advancement of the discipline, and there is an urgent need for software better suited to toxicologic and histopathologic data. Although available commercial tools have demonstrated facility with parts of the data domain, such as microarray data or clinical chemistry data, no solution is available that allows organization and analysis across all the domains of raw study data, the contextual data of protocols, and the interpretive data of reports or toxicity signatures.
Other more focused areas of need include: (1) analytical tools and views that enable rapid analysis across traditional endpoints and transcription; (2) signature creation methodologies that attempt to explain and quantitatively account for transcriptional variation in terms of biological processes, rather than merely identifying biological processes in transcriptional data; (3) pragmatic solutions for cross-species comparative transcriptomic infrastructure that are judged and refined based on functional success; and (4) reliable methods for archiving and communicating toxicogenomic data and analyses within an organization, between drug sponsors and regulators, and among members of research consortia engaged in marker discovery or validation work.
An emphasis on focused and finite interpretation of toxicogenomic data is essential for fitting this work into a drug discovery and development process, but raises questions around present uncertainty of interpretive standards and use of toxicogenomic data in regulatory decision making. The current FDA guidance (US Food and Drug Administration, 2005) provides flexibility for sponsor experimentation on markers and interpretive standards by limiting the requirement for a sponsor to submit data from non-validated markers and studies.
The guidance, however, does not provide clarity regarding the potential for re-interpretation of toxicogenomic data from short-term animal studies as validated markers become available, including after more definitive longer-term animal studies or even human studies have been performed. Some FDA communications imply an open-ended requirement to re-analyze toxicogenomic data, making study interpretation non-finite and subject to change indefinitely (US Food and Drug Administration, 2006). The lack of clarity on this matter is a serious barrier to advancing the use of toxicogenomics in a formal regulatory setting and to motivating the involvement of pathologists.
In summary, our experiences suggest there is value in routinely including toxicogenomics in exploratory studies of drug safety by emphasizing a tractable, finite analysis of a drug's efficacy and toxicity relative to transcriptional change. Multiple challenges persist in defining and testing markers and interpretive standards, engaging the toxicologist/pathologist in the data and process, building infrastructure and compendia, defining analytical methods that span new and old data, and better defining the regulatory landscape to encourage toxicogenomic study activity. Even with the current state-of-the-art, however, the promise for drug discovery is already apparent in terms of data reproducibility, correspondence to traditional endpoints, and the frequency with which useful mechanistic, global insights and candidate predictive markers are being discovered.
Acknowledgments
We gratefully acknowledge the efforts and assistance of the many BMS employees in Drug Safety Evaluation and Pharmaceutical Candidate Optimization who contributed to study execution, sample and data collection, data handling, computational infrastructure creation, and data interpretation that enabled this report. Photomicrographs were kindly provided by John R. Megill, Evan B. Janovitz, and Lindsay Tomlinson. We especially acknowledge the general support of the work by Mark Cockett and the BMS Applied Genomics Group.
