Abstract
One application of genomics in drug safety assessment is the identification of biomarkers to predict compound toxicity before it is detected using traditional approaches, such as histopathology. However, many genomic approaches have failed to demonstrate superiority to traditional methods, have not been appropriately validated on external samples, or have been derived using small data sets, thus raising concerns of their general applicability. Using kidney gene expression profiles from male SD rats treated with 64 nephrotoxic or non-nephrotoxic compound treatments, a gene signature consisting of only 35 genes was derived to predict the future development of renal tubular degeneration weeks before it appears histologically following short-term test compound administration. By comparison, histopathology or clinical chemistry fails to predict the future development of tubular degeneration, thus demonstrating the enhanced sensitivity of gene expression relative to traditional approaches. In addition, the performance of the signature was validated on 21 independent compound treatments structurally distinct from the training set. The signature correctly predicted the ability of test compounds to induce tubular degeneration 76% of the time, far better than traditional approaches. This study demonstrates that genomic data can be more sensitive than traditional methods for the early prediction of compound-induced pathology in the kidney.
Introduction
Microarray expression profiling during preclinical drug development is expected to aid in uncovering unexpected or secondary pharmacology, predicting adverse effects, and understanding the mechanisms of drug action or toxicity. Chemogenomic data, including transcriptional profiling, molecular pharmacology, and traditional toxicological measurements, are helping to prioritize compounds with the best chance of clinical success, to provide accurate feedback to medicinal chemistry to aid the refinement of the base pharmacophore into safer molecules, and to identify mechanisms of toxicity to rescue failed compounds. Ultimately, expression profiling experiments will extend upstream to drug discovery for target validation and downstream to clinical biomarkers of safety or efficacy (Ulrich and Friend, 2002; Pennie et al., 2004). Demonstrating this point, several pharmaceutical and biotechnology companies have submitted genomic data to the Food and Drug Administration (FDA) in support of drug applications 〈http://www.fda.gov/cder/genomics〉. As a result, the FDA has since issued a guidance document regarding the submission of genomic data that describes under what circumstances genomic data should be submitted, and how they may be used in regulatory decision making 〈www.fda.gov/cber/gdlns/pharmdtasub.htm〉. When a biomarker is intended for broad research or regulatory use, the size and diversity of the training set must be considered, and validation of the biomarker on external data must be demonstrated (Somorjai et al., 2003; Ransohoff, 2004).
The majority of published microarray experiments applied to preclinical drug development have focused on improving the understanding and classification of drug action and toxicity occurring concurrently with tissue collection, often using small data sets (Thomas et al., 2001; Waring et al., 2001; Hamadeh et al., 2002a; Hamadeh et al., 2002b; Amin et al., 2004). Of greater value, however, would be predicting drug-induced injury weeks before it occurs, or at least before traditional endpoints would currently indicate. This would have a substantial impact on the time, cost, and amount of compound required to prioritize and select lead candidates for future GLP studies, for example. To date, no publicly available genomic biomarker of toxicity has been adequately demonstrated to be superior to existing methods, nor has any been appropriately validated to predict the future occurrence of tissue injury prior to its histological appearance. Therefore, we tested this concept by evaluating whether a gene expression based biomarker (i.e., signature) in the kidney could be derived to predict the future occurrence of drug-induced renal tubular degeneration before it is detected using traditional endpoints, such as histopathology or clinical chemistry. In this context, the sensitivity of a gene signature was compared to histopathology by comparing the proportion of positive test cases correctly identified as inducing future nephrotoxicity. To ensure a sufficiently large data set, the signature was derived from short-term repeat dose studies in rats representing 64 nephrotoxic or nonnephrotoxic compounds. The signature, consisting of only 35 genes, was subsequently tested on independent samples from rats treated with 21 structurally distinct compounds with known positive or negative nephrotoxicity outcome.
The signature is fully described here so that additional prospective validation by the research community might confirm its performance characteristics and utility in aiding preclinical drug development. This proof of concept study also lays the foundation for the development of other predictive, and more sensitive, genomic biomarkers.
Methods
Animals
Male Sprague–Dawley (Crl:CD(SD)(IGS)BR) rats (weight matched, 7 to 8 weeks of age and averaging 200 to 250 g) were purchased from Charles River Laboratories (Portage, MI) and housed individually in hanging, stainless steel, wire-bottom cages in a temperature (66–77°F), light (12-h dark/light cycle) and humidity (30–70%) controlled room. Water and Certified Rodent Diet #5002 (PMI Feeds, Inc, St. Louis, MO) were available ad libitum throughout the study. Housing and treatment of the animals were in accordance with regulations outlined in the USDA Animal Welfare Act (9 CFR Parts 1, 2, and 3). Animals were assigned to groups such that mean body weights were within 10% of the mean vehicle control group. Test articles were administered either orally (10 ml of corn oil/kg body weight) or by intraperitoneal injection (5 ml of saline/kg body weight), as indicated. Animals were dosed once daily starting on day 0, and necropsied 24 h after the last dose, as indicated. An equivalent number of time- and vehicle-matched control rats were treated concurrently.
In-Life Studies for Training Set
The positive class of the training set was experimentally defined by performing a 28-day repeat-dose study on 15 known renal tubular toxicants in male Sprague–Dawley rats. Rats were treated daily as indicated in Table 1, and sacrificed on days 5 (n = 5 rats) and 28 (n = 10 rats) for kidney histopathology evaluation. Gene expression profiles were obtained on day 5 from 3 randomly chosen rats per treatment group, before the expected appearance of the lesions. Doses were chosen so as to not cause histological or clinical evidence of renal tubular degeneration after 5 days of dosing, but to cause late-onset histological evidence of tubular degeneration as expected from the literature. This time course of injury is essential to derive a predictive signature since the presence of injury on day 5 would result in the identification of gene expression patterns that are indicative of the presence of a lesion, rather than identifying gene expression events that will predict the future occurrence of a lesion.
The negative class of the training set was defined based on literature knowledge of treatment effects in humans and rodents. This class included 49 nonnephrotoxic compound treatments that were administered daily for 5 or 7 days (n = 3 rats) (Table 1), as previously described (Ganter et al., 2005). The dose was an empirically determined maximum tolerated dose, defined as the dose that causes approximately a 50% decrease in body weight gain relative to controls during the course of the 5-day range finding study without severe clinical signs of toxicity. This dose ensures sufficient exposure without inducing severe adverse effects that would otherwise confound interpretation of the gene expression data. These treatments were confirmed not to cause renal tubular degeneration after 5 days of dosing, and are not expected to cause subchronic tubular degeneration based on evaluation of human clinical data in the Physicians' Desk Reference and rodent toxicity data in the peer-reviewed literature.
Clinical and Postmortem Evaluation
Gross necropsy observations, organ weights, body weights, clinical chemistry and hematology results were collected as described (Ganter et al., 2005). For histopathological analysis, the right kidney was preserved in 10% buffered formalin for tissue fixation and subsequently embedded in paraffin, sectioned and stained with hematoxylin and eosin. Sections (5 μm thick) were examined under a light microscope by board-certified pathologists for histopathological lesions. The left kidney was snap-frozen in liquid nitrogen for subsequent RNA extraction.
Microarray Expression Profiling
Gene expression profiling, data processing and quality control were performed as previously described (Ganter et al., 2005). Briefly, kidney samples from 3 rats were chosen at random from each treatment group on day 5 for expression profile analysis on the Amersham Codelink Uniset Rat 1 Bioarray (Amersham Biosciences, Piscataway, NJ). A complete list of probes on the array is provided on the Amersham Biosciences web site 〈www.amershambiosciences.com〉. Log transformed signal data for all probes were array-wise normalized using Array Qualifier (Novation Biosciences, Palo Alto, CA), a nonlinear normalization procedure adapted for the CodeLink microarray platform. Log10 ratios for each experimental group were computed as the difference between the average of the logs of the normalized experimental signals and the average of the logs of the normalized control signals for each gene. Gene annotation, and raw and processed microarray data for all experiments are available at 〈www.iconixpharm.com〉, in addition to the NCBI GEO web site (Accession: GSE 3210).
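The log10 ratio calculation described above can be sketched as follows. This is a minimal illustration for a single probe, assuming already-normalized signal intensities; the function name and the signal values are hypothetical and are not taken from the data set.

```python
import math
from statistics import mean

def log10_ratio(experimental_signals, control_signals):
    """Difference between the mean log10 experimental signal and the
    mean log10 control signal for one probe (hypothetical helper)."""
    exp_mean = mean(math.log10(s) for s in experimental_signals)
    ctl_mean = mean(math.log10(s) for s in control_signals)
    return exp_mean - ctl_mean

# Illustrative values only: three treated and three control animals.
ratio = log10_ratio([200.0, 200.0, 200.0], [100.0, 100.0, 100.0])
print(round(ratio, 3))  # a two-fold induction gives log10(2), about 0.301
```

Averaging the logs, rather than taking the log of the averaged signals, corresponds to a ratio of geometric means, which is less sensitive to a single outlying array.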
Signature Derivation
To derive a signature, a 3-step process of data reduction, signature generation and cross-validation was used. A total of 7478 probes were preselected based on having no missing values in the positive class of the training set, and less than 7% (3 of 49) missing values (e.g., invalid measurement or below signal threshold) in the negative class of the training set. The signature used to predict the presence or absence of future renal tubular degeneration was derived using a sparse linear programming algorithm (SPLP) as previously described (El Ghaoui, 2003). This classification algorithm is a variant of a linear support vector machine algorithm, and is unique in that it produces short, or sparse, signatures, in addition to taking advantage of measurement variability for increased robustness. As a result of its sparsity, the signature can be interpreted to understand the molecular basis of classification. Briefly, the SPLP algorithm finds an optimal linear combination of variables (i.e., gene expression measurements) that best separates the two classes of experiments in m-dimensional space, where m is equal to 7478. The general form of this linear-discriminant based classifier is defined by n variables: x1, x2, … xn and n associated constants (i.e., weights): a1, a2, … an, such that:
$$S = \sum_{i=1}^{n} a_i x_i - b$$
where S is the scalar product and b is the bias term. Evaluation of S for a test experiment across the n genes in the signature determines on which side of the hyperplane in m-dimensional space the test experiment lies, and thus the class membership of the test experiment. Test treatments with scalar products greater than 0 are predicted to induce late-onset renal tubular degeneration.
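As an illustration of how a linear signature of this form is applied to a test experiment, the sketch below sums the weighted log10 ratios, subtracts the bias term, and compares the result with zero. The weights, bias, and log10 ratios are hypothetical values chosen for the example; they are not the published signature, and the function names are assumptions.

```python
def scalar_product(weights, log10_ratios, bias):
    """Sum of weight * log10 ratio over the signature genes, minus the bias."""
    return sum(a * x for a, x in zip(weights, log10_ratios)) - bias

def predict_positive(weights, log10_ratios, bias):
    """True if the treatment is predicted to induce late-onset tubular degeneration."""
    return scalar_product(weights, log10_ratios, bias) > 0

# Hypothetical 3-gene signature and test experiment (illustration only).
weights = [2.0, -1.5, 0.5]
ratios = [0.4, -0.2, 0.1]
bias = 0.5

s = scalar_product(weights, ratios, bias)
print(round(s, 2), predict_positive(weights, ratios, bias))
```

Note that the second gene is repressed but carries a negative weight, so it still adds to the scalar product.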
Signature Validation
The signature was trained and validated using a split sample cross validation procedure. Within each partition of the data set, 80% of the positive experiments and 80% of the negative experiments are randomly selected and used as a training set to derive a unique signature, which is subsequently used to classify the remaining positive and negative test experiments of known label. An experiment here is defined as the average log10 ratio calculated across the 3 biological replicate hybridizations within a single drug-dose-time combination. This process is repeated 40 times, and the overall performance of the signature is measured as the percent true positive and true negative rate averaged over the 40 partitions of the data set. A log odds ratio (LOR) is used to summarize the performance, and is defined as the natural log of the ratio of the odds of predicting a subject to be positive when it is positive, versus the odds of predicting a subject to be positive when it is negative. It is estimated from a set of training/test cross-validation partitions by the following equation,
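$$\mathrm{LOR} = \ln\left(\frac{\sum_{i=1}^{c} TP_i \, \sum_{i=1}^{c} TN_i}{\sum_{i=1}^{c} FP_i \, \sum_{i=1}^{c} FN_i}\right)$$
where c (= 40) equals the number of partitions and TP_i, TN_i, FP_i, and FN_i represent the true positive, true negative, false positive, and false negative counts on the test cases of the ith partition, respectively.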
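The verbal definition of the LOR can be made concrete with a short sketch. It assumes that the counts are pooled across partitions before the odds are formed, which is one reading of the definition above rather than a confirmed detail of the original implementation; the function names and the example counts are hypothetical.

```python
import math

def log_odds_ratio(partitions):
    """Natural log odds ratio from per-partition (TP, TN, FP, FN) test counts.

    Counts are pooled over partitions (an assumption). The result is
    undefined if the pooled FP or FN count is zero.
    """
    tp = sum(p[0] for p in partitions)
    tn = sum(p[1] for p in partitions)
    fp = sum(p[2] for p in partitions)
    fn = sum(p[3] for p in partitions)
    return math.log((tp * tn) / (fp * fn))

def lor_from_rates(sensitivity, specificity):
    """Equivalent form when only pooled sensitivity and specificity are known."""
    odds_pos = sensitivity / (1.0 - sensitivity)   # TP / FN
    odds_neg = (1.0 - specificity) / specificity   # FP / TN
    return math.log(odds_pos / odds_neg)

# Two hypothetical partitions, each with 3 positive and 10 negative test cases.
example = log_odds_ratio([(3, 9, 1, 0), (2, 10, 0, 1)])
print(round(example, 2))
```

Under this reading, a sensitivity of 83% and a specificity of 94% correspond to an LOR of roughly 4.3, which can be checked with `lor_from_rates(0.83, 0.94)`.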
In addition to cross-validation, we also validated the signature experimentally on a sufficiently large set of positive and negative test compounds naïve to the training set. Since long-term daily dosing of a large number of test compounds was not feasible, 9 known nephrotoxicants and 12 known nonnephrotoxicants were chosen based on evidence of their renal tubular toxicity from the literature. The test compounds were also chosen on the condition that they were structurally unrelated to the 64 compounds in the training set, thus ensuring that estimates of validity are not biased by the use of compounds with structures similar to those in the training set. Pairwise structural comparisons between the 21 test compounds and the 64 training set compounds resulted in a mean ISIS MolSim Score (MDL Information Systems) of 18.0 ± 13.7, with a maximum of 56. By contrast, within drug-class MolSim Scores average 75.3 ± 9.8 and 96.7 ± 2.3 for the aminoglycosides and anthracyclines, respectively. Therefore, we can test whether the signature is truly predicting toxicity rather than identifying features related by structure. Rats were dosed daily with the 21 test compounds, and groups of rats (n = 3/group) were sacrificed on days 0.25, 1, 3, and 5 for histopathology and kidney gene expression profiling as described above. With the exception of gold sodium thiomalate, all treatment groups were negative for tubular degeneration at the time of gene expression measurement, thus testing whether predictions could be made on treated samples before histological evidence of injury. Kidney gene expression profiles were generated in triplicate for each group at multiple early time points (≤5 d) and the average log10 ratios were evaluated with the signature.
To account for compound specific toxicodynamics and toxicokinetics, the maximum scalar product observed over the multiple time points examined was used to classify each compound as being positive or negative for future tubular degeneration.
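The per-compound decision rule described above can be sketched as follows. The time points mirror the sampling days used in the validation studies, but the scalar product values and the function name are hypothetical.

```python
def classify_compound(scores_by_day):
    """Return (max scalar product, predicted positive?) for one compound.

    scores_by_day maps a sampling day to the scalar product of the
    averaged log10 ratios for that day (values here are illustrative).
    """
    max_score = max(scores_by_day.values())
    return max_score, max_score > 0

# Hypothetical compound whose scalar product crosses zero only on day 3.
result = classify_compound({0.25: -0.9, 1: -0.3, 3: 0.4, 5: 0.1})
print(result)
```

Taking the maximum means a compound is called positive if any sampled time point is positive, which favors sensitivity at some cost to specificity.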
Results
Histopathological Evaluation of Training-Set Samples
After 5 days of dosing, histopathological analysis confirmed the absence of severe acute tubular injury. A few individual treated animals on day 5 showed histopathological evidence of early chronic progressive nephropathy (CPN), a spontaneous disorder in rats which includes minimal to mild regeneration of tubular epithelium, interstitial inflammation, pelvic dilation, focal thickening of basement membrane and focal infarcts. These findings were consistent with those observed in the control animals. Cisplatin induced a high incidence of mild tubular basophilia (4 of 5 rats), while both cisplatin and carboplatin induced a high incidence of karyomegaly (3 and 5 rats, respectively). Mild tubular dilation and proteinaceous casts were also observed in one lead-treated rat. Although considered early signs of tubular nephrosis, these mild and infrequent observations are much less severe and frequent than those found after 4 weeks, as the majority of the kidneys exposed to the 15 nephrotoxicants were unaffected on day 5. Clinical indicators of nephrotoxicity, such as blood urea nitrogen (BUN) and creatinine (CRE), were also not appreciably elevated on day 5 (Table 2). Serum albumin on day 5 was consistently reduced relative to controls; however, albumin was also reduced (<4.1 g/dL) relative to the controls by 28.6% (14/49) of the negative class treatments, making this marker highly nonspecific (data not shown). Therefore, these mild findings on day 5 are unlikely to bias the signature.
After 4 weeks of daily dosing, all 15 nephrotoxicants showed some evidence of degenerative changes of the renal tubules, or signs of tubular nephrosis, which is consistent with the expected findings of these nephrotoxicants. Tubular degeneration included signs of tubular necrosis, dilation, vacuolation, basophilia, mineralization and cysts (Figure 1). These lesions were also accompanied by a higher incidence and increased severity of epithelial regeneration and interstitial inflammation, as well as granular and proteinaceous casts. A high incidence of karyomegaly was also noted for cisplatin, carboplatin, lead, and cobalt. Consistent with the tubular degeneration was the observation of hypercholesterolemia and hypoalbuminemia for the majority of the nephrotoxic treatments. BUN and CRE levels were elevated for some treatments, but these endpoints did not correlate well with the histological evidence of tubular injury (Table 2).
While most nephrotoxicants induced clear evidence of tubular degeneration, 4-nonylphenol, carboplatin, cadmium, cobalt, cyclosporine A, and roxarsone induced much weaker signs of tubular degeneration on day 28. For example, proteinaceous casts, tubular cysts, and mineralization were only observed in one roxarsone or 4-nonylphenol treated rat on day 28, although these treatments did induce a much higher incidence and severity of tubular regeneration (4–6 rats) and interstitial inflammation (6 rats) suggestive of future tubular effects.
These treatments also induced a higher incidence and severity of CPN relative to controls and their corresponding day 5 treatment groups. Since the nephrotoxicity of these compounds is well described, and early signs of injury are apparent in the current study, these treatments were included in the positive class and are thought to increase the sensitivity of the signature to detect weak nephrotoxicants, as confirmed in trial signatures (data not shown). While a signature specific for a distinct morphometric endpoint, such as necrosis, would have value, it was not our intent to derive a signature for a specific tubular lesion, but rather a signature that is able to detect multiple morphometric endpoints that are indicative of a degenerative change, all of which are indicators of compound-induced toxicity and a concern for drug development.
Signature Derivation
Using kidney gene expression from all 64 compound treatments in the training set, a signature was derived that perfectly classified the training set. Split-sample cross-validation results in an average true positive rate (sensitivity) of 83% and a true negative rate (specificity) of 94%. This equates to a positive and negative predictivity of 81 and 95%, respectively. Leave-one-out cross validation or use of other split ratios of the data set for cross-validation produced similar results. To test whether the algorithm is identifying an accurate signature by chance, the labels for the 64 experiments were randomly permuted and a signature was derived and subjected to split-sample cross-validation as above. This process was repeated 99 times. As expected, the average test natural log odds ratio (LOR) over the random permutations was closely centered about zero (−0.004 ± 0.86), with a maximum of 2.85. By comparison, the true label set resulted in a LOR of 4.35, which was significantly greater than that achieved with random labels (p < 0.0001, 2-sample t-test).
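The reported predictive values can be reproduced from the cross-validated sensitivity and specificity if they are evaluated at the class proportions of the training set (15 positive and 49 negative experiments). That choice of proportions is an assumption made for this sketch, as is the function name; the text does not state how the predictive values were computed.

```python
def predictive_values(sensitivity, specificity, n_positive, n_negative):
    """Positive and negative predictive values at the given class sizes."""
    tp = sensitivity * n_positive          # expected true positives
    fn = (1.0 - sensitivity) * n_positive  # expected false negatives
    tn = specificity * n_negative          # expected true negatives
    fp = (1.0 - specificity) * n_negative  # expected false positives
    return tp / (tp + fp), tn / (tn + fn)

# Sensitivity and specificity from cross-validation; class sizes from the training set.
ppv, npv = predictive_values(0.83, 0.94, 15, 49)
print(round(ppv * 100), round(npv * 100))
```

Because predictive values depend on the proportion of true positives among the compounds tested, they would differ in a screening setting where nephrotoxicants are rarer or more common than in the training set.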
The signature derived using the complete data set consists of 35 probes, representing 35 unique genes (Unigene build 119), their associated weights and a bias term (Figure 2). The product of the weight and the log10 ratio for each gene (i.e., impact) summed across all 35 genes in the signature, minus the bias term, determines the scalar product and the result of the classification. A positive scalar product indicates the treatment is predicted to induce future tubular degeneration, while a negative scalar product indicates the treatment will not induce future tubular degeneration. The genes and bias term in the signature are scaled such that the classification threshold (i.e., zero) is equidistant, by one unit, between the positive class and negative class experiments in the training set. The average impact across the 15 positive experiments in the training set indicates that 31 of the 35 genes are considered “reward” genes, as they represent expression changes that positively contribute to the scalar product. The reward genes contribute to the sensitivity of the signature by rewarding expression changes consistent with nephrotoxicity. The remaining 4 genes in the signature are considered “penalty” genes as they represent expression changes that negatively contribute to a scalar product. Penalty genes contribute to the specificity of the signature by penalizing treatments with expression changes not consistent with nephrotoxicity. Of the 31 reward genes, 15 have an average log10 ratio in the positive class greater than zero and are therefore induced on average by the 15 nephrotoxicants. The remaining 16 are on average repressed by the 15 nephrotoxicants, but have a negative weight so they contribute positively to the scalar product and are thus reward genes. Examination of the expression changes across the 15 nephrotoxicants in the training set reveals that most genes are not consistently altered in the same direction by all treatments (Figure 2). 
For example, the signature genes cyclin-dependent kinase inhibitor 1A (p21; U24174) and the EST AW143082 are each induced or repressed to varying degrees by the treatments in the positive class, indicating that these genes would be poor classifiers when used individually. Note that the significance of the gene expression change, relative to controls, in experimental samples is not factored into the classification decision; rather the standard error of each gene in the training set is considered when the algorithm derives the signature (El Ghaoui, 2003).
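The distinction between reward and penalty genes follows directly from the sign of the average impact, as the sketch below illustrates. The gene labels, weights, and log10 ratios are hypothetical and do not correspond to entries in Figure 2; the function names are likewise assumptions.

```python
def mean_impacts(weights, positive_experiments):
    """Average impact (weight * log10 ratio) per gene across positive experiments.

    weights: dict mapping gene -> weight
    positive_experiments: list of dicts mapping gene -> log10 ratio
    """
    n = len(positive_experiments)
    return {
        gene: w * sum(exp[gene] for exp in positive_experiments) / n
        for gene, w in weights.items()
    }

def label_genes(impacts):
    """Genes with positive average impact are 'reward'; the rest are 'penalty'."""
    return {g: "reward" if v > 0 else "penalty" for g, v in impacts.items()}

# Hypothetical 3-gene example with two positive-class experiments.
weights = {"gene_a": 1.2, "gene_b": -0.8, "gene_c": -0.5}
experiments = [
    {"gene_a": 0.3, "gene_b": -0.2, "gene_c": 0.1},
    {"gene_a": 0.5, "gene_b": -0.4, "gene_c": 0.3},
]
labels = label_genes(mean_impacts(weights, experiments))
print(labels)
```

Here gene_b is repressed on average but has a negative weight, so it is a reward gene, whereas gene_c is induced on average with a negative weight and therefore acts as a penalty gene.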
Signature Validation
In addition to cross-validation, 21 independent and structurally distinct compound treatments representing 9 positive and 12 negative compounds were also evaluated. As shown in Table 3, 7 of 9 (78%) tubular toxicants were correctly classified as being positive. Based on a hypergeometric distribution, the probability of predicting 7 nephrotoxicants from the positive class given 10 total positive predictions and the given sample and population size is less than 3%. This indicates that the tested sensitivity of the signature is not likely due to chance. Of the true positive predictions, 6 were predicted to be nephrotoxic in the absence of histological evidence of renal tubular injury at the time of expression measurement. Allopurinol and bacitracin showed early signs of mild to severe tubulointerstitial nephritis as early as day 1 (allopurinol) or day 3 (bacitracin), whereas the other treatments were considered histologically normal.
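The hypergeometric probability quoted above can be checked with a few lines of arithmetic. The sketch assumes a population of 21 test compounds containing 9 nephrotoxicants, from which 10 are called positive (the 7 true positives plus the 3 false positives described below); the function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """Probability of exactly k successes in `draws` draws without replacement."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )

def hypergeom_tail(k, population, successes, draws):
    """Probability of k or more successes."""
    upper = min(successes, draws)
    return sum(
        hypergeom_pmf(i, population, successes, draws) for i in range(k, upper + 1)
    )

p_exact = hypergeom_pmf(7, 21, 9, 10)  # exactly 7 of the 9 nephrotoxicants
p_tail = hypergeom_tail(7, 21, 9, 10)  # 7 or more of the 9 nephrotoxicants
print(round(p_exact, 4), round(p_tail, 4))
```

Both the exact probability (about 2.2%) and the upper-tail probability (about 2.4%) fall below 3%, consistent with the statement in the text.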
The two false negative predictions, bromobenzene and carbon tetrachloride, are thought to induce tubular injury via the formation of toxic reactive metabolites (Lau et al., 1984; Ozturk et al., 2003). It is not clear why these 2 treatments were not positively identified. Of the 12 nonnephrotoxicants tested, 9 (75%) were correctly classified. Of the 3 false positives, the two antimetabolites—cytarabine and azauridine—were close to the threshold and therefore are predicted to be weak nephrotoxicants. Although the signature predicted these treatments would induce future nephrotoxicity, renal tubular injury may not have been observed for these drugs in clinical or preclinical studies due to other dose-limiting toxicities of other types and in other organs, as is typical of antineoplastics. For example, both cytarabine and azauridine induced significant leukopenia in rats at the doses tested. The other false positive, diclofenac, had the highest scalar product of the 21 compounds tested, which strongly suggests that this compound is nephrotoxic.
While human clinical evidence suggests that idiosyncratic hepatotoxicity and gastrointestinal bleeding are the predominant adverse effects of this anti-inflammatory, rodent studies have identified acute nephrotoxicity in both rats and mice, albeit at 3 and 30 times higher doses than used in this study, respectively (Kocaoglu et al., 1997; Hickey et al., 2001). To test whether the tested dose of 3.5 mg/kg/d is indeed nephrotoxic following subchronic dosing, a 28 day repeat dose study was performed where rats were treated daily by oral gavage with 3.5 mg/kg/day and necropsied after 5 days (n = 5 rats) for gene expression measurement (n = 3 arrays) and day 28 (n = 10 rats) for histopathology. Surprisingly, the scalar product on day 5 from this repeat study was now negative (−0.88), which was consistent with the negative histological outcome on day 28. There is no clear explanation for the false positive finding from our earlier study.
To challenge the ability of the signature to correctly classify a nonnephrotoxicant (oxaliplatin) structurally related to nephrotoxicants in the training set (i.e., cisplatin, carboplatin), oxaliplatin was administered intravenously to rats at a maximum tolerated dose (8 mg/kg/day) for 5 days (n = 3 rats) and kidney gene expression profiles from days 1, 3, and 5 were evaluated with the signature. The maximum scalar product was negative (−0.11), which illustrates that the signature can correctly classify a toxicologically distinct, yet structurally related, group of compounds.
To challenge the sensitivity of the signature, a lower yet nephrotoxic dose (1 mg/kg/week; n = 5 rats) (van Hoesel et al., 1984, 1986) of each of the 3 anthracyclines in the training set (doxorubicin, daunorubicin, idarubicin) was administered intravenously to rats once weekly (days 0, 7, 14, and 21), and kidney gene expression profiles were measured on day 5 (n = 3 arrays) and evaluated with the signature. In all 3 cases, the scalar product on day 5 was positive (daunorubicin = 1.31, idarubicin = 1.20, doxorubicin = 0.65), indicating that these doses are predicted to be nephrotoxic under subchronic dosing conditions. Although kidney histopathology on days 5 and 28 was negative (data not shown), literature data support the conclusion that nephrotoxicity in rats for doxorubicin, and likely for daunorubicin and idarubicin also, would be expected beyond 4 weeks at this dosing regimen (van Hoesel et al., 1984, 1986). Strain and/or stock differences that affect metabolic characteristics may contribute to the discrepancy between these 2 studies. Future validation studies to examine nonnephrotoxic doses of nephrotoxicants will further test the specificity of the signature.
Discussion
Using a 64-compound training set, transcriptional profiling data from rat kidneys was used to derive a 35 gene signature to predict long-term renal tubular degeneration before its histological appearance using short-term gene expression data. The signature performed well when cross-validated and correctly classified 16 of 21 (76%) independent test compounds structurally unrelated to the training set. By comparison, histopathology and clinical chemistry markers (i.e., blood urea nitrogen or creatinine) collected from these short-term studies were not prognostic of future tubular degeneration, thus indicating a difference in sensitivity between gene expression changes and traditional toxicological and clinical endpoints. It must be recognized that in order to overcome the expense of many (~60) long-term animal studies, we relied on reference compounds that have been previously studied and well annotated with respect to renal effects.
Discrepancies in previous studies versus ours may negatively affect our estimate of prediction accuracy, but surprisingly, we obtained very good results in forward validation studies that were found unlikely to be due to chance. More importantly, it demonstrated proof of concept that genomic biomarkers have the potential to increase the sensitivity for predicting tubular degeneration above and beyond existing methods. This is of importance due to the practical application of toxicogenomics in drug development and the need for developing more sensitive screening assays. Continued validation of the signature by the scientific community will produce a more accurate estimate of its predictivity, and its relative sensitivity compared to other emerging methods, such as proteomics or metabolomics. When such genomic biomarkers are evaluated in the context of other structural, in vitro, or in vivo preclinical data, the ability to predict nephrotoxicity may be significantly enhanced. By analogy, the Salmonella mutagenicity (Ames) test for the identification of rodent carcinogens has been estimated to have an accuracy of 65%; however, when used in conjunction with other in vitro and in vivo genotoxicity assays, the overall predictability is extremely high (Zeiger, 1998). Therefore, incorporation of gene-based biomarkers into preclinical drug screening paradigms offers a unique opportunity for prognosis of compound-induced pathology. This experimental approach is currently being applied to model and predict other latent forms of drug-induced pathological injury.
A benefit of the SPLP algorithm is the small number of genes used in the signature, which may allow for the use of nonarray based technologies as potentially less expensive and higher throughput screening tools in the future. Using a variety of approaches, it may be possible to reduce the signature to even fewer genes with minimal loss of predictive performance (Natsoulis et al., 2004). While this signature was derived on the Codelink RU1 platform, cross-platform mapping experiments have facilitated the identification of an alternative version of the signature capable of analyzing Affymetrix GeneChip expression data (B.P.E. et al., unpublished data). Higher throughput transcriptional measurement methods, which promise to substantially simplify transcriptional profiling experiments, are poised to make these approaches highly cost effective and applicable to preclinical screening. Although other algorithms, such as neural nets or kernel support vector machines, have been shown to achieve higher predictivity for certain classification problems, they lack interpretability due to the nonlinear and nonsparse nature of the signatures (Natsoulis et al., 2004). This property, however, does not preclude their use when mechanistic understanding of the classifier is not desired, or is achieved by other means.
While not the objective of this study, it is possible to examine the molecular and biological function of the genes in the signature, in addition to other genes found to be necessary for classification, to reveal the possible interconnected processes linked to early nephropathy. Considering the cellular heterogeneity of the kidney, it is likely that the observed changes are reflective of concurrent events in multiple cell types, although this remains to be tested. Many of the genes in the signature are of unknown function. As a result, the mechanism of action is difficult to infer from this version of the signature.
Identifying additional genes, beyond these 35, that are useful for classification may further our understanding of mechanism. Additionally, evaluating other genes profoundly regulated by the nephrotoxicants, but not necessarily optimal for classification, is of great value in understanding the mechanism of action. For example, the gene with the largest differential expression between the nephrotoxicants and the nonnephrotoxicants in this study was albumin. While the SPLP algorithm did not identify this gene as being useful for classification, its expression was on average approximately 9 times lower in the nephrotoxicant-treated animals than in the nonnephrotoxicant-treated animals. Like serum levels of albumin, the down-regulation of albumin mRNA in the kidney is likely too nonspecific to be used in classification. The expression of albumin in the kidney has been shown to be decreased in rat models of nephrectomy-, adenine-, and ischemia-reperfusion-induced renal failure (Yamauchi et al., 1989; Yokozawa et al., 1994; Yoshida et al., 2002).
This may, in part, reflect hypoalbuminemia which is frequently observed concurrently with tubular injury. Although basal albumin expression is minimal in kidney, it has been shown that albumin can increase renal proximal tubule expression of TNF-α and NF-κB signaling and hence potentiate tubular interstitial inflammation during renal deterioration (Drumm et al., 2002). Therefore, down-regulation of albumin may be a result of feedback inhibition to diminish protein-overload-induced expression of inflammatory cytokines. Relative to previous work in this field, the 35 genes in the signature have not been previously described as putative biomarkers of nephrotoxicity. This is likely due to the fact that this study was designed to identify expression changes that precede tissue injury, while the majority of proposed biomarkers of nephrotoxicity, such as Kim-1 or Osteopontin, were identified as being coincident with tubular injury (Amin et al., 2004). Evaluation of the genes in the signature and their relevance to nephrotoxicity is the subject of ongoing investigations and beyond the scope of this paper.
To summarize, we have derived a gene signature that is designed to predict the future development of renal tubular degeneration before it appears histologically or by other traditional clinical pathology endpoints. Using an independent set of compound treatments structurally distinct from those in the training set, we have validated the signature and found that it accurately classifies 76% of the compound treatments, which compares favorably to existing methods, such as histopathology, blood urea nitrogen, and serum creatinine. The signature was composed of only 35 genes. Additional investigation on the role of these genes, and others, in the pathological process of tubular degeneration may highlight therapeutic approaches to ameliorate clinical toxicity and/or to identify accessible biomarkers for clinical use. As the field of chemogenomics moves forward, it is expected that similar experimental approaches to biomarker derivation, validation, and scientific acceptance will be implemented to model other drug-induced toxicities so that new technologies can be incorporated into preclinical drug and chemical toxicity testing strategies.
Acknowledgments
We would like to thank Drs. E. Blomme, A. Tolley, and S. Baumhueter for valuable comments on the manuscript, in addition to many other scientists at Iconix Pharmaceuticals for their technical assistance, particularly C. McSorley for coordinating the in vivo studies. We would also like to thank Dr. J. Seely for photomicrographs.
