Abstract
Gene expression profiling, metabolomic screens, and other high-dimensional methods have become an integral part of many biological investigations. To facilitate interpretation of these data, it is important to have detailed phenotypic data—including histopathology—to which these data can be associated, or anchored. However, as the amount of phenotypic data increases, associations within and across these data can be difficult to visualize and interpret. We have developed an approach for categorizing and clustering biologically related histopathological diagnoses to facilitate their visualization, thereby increasing the possibility of identifying associations and facilitating the comparison with other data streams. In this study, we utilize histopathological data generated as part of a standardized toxicogenomics compendium study to generate composite histopathological scores and to develop visualizations that facilitate biological insight. The validity of this approach is illustrated by the identification of transcripts that correlate with the pathology diagnoses that comprise the categories of “response to hepatocellular injury” and “repair.” This approach is broadly applicable to studies in which histopathology is used to phenotypically anchor other data, and results in visualizations that facilitate biological interpretation and the identification of associations and relationships within the data.
Keywords
Introduction
The rapid advancement of high-dimensional assay methods such as gene expression microarrays has profoundly impacted the type and volume of data that is collected in contemporary biological investigations. The challenge then becomes mining these data to develop biological meaning and to uncover previously unidentified trends, patterns, and associations. A useful approach is to associate these data with phenotype data, a process designated “phenotypic anchoring” (Hamadeh et al., 2002; Luo et al., 2005). Simple phenotypic assessments, such as the presence or absence of a discrete histopathological diagnosis, can be dealt with in a straightforward manner. However, when the amount of phenotype data increases, interpreting these data and associating them with other data streams (e.g., gene expression data) becomes challenging.
The application of information visualization approaches to high-content data to facilitate data mining has been extensively applied (Weinstein et al., 1997; Eisen et al., 1998). For example, previous studies associated with gene expression profiling have used hierarchical clustering followed by visualization of these results into heat maps and dendrograms to facilitate biological interpretation of these data. Visualization approaches have been used primarily on high-dimensional data and not on the phenotypic data, i.e., the observable physical characteristics of the subject(s) under analysis. Recently, Hamadeh et al. (2004) performed a principal component analysis on the severity scores of different histopathological lesions to facilitate the biological interpretation of gene expression data from furan-treated animals. Previously, this group had published a method for generating a combined pathology score for pooled samples, which was derived by summing the severity score for all observed histopathological diagnoses for all of the individuals comprising the pool (Hamadeh et al., 2002). Other examples exist in the literature from other scientific disciplines in which gene expression data have been associated with histopathological data (Moggs et al., 2004; Nielsen et al., 2004; Livasy et al., 2006). Primarily, the approaches used in these examples were to find gene expression profiles that correlated with a specific parameter or phenotype. Cumulatively, these efforts have demonstrated the utility of applying visualization approaches to rich data sets in order to generate observations that are phenotypically anchored.
To further develop capabilities to meaningfully integrate the analysis of high dimensional data within the context of available phenotypic data, here we detail an approach to associate, integrate, and visualize histopathological data. Instead of calculating an average value of all histopathological diagnoses (Hamadeh et al., 2002), this approach focuses on individual animals and integrating biologically related diagnoses. This is achieved by first partitioning and integrating a series of individual histopathological diagnoses into related biological themes or categories (e.g., Response to Hepatocellular Injury), each containing interrelated diagnoses. As part of this process, the histopathological scores from each experimental subject are recalculated according to a modified scoring system based on their relative contribution to each biological category. The results can then be clustered using a variety of methods and easily visualized using heat maps and dendrograms to facilitate the visualization of these data and identification of potential relationships within a biologically relevant context.
To demonstrate the utility of this approach, we reexamined histopathological data from an acute hepatoxicity toxicogenomics study. This study is comprised of 8 chemical compounds, with 3 time points and at least 4 doses per compound. In total, over 400 animals were examined using a variety of methods—gene expression microarrays, metabolic profiling, clinical chemistry, and histopathology. Histopathological examination of 2 sections of the left liver lobe involved a study pathologist, a reviewing pathologist and a pathology working group (PWG) panel to assure consistency across animals and studies (Boorman et al., 2002). A total of 32 different diagnoses were scored, resulting in a matrix of over 13,000 observations. Many of these diagnoses are interrelated and may describe different aspects of the same biological process. In the present study, the traditional histopathological data were recompiled according to a biologically based scoring system. Because individual diagnoses were still scored, albeit with a weighting appropriate to its relative contribution to a given biological categorization, no data was lost. Thus, different biological categories can be visualized at different points in the analysis, depending upon the specificity that is desired (e.g., the involvement or location of a specific cell type in a given process). This approach has provided insight into the pathology data associated with this study and serves to guide and illuminate the analysis of gene expression profiling data that was generated from the same samples used for the histopathological analysis. While the observations detailed for this approach are primarily focused on acute hepatocyte toxicity, the principles and concepts can be readily applied to other sets of histopathological data to gain similar insights.
Materials and Methods
In Life Studies
This study is part of a larger study to evaluate hepatic transcript response to a variety of hepatotoxicants that injure different liver cells and cause injury in different regions of the liver using 12- to 14-week-old F344/N male rats. Hepatotoxicants were identified based on published literature and pilot experiments and chosen based on their ability to injure different cell types and regions of the liver. For each chemical, doses that elicited a subtoxic, a moderately toxic, or an overtly toxic response 24 hours after treatment were selected. The doses used were as follows: bromobenzene 25, 75, and 250 mg/kg body weight (BW) per os (p.o.) (vehicle: corn oil); 1,2-dichlorobenzene 15, 150, 1500 mg/kg BW p.o. (vehicle: corn oil); 1,4-dichlorobenzene 15, 150, 1500 mg/kg BW p.o. (vehicle: corn oil); N-nitrosomorpholine 10, 50, 300 mg/kg BW p.o. (vehicle: phosphate-buffered saline (PBS)); diquat 5, 10, 20 and 25 mg/kg BW intraperitoneal (i.p.) (vehicle: PBS); monocrotaline 10, 50, 300 mg/kg BW (vehicle: PBS); thioacetamide 15, 50, and 150 mg/kg BW (vehicle: PBS) and galactosamine 25, 100, and 400 mg/kg BW (vehicle: PBS). For each compound, animals also were treated with vehicle alone. All animals were fasted overnight prior to exposure. Groups of 4 to 6 animals were dosed between 8:30 and 9:30 a.m., thereafter feed was provided ad libitum until sacrifice after 6, 24, or 48 hours of treatment by isoflurane anesthesia followed by cervical dislocation. Experiments were performed according to the established guidelines and an approved Animal Study Protocol was on file prior to initiation of the study (NRC, 1996).
Histopathology
Two cross-sections that were adjacent to the central core collected for transcript analysis were collected from the left liver lobe, fixed by immersion in 10% neutral-buffered formalin, and transferred to histology grade alcohol 18–24 hours after necropsy. Tissues were then embedded in paraffin, sectioned at 5 micron, and stained with hematoxylin and eosin. A study pathologist completed the initial microscopic evaluation of each slide for a given compound. Exact grading criteria for each diagnosis are detailed in Table 1. A quality assessment review was performed on the initial histopathological findings. A third pathologist reviewed the quality assessment report and the slides. Any inconsistencies in diagnoses were resolved during a Pathology Working Group (PWG) review by 4 to 5 pathologists, including a core of 4 pathologists that participated in every PWG review. Data for all compounds were compiled into a single table detailing the scores for all 32 diagnoses across the 426 animal subjects (Supplementary Table 1).
Hierarchical Clustering and Visualization of Histopathology Data
Clustering algorithms are designed to create homogenous groups of related objects by mathematically determining the degree of similarity between all of the subjects. Many different algorithms for grouping samples and methods for assessing similarity exist (Aldenderfer and Blashfield, 1984; Causton et al., 2003). For the purpose of this work, single linkage hierarchical clustering with Euclidean distance as the similarity metric (Aldenderfer and Blashfield, 1984) was performed on the compiled histopathology severity data and the subsequent heatmap and dendrogram were generated using Spotfire DecisionSite version 8.0 (Spotfire, Inc., Somerville, MA). Several combinations of different clustering algorithms and distance metrics were used and similar visualizations resulted (data not shown).
Gene Expression Profiling and Analysis
The gene expression profiling data associated with this study will be detailed in a separate publication. Briefly, 1 μg of total RNA from the left liver lobe from either an individual rat or a pooled sample representing time-matched, vehicle control animals was amplified and labeled with a fluorescent dye (either Cy3 or Cy5) using the Low RNA Input Linear Amplification Labeling kit (Agilent Technologies, Palo Alto, CA) following the manufacturer’s protocol. The amount and quality of the fluorescently labeled cRNA was assessed using a Nanodrop ND-100 spectrophotometer and an Agilent Bioanalyzer. Equal amounts of Cy3- or Cy5-labeled cRNA were hybridized to the Agilent Rat Oligo Microarray (Agilent Technologies, Inc., Palo Alto, CA) for 17 hours, prior to washing and scanning. Data was extracted from scanned images using Agilent’s Feature Extraction Software (Agilent Technologies, Inc., Palo Alto, CA). Fluorophore reversal hybridizations were performed for 318 treated rats against time-matched control RNA pools for a total of 636 hybridizations. Fluorophore reversal hybridization data were combined using an error-weighted average for each individual using Rosetta Resolver version 5.1.0.1.0 (Rosetta Biosoftware, Seattle, WA) (Weng et al., 2006). Rosetta Resolver was used to perform one-way error-weighted analysis of variance (ANOVA) (with Bonferroni multiple test correction) and principal component analysis (PCA), a mathematical approach for combining components (transcripts in this instance) in order to reduce the dimensionality of the data such that the greatest amount of variability is represented by the first principal component (Causton et al., 2003). PCA results were visualized using Spotfire DecisionSite. Gene ontology enrichment analysis was performed using High-Throughput GoMiner (Zeeberg et al., 2003, 2005).
Results and Discussion
Due in part to the growth and expansion of high dimensional assays, the volume of data generated in many experiments continues to grow. Once these data have been generated the challenge becomes integrating these results with other data and mining the combined data to find previously undiscovered associations, relationships, or trends. Shortly after the first gene expression microarray experiment was performed (Schena et al., 1995, Lockhart et al.,1996), the utility of hierarchical clustering as a means to visualize the sheer volume of data inherent in this technology and to identify underlying biological relationships became clear (Weinstein et al., 1997; Eisen et al., 1998). Since that time, this analytical/visualization approach has been extensively used with gene expression profiling data as well as with many other data types.
However, one critical aspect to mining data from high-dimensional studies is to analyze the association between these data and the phenotypic data. As the number of animals experiments associated with a given project increases, it becomes a challenge to meaningfully integrate and visualize phenotypic data as well. The compilation of the histopathology data generated from this study produced a matrix of over 13,000 data points. We sought to develop a generalized approach to analyze and visualize phenotypic data using methodology frequently employed in the analysis of gene expression data. Histopathology data presents an interesting set of challenges.
Despite attempts to use consistent terminology for specific diagnoses throughout the 8 chemicals, doses and time points for the entire study, synonyms for a specific diagnosis were occasionally used, for example, “sinusoidal congestion” and “congestion” indicated identical diagnoses in this study set. In addition, there is often a tendency to focus on the most severe lesions present in a tissue, which can frequently overshadow lesser lesions. While this tacitly acknowledges the hierarchical nature of certain diagnoses, it also means that certain, often early, lesions may not be consistently diagnosed throughout all time points. For example, an early change such as glycogen depletion, which frequently preceded the occurrence of degenerative hepatocellular lesions in these studies, may be overlooked or obscured by the related but more severe lesion of hepatocyte necrosis. Furthermore, the grading schemes utilized for scoring some of the early lesions may no longer apply when the tissue has been altered by a destructive lesion such as necrosis. These issues pose a challenge to generating an appropriate visualization to summarize the data and identify relationships. Figure 1 illustrates the results of hierarchical clustering when all diagnoses were considered individually and with equal weight. This visualization is unsuitable in that it does not facilitate the identification of known, let alone novel, relationships. An additional complication arises when attempting to find correlations with alternative data types, such as gene expression microarray data. The issues described here can lead to inappropriate subject groupings, thereby resulting in erroneous associations.
Integration of Histopathological Data into Biological Categories
A method for compiling histopathological diagnoses into a set of discrete biological categories was devised to overcome these challenges. This method provides a foundation to integrate related or linked diagnoses, and prevents confounding synonymous diagnoses. The focus of this compendium study was acute hepatocyte necrosis. Thus, the observed diagnoses can be broadly assigned to 3 major categories: (1) Response to Hepatocellular Injury, (2) Inflammation, and (3) Repair. Table 2 details the individual diagnoses that constitute each of these categories. A modified scoring system was also developed. This system weights each of the individual diagnoses differently to account for the differences in severity of each diagnosis within the continuum of its assigned biological category. Thus, glycogen depletion receives a low weighting within the Response to Hepatocellular Injury category, while necrosis is weighted more heavily. A standard convention in gene expression analysis is to assign negative values to down-regulated transcripts and to indicate their degree of down-regulation using the color green at varying saturation levels.
In a similar way, up-regulated transcripts have positive values and are visualized in red. In keeping with this convention, categories that have negative effects on the liver were scored with increasingly negative values (according to the severity of their negative effect), while categories having positive effects were scored on a positive number continuum (Table 2). For the purpose of these representative visualizations we have limited the use of color to the red to green continuum, though the use of additional colors (e.g., a different color for every category) could be applied to visual the data in different ways. Figure 2 illustrates the results of hierarchical clustering when each subject animal was re-scored using this system. The resulting visualization enhances the ability to gain biological insight in several ways. For example, Figure 2 clearly illustrates that in most animals that display a noted Response to Hepatocellular Injury there is also an increase in Inflammation. This is not a surprising finding and is well-documented (Schwabe and Brenner, 2006). In contrast however, Figure 1 fails to illustrate this association. Figure 2 also depicts two groups (indicated with the red arrows) of subject animals that display a Response to Hepatocellular Injury and Inflamma-tion, as well as having indications of Repair in the histopathology data. Once again, this association is not readily apparent in the histopathology data prior to rescoring, as illustrated in Figure 1. A potentially useful follow-up to this overview visualization is to identify the subject animals within these groups, and to further analyze their gene expression profiles to determine if any mechanistic insight into their differential response can be established.
Since the original histopathology data is being compiled but not altered or lost, it is a simple exercise to recompile using different categories or by subdividing categories to facilitate visualizing the data to enable different points of emphasis. For example, by compiling the histopathology data into the 3 categories detailed here, all of the location information (e.g., centrilobular, midzonal, focal, diffuse, etc.) that was included in the original diagnoses is ignored. However, the chemicals that were selected for this study are known to have toxic effects within different regions of the liver. Therefore, it may provide additional biological insight if the necrosis diagnoses from different regions are separated from the other categories. Table 3 details the scoring method that was used for this revised approach and the result of performing hierarchical clustering on these data is depicted in Figure 3. Due to the number of samples being clustered, it is not always feasible to associate sample annotation information with a clustering heatmap because of size limitations of a printed page though this information is readily accessible when viewed with software programs capable of generating such visualizations.
For this reason, Supplementary Table 2 details the compound, dose, time, and animal number based on the hierarchical clustering order presented in Figure 3 (numbering begins with the leftmost column in this figure and proceeds to the right).
Re-scoring the data in this manner revealed that subject animals with the diagnosis of focal necrosis primarily (denoted with the blue bar) were associated with mid- and high-dose galactosamine treatment. However, a few individual diquat-treated animals at lower doses were also in this group. Interestingly, the group containing subject animals with the diagnosis of midzonal necrosis (denoted with the yellow bar) was comprised of 19 animals treated with higher doses of diquat and 1 monocrotaline-treated animal. These findings suggest a potential continuum within the phenotypic response to diquat that is related to the exposure dose. This is a testable hypothesis that can be developed directly as a result of the visualization presented in Figure 3.
However, even though the same root data is presented in Figure 1, this observation is not apparent and the hypothesis would remain undeveloped. This strongly suggests that compiling histopathological data into biological categories, re-scoring individual diagnoses to reflect the severity of their effect within their assigned categories, and visualizing these results provides a powerful mechanism for researchers to uncover important biological associations that can be developed into testable hypotheses.
Association of Transcript Profiling Data with Biological Categories
We next sought to illustrate how this approach can facilitate and guide analysis of related data, such as gene expression microarray data. In order to identify transcripts that correlated with the categorized histopathological diagnoses, a one-way ANOVA was performed using gene expression data derived from the same liver samples used for histopathology. In these analyses the Response to Hepatocellular Injury score for each of the 318 treated rats in this study (or a randomization of this score) was used as the factor level for the ANOVA. A total of 1,736 transcripts were found to vary significantly (p-value ≤ 1 × 10−7) across the factor levels, thereby suggesting that the expression levels of these transcripts are associated with the severity of hepatocellular injury. The gene expression data from all 318 rats for this group of transcripts was then used in a principal component analysis (PCA), which functions to reduce the dimensionality of data while highlighting similarities and differences within the data.
The resulting visualization from this analysis (Figure 4) reveals that the variability contained in the first principal component (the combination of variables that explains the greatest amount of variation) provides good separation of the individual animals based on their Response to Hepatocellular Injury score as indicated by the decreasing trend of scores from left to right in the figure. The separation is not perfect, which is expected when one considers that this is only a portion of the biological response to treatment that these animals undergo. Alternatively, if individual histopathological diagnoses comprising the Response to Hepatocellular Injury category were inappropriately re-scored, few transcripts would be identified by the ANOVA, and these transcripts would fail to generate a meaningful grouping of the data in a PCA.
When these same gene expression data for all 318 treated rats were used in 3 separate ANOVAs wherein the Response to Hepatocellular Injury scores were randomized, on average only 3 transcripts were found to be significantly different across the factor levels and the subsequent PCA did not generate a meaningful separation of the data (data not shown). Furthermore, when the 1,736 significant transcripts were subjected to gene ontology enrichment analysis, the enriched biological processes were in accordance with expectations for a Response to Hepatocellular Injury category: “response to stress,” “wound healing,” and “fatty acid metabolism” biological process categories were significantly enriched.
Assessment of the individual transcripts having the most significant impact on the data separation in Figure 4 (based on principle component loading factor scores) revealed associations with phase I and phase II metabolism and glycogenolysis, the catabolism of glycogen, which are consistent with increasing severity of hepatocellular injury. Levels of transcripts encoding enzymes involved in this process, including glycogen phosphorylase (Pygl), glycogen debranching enzyme, and glucose 6-phosphate (G6pc) were found to be strongly associated with the Response to Hepatocellular Injury categorical score.
Additionally, vasopressin has been shown to have a regulatory influence on liver glycogenolysis (Whitton and Hems, 1976), and arginine vasopressin receptor 1A (Avpr1a) transcript levels had the highest principal component #1 loading score. A total of 108 transcripts whose variance correlated with the Repair category were identified using the same ANOVA approach as was used for finding associations with the Response to Hepatocellular Injury category. Based on gene ontology enrichment analysis, these transcripts are associated with biological processes involved in “cell division” and “response to DNA damage stimulus,” which is consistent with expectations for a Repair category. Taken together, this suggests that grouping histopathology data into biological categories and generating an appropriate weighted score for this category is not only a valid approach to handling phenotypic data, but also provides a meaningful way to guide analyses of associated data.
Here we have presented an approach to integrate histopathology data into biological categories as a means to facilitate phenotypic data visualization and enhance biological insight. The study that was used as an example to demonstrate the utility of this approach is focused on acute hepatic toxicity. However, we believe that this approach can be broadly applied to other studies that also have associated histopathology data. The success of this approach is based on the meaningful integration of related histopathological diagnoses; therefore, it is important that consistent terminology and scoring be used in the primary assessment of histopathology samples and the generation of pathology reports for each animal.
The need for integration of diagnoses into categories also highlights the need for cross-disciplinary interactions not only during the experimental planning stages, but also during the downstream analyses after results have been generated. However, once proper integration of histopathology data has occurred, the ease at which these data can be visualized to gain insight into associations and relationships increases and these visualizations can then serve as a useful tool in subsequent analyses of data generated from high-dimensional assays.
Footnotes
Acknowledgments
The authors would like to express their appreciation to Ray Tennant, Rick Paules, and the ToxPath team of the National Center for Toxicogenomics for performing the compendium study that served as a basis for this work and for insightful discussion regarding this visualization approach. Additional gratitude goes to Imran Shah, Joel S. Parker, Susan A. Elmore, and Ronald A. Herbert for insightful comments and stimulating discussion on this concept. This research was supported in part by the Intramural Research Program of the National Institute of Health and the National Institute of Environmental Health Sciences and has been funded in part with Federal funds from the National Institute of Environmental Health Sciences, National Institute of Health, under Contract Numbers N01-ES-25497 and NIH-ES-33513.
