Abstract
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide; however, the mutational properties of HCC-associated carcinogens remain largely uncharacterized. We hypothesized that mechanisms underlying chemical-induced HCC can be characterized by evaluating the mutational spectra of these tumors. To test this hypothesis, we performed exome sequencing of B6C3F1/N HCCs that arose either spontaneously in vehicle controls (n = 3) or due to chronic exposure to gingko biloba extract (GBE; n = 4) or methyleugenol (MEG; n = 3). Most archived tumor samples are available as formalin-fixed paraffin-embedded (FFPE) blocks, rather than fresh-frozen (FF) samples; hence, exome sequencing from paired FF and FFPE samples was compared. FF and FFPE samples showed 63% to 70% mutation concordance. Multiple known (e.g., Ctnnb1T41A, BrafV637E) and novel (e.g., Erbb4C559S, Card10A700V, and Klf11P358L) mutations in cancer-related genes were identified. The overall mutational burden was greater for MEG than for GBE or spontaneous HCC samples. To characterize the mutagenic mechanisms, we analyzed the mutational spectra in the HCCs according to their trinucleotide motifs. The MEG tumors clustered closest to Catalogue of Somatic Mutations in Cancer signatures 4 and 24, which are, respectively, associated with benzo(a)pyrene- and aflatoxin-induced HCCs in humans. These results establish a novel approach for classifying liver carcinogens and understanding the mechanisms of hepatocellular carcinogenesis.
Keywords
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related deaths worldwide (Global Burden of Disease Cancer Collaboration et al. 2017), and its incidence is increasing (Balogh et al. 2016). Efforts to understand the molecular basis of HCC and other cancers have motivated researchers to establish databases to classify the spectra of genomic changes involved in subtypes of human cancers (https://www.genome.gov/17516564/the-cancer-genome-atlas/; http://icgc.org/; and http://cancer.sanger.ac.uk/cosmic/signatures). Multiple large-scale analyses suggest that tumors carry “mutational signatures,” which are unique combinations of mutations that are a consequence of genetic alterations influenced by various environmental exposures (Helleday, Eshtad, and Nik-Zainal 2014; Alexandrov et al. 2013). The mutational signature is determined by the frequency of the unique combination of the base substitutions within the trinucleotide context (5′ and 3′ to each mutated base; Alexandrov et al. 2013). Because the extent and types of chemical exposure in humans are not always quantifiable, efforts have been made to develop model systems for assessing the effects of chemical exposure in immortalized rodent or human cells (Olivier et al. 2014; Nik-Zainal et al. 2015; Severson et al. 2014) and in rodent tumors (Westcott et al. 2015). On the basis of these analyses, more than 30 mutational signatures have been identified, some of which have been attributed to specific environmental agents. The signatures are conserved in human and rodent cancers and, as a result, are expected to have direct translational relevance (Schulze et al. 2015).
B6C3F1/N mouse, which is the F1 hybrid of C57BL6/N and C3H/HeN mice, has been used to evaluate toxicity and carcinogenicity by the National Toxicology Program (NTP) due to its ability to correctly identify several known carcinogens (A. King-Herbert and Thayer 2006; A. P. King-Herbert, Sills, and Bucher 2010). Although the B6C3F1/N mouse genome has not been fully assembled, HCCs from these mice have been characterized for the guanosine triphosphatase-encoding oncogene Hras, which is preferentially mutated in spontaneous HCCs, and the Wnt signaling pathway gene Ctnnb1, which is preferentially mutated in some chemically induced HCCs (Devereux et al. 1999; Hoenerhoff et al. 2013). Because of their well-characterized mutation patterns, these genes can be used to validate the viability, power, and accuracy of the exome sequencing approach for identifying additional mutations in B6C3F1/N mouse HCCs. Direct comparison of the mutational spectra for aflatoxin B1 from human and mouse HCCs shows that they are remarkably similar (Chawanthayatham et al. 2017). Therefore, the B6C3F1/N mouse provides a relevant model system for comprehensive investigation of the mutational responses to environmental carcinogens.
In previous NTP studies, B6C3F1/N mice were used to identify the carcinogenic potential of ginkgo biloba extract (GBE; NTP 2013) and methyleugenol (MEG; NTP 2000). The selection of GBE, a widely used herbal supplement in traditional Chinese medicine for a variety of central neurodegenerative disorders (Nash and Shah 2015), was based on concerns about its potential human health risks and its wide-spread use as a nutraceutical. Additionally, MEG, a plant-derived phenylpropanoid chemical that is commonly used as a flavoring agent in food products, was selected because it bears structural resemblance to safrole, which is a known carcinogen (Bode and Dong 2015). Our results and those of others verified that both of these compounds have carcinogenic activity but indicated that their mechanisms of carcinogenesis may differ fundamentally in terms of their levels of genotoxicity (nongenotoxic for GBE; genotoxic for MEG; Li et al. 2009; Umegaki et al. 2007; Hoenerhoff et al. 2013; Maeda et al. 2014, 2015; Burkey et al. 2000; Miller et al. 1983; Zhou et al. 2007). Genotoxic or nongenotoxic classification of chemical compounds is based on various in vitro and in vivo genotoxicity assays (Smith et al. 2016), where genotoxic carcinogens induce DNA damage or mutations and nongenotoxic carcinogens do not directly alter DNA, but instead influence cellular transformation secondarily, in response to proliferative or metabolic effects. Despite the circumstantial and mechanistic evidence linking GBE and MEG to cancer, the mutational signatures for these compounds have not been determined.
To explore the potential utility of mutational spectra in characterizing the mechanisms of GBE and MEG, we performed exome sequencing on HCCs from B6C3F1/N mice. Three HCCs that arose in untreated animals (spontaneous), four HCCs from animals treated with high-dose levels of GBE, and three HCCs from animals treated with MEG were sequenced. As a control, exome sequencing was also performed on DNA extracted from nontumor normal liver tissues from untreated age-matched vehicle control animals (n = 3) from the GBE and MEG studies. Most tumor samples from NTP studies are not archived as fresh-frozen (FF) samples but are instead stored as formalin-fixed paraffin-embedded (FFPE) blocks following pathology evaluation. However, FFPE sample preparation is notorious for introducing DNA degradation and base modification (including cytosine deamination; Do and Dobrovic 2015). Therefore, to determine the feasibility of using NTP FFPE samples, we prepared paired FF and FFPE samples from each HCC and subjected them to exome sequencing in parallel.
Material and Method
Mouse Strains, Tumor Induction, and Preparation of Samples
HCCs from male and female B6C3F1/N mice (F1 hybrid mice from a C57BL6/N dam and a C3H/HeN sire) were procured from archival tissue from the NTP chronic GBE and MEG bioassays (NTP 2000, 2013). The mice were exposed daily by oral gavage for 2 years to 2,000 mg/kg GBE (n = 4); 37, 75, or 150 mg/kg MEG (n = 3); or a vehicle control (n = 3). HCCs that developed after 2-year exposures were collected and one half was FF in liquid nitrogen and the other half was collected in 10% neutral buffered formalin and routinely processed and embedded in paraffin blocks (FFPE) within 24 to 48 hr. Representative slides were stained with hematoxylin and eosin for microscopic examination. As a technical reference and also as a control for germ-line mutations, nontumor normal liver tissues from untreated age-matched vehicle control animals were also collected from the GBE and MEG bioassays (n = 3). Experiments were performed in compliance with the requirements set forth by the Guide for the Care and Use of Laboratory Animals.
DNA Isolation and Exome Sequencing
DNA was extracted from FF and FFPE normal liver and GBE-treated, MEG-treated, and spontaneous HCCs with DNeasy Tissue kits (Qiagen, Valencia, CA). For genomic library preparation, paired-end libraries were prepared using the Paired-End DNA Sample Preparation Kit (Illumina, San Diego, CA). Exome sequences were enriched from the genomic libraries according to the manufacturer’s protocol with the SureSelect Mouse Exome kit (G7550A-001: Agilent, Santa Clara, CA). Sequencing was performed on the Illumina HiSeq2000 according to the manufacturer’s protocol. The 26 libraries were indexed and run on a total of 13 lanes (2 samples per lane) to generate an average of ≥ 50× depth of coverage required for the analyses.
Identification of Germ-line Variants in the B6C3F1 Genome
Due to the lack of an assembled genome for the B6C3F1/N mouse, we used the assemblies of the C3H/HeN and C57BL6/N genomes from the NIEHS Mouse Methylome Project (National Institutes of Health/National Institute of Environmental Health Sciences [NIH/NIEHS] grant #ZI-ES103146-01; Sara Grimm, NIEHS [personal communication, 2018]; GEO Accession: GSE106379) to identify the differences between the C3H/HeN and C57BL6/N genomes using MUMmer (version 3.23). Differences (single-nucleotide variation, small insertions/deletions events, and multinucleotide variations) between the C3H/HeN and C57BL6/N genomes were identified, and those that were on exons (97,962 genomic locations; 0.15% of the exonic bases) were compiled as a “truth set” of known germ-line variants in the B6C3F1/N mouse.
Identification of Variants in HCC and Filtering against B6C3F1/N Germ-line Variants
Before analyzing the exome sequences, FASTQ files were subjected to quality control with the FastQC tool (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The low-quality bases at the ends of the reads were removed. The exome sequencing reads were then aligned to the C3H/HeN genome using the Burrows–Wheeler aligner (BWA) MEM algorithm (version 0.7.12). Aligned reads were preprocessed using a Genome Analysis Toolkit (GATK) best practices workflow consisting of marking duplicates, realigning around insertions and deletions of nucleotides (indels), and recalibrating the base quality scores (Van der Auwera et al. 2013; McKenna et al. 2010). The preprocessed exome sequencing reads from each individual tumor sample were used to make tumor versus normal variant calls using the MuTect tool (version 1.1.4; Cibulskis et al. 2013). In the initial preprocessing step, the MuTect tool ignores reads with too many mismatches or very low-quality scores since these represent noisy reads. The MuTect tool used default cutoffs of at least 14 reads in the tumor and at least 8 in the normal. For the normal variant calls, pools of either the three FF or the three FFPE samples were used. The “truth set” of known variants between the C3H/HeN and C57BL6/N genomes were used to classify the variants as somatic or germ line. MuTect was used for variant calling in the C3H exonic regions.
Classification of Variants with Predicted Oncogenic Function
The variants were functionally annotated using the SnpEff tool (Cingolani et al. 2012) to predict their effects. The following functional information on each variant was obtained: type of the variant (synonymous/nonsynonymous), functional class (silent, missense, and nonsense), and amino acid and codon change. Subsequently, the nonsynonymous mutations were subjected to Sorting Intolerant from Tolerant (SIFT) analysis (http://sift.bii.a-star.edu.sg/downloads/SIFTBLink.tar.gz), which considers the degree of conservation of amino acid residues of closely related sequences identified through PSI-BLAST, to predict whether the amino acid change would be tolerated (score ≥ 0.05) or would affect protein function (score < 0.05). The variants were then compared with the known cancer driver mutations from NIEHS Gene alteration in cancer database (GACDB; Jackson et al. 2006; Lea et al. 2007), Integrative Oncogenomics database (IntOGen; Gonzalez-Perez et al. 2013), and Catalogue of Somatic Mutations in Cancer (COSMIC database; Forbes et al. 2017). Sanger sequencing was used to confirm some of these variants.
Analysis of Mutational Spectra
Mutational spectra were obtained by examining the trinucleotide context (one nucleotide before and one nucleotide after the mutation). The frequency of occurrence of each such “motif” was computed using the R package “Somatic Signatures.” Six substitution subtypes were included based on the pyrimidine of the mutated Watson–Crick base pair: C > A, C > G, C > T, T > A, T > C, and T > G. In addition, the substitutions were examined by incorporating information on the nucleotides immediately 5′ and 3′ to each mutated nucleotide generating 96 possible mutation types (6 types of substitutions × 4 types of nucleotides at the 5′ position × 4 types of nucleotides at the 3′ position).
To classify the mutational signatures obtained in our study, the known signatures for mutational processes in human cancers were downloaded from COSMIC (http://cancer.sanger.ac.uk/cosmic/signatures.txt). Hierarchical clustering based on the Kullback–Leibler distance method was used to compare the mutational signatures in human cancers to the mutational spectra from the mouse HCC FF/FFPE exome sequencing data after accounting for the differences in the trinucleotide composition of the human and mouse exomes.
Results
Identification of Germ-line Variants in the B6C3F1/N Mouse
The genome of the standard NTP mouse model B6C3F1/N is not assembled. Therefore, we performed a series of initial analyses to identify differences between the C3H/HeN and C57BL6/N genomes using MUMmer (version 3.23) and the genome assemblies from the NIH/NIEHS B6C3F1/N Mouse Methylome Project (grant #ZI-ES103146-01). Differences (single-nucleotide variation, small insertions/deletions events, and multinucleotide variations) between the C3H/HeN and C57BL6/N genomes were identified at 7,087,821 genomic locations. Among these, 97,962 locations were on exons (0.15% of the exonic bases; data not shown). This set of differences between the parental genomes was used as a “truth set” of known germ-line variants in the B6C3F1/N mice for comparison to the tumors.
To obtain a reliable set of tumor-specific variants from the exome sequencing data, we first filtered out the low-quality reads and aligned them to the C3H/HeN genome using the BWA mem algorithm (version 0.7.12). Analysis of the number of total and aligned reads from each of the tumor and normal tissue samples verified that a high percentage (> 90%) of both the FF and FFPE samples aligned to the C3H/HeN genome (Online Supplementary Table S1). Furthermore, 60% to 90% of the bases in coding exons had a depth coverage of 50 reads or more for the majority of samples (Online Supplementary Figure S1). The exception was for the normal samples 2 and 3 that were processed by the FFPE method and had a much lower number of target bases (27% to 45% vs. 90% to 95% for the other samples). These findings verify the uniformity and high quality of the FF sample set and suggest that two of the FFPE reference samples may have been compromised in their quality.
Next, we preprocessed the aligned reads using a GATK best practices workflow consisting of marking duplicates, realigning the sequences around indels and recalibrating the base quality scores. These preprocessed exome sequencing reads were used to make tumor versus normal variant calls using the MuTect tool (version 1.1.4). The “truth set” of known variants between the C3H/HeN and C57BL6/N genomes were used to classify the tumor variants as somatic or germ line, and the final sets of tumor-specific variants for each sample were compiled for further functional analyses (Figure 1).

Steps used to obtain and analyze sets of tumor-specific variants. Raw reads from fresh-frozen (FF) and formalin-fixed paraffin-embedded (FFPE) normal tissues and individual tumor samples were filtered to remove low-quality reads, mapped to the C3H genome, and subjected to a GATK best practices workflow protocol. Then, the resultant data sets for each tumor sample were compared to the pooled normal variant data set for the corresponding FF or FFPE samples. Germ-line variants were eliminated by removing the C3H versus BL6 variants in the “truth set.” The tumor-specific variants were then subject to mutational spectra analysis or filtered for nonsynonymous variants and low SIFT scores prior to comparison to known mutations.
Identification of Known and Novel Mutations in HCCs from B6C3F1/N Mice
To assess the accuracy and sensitivity of exome sequencing, we first evaluated the oncogenic mutations in Ctnnb1 and Hras that were identified by exome sequencing of the FF and FFPE samples and compared them to data obtained by Sanger sequencing of the FF samples from each tumor (Table 1). As expected (Devereux et al. 1999; Hoenerhoff et al. 2013), mutations in Ctnnb1 were identified only in chemically induced HCCs, whereas mutations in Hras were identified in both spontaneous and chemically induced HCCs. A variety of different Ctnnb1 variants was identified by the Sanger method in GBE- and MEG-induced HCCs, whereas only a subset of these mutations (T41A in a single GBE tumor and D32N and D32Y in two different MEG tumors) were identified by exome sequencing. Furthermore, these three Ctnnb1 mutations (T41A, D32N, and D32Y) were detected only in the FF samples, whereas most of the variants of the Hras gene that were identified by Sanger sequencing were also detected by exome sequencing of both the FF and FFPE samples. Detailed analysis of the individual HCCs demonstrated that each had a unique pattern of Ctnnb1 and Hras variants (Online Supplementary Table S2), suggesting that heterogeneity exists among the variant-positive HCCs. This result was verified by Sanger sequencing of an independent set of spontaneous, GBE- and MEG-induced HCCs (n = 10 each), which also showed a nonuniform distribution of the variants and a propensity for Ctnnb1 variants in chemically induced HCCs and Hras in spontaneous HCCs (Table 1, last three columns).
Variants of Ctnnb1 and Hras Identified by Sanger Sequencing and Exome Sequencing.
Note: GBE = ginkgo biloba extract; MEG = methyleugenol; FF = exome sequencing for fresh-frozen tissue; PE = exome sequencing for formalin-fixed paraffin-embedded tissue.
To assess the composition of additional variant genes in each tumor tissue, we evaluated the matches to known variants in cancer-related genes from COSMIC (Forbes et al. 2017), IntOGen (Gonzalez-Perez et al. 2013), and NIEHS GACDB (Jackson et al. 2006; Lea et al. 2007). Several mutations that align with known human mutations in cancer-related genes were identified in the MEG tumors, many of which were calculated to have SIFT scores of < 0.05, indicating a greater potential for altered biological effects (Table 2). Among these known mutations, variants were identified in Braf (Shiraha, Yamamoto, and Namba 2013), Elmo1 (Jiang et al. 2011), and Gnas (Nault et al. 2012), each of which has been established to play a role in the pathogenesis of HCC.
Variants from Exome Sequencing of B6C3F1/N Mouse Hepatocellular Carcinomas Which Correspond to Known Mutations in Human Cancer-related Genes from Publicly Available Databases.
Note: GBE = ginkgo biloba extract; MEG = methyleugenol; FF = fresh-frozen tissue; PE = formalin-fixed paraffin-embedded tissue; COSMI = Catalogue of Somatic Mutations in Cancer; IntOGen = Integrative Oncogenomics database.
To further extend the array of cancer-related gene variants identified in HCCs from spontaneous, GBE, and MEG-treated mice, we performed a global search of uncharacterized mutations within known cancer-related genes. Several previously uncharacterized mutations had SIFT scores < 0.05, the majority of which were in tumors from mice treated with the higher MEG doses (Online Supplementary Table S3). Some notable examples include a C559S mutation in Erbb4 (Soung et al. 2006), an A700V mutation in Card10 (Wang et al. 2001), and a P358L mutation in Klf11 (Fluhr et al. 2017), all of which have not been previously characterized, although the genes have established roles in HCC or other cancers. Several variants were also identified exclusively in spontaneous or GBE-induced HCCs, but not in MEG-induced HCCs, such as a G97C mutation in Ulk1 (Lee and Jang 2015; Xu et al. 2013) in a spontaneous HCC sample and a D226E mutation in Actn1 (Feng et al. 2013) in a GBE-induced HCC sample. These results further support the validity of our model and suggest the potential utility of the exome sequencing approach for new gene mutation discovery.
MEG HCCs Display a Dose-dependent and Higher Mutational Burden Than GBE or Spontaneous HCCs
To achieve a global perspective on the effects of the two chemicals on mutagenicity, we evaluated the number of nonsynonymous variant calls for each treatment group. The mutation spectra and total mutation counts from FF and FFPE tissues for each of the exposures are presented in Online Supplementary Figures S2 and S6, respectively. The overall mutation burden was higher in HCCs from MEG-treated mice than from GBE-treated mice or HCCs arising spontaneously. Furthermore, a dose-related increase in mutation burden was observed for the MEG-treated mice (Figure 2A). This pattern was replicated for both FF and FFPE samples, though the overall number of variant calls was uniformly greater for the FFPE samples. The dose-dependent increase in mutation burden was also observed for combined nonsynonymous and synonymous variants (Online Supplementary Table S4).

Nonsynonymous mutational burdens in fresh-frozen (FF) and formalin-fixed paraffin-embedded (FFPE) samples. (A) The total numbers of nonsynonymous mutations are shown for sets of spontaneous (n = 3), gingko biloba extract (GBE; n = 4), and methyleugenol (MEG; n = 3) B6C3F1/N mouse Hepatocellular Carcinomas (HCC) that were processed as FF and FFPE samples. The inset shows the numbers of nonsynonymous mutations identified in individual MEG samples from mice treated at the specified dosages. (B) Venn diagram of the total numbers of nonsynonymous mutations identified uniquely in the FF or FFPE samples and in both FF and FFPE samples for spontaneous, GBE, and MEG sets of HCCs at higher (≥0.1) or lower (<0.1) alternate allele frequency. The concordance was calculated as a percentage of FF mutations that were also identified in the FFPE samples.
MEG HCCs Display Distinct Mutational Spectra
To determine whether MEG and GBE might preferentially induce mutations in specific nucleotides, we generated mutational spectra of the FF and FFPE exome sequencing samples based on the trinucleotide context as described in the Method section (Online Supplementary Table S7). The spectrum plots (Figure 3A) show some unique differences in the spectral patterns for the MEG samples compared to the GBE and spontaneous HCCs, which were similar to each other.

Analysis of mutational spectra of B6C3F1/N mouse Hepatocellular Carcinomas that arise either spontaneously or due to chronic exposure to gingko biloba extract or methyleugenol. (A) The spectra of mutations based on the trinucleotide content (one nucleotide before to one nucleotide after the mutation). (B) Hierarchical clustering based on the Kullback–Leibler distance method after accounting for the differences in the trinucleotide composition of the human and mouse genomes. The six sets of samples were classified together with 30 “signatures” from Catalogue of Somatic Mutations in Cancer database.
To further examine these differences, we performed clustering using the Kullback–Leiber distance measure for the mutational spectra of FF and FFPE samples and 30 publicly available COSMIC signatures that have been previously characterized (Forbes et al. 2017). The FF and FFPE MEG tumors clustered closest to a cluster containing signatures 4 and 24, which has been observed in a subset of human liver cancers, including cancers with known exposure to benzo(a)pyrene and aflatoxin B1 (Figure 3B; Chawanthayatham et al. 2017). Furthermore, the mutational spectra for both the FF and FFPE GBE samples clustered with those of the spontaneous HCCs near signatures 2, 7, 11, 19, and 23.
Single-nucleotide Variants (SNV) Concordance between FF and FFPE Samples
To assess the concordance between the FF and FFPE samples, we examined the overlap in the exonic nonsynonymous variants identified in paired samples. Approximately 63% to 70% of the variants in the FF exome sequencing data set were also represented in the FFPE exome sequencing data set (Figure 2A). The primary reason for the discordance between the FF and the FFPE mutations is the poor signal in 2 of the 3 FFPE normal samples. This could be due to the degradation, library prep, sample handling, and so on, of the FFPE normal samples. Also, we noted that at a very low alternate allele frequency (< 0.1), the concordance between FF and FFPE was reduced (8% to 40%; Figure 2B). These results suggest that variant calls in FFPE samples preserve the majority of the mutations from FF samples, but that FFPE sample variants that are of higher alternate allele frequency are likely to be most reliable. However, when we used a more stringent cutoff of alternate allele frequency ≥ 0.2 and it produced a slightly higher rate of concordance (∼73% to 77%) between FF and FFPE (data not presented).
To gain further insight into potential sources of discordance in the exome sequencing data, we performed Sanger sequencing of a concordant gene (Braf) and a discordant gene (Bves). The BrafV637E mutation, which was identified by exome sequencing in both FF and FFPE samples, could be verified by Sanger sequencing. However, the three variants of Bves (M166K, R180S, and P195L), which showed discordance in the FF and FFPE samples for mice from different groups, were not verifiable by Sanger sequencing (Online Supplementary Table S5). We also noted that the alternate allele frequency for Bves (average 0.12; range 0.059 to 0.32) was lower than the alternate allele frequency for Braf (0.368).These results suggest that the discordance in our exome sequencing data could potentially be explained, in part, by a level of false positivity, which has been shown to be inherent to the exome sequencing method (Mu et al. 2016) and which may be more prominent for mutations that have low alternate allele frequency.
We also reexamined the discordant Ctnnb1 variants from Table 1 to determine whether low alternate allele frequency may explain their observed discordance in the FF and FFPE samples. However, the alternate allele frequencies of the discordant Ctnnb1 variants (D32N: 0.25; D32Y: 0.375; and T41A: 0.375) were relatively high, suggesting that other factors may contribute to discordance. Further investigation demonstrated that Ctnnb1 had an unusually low number of supporting reads (8–10 reads), which could provide an alternate potential explanation for discordance between FF and FFPE samples. We took a very careful look at all the reads in the FF and FFPE samples for the Ctnnb1 variants. From this investigation, we found that the low number of supporting reads and noisy signal in the FFPE normal samples may be the cause of the discordance between the FF and FFPE samples for the Ctnnb1 gene.
To determine whether the inflated number of variant calls in the FFPE samples might be related to an increase in the incidence to C to T mutations, which is a common artifact of FFPE processing (Do and Dobrovic 2015), we examined the percentage of C:T transitions among the variants. However, the number of C:T variants in FFPE samples (spontaneous, 15%; GBE, 21%; and MEG, 9.4%) did not follow a consistent trend relative to the number of C:T variants in FF samples (spontaneous, 21.4%; GBE, 12.8%; and MEG, 11.0%; data not shown).
Discussion
Similar to other cancers, HCCs are characterized by diverse networks of molecular events, which include the introduction of genomic mutations, the modulation of epigenetic regulation (histone modification and miRNAs/lncRNAs), the dysregulation of gene expression profiles, and the modulation of posttranslational modification. Emerging research studies have demonstrated that external factors, such as chemical carcinogen exposure, also leave a “signature” on the HCCs at the genomic, epigenetic, and transcriptional levels (Westcott et al. 2015; Lin et al. 2017; Shi et al. 2016). In this study, we sought to characterize the genomic alterations caused by exposure of B6C3F1/N mice to GBE and MEG for two years. We also sought to determine whether a mutational signature for these carcinogens could be identified for both FF and FFPE tumor sample preparation methods. Our results defined a mutational signature for MEG that was similar in both FF and FFPE samples. Furthermore, we uncovered an array of mutations in cancer-related genes in spontaneous, GBE, and MEG HCCs, including several that have not been previously identified. These findings suggest an approach for mechanistic characterization of suspected carcinogens, as well as a strategy for identifying novel molecular biomarkers of HCC.
Although GBE and MEG have each been shown to contribute to HCC development in NTP carcinogenicity bioassays (NTP 2000, 2013), evidence suggests that they act through disparate mechanisms. GBE has traditionally been categorized to be nongenotoxic due to early evidence that it activates several nuclear receptors and induces xenobiotic metabolizing enzymes (Li et al. 2009; Umegaki et al. 2007). Furthermore, GBE induces hepatocellular hypertrophy in B6C3F1/N mice and modulates the expression of genes that control oxidative stress and proliferation (Hoenerhoff et al. 2013). These findings are supported by mechanistic evaluation of GBE in constitutive androstane receptor knockout mice, which confirmed a predominant nongenotoxic mode of carcinogenicity (Maeda et al. 2014, 2015). However, results from in vitro bacterial gene mutation assays (Ames assays) that were performed using the same lot of GBE suggest that GBE is mutagenic to different bacterial strains (Hoenerhoff et al. 2013). Thus, although the mechanism of GBE is likely to be primarily nongenotoxic in mice, it is conceivable that high doses of GBE in the Ames assay may contribute to some level of genotoxicity as evidenced by bacterial assays.
The mechanism of carcinogenesis induced by MEG, on the other hand, is likely to be genotoxic. Although MEG is not mutagenic in bacterial assays, evidence suggests that it can be metabolically converted in vivo to a DNA reactive electrophile, 1-hydroxymethyleugenol, which induces DNA adduct formation and HCCs in male B6C3F1 mice (Burkey et al. 2000; Miller et al. 1983). MEG and other structurally similar compounds (e.g., safrole and estragole) can produce reactive intermediates that are capable of forming DNA adducts that result in genotoxicity (Zhou et al. 2007).
The B6C3F1/N is the standard mouse model for NTP bioassays and the HCCs examined in this study were derived from bioassays using this mouse model (A. King-Herbert and Thayer 2006; A. P. King-Herbert, Sills, and Bucher 2010). However, because their genome is not assembled, we had to incorporate a series of additional analyses after exome sequencing to rule out single-nucleotide polymorphisms and germ-line mutations and to ensure the reliability of the single-nucleotide variant calls. The resultant data sets displayed >90% alignment with the C3H/HeN genome. To further eliminate nonsomatic mutations, we used nontumor liver tissues from vehicle control age-matched mice as reference samples. Because the mutagenic properties of chemicals typically involve specific structural elements that reside in DNA from any species, these findings are expected to have translational relevance in terms of the mechanistic action, though it is likely that the repertoire of genetic mutations that leads to HCC may have some features of species specificity, especially for chemicals with a nongenotoxic mode of action.
Although FFPE samples are notoriously compromised in their genomic quality (Do and Dobrovic 2015), they constitute a vast resource of patient and experimental animal tumor samples for biomedical research throughout the world. Therefore, an important component of our work was the comparison of data resulting from FF and FFPE samples obtained from the same tumor mass. Although the FFPE variants aligned well to the C3H genome, the variants for two of the FFPE nontumor control samples displayed a reduced depth of coverage, which could be due to reduced DNA quality from FFPE samples. Furthermore, the FF and FFPE samples displayed mutations in sets of genes that overlapped by approximately 60% to 70%. By comparison, approximately 70% to 80% concordance was obtained in a recent study using human tumors that were prepared by FF and FFPE methods (Hedegaard et al. 2014). Given that heterogeneity of HCCs is common, even within adjacent parts of the same tumor mass, incomplete concordance between variant sets might be expected for any study that compares adjacent samples (Lin et al. 2017). In this study, every HCC mass was dissected into 2 halves with each of them either FF or FFPE. DNA was isolated from the entire HCC section from the FFPE blocks. For FF HCCs, a part of the frozen tumor was used for DNA extraction. Hence, there is a chance that sample heterogeneity may have contributed to the discordance to some extent. Notably, the concordance between the FF and FFPE samples in our study was reduced for variants in genes with lower alternate allele frequency, for which several Sanger sequencing–verified mutations were incompletely detected by exome sequencing. These results suggest that there were qualitative differences in the content of the FF and FFPE variant sets that could be explained, in large part, by variable representation of low alternate allele frequency genes. Additional Sanger sequencing of a concordant gene (Braf) and a discordant gene (Bves) demonstrated that the discordance might also be explained, in part, by an inherent level of false positivity in the experimental assay. A low level of false positivity in next-generation sequencing (approximately 1.3%) has been shown to occur primarily in complex genomic regions; however, raising the quality-score threshold to a predicted zero false positive rate has led to a significant (2.2%) loss of detection for Sanger-confirmed variants (Mu et al. 2016). Therefore, though raising the threshold could improve the quality of the data, it could also lead to loss of information.
In addition to the qualitative differences in the FF and FFPE samples indicated by the incomplete concordance, the samples in our study differed quantitatively in that more variant calls were uniformly observed in FFPE samples as compared to FF samples. This finding could be contributed in part by a slightly greater number of sequence reads that were performed for the FFPE samples than for the FF samples (Online Supplementary Table S1) or by the reduced quality of two of the FFPE normal reference tissue samples (Online Supplementary Figure S1), which may have resulted in inflation of the tumor:normal variant ratio. Alternatively, it is possible that the FFPE preparation method introduced more artifactual mutations. We did not, however, observe a consistent pattern of increase in C:G > T:A mutations that are a common artifact due to spontaneous deamination reactions secondary to prolonged formalin fixation (Do and Dobrovic 2015). The absence of increased abundance of C:G > T:A mutations in the FFPE samples compared to the FF samples may be related to the relatively short time (24 to 48 hr) of formalin fixation in these samples that have a frozen tissue counterpart (Prentice et al. 2018). The artifactual increase in C:G > T:A mutations may be remedied by pretreating with uracil N-glycosylase (UNG). In contrast to the FFPE samples in this study, the majority of the FFPE archival samples in NTP rodent carcinogenicity bioassays are fixed in formalin for weeks due to the large number of animals (∼800) at study termination and NGS studies on these samples may benefit UNG pretreatment. We are currently conducting a prospective study in order to identify the time that leads to increased C:G > T:A mutations during formalin fixation.
Despite the differences in the FF and FFPE data sets, the most obvious trend in the overall mutational burden was that it was much higher for MEG HCCs than for GBE or spontaneous HCCs and displayed a dose-dependent pattern of increase for MEG HCCs regardless of the sample preparation method. The latter findings suggest that the potent genotoxic effects of MEG treatment superseded the subtler effects of the FFPE versus FF preparation method on the data sets. The most important consideration in favor of this possibility is that the hierarchical clustering revealed a similar mutational spectra signature for the FF and FFPE samples for each of the paired samples. Based on these findings, the approach used in this study should be applicable to both FF and FFPE samples, given that the effects of the chemical carcinogen are sufficiently potent, though additional filters, such as the application of alternate allele number restriction and the use of additional replicates, may help to ensure the quality of the data for FFPE samples.
Our results also suggest the possibility of using this approach to verify the roles of mutations that have been described in the literature and to identify new roles for previously uncharacterized mutations, genes, and pathways, which could help to establish potential novel biomarkers and therapeutic targets. By comparison with databases that catalogue known mutations in cancer-related genes, we revealed variants that align with mutations in the Wnt-signaling activator Ctnnb1 for chemically induced tumors and in the mitogen-activated protein kinase/extracellular-signal regulated kinase pathway activator Hras for spontaneous tumors, which is consistent with the induction of HCC by distinct carcinogenic processes (Hoenerhoff et al. 2013). Other mutations that are predicted to affect protein function were identified in several genes that are known to participate in the development of HCC, including Braf (Shiraha, Yamamoto, and Namba 2013), Elmo1 (Jiang et al. 2011), and Gnas (Nault et al. 2012). We also identified numerous variants in genes that are known to be associated with HCC and other cancers. These include a C559S mutation in Erbb4, a receptor tyrosine kinase that harbors at least 8 other exonic mutations in HCC and various other types of cancers (Soung et al. 2006); an A700V mutation in Card10, a member of the caspase recruitment domain/guanylate kinase class of signaling proteins that interacts with BCL10 and NF-κ B (Wang et al. 2001); and a P358L mutation in Klf11, a Krüppel-like transcription factor that is dysregulated at the epigenetic level in cancers of the blood. A G97C mutation in the autophagy regulator Ulk1, which has an established role in HCC and in a wide variety of other cancers (Lee and Jang 2015; Xu et al. 2013), was identified in a spontaneous tumor sample in our study, while a D226E mutation in the focal adhesion gene Actn1 (Feng et al. 2013) was identified in a GBE sample. Thus, our analysis may also contribute to the understanding of functional roles for previously uncharacterized mutations.
The clustering of MEG induced HCCs very close to other liver cancer signatures such as signatures 4 and 24 based on its mutational spectra is a particularly compelling finding of this study, given the extreme potency of the prototypical signature 4 carcinogen benzo(a)pyrene and tobacco carcinogens, and 24 carcinogen aflatoxin B1, which is a common contaminant of a variety of food products (Gross-Steinmeyer and Eaton 2012), suggesting these chemicals may have potentially similar mutagenic mechanisms. As a genotoxin, aflatoxin has been proposed to induce lipid peroxidation and generate aldehydes that inhibit DNA repair, leading to p53 codon 249 mutations in more than 60% of aflatoxin B1-associated HCCs (Weng et al. 2017). However, we did not identify p53 mutations in our analysis of MEG. The extent to which MEG overlaps in its mechanism of toxicity and carcinogenicity with benzo(a)pyrene and aflatoxin B1 will be important to assess in future studies. Although the mutational spectra for the MEG-induced HCCs had distinctly unique clustering patterns, the mutational spectra for the GBE-induced and spontaneous HCCs clustered together near signatures 2, 7, 11, 19, and 23. The underlying etiologies for signature 2 resembles cytosine mutations caused by apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC) enzymes, signature 7 resembles mutations in pyrimidine dimers due to ultraviolet light, signature 11 resembles that of alkylating agents, and that signatures 19 and 23 have not been reported; however, further studies may help determine its significance to the establishment of spontaneous and GBE-induced HCCs. Based on the genes that are mutated, it appears that GBE may accelerate intrinsic biochemical mutagenic processes leading to dose-related increases in HCC with similar mutation spectra to the spontaneous tumors. In addition, accumulation of mutations in additional genes such as Ctnnb1 might confer greater proliferative advantage to the GBE-exposed HCCs. Thus, these observations highlight the need for both mutational spectra and analysis of specific genes and pathways as complementary approaches to a more complete understanding of the underlying mechanisms of carcinogenesis.
In summary, the findings of this study are significant in terms of providing mechanistic insight into the mutagenic outcome of MEG exposure in mice, which is likely to be applicable to its outcome in humans. These results suggest that our approach for characterizing suspected carcinogens could be applied to tumor samples that are prepared by the FFPE method, the primary method of tissue fixation at the NTP archives, and those of other worldwide research organizations. This study would be benefited by expansion to a larger investigation with more samples and a greater variety of chemical compounds. Nevertheless, we have verified the utility of this approach for classifying chemical carcinogens and identifying molecular biomarkers of HCC, which could help in understanding the mechanisms of carcinogenesis and eventually prove instructive in the development of novel biomarkers of exposures and disease as well as personalized therapy.
Supplemental Material
Supplemental Material, DS1_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, DS1_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Supplemental Material
Supplemental Material, DS2a_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, DS2a_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Supplemental Material
Supplemental Material, DS2b_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, DS2b_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Supplemental Material
Supplemental Material, TS1_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, TS1_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Supplemental Material
Supplemental Material, TS2_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, TS2_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Supplemental Material
Supplemental Material, TS3_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, TS3_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Supplemental Material
Supplemental Material, TS4_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, TS4_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Supplemental Material
Supplemental Material, TS5_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, TS5_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Supplemental Material
Supplemental Material, TS6_TPX_10.1177_0192623318789398 - Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure
Supplemental Material, TS6_TPX_10.1177_0192623318789398 for Exome Sequencing of Fresh-frozen or Formalin-fixed Paraffin-embedded B6C3F1/N Mouse Hepatocellular Carcinomas Arising Either Spontaneously or due to Chronic Chemical Exposure by Scott S. Auerbach, Miaofei Xu, B. Alex Merrick, Mark J. Hoenerhoff, Dhiral Phadke, Debra J. Taxman, Ruchir Shah, Hue-Hua L. Hong, Thai-Vu Ton, Ramesh C. Kovi, Robert C. Sills, and Arun R. Pandiri in Toxicologic Pathology
Footnotes
Acknowledgments
We would like to acknowledge the assistance provided by Ms. Leslie Couch at the NTP archives in retrieving the FF tissues and the FFPE tissues for this project as well as the NTP Pathology support core laboratory for providing tissue sections for DNA isolation. The efforts by the NTP staff that generated the cancer data for the GBE and MEG chronic bioassays are also acknowledged. We appreciate the help provided by Ms. Beth Mahler with the images. We would like to thank the NIEHS Mouse Methylome Project team, in particular Sara Grimm, Paul Wade, Jim Thomas and the NISC Comparative Sequencing Program for the generation and pre-publication access to genome sequences of C57BL6/N mice and C3H/HeN mice. We would like to thank the reviewers for their constructive criticism of this article.
Author Contributions
Authors contributed to conception or design (SA, BM, MH, RCS, AP); data acquisition, analysis, or interpretation (SA, MX, BM, MH, DP, DT, RS, HH, TT, RK, RS, AP); drafting the manuscript (SA, MX, DP, DT, AP); and critically revising the manuscript (SA, MX, BM, MH, DP, DT, RS, HH, TT, RK, RS, AP). All authors gave final approval and agreed to be accountable for all aspects of work in ensuring that questions relating to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
