Abstract
To characterize the somatic alterations of papillary thyroid carcinomas (PTC) in Chinese patients, we performed the next-generation-sequencing (NGS) study of the tumor-normal pairs of DNA and RNA samples extracted from 16 Chinese PTC patients. The whole genome sequencing (WGS) and transcriptome sequencing (RNA-seq) were conducted for 6 patients who were either current or former smokers and the whole exome sequencing (WES) and RNA-seq were conducted for another 10 patients who were never smokers. The NGS data were analyzed to identify somatic alteration events that may underlie PTC in Chinese patients. We identified a number of PTC driver genes harboring somatic driver mutations with significant functional impact such as COL11A1, TP53, PLXNA4, UBA1, AHNAK, CSMD2 and TTLL5 etc. Significant driver pathways underlying PTC were found, namely, the metabolic pathway, the pathway in cancer, the olfactory transduction pathway and the calcium signaling pathway. In addition, this study revealed genes with significant somatic copy number aberrations and corresponding somatic gene expression changes in PTC tumors, the most promising ones being BRD9, TRIP13, FZD3, and TFDP1 etc. We also identified several structural variants of PTCs, especially the novel in-frame fusion proteins such as TRNAU1AP-RCC1, RAB3GAP1-R3HDM1, and ENAH-ZSWIM5. Our study provided a list of novel PTC candidate genes with somatic alterations that may function as biomarkers for PTC in Chinese patients. The follow-up mechanism studies may be conducted based on the findings from this study.
Keywords
Introduction
Papillary thyroid carcinomas (PTC) is the major type of thyroid cancer and accounts for about 80% of all thyroid cancers. PTC is named for its papillary histological architecture and has a high 5-year survival rate of nearly 95% [1]. Therefore, PTC is usually considered curable clinically. However, a considerable proportion of PTC cases may still develop into more aggressive and lethal thyroid cancers. Over the past years, the incidence of thyroid cancer has significantly increased [2], pushing for the investment of more research efforts on this type of cancer. Although TCGA (The Cancer Genome Atlas) had identified a large number of somatic genetic alterations underlying PTCs [3], few studies have been performed in Chinese PTC patients and thus result in a knowledge gap in Chinese PTC genetic etiology. To gain new insights into the genetic basis of Chinese PTC, we carried out whole genome sequencing (WGS) or whole exome sequencing (WES), plus transcriptome sequencing (RNA-seq) of the tumor-normal pairs of thyroid tissues from 16 PTC patients. Our study identified novel driver genes underlying the carcinogenesis of PTC in the Chinese population.
Materials and methods
Chinese PTC patient samples
We conducted whole genome sequencing (WGS) and transcriptome sequencing (RNA-seq) of the tumor-normal pairs of thyroid tissues from 6 Chinese PTC patients who are either current or former smokers (Table S1). We also did whole exome sequencing (WES) and RNA-seq of paired tumor-normal thyroid tissues from another set of 10 Chinese PTC patients who are never smokers (Table S2). All of the PTC patients are Han Chinese and they live in the Northern region of China. The age at diagnosis of every patient is same as age at initial sample collection. In other word, there are no any differences in time of collection and diagnosis. In this study, all the Chinese PTC patients were enrolled after they gave signed consent form of agreeing to participate in this study. After enrollment, interviews were conducted, and patients’ medical records were abstracted. Research protocols were approved by the Institutional Review Board of The Affiliated Shengjing Hospital, China Medical University. The frozen PTC tumors and matched normal thyroid tissues were reviewed by at least two pathologists. Tumor samples with at least 80% tumor-cell content were used in the study. DNA and RNA samples were extracted simultaneously using the AllPrep DNA/RNA/miRNA Universal Kit (QIAGEN China, Shanghai) before the next-generation-sequencing (NGS) library preparation.
DNA library preparation for NGS
We prepared DNA sequencing libraries following the manufacturer’s protocol (Illumina and Agilent). Briefly, 3 ug of genomic DNA was fragmented to 150–200 bp using the Covaris E210 sonicator. The ends were repaired, and an “A” base was added to the 3’ ends. Paired end DNA adaptors (Illumina) with a single “T” base overhang at the 3’ end were ligated and the resulting constructs were purified using AMPure SPRI beads from Agencourt. The adapter-modified DNA fragments were enriched by 4 cycles of PCR using PE 1.0 forward and PE 2.0 reverse (Illumina) primers. The concentration and size distribution of the libraries was determined on an Agilent Bioanalyzer DNA 1000 chip.
Next-generation sequencing and bioinformatics analyses
DNA sequencing of the paired tumor/normal samples of 16 Chinese PTC patients was carried out for the captured libraries with Illumina HiSeq 2000 using 100 bp paired-end reads. Libraries were loaded onto paired end flow cells at concentrations of 4–5 pM (HiSeq 2000) to generate cluster densities of 300,000–500,000/mm
Read mapping and alignment and variant analysis
Our mutation analysis pipeline essentially includes 1) Mapping and alignment, 2) SNVs and indels discovery, and 3) SNVs and indels filtering and annotation. Briefly, the raw sequence data of short reads in FASTQ format were aligned to a reference human genome (NCBI human genome assembly build 37) using BWA aligner (0.6.1) [4]. The PCR duplicates were detected and removed by Picard 1.65 (
The lists of SNVs/indels were finally annotated by using ANNOVAR [7, 8]. We first filtered SNVs/indels by various public databases including dbSNP, 1000 genomes, HapMap exome Project, the 200 Danish exome [9], the NHLBI 6500 Exome data sets (
Experimental validation of SNVs/indels
To valid the somatic mutations that were identified by high-through put sequencing technology, we further performed Sanger sequencing for 1152 candidate somatic mutations identified from the 16 PTC tumor-normal pairs. Specific primers around the mutations for PCR and follow-up Sanger sequencing reactions were designed by using Primer 3 [11]. The standard PCR was conducted in a 25
Identification of driver somatic mutations
We applied the computational approach, Oncodrive-fm [12], to detect candidate cancer drivers of PTCs on the basis of all the validated somatic mutations. This method is designed according to the principle that the bias toward the accumulation of somatic variants with high functional impact observed in a gene or group of genes indicates positive selection. Oncodrive-fm can measure such functional impact bias (called FM bias) and thus is able to distinguish driver genes or gene modules that are positively selected during tumor development from passenger mutations [12]. Oncodrive-fm also has the advantage of identifying lowly recurrent cancer drivers that are usually missed by the recurrence-based approaches [12]. The most important output of Oncodrive-fm is the ranking of the FM bias of genes. Top-ranking genes exhibit the largest deviations in their average FI (functional impact) from the background, thus making the best driver candidates.
Detecting somatic structural variants
Structural variations including inter-chromosomal translocations (CTX), intrachromosomal translocations (ITX), inversions (INV), deletions (DEL), and insertions (INS) were analyzed by CREST (Clipping REveals STructure), an algorithm that uses the soft-clipped reads to directly map the breakpoints of structural variations [13]. All samples were analyzed using the paired analysis module, which filters SVs present in the matching normal sample. Two additional methods, BreakDancer [14] and Geometric Analysis of Structural Variants (GSAV) [15], that use discordant paired-end reads to map structural variations were also run for comparison purpose. BreakDancer was run using the default parameters. For each predicted SV, we first checked whether discordant mapping of paired-end reads was caused by repetitive regions in human genome. All supporting reads were extracted in FASTQ format and each read re-mapped to the reference genome using BLAT. If a read-pair was mapped within the library insert range (mean insert size
Experimental validation of structural variations
All structural variations were validated by Sanger sequencing. Oligonucleotide primers for genomic PCR were designed for the 1000 bp flanking sequences of each SV using Primer 3 [11]. In some cases, a second iteration of primer design was carried out because there were multiple SVs detected within 1 kb to account for the presence of a second SV in the flanking region.
Annotation of structural variations
SVs with at least one breakpoint in gene coding region were further analyzed for their validity to encode a fusion protein. Each predicted fusion transcript was defined as a list of exons. There were “normal” exons, which correspond exactly to existing annotated exons, and there were fused exons, which are produced by structural variation events with both breakpoints in exons. The sequence of the fused exons is determined using the assembly of reads that cross the breakpoints and the annotation of the exons. For each exon in the list, we calculated exon length, using the annotation for normal exons and sequence length for fused exons. Furthermore, we calculated the number of bases that each exon contributes to the CDS based on the annotated CDS start and end positions. The number of “CDS bases” was 0 for exons lying outside of the CDS start and stop, the full exon length for exons wholly contained between start and stop, and a portion of the exon length for those containing CDS start or stop. If the sum of the number of CDS bases is a multiple of 3, then the CDS was in-frame. If not, then it was considered out-of-frame.
Identification of copy number variations (CNVs)
For whole genome sequencing data of the 6 PTC pairs, CNVs were identified by evaluating the number of sequence reads aligned at each base using the software CONSERTING [17], which employs a three-step analysis. First, the genome was divided into fixed-base windows and the average coverage depth was calculated for each window. The window size was set to be 100 bp in this study. The relative coverage depth was defined as the ratio between the average window coverage and the median of the average window coverage on a set of reference chromosomes that have no gross CNVs based on chromosome-by-chromosome paired tumor/normal coverage analysis. The difference of the relative coverage depth between the tumor sample and its matching was corrected for the GC content of the window and used as the signal for calling CNVs. Second, each chromosome was segmented using a recursive partitioning method on the difference of the tumor versus normal signal. Third, the segments were merged to ensure a genomewide error rate not greater than 0.05. For the 6 WGS pairs, each tumor and its matched normal DNA were also genotyped using the Illumina HumanOmni1-Quad BeadChips containing 1,140,419 SNPs with a median SNP spacing of 1.2 kb. CNVs were manually reviewed by comparison with Illumina SNP array inferred CNV results and structural variation breakpoints identified by CREST. Missing breakpoints that define the CNV boundaries were manually mapped at base-pair resolution by visual inspection of the soft-clipped reads in the immediate neighborhood of the predicted CNV boundaries. All analyses were performed in R (
A stacked bar graph representing the total number of somatic mutations in the exon regions in each patient, color proportioned by the number of somatic mutations in each category of mutants including splicing, indels, nonsense, synonymous and nonsynonymous mutations.
The quality of the total RNA was evaluated using the Agilent 2100 Bioanalyser RNA 6000 Nano Chip. All samples had an RNA integrity number (RIN) of nine or better. For the RNA-Seq sample preparation, the Illumina TruSeqTM RNA Sample Preparation Kit was used according to the manufacturer’s protocol. RNA-Seq reads were obtained using Bustard version 1.9.0 (Illumina Pipeline version 1.3). Reads were quality-filtered using the standard Illumina process and a 0 (no) or 1 (yes) was used to define whether a read passed filtering or not. Sequence files for the 32 samples of the 16 PTC pairs were generated in FASTQ format (sequence read plus quality information in Phred format). TopHat 2.0.6 was used to process the RNA-seq reads that were aligned to the UCSC human reference genome (build hg19) by bowtie v0.12.8 [20, 21]. After TopHat processing, Cufflinks v2.0.2 (including Cuffdiff) was used to perform transcript assembly, abundance and differential gene expression tests between the matched tumor and normal samples [21].
Results
Somatic mutation landscape of papillary thyroid carcinomas
Illumina conducted whole-genome sequencing (WGS) for our first set of 12 DNA samples extracted from 6 papillary thyroid carcinomas (PTC) and the matched normal thyroid tissues. We obtained approximately 138-fold mean exon sequence coverage across the 6 PTC pairs (Table S3). The output short reads were aligned to a reference genome (NCBI human genome assembly build 37) using the BWA program [22]. For the 10 pairs of PTC tumor and normal DNAs subjected to WES, we generated a mean depth of 105 fold exon coverage (Table S4). More than 94% of the 212,911 exons on targeted regions were covered with more than 10 sequencing reads (Table S4).
By Sanger sequencing or genotyping (Nimblegen platform) the list of putative somatic mutations identified by next-generation sequencing (NGS), we validated a total of 1,072 high-confidence somatic mutations in the exonic regions from these 16 PTCs. These included 766 substitutions caused amino acid changes (missense), 67 nonsense mutations leading to truncated proteins, 19 frameshift insertions and deletions (indels) and 3 non-frameshift indels that ranged from 1 to 8 base pairs in length, 40 mutations occurring at exon splicing sites, and 177 silent (synonymous) mutations in exons (Table S5). The summary of the distribution of these exonic somatic mutations across all the 16 tumors was shown in Table S6. Figure 1 showed the number of nonsynonymous, synonymous, nonsense, indels (combining frameshift/non-frameshift insertions and deletions), and splicing mutations within each of the 16 Chinese PTC tumor samples. The ratio of nonsynonymous/synonymous mutations was greater than 2:1 in every tumor sample. PTC tumors from never-smokers (cPTC7 to cPTC16) have much fewer non-silent somatic mutations (including nonsynonymous, nonsense, indels) (mean 10, range 3–22) than current/former smokers (cPTC1 to cPTC6, mean 126, range 12–282) (Fig. 1, Table S6).
Recurrently mutated genes with non-silent somatic mutations in at least 2 of the 16 Chinese PTC cases
Recurrently mutated genes with non-silent somatic mutations in at least 2 of the 16 Chinese PTC cases
Schematic representation of the driver genes of PTC identified by Oncodrive-fm. Twenty genes reached the statistical significance level of qvalue (Bonferroni-adjusted empirical 
Based on the verified list of somatic mutations in Table S5, we found that 37 genes had non-silent somatic mutations in at least two of the sixteen tumors (Table 1). The most frequently mutated gene is TP53 that had non-silent somatic mutations in 5 PTCs, followed by COL11A1, FIG4, UNC80, MUC4 that each mutated in 3 PTCs, and the remaining genes mutated in 2 PTCs such as DOPEY2, RYR2, PCNXL2, PLXNA4, SLC25A5 etc. (Table 1). However, recurrence alone cannot tell us which genes are driver genes underlying the genetic etiology of PTCs due to the existence of a large number of passenger mutations. Therefore, to assess the statistical significance of each gene’s average FI (functional impact) with respect to the null distribution, we follow the standard Oncodrive-fm procedure [12]. Briefly, 100,000 permutations were conducted to obtain empirical
Circos plots of genetic alterations in six papillary thyroid carcinoma (PTC) cases subjected to WGS. These plots depict i) structural genetic variants, including DNA copy number alterations: intra- (green) and inter-chromosomal (purple) translocations; loss of heterozygosity – brown; amplification – red; deletion – blue; and ii) sequence mutations in RefSeq genes involving missense, insertion, deletion – brown; truncating mutation, nonsense, splice, frameshift – red.
Somatic structural variants identified by WGS that are also present at the transcriptome level as revealed by RNA-seq and validated by RT-PCR
Genes with somatic copy number changes and corresponding significant gene expression changes in at least two PTC tumors
We inferred somatic copy number variants (CNVs) for the six PTC pairs that were sequenced by WGS and the other ten PTC pairs that were sequenced by WES. We used
Chromosomal rearrangements identified in 6 whole genome sequenced PTC tumors
Using CREST [13], we detected a total of 382 candidates of structural variations (SVs) across all of the 6 WGS cases of PTC patients and validated that 219 of them were somatic SVs by PCR and follow-up Sanger sequencing in both the tumors and their matched normal samples. These included 34 CTX (inter-chromosomal translocations, mean 6 per case, range 1–17), 39 ITX (intra-chromosomal translocations, mean 7, range 0–23), 82 DEL (deletions, mean 14, range 2–47) and 64 INS (insertions, mean 11, range 0–53) (Fig. 3 and Table S9).
Remarkably, 46% (100 out of 219) of the validated somatic structural variations had breakpoints in coding genes, including genes with known roles in tumorigenesis such as ARID1A, KDM5A, MKL1, MLLT10, MYCL1, or genes also targeted by somatic mutations (for example, ASXL3, BICD1, LRP1B, SDK1). It was predicted that most of these structural variations (65 out of 100, 65%) can lead to loss-of-function of the involved genes. Twelve validated somatic structural variations present in the genome were also present in the transcriptome as revealed by RNA-seq and validated by RT-PCR (Table 2). A few lead to the formation of chimeric fusion proteins. Specifically, six chimeric genes encoding three fusion proteins were detected in three PTC cases which resulted in the expression of chimeric in-frame novel fusion genes, including TRNAU1AP-RCC1 (case cPTC2), RAB3GAP1-R3HDM1 (case cPTC4), and ENAH_ZSWIM5 (case cPTC5) (Table 2, Figs S2–S4).
In addition to the above in-frame gene fusion events, WGS and RNA-seq identified seven truncating events, one aberrant splicing, and one complex genomic rearrangement involving multiple SV events (Table 2, Figs S5–S13). Some of the corresponding proteins were known to be related to cancer biology. For example, ITGB3BP gene encodes a transcriptional regulator that binds to and enhances the activity of members of the nuclear receptor families, thyroid hormone receptors and retinoid X receptors. ITGB3BP induces apoptosis in breast cancer cells through a caspase 2-mediated signaling pathway. ARID1A is a chromatin remodeling gene, and the encoded protein is part of the large ATP-dependent chromatin remodeling complex SNF/SWI, which is required for transcriptional activation of cancer-related genes normally repressed by chromatin. The complex genomic rearrangement (Fig. S13) also created a novel fusion gene involving UNC50 in chromosome 2 and FAM63B in chromosome 15, though the exact role of these two genes in cancer is not known yet.
Using RNA-seq, we also identified 11 SV events that were validated by RT-PCR (Table S10). These are the structural changes occurring only at the transcriptome levels such as read-throughs. The biological significance of these events to tumorigenesis remains elusive at the current stage.
Discussion
In this project, we systematically studied the genetic alterations of Chinese PTC. We used Oncodrive-fm [12] to find cancer drivers of PTC. The data showed that COL11A1 and TP53 were the two most significant PTC driver genes each possessing multiple somatic mutations of very high functional impact. The status of TP53 as the most important PTC driver gene had been well established [24]. We queried the cBioPortal database for the TP53 mutation status (
As for other driver genes, PLXNA4 belongs to the family of plexins that are transmembrane high-affinity receptors for semaphorins, regulating cell guidance, motility, invasion and thus are involved in cancer progression and metastasis [32]. Furthermore, PLXNA4 formed stable complexes with the FGFR1 and VEGFR-2 tyrosine-kinase receptors and enhanced VEGF- induced VEGFR-2 phosphorylation in endothelial cells as well as bFGF-induced cell proliferation, therefore promoting tumor progression and tumor angiogenesis [33]. PLXNA4 may represent a target for the development of novel anti-angiogenic and anti-tumorigenic drugs [33]. Molecular profiling of the “plexinome” in melanoma and pancreatic cancer has detected amplification of and somatic mutations in PLXNA4 and other plexins, indicating the involvement of mutated plexins
Oncoprint plotting showed that fourteen of the 20 PTC driver genes identified in our Chinese cohort also had non-silent somatic alterations in the non-Chinese thyroid cancer patients based on the results of two large scale genome sequencing studies of thyroid cancer (TCGA and MKSCC studies) archived in the cBioPortal database.
especially PLXNA4 in cancer progression [32]. All of these supported our finding of PLXNA4 as a PTC driver gene. Another driver gene – UBA1 encodes ubiquitin-activating enzyme that is required for cellular response to DNA damage [34]. Functional evidences established UBA1 as the enzyme critical for ubiquitylation-dependent signaling of both DSBs (DNA double-strand breaks) and replication stress in human cells, with implications for maintenance of genomic integrity, disease pathogenesis and cancer treatment [34]. Mutation of UBA1 causes tissue overgrowth in Drosophila [35]. Novel drugs and therapies targeting UBA1 as anti-cancer strategies in other cancers are being developed [36, 37, 38], which may be effective in treating PTC patients harboring UBA1 somatic mutations as well.
Previous functional studies also lent support for AHNAK as a PTC driver gene. AHNAK is a protein which has been recently linked to reorganization of the actin cytoskeleton, cellular migration and invasion [39]. AHNAK had been shown to be essential for pseudopod protrusion and tumor cell migration and invasion [40]. Knockdown of AHNAK in metastatic cells resulted in reduced actin cytoskeleton dynamics and induction of mesenchymal-epithelial transition (MET) that is a key cellular process associated with the invasive or metastatic program in many cancers [40]. Tumoral AHNAK overexpression significantly associated with poor survival of larynx carcinoma patients [39] and was involved in colon carcinogenesis [41]. AHNAK is also the biomarker that may have clinical utility for assessing response to cancer treatment [42]. Other PTC driver genes supported by previous studies are CSMD2 and TTLL5. CSMD2 maps to a chromosomal region that may contain a suppressor of oligodendrogliomas [43]. CSMD2 was found to be frequently hypermethylated in pancreatic cancer cell lines and primary pancreatic cancers in a tumor-specific manner [44]. Hypermethylation epigenetically silenced the expression of CSMD2 in tumors, supporting its role as tumor suppressor and our observation of CSMD2 as one driver of PTC. TTLL5 was also known as STAMP, it was shown that STAMP alters the growth of transformed and ovarian cancer cells [45]. In addition, a recent whole-exome sequencing combined with functional genomics reveals TTLL5 as a driver tumor-suppressor in endometrial cancer [46], consistent with our observation of the nonsense and deleterious somatic mutations in PTC patients.
We compared our somatic mutation results of the PTC driver genes identified by Oncodrive-fm (Fig. 2) to the results of the 1629 thyroid tumor samples from two large scale genome sequencing studies of thyroid cancer conducted in non-Chinese populations, i.e. TCGA and MKSCC studies [24, 25] using the cBioPortal database. Fourteen of the 20 PTC driver genes identified in our Chinese cohort also had non-silent somatic mutations in the non-Chinese thyroid cancer patients (Fig. 4), suggesting the functional importance of these driver genes to PTC etiology across different ethnic groups. These 14 genes are COL11A1, TP53, RYR2, PLXNA4, SLC25A5, SDK1, UBA1, GATAD2A, AHNAK, SCN1A, ANKRD44, CSMD2, TTLL5, FAM135B. The six genes having somatic mutations specifically in the Chinese PTC cohort are DOPEY2, FIG4, PCNXL2, OR4C16, OR56A3, and DCAF4L2, indicating that ethnicity related genetic background and living habits of Chinese may result in the specific mutated driver genes contributing to PTC that are hardly seen in the non-Chinese thyroid cancer patients.
Altered genes of certain molecular pathways were significantly enriched in PTC tumors. These include metabolic pathway, pathway in cancer, olfactory transduction pathway and calcium signaling pathway (Table S7 and Fig. S1). Four genes in metabolic pathways were previously implicated in carcinogenesis, namely NOS1, TKTL1, LDHC, UGT8. NOS1 encodes the enzyme known as the nitric oxide synthase that converts arginine and oxygen into citrulline and NO (nitric oxide) [47]. High expression of NOS1 is a favorable prognostic sign in non-small cell lung carcinoma [48]. TKTL1 (transketolase-like protein 1) encodes a transketolase-like enzyme whose expression has been shown to contribute to carcinogenesis through increased aerobic glycolysis and hypoxia-inducible factor alpha stabilization [49, 50]. LDHC encodes the enzyme – lactate dehydrogenase C that has been shown to be overexpressed in cancer compared to the normal tissue samples [51]. The elevated expression of the UGT8 gene coding UDP-galactose:ceramide galactosyltransferase correlated with a significantly increased the risk of lung metastases [52].
Most of the significant genes involved in cancer pathways (Table S7 and Fig. S1) were also indicated in the etiology of PTC and other cancers before, especially TP53, KRAS, CASP8, EP300, PIK3CG, EGLN3, PDGFRA, and ERBB2 etc. TP53 and KRAS were the well-established PTC cancer genes [24]. EP300 encodes a histone modifier and was recently identified as the most significantly mutated gene in small cell lung cancer [53]. CASP8 encodes a member of the cysteine-aspartic acid protease (caspase) family. It is involved in the programmed cell death induced by Fas and various apoptotic stimuli. PIK3CG encodes a protein that belongs to the pi3/pi4-kinase family of proteins and is an important modulator of extracellular signals, including those elicited by E-cadherin-mediated cell-cell adhesion. Recently, PIK3CG emerged as being a potential oncogene because overexpression of its subunits leads to oncogenic cellular transformation and malignancy [54]. EGLN3 was found as overexpressed in lung tumors and is one of the unfavorable prognosticators for overall survival [55]. PDGFRA encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. Studies suggest that PDGFRA plays a role in tumor progression. Anti-PDGFRA therapy results in markedly decreased tumor growth in vivo [56]. ERBB2 is interesting because it is a member of the EGFR family of receptor tyrosine kinases. It sits in the surface of cells and is activated by external signals that tell the cell to grow. Mutations to ERBB2 may contribute to the accelerated tumor growth [57].
We also identified a list of genes with significant somatic CNV and corresponding gene expression changes (Table 3). The copy number gains at chromosomal region 5p15.33 involving BRD9 and TRIP13 genes in two PTC patients – cPTC2 and cPTC3 were in line with a previous study showing that the copy number gains of these two genes may play a key role in lung cancer development [58]. FZD3 was a critical gene in WNT pathway important to carcinogenesis [59]. TFDP1, which encodes the E2F-associated transcription factor DP1 is a candidate oncogene at 13q34. TFDP1 showed very strong gene amplification in lung tumors with a frequency of about 3% [60]. Depletion of TFDP1 expression by small interference RNA in a lung cancer cell line (HCC33) with TFDP1 amplification and protein over-expression reduced cell viability by 50% [60]. These lent supports to our finding of significant copy number gain and oever-expression of TFDP1 in Chinese PTC patients.
Analysis of structural variants revealed the presence of three in-frame novel fusion genes in Chinese PTC tumors (Table 2, Figs S2–S4). The involved members of these fusion events played significant roles in molecular and cellular processes. For example, the binding of RCC1 (regulator of chromosome condensation 1) to chromatin is critical for cellular processes such as mitosis, nucleocytoplasmic transport, and nuclear envelope formation; RAB3GAP1 gene encodes the catalytic subunit of a Rab GTPase activating protein. Mutations in this gene are associated with Warburg micro syndrome. ENAH is a motility-promoting gene involved in the control of cell-cell adhesion and cell motility. ENAH was associated with human breast carcinogenesis and its overexpression consistently characterized the transformed phenotype of tumor cells of different lineages [61].
In summary, this project comprehensively utilized NGS approaches to analyze papillary thyroid tumors with the aim of contributing to the better understanding of the genetic architecture of PTC. The data identified two most important PTC driver genes – COL11A1 and TP53, followed by other driver genes with previous biological evidence such as PLXNA4, UBA1, AHNAK, CSMD2 and TTLL5. We also found that four major molecular pathways, namely, metabolic pathway, pathway in cancer, olfactory transduction pathway and calcium signaling pathway were significantly altered in PTC tumors. Copy number aberrations and structural variants were common in PTC, with the most likely driver genes being BRD9, TRIP13, TFDP1, RCC1, RAB3GAP1, and ENAH (involved in in-frame gene fusions) etc. The identified somatic alterations may function as biomarkers for PTC in Chinese patients. However, due to the limited sample size in this study, more driver genes of Chinese PTC await discovery by using larger PTC sample sets. Functional assays are necessary to dissect the molecular mechanisms associated with the PTC driver genes.
Footnotes
Acknowledgments
We thank all the inside reviewers for the comments on the revision of this manuscript.
Conflict of interest
All of the authors declare that they have no conflict of interest.
Supplementary data
The supplementary files are available to download from http://dx.doi.org/10.3233/CBM-191200.
