Novel somatic alterations underlie Chinese papillary thyroid carcinoma

Abstract

To characterize the somatic alterations of papillary thyroid carcinomas (PTC) in Chinese patients, we performed a next-generation sequencing (NGS) study of tumor-normal pairs of DNA and RNA samples extracted from 16 Chinese PTC patients. Whole genome sequencing (WGS) and transcriptome sequencing (RNA-seq) were conducted for 6 patients who were current or former smokers, and whole exome sequencing (WES) and RNA-seq were conducted for another 10 patients who were never smokers. The NGS data were analyzed to identify somatic alteration events that may underlie PTC in Chinese patients. We identified a number of PTC driver genes harboring somatic driver mutations with significant functional impact, such as COL11A1, TP53, PLXNA4, UBA1, AHNAK, CSMD2, and TTLL5. Significant driver pathways underlying PTC were also found, namely the metabolic pathway, the pathway in cancer, the olfactory transduction pathway, and the calcium signaling pathway. In addition, this study revealed genes with significant somatic copy number aberrations and corresponding somatic gene expression changes in PTC tumors, the most promising being BRD9, TRIP13, FZD3, and TFDP1. We also identified several structural variants in PTCs, notably the novel in-frame fusion proteins TRNAU1AP-RCC1, RAB3GAP1-R3HDM1, and ENAH-ZSWIM5. Our study provides a list of novel PTC candidate genes with somatic alterations that may serve as biomarkers for PTC in Chinese patients, and these findings can guide follow-up mechanistic studies.

Keywords

Papillary thyroid carcinoma (PTC); whole genome sequencing; transcriptome sequencing; somatic alteration

1. Introduction

Papillary thyroid carcinoma (PTC) is the major type of thyroid cancer, accounting for about 80% of all thyroid cancers. PTC is named for its papillary histological architecture and has a high 5-year survival rate of nearly 95% [1]; it is therefore usually considered clinically curable. However, a considerable proportion of PTC cases may still progress to more aggressive and lethal thyroid cancers. Over the past years, the incidence of thyroid cancer has increased significantly [2], motivating greater research investment in this cancer type. Although TCGA (The Cancer Genome Atlas) has identified a large number of somatic genetic alterations underlying PTC [3], few studies have been performed in Chinese PTC patients, leaving a knowledge gap in the genetic etiology of PTC in the Chinese population. To gain new insights into the genetic basis of Chinese PTC, we carried out whole genome sequencing (WGS) or whole exome sequencing (WES), plus transcriptome sequencing (RNA-seq), of tumor-normal pairs of thyroid tissues from 16 PTC patients. Our study identified novel driver genes underlying the carcinogenesis of PTC in the Chinese population.

2. Materials and methods

2.1 Chinese PTC patient samples

We conducted whole genome sequencing (WGS) and transcriptome sequencing (RNA-seq) of tumor-normal pairs of thyroid tissues from 6 Chinese PTC patients who were current or former smokers (Table S1). We also performed whole exome sequencing (WES) and RNA-seq of paired tumor-normal thyroid tissues from another set of 10 Chinese PTC patients who were never smokers (Table S2). All of the PTC patients are Han Chinese living in the northern region of China. For every patient, the age at diagnosis was the same as the age at initial sample collection; that is, there was no difference between the time of diagnosis and the time of sample collection. All Chinese PTC patients were enrolled after providing signed informed consent to participate in this study. After enrollment, interviews were conducted and patients’ medical records were abstracted. Research protocols were approved by the Institutional Review Board of The Affiliated Shengjing Hospital, China Medical University. The frozen PTC tumors and matched normal thyroid tissues were reviewed by at least two pathologists, and only tumor samples with at least 80% tumor-cell content were used in the study. DNA and RNA were extracted simultaneously using the AllPrep DNA/RNA/miRNA Universal Kit (QIAGEN China, Shanghai) before next-generation sequencing (NGS) library preparation.

2.2 DNA library preparation for NGS

We prepared DNA sequencing libraries following the manufacturers’ protocols (Illumina and Agilent). Briefly, 3 μg of genomic DNA was fragmented to 150–200 bp using a Covaris E210 sonicator. The fragment ends were repaired, and an “A” base was added to the 3’ ends. Paired-end DNA adaptors (Illumina) with a single “T” base overhang at the 3’ end were ligated, and the resulting constructs were purified using Agencourt AMPure SPRI beads. The adapter-modified DNA fragments were enriched by 4 cycles of PCR using PE 1.0 forward and PE 2.0 reverse primers (Illumina). The concentration and size distribution of the libraries were determined on an Agilent Bioanalyzer DNA 1000 chip.

2.3 Next-generation sequencing and bioinformatics analyses

DNA sequencing of the paired tumor/normal samples of the 16 Chinese PTC patients was carried out for the captured libraries on an Illumina HiSeq 2000 using 100 bp paired-end reads. Libraries were loaded onto paired-end flow cells at concentrations of 4–5 pM to generate cluster densities of 300,000–500,000/mm² following Illumina’s standard protocol, using the Illumina cluster station, the Illumina cBot, and the HiSeq Paired-End Cluster Kit version 1. Image analysis and base calling were carried out with the Illumina CASAVA software using default parameters. The resulting Eland alignment from the Illumina pipeline was used to estimate an error rate for each read/cluster. The quality of each sequencing run was assessed by evaluating the percentage of clusters passing the filtering criteria and the percentage of reads that could be aligned to the reference genome. For a typical run, over 70% of clusters passed filter and over 80% of reads aligned. Any major deviation from these values triggered further evaluation (average intensity, error graphs, etc.) and was likely to lead to the run or lane being excluded from the study. Short-read data in FASTQ format that passed quality control were then processed by the analysis pipeline for read alignment and variant calling.
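The run/lane QC rule above amounts to two threshold checks. The sketch below is illustrative only: `LaneStats` and `lane_passes_qc` are hypothetical names, and it assumes per-lane counts of total clusters, passing clusters, passing reads, and aligned reads have already been extracted from the pipeline output.

```python
from dataclasses import dataclass


@dataclass
class LaneStats:
    # Hypothetical per-lane summary; field names are assumptions
    lane_id: str
    clusters_total: int
    clusters_pf: int      # clusters passing filter
    reads_pf: int         # reads from clusters passing filter
    reads_aligned: int    # passing reads aligned to the reference genome


def lane_passes_qc(stats: LaneStats, min_pf: float = 0.70, min_aligned: float = 0.80) -> bool:
    """True if more than 70% of clusters pass filter and more than 80% of reads align."""
    if stats.clusters_total == 0 or stats.reads_pf == 0:
        return False
    pf_fraction = stats.clusters_pf / stats.clusters_total
    aligned_fraction = stats.reads_aligned / stats.reads_pf
    return pf_fraction > min_pf and aligned_fraction > min_aligned


print(lane_passes_qc(LaneStats("lane1", 1_000_000, 750_000, 1_500_000, 1_300_000)))  # True
```

A lane that fails this check would not be dropped automatically; as described above, it would first be re-evaluated (average intensity, error graphs) before being excluded.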

2.4 Read mapping, alignment, and variant analysis

Our mutation analysis pipeline consists of three steps: 1) mapping and alignment, 2) SNV and indel discovery, and 3) SNV and indel filtering and annotation. Briefly, the raw short-read sequence data in FASTQ format were aligned to the human reference genome (NCBI human genome assembly build 37) using the BWA aligner (0.6.1) [4]. PCR duplicates were detected and removed with Picard 1.65 (http://picard.sourceforge.net). After alignment, we used SomaticSniper [5] and VarScan [6] to call SNVs at each chromosomal position. We defined high-quality SNVs as those detected both by SomaticSniper, with a somatic score ≥ 60 and SNP mapping quality ≥ 60, and by VarScan, with a somatic p value ≤ 0.05. To minimize false positives, we also required a minimum coverage of 8X in the normal and 15X in the tumor, at least 4 reads supporting the variant allele, and a variant allele fraction of at least 15% in the tumor. We further required at least 2 reads supporting the variant allele on each strand. A variant was considered germline if any reads in the normal sample carried the mutant allele.
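The somatic SNV filters above can be combined into a single predicate. This is a minimal sketch under stated assumptions: the `CandidateSNV` fields are hypothetical names for values parsed from SomaticSniper and VarScan output, and the comparison directions (≥ for scores and read counts, ≤ for the VarScan p value) are assumed from the text.

```python
from dataclasses import dataclass


@dataclass
class CandidateSNV:
    # Hypothetical fields parsed from SomaticSniper / VarScan output
    sniper_somatic_score: int
    sniper_mapq: int
    varscan_somatic_p: float
    normal_depth: int
    tumor_depth: int
    tumor_alt_forward: int   # variant-supporting tumor reads, forward strand
    tumor_alt_reverse: int   # variant-supporting tumor reads, reverse strand
    normal_alt_reads: int    # reads in the matched normal carrying the variant


def is_high_quality_somatic(v: CandidateSNV) -> bool:
    # Must be called by both tools (assumed: score and MAPQ >= 60, p <= 0.05)
    if v.sniper_somatic_score < 60 or v.sniper_mapq < 60 or v.varscan_somatic_p > 0.05:
        return False
    # Minimum coverage: 8x in normal, 15x in tumor
    if v.normal_depth < 8 or v.tumor_depth < 15:
        return False
    alt_reads = v.tumor_alt_forward + v.tumor_alt_reverse
    # At least 4 variant reads and a variant allele fraction of at least 15% in tumor
    if alt_reads < 4 or alt_reads / v.tumor_depth < 0.15:
        return False
    # At least 2 variant reads on each strand
    if v.tumor_alt_forward < 2 or v.tumor_alt_reverse < 2:
        return False
    # Any variant-supporting read in the normal marks the site as germline
    return v.normal_alt_reads == 0


site = CandidateSNV(70, 60, 0.01, 20, 40, 5, 5, 0)
print(is_high_quality_somatic(site))  # True
```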

The lists of SNVs/indels were finally annotated using ANNOVAR [7, 8]. We first filtered SNVs/indels against various public databases, including dbSNP, 1000 Genomes, the HapMap exome Project, the 200 Danish exomes [9], the NHLBI 6500 Exome data sets (http://evs.gs.washington.edu/EVS/), and the 69 whole genome sequencing data set from Complete Genomics (http://www.completegenomics.com/public-data/69-Genomes/), to remove all known SNPs and germline mutations. After filtering, the remaining novel SNVs/indels were annotated using the NCBI, UCSC, and Ensembl databases. For SNVs, we also used the SIFT program to predict whether an amino acid substitution affects protein function, so that SNVs could be prioritized for further study [10].
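Conceptually, this database filtering removes every candidate that matches a known variant. The sketch below assumes each database has been loaded as a set of (chromosome, position, reference allele, alternate allele) keys; it does not reproduce ANNOVAR’s own filtering commands, and the names and toy records are hypothetical, not real database entries.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def remove_known_variants(candidates: Iterable[Variant], *known_sets: set[Variant]) -> list[Variant]:
    """Keep only candidates absent from every known-variant set (e.g. dbSNP, 1000 Genomes)."""
    known = set().union(*known_sets)
    return [v for v in candidates if v not in known]


# Toy records standing in for database contents
dbsnp = {Variant("1", 100, "A", "G")}
thousand_genomes = {Variant("3", 300, "T", "C")}
candidates = [Variant("1", 100, "A", "G"), Variant("2", 200, "C", "T")]
print(remove_known_variants(candidates, dbsnp, thousand_genomes))
# [Variant(chrom='2', pos=200, ref='C', alt='T')]
```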

2.5 Experimental validation of SNVs/indels

To validate the somatic mutations identified by high-throughput sequencing, we performed Sanger sequencing of 1152 candidate somatic mutations identified from the 16 PTC tumor-normal pairs. Specific primers flanking the mutations, used for PCR and the follow-up Sanger sequencing reactions, were designed using Primer 3 [11]. Standard PCR was conducted in a 25 μl reaction volume containing 1–2 U Taq DNA polymerase, 10 mM Tris-HCl (pH 8.3), 0.25 mM dNTPs, 0.2–2 mM BSA, 1.5–2.5 mM MgCl₂, 20 pM of each primer, and about 10 ng genomic DNA. The PCR conditions were: 94 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 50 °C for 30 s, and extension at 72 °C for 45 s. The PCR products were checked by gel electrophoresis and then sent to Eton Bioscience Inc (http://etonbio.com/) for Sanger sequencing, using the PCR primers as sequencing primers. Sequence mutations were checked using the software package Sequencher (http://genecodes.com/). Of the 1152 candidate variants, 1072 were confirmed as true somatic mutations, a validation rate of 93%.
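The reported validation rate follows directly from the counts above; a one-line check (the helper name is illustrative):

```python
def validation_rate(validated: int, tested: int) -> float:
    """Fraction of candidate somatic mutations confirmed by Sanger sequencing."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return validated / tested


rate = validation_rate(1072, 1152)
print(f"{rate:.1%}")  # 93.1%
```

The unrounded value, about 93.1%, is what the text reports as 93%.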

2.6 Identification of driver somatic mutations

We applied the computational approach Oncodrive-fm [12] to all validated somatic mutations to detect candidate cancer drivers of PTC. The method is based on the principle that a bias toward the accumulation of somatic variants with high functional impact in a gene or group of genes indicates positive selection. Oncodrive-fm measures this functional impact bias (FM bias) and can therefore distinguish driver genes or gene modules that are positively selected during tumor development from those carrying only passenger mutations [12]. Oncodrive-fm can also identify low-recurrence cancer drivers that are usually missed by recurrence-based approaches [12]. Its main output is a ranking of genes by FM bias: top-ranking genes show the largest deviation of their average FI (functional impact) from the background and are thus the best driver candidates.
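The idea behind FM bias can be illustrated with a simplified permutation test that compares a gene’s mean FI score with the mean FI of equally sized random draws from all mutations in the cohort. This is a hedged, single-score sketch, not the Oncodrive-fm implementation; the function name, sampling without replacement, and the add-one correction are assumptions made for illustration.

```python
import random
from statistics import mean


def fm_bias_pvalue(gene_scores, background_scores, n_perm=100_000, seed=0):
    """One-sided empirical P-value that a gene's mean functional impact (FI)
    is at least as high as that of random mutation sets of the same size."""
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = 0
    for _ in range(n_perm):
        # Draw k mutations from the cohort-wide pool without replacement
        if mean(rng.sample(background_scores, k)) >= observed:
            hits += 1
    # Add-one correction keeps the empirical P-value strictly positive
    return (hits + 1) / (n_perm + 1)


pool = [i / 100 for i in range(100)]  # toy FI scores in [0, 0.99]
print(fm_bias_pvalue([0.90, 0.95, 0.99], pool, n_perm=2000))
```

A gene whose mutations all carry high FI scores yields a small P-value, because random draws of the same size rarely reach its mean.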

2.7 Detecting somatic structural variants

Structural variations, including inter-chromosomal translocations (CTX), intra-chromosomal translocations (ITX), inversions (INV), deletions (DEL), and insertions (INS), were analyzed with CREST (Clipping REveals STructure), an algorithm that uses soft-clipped reads to directly map the breakpoints of structural variations [13]. All samples were analyzed using the paired analysis module, which filters out SVs present in the matched normal sample. For comparison, we also ran two additional methods that use discordant paired-end reads to map structural variations: BreakDancer [14] and Geometric Analysis of Structural Variants (GASV) [15]. BreakDancer was run using default parameters. For each predicted SV, we first checked whether the discordant mapping of paired-end reads was caused by repetitive regions in the human genome. All supporting reads were extracted in FASTQ format, and each read was re-mapped to the reference genome using BLAT. If a read pair mapped within the library insert range (mean insert size ± 3 standard deviations), it was NOT considered a supporting read pair for the SV. All SVs with ≥ 3 supporting read pairs and a BreakDancer score ≥ 30 after re-mapping were retained, and the tumor-only SVs were considered putative somatic SVs.

The putative somatic SVs were then subjected to an assembly process to evaluate their validity. All reads mapped within 1 kb (kilobase) of the two breakpoints, along with their unmapped mates, were extracted using the mapping information in the BAM files. We then ran phrap [16] to assemble the extracted sequences into contigs using base calls, quality values, and paired-end sequence information. Assembly was carried out in two iterations because the first iteration usually generated contigs representing the wild-type allele, unless the alternative allele was a homozygous genomic change. The second iteration began with the reads not assembled in the first iteration and generated contigs for the heterozygous alternative allele. All contigs were mapped to the human reference genome using BLAT. If a contig had two distinct parts (i.e., two regions with minimal overlap) mapped to two different genomic regions with high similarity (≥ 97%) and sufficient aligned length (≥ 30 bp), it was considered a cross-junction contig. Once such a contig was identified and no reads from the matched normal sample mapped to the breakpoints identified in the BLAT alignment, the SV was considered an assembly-validated somatic SV. GASV (version 1.4) was run using default parameters, with paired tumor/normal BAM files used to identify putative somatic SVs.
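The read-pair re-mapping filter and retention rule for BreakDancer calls can be sketched as follows. Assumptions: each re-mapped pair is summarized by its insert size, with `None` standing for pairs whose mates land on different chromosomes or otherwise lack a meaningful insert size; the function names are hypothetical, and the assembly and cross-junction contig steps are not modeled.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def insert_size_bounds(library_insert_sizes: Sequence[int], n_sd: float = 3.0):
    """Normal library insert range: mean +/- n_sd standard deviations."""
    mu = mean(library_insert_sizes)
    sd = stdev(library_insert_sizes)
    return mu - n_sd * sd, mu + n_sd * sd


def count_supporting_pairs(remapped_sizes: Sequence[Optional[int]], bounds) -> int:
    """A pair that re-maps inside the normal insert range does NOT support the SV."""
    lo, hi = bounds
    return sum(1 for size in remapped_sizes if size is None or not lo <= size <= hi)


def retain_sv(remapped_sizes, bounds, breakdancer_score: float, seen_in_normal: bool) -> bool:
    """Keep SVs with >= 3 supporting pairs, score >= 30, and no support in the normal."""
    support = count_supporting_pairs(remapped_sizes, bounds)
    return support >= 3 and breakdancer_score >= 30 and not seen_in_normal


bounds = insert_size_bounds([300, 310, 290, 305, 295])
print(retain_sv([None, None, 5000, 300], bounds, breakdancer_score=35, seen_in_normal=False))  # True
```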

2.8 Experimental validation of structural variations

All structural variations were validated by Sanger sequencing. Oligonucleotide primers for genomic PCR were designed within the 1000 bp flanking sequences of each SV using Primer 3 [11]. In some cases, because multiple SVs were detected within 1 kb, a second round of primer design was carried out to account for the presence of a second SV in the flanking region.

2.9 Annotation of structural variations

SVs with at least one breakpoint in a gene coding region were further analyzed for their potential to encode a fusion protein. Each predicted fusion transcript was defined as a list of exons. “Normal” exons correspond exactly to existing annotated exons, whereas fused exons are produced by structural variation events with both breakpoints in exons. The sequence of each fused exon was determined from the assembly of reads crossing the breakpoints and the exon annotation. For each exon in the list, we calculated the exon length, using the annotation for normal exons and the sequence length for fused exons. We then calculated the number of bases that each exon contributes to the CDS based on the annotated CDS start and end positions. The number of “CDS bases” was 0 for exons lying outside the CDS start and stop, the full exon length for exons wholly contained between the start and stop, and the corresponding portion of the exon length for exons containing the CDS start or stop. If the total number of CDS bases was a multiple of 3, the CDS was considered in-frame; otherwise, it was considered out-of-frame.
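The in-frame rule reduces to summing each exon’s contribution to the CDS and testing divisibility by 3. In this sketch, each exon carries the annotated CDS start and end of its parent gene in that gene’s own coordinates (an assumption about how the annotation is stored), and the names are hypothetical. Like the rule stated above, it checks only the reading frame and does not look for premature stop codons.

```python
from typing import NamedTuple


class FusionExon(NamedTuple):
    start: int      # exon start (inclusive), parent-gene coordinates
    end: int        # exon end (inclusive)
    cds_start: int  # annotated CDS start of the parent gene
    cds_end: int    # annotated CDS end (stop) of the parent gene


def cds_bases(exon: FusionExon) -> int:
    """0 outside the CDS, full length inside it, the overlap when the exon spans the CDS start or stop."""
    overlap = min(exon.end, exon.cds_end) - max(exon.start, exon.cds_start) + 1
    return max(0, overlap)


def is_in_frame(exons: list[FusionExon]) -> bool:
    """A fusion CDS is in-frame when its total number of CDS bases is a multiple of 3."""
    return sum(cds_bases(e) for e in exons) % 3 == 0


fusion = [
    FusionExon(40, 69, 50, 1000),    # contains the CDS start: contributes 20 bases
    FusionExon(200, 239, 50, 1000),  # wholly inside the CDS: 40 bases
    FusionExon(500, 529, 10, 2000),  # partner-gene exon inside its CDS: 30 bases
]
print(is_in_frame(fusion))  # True (20 + 40 + 30 = 90 CDS bases)
```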

2.10 Identification of copy number variations (CNVs)

For the whole genome sequencing data of the 6 PTC pairs, CNVs were identified by evaluating the number of sequence reads aligned at each base using the software CONSERTING [17], which employs a three-step analysis. First, the genome was divided into fixed-size windows (100 bp in this study), and the average coverage depth was calculated for each window. The relative coverage depth was defined as the ratio between the average window coverage and the median of the average window coverage on a set of reference chromosomes that showed no gross CNVs in a chromosome-by-chromosome paired tumor/normal coverage analysis. The difference in relative coverage depth between the tumor sample and its matched normal sample was corrected for the GC content of the window and used as the signal for calling CNVs. Second, each chromosome was segmented by recursive partitioning of the tumor-versus-normal difference signal. Third, the segments were merged to ensure a genome-wide error rate not greater than 0.05. For the 6 WGS pairs, each tumor and its matched normal DNA were also genotyped using Illumina HumanOmni1-Quad BeadChips containing 1,140,419 SNPs with a median SNP spacing of 1.2 kb. CNVs were manually reviewed by comparison with the CNV results inferred from the Illumina SNP arrays and with the structural variation breakpoints identified by CREST. Missing breakpoints defining CNV boundaries were manually mapped at base-pair resolution by visual inspection of the soft-clipped reads in the immediate neighborhood of the predicted CNV boundaries. All analyses were performed in R (http://www.R-project.org, 64-bit version 3.6.1, with the base and tree packages). We generated Circos plots [18] that give a visual genomic depiction of all detected somatic alterations, including structural variants, CNVs, and SNVs/indels. For the 10 WES subjects, the algorithm implemented in the CONTRA program [19] was used for somatic CNV analysis.
The significance of CNVs was assessed using a multiple-testing-adjusted p-value threshold of 0.05 at the whole-exome level.
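The first CONSERTING step for the WGS pairs, a per-window tumor-versus-normal relative-depth signal, can be illustrated in plain Python. This is a simplified sketch, not CONSERTING’s code: the function names are hypothetical, per-base depth arrays are assumed to be available, and the GC-content correction, recursive-partitioning segmentation, and segment merging are all omitted.

```python
from statistics import median


def window_means(depth, window=100):
    """Average per-base depth in consecutive fixed-size windows (partial last window dropped)."""
    n = len(depth) // window
    return [sum(depth[i * window:(i + 1) * window]) / window for i in range(n)]


def relative_depth(window_cov, reference_window_cov):
    """Window coverage divided by the median window coverage on reference chromosomes."""
    ref_median = median(reference_window_cov)
    return [c / ref_median for c in window_cov]


def cnv_signal(tumor_depth, normal_depth, tumor_ref_depth, normal_ref_depth, window=100):
    """Tumor-minus-normal relative depth per window (GC correction not applied here)."""
    t = relative_depth(window_means(tumor_depth, window), window_means(tumor_ref_depth, window))
    n = relative_depth(window_means(normal_depth, window), window_means(normal_ref_depth, window))
    return [a - b for a, b in zip(t, n)]


# Toy data: the second tumor window has twice the reference depth
tumor = [30] * 100 + [60] * 100
normal = [40] * 200
print(cnv_signal(tumor, normal, [30] * 200, [40] * 200))  # [0.0, 1.0]
```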

Figure 1.

A stacked bar graph showing the total number of somatic mutations in exon regions for each patient, with each bar divided in proportion to the number of somatic mutations in each category: splicing, indels, nonsense, synonymous, and nonsynonymous mutations.

2.11 RNA-seq experimentation and data analysis

The quality of the total RNA was evaluated using the Agilent 2100 Bioanalyzer RNA 6000 Nano Chip. All samples had an RNA integrity number (RIN) of 9 or higher. For RNA-seq sample preparation, the Illumina TruSeq™ RNA Sample Preparation Kit was used according to the manufacturer’s protocol. RNA-seq reads were obtained using Bustard version 1.9.0 (Illumina Pipeline version 1.3). Reads were quality-filtered using the standard Illumina process, with a flag of 0 (no) or 1 (yes) indicating whether each read passed filtering. Sequence files for the 32 samples of the 16 PTC pairs were generated in FASTQ format (sequence reads plus quality information in Phred format). TopHat 2.0.6 was used to align the RNA-seq reads to the UCSC human reference genome (build hg19) with Bowtie v0.12.8 [20, 21]. After TopHat processing, Cufflinks v2.0.2 (including Cuffdiff) was used for transcript assembly, abundance estimation, and differential gene expression testing between the matched tumor and normal samples [21].

3. Results

3.1 Somatic mutation landscape of papillary thyroid carcinomas

Illumina performed whole-genome sequencing (WGS) of our first set of 12 DNA samples, extracted from 6 papillary thyroid carcinomas (PTC) and the matched normal thyroid tissues. We obtained approximately 138-fold mean exon sequence coverage across the 6 PTC pairs (Table S3). The output short reads were aligned to the reference genome (NCBI human genome assembly build 37) using the BWA program [22]. For the 10 pairs of PTC tumor and normal DNAs subjected to WES, we obtained a mean exon coverage depth of 105-fold (Table S4), and more than 94% of the 212,911 exons in the targeted regions were covered by more than 10 sequencing reads (Table S4).

By Sanger sequencing or genotyping (Nimblegen platform) the putative somatic mutations identified by next-generation sequencing (NGS), we validated a total of 1,072 high-confidence somatic mutations in the exonic regions of these 16 PTCs. These included 766 missense substitutions causing amino acid changes, 67 nonsense mutations leading to truncated proteins, 19 frameshift and 3 non-frameshift insertions and deletions (indels) ranging from 1 to 8 base pairs in length, 40 mutations at exon splicing sites, and 177 silent (synonymous) mutations in exons (Table S5). The distribution of these exonic somatic mutations across all 16 tumors is summarized in Table S6. Figure 1 shows the number of nonsynonymous, synonymous, nonsense, indel (combining frameshift/non-frameshift insertions and deletions), and splicing mutations in each of the 16 Chinese PTC tumor samples. The ratio of nonsynonymous to synonymous mutations was greater than 2:1 in every tumor sample. PTC tumors from never-smokers (cPTC7 to cPTC16) had far fewer non-silent somatic mutations (including nonsynonymous, nonsense, and indel mutations; mean 10, range 3–22) than those from current/former smokers (cPTC1 to cPTC6; mean 126, range 12–282) (Fig. 1, Table S6).
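As a consistency check, the mutation-class counts above sum to the reported total, and the pooled missense-to-synonymous ratio across all 16 tumors is well above 2:1. The snippet only re-uses numbers stated in the text.

```python
# Exonic somatic mutation counts reported above (all 16 tumors pooled)
counts = {
    "missense": 766,
    "nonsense": 67,
    "frameshift indel": 19,
    "non-frameshift indel": 3,
    "splice site": 40,
    "synonymous": 177,
}
total = sum(counts.values())
missense_to_synonymous = counts["missense"] / counts["synonymous"]
print(total)                             # 1072
print(round(missense_to_synonymous, 2))  # 4.33
```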

Table 1
Recurrently mutated genes with non-silent somatic mutations in at least 2 of the 16 Chinese PTC cases

Gene	N	Case (mutation)
TP53	5	cPTC12 (p.R116W), cPTC16 (p.R49S), cPTC10 (p.R24S), cPTC2 (p.E92X), cPTC1 (p.C3F)
COL11A1	3	cPTC2 (p.P455L), cPTC3 (p.G697X), cPTC3 (p.D242Y), cPTC1 (p.G640V)
FIG4	3	cPTC10 (p.I107S), cPTC3 (p.Y901C), cPTC1 (p.D307H)
UNC80	3	cPTC2 (p.F1964L), cPTC3 (p.R1176Q), cPTC1 (p.G876R)
MUC4	3	cPTC2 (p.A3481S), cPTC3 (p.S3367C), cPTC5 (p.S4079T), cPTC5 (p.A2086T)
DOPEY2	2	cPTC16 (p.Q540X), cPTC2 (p.S40X)
RYR2	2	cPTC3 (p.L1622fs), cPTC1 (p.P3240H)
PCNXL2	2	cPTC2 (p.Q1699H), cPTC2 (p.D834G), cPTC3 (p.A1738S)
PLXNA4	2	cPTC2 (p.E871X), cPTC1 (p.V531M)
SLC25A5	2	cPTC3 (p.A118T), cPTC1 (p.Y81X)
SDK1	2	cPTC2 (p.P2003A), cPTC1 (p.W1460C)
UBA1	2	cPTC6 (p.Q698X), cPTC1 (p.N758S)
GATAD2A	2	cPTC12 (p.P190fs), cPTC6 (p.Q525H)
AHNAK	2	cPTC2 (p.K4836N), cPTC6 (p.D1237N)
SCN1A	2	cPTC3 (p.L1304F), cPTC1 (p.L77M)
ANKRD44	2	cPTC2 (p.G140W), cPTC1 (p.K367M)
OR4C16	2	cPTC3 (p.W147X), cPTC1 (p.L84V)
CSMD2	2	cPTC2 (p.G1277R), cPTC1 (p.E3040V), cPTC1 (p.T2398R)
OR56A3	2	cPTC2 (p.I53M), cPTC1 (p.H4N), cPTC1 (p.L259I)
DCAF4L2	2	cPTC2 (p.H141N), cPTC1 (p.I251F)
TTLL5	2	cPTC3 (p.E777X), cPTC1 (p.S885F)
FAM135B	2	cPTC2 (p.A1356G), cPTC1 (p.D525A)
TTN	2	cPTC3 (p.T34882N), cPTC1 (p.G8343V), cPTC1 (p.S2900Y)
PCYT1A	2	cPTC6 (p.E66K), cPTC1 (p.V181L)
HS3ST4	2	cPTC2 (p.M277T), cPTC3 (p.D402N)
ADCY9	2	cPTC3 (p.V797L), cPTC1 (p.S140Y)
TSPAN19	2	cPTC2 (p.N165I), cPTC1 (p.H79N)
EXPH5	2	cPTC6 (p.F507L), cPTC1 (p.G578V), cPTC1 (p.G578S)
CSMD3	2	cPTC2 (p.T3458S), cPTC3 (p.P214H)
ITIH5L	2	cPTC2 (p.Q249L), cPTC1 (p.R1215S)
AHSG	2	cPTC11 (p.A267T), cPTC2 (p.G287R)
LRP1B	2	cPTC2 (p.R851S), cPTC3 (p.E2076V)
ZDBF2	2	cPTC11 (p.D1077N), cPTC6 (p.Y1610F)
RBMXL3	2	cPTC1 (p.H534N), cPTC5 (p.C388S), cPTC5 (p.E391Q), cPTC5 (p.N405Y)
DCTN4	2	cPTC2 (p.A208S), cPTC4 (p.D8Y)
KANK1	2	cPTC6 (p.P1187T), cPTC1 (p.V384L)
EXOC3	2	cPTC16 (p.C150Y), cPTC2 (p.C150W)

Figure 2.

Schematic representation of the driver genes of PTC identified by Oncodrive-fm. Twenty genes reached the statistical significance level of q-value (Bonferroni-adjusted empirical P-value) ≤ 0.01: COL11A1, TP53, DOPEY2, RYR2, FIG4, PCNXL2, PLXNA4, SLC25A5, SDK1, UBA1, GATAD2A, AHNAK, SCN1A, ANKRD44, OR4C16, CSMD2, OR56A3, DCAF4L2, TTLL5, and FAM135B.

3.2 Driver genes and pathways of the 16 PTC pairs

Based on the verified list of somatic mutations in Table S5, we found that 37 genes had non-silent somatic mutations in at least two of the sixteen tumors (Table 1). The most frequently mutated gene was TP53, with non-silent somatic mutations in 5 PTCs, followed by COL11A1, FIG4, UNC80, and MUC4, each mutated in 3 PTCs; the remaining genes, such as DOPEY2, RYR2, PCNXL2, PLXNA4, and SLC25A5, were each mutated in 2 PTCs (Table 1). However, recurrence alone cannot tell us which genes are drivers underlying the genetic etiology of PTC, because of the large number of passenger mutations. Therefore, to assess the statistical significance of each gene’s average FI (functional impact) relative to the null distribution, we followed the standard Oncodrive-fm procedure [12]. Briefly, 100,000 permutations were conducted to obtain an empirical P-value for each gene, and the empirical P-values were then corrected for multiple testing using the Bonferroni approach. The significance cutoff was q-value < 0.01, where the q-value is the Bonferroni-adjusted empirical P-value. In this way, we detected 20 statistically significant driver genes underlying PTC (Fig. 2), most of which have not previously been implicated in thyroid cancer. COL11A1 and TP53 were the two most significant PTC driver genes (q-values = 4.34 × 10⁻⁹ and 3.23 × 10⁻⁸, respectively), each having multiple somatic mutations of very high functional impact, as reflected by high MA (MutationAssessor) scores (Fig. 2). The other significant driver genes were DOPEY2, RYR2, FIG4, PCNXL2, PLXNA4, SLC25A5, SDK1, UBA1, GATAD2A, AHNAK, SCN1A, ANKRD44, OR4C16, CSMD2, OR56A3, DCAF4L2, TTLL5, and FAM135B. All of them had high-functional-impact somatic mutations in at least 2 PTCs.
Because these genes carry potentially functional somatic mutations in at least 2 of the 16 cases, they may be commonly mutated PTC driver genes whose somatic alterations could serve as prognostic biomarkers for PTC. We also applied Oncodrive-fm to pathway analysis and identified several significantly altered molecular pathways underlying PTC (Table S7 and Fig. S1), such as the metabolic pathway (q-value = 0.001), pathway in cancer (q-value = 0.0017), olfactory transduction pathway (q-value = 0.0045), and calcium signaling pathway (q-value = 0.018).
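The Bonferroni step described above multiplies each gene’s empirical P-value by the number of genes tested, caps the result at 1, and keeps genes below the cutoff. A minimal sketch using the q < 0.01 cutoff stated in the text; the gene names and P-values in the example are made up.

```python
def bonferroni(pvalues: dict) -> dict:
    """Bonferroni-adjusted q-values: p * m, capped at 1, for m tested genes."""
    m = len(pvalues)
    return {gene: min(1.0, p * m) for gene, p in pvalues.items()}


def significant_genes(pvalues: dict, alpha: float = 0.01) -> list:
    """Genes whose Bonferroni-adjusted q-value falls below alpha."""
    q = bonferroni(pvalues)
    return sorted(gene for gene, qv in q.items() if qv < alpha)


toy_p = {"GENE_A": 1e-5, "GENE_B": 0.004, "GENE_C": 0.5}  # hypothetical values
print(significant_genes(toy_p))  # ['GENE_A']
```

Here GENE_B is excluded because its adjusted q-value, 0.004 × 3 = 0.012, exceeds the cutoff even though its raw P-value does not.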

Figure 3.

Circos plots of genetic alterations in six papillary thyroid carcinoma (PTC) cases subjected to WGS. These plots depict i) structural genetic variants, including DNA copy number alterations: intra- (green) and inter-chromosomal (purple) translocations; loss of heterozygosity – brown; amplification – red; deletion – blue; and ii) sequence mutations in RefSeq genes involving missense, insertion, deletion – brown; truncating mutation, nonsense, splice, frameshift – red.

Table 2

Somatic structural variants identified by WGS that are also present at the transcriptome level as revealed by RNA-seq and validated by RT-PCR

Gene	Event	Sample ID	Chr ID	Validated
TRNAU1AP-RCC1	Modified in-frame fusion	cPTC2	1, 1	RT-PCR
RAB3GAP1-R3HDM1	Modified in-frame fusion	cPTC4	2, 2	RT-PCR
ENAH-ZSWIM5	Modified in-frame fusion	cPTC5	1, 1	RT-PCR
TRIT1	Truncating	cPTC2	1	RT-PCR
EFCAB7, ITGB3BP	Truncating	cPTC2	1	RT-PCR
SUSD1	Truncating	cPTC5	7, 9	RT-PCR
POLM	Truncating	cPTC2	20, 7	RT-PCR
GRIK5	Truncating	cPTC2	19	RT-PCR
C18orf34	Aberrant splicing	cPTC2	18	RT-PCR
ARID1A	Truncating	cPTC5	9, 1	RT-PCR
TEAD1	Truncating	cPTC2	11, 19	RT-PCR
Complex rearrangement involving multiple events such as UNC50-FAM63B fusion	7 DNA SVs, 5 RNA SVs	cPTC5	2, 15	RT-PCR

Table 3

Genes with somatic copy number changes and corresponding significant gene expression changes in at least two PTC tumors

Subject	Gene	Chr	LocStart	LocEnd	log2 CN ratio (T/N)	p value for CNV	log2 expression ratio (T/N)	q value for differential expression
cPTC5	ACTL6B	7	99648407	100675400	0.458	0	8.78976	0
cPTC3	ACTL6B	7	99194001	101768800	0.444	5.68E-06	5.08697	7.49E-05
cPTC5	B4GALNT4	11	133701	1805000	0.422	6.59E-10	6.06637	2.22E-08
cPTC2	B4GALNT4	11	133701	2545732	0.975	2.47E-08	5.47373	9.30E-07
cPTC3	BRD9	5	11701	1178200	0.947	0.000227	1.29012	0.00178
cPTC2	BRD9	5	11701	2738600	1.017	0.00201	1.35328	0.012286
cPTC5	BRSK2	11	133701	1805000	0.422	3.15E-12	6.52924	1.52E-10
cPTC2	BRSK2	11	133701	2545732	0.975	0	5.07974	0
cPTC1	CLDN10	13	90864719	104920518	-0.4787	0.001159	-6.54226	0.007981
cPTC3	CLDN10	13	96155301	96297500	0.827	7.57E-09	5.42012	2.27E-07
cPTC5	EML6	2	54965367	57933500	0.384	1.62E-07	3.33679	3.44E-06
cPTC9	EML6	2	55143863	55143983	0.585839	2.29E-05	3.5285	0.000219
cPTC5	FBXL16	16	60001	1785300	0.429	1.59E-08	5.19863	4.24E-07
cPTC1	FBXL16	16	60001	1785100	0.554	0.006691	2.43286	0.034556
cPTC5	FZD3	8	28390970	28430174	0.5045	8.17E-06	3.85171	0.000112
cPTC2	FZD3	8	26595734	28451100	0.76	0.000443	3.31304	0.003538
cPTC2	MUC5B	11	133701	2545732	0.975	0.000334	3.5519	0.002813
cPTC9	MUC5B	11	1277360	1277480	-0.94817	0.000243	-3.4195	0.001703
cPTC5	SEMA6D	15	47440601	49818900	-0.417	0.000285	-2.18387	0.0024
cPTC2	SEMA6D	15	24407717	49540844	-0.4154	0.000267	-2.31499	0.002348
cPTC1	TFDP1	13	113432801	114948400	0.51	0.000206	1.85661	0.001847
cPTC3	TFDP1	13	112721060	114502361	1.022	0.000101	2.07171	0.000888
cPTC5	TFR2	7	99648407	100675400	0.458	0.006186	2.08179	0.030804
cPTC3	TFR2	7	99194001	101768800	0.444	0.000952	2.58903	0.006035
cPTC2	TMEM87A	15	24407717	49540844	-0.4154	0.001595	-1.5283	0.010161
cPTC9	TMEM87A	15	42553360	42553480	-0.88529	0.008063	-1.06237	0.034362
cPTC3	TRIP13	5	11701	1178200	0.947	0	4.4497	0
cPTC2	TRIP13	5	11701	2738600	1.017	5.50E-12	4.33599	6.26E-10
cPTC5	VGF	7	100687401	102409200	0.453	2.80E-09	9.44001	8.45E-08
cPTC3	VGF	7	99194001	101768800	0.444	0.000105	3.88401	0.000922
cPTC5	ZNF3	7	99648407	100675400	0.458	2.54E-05	1.34631	0.000304
cPTC3	ZNF3	7	99194001	101768800	0.444	0.000161	1.15719	0.00132

3.3 Somatic copy number variants (CNV) identified in the 16 PTCs

We inferred somatic copy number variants (CNVs) for the six PTC pairs sequenced by WGS and the ten PTC pairs sequenced by WES. Following a previous publication [23], we used log2 copy-number ratio thresholds of -0.415 and 0.322 (which correspond to a 0.5-copy loss or gain in diploid regions), together with a genome-wide error rate not greater than 0.05, to call somatic CNVs. We also integrated the somatic CNV data with gene expression data from RNA-seq and found that 93 genes had both significantly altered copy numbers and corresponding significant gene expression changes in PTC tumors (Table S8). Sixteen genes in this list (ACTL6B, B4GALNT4, BRD9, BRSK2, CLDN10, EML6, FBXL16, FZD3, MUC5B, SEMA6D, TFDP1, TFR2, TMEM87A, TRIP13, VGF, and ZNF3) had somatic copy number aberrations and corresponding somatic gene expression changes in at least two of our 16 PTC tumor-normal pairs (Table 3), highlighting their potential importance to the genetic etiology of PTC.
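The two CNV thresholds are log2 ratios of tumor to normal copy number in a diploid region (2 copies): a 0.5-copy loss gives log2(1.5/2) and a 0.5-copy gain gives log2(2.5/2). The sketch below reproduces both values; the `call_cnv` helper and its inclusive comparisons are illustrative assumptions, and it ignores the genome-wide error-rate criterion.

```python
from math import log2


def log2_cn_ratio(tumor_copies: float, normal_copies: float = 2.0) -> float:
    """log2 tumor/normal copy-number ratio, assuming a diploid normal."""
    return log2(tumor_copies / normal_copies)


LOSS_CUTOFF = round(log2_cn_ratio(1.5), 3)  # -0.415 (half-copy loss)
GAIN_CUTOFF = round(log2_cn_ratio(2.5), 3)  # 0.322 (half-copy gain)


def call_cnv(log2_ratio: float) -> str:
    """Classify a segment's log2 CN ratio against the two cutoffs (inclusive, assumed)."""
    if log2_ratio <= LOSS_CUTOFF:
        return "loss"
    if log2_ratio >= GAIN_CUTOFF:
        return "gain"
    return "neutral"


print(LOSS_CUTOFF, GAIN_CUTOFF)  # -0.415 0.322
print(call_cnv(0.384), call_cnv(-0.417), call_cnv(0.1))  # gain loss neutral
```

The example ratios 0.384 and -0.417 are taken from Table 3 (EML6 and SEMA6D in cPTC5).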

3.4 Chromosomal rearrangements identified in 6 whole genome sequenced PTC tumors

Using CREST [13], we detected a total of 382 candidate structural variations (SVs) across the 6 WGS PTC cases and, by PCR and follow-up Sanger sequencing of both the tumors and their matched normal samples, validated 219 of them as somatic SVs. These included 34 CTX (inter-chromosomal translocations; mean 6 per case, range 1–17), 39 ITX (intra-chromosomal translocations; mean 7, range 0–23), 82 DEL (deletions; mean 14, range 2–47), and 64 INS (insertions; mean 11, range 0–53) (Fig. 3 and Table S9).

Remarkably, 46% (100 of 219) of the validated somatic structural variations had breakpoints in coding genes, including genes with known roles in tumorigenesis such as ARID1A, KDM5A, MKL1, MLLT10, and MYCL1, and genes also targeted by somatic mutations (for example, ASXL3, BICD1, LRP1B, and SDK1). Most of these structural variations (65 of 100, 65%) were predicted to cause loss of function of the involved genes. Twelve validated somatic structural variations present in the genome were also present in the transcriptome, as revealed by RNA-seq and validated by RT-PCR (Table 2). A few of these led to the formation of chimeric fusion proteins. Specifically, six genes formed three chimeric genes in three PTC cases, resulting in the expression of novel in-frame fusion proteins: TRNAU1AP-RCC1 (case cPTC2), RAB3GAP1-R3HDM1 (case cPTC4), and ENAH-ZSWIM5 (case cPTC5) (Table 2, Figs S2–S4).

In addition to the above in-frame gene fusion events, WGS and RNA-seq identified seven truncating events, one aberrant splicing event, and one complex genomic rearrangement involving multiple SV events (Table 2, Figs S5–S13). Some of the corresponding proteins are known to be involved in cancer biology. For example, the ITGB3BP gene encodes a transcriptional regulator that binds to and enhances the activity of members of the nuclear receptor families, namely thyroid hormone receptors and retinoid X receptors, and ITGB3BP induces apoptosis in breast cancer cells through a caspase 2-mediated signaling pathway. ARID1A is a chromatin remodeling gene whose encoded protein is part of the large ATP-dependent chromatin remodeling complex SWI/SNF, which is required for transcriptional activation of cancer-related genes normally repressed by chromatin. The complex genomic rearrangement (Fig. S13) also created a novel fusion gene involving UNC50 on chromosome 2 and FAM63B on chromosome 15, although the roles of these two genes in cancer are not yet known.

Using RNA-seq, we also identified 11 SV events that were validated by RT-PCR (Table S10). These structural changes occur only at the transcriptome level, for example read-through transcripts. Their biological significance for tumorigenesis remains unclear.

4. Discussion

In this project, we systematically studied the genetic alterations of Chinese PTC. We used Oncodrive-fm [12] to identify PTC driver genes. COL11A1 and TP53 were the two most significant PTC driver genes, each harboring multiple somatic mutations with very high functional impact. TP53 is a well-established cancer driver gene whose mutations have also been reported in thyroid cancer [24]. We queried the cBioPortal database (http://www.cbioportal.org/) for TP53 mutation status. TP53 carried non-silent somatic alterations in 2.8% of the 1629 thyroid tumor samples from two large-scale genome sequencing studies of thyroid cancer conducted in non-Chinese populations, i.e. the TCGA and MSKCC studies [24, 25] (Fig. 4). This rate is lower than the TP53 mutation rate in our Chinese PTC cohort, which may reflect an ethnic difference or the small sample size of our study. COL11A1 gene polymorphisms have been associated with papillary thyroid cancer in a Korean population [26]. Aberrant COL11A1 expression has been associated with metastasis and with a variety of tumors, such as breast cancer [27], familial adenomatous polyposis (FAP) polyps [28], colorectal tumors [28, 29] and oral cavity/pharynx squamous cell carcinoma [30]. COL11A1 encodes one of the two alpha chains of type XI collagen. Collagen is the major component of the interstitial extracellular matrix (ECM). The ECM plays an active role in numerous biological processes, including the regulation of cell shape, proliferation, migration, differentiation and apoptosis, as well as carcinogenesis [29]. Somatic alterations in COL11A1 may contribute to the altered, invasion-facilitating proteolysis of the ECM seen in multiple cancer types [31].
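Oncodrive-fm [12] detects driver genes through functional impact bias: genes whose mutations carry higher functional impact (FI) scores than expected by chance are flagged as candidate drivers. The Python sketch below is a simplified, single-score illustration of that idea, not the Oncodrive-fm implementation. It compares a gene's mean FI score with the means of equally sized random draws from all observed mutation scores and returns an empirical one-sided p-value. The published method combines several FI predictors and differs in its details. All scores here are hypothetical.

```python
import random
from statistics import mean


def fi_bias_pvalue(gene_scores: list[float], all_scores: list[float],
                   n_perm: int = 10_000, seed: int = 0) -> float:
    """Empirical one-sided p-value that the gene's mean FI score is higher than the mean of
    len(gene_scores) scores drawn at random (without replacement) from all observed mutations."""
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = sum(mean(rng.sample(all_scores, k)) >= observed for _ in range(n_perm))
    # The +1 terms keep the estimate away from an (impossible) p-value of exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical FI scores in [0, 1]: a background of 100 mutations spread evenly,
# one gene whose mutations are all high impact and one whose mutations are all low impact.
background = [i / 99 for i in range(100)]
high_impact_gene = [0.95, 0.90, 0.97]
low_impact_gene = [0.10, 0.20, 0.15]

print(fi_bias_pvalue(high_impact_gene, background))  # small p-value: candidate driver
print(fi_bias_pvalue(low_impact_gene, background))   # large p-value: no FI bias
```

With only 16 tumors, each gene contributes few mutations. The null distribution is therefore coarse, which is one reason a larger cohort would sharpen the driver list.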

As for other driver genes, PLXNA4 belongs to the plexin family of transmembrane high-affinity receptors for semaphorins. Plexins regulate cell guidance, motility and invasion and are thus involved in cancer progression and metastasis [32]. Furthermore, PLXNA4 forms stable complexes with the FGFR1 and VEGFR-2 tyrosine kinase receptors. It enhances VEGF-induced VEGFR-2 phosphorylation in endothelial cells as well as bFGF-induced cell proliferation, thereby promoting tumor progression and tumor angiogenesis [33]. PLXNA4 may therefore represent a target for the development of novel anti-angiogenic and anti-tumorigenic drugs [33]. Molecular profiling of the “plexinome” in melanoma and pancreatic cancer has detected amplification of, and somatic mutations in, PLXNA4 and other plexins. These findings implicate mutated plexins, especially PLXNA4, in cancer progression [32]. Together, they support our identification of PLXNA4 as a PTC driver gene. Another driver gene, UBA1, encodes a ubiquitin-activating enzyme that is required for the cellular response to DNA damage [34]. Functional evidence has established UBA1 as the enzyme critical for ubiquitylation-dependent signaling of both DNA double-strand breaks (DSBs) and replication stress in human cells. This role has implications for the maintenance of genomic integrity, disease pathogenesis and cancer treatment [34]. Mutation of Uba1 causes tissue overgrowth in Drosophila [35]. Drugs targeting UBA1 are being developed as anti-cancer strategies in other cancers [36, 37, 38], and such agents may also be effective in PTC patients whose tumors harbor UBA1 somatic mutations.

Figure 4.

Oncoprint showing that 14 of the 20 PTC driver genes identified in our Chinese cohort also carry non-silent somatic alterations in non-Chinese thyroid cancer patients. The data come from two large-scale genome sequencing studies of thyroid cancer (the TCGA and MSKCC studies) archived in the cBioPortal database.

Previous functional studies also support AHNAK as a PTC driver gene. AHNAK encodes a large protein that has recently been linked to reorganization of the actin cytoskeleton, cellular migration and invasion [39]. AHNAK has been shown to be essential for pseudopod protrusion and for tumor cell migration and invasion [40]. Knockdown of AHNAK in metastatic cells reduced actin cytoskeleton dynamics and induced a mesenchymal-epithelial transition (MET) [40]. MET reverses the epithelial-mesenchymal transition, a key cellular process in the invasive or metastatic program of many cancers. Tumoral AHNAK overexpression was significantly associated with poor survival of patients with laryngeal carcinoma [39] and has been implicated in colon carcinogenesis [41]. AHNAK is also a candidate biomarker that may have clinical utility for assessing response to cancer treatment [42]. Other PTC driver genes supported by previous studies are CSMD2 and TTLL5. CSMD2 maps to a chromosomal region that may contain a suppressor of oligodendrogliomas [43]. CSMD2 was frequently hypermethylated in a tumor-specific manner in pancreatic cancer cell lines and primary pancreatic cancers [44]. This hypermethylation epigenetically silenced CSMD2 expression in tumors. Such silencing supports both a tumor-suppressor role for CSMD2 and our identification of CSMD2 as a PTC driver. TTLL5, also known as STAMP, has been shown to alter the growth of transformed and ovarian cancer cells [45]. In addition, a study combining whole-exome sequencing with functional genomics identified TTLL5 as a candidate tumor-suppressor driver gene in endometrial cancer [46]. This is consistent with the nonsense and deleterious somatic TTLL5 mutations we observed in PTC patients.

Using the cBioPortal database, we compared somatic mutations in the PTC driver genes identified by Oncodrive-fm (Fig. 2) with those in the 1629 thyroid tumor samples from two large-scale genome sequencing studies of thyroid cancer. Both studies, TCGA and MSKCC [24, 25], were conducted in non-Chinese populations. Fourteen of the 20 PTC driver genes identified in our Chinese cohort also had non-silent somatic mutations in the non-Chinese thyroid cancer patients (Fig. 4). This suggests that these driver genes are functionally important to PTC etiology across ethnic groups. These 14 genes are COL11A1, TP53, RYR2, PLXNA4, SLC25A5, SDK1, UBA1, GATAD2A, AHNAK, SCN1A, ANKRD44, CSMD2, TTLL5 and FAM135B. The six genes mutated only in the Chinese PTC cohort are DOPEY2, FIG4, PCNXL2, OR4C16, OR56A3 and DCAF4L2. Ethnicity-related genetic background and lifestyle factors in Chinese patients may give rise to such driver mutations, which are rarely seen in non-Chinese thyroid cancer patients.
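The cross-cohort comparison reduces to set operations on gene lists. The Python sketch below encodes only the gene names given in this paragraph. The cBioPortal query itself is not reproduced, and the hypothetical set `ALTERED_IN_NON_CHINESE` simply restates the reported result. The sketch recovers the 14 shared genes, the six cohort-specific genes and the 70% overlap.

```python
# The 20 PTC driver genes identified in the Chinese cohort (listed in this paragraph).
CHINESE_COHORT_DRIVERS = {
    "COL11A1", "TP53", "RYR2", "PLXNA4", "SLC25A5", "SDK1", "UBA1", "GATAD2A",
    "AHNAK", "SCN1A", "ANKRD44", "CSMD2", "TTLL5", "FAM135B",
    "DOPEY2", "FIG4", "PCNXL2", "OR4C16", "OR56A3", "DCAF4L2",
}

# Driver genes with non-silent somatic mutations in the TCGA/MSKCC thyroid samples
# (restating the reported result; in practice this set would come from a cBioPortal query).
ALTERED_IN_NON_CHINESE = {
    "COL11A1", "TP53", "RYR2", "PLXNA4", "SLC25A5", "SDK1", "UBA1", "GATAD2A",
    "AHNAK", "SCN1A", "ANKRD44", "CSMD2", "TTLL5", "FAM135B",
}


def compare_cohorts(drivers: set[str], altered_elsewhere: set[str]) -> tuple[set[str], set[str], float]:
    """Split drivers into shared and cohort-specific genes and report the shared fraction."""
    shared = drivers & altered_elsewhere
    cohort_specific = drivers - altered_elsewhere
    return shared, cohort_specific, len(shared) / len(drivers)


shared, cohort_specific, fraction_shared = compare_cohorts(CHINESE_COHORT_DRIVERS, ALTERED_IN_NON_CHINESE)
print(len(shared), sorted(cohort_specific), f"{fraction_shared:.0%}")
```

A shared fraction on its own does not show whether the overlap exceeds what would be expected by chance. Answering that would require the mutation frequencies of all genes in the external cohorts.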

Genes altered in certain molecular pathways were significantly enriched in PTC tumors. These pathways include the metabolic pathway, the pathway in cancer, the olfactory transduction pathway and the calcium signaling pathway (Table S7 and Fig. S1). Four genes in metabolic pathways, NOS1, TKTL1, LDHC and UGT8, have previously been implicated in carcinogenesis. NOS1 encodes a nitric oxide synthase that converts arginine and oxygen into citrulline and nitric oxide (NO) [47]. High expression of NOS1 is a favorable prognostic sign in non-small cell lung carcinoma [48]. TKTL1 (transketolase-like protein 1) encodes a transketolase-like enzyme. Its expression has been shown to contribute to carcinogenesis through increased aerobic glycolysis and stabilization of hypoxia-inducible factor alpha [49, 50]. LDHC encodes lactate dehydrogenase C, which is overexpressed in cancer compared with normal tissue samples [51]. UGT8 encodes UDP-galactose:ceramide galactosyltransferase, and its elevated expression correlated with a significantly increased risk of lung metastases in breast cancer [52].
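This section does not state which statistical test produced the pathway enrichment in Table S7. A one-sided hypergeometric (over-representation) test is a common choice for KEGG-style pathway analysis, and the Python sketch below implements it under that assumption. The numbers are small and hypothetical. In practice, the p-values across all tested pathways would also need multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background universe
    K: background genes annotated to the pathway
    n: altered genes tested (drawn from the background)
    k: altered genes that fall in the pathway
    """
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than exist, so impossible terms vanish.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Hypothetical toy example: 20 background genes, 5 in the pathway,
# 4 altered genes, 3 of which fall in the pathway.
p = hypergeom_enrichment_p(k=3, n=4, K=5, N=20)
print(f"{p:.4f}")  # about 0.032
```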

Most of the significant genes in the cancer pathways (Table S7 and Fig. S1) have previously been implicated in the etiology of PTC or other cancers. Examples include TP53, KRAS, CASP8, EP300, PIK3CG, EGLN3, PDGFRA and ERBB2. TP53 and KRAS are well-established cancer genes that are also mutated in PTC [24]. EP300 encodes a histone acetyltransferase and was recently identified as a recurrently mutated histone-modifier gene in small cell lung cancer [53]. CASP8 encodes a member of the cysteine-aspartic acid protease (caspase) family and is involved in programmed cell death induced by Fas and various other apoptotic stimuli. PIK3CG encodes a member of the PI3/PI4-kinase family. It is an important modulator of extracellular signals, including those elicited by E-cadherin-mediated cell-cell adhesion. PIK3CG has recently emerged as a potential oncogene, because overexpression of its subunits leads to oncogenic cellular transformation and malignancy [54]. EGLN3 (encoding PHD3) is overexpressed in lung tumors and is one of several unfavorable prognostic markers for overall survival [55]. PDGFRA encodes a cell-surface tyrosine kinase receptor for members of the platelet-derived growth factor family. Studies suggest that PDGFRA plays a role in tumor progression, and anti-PDGFRA therapy markedly decreases tumor growth in vivo [56]. ERBB2 is a member of the EGFR family of receptor tyrosine kinases. It resides at the cell surface and relays external growth signals into the cell, and mutations in ERBB2 may contribute to accelerated tumor growth [57].

We also identified a list of genes with significant somatic CNVs and corresponding gene expression changes (Table 3). Two PTC patients (cPTC2 and cPTC3) had copy number gains at chromosomal region 5p15.33 involving BRD9 and TRIP13. This is in line with a previous study suggesting that copy number gains of these two genes may play a key role in lung cancer development [58]. FZD3 is a key gene in the WNT pathway, which is important in carcinogenesis [59]. TFDP1, which encodes the E2F-associated transcription factor DP1, is a candidate oncogene at 13q34. TFDP1 is highly amplified in about 3% of lung tumors [60]. In a lung cancer cell line (HCC33) with TFDP1 amplification and protein overexpression, depletion of TFDP1 by small interfering RNA reduced cell viability by 50% [60]. These findings support our observation of significant copy number gain and overexpression of TFDP1 in Chinese PTC patients.
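Pairing copy number with expression, as in Table 3, amounts to requiring that the DNA-level change and the RNA-level change point in the same direction. The Python sketch below illustrates that filter under several assumptions. The copy number log2 ratio is computed from normalized tumor and matched-normal read depths, and the expression change is a log2 fold change with a pseudocount. Both cutoffs are hypothetical. This is not the CNV pipeline used in this study, and real analyses would also correct for tumor purity and ploidy. The gene names and values are placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneMeasurement:
    """Hypothetical per-gene measurements for one tumor-normal pair."""
    gene: str
    tumor_depth: float   # normalized DNA read depth over the gene in the tumor
    normal_depth: float  # same quantity in the matched normal
    tumor_expr: float    # normalized expression (e.g., FPKM) in the tumor
    normal_expr: float   # same quantity in the matched normal


def concordant_genes(measurements, cn_cutoff=0.4, expr_cutoff=1.0, pseudocount=1.0):
    """Return genes whose copy-number log2 ratio and expression log2 fold change
    both pass the (hypothetical) cutoffs in the same direction."""
    hits = []
    for m in measurements:
        cn_log2 = math.log2(m.tumor_depth / m.normal_depth)
        expr_log2fc = math.log2((m.tumor_expr + pseudocount) / (m.normal_expr + pseudocount))
        gained = cn_log2 >= cn_cutoff and expr_log2fc >= expr_cutoff
        lost = cn_log2 <= -cn_cutoff and expr_log2fc <= -expr_cutoff
        if gained or lost:
            hits.append(m.gene)
    return hits


toy = [
    GeneMeasurement("GENE_A", 150, 100, 31, 7),  # gain with 4x higher expression
    GeneMeasurement("GENE_B", 150, 100, 8, 8),   # gain without an expression change
    GeneMeasurement("GENE_C", 50, 100, 1, 7),    # loss with 4x lower expression
]
print(concordant_genes(toy))
```

GENE_B illustrates why the expression requirement matters: a copy number gain alone, without a matching transcript change, is weaker evidence of a driver event.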

Analysis of structural variants revealed three in-frame novel fusion genes in Chinese PTC tumors (Table 2, Figs S2–S4). The partner genes of these fusions play important roles in molecular and cellular processes. For example, the binding of RCC1 (regulator of chromosome condensation 1) to chromatin is critical for processes such as mitosis, nucleocytoplasmic transport and nuclear envelope formation. RAB3GAP1 encodes the catalytic subunit of a Rab GTPase-activating protein, and mutations in this gene are associated with Warburg Micro syndrome. ENAH is a motility-promoting gene involved in the control of cell-cell adhesion and cell motility. ENAH has been associated with human breast carcinogenesis, and its overexpression consistently characterizes the transformed phenotype of tumor cells of different lineages [61].

In summary, this project used NGS approaches to comprehensively analyze papillary thyroid tumors, with the aim of contributing to a better understanding of the genetic architecture of PTC. The data identified COL11A1 and TP53 as the two most significant PTC driver genes. They were followed by other driver genes with previous biological evidence, such as PLXNA4, UBA1, AHNAK, CSMD2 and TTLL5. We also found four major molecular pathways significantly altered in PTC tumors: the metabolic pathway, the pathway in cancer, the olfactory transduction pathway and the calcium signaling pathway. Copy number aberrations and structural variants were common in PTC. The most likely driver genes affected by them include BRD9, TRIP13 and TFDP1 (copy number gains) and RCC1, RAB3GAP1 and ENAH (in-frame gene fusions). The identified somatic alterations may serve as biomarkers for PTC in Chinese patients. However, because of the limited sample size of this study, more driver genes of Chinese PTC remain to be discovered in larger PTC sample sets. Functional assays are also needed to dissect the molecular mechanisms associated with these PTC driver genes.

Footnotes

Acknowledgments

We thank all the internal reviewers for their comments on the revision of this manuscript.

Conflict of interest

All of the authors declare that they have no conflict of interest.

Supplementary data

The supplementary files are available to download from http://dx.doi.org/10.3233/CBM-191200.

References

1. Hay I.D., Thompson G.B., Grant C.S., Bergstralh E.J., Dvorak C.E., Gorman C.A., Maurer M.S., McIver, Mullan B.P., Oberg A.L., Powell C.C., van Heerden J.A. and Goellner J.R., Papillary thyroid carcinoma managed at the Mayo Clinic during six decades (1940–1999): temporal trends in initial therapy and long-term outcome in 2444 consecutively treated patients, World J Surg 26 (2002), 879–885.

2. Chen A.Y., Jemal and Ward E.M., Increasing incidence of differentiated thyroid cancer in the United States, 1988–2005, Cancer 115 (2009), 3801–3807.

3. Ren X.S., Yin M.H., Zhang, Wang, Feng S.P., Wang G.X., Luo Y.J., Liang P.Z., Yang X.Q., J.X. and Zhang B.L., Tumor-suppressive microRNA-449a induces growth arrest and senescence by targeting E2F3 in human lung cancer cells, Cancer Lett 344 (2014), 195–203.

4. Handsaker, Wysoker, Fennell, Ruan, Homer, Marth, Abecasis and Durbin, The Sequence Alignment/Map format and SAMtools, Bioinformatics 25 (2009), 2078–2079.

5. Larson D.E., Harris C.C., Chen, Koboldt D.C., Abbott T.E., Dooling D.J., Ley T.J., Mardis E.R., Wilson R.K. and Ding, SomaticSniper: identification of somatic point mutations in whole genome sequencing data, Bioinformatics 28 (2012), 311–317.

6. Koboldt D.C., Chen, Wylie, Larson D.E., McLellan M.D., Mardis E.R., Weinstock G.M., Wilson R.K. and Ding, VarScan: variant detection in massively parallel sequencing of individual and pooled samples, Bioinformatics 25 (2009), 2283–2285.

7. Wang and Hakonarson, ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data, Nucleic Acids Res 38 (2010), e164.

8. Yang and Wang, Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR, Nat Protoc 10 (2015), 1556–1566.

9. Vinckenbosch, Tian, Huerta-Sanchez, Jiang, Albrechtsen, Andersen, Cao, Korneliussen, Grarup, Guo, Hellman, Jin, Liu, Sparso, Tang, Zheng, Astrup, Bolund, Holmkvist, Jorgensen, Kristiansen, Schmitz, Schwartz T.W., Zhang, Yang, Wang, Hansen, Pedersen and Nielsen, Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants, Nat Genet 42 (2010), 969–972.

10. Sim N.L., Kumar, Henikoff, Schneider and Ng P.C., SIFT web server: predicting effects of amino acid substitutions on proteins, Nucleic Acids Res 40 (2012), W452–457.

11. Rozen and Skaletsky, Primer3 on the WWW for general users and for biologist programmers, Methods Mol Biol 132 (2000), 365–386.

12. Gonzalez-Perez and Lopez-Bigas, Functional impact bias reveals cancer drivers, Nucleic Acids Res 40 (2012), e169.

13. Wang, Mullighan C.G., Easton, Roberts, Heatley S.L., Rusch M.C., Chen, Harris C.C., Ding, Holmfeldt, Payne-Turner, Fan, Wei, Zhao, Obenauer J.C., Naeve, Mardis E.R., Wilson R.K., Downing J.R. and Zhang, CREST maps somatic structural variation in cancer genomes with base-pair resolution, Nat Methods 8 (2011), 652–654.

14. Chen, Wallis J.W., McLellan M.D., Larson D.E., Kalicki J.M., Pohl C.S., McGrath S.D., Wendl M.C., Zhang, Locke D.P., Shi, Fulton R.S., Ley T.J., Wilson R.K., Ding and Mardis E.R., BreakDancer: an algorithm for high-resolution mapping of genomic structural variation, Nat Methods 6 (2009), 677–681.

15. Sindi, Helman, Bashir and Raphael B.J., A geometric approach for classification and comparison of structural variants, Bioinformatics 25 (2009), i222–230.

16. Ewing and Green, Base-calling of automated sequencer traces using phred. II. Error probabilities, Genome Res 8 (1998), 186–194.

17. Chen, Gupta, Wang, Nakitandwe, Roberts, Dalton J.D., Parker, Patel, Holmfeldt, Payne, Easton, Rusch, Patel, Baker S.J., Dyer M.A., Shurtleff, Espy, Pounds, Downing J.R., Ellison D.W., Mullighan C.G. and Zhang, CONSERTING: integrating copy-number analysis with structural-variation detection, Nat Methods 12 (2015), 527–530.

18. Krzywinski, Schein, Birol, Connors, Gascoyne, Horsman, Jones S.J. and Marra M.A., Circos: an information aesthetic for comparative genomics, Genome Res 19 (2009), 1639–1645.

19. Lupat, Amarasinghe K.C., Thompson E.R., Doyle M.A., Ryland G.L., Tothill R.W., Halgamuge S.K., Campbell I.G. and Gorringe K.L., CONTRA: copy number analysis for targeted resequencing, Bioinformatics 28 (2012), 1307–1313.

20. Trapnell, Pachter and Salzberg S.L., TopHat: discovering splice junctions with RNA-Seq, Bioinformatics 25 (2009), 1105–1111.

21. Trapnell, Roberts, Goff, Pertea, Kim, Kelley D.R., Pimentel, Salzberg S.L., Rinn J.L. and Pachter, Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Nat Protoc 7 (2012), 562–578.

22. and Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics 25 (2009), 1754–1760.

23. Lonigro R.J., Grasso C.S., Robinson D.R., Jing Y.M., Cao, Quist M.J., Tomlins S.A., Pienta K.J. and Chinnaiyan A.M., Detection of somatic copy number alterations in cancer using targeted exome capture sequencing, Neoplasia 13 (2011), 1019–1025.

24. Integrated genomic characterization of papillary thyroid carcinoma, Cell 159 (2014), 676–690.

25. Landa, Ibrahimpasic, Boucai, Sinha, Knauf J.A., Shah R.H., Dogan, Ricarte-Filho J.C., Krishnamoorthy G.P., Schultz, Berger M.F., Sander, Taylor B.S., Ghossein, Ganly and Fagin J.A., Genomic and transcriptomic hallmarks of poorly differentiated and anaplastic thyroid cancers, J Clin Invest 126 (2016), 1052–1066.

26. Park H.J., Choe B.K., Kim S.K., Park H.K., Kim J.W., Chung J.H., Hong I.K., Chung D.H. and Kwon K.H., Association between collagen type XI alpha1 gene polymorphisms and papillary thyroid cancer in a Korean population, Exp Ther Med 2 (2011), 1111–1116.

27. Ellsworth R.E., Seebach, Field L.A., Heckman, Kane, Hooke J.A., Love and Shriver C.D., A gene expression signature that defines breast cancer metastases, Clin Exp Metastasis 26 (2009), 205–213.

28. Fischer, Salahshor, Stenling, Bjork, Lindmark, Iselius, Rubio and Lindblom, COL11A1 in FAP polyps and in sporadic colorectal tumors, BMC Cancer 1 (2001), 17.

29. Fischer, Stenling, Rubio and Lindblom, Colorectal carcinogenesis is associated with stromal expression of COL11A1 and COL5A2, Carcinogenesis 22 (2001), 875–878.

30. Schmalbach C.E., Chepeha D.B., Giordano T.J., Rubin M.A., Teknos T.N., Bradford C.R., Wolf G.T., Kuick, Misek D.E., Trask D.K. and Hanash, Molecular profiling and the identification of genes associated with metastatic oral cavity/pharynx squamous cell carcinoma, Arch Otolaryngol Head Neck Surg 130 (2004), 295–302.

31. Kim, Watkinson, Varadan and Anastassiou, Multi-cancer computational analysis reveals invasion-associated variant of desmoplastic reaction involving INHBA, THBS2 and COL11A1, BMC Med Genomics 3 (2010), 51.

32. Balakrishnan, Penachioni J.Y., Lamba, Bleeker F.E., Zanon, Rodolfo, Vallacchi, Scarpa, Felicioni, Buck, Marchetti, Comoglio P.M., Bardelli and Tamagnone, Molecular profiling of the “plexinome” in melanoma and pancreatic cancer, Hum Mutat 30 (2009), 1167–1174.

33. Kigel, Rabinowicz, Varshavsky, Kessler and Neufeld, Plexin-A4 promotes tumor progression and tumor angiogenesis by enhancement of VEGF and bFGF signaling, Blood 118 (2011), 4285–4296.

34. Moudry, Lukas, Macurek, Hanzlikova, Hodny, Lukas and Bartek, Ubiquitin-activating enzyme UBA1 is required for cellular response to DNA damage, Cell Cycle 11 (2012), 1573–1582.

35. Pfleger C.M., Harvey K.F., Yan and Hariharan I.K., Mutation of the gene encoding the ubiquitin activating enzyme uba1 causes tissue overgrowth in Drosophila, Fly (Austin) 1 (2007), 95–105.

36. Olsen S.K., Capili A.D., Cisar J.S., Lima C.D. and Tan D.S., Designed semisynthetic protein inhibitors of Ub/Ubl E1 activating enzymes, J Am Chem Soc 132 (2010), 1748–1749.

37. G.W., Ali, Wood T.E., Wong, Maclean, Wang, Gronda, Skrtic, Hurren, Mao, Venkatesan, Beheshti Zavareh, Ketela, Reed J.C., Rose, Moffat, Batey R.A., Dhe-Paganon and Schimmer A.D., The ubiquitin-activating enzyme E1 as a therapeutic target for the treatment of leukemia and multiple myeloma, Blood 115 (2010), 2251–2259.

38. Lukkarila J.L., da Silva S.R., Lynn-Pavia, Gunning P.T. and Schimmer A.D., Targeting the ubiquitin E1 as a novel anti-cancer strategy, Curr Pharm Des (2012).

39. Dumitru C.A., Bankfalvi, Zeidler, Brandau and Lang, AHNAK and inflammatory markers predict poor survival in laryngeal carcinoma, PLoS One 8 (2013), e56420.

40. Shankar, Messenberg, Chan, Underhill T.M., Foster L.J. and Nabi I.R., Pseudopodial actin dynamics control epithelial-mesenchymal transition in metastatic cancer cells, Cancer Res 70 (2010), 3780–3790.

41. Tanaka, Jin, Yamazaki, Takahara, Takuwa and Nakamura, Identification of candidate cooperative genes of the Apc mutation in transformation of the colon epithelial cell by retroviral insertional mutagenesis, Cancer Sci 99 (2008), 979–985.

42. Leong, Nunez A.C., Lin M.Z., Crossett, Christopherson R.I. and Baxter R.C., iTRAQ-based proteomic profiling of breast cancer cell response to doxorubicin and TRAIL, J Proteome Res 11 (2012), 3561–3572.

43. Lau W.L. and Scholnick S.B., Identification of two new members of the CSMD gene family, Genomics 82 (2003), 412–415.

44. Shimizu, Horii, Sunamura, Motoi, Egawa, Unno and Fukushige, Identification of epigenetically silenced genes in human pancreatic cancer by a novel method “microarray coupled with methyl-CpG targeted transcriptional activation” (MeTA-array), Biochem Biophys Res Commun 411 (2011), 162–167.

45. Blackford J.A., Jr., Kohn E.C. and Simons S.S., Jr., STAMP alters the growth of transformed and ovarian cancer cells, BMC Cancer 10 (2010), 128.

46. Liang, Cheung L.W., Stemke-Hale, Dogruluk, Liu, Guo, Scherer S.E., Carter, Westin S.N., Dyer M.D., Verhaak R.G., Zhang, Karchin, Liu C.G., K.H., Broaddus R.R., Scott K.L., Hennessy B.T. and Mills G.B., Whole-exome sequencing combined with functional genomics reveals novel candidate driver cancer genes in endometrial cancer, Genome Res 22 (2012), 2120–2129.

47. Kanwar J.R., Kanwar R.K., Burrow and Baratchi, Recent advances on the roles of NO in cancer and chronic inflammatory disorders, Curr Med Chem 16 (2009), 2373–2394.

48. Puhakka, Kinnula, Napankangas, Saily, Koistinen, Paakko and Soini, High expression of nitric oxide synthases is a favorable prognostic sign in non-small cell lung carcinoma, APMIS 111 (2003), 1137–1146.

49. Fritz, Coy J.F., Murdter T.E., Ott, Alscher M.D. and Friedel, TKTL-1 expression in lung cancer, Pathol Res Pract 208 (2012), 203–209.

50. Kayser, Kassem, Sienel, Schulte-Uentrop, Mattern, Aumann, Stickeler, Werner, Passlick and zur Hausen, Lactate-dehydrogenase 5 is overexpressed in non-small cell lung cancer and correlates with the expression of the transketolase-like protein 1, Diagn Pathol 5 (2010), 22.

51. Tang and Goldberg, Homo sapiens lactate dehydrogenase c (Ldhc) gene expression in cancer cells is regulated by transcription factor Sp1, CREB, and CpG island methylation, J Androl 30 (2009), 157–167.

52. Dziegiel, Owczarek, Plazuk, Gomulkiewicz, Majchrzak, Podhorska-Okolow, Driouch, Lidereau and Ugorski, Ceramide galactosyltransferase (UGT8) is a molecular marker of breast cancer malignancy and lung metastases, Br J Cancer 103 (2010), 524–531.

53. Peifer, Fernandez-Cuesta, Sos M.L., George, Seidel, Kasper L.H., Plenker, Leenders, Sun, Zander, Menon, Koker, Dahmen, Muller, Di Cerbo, Schildhaus H.U., Altmuller, Baessmann, Becker, de Wilde, Vandesompele, Bohm, Ansen, Gabler, Wilkening, Heynck, Heuckmann J.M., Carter S.L., Cibulskis, Banerji, Getz, Park K.S., Rauh, Grutter, Fischer, Pasqualucci, Wright, Wainer, Russell, Petersen, Chen, Stoelben, Ludwig, Schnabel, Hoffmann, Muley, Brockmann, Engel-Riedel, Muscarella L.A., Fazio V.M., Groen, Timens, Sietsma, Thunnissen, Smit, Heideman D.A., Snijders P.J., Cappuzzo, Ligorio, Damiani, Field, Solberg, Brustugun O.T., Lund-Iversen, Sanger, Clement J.H., Soltermann, Moch, Weder, Solomon, Soria J.C., Validire, Besse, Brambilla, Lantuejoul, Lorimier, Schneider P.M., Hallek, Pao, Meyerson, Sage, Shendure, Schneider, Buttner, Wolf, Nurnberg, Perner, Heukamp L.C., Brindle P.K., Haas and Thomas R.K., Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer, Nat Genet 44 (2012), 1104–1110.

54. Brazzatti J.A., Klingler-Hoffmann, Haylock-Jacobs, Harata-Lee, Niu, Higgins M.D., Kochetkova, Hoffmann and McColl S.R., Differential roles for the p101 and p84 regulatory subunits of PI3Kgamma in tumor growth and metastasis, Oncogene 31 (2012), 2350–2361.

55. Andersen, Donnem, Stenvold, Al-Saad, Al-Shibli, Busund L.T. and Bremnes R.M., Overexpression of the HIF hydroxylases PHD1, PHD2, PHD3 and FIH are individually and collectively unfavorable prognosticators for NSCLC survival, PLoS One 6 (2011), e23847.

56. Reinmuth, Liersch, Raedel, Fehrmann, Bayer, Schwoeppe, Kessler, Berdel, Thomas and Mesters R.M., Combined anti-PDGFRalpha and PDGFRbeta targeting in non-small cell lung cancer, Int J Cancer 124 (2009), 1535–1544.

57. Minami, Shimamura, Shah, LaFramboise, Glatt K.A., Liniker, Borgman C.L., Haringsma H.J., Feng, Weir B.A., Lowell A.M., Lee J.C., Wolf, Shapiro G.I., Wong K.K., Meyerson and Thomas R.K., The major lung cancer-derived mutants of ERBB2 are oncogenic and are associated with sensitivity to the irreversible EGFR/ERBB2 inhibitor HKI-272, Oncogene 26 (2007), 5023–5027.

58. Kang J.U., Koo S.H., Kwon K.C., Park J.W. and Kim J.M., Gain at chromosomal region 5p15.33, containing TERT, is the most frequent genetic event in early stages of non-small cell lung cancer, Cancer Genet Cytogenet 182 (2008), 1–11.

59. Lee E.H., Chari, Lam R.T., Yee, English, Evans K.G., Macaulay, Lam and Lam W.L., Disruption of the non-canonical WNT pathway in lung squamous cell carcinoma, Clin Med Oncol 2008 (2008), 169–179.

60. Castillo S.D., Angulo, Suarez-Gauthier, Melchor, Medina P.P., Sanchez-Verde, Torres-Lanzas, Pita, Benitez and Sanchez-Cespedes, Gene amplification of the transcription factor DP1 and CTNND1 in human lung cancer, J Pathol 222 (2010), 89–98.

61. Di Modugno, DeMonte, Balsamo, Bronzi, Nicotra M.R., Alessio, Jager, Condeelis J.S., Santoni, Natali P.G. and Nistico, Molecular cloning of hMena (ENAH) and its splice variant hMena+11a: epidermal growth factor increases their expression and stimulates hMena+11a phosphorylation in breast cancer cell lines, Cancer Res 67 (2007), 2657–2665.


Table 1
Recurrently mutated genes with non-silent somatic mutations in at least 2 of the 16 Chinese PTC cases