A competing endogenous RNA network identifies novel mRNA, miRNA and lncRNA markers for the prognosis of diabetic pancreatic cancer

Abstract

Pancreatic cancer (PaC) is highly associated with diabetes mellitus (DM). However, the underlying mechanisms are insufficiently understood. This study aimed to uncover the regulatory mechanisms underlying diabetic PaC and to find novel biomarkers for disease prognosis. Two RNA-sequencing (RNA-seq) datasets, GSE74629 and GSE15932, as well as relevant data in TCGA, were utilized. After pretreatment, differentially expressed genes (DEGs), miRNAs (DEMs) and lncRNAs (DELs) between diabetic PaC and non-diabetic PaC patients were identified and further examined for their correlations with clinical information. Prognostic RNAs were selected using Kaplan-Meier (KM) curves. The optimal gene set for sample classification was identified using a support vector machine. A protein-protein interaction (PPI) network was constructed for the DEGs based on protein databases. Interactions among the three kinds of RNAs were revealed in a ‘lncRNA-miRNA-mRNA’ competing endogenous RNA (ceRNA) network. A group of 32 feature genes that could distinguish diabetic PaC from non-diabetic PaC was identified, including CCDC33, CTLA4 and MAP4K1. This classifier achieved high prediction accuracy. Seven lncRNAs were associated with the prognosis of diabetic PaC, especially UCA1. In addition, crucial DEMs were selected, such as hsa-miR-214 (predicted targets: MAP4K1 and CCDC33) and hsa-miR-429 (predicted target: CTLA4). Notably, the interactions ‘HOTAIR-hsa-miR-214-CCDC33’ and ‘CECR7-hsa-miR-429-CTLA4’ were highlighted in the ceRNA network. Several biomarkers were identified for the diagnosis of diabetic PaC, such as HOTAIR, CECR7, UCA1, hsa-miR-214, hsa-miR-429, CCDC33 and CTLA4. The ‘HOTAIR-hsa-miR-214-CCDC33’ and ‘CECR7-hsa-miR-429-CTLA4’ regulations might be two important mechanisms for disease progression.

Keywords

Pancreatic cancer, diabetes mellitus, lncRNA, feature gene, competing endogenous RNA

Introduction

Pancreatic cancer (PaC) is a lethal disease with high mortality.¹ Because symptoms appear late, the 5-year overall survival rate of PaC is very poor (less than 5%).² Diabetes mellitus (DM) is another global health challenge, partly because there are insufficient data to monitor disease progression.³ Notably, the prevalence of DM, especially new-onset DM, is very high among PaC patients.⁴ In one study, more PaC cases than controls met the criteria for DM diagnosis (40.2% vs 19.2%).⁵ A previous meta-analysis also indicated that DM was closely related to an increased risk of PaC.⁶ These findings suggest that DM is highly associated with PaC and that identifying specific biomarkers for diabetic PaC could be useful for the detection and prevention of PaC.

Currently, several relevant biomarkers have been identified. The glucagon/insulin ratio has been proposed as a candidate biomarker for PaC in new-onset DM patients.⁷ The serum thrombospondin-1 level is decreased in PaC patients, and the level is associated with DM.⁸ In addition, VNN1 accounts for paraneoplastic islet dysfunction in PaC with new-onset DM, and is thus identified as a potential biomarker.⁹ MicroRNAs (miRNAs/miRs) are a class of small RNAs that control cellular processes and activities by down-regulating gene expression. Circulating miR-25 is recognized as a candidate biomarker for PaC.¹⁰ MiR-200 and miR-196 are significantly increased in PaC patients compared with controls; however, their expression does not differ between diabetic PaC and non-diabetic PaC patients.¹¹ Together, these findings suggest that alterations of miRNAs might not be linked to diabetic PaC, or that other, more complicated regulatory mechanisms might be involved. Long non-coding RNAs (lncRNAs) are non-protein-coding RNA transcripts longer than 200 nt. They are novel regulators of gene expression, protein function and chromatin modification.¹²,¹³ Reportedly, a lncRNA might serve as a competing endogenous RNA (ceRNA) that controls miRNA level and function, suggesting that its expression is negatively correlated with that of the miRNA.¹⁴ Overexpression of the lncRNA MALAT1 is associated with poor prognosis of PaC,¹⁵ and increased HOTTIP promotes PaC progression by enhancing cell proliferation and survival.¹⁶ However, few studies have reported lncRNA alterations between diabetic PaC and non-diabetic PaC patients, or miRNA changes modulated by lncRNAs. Thus, there is a need to reveal the underlying molecular mechanisms in diabetic PaC patients.

Therefore, we utilized two RNA-sequencing (RNA-seq) datasets deposited in the Gene Expression Omnibus (GEO) database and the relevant data in The Cancer Genome Atlas (TCGA) database to select relevant gene, miRNA and lncRNA markers for the detection of diabetic PaC and to explore their regulatory relationships. Through these efforts, we expected to lay a foundation for understanding the regulatory mechanisms of diabetic PaC and to find novel biomarkers for its prognosis.

Data and methods

Data resources and pretreatment

The relevant data on PaC were downloaded from the TCGA (https://gdc-portal.nci.nih.gov/) database. The sample selection criteria were: (1) the samples were PaC tissues; (2) the diabetes information of the patients was provided; (3) paired mRNA and miRNA profiles were available. In total, 135 samples were acquired, including 99 diabetic PaC patients and 36 non-diabetic PaC patients.

The microarray dataset GSE74629, generated on the Illumina platform (Illumina, San Diego, California, USA), was downloaded from the GEO (http://www.ncbi.nlm.nih.gov/geo) database. It contained 36 PaC patients: 14 had diabetes and 22 did not.¹⁷ The annotation files were also downloaded. Probe values were converted into gene expression values. If multiple probes corresponded to the same gene symbol, their values were averaged to calculate the gene expression. Gene expression values were log-transformed to reach an approximately normal distribution and then further normalized using the Linear Models for Microarray Analysis (limma, http://www.bioconductor.org/packages/release/bioc/html/limma.html) package.¹⁸
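The probe-to-gene step described above can be sketched as follows. This is only an illustration under stated assumptions: the probe identifiers and intensities are hypothetical, the helper name `collapse_probes` is not from the original analysis, and the pseudocount of 1 added before the log2 transform is an assumption (the text does not say how zero values were handled).

```python
import math
from collections import defaultdict

def collapse_probes(probe_values, probe_to_gene):
    """Average probe intensities per gene symbol, then log2-transform.

    probe_values  -- dict: probe ID -> raw intensity
    probe_to_gene -- dict: probe ID -> gene symbol (from the annotation file)
    """
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without annotation are skipped
            grouped[gene].append(value)
    # Pseudocount of 1 is an assumption, used to keep log2 defined at zero.
    return {gene: math.log2(sum(vals) / len(vals) + 1.0)
            for gene, vals in grouped.items()}

# Hypothetical probe IDs and intensities, for illustration only.
probes = {"p1": 7.0, "p2": 23.0, "p3": 3.0, "p4": 100.0}
annotation = {"p1": "CTLA4", "p2": "CTLA4", "p3": "MAP4K1"}
expr = collapse_probes(probes, annotation)
print(expr)
```

Here the two probes mapped to the same symbol are averaged before the transform, matching the order of operations given in the text; the subsequent limma normalization is not reproduced.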

Another dataset, GSE15932, based on the Affymetrix platform (Affymetrix Inc., Santa Clara, California, USA), was also downloaded from the GEO database. It contained 16 PaC patients: 8 were diabetic and 8 were non-diabetic.² After the raw data in CEL format were downloaded, they underwent background correction and normalization using the Microarray Suite (MAS) and quantile methods implemented in the oligo (http://www.bioconductor.org/packages/release/bioc/html/oligo.html) package.

RNA-seq data and differential analysis

Based on the HUGO Gene Nomenclature Committee (HGNC, http://www.genenames.org/) database, which includes annotation information for 2,775 lncRNAs and 19,004 protein-coding genes,¹⁹ the relevant lncRNA and mRNA sequencing data in the above TCGA database were identified. Differential analyses of the mRNAs and miRNAs in these RNA-seq data were performed using the edgeR package (version 3.0.1).²⁰ Two criteria, false discovery rate (FDR) < 0.05 and |fold change (FC)| > 1.5, were set as thresholds for the selection of differentially expressed genes (DEGs) or miRNAs (DEMs) between diabetic PaC patients and non-diabetic PaC patients.
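To make the two thresholds concrete, the sketch below applies them to a handful of hypothetical genes. It is not the edgeR procedure: the Benjamini-Hochberg adjustment is written out by hand as a stand-in for the FDR reported by the package, the fold-change cut-off is read as |log2 FC| > log2(1.5) (an assumption about how |FC| > 1.5 was applied), and all gene names and values are invented.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_de(names, log2_fc, pvalues, fdr_cut=0.05, fc_cut=1.5):
    """Keep features passing both the FDR and the fold-change threshold."""
    fdr = benjamini_hochberg(pvalues)
    min_abs_log2 = math.log2(fc_cut)  # assumed reading of |FC| > 1.5
    return [n for n, lfc, q in zip(names, log2_fc, fdr)
            if q < fdr_cut and abs(lfc) > min_abs_log2]

# Hypothetical inputs, for illustration only.
names = ["geneA", "geneB", "geneC", "geneD"]
log2_fc = [1.4, -0.3, -1.2, 0.9]
pvalues = [0.001, 0.002, 0.010, 0.500]
selected = select_de(names, log2_fc, pvalues)
print(selected)
```

In this toy input, geneB fails the fold-change cut-off and geneD fails the FDR cut-off, so only geneA and geneC are retained.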

Relationship between DEGs, DEMs or DELs and clinical features

Clinical information for all samples was extracted from the above databases or datasets, and the samples were classified into groups based on the following dichotomous variables: age (≥ 60 vs < 60), gender (female vs male), alcohol history (Yes vs No), clinical M (M0 vs MX), N (N3+N2 vs N0+N1), T (T3+T4 vs T1+T2), clinical stage (III+IV vs I+II), neoplasm histologic grade (G1+G2 vs G3+G4) and new tumor (Yes vs No). RNAs (mRNAs, miRNAs and lncRNAs) whose expression was associated with the above clinical information were selected using the edgeR package. Likewise, the cut-off values were FDR < 0.05 and |FC| > 1.5.

Selection of prognostic mRNAs, miRNAs and lncRNAs

Following the selection of DEGs, DEMs and differentially expressed lncRNAs (DELs) between diabetic PaC patients and non-diabetic PaC patients, the survival information of each sample was extracted to perform Cox regression analysis with the survival package.²¹ Kaplan-Meier (KM) curves were used to display the survival results visually.
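For readers unfamiliar with the KM curve, a minimal sketch of the product-limit estimator is given below. It is written in plain Python rather than with the R survival package used in the study, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data (months), for illustration only.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients leave the risk set without lowering the curve, which is why the step at time 3 uses four patients at risk but only one death.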

Consistency analysis of the DEGs

Protein-protein interaction network of the DEGs

By integrating protein information in three databases, BioGRID (http://thebiogrid.org/), HPRD (http://www.hprd.org/) and DIP (http://dip.doe-mbi.ucla.edu/), potential interactions of these DEGs at protein level were selected to build a protein-protein interaction (PPI) network, which was visualized by the Cytoscape (http://cytoscape.org/) software.

Optimizing feature genes based on network betweenness centrality

After PPI network construction of the DEGs, topological structure of the network was analyzed based on the node’s degree and betweenness centrality (BC) algorithm, using the following formula:

C_{B}(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}},

where $V$ is the set of nodes in the network, $\sigma_{st}$ is the number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those paths that pass through node v. After normalization, BC values range from 0 to 1, and the closer the value is to 1, the more important the node. Based on this definition, the nodes whose BC values ranked in the top 100 were taken as candidate feature genes.
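The BC definition can be illustrated with a small, self-contained sketch. The code below uses Brandes' algorithm for an undirected, unweighted graph, which is a standard way to compute this quantity and not necessarily the implementation used in the study. The normalization by (n-1)(n-2)/2 is an assumption chosen so that scores fall between 0 and 1, and the node names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Normalized betweenness centrality via Brandes' algorithm.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    scale = (n - 1) * (n - 2) / 2 if n > 2 else 1.0
    # Each unordered pair (s, t) was counted once from each endpoint.
    return {v: bc[v] / 2.0 / scale for v in adj}

# Hypothetical star-shaped network: every shortest path crosses "hub".
graph = {"hub": {"g1", "g2", "g3"},
         "g1": {"hub"}, "g2": {"hub"}, "g3": {"hub"}}
scores = betweenness(graph)
print(scores)
```

In the star network the hub lies on the only shortest path between every pair of leaves, so it receives the maximum score while the leaves score zero; ranking nodes by such scores corresponds to the top-100 selection described above.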

Selection of the optimal feature gene sets

To identify the most representative genes for clinical prognosis and model construction, the top 100 genes were further subjected to optimal combination selection using the recursive feature elimination (RFE) algorithm.²² The classification performance on the samples was evaluated over iterative and random combinations of the feature genes to obtain the optimal feature gene set.

Construction of support vector machine classifier utilizing the feature gene sets

Once the optimal feature gene set was identified, a support vector machine (SVM) classifier model was built based on the expression of these genes in each sample.²³ The probability of each sample belonging to a given class was determined from the expression of these feature genes in that sample, in order to predict and distinguish the different kinds of PaC patients.

Independent validation and assessment of SVM classifier performance

To verify the stability and reproducibility of the SVM classifier, the two datasets GSE15932 and GSE74629 were used as validation sets. Five indicators were used to assess the classifier performance: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and area under the curve (AUC) of the receiver operating characteristic (ROC) curve.

Prediction of lncRNAs targeted by miRNAs

Relevant miRNA and lncRNA information was integrated from the miRcode (http://www.mircode.org/) and starBase databases²⁴,²⁵ and combined with the selected DEMs to predict lncRNA-miRNA interactions.

Prediction of target genes of miRNAs

The miRTarBase (http://mirtarbase.mbc.nctu.edu.tw) is a database that provides up-to-date and extensive validated miRNA-target interaction information.²⁶ This database (release 6.0) was used to search for potential target genes of the miRNAs identified in the present study. A miRNA-target gene network was then established by integrating the DEGs in the PPI network. The Cytoscape software was utilized to draw this network.

Construction of ceRNA regulatory network and function and pathway enrichment analysis

By combining the lncRNA-miRNA interactions with the miRNA-target gene interactions, a ceRNA network, namely the lncRNA-miRNA-mRNA regulatory network, was established.
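The combination step amounts to joining the two interaction lists on the shared miRNA. A minimal sketch is shown below; the function name `build_cerna_triplets` is hypothetical, and the example pairs are limited to the interactions named in this article rather than the full database output.

```python
from collections import defaultdict

def build_cerna_triplets(lnc_mir_pairs, mir_mrna_pairs):
    """Join lncRNA-miRNA and miRNA-mRNA pairs on the shared miRNA."""
    targets = defaultdict(set)
    for mirna, mrna in mir_mrna_pairs:
        targets[mirna].add(mrna)
    return sorted((lnc, mirna, mrna)
                  for lnc, mirna in lnc_mir_pairs
                  for mrna in targets.get(mirna, ()))

# Pairs taken from the interactions reported in this article.
lnc_mir = [("HOTAIR", "hsa-miR-214"), ("CECR7", "hsa-miR-429")]
mir_mrna = [("hsa-miR-214", "CCDC33"), ("hsa-miR-214", "MAP4K1"),
            ("hsa-miR-429", "CTLA4")]
triplets = build_cerna_triplets(lnc_mir, mir_mrna)
for lnc, mirna, mrna in triplets:
    print(f"{lnc}-{mirna}-{mrna}")
```

Each resulting triplet is one lncRNA-miRNA-mRNA path in the network; a lncRNA whose miRNA has no differentially expressed target contributes no triplet.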

Enrichment analysis of the candidate feature genes in this network was performed by Fisher’s exact test, based on the Gene Ontology (GO, http://www.geneontology.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/pathway.html) databases, to identify the functions and pathways of these genes related to diabetic PaC. The Fisher’s exact test was calculated with the following formula:

p = 1 - \sum_{i = 0}^{x - 1} \frac{\binom{M}{i} \binom{N - M}{K - i}}{\binom{N}{K}},

where N represents the total number of genes in the whole genome, M the number of genes in the pathway, K the number of differentially expressed genes, and p the probability that at least x of the K genes are enriched in a specific function or pathway term.
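The enrichment formula is the upper tail of a hypergeometric distribution and can be evaluated directly. The sketch below uses the same symbols N, M, K and x; the function name and the numbers in the example are hypothetical and far smaller than a real genome.

```python
from math import comb

def enrichment_p(N, M, K, x):
    """Probability of observing at least x pathway genes among K selected.

    N -- genes in the background, M -- genes in the pathway,
    K -- differentially expressed genes, x -- observed overlap.
    """
    below = sum(comb(M, i) * comb(N - M, K - i) for i in range(x))
    return 1.0 - below / comb(N, K)

# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 differentially expressed, 2 of which fall in the pathway.
p = enrichment_p(N=20, M=5, K=4, x=2)
print(round(p, 4))
```

A small p indicates that the overlap is larger than expected by chance; with x = 0 the function returns 1, since an overlap of at least zero is certain.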

Results

Selection of disease-specific lncRNAs, miRNAs and mRNAs

Based on the aforementioned identification of different RNA types, we identified a total of 841 lncRNAs, 18,713 protein-coding mRNAs and 1,046 human miRNAs in the RNA-seq data. First, low-abundance mRNAs, lncRNAs and miRNAs were filtered out. The thresholds for low abundance of the three kinds of RNAs were 1, 5 and 5, respectively. Under these selection criteria, 232 lncRNAs, 186 miRNAs and 12,633 mRNAs remained. The expression density distribution is presented in Figure 1. The figure indicates that, after filtration of the low-expressed genes, the peaks of expression density of the three kinds of RNAs increased markedly. Notably, lncRNA expression was clearly lower than that of mRNAs and miRNAs.

Figure 1.

Expression density distribution of mRNAs, miRNAs and lncRNAs.

Differential analysis identified a total of 600 significant DEGs and 33 significant DELs in the mRNA-seq data, and 28 DEMs in the miRNA-seq data. Heat maps of the clustering analysis of these RNAs are presented in Figures 2–4, which show that the three kinds of RNAs could clearly distinguish the diabetic PaC patients from the non-diabetic PaC patients.

Figure 2.

Expressions of differentially expressed lncRNAs in different samples. X-axis represents samples, blue denotes non-diabetic pancreatic cancer patients, and yellow denotes diabetic pancreatic cancer patients. Y-axis represents lncRNAs, red stands for up-regulations and green stands for down-regulations.

Figure 3.

Expressions of differentially expressed miRNAs in different samples. X-axis represents samples, blue denotes non-diabetic pancreatic cancer patients, and yellow denotes diabetic pancreatic cancer patients. Y-axis represents miRNAs, red stands for up-regulations and green stands for down-regulations.

Figure 4.

Expressions of differentially expressed mRNAs in different samples. X-axis represents samples, blue denotes non-diabetic pancreatic cancer patients, and yellow denotes diabetic pancreatic cancer patients. Y-axis represents mRNAs, red stands for up-regulations and green stands for down-regulations.

Association between DEGs and clinical features

Based on the aforementioned method, the samples were divided into groups by nine dichotomous variables that indicated clinical features. Dysregulated miRNAs, mRNAs or lncRNAs in each comparison were then identified (Supplementary Tables 1-3).

Important RNAs relevant to prognosis

By calculating the expression values of the three kinds of RNAs and performing prognosis analyses (e.g. overall survival time) based on Cox regression analysis, the mRNAs, miRNAs and lncRNAs correlated with prognosis were identified (Table 1). Notably, seven lncRNAs were found to be associated with prognosis: HOTAIR, DGCR9, FLJ33360, SCARNA2, KIAA0125, H19 and UCA1. The median expression value of each lncRNA was used as the boundary between samples with high and low expression, and KM survival analysis was performed. As shown in Figure 5, most of the lncRNAs, especially UCA1, could clearly separate the samples into groups with high or low survival rates. This suggested that the seven lncRNAs could be used as prognostic predictors in diabetic PaC patients.

Table 1.

Three kinds of RNAs related with overall survival time.

RNA	Up-regulated	Down-regulated
lncRNA	HOTAIR, DGCR9, FLJ33360	SCARNA2, KIAA0125, H19, UCA1
miRNA	hsa-mir-194, hsa-mir-215, hsa-mir-135b, hsa-mir-196b, hsa-mir-3065	hsa-let-7i, hsa-mir-150, hsa-mir-486
mRNA	NDC80, ZNF536, STMN4, MANEAL, SLC22A14, CENPK, PRB1, ASB4, KIF14, ANLN	TCL1A, FCRL1, CD19, GZMM, NCR3, TRAT1, MATK, KLRK1, KCNA3, ZNF727, TSPAN32, MAP4K1, TNFRSF13B, FAM49A, CHRNE, MYO16, ZC3H12D, PGM5, CD226, PDE1C, AOAH, SLC25A21, FUT7, ITM2A

Figure 5.

Kaplan-Meier survival curve of differentially expressed lncRNAs in diabetic pancreatic cancer patients.

Consistency analysis of the DEGs

PPI network of the DEGs

According to the aforementioned selection criteria and the information in the three protein databases, a PPI network was constructed. In addition to the DEGs, this network also contained expanded genes, which were defined as genes that interacted with more than ten DEGs. In summary, the PPI network consisted of 215 nodes (including 201 DEGs and 14 expanded genes) and 306 edges, namely the interactions among these genes (Figure 6). The five predominant nodes in the PPI network were APP (degree = 37), ELAVL1 (degree = 23), HSP90AA1 (degree = 22), FYN (degree = 21) and EGFR (degree = 17).

Figure 6.

Protein-protein interaction network of the differentially expressed genes. Red stands for up-regulated genes, green represents down-regulated genes and no color represents expanded genes.

SVM classifiers and the verification

As mentioned in the method section, we selected the top 100 gene nodes ranked by their BC values, including 86 DEGs and 14 expanded genes. Then, combining with the RFE algorithm, 32 feature genes that could classify the samples were identified, such as CCDC33, CTLA4, GFAP, MAP4K1 and P2RY8 (Table 2); and this gene set had the most accurate prediction, 94.1%, compared with others (Figure 7).

Table 2.

Gene list of 32 genes in the optimal gene set that was identified by support vector machine classifier.

Gene	BC	Degree	logFC	P value	FDR	Gene	BC	Degree	logFC	P value	FDR
AICDA	0.47403	2	−1.12307	1.16E-03	0.023212	CTLA4	0.308448	3	−1.01247	1.35E-03	0.027063
BLK	0.634417	8	−1.66082	2.92E-04	0.005832	GFAP	0.572317	4	1.44425	7.30E-04	0.014596
BTK	0.618975	8	−0.9845	1.40E-03	0.027987	GP6	0.205128	2	0.913638	1.73E-03	0.034623
CCDC33	0.642254	5	1.446459	6.53E-04	0.013058	IKZF1	0.047175	3	−1.21823	9.99E-04	0.019984
CCL5	0.409199	3	−1.16511	1.08E-03	0.02169	IKZF3	0.403687	4	−1.19519	1.05E-03	0.021031
CCR4	0.810997	3	−1.17537	1.01E-03	0.020114	IQGAP3	0.13408	2	1.308788	9.57E-04	0.01913
CD19	0.625091	4	−1.78311	2.16E-04	0.004317	MAP4K1	0.227287	4	−1.10036	1.40E-03	0.027994
CD2	0.898448	7	−0.99376	1.32E-03	0.026381	NEK2	0.026022	5	1.434341	9.17E-04	0.018335
CD22	0.303768	3	−1.47368	5.80E-04	0.011607	P2RY8	0.023534	2	−0.84492	1.90E-03	0.03807
CD247	0.344443	7	−1.15124	1.06E-03	0.021257	PLCG2	0.282431	5	−0.8706	1.81E-03	0.036199
CD36	0.617583	3	−1.53386	4.62E-04	0.00923	PTGDS	0.02181	2	−1.15813	8.39E-04	0.016784
CD5	0.498179	6	−1.48388	4.92E-04	0.009845	SERTAD4	0.205128	2	1.058828	1.59E-03	0.031834
CD79B	0.946875	5	−1.47985	5.03E-04	0.010066	SLAMF1	0.293765	4	−1.23396	8.91E-04	0.017819
CD8A	0.424203	3	−1.08692	1.26E-03	0.025163	UBASH3A	0.00634	2	−1.27884	7.57E-04	0.015145
CDC20	0.657925	9	0.869362	2.13E-03	0.042634	VSIG8	0.205128	2	0.971431	1.72E-03	0.034479
CR2	0.409199	3	−1.62748	2.45E-04	0.0049	ZAP70	0.721343	12	−1.39105	6.58E-04	0.013168

BC: betweenness centrality; FC: fold change; FDR: false discovery rate.

Figure 7.

Identification of the optimal gene set using recursive feature elimination algorithm. a: Node degree distribution in the protein-protein interaction network; b: Accuracy for sample classification by different feature gene combinations.

GSE15932 and GSE74629 were used as two validation datasets, and their normalized data were used to test the accuracy of the SVM classifier with the 32 feature genes. The classifier correctly classified 14 of the 16 samples in GSE15932 (7 diabetic and 7 non-diabetic PaC samples), an accuracy of 87.5%; for the GSE74629 dataset, it distinguished the 36 samples (14 diabetic and 22 non-diabetic PaC samples) with an accuracy of 86.1%. Scatter plots of the samples in the two datasets based on this SVM classifier are presented in Figure 8(b) and (c). Performance evaluation indicated that the classifier had a high accuracy with high sensitivity and specificity (0.75-0.9, Table 3). Notably, the classifier had a high AUC value in both the training dataset and the validation datasets (> 0.9, Table 3, Figure 9).

Figure 8.

Scatter plot of samples in validation datasets using the identified support vector machine classifier. a: TCGA training dataset; b: GSE15932 validation dataset; c: GSE74629 validation dataset.

Table 3.

Performance evaluation of the SVM classifier in training dataset and validation dataset.

Datasets	Num.Samples	Correct Rate	Sensitivity	Specificity	PPV	NPV	AUC
TCGA	135	0.941	0.925	1	1	0.829	0.993
GSE15932	16	0.875	0.875	0.875	0.875	0.875	0.984
GSE74629	36	0.861	0.785	0.909	0.846	0.87	0.961

TCGA: The Cancer Genome Atlas; PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve of the receiver operating characteristic curve.
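The indicators in Table 3 all follow from a 2 × 2 confusion matrix. The sketch below recomputes them for the two validation datasets. The counts are not reported in the text: they are inferred here from the sample sizes and the rates in Table 3, taking diabetic PaC as the positive class, and should be read as an assumption.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, PPV and NPV from raw counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts inferred from Table 3 (assumption), diabetic PaC = positive class.
gse15932 = classifier_metrics(tp=7, fn=1, tn=7, fp=1)    # 8 vs 8 samples
gse74629 = classifier_metrics(tp=11, fn=3, tn=20, fp=2)  # 14 vs 22 samples
for name, metrics in (("GSE15932", gse15932), ("GSE74629", gse74629)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With these counts, the recomputed values agree with the GSE15932 and GSE74629 rows of Table 3 to the reported precision; the AUC is not covered because it requires the per-sample classifier scores.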

Figure 9.

AUC of the receiver operating characteristic curve in different datasets. a: TCGA training dataset; b: GSE15932 validation dataset; c: GSE74629 validation dataset. AUC: area under the curve; TCGA: The Cancer Genome Atlas.

Prediction of interplayed lncRNA-miRNA regulations

By integrating the information in the miRcode and starBase databases, a total of 492 lncRNA-miRNA interactions were selected. After combination with the DEMs and DELs, 53 lncRNA-miRNA interactions involving 17 DELs and 13 DEMs were further extracted, such as those of HOTAIR (interacting miRNAs: miR-214, miR-150, miR-146a and miR-194, Figure 10) and UCA1 (interacting miRNAs: miR-135b and miR-214, Figure 10).

Figure 10.

DEL-DEM regulatory network. Diamond denotes miRNA and square stands for lncRNA. Red represents up-regulation and green represents down-regulation. DEL: Differentially expressed lncRNA; DEM: Differentially expressed miRNA.

Prediction of miRNA-regulated targets

The 13 DEMs identified in the above step were mapped to the miRTarBase database to explore their targets. These potential target genes were then compared with the DEGs. Finally, targets of 10 DEMs were identified: hsa-miR-135b, hsa-miR-146a, hsa-miR-150, hsa-miR-155, hsa-miR-15b, hsa-miR-194, hsa-miR-196b, hsa-miR-210, hsa-miR-214 (targets: MAP4K1, P2RY8 and CCDC33) and hsa-miR-429 (predicted targets: GFAP and CTLA4). The DEM-DEG regulatory network is shown in Figure 11.

Figure 11.

DEM-DEG regulatory network. Diamonds denote miRNAs, circles represent target genes, and larger circles indicate genes that belong to the optimal gene set. Red stands for up-regulation and green stands for down-regulation.

Construction of ceRNA network

Integrated ceRNA network and enrichment analysis of the target genes

By combining the lncRNA-miRNA interactions with the miRNA-mRNA interactions, an integrated lncRNA-miRNA-mRNA network was established, consisting of 394 molecules and 824 interactions. Among these, interactions such as ‘HOTAIR-hsa-miR-214-CCDC33’ and ‘CECR7-hsa-miR-429-CTLA4’ were highlighted (Figure 12).

Figure 12.

CeRNA regulatory network. Diamonds denote miRNAs, squares represent lncRNAs, circles represent target genes, and larger circles indicate genes that belong to the optimal gene set. Shapes in red stand for up-regulation and shapes in green stand for down-regulation. Red lines denote lncRNA-miRNA interactions, and blue lines denote miRNA-target gene interactions.

Based on the enrichment analysis for the network, we found that the target genes were highly associated with 11 KEGG pathways, such as ‘cell adhesion molecules (CAMs)’, ‘cytokine-cytokine receptor interaction’ and ‘hematopoietic cell lineage’, and with 26 function categories, such as ‘positive regulation of immune system process’, ‘cellular homeostasis’, ‘leukocyte activation’, ‘defense response’ and ‘immune response’ (Figure 13).

Figure 13.

Enriched pathway and functions of target genes in the ceRNA network. a: Enriched functions of the target genes; b: Enriched pathways of the target genes.

ceRNA network of transcription factors

Relevant human transcription factors (TFs) were extracted from two databases: the Transcription Regulatory Regions Database (TRRD, http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/) and JASPAR (http://jaspar.genereg.net/). By combining them with the established ceRNA network, a TF-specific ceRNA sub-network was extracted, in which two TFs, RUNX3 and NFE2, were highlighted. Notably, they also belonged to the optimal feature gene set that could classify the different samples. In addition, RUNX3 was regulated by hsa-miR-214 and hsa-miR-194, while NFE2 was targeted by hsa-miR-146a (Figure 14).

Figure 14.

CeRNA network of transcription factors (TFs). Diamonds denote miRNAs, squares represent lncRNAs and triangles represent TFs. Shapes in red stand for up-regulation and shapes in green stand for down-regulation. Red lines denote lncRNA-miRNA interactions, and blue lines denote miRNA-TF interactions.

Discussion

In the present study, based on RNA-seq data in two datasets and the TCGA database, we identified 32 feature genes that could distinguish diabetic PaC patients from non-diabetic PaC patients, such as CCDC33, CTLA4 and MAP4K1. Importantly, the SVM classifier built on these genes had a high prediction accuracy. Seven lncRNAs related to the prognosis of diabetic PaC were also identified, especially UCA1. In addition, crucial DEMs were selected, such as hsa-miR-214 (predicted targets: MAP4K1 and CCDC33) and hsa-miR-429 (predicted target: CTLA4). Notably, the interactions ‘HOTAIR-hsa-miR-214-CCDC33’ and ‘CECR7-hsa-miR-429-CTLA4’ were highlighted in the ceRNA network.

HOX Transcript Antisense RNA (HOTAIR) is a 2158 bp lncRNA co-expressed with HOXC genes. In PaC, HOTAIR is reported to uniquely repress interferon- or tumor cell cycle-related genes; thus, overexpression of this lncRNA promotes PaC progression and is considered a negative prognostic factor.²⁷ Aberrant miR-214 expression has been found in PaC, and up-regulation of miR-214 reduces the sensitivity of BxPC-3 cells to gemcitabine, a drug with a pro-apoptotic effect,²⁸ suggesting that overexpressed miR-214 might protect against apoptosis. In addition, miR-214 inhibits cell growth and migration in other cancers by targeting oncogenes such as CD44 and CDK1.²⁹ Unfortunately, little is known about the relationship between miR-214 and diabetes or diabetic PaC. In the present study, however, miR-214 was down-regulated and targeted by the lncRNA HOTAIR. On this basis, it could be speculated that HOTAIR might have a pro-oncogenic effect by suppressing miR-214 expression in diabetic PaC patients. The protein encoded by coiled-coil domain containing 33 (CCDC33) is a cancer/testis antigen.³⁰ Dysregulation of this gene is detected in prostate cancer compared with benign prostatic hyperplasia.³¹ However, no correlation between this gene and PaC or diabetes has been reported yet, nor have its regulatory interactions with miRNAs. In our study, CCDC33 was predicted as a target of miR-214 and was up-regulated in diabetic PaC patients, suggesting that this gene might serve as an oncogene during disease progression. Combining this with the lncRNA targeting information, we suppose that overexpressed HOTAIR represses miR-214 expression, which results in up-regulation of CCDC33. This might be one of the mechanisms accounting for diabetic PaC progression.

Mitogen-Activated Protein Kinase Kinase Kinase Kinase 1 (MAP4K1), also known as HPK1, regulates proliferation in hematopoietic cells. In pancreatic ductal adenocarcinoma (PDA), deficiency of the HPK1 protein is significantly associated with PDA invasion.³² In our study, MAP4K1 was predicted as another target of miR-214, implying that miR-214-targeted MAP4K1 expression might have an important role in diabetic PaC development.

Cat Eye Syndrome Chromosome Region, Candidate 7 (CECR7) is a potential lncRNA associated with Cat Eye Syndrome. According to the limited literature, increased CECR7 is significantly associated with low overall survival in hepatocellular carcinoma.³³ In our study, however, down-regulation of CECR7 was detected in diabetic PaC, suggesting that this lncRNA might have different expression patterns in different cancer types, and its expression should be validated experimentally. miR-429 is differentially expressed in PaC compared with controls.³⁴ Cytotoxic T-Lymphocyte Associated Protein 4 (CTLA4) is a gene belonging to the immunoglobulin superfamily. Its encoded protein is responsible for transmitting an inhibitory signal to T cells. CTLA4 is thought to participate in PaC-induced immune suppression,³⁵,³⁶ implying that this gene is linked to the immune response stimulated by PaC. Nonetheless, none of the above molecules has been reported in relation to diabetic PaC. In our study, miR-429 was targeted by the lncRNA CECR7, while CTLA4 was targeted by miR-429 in the ceRNA network. These predictions suggest that the integrated ‘CECR7-miR-429-CTLA4’ regulation might be an important mechanism for the development of diabetic PaC. In addition, a previous bioinformatics analysis predicted that miR-429 was a central molecule in both the TF-miRNA-mRNA and lncRNA-miRNA-mRNA ceRNA networks in PaC.³⁷ Our findings indicated that runt related transcription factor 3 (RUNX3) was regulated by hsa-miR-214 in the TF-miRNA-mRNA network. RUNX3 is a TF containing a runt domain. Abnormal expression of RUNX3 is detected in metastatic PDA, and this TF has been linked to the pathogenesis of metastatic PDA.³⁸ Based on our findings, targeting of RUNX3 by hsa-miR-214 might be another crucial mechanism in diabetic PaC etiology.

UCA1 is a lncRNA overexpressed in many cancer types, such as colorectal cancer, gastric cancer and bladder cancer, where it influences cell proliferation, apoptosis and cell cycle distribution and regulates the Wnt signaling pathway.³⁹,⁴⁰ In this study, UCA1 was identified as the most prominent lncRNA correlated with the prognosis of diabetic PaC, suggesting that, as in other cancers, UCA1 might also be an important predictor for diabetic PaC.

The above molecules and their predicted interactions might help unveil the mechanisms of diabetic PaC and provide novel biomarkers for its identification and prognosis. More importantly, the SVM classifier using these feature genes was highly accurate. Despite these advantages, a limitation is that the expression of all these genes and RNAs, as well as their predicted interactions, needs to be validated experimentally. In follow-up studies, we will conduct in vitro and in vivo experiments to confirm the above predictions.

In conclusion, ‘HOTAIR-hsa-miR-214-CCDC33’ and ‘CECR7-hsa-miR-429-CTLA4’ regulations might be two critical mechanisms for the development of diabetic PaC. In addition to these molecules, the lncRNA UCA1 might be a novel biomarker for the prognosis of diabetic PaC.

Footnotes

Acknowledgements

KYY and QW participated in the design of this study, and they both performed the statistical analysis. JHJ and HPZ carried out the study and collected important background information. HPZ drafted the manuscript. All authors read and approved the final manuscript. K.Y. and Q.W. contributed equally to this work.

Availability of data and material

The raw data were collected and analyzed by the authors, who are not ready to share them because the data have not been published.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Yadav, Lowenfels AB. The epidemiology of pancreatitis and pancreatic cancer. Gastroenterology 2013; 144: 1252–1261.

2. Huang, Dong, Kang, et al. Novel blood biomarkers of pancreatic cancer-associated diabetes mellitus identified by peripheral blood-based gene expression profiles. Am J Gastroenterol 2010; 105: 1661–1669.

3. Zimmet, Alberti, Magliano, et al. Diabetes mellitus statistics on prevalence and mortality: facts and fallacies. Nat Rev Endocrinol 2016; 12: 616–622.

4. Aggarwal, Kamada, Chari ST. Prevalence of diabetes mellitus in pancreatic cancer compared to common cancers. Pancreas 2013; 42: 198–201.

5. Chari, Leibson, Rabe, et al. Pancreatic cancer-associated diabetes mellitus: prevalence and temporal association with diagnosis of cancer. Gastroenterology 2008; 134: 95–101.

6. Ben, Ning, et al. Diabetes mellitus and risk of pancreatic cancer: a meta-analysis of cohort studies. Euro J Cancer 2011; 47: 1928–1937.

7. Kolb, Rieder, Born, et al. Glucagon/insulin ratio as a potential biomarker for pancreatic cancer in patients with new-onset diabetes mellitus. Cancer Biol Ther 2009; 8: 1527–1533.

8. Jenkinson, Elliott, Evans, et al. Decreased serum thrombospondin-1 levels in pancreatic cancer patients up to 24 months prior to clinical diagnosis: association with diabetes mellitus. Clin Cancer Res 2015; 22: 1734–1743.

9. Kang, Qin, Buya, et al. VNN1, a potential biomarker for pancreatic cancer-associated new-onset diabetes, aggravates paraneoplastic islet dysfunction by increasing oxidative stress. Cancer Lett 2016; 373: 241–250.

10. Deng, Yuan, Zhang, et al. Identification of circulating miR-25 as a potential biomarker for pancreatic cancer diagnosis. Cell Physiol Biochem 2016; 39: 1716–1722.

11. Škrha, Hořínek, Pazourková, et al. Serum microRNA-196 and microRNA-200 in pancreatic ductal adenocarcinoma of patients with diabetes mellitus. Pancreatology 2016; 16: 839–843.

12. Ulitsky. Evolution to the rescue: using comparative genomics to understand long non-coding RNAs. Nat Rev Genet 2016; 17: 601–614.

13. Gezer, Özgür, Cetinkaya, et al. Long non-coding RNAs with low expression levels in cells are enriched in secreted exosomes. Cell Biol Int 2014; 38: 1076–1079.

14. Wang, Liu, Yao, et al. Long non-coding RNA CASC2 suppresses malignancy in human gliomas by miR-21. Cell Signal 2015; 27: 275–282.

15. Pang, Yang, et al. Overexpression of long non-coding RNA MALAT1 is correlated with clinical progression and unfavorable prognosis in pancreatic cancer. Tumor Biol 2015; 36: 2403–2407.

16. Cheng, Jutooru, Chadalapaka, et al. The long non-coding RNA HOTTIP enhances pancreatic cancer cell proliferation, survival and migration. Oncotarget 2015; 6: 10840–10852.

17. Caba, Irigoyen, Jimenez-Luna, et al. Identification of gene expression profiling associated with erlotinib-related skin toxicity in pancreatic adenocarcinoma patients. Toxicol Appl Pharmacol 2016; 311: 113–116.

18. Wettenhall, Smyth GK. LimmaGUI: a graphical user interface for linear modeling of microarray data. Bioinformatics 2004; 20: 3705–3706.

19. Eyre, Ducluzeau, Sneddon, et al. The HUGO gene nomenclature database, 2006 updates. Nucleic Acids Res 2006; 34: D319–D321.

20. Robinson, McCarthy, Smyth GK. EdgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010; 26: 139–140.

21. Chen, Peace KE. Clinical trial data analysis using R. Boca Raton, FL: CRC Press, 2011.

22. Guyon, Weston, Barnhill, et al. Gene selection for cancer classification using support vector machines. Mach Learn 2002; 46: 389–422.

23. Mavroforakis, Theodoridis. A geometric approach to support vector machine (SVM) classification. IEEE Trans Neural Netw 2006; 17: 671–682.

24. Das, Ghosal, Sen, et al. LnCeDB: database of human long noncoding RNA acting as competing endogenous RNA. PLoS ONE 2014; 9: e98965.

25. Jeggari, Marks, Larsson. MiRcode: a map of putative microRNA target sites in the long non-coding transcriptome. Bioinformatics 2012; 28: 2062–2063.

26. Hsu, Lin, et al. MiRTarBase: a database curates experimentally validated microRNA–target interactions. Nucleic Acids Res 2011; 39: 163–169.

27. Kim, Jutooru, Chadalapaka, et al. HOTAIR is a negative prognostic factor and exhibits pro-oncogenic activity in pancreatic cancer. Oncogene 2013; 32: 1616–1625.

28. Zhang, Zeng, et al. Dysregulation of miR-15a and miR-214 in human pancreatic cancer. J Hematol Oncol 2010; 3: 46.

29. Sharma, Hamilton, Mandal CC. MiR-214: a potential biomarker and therapeutic for different cancers. Future Oncol 2015; 11: 349–363.

30. Kaczmarek, Niedzialkowska, Studencka, et al. Ccdc33, a predominantly testis-expressed gene, encodes a putative peroxisomal protein. Cytogenet Genome Res 2009; 126: 243–252.

31. Adeola, Smith, Kaestner, et al. Novel potential serological prostate cancer biomarkers using CT100+ cancer antigen microarray platform in a multi-cultural South African cohort. Oncotarget 2016; 7: 13945–13964.

32. Wang, Song, Logsdon, et al. Proteasome-mediated degradation and functions of hematopoietic progenitor kinase 1 in pancreatic cancer. Cancer Res 2009; 69: 1063–1070.

33. Zhang, Fan, Jian, et al. Cancer specific long noncoding RNAs show differential expression patterns and competing endogenous RNA potential in hepatocellular carcinoma. PLoS ONE 2015; 10: e0141042.

34. Zhu, Wang, et al. MicroRNA and gene networks in human pancreatic cancer. Oncology Lett 2013; 6: 1133–1139.

35. Gnatta, Fogar, Aita, et al. Pancreatic cancer (PC)-derived soluble mediators induce dendritic cells (DCs) to acquire an immunesuppressive phenotype by downregulating CTLA4. Pancreatology 2013; 13: S29.

36. Basso, Fogar, Plebani. The S100A8/A9 complex reduces CTLA4 expression by immature myeloid cells: implications for pancreatic cancer-driven immunosuppression. Oncoimmunology 2013; 2: e24441.

37. Yang, Zhao, et al. Bioinformatics method to predict two regulation mechanism: TF-miRNA-mRNA and lncRNA-miRNA-mRNA in pancreatic cancer. Cell Biochem Biophys 2014; 70: 1849–1858.

38. Kleeff, Guweidhi, et al. RUNX3 expression in primary and metastatic pancreatic cancer. J Clin Pathol 2004; 57: 294–299.

39. Fan, Shen, Tan, et al. Long non-coding RNA UCA1 increases chemoresistance of bladder cancer cells by regulating Wnt signaling. Febs J 2014; 281: 1750–1758.

40. Zheng, Dai, et al. Aberrant expression of UCA1 in gastric cancer and its clinical significance. Clin Transl Oncol 2015; 17: 640–646.

