Abstract
OBJECTIVE:
To identify the mRNAs associated with bladder cancer (BC) recurrence.
METHODS:
The transcription profile of GSE31684 including 39 recurrent BC tumor samples and 54 non-recurrent BC tumor samples as well as transcription profile of GSE13507 including 36 recurrent BC tumor samples and 67 non-recurrent BC tumor samples were downlaoded from the Gene Expression Omnibus. Then, the differentially expressed genes (DEGs) were identified using linear models for microarray data (limma) and the intersections of DEGs from the two datasets were further screened. The weighed gene co-expression network analysis (WGCNA) was used to screen the modules related to BC recurrence. Protein-protein interaction (PPI) network analysis was used to analyze the genes interaction. Their functions were predicted by Gene Ontology and KEGG pathway enrichment. Moreover, The Comparative Toxicogenomics Database 2017 update (CTD) was used to search the BC related pathway. The univariate cox regression analysis was used to identify DEGs associated to the recurrence. Kaplan-Meier plots were used to illustrate recurrence free survival time (RFS).
RESULTS:
A total of 692 intersections DEGs were screened. WGCNA showed that 7 modules (2279 genes) were stable in both the datasets. A total of 169 intersection DEGs were mapped to the 7 modules. There existed 149 interaction relationships among 81 proteins (18 down-regulated and 63 up-regulated DEGs) in the PPI network. Two KEGG pathways including Focal adhesion and ECM-receptor interaction were enriched which were also found in the CTD. The univariate cox regression analysis showed that 3 DEGs (COL4A1, COL1A2 and COL5A1) were significant related to the prognosis. Multivariate cox regression analysis revealed that pathologic_N (N0-N1 vs N2-N3,
CONCLUSION:
COL4A1, COL1A2 and COL5A1 could be associated with BC recurrence.
Introduction
Bladder cancer (BC) is the most common malignancy of the urinary system in China and the ninth most freguent cancer worldwide [1]. Transurethral resection (TUR) is generally used to manage the BC. However, a large number of patients have a 50–70% risk of recurrence after first TUR [2, 3]. Moreover, a total of 15–25% of the patients are also at risk of progression to invasive cancer [4]. No good biomarkers could be used for the prediction of BC recurrence.
With the development of molecular biology, molecular biomarkers have been proven to be a promising clinical tool for the BC prognosis. For example, by using the bioinformatics analysis of weighted gene coexpression network analysis (WGCNA), Yan et al. found four genes could be used to predict the prognosis of BC [5]; The increased expression of matriptase, which is a member of the Type II transmembrane serine protease family, could present poor prognosis of BC [6]; However, the mRNA biomarker identified for the recurrence of BC is little. Nakahara et al. found that the expression of protease activating receptor-2 (PAR-2) is positively correlated with the recurrence of non-muscle invasive bladder cancer. Therefore, it is of great need to identify the biomarkers of BC recurrence.
In this study, using the same microarray data by Riester et al. [7] and Kim et al. [8], we aimed to screen the DEGs between recurrence and non-recurrence using the linear models for microarray data (limma) [9] with the threshold of false discovery rate (FDR)-value
Materials and methods
Microarray data
The transcription profile of GSE31684 [7] (GP570, Affymetrix Human Genome U133 Plus 2.0 Array) including 39 recurrent BC tumor samples and 54 non-recurrent BC tumor samples from the Gene Expression Omnibus (GEO;
The dataset preprocessing and DEGs analysis
The R3.4.1 oligo (version: 3.6,
WGCNA
WGCNA (version 1.61,
The top ten up-regulated and the down-regulated DEGs between recurrence and non-recurrence bladder cancer in the two datasets
The top ten up-regulated and the down-regulated DEGs between recurrence and non-recurrence bladder cancer in the two datasets
DEGs, differentially expressed genes; FC, fold change.
The STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, version 10.0,
BC related DEGs screening
The CTD (
Recurrence related DEGs screening
The univariate cox regression analysis of survival package (Version 2.41-1,
We also did univariate and multivariate cox regression analysis using backward selection to test the independent signifcance of different clinical factors; the
The genes expression profile (Illumina HiSeq 2000 RNA Sequencing) of 336 BC tumor samples with the complete clinical information (recurrence or not) were downloaded from the the cancer genome atlas (TCGA,
A, the volcano plot of DEGs between recurrence and non-recurrence; left for GSE31684; right for GSE13507; The two red vertical dotted lines represent 
A, the venn showed the intersections of DEGs between the two datasets; B, the two-way clustering of differentially expressed genes (DEGs); horizontal axis, the samples; vertical axis, DEGs between recurrence and non-recurrence; left for GSE31684; right for GSE13507; color key, the log2FC of DEGs.
Protein-protein interaction network analysis; the colors of the node indicated the log2FC; the size of the nodes represent the degree.
DEGs analysis
Based on the microarray data analysis between recurrence and non-recurrence by Limma, we obtained 1496 DEGs (362 down-regulated and 1134 up-regulated DEGs) in GSE31684 and 1570 DEGs (850 down-regulated and 720 up-regulated DEGs) in GSE13507. Table 1 lists the top 10 markedly up-regulated and downregulated DEGs in the two datasets. We used volcano plots for the visualization and assessment of the variation (or reproducibility) of genes expression between recurrence and non-recurrence for GSE31684 (Fig. 1A) and GSE13507 (Fig. 1B). A total of 692 intersections DEGs (221 up-regulated and 420 down-regulated DEGs) were identified in these two datasets (Fig. 2A). Furthermore, the bidirectional hierarchical clustering revealed that DEGs could distinguish these two groups (Fig. 2B).
Modules and genes screened by WGCNA
According to the method, the power value when the correlation coefficient squared value of connection degree k and p(k) for the first time to reach 0.9 were selected, that was, power
PPI network analysis
In the PPI network (Fig. 3), there existed 149 interaction relationships among 81 proteins (18 down-regulated and 63 up-regulated DEGs) from 169 proteins. The 2 KEGG pathways including Focal adhesion and ECM-receptor interaction were enriched with the DEGs in the PPI network (Table 2).
The GO and KEGG pathway enrichment for the DEGs in the PPI network
The GO and KEGG pathway enrichment for the DEGs in the PPI network
In the CTD database, 210 KEGG pathways were searched, among which vascular smooth muscle contraction and ECM-receptor interaction were intersected with the enriched KEGG pathway for the DEGs (including 7 genes: COL4A1, VEGFA, COL6A3, COL1A2, COL6A1, COL5A1 and MYL9) in the PPI network.
Clinical factor statistics and cox regression analysis
Clinical factor statistics and cox regression analysis
Note: TCGA, The Cancer Genome Atlas; HR, hazard ratio;
Kaplan-Meier curves of recurrence free survival according to low-expression or high-expression categories; (A) For the dataset of GSE31684; (B) For the dataset of TCGA.
The univariate cox regression analysis obtained 3 DEGs (including COL4A1, COL1A2 and COL5A1) which were significant related to the prognosis.
Using the median of the expression of the 3 DEGs (COL5A1, COL1A2 and COL4A1) as the cut-off point, each individual was then classified into low expression (
Histogram of three important genes expression in GSE31684 (A) and TCGA (B) between recurrence and non-recurrence of bladder cancer.
Additionally, the expression of the 3 DEGs (COL4A1, COL1A2 and COL5A1) was analyzed in the datasets of TCGA and GSE31684 and all the expressions of the three DEGs were significantly higher in recurrence than non- recurrence (Fig. 5).
The univariate and multivariate analysis of overall survival by clinical factors in the training dataset was shown in Table 3. Using univariate analysis, overall survival times were revealed to be significantly associated with pathologic_M (M0 vs M1,
The multivariate analysis on the three DEGs showed that in the dataset of GSE31684, COL4A1 was the independent prognostic factor for overall survival in BC (HR
Multivariate analysis for COL5A1, COL4A1, COL1A2
Note: TCGA, The Cancer Genome Atlas; coef, the coefficients, HR, hazard ratio;
In this study, by using bioinformatics methods, we used genes microarray data from GEO and RNA sequencing data TCGA to develop and validate a novel prognostic tool based on 3 genes (COL4A1, COL1A2 and COL5A1). Our results indicated that COL4A1, COL1A2 and COL5A1 identified in this study could be used to categorize patients into low expression and high expression who had significantly different RFS.
COL4A1, COL1A2 and COL5A1 belong to the collagen family which are the most abundant components of extracellular matrix (ECM). They have been discovered to play an important role in the structural integrity and tensile strength of human tissues and organs [19]. As for cancer, collagens could regulate the physical and the biochemical properties of the tumor microenvironment and modulate cancer cell polarity, migration and signaling [20].
COL4A1 was significantly up-regulated in the recurrence than non-recurrence (log
Additionally, COL1A2 was significantly up-regulated in recurrence than non-recurrence (log
At last, the COL5A1 was also significantly up-regulated in the recurrence than non-recurrence (log
The KEGG pathway enrichment showed that COL4A1, COL1A2 and COL5A1 were significantly enriched in Focal adhesion and ECM-receptor interaction. The ECM-receptor interaction pathway and focal adhesion pathway are associated with the regulation of heterogeneity for muscle-invasive BC [28]. And ECM-receptor interaction pathway and focal adhesion pathway are found to be related to the progression of muscle-invasive BC [29].
Our study had several limitations. First, the samples from the TCGA were muscle-invasive BC while the datasets from the GEO were non muscle-invasive BC. It is not ideal to use the TCGA dataset as the validation dataset and the further study was still needed. Secondly, since the study was to screen the biomarkers for BC, so the blood samples which is the noninvasive were needed to validate the results.
In conclusion, COL4A1, COL1A2 and COL5A1 involved in the pathway of focal adhesion and ECM-receptor interaction could be associated with BC recurrence.
Footnotes
Conflict of interest
None to report.
Supplementary data
Module validation statistics Note: Z-score represented the stability of the modules in the two datasets, Z-score
ID
Color
Z-score
Module 1
Black
35.070
Module 2
Blue
10.571
Module 3
Brown
20.446
Module 4
Green
7.855
Module 5
Grey
2.599
Module 6
Magenta
0.415
Module 7
Pink
5.159
Module 8
Red
0.409
Module 9
Turquoise
21.160
Module 10
Yellow
46.270
The schematic representation of the workflow. WGCNA of genes from the two datasets; (A) Hierarchical cluster tree showing co-expression modules identified by WGCNA (GSE31684 and GSE13507). Each leaf in the tree represents one gene. The major tree branches constitute 10 modules, labeled with different colors; (B) Module-trait association. Each row corresponds to a module, labeled with a color as in (A); The column corresponds to a clinical trait-recurrence; The color of each cell at the row – column intersection indicates the correlation coefficient between the module and the trait; A
