Abstract
Cervical cancer (CC) is one kind of female cancer. With the development of bioinformatics, targeted specific biomarkers therapy has become much more valuable. GSE26511 was obtained from gene expression omnibus (GEO). We utilized a package called “WGCNA” to build co-expression network and choose the hub module. Search Tool for the Retrieval of Interacting Genes Database (STRING) was used to analyze protein-protein interaction (PPI) information of those genes in the hub module. A Plug-in called MCODE was utilized to choose hub clusters of PPI network, which was visualized in Cytoscape. Clusterprofiler was used to do functional analysis. Univariate and multivariate cox proportional hazards regression analysis were both conducted to predict the risk score of CC patients. Kaplan-Meier curve analysis was done to show the overall survival. Receiver operating characteristic (ROC) curve analysis was utilized to evaluate the predictive value of the patient outcome. Validation of the hub gene in databases, Gene set enrichment analysis (GSEA) and GEPIA were completed. We built co-expression network based on GSE26511 and one CC-related module was identified. Functional analysis of this module showed that extracellular space and Signaling pathways regulating pluripotency of stem cells were most related pathways. PPI network screened GNG11 as the most valuable protein. Cox analysis showed that ACKR1 was negatively correlated with CC progression, which was validated in Gene Expression Profiling Interactive Analysis (GEPIA) and datasets. Survival analysis was performed and showed the consistent result. GSEA set enrichment analysis was also completed. This study showed hub functional terms and gene participated in CC and then speculated that ACKR1 might be tumor suppressor for CC.
Keywords
Abbreviations
Study design and data preprocessing. Note: (A) Flow diagram of study. (B) Sample clustering after detecting two outliers.
Cervical cancer is the fourth-most prevalent female cancer worldwide, and contributes to estimated 530,000 new cases annually with 270,000 deaths [1]. The persistent infection of human papilloma virus especially the type 16/18, increasing number of lifetime sexual partners, acquisition of new male partners and non-monogamous male partners have been identified as high-risk factors for cervical cancer [2]. In recent years, bioinformatics has developed rapidly. We can use the database to analyze tumor-related molecules and have guiding significance for targeted therapy [3].
Gene expression profiling [4] and weighted gene expression network analysis (WGCNA) [5] were currently the most widely used analytical method in bioinformatics. In this research, we established co-expression network based on dataset GSE26511 including 19 lymph node-positive CC samples and 20 lymph node-negative CC samples in order to determine the hub module. Then we built PPI network for the genes in hub module. We further performed survival regression analysis to screen out hub genes and finally validate hub genes using databases and website. Combined with the above results, it is concluded that they may be potential targets for CC.
Materials and methods
Research plan and data pre-calculation
The research was designed based on the flow chart (Fig. 1A). Gene expression profile matrix files were obtaied from the Gene Expression Omnibus (GEO) (
Construction and analysis of WGCNA
First, we check the expression data profile of the genes for their rationality. The two obvious outliers (GSM651842, GSM651846) and removed them from the cohort . Obvious outliers means sample clustering according to the distance between different samples in Pearson’s correlation matrices and average linkage. Secondly, we used the R package called “WGCNA” to build the co-expression network based on the genes [8, 9]. The Pearson’s correlation matrices was prepared for all pair-wise genes. Next, a power function amn
PPI network and hub cluster functional analysis
We utilized STRING (Search Tool for the Retrieval of Interacting Genes Database) (
Construction of a prognostic signature
Univariate Cox proportional hazards regression analyses were performed based on the genes in hub module. Genes which had prognostic signature were identified by the cutoff of
Hub module selection. Note: (A) Dendrogram of all genes clustered based on a dissimilarity measure (1-TOM). (B) Dendrogram of consensus module eigengenes obtained by WGCNA on the consensus correlation. The red line was the merging threshold, and groups of eigengenes below the threshold represent modules whose expressions profiles should be merged due to their similarity. (C) Correlation between modules and traits. The upper number in each cell referred to the correlation coefficient of each module in the trait, and the lower number was the corresponding 
The hub gene needed to be verified in GEPIA (Gene Expression Profiling Interactive Analysis) [14]. GSE39001 and GSE6791 were also used to validate hub gene expression. GSE39001 [15] included 43 CC tissue samples and 12 normal tissue samples. GSE6791 [16] included 20 CC tissue samples and 8 normal tissue samples.
Gene set enrichment analysis (GSEA)
Based on hub gene expression level, samples were divided into two groups. In order to study the potential function of this gene, GSEA (
Select hub genes in hub modules. Note: (A) A scatter plot of GS for CC versus the MM in the green module. Intramodular analysis of the genes found in the green module, which contained genes that had a high correlation with CC, with 
All data was processed by R version 3.5.1 (
Cluster analysis of the PPI network. Note: (A) The genes in green module were filtered into the PPI network that contained 476 nodes and 298 edges. (B) Histogram of key proteins. The y-axis represented the name of genes, the x-axis represented the number of adjacent genes, and height was the number of gene connections.
Weighted co-expression network construction and analysis
Expression data profile of GSE26511 including 19 lymph node-positive CC tissue samples and 18 lymph node-negative CC tissue samples was used to construct co-expression networks and identify hub genes (Fig. 1B). We conducted the R package “WGCNA” and set the power
Survival prognosis model of the hub genes. Note: (A) Multivariate Cox proportional hazards regression analysis further screened 2 hub genes. (B) Survival analysis showed that the patients in the high risk group had statistically significantly worse overall survival than those in low risk group in TCGA cohort. (C) ROC analysis was performed to find out the most optimal cutoff value to divide the CC patients into high risk and low risk group. (D and E) The risk scores for all patients in TCGA cohort are plotted in ascending order and marked as low risk (blue) or high risk (red), as divided by the threshold (vertical black line). (F) Five expression and risk score distribution in TCGA cohort by z-score, with red indicating higher expression and light blue indicating lower expression.
It was interesting that some of these modules had similar expression profiles. In order to research the relationships and interactions among the mentioned 8 co-expressed modules, we analyzed the connectivity of eigengenes and completed a cluster analysis. Finally 7 clusters were divided into 3 clusters. one contained three branches and two contained two branches (Fig. 3C).
Functional analysis were performed on the genes in green module. Go analysis showed that genes were mostly enriched in extracellular spaceï¼proteinaceous extracellular matrix and extracellular region (Fig. S2A). KEGG analysis showed that genes were mostly enriched in Signaling pathways regulating pluripotency of stem cells, Melanoma and TGF-beta signaling pathway (Fig. S2B).
Via the STRING website, 508 genes in green module mentioned were screened into PPI network. The complex owned 476 nodes and 298 edges (Fig. 4A). STRING software identified 10 prominent proteins. Among them GNG11 was seen to be the most important one connecting 20 nodes (Fig. 4B).
Validation of hub gene. Note: (A) The result of GEPIA website validation. (B) The validation of ACKR1 expression in GSE39001. (C) The validation of ACKR1 expression in GSE6791. (D) Survival analysis of ACKR1.
Then, we applied the plug-in named MCODE for scoring. So that we could find parameters. The parameters were able to show the qualified results for PPI. After that, clusters could be found in the PPI network. Eighteen clusters were produced due to k-core
MCODE was applied to analyze the top 3 significant clusters. Functional analysis of the 2 clusters were also conducted using clusterprofiler (Fig. S4A–D).
GSEA using GSE26511. Note: Only listed the 4 most common functional gene sets enriched in CC samples with ACKR1 highly expressed. 
We set GM (cor.geneModuleMembership)
Hub gene validation
To further demonstrate ACKR1 was related to CC, we used the GEPIA website to verify it (Fig. 6A). ACKR1 was proved to be positively correlated with CC prognosis and interestingly had higher expression level in normal cervical tissues than in cervical cancer tissues. The expression of ACKR1 in GSE39001 and GSE6791 showed the consistent result (Fig. 6B–C). Survival analysis of ACKR1 showed that patients with higher ACKR1 expression had better overall survival (Fig. 6D).
Gene set enrichment analysis (GSEA)
To identify the potential function of the ACKR1 in CC, GSEA was conducted to search KEGG pathways enriched in the highly-expressed samples. Four gene sets (
Discussion
Cervical cancer is the fourth-most prevalent cancer in women worldwide, and contributes to estimated 530,000 new cases annually with 270,000 deaths. Its development has a complex mechanism, and many genes are involved in its regulation. In recent years, bioinformatics has developed rapidly. We can use the database to analyze tumor-related molecules and have guiding significance for targeted therapy. In this research, we established co-expression network based on dataset GSE26511 including 19 lymph node-positive CC samples and 20 lymph node-negative CC samples in order to determine the hub module. Then we built PPI network for the genes in hub module. We further performed survival regression analysis to screen out hub genes and finally validate hub genes using databases and website. Combined with the above results, it is concluded that they may be potential targets for CC.
In these findings, WGCNA analysis suggested that module green possessed significantly relevant expression pattern. Functional analysis were performed on the genes in green module. Go analysis showed that genes were mostly enriched in extracellular space. KEGG analysis showed that genes were mostly enriched in Signaling pathways regulating pluripotency of stem cells.
Wu et al. found that extracellular vesicles might be emerging targets in cancer [18]. Pickup et al. found that the extracellular matrix modulated the hallmarks of cancer [19]. Song et al. The diversity of characteristics of pancreatic cancer stem cells via regulating sonic hedgehog signaling pathway [20]. The role of these pathways in CC has not been studied and deserved further exploration.
A PPI network was built based on the green module. According to a k-core of 2, we found 7 clusters which might have impact on CC. Ten prominent proteins were identified, among the 10 proteins GNG11 was seen to be the most important one, which suggested that GNG11 might have inhibition effect in CC.
GNG11 (G-protein subunit
We combined survival regression analysis and then identified that ACKR1 was negatively associated with CC. The validation of GEPIA showed that ACKR1 was positively correlated with CC prognosis. ACKR1 was demonstrated to have higher expression level in normal cervical tissues than in cervical cancer tissues. The expression of ACKR1 in GSE39001 and GSE6791 showed the consistent result.
Atypical chemokine receptor 1 (ACKR1) was known to encode a glycoprotein expressing the Duffy blood group antigens [22], which was proved to regulate hematopoiesis on nucleated erythroid cells [23]. Wan W et al. proved that ACKR1 deficiency reduced atherogenesis in ApoE-knockout mice [24]. The functional mechanism between ACKR1 and CC deserved further exploration.
GSEA results also provided some valuable details. For example, ABC transporters were proved to be mediators of drug resistance and contributors to cancer cell biology [25]. Cell adhesion molecules cams have been shown to be involved in tumor metastasis and had a certain value in tumor therapy [26, 27]. Xiao et al. found that CXCL16/CXCR6 chemokine signaling mediated breast cancer progression [28]. Chemokine pathways were also proven to be involved in the process of hepatobiliary cancer [29]. These pathways were closely related to tumors, and their roles in CC were worth further research.
Conclusion
In summary, our study illustrated the hub genes and pathways involved in the progression of CC. We finally speculated that ACKR1 should be a tumor suppressor for predicting and treating cervical cancer.
Supplementary data
The supplementary files are available to download from http://dx.doi.org/10.3233/CBM-190533.
sj-docx-1-cbm-10.3233_CBM-190533.docx - Supplemental material
Supplemental material, sj-docx-1-cbm-10.3233_CBM-190533.docx
Footnotes
Acknowledgments
This work was supported by the National Nature Science Foundation of China (81872119 and 81472442) and the Jiangsu province medical innovation team (CXTDA2017008).
Conflict of interest
The authors report no conflicts of interest in this work.
