Abstract
Colorectal cancer (CRC) is one of the most common digestive tract malignant tumors, which has a high mortality rate especially for patients with CRC recurrence. However, the pathological mechanism of recurrence of CRC is unclear. In this study, we integrated multiple cohort datasets and databases to clarify and verify potential key candidate biomarkers and signal transduction pathways in recurrence of CRC. As results, 628 DEGs were identified from GSE33113 and GSE2630 datasets and their function and pathway were analyzed. 14 hub genes related to CRC recurrence were screened from and their influence on survival were analyzed. Two key genes (IL1B and DDAH1) regarded as prognostic factors were further screened. Relapse-free survival results indicated the interaction between IL1B and DDAH1 genes and B cells was the most obvious and correlated with survival, with statistical significance (
Introduction
Information about the GEO data and series
Information about the GEO data and series
Colorectal cancer (CRC) a common malignant tumor in the gastrointestinal tract has become one of the most common disease with the highest incidence and mortality rate in the world [1, 2, 3]. With the development of various therapeutic methods such as surgery, chemoradiotherapy, targeted therapy and local therapy, the survival of CRC patients has been significantly extended [4, 5]. However, due to the limitations of traditional therapy and tumor heterogeneity, the clinical efficacy of CRC is still unsatisfactory. Although the tumor cannot be detected by conventional imaging (including PET/CT) or laboratory methods after CRC radical resection surgery, those tumor cells may not be completely eliminated, noting as minimal residual disease (MRD) [6]. Those MRD may reappear in the primary organ or the cancer will invade the lymph, blood vessels or body cavity, and migrate to other sites and continue growing, forming a tumor with the same pathological type and biological behavior as the primary tumor. Studies shows that colorectal cancer patients who test positive for MRD 2–8 weeks after surgery have a 13-fold higher risk of recurrence than patients who test negative [7]. For the patients who undergo radical surgery alone, the overall recurrence rate can reach 30–40% [2, 8]. Therefore, postoperative recurrence of CRC has become a big challenge for the therapy of CRC patients. To prevent the CRC recurrence, it is important to understand the mechanism of CRC recurrence, search for related molecular markers, and conduct translational research on related targets.
The identification of molecular markers is important for the treatment of CRC. It is recommended that routine molecular marker detection should include RAS (including KRAS and NRAS), BRAF, and MMR/MSI, which are of great significance in guiding treatment and prognosis [9]. However, the key candidate biomarkers and signal transduction pathways in CRC recurrence are still unclear. With the development of medical science and technology, Next-Generation Sequencing (NGS) is gradually being applied in all aspects of scientific research and clinical practice [10, 11, 12]. It can significantly improve sequencing throughput, sequencing hundreds of thousands to millions of DNA molecules simultaneously. In oncology, databases such as Gene Expression Profiling Interactive Analysis (GEPIA) [13] and The Cancer Genome Atlas (TCGA) [14] provided new approach for the study of markers related to colorectal cancer. By using microarray technology to analyze the differentially expressed genes (DEGs), the pathological process of CRC recurrence can be revealed, which is helpful for the prevent of CRC recurrence [15, 16, 17].
In this study, we took GEO, TCGA and other public databases to identify the key genes related to colorectal cancer recurrence, study the relationship between genes expression and clinical prognosis, and analyze the mechanism of key genes on CRC recurrence.
Microarray data
The microarray data of CRC were obtained from GEO database. All included datasets should meet the following criteria: (1) the data sets using surgically resected samples for RNA detection, (2) total sample size should
Bioinformatics related analysis
Gene function and pathway analysis: ClueGO of Cytoscape3.9.1 software was used to analyse the DEGs gene ontology (Gene Ontology, GO) and Kyoto encyclopedia (Kyoto Encyclopedia of Genes and Genomes, KEGG) GO analysis was consisted by three parts: biologicalprocess, cellularcomponent, and molecularfunction. STRING tools were used to construct protein-proteininteraction (PPI) networks to search for the interactions among proteins. Screening of hub genes was generated via PPI networks using cytoHubba calculations in Cytoscape. Survival analysis and expression analysis were performed using GEPIA online Survival analysis tool. The correlation between gene expression and the abundance of 6 kinds of infiltrating immune cells and the effect on patient survival were evaluated by the TIMER database. The web addresses of all databases and analysis tools are shown in Table 2.
Databases related to bioinformatics analysis
Databases related to bioinformatics analysis
Volcano plot of (a) GSE33113 and (b) GSE2630. Red spots represent the up-regulated genes, blue spots represent the down-regulated genes.
The statistical analyses of experimental data were performed using SPSS version 22.0. All experiments were performed with independent samples. GEO2R was used to identify DEGs associated with the recurrence of CRC. GEO2R is an interactive online analysis tool based on limma algorithm, which can be used to compare two or more datasets in the GEO series to determine DEGs. Previously, the cases in datasets were divided into primary CRC group and recurrent CRC group. Via GEO2R tool, all differentially expressed genes of two groups were identified. With
Results
Screening and identification of DEGs
The DEGs between primary CRC and recurrent CRC were identified from GSE33113 and GSE2630 datasets by GEO2R. The GSE33113 dataset included 72 cases of primary CRC and 18 cases of recurrent CRC. The GSE2630 dataset included 10 cases of primary CRC and 6 cases of recurrent CRC. The screening criteria for DEGs were
GO annotation and KEGG pathway enrichment of DEGs. ac: GSE33113 df: GSE2630.
(a) 10 hub genes selected from PPI network. (b) The intersected DEGs of GSE33113 and GSE2690.
The functional and pathway enrichment analysis of identified DEGs were performed using ClueGO and KEGG of Cytoscape software. The GO analysis indicated that DEGs of GSE33113 were mainly enriched in peptide cross-linking, organic acid binding, granulocyte migration, chemotactic reaction, positive regulation of various biological processes and other physiological processes (Fig. 2a). The DEGs of GSE2630 was mainly involved in the translation of targeted membrane proteins, the synthesis of RNA polymerase III complex, the DNA sensing pathway in cytoplasm, and the positive regulation of translation (Fig. 2d).
Survival analysis of TCGA database. (a) CXCL8, (b) IL1B, (c) CXCR1, (d) S100A12, (e) FPR1, (f) S100A8, (g) FN1, (h) CXCL12, (i) MMP1, (j) S100A9, (k) KCNE5, (l) DDAH1, (m) TM4SF4 and (n) DKK2.
(a) Distribution of risk scores, (b) Recurrence-free survival curves for high and low risk patients, (c) ROC curve versus AUC at different times in this risk model.
The KEGG channel enrichment analysis results showed that the DEGs of GSE33113 was mainly involved in the AGE-RAGE signal path IL-17 signaling pathway, and the metabolic pathway of glycerophospholipids (Fig. 2b, 2c). The DEGs of GSE2630 was mainly involved in the cytoplasmic DNA sensing pathway and RNA polymerization pathway (Fig. 2e, 2f).
To select the key genes of recurrent CRC, a PPI network was built using the STRING database and the intersected DEGs of GSE33113 and GSE2690 was analyzed [19, 20]. The key subnetworks were extracted by cytoHubba calculations in Cytoscape software using MCC algorithm and Molecular Complex Detection plugin. It can be found that in the PPI network, some regions were closely linked to each other, which may represent the molecules complex. Based on the PPI network results and connection data function, the top10 genes were filtered. Including CXCL8, IL1B, CXCR1, S100A12, FPR1, S100A8, FN1, CXCL12, MMP1 and S100A9 genes (Fig. 3a). Furthermore, the intersected DEGs of GSE33113 and GSE2690 chips was taken out by Venn map tool, and 4 intersected genes such as KCNE5, DDAH1, TM4SF4 and DKK2 were observed (Fig. 3b). The 14 genes selected by PPI and Venn map were identified as candidate biomarkers.
Prognostic analysis of DEGs
The 10 hub genes and 4 shared genes were divided into two groups according to the expression level of each gene. To identify the prognostic factors of the 14 candidate genes to the survival time of the patients survival analysis was performed by GEPIA online Survival analysis tool. The overall survival rate was compared by
Establishment of prognostic model
Multivariate COX regression analysis was used to construct a molecular prognostic model for CRC recurrence based on the IL1B and DDAH1. The risk score of each colon cancer patient was calculated by summing the multiplication of normalized expression level of the gene by its corresponding coefficient obtained by GEO2R. The risk scores
mRNA expression of IL1Band DDAH1 genes in control tissues and CRC (*P< 0.05). Adenocarcinoma is the most common type of colorectal cancer, accounting for more than 95% of colorectal cancer. COAD refers to colon adenocarcinoma, READ refers to rectum adenocarcinoma.
The expression of IL1B and DDAH1 in CRC were further analyzed by GEPIA. The expression levels of IL1B and DDAH1 genes in colorectal adenocarcinoma tissues were significantly higher than those in control tissues (Fig. 6a–b). The expression of IL1B and DDAH1 genes among tumor tissues at different stages (stage I, II, III, and IV) were further analyzed. Figure 6c–d show that the expression of DDAH1 was significantly different among the four stages (
(a) The correlation analysis of IL1B and B lymphocyte expression level in CRC. (b) The prognostic value of IL1B and B lymphocyte patients in the survival curve. (c) The correlation analysis of DDAH1 and B lymphocyte expression level in CRC. (d) The prognostic value of DDAH1 and B lymphocyte patients in the survival curve.
(a) The genetic alteration of IL1B and DDAH1. (b) The potential co-expression of IL1B and DDAH1. (c–f) Promoter methylation level of IL1B and DDAH1.
(a) Univariate cox regression analysis. (b) Multivariate cox regression analysis. (c) The predictive nomogram for RFS of CRC patients. (d) The 1,3,5-year calibration curves for CRC patients.
To reveal the function of IL1B and DDAH1 for the prediction performance of CRC patients, the expression levels of IL1B and DDAH1 in association with 6 types of infiltrating immune cells (B lymphocyte, CD4
Genetic alteration co-expression and promotor methylation level analysis
Further, the molecular characteristics of IL1B and DDAH1 were analyzed using the Pan Cancer Atlas of TCGA to evaluate their impact on the genetic expression by CBIOPORTAL database. As shown in Fig. 8a, 0.6% and 0.4% of the CRC samples were altered by IL1B and DDAH1 respectively. Next the potential co-expression of the two DEGs was analyzed. Both the expression of IL1B and DDAH1 exhibited significant correlations as shown in Fig. 8b. Then we explored the promotor methylation level of IL1B and DDAH1 and found that the promotor for IL1B was hyper-methylation, while for DDAH1 was hypo- methylation (Fig. 8c–f).
Cox regression analysis and clinical information
Univariate and multivariate cox regression analysis were performed and a nomogram was developed to predict the 1,3,5-year overall recurrence [21]. The forest was used to show the
Discussion
In recent years, many basic and clinical studies have revealed the causes and potential mechanisms mediating the formation and development of CRC [2]. However, the high recurrence rate of the disease is the main factor for the death of CRC patients [22]. With the wide application of gene chip generation technology, a large amount of core slice data was generated, and most of the data were stored in the public database. Therefore, integrating and reanalyzing these datasets can provide valuable clues for new research. Recently, researchers have conducted a large number of microarray data analysis studies on CRC and obtained hundreds of DEGs [23], but reliable biomarkers related to CRC recurrence have not been identified.
In this study, we integrated datasets from two coincidences of CRC recurrence from different sources and analyzed these data using a variety of bioinformatics methods. 514 DEGs between primary CRC and recurrent CRC patients were screened from the GSE33113 and GSE2630 datasets. The function and pathway of the 514 DEGs were analyzed by GO functional and KEGG genome pathway enrichment analysis. Results showed that the GSE33113 DEGs were mainly involved in the AGE-RAGE signaling pathways, IL-17 signaling pathways, and glycerol phospholipid metabolic pathways [24]. And the DEGS of GSE2630 were mainly involved in plasma DNA sensing pathway and RNA polymerization.
Next, 10 hub genes and 4 intersected genes were screened form the 514 DEGs through PPI and Venn map [17]. GEPIA survival analysis indicated only the IL1B and DDAH1 genes can be regarded as prognostic factors [11, 12]. The expression levels analyze of IL1B and DDAH1 genes in colorectal adenocarcinoma tissues were high, suggested that the two genes significantly affected the survival of CRC patients. Finally, we studied the mechanism of the expression of IL1B and DDAH1 genes on the CRC patients. Results suggested that the expression IL1B and DDAH1 genes affected the immune system of CRC patients, which caused the alter of prognosis of CRC patients Univariate analysis suggests DDAH1 and TNM clinical stage were independent risk factors for relapse-free survival. In multivariate analysis, TNM clinical stage was an independent risk factor for relapse-free survival in patients with CRC recurrence. Surprisingly, the patients at T1and N0 stages had a higher risk of recurrence than patients at T2 and N1, Therefore, for these patients, it is of great significance to assess the risk of recurrence as early as possible after surgery and take measures as early as possible.
Using biomarkers to predict the possibility of CRC recurrence could help to identify MRD, which benefits the early screening and diagnosis. Based on biomarkers results, a better early treatment can be realized, the rate of early detection can be increased, which possess significant influence on the prevention of CRC recurrence.
Although the two datasets were accessed from a same platform, batch effects may have an impact on sample expression differences. In microarray experiments, the absolute optical density values of each chip were not the same. After background processing and data cleaning, the corrected values reflect the gene expression level. Before comparing the results of each test, they must be normalized to adjust for errors caused by gene chip technology. Both datasets were normalized by GEO2R analysis.
The limitations of this study were that the data used in this study were obtained from the open database, so the time of data collection were not consistent. Furthermore, the results of this study were not validated by large clinical samples.
Conclusion
In conclusion, 14 pivotal genes related to CRC recurrence were screened by bioinformatics analysis tools Among them, two genes (IL1B and DDAH1) were regarded as prognostic factors which impacted the survival of CRC patients. Mechanism study suggested IL1B and DDAH1 significantly related to the prognosis by affecting the invasion of immune cells through IL-17 signaling pathway. Our study also found that patients at T1 and N0 stages had a higher risk of recurrence than patients at T2 and N1, Therefore, for these patients, it is of great significance to assess the risk of recurrence as early as possible.
Ethics approval and consent to participate
The study has been approved by the Ethics Committee of Shanxi People’s Hospital (Shanxi provincial people’s Hospital, Yingze District, Taiyuan, Shanxi, China; Dr. Huazhong Sun, Chair; Approval No. 2023-210) on 18 April, 2023.
Consent for publication
All authors have read and approved the submission of the manuscript.
Availability of data and materials
The datasets used and/or analysed during the current study available from the corresponding author on reasonable request.
Funding
This work was supported by the Fundamental Research Program of Shanxi Province (202303021222360 202103021223086).
Author contributions
Preparation of the manuscript: R. Xu and H. Liang. Interpretation or analysis of data: R. Xu, H. Liang and H. Feng. Supervision: H. Liang and Y. Li.
Footnotes
Acknowledgments
Not applicable.
Conflict of interest
The authors declare that they have no conflict of interest.
