Abstract
This study was designed to identify potential key protein interaction networks, genes, and associated pathways in early-onset colorectal cancer (CRC) using bioinformatics methods. We selected the microarray data set GSE4107, consisting of colonic mucosa samples from 12 patients and 10 healthy controls; GSE4107 was first downloaded and analyzed using
Introduction
Colorectal cancer (CRC) is one of the most common malignant diseases in the world, and its incidence increases with age. More than 777 000 new CRC cases were estimated to have been registered in developed countries in 2015, 1,2 and about 376 000 new cases and 191 000 deaths were reported in China in 2015. 3 Most CRC cases are related to old age and lifestyle factors, and only a fraction are caused by underlying genetic disorders. 4,5 Although numerous efforts have been made to understand the genetic mechanisms of CRC initiation and progression, preventing and treating early-onset CRC remains a major challenge. It is therefore important and urgent to uncover the mechanisms of early-onset CRC and to develop novel therapeutic approaches.
Gene chips (expression profiling arrays) are a gene-level detection technique that has been applied in scientific research since around 2000. Combined with bioinformatics, gene chips make it possible to measure the expression of the entire genome within the same sample in a single experiment, which makes them particularly suitable for screening differentially expressed genes. 6,7 With the wide application of gene chips, a large number of CRC-related data series have been produced, archived, and deposited in public databases. Reanalyzing and reintegrating these data sets may yield meaningful clues for new research. A series of microarray studies on CRC have been carried out in recent years, 8,9 and a large number of differentially expressed genes (DEGs) have been identified, which are involved in different pathways, biological processes, cellular components, and molecular functions.
In this study, we downloaded the original raw data set (GSE4107) from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), a public database for archiving and querying microarray data. Gene expression profiles of patients with CRC were compared with those of healthy controls to identify DEGs. DEGs were screened using the limma package 10,11 in RStudio, and Gene Ontology (GO) and pathway enrichment analyses were then performed with the online tool DAVID (https://david.ncifcrf.gov). 12 By analyzing the biological functions and pathways of these genes, we aimed to outline CRC development at the molecular level and identify potential candidate genes for diagnosis, prognosis, and therapeutic targeting.
Materials and Methods
Microarray Data
The microarray data set GSE4107 13 was downloaded from the National Center for Biotechnology Information Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo/). 14,15 It was generated on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). GSE4107 contains samples from 12 patients and 10 healthy controls (average age: 50 years or less; ethnicity: Chinese).
Data Preprocessing
After GSE4107 was downloaded, probe identifiers were converted to gene symbols. When multiple probes corresponded to one gene, the most significant expression value was taken as the expression value of that gene. Expression values were then normalized using the affy package. 16
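The probe-to-gene collapsing step can be sketched in Python. The text does not state exactly how the representative value was chosen, so this sketch assumes one common rule: keep the probe with the highest mean expression across samples. The function name `collapse_probes`, the dictionary layout, and the probe IDs in the example are hypothetical.

```python
from statistics import mean


def collapse_probes(expr, probe_to_gene):
    """Collapse probe-level rows into one row per gene symbol.

    expr: dict of probe ID -> list of expression values (one per sample)
    probe_to_gene: dict of probe ID -> gene symbol
    Assumed rule (hypothetical): for a gene with several probes, keep the
    probe whose mean expression across samples is highest.
    """
    best = {}  # gene symbol -> (mean expression, probe ID)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if not gene:
            continue  # probe without a gene symbol is dropped
        avg = mean(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probe)
    return {gene: expr[probe] for gene, (_, probe) in best.items()}


# Hypothetical toy data: two probes map to GENE_A, one probe is unannotated.
expr = {"p1": [5.0, 6.0], "p2": [8.0, 9.0], "p3": [4.0, 4.5], "p4": [7.0, 7.0]}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
print(collapse_probes(expr, probe_to_gene))
# {'GENE_A': [8.0, 9.0], 'GENE_B': [4.0, 4.5]}
```

Other rules (averaging probes, or keeping the probe with the largest variance) are equally plausible; the sketch only shows where such a rule plugs in.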
Identification of DEGs
The raw data GSE4107 files used for analysis included the
Gene Ontology and Pathway Enrichment Analysis
GO analysis is a useful method for annotating genes and gene products 20 and for identifying the characteristic biological meaning of genomic and transcriptomic data. 21,22 The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database for the systematic analysis of gene function that links genomic information with higher-level functional information. 23 Functional enrichment of the candidate DEGs was analyzed with multiple online tools, among them DAVID, an online resource that provides gene annotation, visualization, and gene attributes.
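The over-representation test behind GO and KEGG enrichment can be illustrated with the one-sided hypergeometric (Fisher exact) tail probability. DAVID itself reports a modified, more conservative Fisher exact p-value (the EASE score), so this plain version is a simplified stand-in rather than DAVID's exact computation. The function name `enrichment_p` and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric (Fisher exact) p-value for over-representation.

    N: annotated genes in the background population
    K: background genes annotated to the term (GO term or KEGG pathway)
    n: genes in the submitted list (e.g., the DEGs)
    k: list genes annotated to the term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)  # the overlap cannot exceed the list or the term size
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer sums, one final division


# Toy check: N=10, K=4, n=3, k=2 -> (6*6 + 4*1) / 120 = 1/3
print(enrichment_p(2, 3, 4, 10))  # 0.3333333333333333
```

Because many terms are tested at once, the resulting p-values are normally corrected for multiple testing before terms are ranked.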
Integration of Protein–Protein Interaction Network and Module Analysis
First, the Search Tool for the Retrieval of Interacting Genes (STRING) database 26 was used to retrieve DEG-encoded proteins and protein–protein interaction (PPI) information. Second, to evaluate the interactions among DEGs, we mapped the DEGs to STRING and used a minimum required interaction score >0.400 (medium confidence) as the significance threshold. PPI networks were then constructed using Cytoscape. 27
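The score threshold, together with the degree-based hub filter (node degree ≥10) applied to the resulting network, can be sketched as follows. This assumes the STRING interactions have been exported as (gene, gene, combined score) rows with scores on a 0 to 1 scale; STRING's downloadable link files report combined scores as integers up to 1000, which would need dividing by 1000 first. The function names and toy edges are hypothetical, and Cytoscape, not this code, was the tool used in the study.

```python
from collections import defaultdict


def build_ppi(interactions, min_score=0.4):
    """Build an undirected PPI network from (gene_a, gene_b, score) tuples.

    Only interactions with score > min_score are kept (0.4 corresponds to
    STRING's "medium confidence" on a 0-1 scale). Genes with no retained
    interaction never enter the network.
    """
    adj = defaultdict(set)
    for a, b, score in interactions:
        if score > min_score and a != b:  # drop weak edges and self-loops
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def hub_genes(adj, min_degree=10):
    """Return genes with degree >= min_degree, highest degree first."""
    degree = {gene: len(nbrs) for gene, nbrs in adj.items()}
    hubs = [gene for gene, d in degree.items() if d >= min_degree]
    return sorted(hubs, key=lambda g: (-degree[g], g))  # ties broken by name


# Hypothetical toy interactions (not data from this study).
edges = [("A", "B", 0.9), ("A", "C", 0.5), ("B", "C", 0.3), ("A", "D", 0.41)]
net = build_ppi(edges)
print(hub_genes(net, min_degree=3))  # ['A']
```

Dropping genes without any retained interaction mirrors how some DEGs end up outside the PPI network.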
The Cytoscape plug-in Molecular Complex Detection (MCODE), a well-known automated method for finding highly interconnected subgraphs (molecular complexes or clusters) in large PPI networks, was used to screen the modules of the PPI network. MCODE parameters were left at their defaults, except K-core = 6. Moreover, functional enrichment analysis was performed for DEGs in the modules with
Results
Identification of DEGs
In this study, we included 12 patients with CRC and 10 healthy controls in the analysis. GSE4107 was analyzed in RStudio to identify the DEG sets. Using adjusted
131 Differentially Expressed Genes (DEGs) Were Identified From GSE4107, Including 108 Upregulated and 23 Downregulated Genes in Patients With Early-Onset Colorectal Cancer Compared With Healthy Controls.a
aUpregulated genes are listed in descending order of fold change, and downregulated genes in ascending order of fold change.
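The study's exact cutoffs are not given here, so the following is only a sketch of the general procedure: Benjamini–Hochberg adjustment (the default `adjust.method` in limma's `topTable`), an absolute log2 fold-change filter, and the ordering described in the table footnote. It assumes raw per-gene p-values and log2 fold changes are already available (e.g., from limma's moderated t-test); the thresholds `alpha=0.05` and `min_abs_lfc=1.0`, the function names, and the toy genes are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_degs(genes, log2fc, pvalues, alpha=0.05, min_abs_lfc=1.0):
    """Split significant genes into up- and downregulated lists.

    Upregulated genes are ordered from largest to smallest fold change,
    downregulated genes from smallest (most negative) to largest.
    alpha and min_abs_lfc are hypothetical cutoffs, not the study's.
    """
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, benjamini_hochberg(pvalues)):
        if q < alpha and abs(lfc) >= min_abs_lfc:
            (up if lfc > 0 else down).append((gene, lfc))
    up.sort(key=lambda t: -t[1])
    down.sort(key=lambda t: t[1])
    return [g for g, _ in up], [g for g, _ in down]


print(split_degs(["G1", "G2", "G3", "G4"], [2.0, -1.5, 3.0, 0.2],
                 [0.001, 0.002, 0.003, 0.0001]))
# (['G3', 'G1'], ['G2'])
```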
Gene Term Enrichment Analysis
We uploaded the DEGs to DAVID to identify GO terms and KEGG pathways; the GO terms were classified into 3 functional categories: biological process (BP), cellular component (CC), and molecular function (MF; Figure 1A). As shown in Figure 1B and Table 2, the DEGs were most significantly enriched in muscle contraction and regulation of muscle contraction. The upregulated DEGs were significantly enriched in biological processes including muscle system process, muscle contraction, and regulation of muscle contraction, whereas the downregulated DEGs were enriched in organic acid transport, lipid metabolic process, and cellular lipid metabolic process (Figure 1B and Table 2).

Gene Ontology analysis and significant enrichment of differentially expressed genes (DEGs) in early-onset colorectal cancer (CRC). (A) Gene Ontology (GO) analysis classified DEGs into BP, CC, and MF groups. (B) Ranking of significantly enriched GO terms of DEGs.
The Gene Ontology Analysis of DEGs Associated With Early-Onset Colorectal Cancer.
Abbreviation: DEG, differentially expressed gene.
Kyoto Encyclopedia of Genes and Genomes Pathway Analysis
We used DAVID to perform functional and signaling pathway enrichment analysis of the DEGs. Figure 2 shows the most significantly enriched pathways of the DEGs, and Table 3 lists the significantly enriched pathways of the upregulated DEGs; no significantly enriched pathways were found for the downregulated DEGs (Table 3). The (upregulated) DEGs were mainly enriched in the vascular smooth muscle contraction and cGMP-PKG signaling pathways.

Significantly enriched signaling pathways of differentially expressed genes (DEGs) in early-onset colorectal cancer (CRC).
Signaling Pathway Enrichment Analysis of DEGs Associated With Early-Onset Colorectal Cancer.
Abbreviation: DEG, differentially expressed gene.
Module Analysis, Key Candidate Genes, and Pathway Identification From PPI Network
Using the STRING online database (http://string-db.org) and Cytoscape, the 131 DEGs (108 upregulated and 23 downregulated genes) were mapped into a DEG PPI network containing 82 nodes and 199 edges (Figure 3A); the remaining 49 genes did not fall into the PPI network. With node degree ≥10 as the filtering criterion, the top 10 hub genes were

Protein–protein interaction (PPI) network of differentially expressed genes (DEGs; node color: sky blue indicates upregulated genes, red indicates downregulated genes). (A) Based on the STRING online database, 82 DEGs were mapped into the DEG PPI network. (B) The most significant module of the PPI network.
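The module was detected with MCODE using K-core = 6. A k-core is the maximal subgraph in which every node has at least k neighbours, and the sketch below shows only that peeling idea. It is not MCODE, which also weights vertices by local neighbourhood density and grows clusters from seed vertices before its K-core parameter filters out clusters lacking a sufficiently dense core. The function name `k_core` and the toy network are hypothetical.

```python
def k_core(adj, k):
    """Return the k-core of an undirected graph.

    adj: dict of node -> set of neighbours (must be symmetric).
    Nodes with fewer than k neighbours are removed repeatedly until every
    remaining node has degree >= k. The input is not modified.
    """
    adj = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    stack = [node for node, nbrs in adj.items() if len(nbrs) < k]
    removed = set()
    while stack:
        node = stack.pop()
        if node in removed:
            continue  # a node can be pushed more than once
        removed.add(node)
        for nbr in list(adj[node]):
            adj[nbr].discard(node)  # neighbour loses one degree
            if nbr not in removed and len(adj[nbr]) < k:
                stack.append(nbr)
    return {node: nbrs for node, nbrs in adj.items() if node not in removed}


# Hypothetical toy network: a 4-clique (A-D) plus a pendant node E.
toy = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A"},
}
print(sorted(k_core(toy, 3)))  # ['A', 'B', 'C', 'D']
```

Peeling is iterative because removing one low-degree node can push its neighbours below k in turn.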
Pathway Enrichment Analysis of Common Gene Functions.
Discussion
CRC is a disease of accumulated genetic, epigenetic, and environmental aberrations. 28 Understanding the molecular mechanisms of CRC is very important for diagnosis and treatment. The Wnt signaling pathway is known to be associated with the major causes of CRC.
In this study, we aimed to identify key candidate genes and signaling pathways in early-onset CRC. By comparing mucosa from 12 patients with mucosa from 10 healthy controls, we screened 108 upregulated and 23 downregulated DEGs. By using GO and PPI network analysis, 7 hub genes, namely,
Module analysis of the PPI network suggested that early-onset CRC is associated with the vascular smooth muscle contraction signaling pathway; the principal function of vascular smooth muscle cells (VSMCs) is contraction. 47 The contractile state of VSMCs is regulated principally by changes in cytosolic Ca²⁺ concentration, and Rho/Rho kinase, PKC, and arachidonic acid have also been proposed to play pivotal roles in this process. 48
In conclusion, we investigated potential candidate genes and signaling pathways of DEGs in early-onset CRC. Candidate genes were selected through DEG, GO, KEGG, and PPI analyses. This study improves our understanding of the pathogenesis and underlying molecular mechanisms of early-onset CRC, and the selected candidate genes and pathways may provide clues to new therapeutic targets for CRC. However, further molecular biological experiments are required to confirm the functions of these genes in CRC.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Project of National Natural Science Foundation of China (No 81770294).
