Abstract
Gastric adenocarcinoma is the most common histologic type of gastric cancer; however, the pathogenic mechanisms remain unclear. To improve mechanistic understanding and identify new treatment targets or diagnostic biomarkers, we used bioinformatic tools to predict the hub genes related to the process of gastric adenocarcinoma development from public datasets, and explored their prognostic significance. We screened differentially expressed genes between gastric adenocarcinoma and normal gastric tissues in Gene Expression Omnibus datasets (GSE79973, GSE118916, and GSE29998) using the GEO2R tool, and their functions were annotated with Gene Ontology and Kyoto Encyclopedia of Genes and Genomes signaling pathway enrichment analyses in the DAVID database. Hub genes were identified based on the protein-protein interaction network constructed in the STRING database with Cytoscape software. A total of 10 hub genes were selected for further analysis, and their expression patterns in gastric adenocarcinoma patients were investigated using the Oncomine and GEPIA databases. The expression levels of ATP4A, CA9, FGA, ALDH1A1, and GHRL were reduced, whereas those of TIMP1, SPP1, CXCL8, THY1, and COL1A1 were increased in gastric adenocarcinoma. The Kaplan–Meier online plotter tool showed associations of all hub genes except for CA9 with prognosis in gastric adenocarcinoma patients; CXCL8 and ALDH1A1 were positively correlated with survival, and the other genes were negatively correlated with survival. These 10 hub genes may be involved in important processes in gastric adenocarcinoma development, providing new directions for research to clarify the role of these genes and offer insight for improved treatment.
Introduction
Gastric cancer is currently one of the most common malignant tumors worldwide, and the prognosis remains poor; the 5-year survival rate of gastric cancer patients in China is still below 40%. 1 Gastric cancer-related genes involved in the process of tumor development have been identified, which define the histological type of tumors. Gastric cancer is histologically classified as adenocarcinoma, adenosquamous carcinoma, squamous cell carcinoma, undifferentiated carcinoma, and carcinoid tumor. Gastric adenocarcinoma is the most common histological type of gastric cancer, accounting for 95% of all cases, and its morbidity and mortality are high. 2 Most patients with gastric adenocarcinoma are diagnosed when the cancer is at an intermediate or advanced stage of development, with high rates of recurrence and metastasis after surgery. 3 Although chemotherapy can help some patients improve their quality of life and prolong survival time, the effect remains limited, which is mainly due to delayed diagnosis and treatment. Despite research to uncover the molecular mechanism of gastric adenocarcinoma, specific genes that are responsible for gastric adenocarcinoma development have not been identified. Therefore, further work is needed to elucidate the potential mechanism in the occurrence and development of gastric adenocarcinoma, identify biomarkers of prognosis, and provide new targets and research directions to improve diagnosis and treatment.
Microarray technology and bioinformatic techniques are commonly used to screen genetic variants at the genomic level. However, false-positive results may be obtained in independent microarray analyses. Hub genes are defined as genes that play crucial roles in biological processes, and also influence the regulation of other genes in related pathways. Therefore, hub genes can represent important research targets to guide the precise treatment of cancer and the application of drugs.4,5
The Gene Expression Omnibus (GEO) database offers a rich genetic data resource, which has been widely used for analyses of lung cancer, ovarian cancer, breast cancer, and other cancer types.6–8 Therefore, in this study, we screened datasets derived from multiple gastric adenocarcinoma gene chips in the GEO database for differentially expressed genes, searched for hub genes among the candidates, and explored their expression distribution and prognostic significance. These results can provide a theoretical basis for the discovery of possible prognostic markers toward improving the clinical diagnosis and treatment of gastric adenocarcinoma.
Materials and methods
Datasets
The entire GEO database, including 488 series of human gastric cancer, was screened for relevant datasets based on the following criteria: (1) data based on expression profiling by microarray, (2) gastric cancer-related project, (3) analysis of gastric adenocarcinoma tissue and normal gastric tissue, and (4) at least two samples analyzed. After careful screening, three datasets (GSE79973, GSE118916, and GSE29998) were selected. GSE79973 was based on GPL570 ([HG-U133Plus2] Affymetrix Human Genome U133 Plus 2.0 Array), GSE118916 was based on GPL15207 ([PrimeView] Affymetrix Human Gene Expression Array), and GSE29998 was based on GPL6947 (Illumina HumanHT-12 V3.0 beadchip expression).9–11
Screening of differentially expressed genes
The GEO2R analysis tool was used to screen the differentially expressed genes between gastric adenocarcinoma tissues and normal tissues in the three datasets. GEO2R uses the GEOquery and limma R packages of the Bioconductor project (an analytical tool for high-throughput genomic data) to compare the processed data tables provided by the original submitter. Specifically, the GEOquery R package parses GEO data into R-compatible data structures that can be used by other R packages. The limma R package can handle a wide range of experimental designs and data types, and performs multiple testing corrections on p-values to help correct for the occurrence of false positives.12–14 Genes with differences in expression levels between tissue types at adjusted p-value <0.05 and |log fold-change (FC)| >1.5 were considered as differentially expressed genes. Statistical analysis was performed on each dataset, and the Venn diagram online tool was used to identify the genes shared by all three datasets.
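The thresholding and intersection steps described above can be sketched as follows. This is only a minimal illustration, not the GEO2R implementation: the function names, the `(gene, adjusted p-value, log FC)` row layout, the placeholder gene names, and all numeric values are hypothetical, and a positive log FC is assumed to mean higher expression in tumor tissue.

```python
def select_degs(rows, p_cutoff=0.05, lfc_cutoff=1.5):
    """Return {gene: 'up' | 'down'} for rows passing both thresholds.

    rows: iterable of (gene, adjusted_p, log_fc) tuples (assumed layout).
    """
    degs = {}
    for gene, adj_p, log_fc in rows:
        if adj_p < p_cutoff and abs(log_fc) > lfc_cutoff:
            degs[gene] = "up" if log_fc > 0 else "down"
    return degs


def common_degs(*deg_maps):
    """Genes present in every per-dataset DEG map (the Venn intersection)."""
    common = set(deg_maps[0])
    for deg_map in deg_maps[1:]:
        common &= set(deg_map)
    return common


# Hypothetical toy tables standing in for three GEO2R result exports.
toy_a = [("GENE_A", 0.001, 2.3), ("GENE_B", 0.20, 3.0),
         ("GENE_C", 0.01, -1.8), ("GENE_D", 0.03, 1.2)]
toy_b = [("GENE_A", 0.004, 1.9), ("GENE_C", 0.02, -2.1),
         ("GENE_D", 0.001, 2.5)]
toy_c = [("GENE_A", 0.01, 1.6), ("GENE_C", 0.049, -1.51),
         ("GENE_B", 0.001, 2.0)]

per_dataset = [select_degs(t) for t in (toy_a, toy_b, toy_c)]
shared = common_degs(*per_dataset)
print(sorted(shared))
```

In this toy input, GENE_B fails the adjusted p-value cutoff in the first table and GENE_D fails the fold-change cutoff there, so neither survives the intersection.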
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of differentially expressed genes
To reveal the biological function of the differentially expressed genes in gastric adenocarcinoma, enrichment analysis was performed with GO and KEGG pathway analyses. GO analysis is mainly used for functional enrichment based on biological process, molecular function, and cellular component categories. 15 KEGG is a database that integrates genomic, chemical, and systems functional information to identify biological pathways associated with genes of interest. 16 The DAVID tool was used for GO and KEGG pathway enrichment analyses of differentially expressed genes.17,18 GO enrichment terms with p-value <0.05 and gene count ≥10 were further analyzed. We also screened the KEGG analysis results according to p-value <0.05; the p-values were ranked from smallest to largest, and the top 10 most significant pathways were further analyzed and plotted as a bubble diagram.
Protein-protein interaction (PPI) network construction and selection of hub genes
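As a rough sketch of the filtering rules just described, the snippet below applies the GO cutoffs (p-value <0.05, gene count ≥10) and the KEGG ranking (p-value <0.05, ten smallest p-values). The dictionary keys `term`, `p`, and `count` are assumed for illustration and are not the column names of a DAVID export; the records are hypothetical.

```python
def filter_go_terms(terms, p_cutoff=0.05, min_count=10):
    """Keep GO terms with p below the cutoff and at least min_count genes."""
    return [t for t in terms if t["p"] < p_cutoff and t["count"] >= min_count]


def top_kegg_pathways(pathways, p_cutoff=0.05, top_n=10):
    """Keep significant pathways, ranked from smallest to largest p-value."""
    significant = [p for p in pathways if p["p"] < p_cutoff]
    return sorted(significant, key=lambda p: p["p"])[:top_n]


# Hypothetical enrichment records (values invented for the example).
toy_go = [
    {"term": "GO_TERM_1", "p": 0.0004, "count": 25},
    {"term": "GO_TERM_2", "p": 0.0100, "count": 6},   # too few genes
    {"term": "GO_TERM_3", "p": 0.0800, "count": 30},  # not significant
]
toy_kegg = [
    {"term": "PATHWAY_1", "p": 0.030, "count": 5},
    {"term": "PATHWAY_2", "p": 0.001, "count": 9},
    {"term": "PATHWAY_3", "p": 0.200, "count": 4},    # not significant
]

print([t["term"] for t in filter_go_terms(toy_go)])
print([p["term"] for p in top_kegg_pathways(toy_kegg)])
```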
The STRING database was used to explore PPIs among the differentially expressed genes in gastric adenocarcinoma according to a confidence level of <0.04, 19 and Cytoscape was used to visualize interactive networks. 20
The nodes with a higher degree of connection in the PPI network are more essential for maintaining the stability of the whole network, and are therefore considered to be more relevant to the biological process as a whole. The radiality method is a path-based topological analysis method that plays an important role in the prediction of key nodes.21 The top 10 genes in the network were screened as hub genes using the radiality method with the CytoHubba plugin in Cytoscape software.
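To illustrate what a path-based radiality score measures, the sketch below uses a common textbook definition: for a connected graph with n nodes and diameter D, the radiality of node v is the mean of (D + 1 − dist(v, w)) over all other nodes w, so nodes that are close to everything else score highest. This is an assumption made for illustration; CytoHubba's exact normalization, especially for networks with several disconnected components, may differ. Node names and edges are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def radiality(edges):
    """Radiality of every node in a connected, undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two connected nodes")
    all_dist = {v: bfs_distances(adj, v) for v in adj}
    if any(len(d) != n for d in all_dist.values()):
        raise ValueError("this sketch assumes a connected graph")
    diameter = max(max(d.values()) for d in all_dist.values())
    return {
        v: sum(diameter + 1 - d for w, d in all_dist[v].items() if w != v) / (n - 1)
        for v in adj
    }


def top_hubs(scores, k=10):
    """Names of the k highest-scoring nodes (ties broken alphabetically)."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical five-protein network.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P5")]
scores = radiality(toy_edges)
print(top_hubs(scores, k=2))
```

In the toy network, P2 touches three of the four other nodes directly and therefore receives the highest score.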
Oncomine analysis
Oncomine is a database for the analysis of large tumor gene chip datasets, which integrates the data of several public databases, including GEO, The Cancer Genome Atlas (TCGA), and others. The Oncomine database currently contains 715 gene expression datasets with related information from 86,733 human tumor tissues and normal tissue samples, and is continuously updated. In this study, Oncomine was used to retrieve information on the expression of candidate hub genes to verify their expression patterns in gastric cancer under the following screening thresholds: p-value <1E-4, fold change >2, gene rank = top 10%.
Expression of candidate genes in gastric adenocarcinoma
Differences in expression levels of hub genes in normal gastric tissues and gastric adenocarcinoma tissues were further confirmed with gene expression profiling interactive analysis (GEPIA) based on the TCGA and GTEx datasets, 22 including data on 9736 tumors and 8587 normal samples.
Survival analysis
The Kaplan–Meier plotter can be used to assess the impact of 54,000 genes on survival in 21 cancer types, and includes breast cancer (n = 6234), ovarian cancer (n = 2190), lung cancer (n = 3452), and gastric cancer (n = 1440) datasets. 23 We used the gastric cancer dataset to evaluate the prognostic value of the hub genes in the gastric adenocarcinoma network. p < 0.05 was considered to indicate a statistically significant association with prognosis (overall survival).
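For readers unfamiliar with the estimator behind such survival curves, the following minimal sketch computes the Kaplan–Meier product-limit estimate, S(t) = Π (1 − d_i / n_i), where d_i is the number of deaths and n_i the number of patients at risk at each event time. It is not the online tool's code. The median split into high- and low-expression groups is an assumption made for illustration (the cutoff used in this study is not stated here), and all patient data are invented.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time where at least one death occurred.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def split_by_median(expression):
    """Indices of patients above the median, and at or below it."""
    cut = median(expression)
    high = [i for i, x in enumerate(expression) if x > cut]
    low = [i for i, x in enumerate(expression) if x <= cut]
    return high, low


# Hypothetical cohort: months of follow-up, event flags, expression values.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
toy_expr = [2.1, 7.4, 3.3, 9.0, 5.2, 6.8]

high, low = split_by_median(toy_expr)
for label, idx in (("high", high), ("low", low)):
    curve = kaplan_meier([toy_times[i] for i in idx],
                         [toy_events[i] for i in idx])
    print(label, curve)
```

Comparing the two resulting curves for a significant difference would additionally require a test such as the log-rank test, which is not implemented in this sketch.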
Results
Screening of differentially expressed genes
Screening of the three GEO datasets (GSE79973, GSE118916, and GSE29998; Table 1) identified a total of 922 up-regulated genes and 1442 down-regulated genes in gastric adenocarcinoma compared to normal gastric tissues (Table 2). Venn diagrams demonstrated an overlap of 171 common differentially expressed genes in the three datasets (Figure 1).
Gene expression profile information of data set.
The number of up-regulated and down-regulated genes in the gene expression profile dataset.

Differential gene Venn plots common to all three GEO datasets. Identification of DEGs in mRNA expression profiling datasets GSE29998, GSE79973, GSE118916.
GO and KEGG pathway enrichment analysis of differentially expressed genes
We analyzed 171 differentially expressed genes. The differentially expressed genes were significantly enriched in the GO cellular component terms extracellular exosome, extracellular space, extracellular region, endoplasmic reticulum lumen, cell surface, apical plasma membrane, and integral component of the plasma membrane; in the biological process terms digestion, extracellular matrix organization, oxidation-reduction process, cell adhesion, and proteolysis; and in the molecular function terms zinc ion binding and identical protein binding (Figure 2). The genes were most significantly enriched in the KEGG pathways protein digestion and absorption, metabolism of xenobiotics by cytochrome P450, and drug metabolism–cytochrome P450 (Figure 3).

GO enrichment analysis. The results of GO enrichment analysis are shown in the figure.

KEGG Enrichment Analysis. A bubble diagram showing the enrichment of KEGG.
PPI network and hub gene identification
The PPI network based on the common differentially expressed genes included 171 nodes and 246 edges (Figure 4). The top 10 significant genes (p < 1.0e-16) were identified with the radiality method as follows (in order): ATP4A, TIMP1, SPP1, CXCL8, THY1, CA9, FGA, ALDH1A1, GHRL, and COL1A1 (Figure 5). Table 3 shows the specific scores of the hub genes in the network.

PPI network of differential genes. The interaction between up-regulated and down-regulated gene protein networks is shown in the figure.

Ten hub genes based on radiality. The top 10 hub genes obtained based on the radiality algorithm using Cytoscape software.
Hub gene sequencing.
Oncomine analysis
A search of the Oncomine database for mRNA expression levels of the 10 hub genes in different cancer types revealed that ATP4A, CA9, FGA, ALDH1A1, and GHRL are consistently expressed at low levels in gastric cancer, and TIMP1, SPP1, CXCL8, THY1, and COL1A1 are expressed at higher levels in gastric cancer (Figure 6). We found that these 10 genes also showed significantly high or low expression in other cancer tissues.

Expression levels of 10 hub genes in different cancers: (a) ATP4A, (b) FGA, (c) GHRL, (d) ALDH1A1, (e) CA9, (f) TIMP1, (g) SPP1, (h) CXCL8, (i) THY1, and (j) COL1A1.
GEPIA
The GEPIA results showed that ATP4A, CA9, FGA, ALDH1A1, and GHRL are expressed at lower levels in gastric adenocarcinoma tissues compared with normal gastric tissues, whereas the expression levels of TIMP1, SPP1, CXCL8, THY1, and COL1A1 are elevated in gastric adenocarcinoma. There was a significant difference in the expression levels of all hub genes between tissue types (p < 0.05; Figure 7).

Expression of 10 hub genes in gastric cancer: (a) ATP4A, (b) FGA, (c) GHRL, (d) ALDH1A1, (e) CA9, (f) TIMP1, (g) SPP1, (h) CXCL8, (i) THY1, and (j) COL1A1.
Survival analysis
Kaplan–Meier plotting results showed the effect of different genes on the prognosis of gastric cancer. We found that CXCL8 and ALDH1A1 were positively correlated with overall survival of gastric cancer patients, whereas the other hub genes (except for CA9) were negatively correlated with survival (Figure 8).

Survival analysis. Results: (a) ATP4A, (b) FGA, (c) GHRL, (d) ALDH1A1, (e) CA9, (f) TIMP1, (g) SPP1, (h) CXCL8, (i) THY1, and (j) COL1A1.
Discussion
Gastric cancer is a progressive disease characterized by accumulation and transformation involving multiple genes. 24 We identified 171 differentially expressed genes in gastric adenocarcinoma, which mainly play roles in digestion, extracellular body/extracellular space, and in protein digestion and absorption, exogenous substance metabolism-cytochrome P450, and drug metabolism-cytochrome P450 pathways. Cytochrome P450 can affect the metabolism of drugs and xenobiotics, and a variety of cytochrome P450 genes and proteins have been associated with susceptibility to gastric cancer or the biological function of the stomach.25,26 Our results therefore offer new insight for exploring the development process and detailed mechanism of gastric adenocarcinoma.
Oncomine analysis and GEPIA showed that the 10 hub genes in the PPI network exhibit significant differences in expression levels between gastric cancer tissues and normal tissues, suggesting a crucial role in the development of gastric adenocarcinoma. Among the up-regulated genes, TIMP1 encodes a protein that is an inhibitor of matrix metalloproteinases (MMPs), which regulates cell differentiation, migration, and anti-apoptosis pathways. 27 TIMP1 could promote or inhibit the progression of colorectal cancer, 28 and showed some predictive value in the N3 stage of gastric cancer. 29 The SPP1-encoded protein is mainly involved in the attachment of osteoclasts to the mineralized bone matrix, affects cell matrix interactions, and is overexpressed in gastric cancer, influencing cancer progression. 30 CXCL8 is a major mediator of the inflammatory response and encodes a protein that acts as a chemokine. CXC chemokines and their receptors affect tumor development by regulating the transformation, invasion, and metastasis of gastric cancer, and can also regulate angiogenesis and the interaction between tumors and the microenvironment to affect tumorigenesis. 31 CXCL8 expression has been correlated with the prognosis of patients with cancer. 32 THY1 encodes cell surface glycoproteins and immunoglobulins, which affect cell adhesion and cell communication in the immune and nervous systems, and was also found to promote the growth of gastric cancer cells. 33 Collagen can provide growth attachment and a scaffold for cancer cells and induces the migration of cancer cells. COL1A1 belongs to the collagen family and encodes procollagen type I chains to participate in collagen formation.34,35 COL1A1 has also been reported to be overexpressed in various cancers such as liver cancer, cervical cancer, and gastric cancer.36–38
Among the down-regulated genes, ATP4A encodes a protein belonging to the P-type cation-transporting ATPase family, which is a gastric proton pump whose function is to maintain an acidic environment in the stomach, and DNA methylation of ATP4A is involved in the down-regulation of its expression in association with gastric cancer prognosis. 39 CA9 encodes a carbonic anhydrase involved in various biological processes such as acid-base balance and gastric acid formation. CA9 was proposed as a key differentiation factor in the stomach that can control cell proliferation and growth in the gastric mucosa, and is associated with infiltration of gastric cancer. 40 FGA encodes the α subunit of the coagulation factor fibrinogen and is involved in fibrinogen production. 41 Fibrinogen contributes to the adhesion of tumor cells to endothelial cells of target tissues and promotes the connection between tumor cells and host cells, thereby increasing tumor invasiveness. Fibrinogen may allow tumor cells and platelets to bind into aggregates, thus allowing tumor cells to evade attack by the immune system.42–44 Moreover, FGA was reported to be associated with liver cancer, pancreatic cancer, and gastric cancer.45–47 ALDH1A1 encodes a protein of the aldehyde dehydrogenase family, which is an essential substance for the growth and differentiation of normal stem cells in tissues.48,49 ALDH1A1-overexpressing cancer cells showed higher invasive and metastatic activity, which can affect cancer prognosis. GHRL encodes the ghrelin protein, which induces growth hormone release from the pituitary gland and has the effects of stimulating appetite, inducing obesity, and stimulating gastric acid secretion. The loss of ghrelin-producing cells is thought to be associated with atrophic gastritis, and ghrelin in serum may be a surrogate for mucosal function in upper gastrointestinal cancer, 50 although its specific role in gastric cancer progression is unknown.
Analysis of the Oncomine database showed that these 10 genes also had significantly high or low expression in other cancer tissues, such as lung cancer, liver cancer, and breast cancer. This suggests a common role of these genes in influencing the pattern and process of the development of different cancers, and the mechanism needs to be further explored.
Nine of the ten hub genes (all except CA9) were associated with gastric cancer prognosis. In general, high expression levels of most genes in tumor samples are associated with poor prognosis. However, we found that the high expression of ALDH1A1 and CXCL8 in gastric cancer tissues was a favorable prognostic factor for patients, which can provide new insights into treatment and intervention strategies. Shen et al.51,52 also found a positive correlation between high expression of ALDH1A1 and the prognosis of gastric cancer patients. Yan et al. 53 and Qi et al. 54 also suggested that high expression of CXCL8 was positively correlated with the prognosis of gastric cancer patients. In addition, high expression of ALDH1A1 was identified as a positive prognostic factor for patients with hepatocellular carcinoma and primary glioblastoma.55,56 The regulatory mechanisms of various biological behaviors of tumor cells, and the interactions of various biomolecules may affect the ultimate clinical outcome of patients, especially the regulation of genes in the tumor microenvironment and the impact on the immune system. Such mechanisms might help to reveal how ALDH1A1 and CXCL8 improve the prognosis of gastric cancer patients, which requires further investigation. Extracellular matrix degradation is a key process in tumor invasion and metastasis, and MMPs are important enzymes that degrade the extracellular matrix and play a crucial role in mediating tumor angiogenesis, metastasis, and invasion. Although TIMP1 is an inhibitor of MMPs, its expression was found to be negatively correlated with the prognosis of gastric cancer, and this gene was also reported to promote the occurrence of colorectal cancer; 57 however, its precise role in gastric adenocarcinoma is still unclear. Overall, our analysis and previous studies suggest that these 10 hub genes may be involved in and affect the progression of gastric adenocarcinoma.
The main limitation of the study is that the results were obtained from bioinformatic analysis of public databases with no corresponding experiments for confirmation. Moreover, we focused on only 10 hub genes that met our screening conditions, but other genes might also have significance in the occurrence and development of gastric adenocarcinoma. The detailed prognostic and therapeutic roles of these genes require further investigation in large clinical samples to confirm our hypothesis.
Conclusions
In summary, using comprehensive bioinformatic analysis, ATP4A, TIMP1, SPP1, CXCL8, THY1, CA9, FGA, ALDH1A1, GHRL, and COL1A1 were found to be significantly differentially expressed in gastric cancer tissues and may be involved in important processes related to the development of gastric adenocarcinoma. Except for the CA9 gene, the remaining nine genes have some value in predicting the prognosis of gastric adenocarcinoma patients. The CA9 gene may also play an important role in the progression of gastric adenocarcinoma, but has not been as widely reported. Although CXCL8 and ALDH1A1 can promote tumor invasiveness, they showed a positive association with prognosis in gastric adenocarcinoma patients. Similarly, TIMP1 has been shown to inhibit tumor invasiveness, but was associated with a poor prognosis in gastric adenocarcinoma patients. Based on our analysis, these 10 genes are likely to be important components in the development of gastric adenocarcinoma, which can offer new research directions to improve the early diagnosis of gastric adenocarcinoma and the development of targeted therapy and drugs.
Footnotes
Acknowledgements
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Natural Science Foundation of China (No. 81770631).
