Abstract
In this study, we aimed to identify genes associated with a high risk of postoperative recurrence of lung adenocarcinoma and to investigate the possible mechanisms by which these genes contribute to recurrence. We also identified Hub genes and verified their expression levels and prognostic roles. The GSE40791, GSE31210, and GSE30219 datasets were obtained from the Gene Expression Omnibus database. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the screened candidate genes were performed using the DAVID database. A protein–protein interaction (PPI) network analysis was then performed using the STRING database. Hub genes were identified using Cytoscape software, and their expression levels were verified using the GEPIA database. Finally, we assessed the relationships between Hub gene expression levels and survival time. Forty-five candidate genes related to a high risk of lung adenocarcinoma recurrence were identified. GO analysis showed that these genes were enriched in the mitotic spindle assembly checkpoint, mitotic sister chromatid segregation, the G2/M transition of the mitotic cell cycle, and ATP binding, among other terms. KEGG analysis showed that these genes were involved predominantly in the cell cycle, the p53 signaling pathway, and oocyte meiosis. We identified the top ten Hub genes associated with a high risk of recurrence from the PPI network. High expression levels of eight of these genes (TOP2A, HMMR, MELK, MAD2L1, BUB1B, BUB1, RRM2, and CCNA2) were associated with shorter recurrence-free survival, suggesting that they may serve as biomarkers of a high risk of lung adenocarcinoma recurrence. These eight genes might provide novel insights into the mechanisms of lung adenocarcinoma recurrence and into the selection of therapeutic targets.
Keywords
Introduction
In 2018, the China Cancer Center reported that lung cancer ranked first in both incidence and mortality among all cancers in China. 1 Lung adenocarcinoma is the most common pathological type of non–small-cell lung cancer (NSCLC) and has a high incidence and mortality. Even patients with early-stage lung adenocarcinoma may experience postoperative recurrence, sometimes with distant metastasis. Currently, surgery is the standard treatment for patients with postoperative recurrence without distant metastasis. However, treatment options are limited, and the prognosis of advanced-stage patients with postoperative recurrence accompanied by metastasis is poor. Targeted therapy has emerged in recent years but is effective only in patients carrying sensitizing gene mutations. It is therefore important to comprehensively understand the mechanisms underlying the occurrence and development of lung adenocarcinoma, and to identify the molecular pathways and key genes involved in its recurrence, so that patients at high risk of recurrence can be identified and anti-tumor drugs can be developed. Gene chips (microarrays) are a reliable high-throughput technology that can rapidly measure the expression levels of many genes in tissues; the resulting data are stored in public databases and provide valuable clues for studying cancer-related genes. 2 In the present study, we identified genes related to a high risk of postoperative recurrence of lung adenocarcinoma by analyzing multiple lung adenocarcinoma gene-chip datasets in the Gene Expression Omnibus (GEO) database. We then used bioinformatics methods to investigate the possible mechanisms of recurrence, aiming to find new therapeutic targets for patients at high risk of recurrence.
Methods
Source of gene-chip data
We downloaded three gene-chip datasets (GSE40791, GSE31210, and GSE30219)3–5 from the GEO database (http://www.ncbi.nlm.nih.gov/geo). The GSE40791 dataset included 100 normal lung tissue samples and 94 lung adenocarcinoma tissue samples. The GSE31210 dataset included 20 normal lung tissue samples and 226 lung adenocarcinoma tissue samples. The GSE30219 dataset included 85 lung adenocarcinoma tissue samples. From this dataset, 22 patients with lung adenocarcinoma who had confirmed recurrence within 5 years of the initial diagnosis were selected as the high-recurrence risk group, and 44 patients who had no recurrence within 5 years of the initial diagnosis and a survival time greater than 5 years were selected as the low-recurrence risk group. All three datasets were generated on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array).
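The two GSE30219 groups are defined by a simple selection rule, which can be written out explicitly. The sketch below is only an illustration under stated assumptions: the `Patient` record and its fields (`relapsed`, `relapse_months`, `followup_months`) are hypothetical stand-ins for the dataset's clinical annotations, 5 years is taken as 60 months, and patients who fit neither rule (for example, those censored before 5 years without recurrence) are excluded from both groups.

```python
from dataclasses import dataclass
from typing import Optional

FIVE_YEARS_MONTHS = 60.0

@dataclass
class Patient:
    # Hypothetical fields standing in for the GSE30219 clinical annotations.
    sample_id: str
    relapsed: bool                    # recurrence observed during follow-up
    relapse_months: Optional[float]   # months from diagnosis to recurrence (None if none)
    followup_months: float            # months of survival/follow-up after diagnosis

def assign_risk_group(p: Patient) -> Optional[str]:
    """Return 'high', 'low', or None if the patient fits neither group."""
    relapsed_within_5y = (
        p.relapsed
        and p.relapse_months is not None
        and p.relapse_months <= FIVE_YEARS_MONTHS
    )
    if relapsed_within_5y:
        return "high"  # recurrence confirmed within 5 years of diagnosis
    if p.followup_months > FIVE_YEARS_MONTHS:
        return "low"   # no recurrence within 5 years and survival > 5 years
    return None        # e.g. censored before 5 years without recurrence: excluded
```

Applied to a list of records, `{p.sample_id: assign_risk_group(p) for p in patients}` would give the group label (or `None`) for each sample.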
Data processing
The three datasets were analyzed using the online tool GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r). 6 For the GSE40791 and GSE31210 datasets, differentially expressed genes (DEGs) between lung adenocarcinoma tissue and normal lung tissue were identified using the criteria |logFC| > 1.5 and adjusted p-value < 0.05. For the GSE30219 dataset, DEGs between the high- and low-recurrence risk groups were identified using the criteria |logFC| > 1 and adjusted p-value < 0.05. Volcano plots of the DEGs in each dataset were drawn using the ggplot2 package (http://cran.r-project.org/web/packages/ggplot2) in R software (version 3.6.0). A Venn diagram of the GSE40791, GSE31210, and GSE30219 DEGs was drawn using the online tool Venn Diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/), and the genes shared by all three datasets were taken as candidate genes related to a high risk of postoperative recurrence of lung adenocarcinoma.
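The filtering and overlap steps can be sketched as follows. This is a minimal illustration, not GEO2R itself: it assumes each dataset has already been reduced to a table mapping gene symbols to (logFC, raw p-value), and it applies the Benjamini–Hochberg false discovery rate correction, which we assume is the "adjusted p-value" used here (it is GEO2R's default adjustment). The function names `benjamini_hochberg`, `select_degs`, and `candidate_genes` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(table: Dict[str, Tuple[float, float]],
                lfc_cutoff: float, alpha: float = 0.05) -> Set[str]:
    """Keep genes with |logFC| > lfc_cutoff and adjusted p-value < alpha.

    `table` maps gene symbol -> (logFC, raw p-value)."""
    genes = list(table)
    adjusted = benjamini_hochberg([table[g][1] for g in genes])
    return {g for g, q in zip(genes, adjusted)
            if abs(table[g][0]) > lfc_cutoff and q < alpha}

def candidate_genes(*deg_sets: Set[str]) -> Set[str]:
    """Genes present in every DEG set (the central region of the Venn diagram)."""
    return set.intersection(*deg_sets) if deg_sets else set()
```

For example, the GSE40791 and GSE31210 tables would be filtered with `lfc_cutoff=1.5` and the GSE30219 table with `lfc_cutoff=1.0`, and the three resulting sets passed to `candidate_genes`. Like the Venn diagram, the intersection ignores the direction of change.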
Enrichment analysis of GO and KEGG pathways
Gene ontology (GO, http://geneontology.org) is a resource for identifying the biological characteristics of genes in high-throughput transcriptomic or genomic data, 7 covering three categories: biological process, cellular component, and molecular function. The Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.kegg.jp) is a collection of online databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. 8 To understand the biological functions of genes related to a high risk of postoperative recurrence of lung adenocarcinoma, we performed GO and KEGG pathway enrichment analyses using online bioinformatics tools, including the Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.ncifcrf.gov). 9 p < 0.05 was considered statistically significant.
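Enrichment tools such as DAVID test whether a GO term or KEGG pathway is over-represented among the candidate genes relative to a background gene set. As a minimal sketch of the underlying idea, the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) can be written as below. DAVID's own EASE score is a more conservative modification of Fisher's exact test, so its p-values will differ somewhat; the function and argument names here are hypothetical.

```python
from math import comb

def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for term over-representation.

    k: candidate genes annotated to the term
    n: number of candidate genes (the gene list)
    K: background genes annotated to the term
    N: number of background genes
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Probability of observing k or more annotated genes in a random list of size n.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

Here `n` would be the 45 candidate genes, while `K` and `N` depend on the annotation and background that DAVID uses, which this sketch does not reproduce.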
Analysis of the interactions between gene-encoded proteins and screening of Hub genes
We performed protein–protein interaction (PPI) network analysis of the candidate genes using the online database Search Tool for the Retrieval of Interacting Genes (STRING, http://string-db.org), 10 and the PPI network diagram was constructed using the Cytoscape 3.7.1 software package (www.cytoscape.org). 11 We then used the maximal clique centrality (MCC) algorithm in the cytoHubba plugin to score the candidate genes. All genes in the PPI network were ranked by score, and the ten highest-scoring genes were selected as Hub genes.
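The MCC ranking can be illustrated on a small network. As defined by the cytoHubba authors, a node's MCC score is the sum, over every maximal clique C that contains the node, of (|C| − 1)!; when a node's neighbours are not connected to one another, this reduces to the node's degree. The sketch below enumerates maximal cliques with the Bron–Kerbosch algorithm and ranks nodes accordingly. It is not cytoHubba's code: the edge-list input (for example, gene pairs exported from STRING at some confidence cutoff), the helper names, and the alphabetical tie-breaking are assumptions.

```python
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]

def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from an edge list (e.g. exported STRING interactions)."""
    graph: Graph = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def maximal_cliques(graph: Graph) -> List[Set[str]]:
    """Enumerate all maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques

def mcc_scores(graph: Graph) -> Dict[str, int]:
    """MCC(v) = sum of (|C| - 1)! over the maximal cliques C that contain v."""
    scores = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores

def top_hubs(graph: Graph, k: int = 10) -> List[str]:
    """Rank nodes by MCC score (ties broken alphabetically) and keep the top k."""
    scores = mcc_scores(graph)
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]
```

Because the factorial weight grows quickly with clique size, genes sitting in large, densely connected modules dominate the top of the ranking.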
Validation analysis of the relationship between the expression levels of Hub genes and the survival time
Using Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn), 12 which is based on RNA sequencing data from 9736 tumor specimens and 8687 normal controls in the Genotype-Tissue Expression project (GTEx, https://commonfund.nih.gov/gtex) and The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov), we verified the expression of the ten Hub genes in lung adenocarcinoma tissue and normal lung tissue. We also analyzed the relationship between the expression level of each of the ten genes and overall survival (OS). The relationship between Hub gene expression levels and recurrence-free survival (RFS) was analyzed using the online tool Kaplan–Meier plotter (http://kmplot.com/analysis).
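Both GEPIA and Kaplan–Meier plotter compare survival between high- and low-expression groups using Kaplan–Meier curves and a log-rank test. The sketch below shows that computation for a single gene under simplifying assumptions: patients are split at the median expression value (one of the cutoff options these tools offer; Kaplan–Meier plotter can also search for a best cutoff), and the (time, event, expression) records are hypothetical inputs. Hazard ratios and the tools' exact cutoff procedures are not reproduced.

```python
from math import erfc, sqrt
from statistics import median
from typing import List, Sequence, Tuple

Survival = Tuple[float, int]  # (time, event): event = 1 if death/recurrence observed, 0 if censored

def kaplan_meier(samples: Sequence[Survival]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate S(t), reported at each distinct event time."""
    data = sorted(samples)
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

def logrank_p(group_a: Sequence[Survival], group_b: Sequence[Survival]) -> float:
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    pooled = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for s, _, _ in pooled if s >= t)                     # total at risk
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)        # group A at risk
        d = sum(1 for s, e, _ in pooled if s == t and e)               # total events at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)  # group A events at t
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def split_by_median(records: Sequence[Tuple[float, int, float]]):
    """Split (time, event, expression) records into high/low groups at the median expression."""
    cutoff = median(r[2] for r in records)
    high = [(t, e) for t, e, x in records if x > cutoff]
    low = [(t, e) for t, e, x in records if x <= cutoff]
    return high, low
```

A typical use would be `high, low = split_by_median(records)` followed by `logrank_p(high, low)`, with `kaplan_meier(high)` and `kaplan_meier(low)` giving the two plotted curves.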
Results
Screening of candidate genes associated with a high risk of postoperative recurrence of lung adenocarcinoma
A total of 2937 DEGs were detected in the GSE40791 dataset using the online tool GEO2R, including 1005 upregulated genes (logFC > 0) and 1932 downregulated genes (logFC < 0). A total of 1566 DEGs were detected in the GSE31210 dataset, including 557 upregulated and 1009 downregulated genes. In the GSE30219 dataset, we detected a total of 129 DEGs associated with a high risk of postoperative recurrence of lung adenocarcinoma, including 114 upregulated and 15 downregulated genes.
Volcano plots of the DEGs in the three datasets are presented in Figure 1(a–c). Overlap analysis was performed between the lung adenocarcinoma DEGs and the DEGs associated with a high risk of lung adenocarcinoma recurrence. As shown in Figure 1(d), 45 DEGs were shared by the GSE40791, GSE31210, and GSE30219 datasets. These overlapping genes were taken as the candidate genes associated with a high risk of lung adenocarcinoma recurrence.

Volcano plots and Venn diagram of DEGs: (a–c) volcano plots of DEGs in the GSE40791, GSE31210, and GSE30219 datasets (red dots, upregulated genes; blue dots, downregulated genes) and (d) Venn diagram of DEGs from the GSE40791, GSE31210, and GSE30219 datasets (45 candidate genes associated with a high risk of recurrence after lung adenocarcinoma surgery).
Enrichment analysis of GO and KEGG pathways
We conducted GO and KEGG pathway enrichment analysis of the 45 candidate genes using the online tool DAVID. In terms of biological process, genes associated with a high risk of lung adenocarcinoma recurrence were mainly involved in the mitotic spindle assembly checkpoint, microtubule-based movement, mitotic sister chromatid segregation, mitotic cytokinesis, and the G2/M transition of the mitotic cell cycle. Cellular component analysis revealed that these genes were mainly enriched in the kinetochore, the midbody, the kinesin complex, the spindle midzone, and the spindle microtubule. The molecular functions of these genes included ATP binding, microtubule motor activity, and ATPase and protein serine/threonine/tyrosine kinase activities (Figure 2(a)). KEGG pathway analysis showed that these genes were predominantly associated with the cell cycle, the p53 signaling pathway, and oocyte meiosis (Figure 2(b) and Table 1).

Enrichment analysis of GO and KEGG pathways for genes associated with a high-recurrence risk of lung adenocarcinoma.
KEGG pathway analysis for genes associated with a high-recurrence risk of lung adenocarcinoma.
KEGG: Kyoto encyclopedia of genes and genomes.
PPI network analysis and screening of Hub genes
The 45 candidate genes were imported into the STRING online database for PPI network analysis. A PPI network diagram consisting of 41 nodes and 792 edges was then obtained using Cytoscape software (Figure 3). The top 10 Hub genes identified with the MCC algorithm in the cytoHubba plugin were, in order, TOP2A, HMMR, CDC20, MELK, MAD2L1, BUB1B, BUB1, RRM2, CCNA2, and ZWINT.

Protein–protein interaction network diagram of the proteins encoded by genes associated with a high risk of lung adenocarcinoma recurrence.
Verification of the expression of Hub genes in lung adenocarcinoma and normal lung tissues
The expression levels of the 10 genes in lung adenocarcinoma and normal lung tissues were verified using the GEPIA database. All 10 genes were significantly more highly expressed in lung adenocarcinoma tissue than in normal lung tissue (p < 0.05, Figure 4).

Expression levels of the Hub genes in the tumor and normal tissues.
Verification of the effects of Hub genes on the survival time
The relationship between the expression levels of the 10 genes and OS was analyzed using the GEPIA database. For each of the 10 genes, OS was significantly shorter in the high-expression group than in the low-expression group (p < 0.05, Figure 5).

Kaplan–Meier curves for the relationship between the expression of Hub genes and overall survival.
We then used Kaplan–Meier plotter to test whether RFS differed significantly between the high- and low-expression groups for each of the 10 Hub genes. For eight genes (TOP2A, HMMR, MELK, MAD2L1, BUB1B, BUB1, RRM2, and CCNA2), RFS was significantly shorter in the high-expression group than in the low-expression group (p < 0.05, Figure 6). These eight genes were therefore considered to be associated with a high risk of postoperative recurrence of lung adenocarcinoma.

Kaplan–Meier curve for the relationship between the expression of Hub genes and recurrence-free survival.
Discussion
In this study, 45 candidate genes were identified using bioinformatics methods; these genes were not only differentially expressed between normal lung tissue and lung adenocarcinoma tissue but were also associated with a high risk of recurrence. Mokhlesi and Talkhabi 13 conducted GO analysis of genes highly expressed in lung adenocarcinoma and found that these genes were partly involved in cell cycle-related biological functions, such as microtubule-based movement and ATP-dependent microtubule motor activity. In our study, GO analysis of the candidate genes showed that the biological processes mainly involved the mitotic spindle assembly checkpoint, microtubule-based movement, mitotic sister chromatid segregation, mitotic cytokinesis, and the G2/M transition of the mitotic cell cycle. Analysis of cellular components revealed that most of the genes were associated with the kinetochore, the midbody, the kinesin complex, and the spindle. The molecular functions mainly involved microtubule motor activity, ATP binding, and ATPase and protein serine/threonine/tyrosine kinase activities. Taken together, Mokhlesi's study and ours suggest that cell cycle-related biological functions play an important role in the pathogenesis of lung adenocarcinoma and in the mechanism of postoperative recurrence, which merits in-depth study. Here, KEGG pathways were enriched mainly in the cell cycle, the p53 signaling pathway, and oocyte meiosis. The relationship between the cell cycle and lung cancer has been studied extensively. For example, Fan et al. 14 found that the cell cycle and DNA-damage response pathways were associated with NSCLC accompanied by meningeal metastasis. Additionally, Qiu et al. 15 revealed that glycyrrhizin A blocked the G2/M transition of the cell cycle, thus inhibiting the proliferation of lung cancer cells. The cell cycle is also closely associated with tumor recurrence.
El-Gendi and Abu-Sheasha 16 found that the cell cycle regulatory factors p63 and cyclin D1 were significantly associated with recurrence of bladder urothelial carcinoma. Furthermore, Wang et al. 17 showed that RFS was shorter in patients with non–small-cell lung cancer with high expression of centromere protein U. A recent study 13 found that genes highly expressed in lung adenocarcinoma were involved in the p53 signaling pathway, and other researchers have reported that the p53 signaling pathway is related to apoptosis of lung cancer cells.18,19 It may therefore be speculated that p53 mutation or inhibition of its downstream target genes promotes tumor cell proliferation, accelerating the recurrence of lung cancer.
In this study, we identified 10 Hub genes associated with a high risk of lung adenocarcinoma recurrence. GEPIA analysis verified that all 10 genes were significantly overexpressed in lung adenocarcinoma tissue and that their expression was negatively associated with overall survival. Kaplan–Meier plotter confirmed that higher expression of eight of these genes was associated with shorter RFS, and these eight genes were therefore considered to be associated with a high risk of postoperative recurrence of lung adenocarcinoma. Among the eight genes, BUB1B, BUB1, MAD2L1, and CCNA2 are important cell cycle-related genes, involved mainly in cell cycle regulation, cell proliferation, invasion, and metastasis. Both BUB1B and BUB1 function in mitosis. Previous studies have associated high expression of these two genes with a poor prognosis in most malignant tumors, including lung cancer.20–23 Wan et al. 24 reported that knockout of BUB1B inhibited genome replication during the cell cycle, thereby inhibiting tumor growth in vivo. Nyati et al. 25 found that the kinase activity of BUB1 promotes TGF-β signal transduction, and the TGF-β signaling pathway has been shown to play important roles in tumor cell invasion and metastasis, as well as in disease recurrence and chemotherapeutic drug resistance.26,27 BUB1 has therefore been speculated to contribute to a high risk of malignant tumor recurrence. CCNA2 is critically involved in DNA synthesis and the G2/M transition of the cell cycle. Gan et al. 28 revealed that CCNA2 knockout inhibited colorectal cancer cell proliferation by suppressing the cell cycle and inducing apoptosis. Additionally, CCNA2 and Bcl-2 were found to act synergistically, 29 promoting lung cancer recurrence and reducing survival via cell proliferation and angiogenesis, which is consistent with our results.
MAD2L1 is an important component of the mitotic spindle assembly checkpoint, and dysregulation of its activity has been associated with mitotic chromosome instability and aneuploidy, increasing the risk of tumorigenesis. 30 Moreover, Wang et al. 31 showed that siRNA-mediated downregulation of MAD2L1 reduced the proliferation of breast cancer cells and inhibited their metastasis and invasion. Nonetheless, the association between MAD2L1 and lung cancer pathogenesis has been insufficiently investigated and warrants focused research in the future.
The other four genes associated with a high risk of lung adenocarcinoma recurrence were TOP2A, HMMR, MELK, and RRM2. Kodiakov et al. 32 revealed that TOP2A expression was positively correlated with the Ki-67 proliferation index. In that study, patients with high TOP2A expression had a large tumor diameter, poor differentiation, and a poor prognosis. HMMR is a proto-oncogene whose product can bind to actin microfilaments and microtubules in the cytoskeleton, enhancing cell migration. Stevens et al. 33 experimentally verified that HMMR enhanced extracellular matrix-mediated signal transduction, promoting the invasion and metastasis of lung adenocarcinoma cells. The association of MELK with the pathogenesis of lung adenocarcinoma has been studied in depth. MELK interacts mainly with proteins such as p53, FOXM1, and Bcl-2 and is involved in cell proliferation, apoptosis, and cell cycle regulation.34–36 Anti-cancer drugs targeting MELK are currently under development. By contrast, the involvement of RRM2 in the occurrence and development of lung cancer remains unclear. Earlier studies found that high RRM2 expression is associated with chemotherapeutic drug resistance and tumor cell invasion,37,38 but the specific mechanisms require further investigation.
In summary, our study used bioinformatics methods to identify eight genes related to a high risk of postoperative recurrence of lung adenocarcinoma. We further examined the biological functions of these genes and their possible involvement in the occurrence and development of lung adenocarcinoma. Our findings may provide novel insights into the pathogenesis of lung adenocarcinoma in patients at high risk of postoperative recurrence. However, further experiments and analyses are needed to confirm our conclusions. Further research on upstream genes or transcription factors related to a high risk of lung adenocarcinoma recurrence, such as non-coding RNAs, will help identify relevant signaling pathways and develop new treatment methods.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by grants from the Startup Fund for Scientific Research of Fujian Medical University (2017XQ1132, 2018QH1116) and Pilot Project of Fujian Science and Technology Program (2017Y0016).
