Recurrence significantly influences survival in patients with lung adenocarcinoma (LUAD). However, few gene signatures are available to predict the recurrence risk of LUAD.
OBJECTIVE:
We performed this study to construct a model to predict risk of recurrence in LUAD.
METHODS:
RNA-seq data from 426 patients with LUAD were downloaded from The Cancer Genome Atlas (TCGA), and the patients were randomly assigned to the training set (n = 213) and the validation set (n = 213). Differentially expressed genes (DEGs) between recurrent and non-recurrent tumors in the training set were identified. Recurrence-associated DEGs were selected using multivariate Cox regression analysis. A recurrence risk model that stratifies patients into low and high risk of recurrence was constructed, and its performance was then validated in the validation set and in a microarray dataset.
RESULTS:
In total, 378 DEGs, including 20 recurrence-associated DEGs, were identified between the recurrent and non-recurrent tumors in the training set. The signatures of 8 genes (including AZGP1, INPP5J, MYBPH, SPIB, GUCA2A, HTR1B, SLC15A1 and TNFSF11) were used to construct the prognostic model to assess the risk of recurrence. This model indicated that patients with high risk scores had shorter recurrence-free survival time compared with patients with low risk scores. ROC curve analysis of this model showed it had high predictive accuracy (AUC 0.8) to predict LUAD recurrence in the TCGA cohort (the training and validation sets) and GSE50081 dataset. This prognostic model showed high predictive power and performance in predicting recurrence in LUAD.
CONCLUSION:
We conclude that this model might be of great value for evaluating the risk of recurrence of LUAD in clinical practice.
Lung cancer, the most commonly diagnosed cancer worldwide, accounted for 11.6% of the total cancer cases and 18.4% of the total cancer-related deaths in 2018 [1]. The prognosis of lung cancer remains poor, and the 5-year survival rate is less than 20% [2]. The 5-year postoperative recurrence rate in non-small cell lung cancer (NSCLC) ranges from 19.3% to 50% [3, 4, 5]. Adenocarcinoma is the predominant histologic type of lung cancer and is a leading cause of cancer death worldwide [6]. The 5-year postoperative recurrence rate in LUAD is 17%–40% [7, 8, 9]. The poor prognosis and high death rate of this disease depend, to a great extent, on postoperative recurrence and aggressive metastasis.
NSCLC recurrence has been identified to be independently influenced by the clinical characteristics like age, histologic type, invasion and tumor size [3]. However, histopathology and clinical indicators are far from sufficient to predict clinical outcome and risk of recurrence in lung cancer.
Molecular markers have been incorporated into recurrence identification, prediction and clinical decision-making in LUAD [10, 11, 12]. Global changes in gene expression are a common cause of tumorigenesis and can predict postoperative prognosis and recurrence in LUAD [6, 13, 14]. Overexpression of the metastasis-associated in colon cancer 1 gene (MACC1) was reported to be associated with postoperative recurrence in LUAD [15]. Larsen et al. identified a 54-gene signature with the potential to predict the risk of recurrence in LUAD [13]. The cell cycle progression score (CCP score), which consists of 31 cell cycle genes, has been used for predicting disease response, mortality, survival and recurrence in LUAD [16, 17]. In addition, the combination of the CCP score with the molecular prognostic score (mPS) performed better in predicting distant and local recurrence in LUAD than the CCP score alone [17]. However, this strategy includes too many genes to be widely implemented in preclinical or clinical practice. A signature whose genes function in a single molecular process might be insufficient to capture cancer development and metastasis. A signature with a moderate number of genes that relate to multiple functions might therefore be of great interest for predicting recurrence in human cancers.
We performed this study to identify a gene signature with prognostic power for predicting the risk of recurrence in LUAD following resection. Differentially expressed genes (DEGs) associated with recurrence were identified in a LUAD cohort from The Cancer Genome Atlas (TCGA). An 8-gene model with high predictive power for the risk of recurrence in LUAD was identified and validated. The potential of this prognostic model for predicting postoperative recurrence in LUAD is discussed.
Materials and methods
Study population and assignment
The available RNA-seq data and clinical information of 426 evaluable patients with LUAD were downloaded from TCGA (https://gdc-portal.nci.nih.gov/) in March 2019. The patients (n = 426) were randomly divided into two sets: the training set (n = 213) and the validation set (n = 213).
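The random assignment described above can be illustrated with a minimal Python sketch. The patient identifiers and the seed value are hypothetical placeholders, and the source does not state which tool or seed was actually used for the split.

```python
import random


def split_cohort(patient_ids, seed=0):
    """Shuffle patient IDs reproducibly and split them into two halves."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility only
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical identifiers standing in for the 426 TCGA patients.
patients = [f"patient_{i:03d}" for i in range(426)]
training_set, validation_set = split_cohort(patients, seed=2019)
print(len(training_set), len(validation_set))
```

Fixing the seed makes the partition repeatable, which matters because the DEGs and the model coefficients are derived from the training half only.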
DEGs identification
The DEGs between the samples with (n = 76) and without (n = 137) recurrence in the training set were identified using the Limma package (version 3.34.7; https://bioconductor.org/packages/release/bioc/html/limma.html) [18]. A DEG was considered statistically significant if its false discovery rate (FDR) was < 0.05 and its fold-change (FC) was > 1.2 (|log2 FC| > 0.263). The expression profiles of all DEGs were presented as a heatmap (pheatmap package, version 1.0.8; https://cran.r-project.org/web/packages/pheatmap/index.html) [19].
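As an illustration of the DEG criteria, the following Python sketch filters a small result table, assuming the thresholds are FDR < 0.05 and an absolute log2 fold-change above log2(1.2), which is about 0.263. The gene names and values are hypothetical; the study itself used the Limma package in R.

```python
import math

FDR_CUTOFF = 0.05
LOG2FC_CUTOFF = math.log2(1.2)  # fold-change of 1.2 on the log2 scale, about 0.263


def classify(log2_fc, fdr):
    """Label a gene as 'up', 'down' or 'not significant' under the assumed thresholds."""
    if fdr >= FDR_CUTOFF or abs(log2_fc) <= LOG2FC_CUTOFF:
        return "not significant"
    return "up" if log2_fc > 0 else "down"


# Hypothetical limma-style output: gene -> (log2 fold-change, FDR).
results = {
    "GENE_A": (0.40, 0.01),   # passes both thresholds, up-regulated
    "GENE_B": (-0.35, 0.03),  # passes both thresholds, down-regulated
    "GENE_C": (0.10, 0.001),  # significant FDR but fold-change too small
    "GENE_D": (0.90, 0.20),   # large fold-change but FDR too high
}
calls = {gene: classify(lfc, fdr) for gene, (lfc, fdr) in results.items()}
print(calls)
```

Both conditions must hold at once, so a gene with a very small FDR is still excluded when its fold-change is below the cut-off.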
Identification of recurrence-associated DEGs
The recurrence-associated DEGs were identified using univariate and multivariate Cox regression analyses (survival package, version 2.41-1; http://bioconductor.org/packages/survivalr/) [20]. A Kaplan-Meier log-rank test P-value < 0.05 was considered as the threshold.
Selection of the optimal combination of DEGs
The DEGs used for the construction of the model were selected from the recurrence-associated DEGs using the LASSO Cox regression model in the penalized package (version 0.9.50; https://cran.r-project.org/web/packages/penalized/index.html) [21]. The ‘lambda’ parameter (Cox-PH model) of the optimal combination of genes was obtained using the cross-validation likelihood algorithm (1000 times). The regression coefficients (β) relating each recurrence-associated DEG to recurrence were obtained.
Establishment of the recurrence risk model
The expression levels of the recurrence-associated DEGs in the aforementioned optimal combination were used to determine optimal cut-off values with the X-Tile Bio-Informatics tool (version 2.41.3; https://medicine.yale.edu/lab/rimm/research/software.aspx) [22]. A Monte Carlo P-value < 0.05 was used as the threshold. The expression status of each recurrence-associated DEG was defined as 1 when the expression level was higher than the cut-off value, and as 0 otherwise. The recurrence risk score of each sample was defined as the sum of the expression statuses weighted by the LASSO Cox β values, and calculated as: $\text{risk score} = \sum_{i} \beta_{i} \times \text{Status}_{i}$, with $\text{Status}_{i} \in \{0, 1\}$. Samples with risk scores higher than the median value were assigned to the high-risk group, and the remaining samples to the low-risk group.
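The scoring and stratification steps can be sketched in Python as follows. This is a minimal illustration of a weighted sum of binary expression statuses followed by a median split; the gene names, cut-off values and coefficients are hypothetical and are not the values fitted in this study.

```python
from statistics import median


def expression_status(expression, cutoff):
    """Return 1 when expression exceeds the gene's cut-off, otherwise 0."""
    return 1 if expression > cutoff else 0


def risk_score(expression, cutoffs, coefficients):
    """Sum of coefficient x binary status over all genes in the signature."""
    return sum(
        coefficients[gene] * expression_status(expression[gene], cutoffs[gene])
        for gene in coefficients
    )


def stratify(scores):
    """Assign 'high' to samples scoring above the median, 'low' otherwise."""
    cut = median(scores.values())
    return {sample: "high" if s > cut else "low" for sample, s in scores.items()}


# Hypothetical coefficients, cut-offs and expression values (illustration only).
coefficients = {"GENE_A": -0.2, "GENE_B": 0.1, "GENE_C": 0.3}
cutoffs = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 1.0}
samples = {
    "S1": {"GENE_A": 6.0, "GENE_B": 1.0, "GENE_C": 0.5},
    "S2": {"GENE_A": 4.0, "GENE_B": 3.0, "GENE_C": 2.0},
    "S3": {"GENE_A": 4.0, "GENE_B": 3.0, "GENE_C": 0.5},
    "S4": {"GENE_A": 6.0, "GENE_B": 3.0, "GENE_C": 2.0},
}
scores = {name: risk_score(expr, cutoffs, coefficients) for name, expr in samples.items()}
groups = stratify(scores)
print(scores, groups)
```

Because each gene contributes either its full coefficient or nothing, the score depends only on which side of its cut-off each gene falls, not on the magnitude of expression.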
Table 1. Baseline characteristics of patients with LUAD

| Clinical characteristics | Training set (n = 213) | Validation set (n = 213) | Entire set (n = 426) |
|---|---|---|---|
| Age (mean ± SD) | 65.12 ± 10.07 | 65.30 ± 10.24 | 65.21 ± 10.15 |
| Gender (Male/Female) | 105/108 | 89/124 | 194/232 |
| Pathologic M (M0/M1/NA) | 139/13/61 | 135/4/74 | 274/17/135 |
| Pathologic N (N0/N1/N2/N3/NA) | 144/33/31/0/5 | 137/45/24/2/5 | 281/78/55/2/10 |
| Pathologic T (T1/T2/T3/T4/NA) | 68/113/23/7/2 | 82/113/12/5/1 | 150/226/35/12/3 |
| Pathologic stage (I/II/III/IV/-) | 109/52/37/13/2 | 125/51/27/5/5 | 234/103/64/18/7 |
| Radiation therapy (Yes/No/NA) | 28/175/10 | 28/178/7 | 56/353/17 |
| Targeted molecular therapy (Yes/No/NA) | 69/133/11 | 69/135/9 | 138/268/20 |
| Smoking history (Current/Reformed/Never/NA) | 23/59/9/122 | 17/48/13/135 | 40/107/22/257 |
| Recurrence (Yes/No) | 76/137 | 75/138 | 151/275 |
| Recurrence-free survival time (months, mean ± SD) | 26.21 ± 26.64 | 26.35 ± 27.47 | 26.78 ± 27.03 |
| Death (Dead/Alive) | 69/144 | 70/143 | 139/287 |
| Overall survival time (months, mean ± SD) | 29.92 ± 28.08 | 30.16 ± 28.42 | 30.04 ± 28.22 |
| Follow-up (months, median and range) | 18.93 (0.34–224.40) | 18.93 (0.13–235.40) | 18.93 (0.13–235.40) |

SD, standard deviation. NA, not applicable.
Evaluation and assessment of the prognostic model
The Kaplan-Meier log-rank test was used to evaluate the survival difference between the high- and low-risk groups. A P-value < 0.05 was considered significant. Additionally, the areas under the ROC curve (AUC) in the training and validation sets were calculated to evaluate the performance of this model, together with the sensitivity and specificity. A third dataset, the microarray dataset GSE50081, which consists of samples from 124 LUAD patients, was downloaded from the NCBI’s Gene Expression Omnibus (GEO) and used to validate the predictive performance of this model.
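For readers unfamiliar with these metrics, the sketch below computes an AUC, a sensitivity and a specificity in plain Python for a binary recurrence label. It is a simplification: the scores and labels are hypothetical, and censoring is ignored, whereas the 3-year and 5-year AUCs reported in this study are time-dependent quantities.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, threshold):
    """Call a sample positive when its score exceeds the threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores and recurrence labels (1 = recurrence).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(auc, sens, spec)
```

The AUC summarises ranking quality over all thresholds, while sensitivity and specificity describe a single chosen cut-off, which is why both are reported.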
Identification of characteristics that associate with LUAD recurrence
The clinical characteristics associated with recurrence in LUAD were identified using univariate and multivariate Cox regression analyses (survival package). A Kaplan-Meier log-rank test P-value < 0.05 was considered as the threshold. The 95% confidence interval (CI) of each clinical characteristic was evaluated. Stratification analyses were performed for the training and validation sets.
Identification and enrichment of the DEGs between patients with high and low risk of recurrence in LUAD
To investigate the differences between patients with high and low recurrence risk scores, the DEGs between the two groups were identified using the Limma package (version 3.34.7) with the criteria of FDR < 0.05 and |log2 FC| > 0.263. The Gene Ontology biological processes and pathways associated with the DEGs were identified using clusterProfiler (version 3.6.0; http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) [23]. P < 0.05 was considered statistically significant.
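Over-representation analysis of this kind is commonly based on a hypergeometric test. The Python sketch below shows that calculation under this assumption; the function name and all counts are hypothetical, and it is not a reproduction of the clusterProfiler implementation.

```python
from math import comb


def hypergeom_enrichment_p(background, in_term, n_degs, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(background, in_term, n_degs).

    background: number of genes in the background universe
    in_term:    number of background genes annotated to the term
    n_degs:     number of DEGs drawn from the background
    overlap:    number of DEGs annotated to the term
    """
    upper = min(in_term, n_degs)
    favourable = sum(
        comb(in_term, i) * comb(background - in_term, n_degs - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, n_degs)


# Small worked case: 10 genes, 4 in the term, 3 DEGs, at least 2 in the term.
p_small = hypergeom_enrichment_p(10, 4, 3, 2)

# Hypothetical genome-scale counts (illustration only).
p_large = hypergeom_enrichment_p(20000, 300, 616, 28)
print(p_small, p_large)
```

In practice the resulting P values would also be adjusted for the number of terms tested.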
Statistical analysis
Statistical analysis was performed using SPSS 22.0 software. Differences in continuous variables (patient age and survival time) were analyzed using the t-test, and differences in categorical variables were analyzed using the χ² test. P < 0.05 was regarded as statistically significant.
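As a worked illustration of a test for categorical variables, the Python sketch below applies a Pearson chi-square test to a 2 × 2 table, using the death counts reported for the high- and low-risk groups of GSE50081 (33/29 and 16/46, Dead/Alive). Whether SPSS applied a continuity correction in the original analysis is not stated, so both variants are computed; the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns the statistic and the P value for 1 degree of freedom.
    With yates=True, the Yates continuity correction is applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Death (Dead/Alive): high-risk group 33/29, low-risk group 16/46.
chi2_plain, p_plain = chi_square_2x2(33, 29, 16, 46)
chi2_yates, p_yates = chi_square_2x2(33, 29, 16, 46, yates=True)
print(chi2_plain, p_plain)
print(chi2_yates, p_yates)
```

Both variants give a P value well below 0.05 for these counts, in line with the significant difference in death rate reported for the two groups.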
Results
Identification of LUAD recurrence-associated DEGs
The baseline characteristics of the 426 patients with LUAD are shown in Table 1. The mean age was 65.21 ± 10.15 years, and the male-to-female ratio was 1:1.19. The cohort comprised 234, 103, 64 and 18 cases with stage I, II, III and IV tumors, respectively, and 7 cases with unreported stage. In the training set (n = 213), a total of 378 DEGs (279 down-regulated and 99 up-regulated genes) were identified in patients with recurrent LUAD (n = 76) compared with patients with non-recurrent LUAD (n = 137; Fig. 1A). The distinct expression profiles of these DEGs are presented as a heatmap (Fig. 1B).
Fig. 1. Identification and statistics of the LUAD recurrence-associated differentially expressed genes (DEGs) in the training set (n = 213). (A) Volcano plot of the DEGs between patients with and without recurrent LUAD in the training set (n = 213). Green notes the significantly up-regulated and down-regulated genes, respectively. DEGs were defined by a false discovery rate < 0.05 and |log2 (fold-change)| > 0.263. (B) Heatmap depicting the two-way hierarchical clustering of the DEGs in the training set. The color scale indicates up-regulation (red) or down-regulation (green) of expression. Rec, recurrent LUAD samples (n = 76); Non-rec, LUAD samples without recurrence.
The prognosis and recurrence characteristics of the 213 patients in the training set were used to identify recurrence-associated DEGs. Univariate Cox regression analysis showed that 155 DEGs were associated with recurrence in LUAD, and the subsequent multivariate Cox regression analysis retained 20 of these DEGs.
Construction and assessment of the recurrence risk model
The optimal combination of DEGs (n = 8) was identified in the training set using the LASSO Cox regression model (Table 2). The expression of 4 DEGs, namely the genes encoding alpha-2-glycoprotein 1, zinc-binding (AZGP1), inositol polyphosphate 5-phosphatase PIPP (INPP5J), myosin binding protein H (MYBPH) and SPIB, was negatively correlated with recurrence in LUAD, whereas the expression of the other 4 genes, namely those encoding guanylin (GUCA2A), serotonin receptor 1B (HTR1B), peptide transporter 1 (SLC15A1) and the cytokine receptor activator of NF-κB ligand (RANKL; encoded by TNFSF11), was positively correlated with recurrence.
Table 2. The 8 LUAD prognosis-associated differentially expressed genes in the optimal group by LASSO Cox regression model

Table 3. The differences in the baseline characteristics of patients with high and low risk scores of LUAD in the dataset GSE50081

| Clinical characteristics | High (n = 62) | Low (n = 62) | P value |
|---|---|---|---|
| Age (mean ± SD) | 69.91 ± 8.93 | 67.36 ± 10.48 | 0.147 |
| Gender (Male/Female) | 35/27 | 29/33 | 0.28 |
| Pathologic M (M0/M1/NA) | 0/62/0 | 0/62/0 | NA |
| Pathologic N (N0/N1/N2/N3/NA) | 43/19/0/0/0 | 49/13/0/0/0 | 0.305 |
| Pathologic T (T1/T2/T3/T4/NA) | 17/44/1/0/0 | 24/37/1/0/0 | 0.252 |
| Pathologic stage (I/II/III/IV/-) | 41/21/0/0/0 | 48/14/0/0/0 | 0.231 |
| Smoking history (Current/Reformed/Never/NA) | 18/28/9/7 | 17/27/14/4 | 0.841 |
| Recurrence (Yes/No) | 22/40 | 13/49 | 0.110 |
| Death (Dead/Alive) | 33/29 | 16/46 | 0.003 |
| Overall survival time (months, mean ± SD) | 39.12 ± 25.88 | 47.69 ± 28.00 | 0.079 |

Differences were analyzed using the t-test (continuous variables) and the χ² test (categorical variables). NA, not available.
Fig. 2. The construction, validation and accuracy of the recurrence risk model in the training set (A), validation set (B), and entire cohort (C). The distribution of sample risk scores (left), Kaplan-Meier survival analysis (middle) and the predictive accuracy (ROC curve; right) in the three sets of patients with LUAD. HR, hazard ratio. In the training set, the area under the ROC curve (AUC) for 3-year and 5-year survival was 0.97 and 0.94, with a sensitivity of 83.96% and 77.20% and a specificity of 72.86% and 88.79%, respectively. In the validation set, the 3-year and 5-year AUC was 0.93 and 0.86, with a sensitivity of 82.72% and 92.02% and a specificity of 67.85% and 64.77%, respectively. In the entire set, the 3-year and 5-year AUC was 0.96 and 0.92, with a sensitivity of 84.41% and 88.90% and a specificity of 70.67% and 79.23%, respectively.
Fig. 3. Validation of the prognostic power of the recurrence risk model. The GSE50081 dataset, which consists of 124 patients with LUAD, was used for the validation. The area under the ROC curve (AUC) for 3-year and 5-year survival was 0.75 and 0.62, with a sensitivity of 69.35% and 79.03% and a specificity of 80.65% and 48.39%, respectively.
Fig. 4. Survival analysis by pathologic stage of LUAD patients in the training set (A), validation set (B) and the entire cohort (C). Higher pathologic stage was associated with shorter recurrence-free survival time.
The recurrence risk model for predicting recurrence in LUAD was constructed as a linear combination of the 8 DEGs, $\text{risk score} = \sum_{i=1}^{8} \beta_{i} \times \text{Status}_{i}$, using the LASSO Cox coefficients listed in Table 2 (coefficient values: 0.07960, 0.23581, 0.02453, 0.17848, 0.11699, 0.01756, 0.13153 and 0.02540). The risk score of each sample was calculated, and 106 and 107 samples in the training set were assigned to the low- and high-risk groups, respectively, according to the median score (Fig. 2A). Survival analysis showed that patients in the high-risk group had a shorter recurrence-free survival (RFS) time than patients in the low-risk group ( 8.422 10, HR 3.30, 95% CI 1.99–5.45). ROC curve analysis showed that the 3-year and 5-year AUC of the 8-gene recurrence risk model was 0.97 and 0.94, with a sensitivity of 83.96% and 77.20% and a specificity of 72.86% and 88.79%, respectively. In the validation set (n = 213; Fig. 2B) and the entire cohort (n = 426; Fig. 2C), patients with higher risk scores showed a shorter RFS time than patients with lower risk scores (validation set: P = 0.038, HR 1.63, 95% CI 1.02–2.60; entire cohort: 6.932 10, HR 2.31, 95% CI 1.64–3.24; Fig. 2). The ROC curve analyses showed the high accuracy of the recurrence risk model in predicting recurrence in LUAD in the validation set and the entire cohort (3-year AUC 0.93 and 0.96, 5-year AUC 0.86 and 0.92, respectively). In addition, the performance of this model in predicting recurrence in LUAD was validated in a third dataset, GSE50081 (Fig. 3). The basic characteristics of the subjects in GSE50081 are presented in Table 3. There was a significant difference in death rate between patients with high and low risk scores (P = 0.003, Table 3). No difference was found in the recurrence rate or overall survival time.
In GSE50081, patients with higher risk scores showed a clearly shorter RFS time than patients with low risk scores (P = 0.044, HR 1.97, 95% CI 1.01–3.83), with a 3-year and 5-year AUC of 0.75 and 0.62, respectively (sensitivity: 69.35% and 79.03%; specificity: 80.65% and 48.39%, respectively). These results suggested the high predictive accuracy of the recurrence risk model for predicting the risk of recurrence in LUAD.
Table 4. The predictive values of the clinical characteristics for LUAD recurrence in the training set, validation set and entire cohort

| Clinical characteristics | Univariable Cox HR (95% CI) | P value | Multivariable Cox HR (95% CI) | P value |
|---|---|---|---|---|
| Training set (n = 213) | | | | |
| Age (mean ± SD) | 1.01 (0.99–1.03) | 0.501 | – | – |
| Gender (Male/Female) | 1.20 (0.77–1.89) | 0.425 | – | – |
| Pathologic M (M0/M1) | 1.46 (0.57–3.69) | 0.426 | – | – |
| Pathologic N (N0/N1/N2/N3) | 1.32 (0.99–1.76) | 0.068 | – | – |
| Pathologic T (T1/T2/T3/T4) | 1.57 (1.17–2.11) | 2.571 10 | 1.17 (0.84–1.63) | 0.343 |
| Pathologic stage (I/II/III/IV) | 1.43 (1.15–1.79) | 2.314 10 | 1.27 (1.10–1.67) | 0.020 |
| Radiation therapy (Yes/No) | 1.84 (1.05–3.22) | 0.030 | 1.22 (0.66–2.27) | 0.526 |
| Targeted molecular therapy (Yes/No) | 1.16 (0.71–1.90) | 0.553 | – | – |
| Smoking history (Current/Reformed/Never) | 1.34 (0.80–2.24) | 0.261 | – | – |
| Risk score (High/Low) | 3.30 (1.99–5.45) | 8.422 10 | 2.90 (1.69–4.99) | 1.120 10 |
| Validation set (n = 213) | | | | |
| Age (mean ± SD) | 1.01 (0.99–1.04) | 0.261 | – | – |
| Gender (Male/Female) | 0.65 (0.40–1.05) | 0.761 | – | – |
| Pathologic M (M0/M1) | 1.39 (0.34–5.75) | 0.663 | – | – |
| Pathologic N (N0/N1/N2/N3) | 1.37 (1.04–1.82) | 0.054 | – | – |
| Pathologic T (T1/T2/T3/T4) | 1.60 (1.15–2.23) | 5.254 10 | 1.49 (0.93–2.16) | 0.352 |
| Pathologic stage (I/II/III/IV) | 1.32 (1.02–1.71) | 0.035 | 1.98 (1.09–3.60) | 0.024 |
| Radiation therapy (Yes/No) | 2.12 (1.21–3.71) | 7.182 10 | 1.17 (0.87–1.57) | 0.292 |
| Targeted molecular therapy (Yes/No) | 0.95 (0.58–1.59) | 0.856 | – | – |
| Smoking history (Current/Reformed/Never) | 0.83 (0.51–1.37) | 0.476 | – | – |
| Risk score (High/Low) | 1.63 (1.02–2.60) | 0.038 | 1.31 (1.19–2.16) | 3.207 10 |
| Entire cohort (n = 426) | | | | |
| Age (mean ± SD) | 1.01 (0.99–1.03) | 0.223 | – | – |
| Gender (Male/Female) | 0.90 (0.65–1.24) | 0.519 | – | – |
| Pathologic M (M0/M1) | 1.41 (0.65–3.05) | 0.361 | – | – |
| Pathologic N (N0/N1/N2/N3) | 1.34 (1.10–1.63) | 3.741 10 | 0.95 (0.70–1.28) | 0.724 |
| Pathologic T (T1/T2/T3/T4) | 1.58 (1.27–1.96) | 3.917 10 | 1.26 (0.91–1.62) | 0.544 |
| Pathologic stage (I/II/III/IV) | 1.38 (1.17–1.63) | 1.396 10 | 1.59 (1.04–2.43) | 0.033 |
| Radiation therapy (Yes/No) | 1.97 (1.33–2.93) | 5.776 10 | 1.23 (0.93–1.62) | 0.140 |
| Targeted molecular therapy (Yes/No) | 1.06 (0.75–1.50) | 0.725 | – | – |
| Smoking history (Current/Reformed/Never) | 1.06 (0.747–1.512) | 0.736 | – | – |
| Risk score (High/Low) | 2.31 (1.642–3.237) | 6.932 10 | 1.98 (1.38–2.85) | 2.230 10 |
Fig. 5. Identification and statistics of the differentially expressed genes (DEGs) between patients with high and low risk of recurrence in LUAD. (A) Volcano plot of the DEGs between patients with low and high LUAD recurrence risk in the entire set (n = 426). Blue indicates the significantly up-regulated and down-regulated genes, respectively. DEGs were defined by a false discovery rate < 0.05 and |log2 (fold-change)| > 0.263. (B) Venn diagram of the common DEGs between the comparisons of recurrent vs. non-recurrent samples and high-risk group vs. low-risk group. (C) Two-way hierarchical clustering of the DEGs. Samples are ordered from low to high risk scores. The color scale indicates up-regulation (red) or down-regulation (green) of expression.
Independent predictors of recurrence risk
The stepwise Cox regression analysis showed that pathologic stage was an independent risk factor for LUAD recurrence in all three sets (Table 4). Survival analysis showed that patients with higher pathologic stage (III–IV) had a shorter RFS time than patients with lower pathologic stage (I–II; Kaplan-Meier log-rank test P-value < 0.05, Fig. 4A–C).
Table 5. The pathways associated with the differentially expressed genes (DEGs) between patients with high and low recurrence risk

| Term | Count | P value | Genes |
|---|---|---|---|
| hsa04080: Neuroactive ligand-receptor interaction | 28 | 1.702 | DRD2, HTR1B, NPFFR2, NPFFR1, HTR1D, NTSR1… |
| hsa00980: Metabolism of xenobiotics by cytochrome P450 | 9 | 5.128 | GSTA3, ADH4, UGT2B11, ADH7, AKR1C1… |
| hsa04610: Complement and coagulation cascades | 8 | 5.609 | F11, FGB, SERPINA5, F2, SERPIND1… |
| hsa04060: Cytokine-cytokine receptor interaction | 14 | 6.766 | IL11, CCR6, CCL14, TNFSF11, CXCR5, IL22RA2… |
| hsa00830: Retinol metabolism | 5 | 8.887 | CYP2A13, ADH4, UGT2B11, UGT2B4, ADH7 |
| hsa00380: Tryptophan metabolism | 4 | 1.299 | KYNU, OGDHL, ACMSD, IDO2 |
| hsa00500: Starch and sucrose metabolism | 4 | 1.448 | ENPP3, UGT2B11, UGT2B4, AMY2B |
| hsa04020: Calcium signaling pathway | 8 | 3.174 | ERBB4, DRD5, GRIN2A, ADRA1A, PTGFR, NTSR1… |
| hsa04514: Cell adhesion molecules (CAMs) | 6 | 4.022 | CADM3, NRXN2, CD40LG, CLDN6, NLGN4X, CD22 |
| hsa04664: Fc epsilon RI signaling pathway | 4 | 4.471 | FCER1A, LAT, MS4A2, PLA2G3 |

Fig. 6. The Gene Ontology biological processes associated with the differentially expressed genes (DEGs) between patients with high and low recurrence risk.
Identification of pathways related to the risk of recurrence
To investigate the molecular mechanisms that may be relevant to recurrence in LUAD, the 426 patients were assigned to the high-risk and low-risk groups using the prognostic model. Subsequently, 616 DEGs were identified between the two groups (Fig. 5A), including 378 DEGs that overlapped with the DEGs obtained by comparing LUAD tumors with and without recurrence (Fig. 5B and C). Enrichment analysis showed that these DEGs were associated with 16 Gene Ontology biological processes, including GO:0051969 regulation of transmission of nerve impulse (including HTR1B), GO:0031644 regulation of neurological system process (including HTR1B), GO:0007267 cell-cell signaling (including HTR1B) and GO:0006955 immune response (including TNFSF11; Fig. 6A). Pathway enrichment analysis showed that HTR1B was associated with hsa04080: Neuroactive ligand-receptor interaction and that TNFSF11 was involved in hsa04060: Cytokine-cytokine receptor interaction (Table 5).
Discussion
This study identified a prognostic model based on an 8-gene signature with high predictive power for recurrence in LUAD. The 8 genes include 4 recurrence “negative” DEGs (AZGP1, INPP5J, MYBPH and SPIB) and 4 “positive” DEGs (GUCA2A, HTR1B, SLC15A1 and TNFSF11). We divided patients into two groups according to the median risk score derived from the 8-gene signature, and confirmed the significant differences in RFS and recurrence rate between patients in the high-risk and low-risk groups, as well as the high accuracy of the model (AUC 0.8). The high performance of this model in predicting postoperative recurrence in LUAD highlights the crucial roles of the 8 genes in LUAD recurrence.
Gene expression profiles have an important effect on local and distant recurrence in tumors, apart from histopathologic stage [3, 13, 15]. A 21-gene recurrence score has been used for the prediction of local and distant recurrence in breast cancer, the cancer most commonly diagnosed in women [11, 12]. The 21-gene model showed notable performance in N0 and N+ hormone receptor-positive patients [11], as well as in hormone receptor-positive, human epidermal growth factor receptor type 2 (HER2)-negative and axillary node-negative breast cancers [12]. A CCP scoring system based on 31 proliferation-related genes, including cyclin A2 (CCNA2), CCNE2, cell division cycle 25A (CDC25A), cyclin-dependent kinase inhibitor 1A (CDKN1A) and CDKN1B, showed high performance in predicting and stratifying early-stage LUAD at risk of recurrence [17, 24]. These clinical reports demonstrate the efficacy and performance of multiple-gene signatures in predicting recurrence in cancer.
Our present study identified a recurrence risk model with high performance for predicting recurrence in LUAD. The model consists of the signatures of 8 genes with multiple functions. HTR1B [25] and GUCA2A [26], together with SLC15A1, which mediates peptide absorption and transport [27], play important roles in metabolism; TNFSF11 is related to bone formation [28]; AZGP1 is involved in tumor growth and recurrence [29, 30]; and MYBPH participates in cell motility and metastasis [31]. However, none of these genes has been identified as being related to recurrence in cancers including LUAD. In the present study, we identified their association with recurrence in LUAD. This finding may indicate crucial roles for these genes in LUAD development and progression.
Of these 8 genes, MYBPH and AZGP1 have been reported to be related to LUAD [31, 32]. MYBPH is a myosin binding protein with largely unknown function [33]. It shares sequence and structural similarity with cardiac MYBPC, which has a significant impact on contractility [34, 35, 36]. MYBPC3 mutation accounts for 25% of hypertrophic cardiomyopathy cases [35]. Hosono et al. [31] showed that MYBPH inhibited the activation of Rho kinase 1 (ROCK1) by directly interacting with ROCK1, and negatively regulated the organization of actomyosin, thereby reducing cell motility, invasion and LUAD metastasis. They also revealed that MYBPH was a transcriptional target of thyroid transcription factor 1 (TTF-1/NKX2-1), a lung development regulator that controls lung epithelial morphogenesis [37]. TTF-1 expression was positive in most LUAD cases and was positively associated with good overall survival in patients with advanced LUAD [38, 39]. Kyuichi et al. [39] suggested that the recurrence rate in TTF-1-negative LUAD was significantly higher than that in TTF-1-positive patients. These results suggested an important role for TTF-1 expression in the prediction of LUAD recurrence. In addition, the expression of AZGP1, another gene of largely unknown function, was downregulated in recurrent LUAD samples, and the serum AZGP1 autoantibody level was found to be positively correlated with 5-year overall survival in LUAD [32]. AZGP1 was also reported to be negatively correlated with prostate cancer recurrence [29]. The inclusion of AZGP1 and the TTF-1 target MYBPH in the recurrence risk model supports the potential clinical value of this model for predicting LUAD recurrence.
Other genes, including INPP5J and GUCA2A, were reported to be directly or indirectly correlated with cancer cell proliferation, tumor size and survival [40, 41]. Kondo et al. showed high INPP5J expression in breast cancer and its positive correlation with disease-free survival [41]. GUCA2A is a component of the guanylin-GUCY2C tumor suppressor axis [42]. Guanylin stimulates lipolysis in human visceral adipocytes [43]. Silencing of the guanylin-GUCY2C axis promoted obesity-induced colorectal cancer [44]. The inclusion of INPP5J and GUCA2A, as well as the other genes SPIB, HTR1B, SLC15A1 and TNFSF11, suggests that complex mechanisms underlie LUAD development and recurrence.
Conclusions
Our study presented a novel recurrence risk model based on an 8-gene signature with high performance in predicting recurrence in LUAD. These 8 genes, AZGP1, INPP5J, MYBPH, SPIB, GUCA2A, HTR1B, SLC15A1 and TNFSF11, have complex molecular functions, and most of them had not previously been reported to be correlated with LUAD. Our study highlighted their correlation with LUAD recurrence and development. However, clinical validation in a large cohort is needed to confirm the performance of this model, which might be of great value for predicting postoperative recurrence in LUAD and for guiding timely prevention.
Footnotes
Conflict of interest
The authors declare that they have no competing interests.
References
1.
BrayF.FerlayJ.SoerjomataramI.SiegelR.L.TorreL.A. and JemalA., Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: A Cancer Journal for Clinicians68 (2018), 394–424.
2.
SureshR.AliS.AhmadA.PhilipP.A. and SarkarF.H., The role of cancer stem cells in recurrent and drug-resistant lung cancer, Oxygen Transport to Tissue890 (2016), 57–74.
3.
DziedzicD.A.RudzinskiP.LangfortR. and OrlowskiT., Risk factors for local and distant recurrence after surgical treatment in patients with non-small-cell lung cancer, Clinical Lung Cancer17 (2016), e157–e167.
4.
TaylorM.D.NagjiA.S.BhamidipatiC.M.TheodosakisN.KozowerB.D.LauC.L. and JonesD.R., Tumor recurrence after complete resection for non-small cell lung cancer, Annals of Thoracic Surgery93 (2012), 1813–1821.
5.
Hans-StefanH.GesineH.GüntherR.ChristianeT.AndreasS.Rolf-EdgarS. and StefanB., Matrix metalloproteinase-12 expression correlates with local recurrence and metastatic disease in non-small cell lung cancer patients, Clinical Cancer Research11 (2005), 1086–1092.
6.
LanderE.S., Comprehensive molecular profiling of lung adenocarcinoma, Nature511 (2014), 543–550.
7.
MattonenS.A.PalmaD.A.JohnsonC.LouieA.V.LandisM.RodriguesG.ChanI.Etemad-RezaiR.YeungT.P.C. and SenanS., Detection of local cancer recurrence after stereotactic ablative radiation therapy (SABR) for lung cancer: physician performance versus radiomic assessment, International Journal of Radiation Oncology Biology Physics94 (2016), 1121–1128.
8.
HidekiU.KyuichiK.ChaftJ.E.DanielB.SimaC.S.Ming-ChingL.JamesH.TravisW.D.RizkN.P. and RudinC.M., Solid predominant histologic subtype in resected stage i lung adenocarcinoma is an independent predictor of early, extrathoracic, multisite recurrence and of poor postrecurrence survival, Journal of Clinical Oncology33 (2015), 2877–2884.
9.
KelseyC.R.MarksL.B.DonnaH.HubbsJ.L.ReadyN.E.D’AmicoT.A. and BoydJ.A., Local recurrence after surgery for early stage lung cancer: an 11-year experience with 975 patients, Cancer115 (2010), 5218–5227.
10.
IsakssonS.JönssonP.MonsefN.BrunnströmH.BendahlP.O.JönssonM.StaafJ. and PlanckM., CA 19-9 and CA 125 as potential predictors of disease recurrence in resectable lung adenocarcinoma, Plos One12 (2017), e0186284.
11.
MitchD.JackC.ChristopherW.JohnF.MallonE.A.JanineS.EmmaQ.AnitaD.MichaelB. and AmanB., Prediction of risk of distant recurrence using the 21-gene recurrence score in node-negative and node-positive postmenopausal patients with breast cancer treated with anastrozole or tamoxifen: a TransATAC study, Journal of Clinical Oncology28 (2010), 1829–1834.
12.
SparanoJ.A.GrayR.J.MakowerD.F.PritchardK.I.AlbainK.S.HayesD.F.GeyerC.E.DeesE.C.PerezE.A. and OlsonJ.A., Prospective Validation of a 21-Gene Expression Assay in Breast Cancer, New England Journal of Medicine373 (2015), 2005–2014.
13.
LarsenJ.E.PaveyS.J.PassmoreL.H.BowmanR.V.HaywardN.K. and FongK.M., Gene expression signature predicts recurrence in lung adenocarcinoma, Clinical Cancer Research13 (2007), 2946–2954.
14.
JieR., Gene-expression profiles predict survival of patients with lung adenocarcinoma, Nature Medicine8 (2002), 816–824.
15.
HidehikoS.HidetakaU.TakamitsuO.GuC.TakeshiH.TsunehiroO. and KoseiY., Overexpression of MACC1 mRNA in lung adenocarcinoma is associated with postoperative recurrence, Journal of Thoracic & Cardiovascular Surgery141 (2011), 895–898.
16.
EguchiT.KadotaK.ChaftJ.EvansB.KiddJ.TanK.S.DycocoJ.KolquistK.DavisT. and HamiltonS.A., Cell cycle progression score is a marker for five-year lung cancer-specific mortality risk in patients with resected stage I lung adenocarcinoma, Oncotarget7 (2016), 35241–35256.
17.
AraminiB.CasaliC.StefaniA.BettelliS.WagnerS.SangaleZ.HughesE.LanchburyJ.S.MaioranaA. and MorandiU., Prediction of distant recurrence in resected stage I and II lung adenocarcinoma, Lung Cancer101 (2016), 82–87.
18.
RitchieM.E.BelindaP.DiW.YifangH.LawC.W.WeiS. and SmythG.K., limma powers differential expression analyses for RNA-sequencing and microarray studies, Nucleic Acids Research43 (2015), e47.
19.
WangL.CaoC.MaQ.ZengQ.WangH.ChengZ.ZhuG.QiJ.MaH. and HaiN., RNA-seq analyses of multiple meristems of soybean: novel and alternative transcripts, evolutionary and functional implications, BMC Plant Biology14 (2014), 169.
20.
WangP.WangY.HangB.ZouX. and MaoJ.H., A novel gene expression-based prognostic scoring system to predict survival in gastric cancer, Oncotarget7 (2016), 55343–55351.
21.
TibshiraniR., The lasso method for variable selection in the Cox model, Statistics in Medicine16 (1997), 385–395.
22.
CampR.L.MarisaD.F. and RimmD.L., X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization, Clinical Cancer Research10 (2004), 7252–7259.
23.
HuangG.ZhaoG.XiaJ.WeiY.ChenF.ChenJ. and ShiJ., FGF2 and FAM201A affect the development of osteonecrosis of the femoral head after femoral neck fracture, Gene652 (2018), 39–47.
24.
RaphaelB.ElishaH.SusanneW.GutinA.S.LanchburyJ.S.YifanZ.ArcherM.A.CorinneG.JonesJ.T. and KristenR., Validation of a molecular and pathological model for five-year mortality risk in patients with early stage lung adenocarcinoma, Journal of Thoracic Oncology 10 (2015), 67–73.
25.
HernándezS.CamarenaB.GonzálezL.CaballeroA.FloresG. and AguilarA., A family-based association study of the HTR1B gene in eating disorders, Revista Brasileira De Psiquiatria38 (2016), 239–345.
26.
RodríguezA.Gómez-AmbrosiJ.CatalánV.EzquerroS.Méndez-GiménezL.BecerrilS.IbáñezP.VilaN.MargallM.A. and MoncadaR., Guanylin and uroguanylin stimulate lipolysis in human visceral adipocytes, International Journal of Obesity40 (2016), 1405–1415.
27.
SuL.ZhangY.ChengY.C.LeeW.M.YeK. and HuD., Slc15a1 is involved in the transport of synthetic F5-peptide into the seminiferous epithelium in adult rat testes, Scientific Reports5 (2015), 16271.
28.
MencejS.PreželjJ.KocijančičA.OstanekB. and MarcJ., Association of TNFSF11 gene promoter polymorphisms with bone mineral density in postmenopausal women, Maturitas55 (2006), 219–226.
29.
HenshallS.M.HorvathL.G.QuinnD.I.EggletonS.A.GrygielJ.J.StrickerP.D.BiankinA.V.KenchJ.G. and SutherlandR.L., Zinc-alpha2-glycoprotein expression as a predictor of metastatic prostate cancer following radical prostatectomy, Journal of the National Cancer Institute98 (2006), 1420.
HosonoY.YamaguchiT.EriM.KiyoshiY.ChinatsuA.ShutaT.YukakoS.MichiyoH.SeiichiK. and KoheiY., MYBPH, a transcriptional target of TTF-1, inhibits ROCK1, and reduces cell motility and metastasis, EMBO Journal31 (2012), 481–493.
32.
AlbertusD.L.SederC.W.ChenG.WangX.HartojoW.LinL.SilversA.ThomasD.G.GiordanoT.J. and ChangA.C., AZGP1 autoantibody predicts survival and histone deacetylase inhibitors increase expression in lung adenocarcinoma, Journal of Thoracic Oncology3 (2008), 1236–1244.
33.
MoutonJ.LoosB.Moolman-SmookJ.C. and KinnearC.J., Ascribing novel functions to the sarcomeric protein, myosin binding protein H (MyBPH) in cardiac sarcomere contraction, Experimental Cell Research331 (2015), 338–351.
34.
KellermayerM.SziklaiD.PappZ.DeckerB.LakatosE. and MartonfalviZ., Topology of interactions between titin molecules and myosin thick filaments, Biophysical Journal114 (2018), 646a.
35.
RibeiroM.C.BirketM.J.TertoolenL.Monshouwer-KlootsJ.MummeryC.L. and PassierR., Hypertrophic cardiomyopathy – impaired sarcomeric motion and contractile dysfunction caused by defective MYBPC3 in hPSC derived cardiomyocytes, From Fetus Towards Adult123 (2016).
36.
TschöpeC.KheradB.KleinO.LippA.BlaschkeF.GuttermanD.BurkhoffD.HamdaniN.SpillmannF. and VanL.S., Cardiac contractility modulation: mechanisms of action in heart failure with reduced ejection fraction and beyond, European Journal of Heart Failure21 (2019), 14–22.
SchilskyJ.B.NiA.AhnL.DattaS.TravisW.D.KrisM.G.ChaftJ.E.RekhtmanN. and HellmannM.D., Prognostic impact of TTF-1 expression in patients with stage IV lung adenocarcinomas, Lung Cancer108 (2017), 205–211.
39.
KyuichiK.Jun-IchiN.SarkariaI.S.SimaC.S.XiaoyuJ.AkihikoY.RuschV.W.TravisW.D. and AdusumilliP.S., Thyroid transcription factor-1 expression is an independent predictor of recurrence and correlates with the IASLC/ATS/ERS histologic classification in patients with stage I lung adenocarcinoma, Cancer119 (2013), 931–938.
40.
ZhuT.YuanJ.WangY.GongC.XieY. and LiH., MiR-661 contributed to cell proliferation of human ovarian cancer cells by repressing INPP5J expression, Biomedicine & Pharmacotherapy75 (2015), 123–128.
41.
KondoN.KimT.S.WanifuchiY.HatoY.HisadaT.NishimotoM.NishikawaS. and ToyamaT., Abstract P6-07-34: The prognostic impact of inositol polyphosphate 5-phosphatase PIPP (INPP5J) expression in breast cancer tissue, San Antonio Breast Cancer (2017). 10.1158/1538-7445.
42.
BlomainE.S.LinJ.E.Colón-GonzálezF.KimG.W.HyslopT.ZhanT.SnookA.E. and WaldmanS.A., Abstract 2882: Calorie-induced silencing of the tumor suppressive guanylin-GUCY2C paracrine axis underlies colorectal cancer in obesity, Cancer Research75 (2015), 2882–2882.
43.
RodríguezA.Gómez-AmbrosiJ.CatalánV.EzquerroS.Méndez-GiménezL.BecerrilS.IbáñezP.VilaN.MargallM.A. and MoncadaR., Guanylin and uroguanylin stimulate lipolysis in human visceral adipocytes, International Journal of Obesity40 (2016), 1405–1415.
44.
LinJ.E.Colon-GonzalezF.BlomainE.KimG.W.AingA.StoeckerB.RockJ.SnookA.E.ZhanT. and HyslopT.M., Obesity-induced colorectal cancer is driven by caloric silencing of the guanylin-GUCY2C paracrine signaling axis, Cancer Research76 (2016), 339–346.