Abstract
BACKGROUND:
Clear cell Renal Cell Carcinoma (ccRCC) is one of the most prevalent types of kidney cancer. Unravelling the genes responsible for driving cellular changes and the transformation of cells in ccRCC pathogenesis is a complex process.
OBJECTIVE:
In this study, twelve microarray ccRCC datasets were chosen from the gene expression omnibus (GEO) database and subjected to integrated analysis.
METHODS:
Through GEO2R analysis, 179 common differentially expressed genes (DEGs) were identified among the datasets. The common DEGs were subjected to functional enrichment analysis using ToppFun followed by construction of protein-protein interaction network (PPIN) using Cytoscape. Clusters within the DEGs PPIN were identified using the Molecular Complex Detection (MCODE) Cytoscape plugin. To identify the hub genes, the centrality parameters degree, betweenness, and closeness scores were calculated for each DEGs in the PPIN. Additionally, Gene Expression Profiling Interactive Analysis (GEPIA) was utilized to validate the relative expression levels of hub genes in the normal and ccRCC tissues.
RESULTS:
The common DEGs were highly enriched in Hypoxia-inducible factor (HIF) signalling and metabolic reprogramming pathways. VEGFA, CAV1, LOX, CCND1, PLG, EGF, SLC2A1, and ENO2 were identified as hub genes.
CONCLUSION:
Among 8 hub genes, only the expression levels of VEGFA, LOX, CCND1, and EGF showed a unique expression pattern exclusively in ccRCC on compared to other type of cancers.
Introduction
Renal cell carcinoma (RCC) includes various subtypes of kidney tumours characterized by numerous genetic abnormalities. Clear cell renal cell carcinoma (ccRCC), also known as kidney renal clear cell carcinoma (KIRC) is a crucial RCC subtype that accounts for about 8 out of 10 kidney cancer diagnoses and 2–3% of all cancer [1]. According to GLOBOCAN, the global cancer statistics, more than four lakh new cases are reported, causing one lakh deaths per year in kidney cancer for the year 2020 [1, 2]. The ccRCC is characterized by clear cytoplasm in the malignant epithelial cells, dense alveoli, and acinar growth patterns interspersed with complex branched blood vessels [3]. The ccRCC tumour develops irrespective of age, but most cases are reported between 50 to 70 years with less than six percent of children and young adults affected [4, 5]. The exact cause of ccRCC progression is unknown, but the following factors increase the risk: smoking, repeated use of painkillers, patients with other kidney diseases undergoing dialysis, and major genetic predisposition conditions including gene mutations [6].
The biallelic loss of Von Hippel–Lindau (VHL) gene function has been reported in over 51% of ccRCC patients [7, 8]. The pVHL serves as a tumour suppressor, excreting negative regulation on hypoxia-inducible factor
In recent years, several drugs have undergone testing on ccRCC tumour patients, which either directly or indirectly target major signalling pathways [13]. However, tumour cells are developing resistance against drugs due to factors like tumour heterogeneity, lysosomal sequestration, mutation, altered expression in gene or alternate signalling pathways instead of the drug targeted pathways [14, 15]. The patients with the primary tumour stage undergoing nephrectomy are also exhibit signs of relapse [16]. So, identifying new biomarkers in ccRCC is necessary for early diagnosis in relapse and targeted therapeutic treatment. The development of high throughput techniques such as microarray and rapid processing of microarray datasets are being used to identify the differentially expressed genes (DEGs) in the cancer progression [17, 18, 19].
In this study, we aimed to discover the hub genes and key signalling pathways of ccRCC using an integrated bioinformatic approach. Subjecting the large number of ccRCC microarray datasets to integrated analysis brings out commonly altered DEGs in key signalling pathways and their role in cellular functions. Also, identifying the hub genes in these pathways will be a useful biomarker for ccRCC in clinical diagnosis and therapeutic applications.
Methodology
Identification and screening of microarray datasets
The ccRCC microarray datasets were processed from Gene Expression Omnibus (GEO) (
Identification and screening of DEGs
The DEGs between ccRCC tumour and normal samples from the 12 collected datasets were screened by using an R language-based online tool called GEO2R (
Functional enrichment analysis by ToppFun tool in ToppGene suite
The common DEGs identified were subjected to enrichment analysis using ToppFun online tool in ToppGene suit (
Construction of protein-protein interaction network (PPIN)
The construction of a PPIN for the common DEGs was performed using the STRING database version 11.5 (
Identification and analysis of clusters
The PPIN network of common DEGs of ccRCC was imported into Cytoscape, and the important clusters/modules from the PPI network were identified using Molecular Complex Detection (MCODE) version 2.0.0., The default MCODE parameters were utilized to analyse and identify the clusters, with a MCODE score
Identification of hub genes by the centrality
In Cytoscape, the centrality parameters closeness, betweenness, and degree of common DEGs from the PPI network were assessed using the Network Analyzer plugin [24]. The distribution of the centrality parameters and DEGs with top centrality scores were plotted using BIC. The node degree and the parameters reliability can be assessed by analysing the distribution characteristics of centrality parameters. The top 10% of genes of common DEGs from the selected centrality parameters were identified using Venny 2.1.
Validation of hub gene expression
Gene Expression Profiling Interactive Analysis (GEPIA) (
Results
Identification and screening of microarray datasets and DEGs
In GEO, the screening for ccRCC datasets using keywords “ccRCC and clear cell renal cell carcinoma” yielded a total of 13,876 search results(Supp. Table 1). Initially, individual sample submissions were excluded. For instance, dataset GSE40435 included 101 samples each for control and test, resulting in 202 single sample submissions in the search results. Consequently, dataset GSE40435 was selected, and all similar duplicate single sample submissions from GSE4045 were excluded. After applying these exclusion criteria to all the datasets, we identified a total of 86 ccRCC datasets from the GEO database.
Following this, the datasets that included both human kidney ccRCC samples and their adjacent normal samples were prioritized for the study. Conversely, datasets featuring expression profiles from cell lines (E.g., GSE78179), patient blood (E.g., GSE117230), or urine samples (E.g., GSE7292) and specific cell samples such as FACS sorted immune cells from ccRCC tissues (E.g., GSE108310) were omitted. The datasets exclusively featuring expression profiles of tumour samples were excluded from the analysis (E.g., GSE29609) as DEGs cannot be identified without corresponding normal tissue profiles. Furthermore, datasets containing repetitive samples across multiple entries were excluded. Several datasets were initially submitted with a GSE number, but were subsequently updated to include additional patient samples alongside the existing sets with new GSE number. For example, GSE66272 is updated dataset which includes the samples from GSE66270, GSE66271. So, GSE 66272 was taken for the analysis to ensure data integrity. Also, datasets showing very less DEGs (GSE3 - only had 9 DEGs) and datasets showing error ((E.g., GSE781)) during GEO2R processing were excluded.
The common genes between the 12 datasets were identified by Venny 2.1 and represented in flower plot.
By following these criteria’s finally 12 datasets were chosen for this study(Supp. Table 2). The selected datasets were subjected to analysis using the GEO2R online tool. The DEGs from all the datasets were obtained, and the differential expression was represented using a volcano plot(Supp. Fig. 1). Initially, the normality of the samples and their distributions in all the datasets were checked in the GEO2R analysis(Supp. Fig. 2). We filtered DEGs with logFC
Top 10 enriched CC, MF, BP and pathways of common DEGs identified using ToppFun enrichment analysis tool.
To obtain a deeper understanding of the biological functions and signalling pathways of the common 179 DEGs identified, ToppFun tool from the ToppGene suit was employed to conduct a GO annotation and pathway analysis. The top 10 enrichment in GO terms(Supp. Table 3) CC, BP, MF, and pathway (Supp. Table 4) from ToppFun analysis were listed in Fig. 2. In GO analysis of CC, the top significantly enriched terms were collagen-containing extracellular matrix (22 genes), platelet alpha granule (11 genes), and basolateral plasma membrane (16 genes). The top enriched terms of DEGs in the MF were carboxylic acid binding (16 genes), signalling receptor binding (38 genes), and monosaccharide binding (8 genes). The top enriched terms in the BP were the organic acid metabolic process (44 genes), carboxylic acid metabolic process (41 genes), and oxoacid metabolic process (41 genes).
The network of top 10 pathways from the enrichment analysis was given in Supp. Table 4. The pathway analysis showed common DEGs were enriched in the HIF-1-
Cluster identification of the PPI network using MCODE. The four most significant clusters were selected with a score of 
The PPI network of common DEGs from ccRCC datasets was constructed using STRING. The network encompasses 178 nodeswith 543 edges visually depictedin Cytoscape and represented in Supp Fig. 3. Utilizing the MCODE plugin within Cytoscape, clusters were discerned from the STRING generated PPI network based on the cut-off criteria of degrees
(a) Identification of hub genes with top centrality scores (Degree, betweenness, closeness). (b) Heatmap showing expression pattern of eight hub genes in all the 12 ccRCC datasets.
To identify the hub genes involved in the ccRCC manifestation, we examined the centralities closeness, betweenness, and degree of common DEGs using the network analyzer plugin within Cytoscape. The distribution of the three centralities were illustrated in the supplementary Fig. 4. Additionally, we displayed the correlations between degree and betweenness (Supp Fig. 5a), degree and closeness (Supp Fig. 5b), and betweenness and closeness (Supp Fig. 5c) were shown in the Supplementary Fig. 5. Subsequently, employing Venny 2.1, we identified 8 genes commonly present in the top 10% across all selected centralities, designating them as hub genes (Fig. 4a). Hub genes are key players in biological networks, central to regulating functions and pathways by interacting with many other genes. Their importance stems from their pivotal positions, as changes in these genes can significantly impact the entire system they control. Out of 8 hub genes, six genes VEGFA, CAV1, LOX, CCND1, PLG, and EGF, were a part of the first cluster, and two genes, ENO2 and SLC2A1, were present in both the second and fourth cluster. The expression levels of hub genes across all 12 ccRCC datasets were depicted in a heatmap (Fig. 4b).
Validation of hub gene expression. Hub gene expression levels were validated with TGCA datasets obtained using GEPIA. VEGFA (a), CAV1 (b), LOX (c), CCND1 (d), PLG (e), EGF (f), ENO2 (g), SLC2A1 (h). (KIRC interchangeably used for ccRCC).
Hub gene expression on different cancer types. Median expression levels of hub genes in different cancer types obtained from GEPIA. ACC, Adrenocortical carcinoma; BLCA, Bladder urothelial carcinoma; BRCA, Breast invasive carcinoma; CESC, Cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, Cholangiocarcinoma; COAD, Colon adenocarcinoma; DLBC, Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; ESCA, Esophageal carcinoma; GBM, Glioblastoma multiforme; HNSC, Head and Neck squamous cell carcinoma; KICH, Kidney chromophobe; KIRC, Kidney renal clear cell carcinoma; KIRP, Kidney renal papillary cell carcinoma; LAML, Acute myeloid leukemia; LGG, Brain lower grade glioma; LIHC, Liver hepatocellular carcinoma; LUAD, Lung adenocarcinoma; LUSC, Lung squamous cell carcinoma; OV, Ovarian serous cystadenocarcinoma; PAAD, Pancreatic adenocarcinoma; PCPG, Pheochromocytoma and Paraganglioma; PRAD, Prostate adenocarcinoma; READ, Rectum adenocarcinoma; SARC, Sarcoma; SKCM, Skin cutaneous melanoma; STAD. Stomach adenocarcinoma; TGCT, Testicular germ cell tumours; THCA, Thyroid carcinoma; THYM, Thymoma; UCEC, Uterine corpus endometrial carcinoma; UCS, Uterine carcinosarcoma.
The hub genes identified in the ccRCC were validated by examining their gene expression in GEPIA. In the 12 GEO datasets VEGFA, CAV1, LOX, CCND1, SLC2A1, ENO2, showed increased expression while PLG and EGF were down regulated (Fig. 4b). These findings were confirmed by GEPIA, showing upregulation of VEGFA, CAV1, LOX, CCND1, SLC2A1, and ENO2 , and downregulation of PLG, EGF in ccRCC (Fig. 5). Comparison of hub gene expression levels between ccRCC and other cancers using GEPIA highlighted VEGFA, LOX, CCND1, and EGF as distinctly expressed in ccRCC. Comparison of hub gene median expression levels between ccRCC and other cancers using GEPIA highlighted VEGFA, LOX, CCND1, EGF as distinctly expressed in ccRCC (Fig. 6).
The ccRCC is the most pervasive kidney cancer and exhibits poorer survival than the other subtypes such as chromophobe and papillary RCC. In the present study, integrated bioinformatic analysis was performed to identify the hub genes and key signaling pathways in ccRCC. In this study, we extensively searched the GEO database and identified 12 ccRCC datasets. The DEGs between the ccRCC tumour and normal samples were identified using the GEO2R tool. From all 12 datasets, 179 common DEGs were identified and subjected to GO and pathway enrichment analysis. GO analysis showed that the top biological function of these DEGs was enriched in the metabolic process such as the organic acid metabolic process, especially carboxylic acid metabolism.
Alteration in metabolic programs, including modifications in the fatty acid metabolic process, result in the accumulation of acetyl Co-A and synthetic fatty acids as lipid droplets in the cytoplasm of renal cell. These renal cells appear as a clear cell, which is the key histological phenotype in the ccRCC [29]. The top pathways identified from GO pathway analysis revealed that DEGs from the ccRCC samples were enriched in the HIF-1
The overall enrichment analysis revealed that DEGs in ccRCCs were significantly enriched in the HIF-1
Changes in glucose and lipid metabolism are closely related to ccRCC. Anaerobic glycolysis leads to an increased production of acetyl Co-A which serves as an essential source for fatty acid synthesis [29, 36]. Previous research has underscored the impact of fatty acid metabolism on ccRCC development, which was driven by HIF
The identification of hub genes among common DEGs involved calculating the centrality parameter for each gene within the common DEGs PPIN. Utilizing STRING, the PPIN of common DEGs was generated, and subsequent MCODE analysis in Cytoscape highlighted four clusters, specifically selecting clusters with degrees
The expression levels of these 8 hub genes were validated with TGCA ccRCC data obtained through GEPIA. The expression pattern of hub genes in all the 12 datasets included in our study were consistent from the results obtained from GEPIA. Also, we obtained median expression levels of these 8 genes in different cancers and found VEGFA, LOX, CCND1, (highly upregulated in ccRCC than other cancer types), and EGF (highly downregulated) expression levels were distinct in ccRCC. Out of these four genes, VEGFA, LOX, and CCND1 can be directly regulated through the HIF transcriptional machinery except EGF [38, 39, 40].
VEGFA (vascular endothelial growth factor A), a key gene involved in angiogenesis induced by the HIF signaling pathway in hypoxic conditions, is involved in the development of many vascularized tumours [34, 41]. In patients diagnosed with ccRCC, the genes most commonly found to be modified alongside VEGFA were those neighboring VHL, observed in 72.22% of cases [42]. The mutation in the VHL gene failed to regulate the degradation of HIF1 and 2
LOX (lysyl oxidases) is also one of the genes triggered by HIF-1
CCND1 (Cyclin D1) is a proto-oncogene that positively regulates the cell cycle progression by dimerizing the cyclin-dependent kinases (CDK) (CDK4 and CDK6) and conducts G1 to S phase transition. Higher expression of CCND1 has been reported in many cancers such as breast, ovarian, endometrial, and bladder cancer [52], along with ccRCC. Both high and low CCND1 levels were associated with the ccRCC than other RCCs (chromophobe and papillary RCC), where the CCND1 expression levels were unchanged [53]. So, CCND1 can be used as a biomarker for differentiating ccRCC from other RCC types [54]. The CCND1 levels decreased with an increase in the ccRCC tumour grade, which (low CCND1) leads to a poor prognosis in ccRCC patients. Patients with both low and high CCND1 levels showed recurrence of ccRCC [53].
EGF (Epidermal growth factor) expression leads to the dimerization of EGF receptors which induces the intracellular tyrosine kinase activity, which plays in cell proliferation, angiogenesis and blocks apoptosis. In our study in all 12 datasets, the expression levels of EGF were found highly reduced in ccRCC samples compared to normal adjacent cells in the kidney. In ccRCC tumour samples, the EGF expression levels were very low, whereas the EGFR levels were reported to be high in ccRCC samples [55]. The EGFR-induced prooncogenic signaling cascades such as AKT-PI3K-mTOR signaling pathways had a crucial role in ccRCC [55]. The EGFR signaling cascades can be activated by several mechanisms other than EGF in ccRCC [56], but the mechanism behind the downregulation of EGF has not been investigated in ccRCC. Also, the expression of CAV1 can be directly activated by HIF1 and HIF2, helping in caveolae formation which leads to the dimerization and phosphorylation of EGFR and activates the EGFR signaling cascade without the ligand interaction [57]. Because of this property, the tumours having hypoxia signatures activate the necessary downstream signaling cascades by initiating the receptor activation without their respective ligand [57]. Also, EGF treatment in A431 human epidermoid carcinoma cells showed complete downregulation of CAV1 [58]. So, increasing the EGF level might help to prevent the autoactivation of EGFR and regulate EGFR-induced pro-oncogenic signaling cascades such as AKT-PI3K-mTOR. Also, this autophosphorylation of receptor tyrosine kinases (RTKs) is needed for VEGFA-mediated activation of the class 1a PI3k signaling cascade [59]. Typically, the EGF expression levels are very high in the kidney compared to the other organs in our body. So increasing the EGF levels and inhibition of the EGFR signaling cascade might help to prevent the ccRCC cancer progression and prognosis.
Currently, various treatments like RTK inhibitors (such as sorafenib, sunitinib, pazopanib, axitinib, cabozantinib, and tivozanib), VEGF inhibitor (Bevacizumab), HIF2
To best of our knowledge, there have been no in vitro or in vivo studies conducted on treating ccRCC with EGF. Elevating EGF levels, especially in combination with existing drugs like RTK inhibitors, might enhance treatment efficacy for ccRCC. This study is primarily based on predictive bioinformatic integrated analysis. So comprehensive laboratory experiments are essential to explore the potential efficacy of EGF and other hub genes in ccRCC treatment.
Conclusion
In this study, we conducted an integrated analysis using ccRCC datasets sourced from the GEO database and identified eight hub genes. Notably, VEGFA, LOX, CCND1, and EGF were highly expressed in the ccRCC than other cancer types. These genes play pivotal roles in various processes crucial for tumour formation and progression, including angiogenesis, metastasis, cell cycle regulation, anti-apoptosis and drug resistance. These genes will be a suitable biomarker for ccRCC clinical applications. Apart from the HIF activated genes, (VEGFA, LOX, CCND1,) the drastic downregulation of EGF also has to be noted as an important alteration in ccRCC patients. Investigating the mechanisms behind the downregulation of EGF and identifying the role of EGF other than EGFR mediated functions will give new insights into clinical applications of ccRCC. However, it’s crucial to note that these findings, although predictive, require validation through laboratory experiments to substantiate the results obtained from bioinformatics analyses. Experimental verification will strengthen the reliability and applicability of these bioinformatics-driven conclusions.
Author contributions
Conception: Vinoth S, Satheeswaran Balasubramanian.
Interpretation or analysis of data: Vinoth S, Satheeswaran Balasubramanian.
Preparation of the manuscript Data curation.: Vinoth S.
Revision for important intellectual content. Ekambaram Perumal, Kirankumar Santhakumar.
Supervision; Kirankumar Santhakumar.
Supplementary data
The supplementary files are available to download from
Footnotes
Acknowledgments
The facilities provided by SRM Institute of Science and Technology is gratefully acknowledged. The authors thankfully acknowledge the financial assistance received from the Department of Biotechnology (DBT), New Delhi (BT/PR26189/GET/119/226/2017) and DST-SERB, New Delhi (EMR/2017/000465). Satheeswaran Balasubramanian acknowledges ICMR-SRF (No.45/51/2020-Nan/BMS) funded by ICMR, New Delhi, India for fellowship support.
