A comprehensive analysis of mRNA expression profiles of Esophageal Squamous Cell Carcinoma reveals downregulation of Desmoglein 1 and crucial genomic targets

Abstract

AIM:

Esophageal Squamous Cell Carcinoma (ESCC) is a histological subtype of esophageal cancer that begins in the squamous cells of the esophagus. The five-year survival rate among patients diagnosed with ESCC is only 19%. This necessitates the identification of high-confidence biomarkers for early diagnosis and prognosis, and of potential therapeutic targets for the mitigation of ESCC.

METHOD:

We performed a meta-analysis of 10 mRNA datasets and identified genes that were consistently perturbed across the studies. We then integrated these genes with ESCC ATLAS to segregate ‘core’ genes and to identify the consequences of primary gene perturbation events that lead to gene-gene interactions and dysregulated molecular signaling pathways. Further, by integrating toxicogenomics data, we drew inferences about gene interactions with environmental exposures, trace elements, chemical carcinogens, and drug chemicals. We also deduced the clinical outcomes of candidate genes based on survival analysis using the ESCC-related dataset in The Cancer Genome Atlas.

RESULT:

We identified 237 known and 18 novel perturbed candidate genes. Desmoglein 1 (DSG1) is one such novel gene, which we found to be significantly downregulated (Fold Change $= -1.89$, $p$-value $=$ 8.2e-06) in ESCC across six different datasets. Further, we identified 31 ‘core’ genes (that either harbor genetic variants or are regulated by epigenetic modifications) and found that they regulate key biological pathways via adjoining genes in gene-gene interaction networks. Functional enrichment analysis showed that dysregulated biological processes and pathways, including “Extracellular matrix”, “Collagen trimer” and “HPV infection”, are significantly overrepresented among our candidate genes. Based on toxicogenomic inferences from the Comparative Toxicogenomics Database, we report the key genes that interact with risk factors such as tobacco smoking, zinc, and nitrosobenzylmethylamine, and with drug chemicals such as cisplatin, fluorouracil, and mitomycin, in relation to ESCC. We also point to the STC2 gene, which shows a high risk for mortality in ESCC patients.

CONCLUSION:

We identified novel perturbed genes in relation to ESCC and explored their interaction network. DSG1 is one such gene; its association with microbiota and with a clinical presentation commonly seen with ESCC hints that it is a good candidate for an early diagnostic marker. In addition, we highlight candidate genes and their molecular connections to risk factors, biological pathways, drug chemicals, and the survival probability of ESCC patients.

Keywords

Esophagus cancer, ESCC, desmogleins, toxicogenomics, krüppel-like factor, stanniocalcin

1. Introduction

Esophageal Squamous Cell Carcinoma (ESCC) is an epidermoid carcinoma, most often found in the upper and middle parts of the esophagus. This malignancy is often fatal, primarily because of late-stage diagnosis, metastasis, resistance to treatment, and relapse [1]. ESCC is the more common of the two types of esophageal cancer; it is highly prevalent in East Asian, Eastern European, and African populations and less prevalent in Central European and Hispanic populations. Importantly, two geographic belts were found to have the highest risk for ESCC: the Asian esophageal cancer belt, which runs across central Asia from the Caspian Sea to northern China, and a belt on the eastern coast of Africa, from Ethiopia to South Africa [2].

Lifestyle factors including smoking, chewing tobacco or gutka (a preparation made of crushed areca nut, tobacco, catechu, paraffin wax, slaked lime, and sweet or savory flavoring), heavy alcohol consumption, and hot drinks are considered to be the critical risk factors for ESCC [3, 4, 5]. Further, low intake of fresh fruits and vegetables, consumption of pickled vegetables, exposure to polycyclic aromatic hydrocarbons, and betel quid chewing with or without tobacco [6] have also been reported to be risk factors for ESCC. Notably, many of these risk factors are specific to a geographical region [7]. Examples include the consumption of salted meat and its interaction with alcohol intake and smoking in Uganda [8], trace element imbalance in the soils of South Africa and West Asia [7], frequent drinking of hot Arabic coffee in the Al-Qaseem region of Saudi Arabia [9], and consumption of hot green tea in East Asia and betel quid/gutka chewing in regions of South and Southeast Asia [10]. In addition, dietary zinc deficiency is known to increase the risk of ESCC [11]. The serum levels of selenium and zinc correlate with the incidence of gastroesophageal cancers in West Asia [12].

In our previous effort, we developed ESCC ATLAS, a manually curated database that integrates genetic, epigenetic, transcriptomic and proteomic knowledge dispersed across the published literature [13]. It comprised 3,475 genes with evidence of altered transcription (2,600) or altered translation (560), or associated with molecular events such as copy number variation/structural variations (233), SNPs (102), altered DNA methylation (82), histone modifications (16) and miRNA-based regulation (261) in ESCC etiology. Biomarkers are valuable for predicting outcomes and guiding the treatment of diseases. However, only a few biomarkers, such as epidermal growth factor receptor (EGFR), vascular endothelial growth factor (VEGF), human epidermal growth factor receptor 3 (HER3), and programmed death receptor 1/programmed death-ligand 1 (PD-1/PD-L1), serve as therapeutic targets in ESCC treatment [14], despite severe side effects of the drugs in patients. The standard chemotherapeutic drugs used to treat ESCC are cisplatin, docetaxel, mitomycin, 5-fluorouracil, and vinorelbine, or their combinations. The combination of docetaxel and cisplatin has shown a higher pathological complete response rate than the fluorouracil-cisplatin combination [15].

In genome-scale studies, genes are often filtered by their lowest p-values to select a top list of genes. As a result, only a few genes are selected for the validation and functional characterization phase of the study. If these genes replicate in an independent cohort, they are considered “perturbed genes” and are reported in the main manuscript. The rest of the potentially important genes, which fall just below this top layer, rarely become the focus of follow-up studies. Importantly, the number of genes that fall in the top hits list usually depends on the sample size of the study. When genes that stand just below a p-value threshold are not considered, there is a high chance that a potentially high-value biomarker gene is missed. Hypotheses inferred and validated solely from the results of a single study can therefore be incomplete or misleading, and results typically vary between studies. Drawing conclusions from data combined across studies, instead of from a single study, would offer more reliable results.

In this contribution, we performed a meta-analysis using data collected from multiple published studies. Here we present a list of differentially perturbed genes across multiple datasets in the context of ESCC, explore their gene-to-gene interaction network and their toxicogenomic targets in ESCC etiology and treatment, and report their association with specific biological pathways and GO terms.

2. Materials and methods

2.1 Source and collection of datasets

Studies related to mRNA expression in human ESCC tissues and cell lines were searched using the keywords “ESCC” or “Esophageal Squamous Cell Carcinoma” in three public gene expression databases: 1) Gene Expression Omnibus (NCBI-GEO) [16], 2) EBI ArrayExpress [17] and 3) All of gene expression (AOE) [18]. The following selection criteria were applied: 1) studies of human origin involving normal-tumor tissue or cell line samples were selected; 2) drug-treated and gene knockdown studies were excluded; 3) only protein-coding mRNA studies were included, whereas non-coding RNA/miRNA studies were excluded; and 4) studies with at least two normal and two tumor samples were selected. Based on these criteria, a total of 16 datasets, comprising 782 samples altogether (298 normal and 484 tumor), were selected and preprocessed for quality control in our current study. Of these 16 datasets, 10 passed the quality control measures and were retained for downstream analysis.

2.2 Quality control measures and tools for analysis

The quality check of the microarray datasets was performed using arrayQualityMetrics (v3.48.0) [19], a Bioconductor package that provides a detailed array quality metrics report with diagnostic plots using six different outlier detection methods. arrayQualityMetrics also flags potential outlier samples; based on these outlier flags and manual inspection, samples were removed from the datasets (Supplementary Table S1). A dataset was discarded entirely if potential sample mislabeling was identified or if it contained an unbalanced number of control and tumor samples. In the case of the RNAseq dataset (GSE32424), we referred to the quality check performed in the original study and directly used the normalized read counts available from GEO [20].

As our collection of datasets was produced on different platforms, platform-specific methods for processing the datasets were implemented. The datasets produced using the Affymetrix array platform were analyzed using the Affy (v1.74.0) [21] Bioconductor package in R. Similarly, the data generated on the Agilent, Human Whole Genome OneArray (HOA), and CodeLink array platforms were processed using the LIMMA (v3.52.2) [22, 23] Bioconductor package. Each dataset was subjected to background correction, followed by platform-specific normalization. The normalization method for each dataset was selected based on its data distribution; the methods used are listed in Table 2.

2.2.1 Data enhancement and annotation

All the raw datasets were subjected to platform-specific background correction and normalization methods. For this, all the available normalization methods were compared, and only the method that provided a normal (or approximately normal) distribution of the data was selected. The resultant data were used for further analysis. Multiple probes in each dataset were collapsed to their source gene by taking either the mean or the median of the probeset intensities. The choice of mean or median depended on the data distribution (as observed in density plots). Probe IDs were mapped to genes across studies by matching official gene symbols, using either platform-specific design files or Bioconductor annotation packages including AnnotationDbi (v1.58.0) [24].
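The probe-to-gene collapsing step described above can be sketched as follows. This is a minimal Python illustration, not the pipeline used in the study (which relied on R/Bioconductor); the function name `collapse_probes`, the probe IDs, and the intensity values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def collapse_probes(probe_values, probe_to_gene, how="median"):
    """Collapse probe-level intensities to one value per gene symbol."""
    summarize = {"mean": mean, "median": median}[how]
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without a gene symbol are skipped
            by_gene[gene].append(value)
    return {gene: summarize(values) for gene, values in by_gene.items()}


# Hypothetical probe IDs and log-intensities for a single sample.
probe_values = {"p1": 8.0, "p2": 9.0, "p3": 13.0, "p4": 5.5, "p5": 7.0}
probe_to_gene = {"p1": "DSG1", "p2": "DSG1", "p3": "DSG1", "p4": "SPP1"}

gene_median = collapse_probes(probe_values, probe_to_gene, how="median")
gene_mean = collapse_probes(probe_values, probe_to_gene, how="mean")
```

In this toy input the third DSG1 probe is much brighter than the other two, so the mean and the median differ; this is the kind of skew that would motivate choosing the median.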

2.2.2 Differential gene expression of individual datasets and Meta-analysis

To discover differentially expressed genes, contrast matrices with tumor $\sim$ normal were prepared and a linear model was fitted to the series of arrays in each dataset. Following this step, the ebayes function was used to compute moderated t-statistics, moderated F-statistics, and log-odds of differential expression. The resultant data, including log2FC, adjusted P-value, and 95% confidence intervals for all the 5,080 genes common to the ten datasets, were used as input for the meta-analysis with the MetaVolcanoR package (v1.10.0) available from the Bioconductor suite. The meta-analysis used the Random Effect Model (REM) approach implemented in MetaVolcanoR, which summarizes the fold change of each gene across several studies while taking its variance into account. The genes consistently perturbed across all the studies were then ranked using the Topconfects approach [25] implemented within the package. Topconfects is a method for ranking by confidence bounds on the log fold change, based on the previously developed TREAT test by McCarthy and Smyth [26]. We further selected the top 5% of the ranked, consistently perturbed genes for downstream analysis. To identify canonical cancer and non-cancer driver genes, the candidate gene set was queried against the NSG7.0 database [27], which catalogs the manually curated Networks of Cancer genes and Healthy drivers.
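To make the idea of a random-effects summary concrete, the sketch below combines per-study log2 fold changes and their 95% confidence intervals for a single gene. It assumes the DerSimonian-Laird estimator of between-study variance, chosen here only because it is easy to compute by hand; it is not necessarily the estimator MetaVolcanoR uses internally. The function name and the input values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def random_effects_summary(effects, ci_lower, ci_upper):
    """DerSimonian-Laird random-effects summary of per-study effects.

    Returns (summary, lower, upper, tau2), where tau2 is the estimated
    between-study variance.
    """
    # Recover each study's sampling variance from its 95% CI width.
    variances = [((hi - lo) / (2 * Z_95)) ** 2
                 for lo, hi in zip(ci_lower, ci_upper)]
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Re-weight each study by its total (within + between) variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    summary = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return summary, summary - Z_95 * se, summary + Z_95 * se, tau2


# Hypothetical log2FC values and 95% CIs for one gene in three studies.
effects = [-1.2, -2.5, -1.9]
summary, lower, upper, tau2 = random_effects_summary(
    effects, ci_lower=[-1.8, -3.4, -2.6], ci_upper=[-0.6, -1.6, -1.2]
)
```

When the studies agree closely, the estimated between-study variance shrinks to zero and the result coincides with an inverse-variance fixed-effect average.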

To identify genes that are either known to be associated with esophageal squamous cell carcinoma (ESCC) or are potentially novel, we compared the differentially expressed genes to those previously reported in ESCC ATLAS. Additionally, we evaluated the expression differences of these candidate novel genes across tumor cell differentiation statuses (Poor, Moderate, and Well) and normal tissue using the GSE75241 dataset. The comparison was made using a Wilcoxon rank sum test, where the differentiation statuses range from poorly differentiated (least similar to normal tissue) to well differentiated (most similar to normal tissue).

2.2.3 Gene to gene network analysis

To identify whether our candidate genes were associated with ESCC-related molecular events, namely 1) structural variations, 2) SNPs, 3) altered DNA methylation, 4) histone modifications or 5) miRNA targets, they were intersected with the genes curated in the ESCC ATLAS [13], as well as with a list of additional mutated genes reported in ESCC studies that were not yet included in ESCC ATLAS [28, 29, 30, 31]. The genes mapping to these molecular events were considered ‘core’ genes, and the remaining genes were considered ‘peripheral’ genes.
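The core/peripheral split described above amounts to a set intersection between the candidate genes and the genes annotated with each molecular event. A minimal sketch, using placeholder gene names and a hypothetical helper `split_core_peripheral` rather than real ESCC ATLAS content:

```python
def split_core_peripheral(candidates, event_genes):
    """Partition candidate genes into 'core' and 'peripheral'.

    event_genes maps a molecular event name to the set of genes
    annotated with that event. A candidate is 'core' if it appears
    in at least one event set; otherwise it is 'peripheral'.
    """
    core = {}
    for gene in candidates:
        events = sorted(ev for ev, genes in event_genes.items() if gene in genes)
        if events:
            core[gene] = events
    peripheral = sorted(set(candidates) - set(core))
    return core, peripheral


# Placeholder gene names; not taken from ESCC ATLAS.
candidates = ["G1", "G2", "G3", "G4"]
event_genes = {
    "SNP": {"G1", "G9"},
    "DNA methylation": {"G1", "G3"},
}
core, peripheral = split_core_peripheral(candidates, event_genes)
```

Keeping the list of events per core gene makes it straightforward to color network nodes by event type later on.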

The gene-to-gene interactions between ‘core’ and ‘peripheral’ genes were investigated using a gene-to-gene network generated by querying the STRING database with the Cytoscape (v3.7.2) StringApp (v1.6.0), with the minimum interaction confidence score set to 0.6 [32]. In the network, each node represented a candidate gene. Using the Omics Visualizer (v1.3.0), the nodes were colored on a scale representing the average fold change (FC) in expression across all the studies, as obtained from the meta-analysis. We also explored the gene-disease networks for the selected genes using the DisGeNET Cytoscape App (v7.3.0) [33]. The gene-disease network is a bipartite graph with two types of vertices (genes and diseases), in which edges connect vertices of different types (e.g., a gene with a disease). In this graph, two vertices can be connected by more than one edge, representing multiple pieces of evidence for the gene-disease and variant-disease associations, such as different database sources or DisGeNET association types [33].

2.3 Functional enrichment analysis

To gain insights into the biological functions and molecular networks that may have been altered in ESCC etiology, the top 255 differentially expressed genes were evaluated for overrepresented GO terms using gseGO, KEGG pathways using gseKEGG, Reactome pathways using gsePathway, and WikiPathways using gseWP, all functions from the clusterProfiler (v4.4.4) [34] package. These functions compute the overrepresentation based on the hypergeometric distribution test; the p-values were adjusted for multiple testing in each case using the Benjamini-Hochberg (BH) procedure. Finally, the candidate gene sets were selected using thresholds of adjusted $p$-value $< 0.05$ and modulus (absolute value) of the Normalized Enrichment Score (NES) $> 1.5$ from clusterProfiler.
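For readers unfamiliar with the Benjamini-Hochberg adjustment and the final filtering step, a minimal Python sketch follows. The term names, p-values, and NES values are hypothetical, and reading the NES cutoff as a threshold on the absolute value is an assumption; the study itself used clusterProfiler in R.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_terms(terms, alpha=0.05, nes_cutoff=1.5):
    """Keep terms with adjusted p < alpha and |NES| > nes_cutoff.

    The absolute-value reading of the NES cutoff is an assumption.
    """
    adjusted = benjamini_hochberg([t["p"] for t in terms])
    return [t["id"] for t, p_adj in zip(terms, adjusted)
            if p_adj < alpha and abs(t["nes"]) > nes_cutoff]


# Hypothetical enrichment results (IDs, raw p-values, NES).
terms = [
    {"id": "term_A", "p": 0.010, "nes": 2.1},
    {"id": "term_B", "p": 0.040, "nes": -1.8},
    {"id": "term_C", "p": 0.030, "nes": 1.2},
    {"id": "term_D", "p": 0.005, "nes": -1.9},
]
adjusted = benjamini_hochberg([t["p"] for t in terms])
selected = select_terms(terms)
```

Note that term_C passes the adjusted p-value threshold in this toy example but is dropped because its NES magnitude is below the cutoff.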

2.4 Toxicogenomics analysis

The interactions of the candidate genes with environmental exposures (such as chemical carcinogens, tobacco smoke, and trace element deficiency) and drugs that are inferred in the context of ESCC were explored using the Bioconductor package CTDquerier (v2.3.1) [35], with information from the Comparative Toxicogenomics Database [36], and were represented in the form of a network. For the network representation we used the ggnet package (v0.1.0) in R.

Table 1
Characteristics of datasets used in the meta-analysis

Number	Study	GSE ID	Biological material (sample size)	Data generation platform
1	Hu N et al, 2010 [77]	GSE20347	Tumor (17) and matched normal (17) esophageal tissue pairs	Affymetrix Human Genome U133A 2.0 Array
2	Fujiwara T et al, 2016 (Unpublished)	GSE23964	Normal (2) and tumor transfected in series of KYSE cell lines (14)	3D-Gene Human Oligo chip 25k
3	Chen YK et al, 2016 [78]	GSE70409	17 tumor and adjacent normal esophageal tissues	Phalanx Human OneArray
4	Aoyagi K et al, 2011 [79]	GSE22954	Tumor (90) and normal (10) esophageal tissues	Affymetrix Human Genome U95A and U133 Arrays
5	Yan W et al, 2013 [80]	GSE33426	Tumor (59) and adjacent normal (12) esophageal tissues. GSE33426 had 18 common samples with GSE29001. They were removed.	Affymetrix GeneChip human genome U133A 2.0 array
6	Wang Q et al, 2013 [81]	GSE26886	Tumor (19) and normal esophageal tissues (9)	Affymetrix Human Genome U133 Plus 2.0 Array
7	Yan W et al, 2012 [82]	GSE29001	Tumor (21), normal basal esophageal epithelium (12) and normal differentiated esophageal epithelium (12)	Affymetrix Genechip Human Genome U133A 2.0 Array
8	Su H et al, 2011 [83]	GSE23400	Tumor (53) and adjacent normal esophageal tissues (53)	Affymetrix GeneChip Human Genome U133A 2.0 Array
9	Lee JJ et al, 2010 [84]	GSE17351	Tumor (5) and adjacent normal esophageal tissues (5)	Affymetrix Genechip U133 plus v2.0 Array
10	Nicolau-Neto P et al, 2018 [85]	GSE75241	Tumor (15) and adjacent normal esophageal tissues (15)	Affymetrix Human Exon 1.0 ST array
11	Yang H et al, 2020 [86]	GSE44021	Pairs of tumor (113) and normal (113) esophageal tissues	Affymetrix GeneChip Human Genome U133A 2.0 Arrays
12	Tong M et al, 2012 [20]	GSE32424	Tumor (7) and normal (5) esophageal tissues	Illumina Genome Analyzer IIx
13	Saito S et al, 2015 [87]	GSE63941	22 ESCC cell lines and 4 isolated primary human esophageal fibroblasts	Affymetrix GeneChip Human Genome U133A array
14	Shimokuni T et al, 2006 [88]	GSE9982	20 KYSE human ESCC cell lines and 2 non-cancerous HEEC-1 esophageal epithelial cell lines	Codelink Uniset Human 20K I Bioarray
15	Zhu L et al, 2013 (Unpublished)	GSE45168	Tumor (5) and adjacent normal (5) esophageal tissues	Agilent-026652 Whole Human Genome Microarray 4x44K v2
16	Erkizan HV et al, 2017 [89]	GSE77861	Tumor (7) and adjacent normal esophageal tissues (7)	Affymetrix Human Genome U133 Plus 2.0 Array

2.5 Survival analysis

To assess the contribution of candidate genes to patient survival, we performed overall survival prediction in an independent ESCC RNASeq dataset available at The Cancer Genome Atlas (TCGA). We used TCGAbiolinks (v2.20.1) in R [37] to access 90 ESCC tumor and 11 normal samples [38] out of the total of 559 samples corresponding to esophageal cancer (EC). Further, we used the survival package (v3.4.0) for survival analysis and the survminer package (v0.4.9) for generating Kaplan–Meier survival plots in R. For the genes that showed a significant association with the survival of ESCC patients, we performed Cox proportional hazard analyses using the dysregulated and intact candidate genes, and compared the distribution of log2CPM values for these genes in the tumor and normal samples using a $t$-test.
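The Kaplan–Meier curves mentioned above are built from the product-limit estimator, which can be sketched in a few lines. This is an illustrative Python version with hypothetical follow-up times; the analysis in this study was carried out with the R packages named above.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier(times=[5, 8, 8, 12, 20], events=[1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the estimate changes only at observed deaths.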

3. Results

3.1 Characteristics of datasets

We collected the data available from a total of 16 studies that were selected through our exhaustive literature survey and selection criteria (see Methods). These studies differed from each other in terms of samples, experimental setup, and data generation platforms. Thirteen studies used normal-tumor tissue samples, whereas the other 3 used ESCC cell lines. Of the 16 datasets, 11 were generated on the Affymetrix platform, and the remaining 5 were from the Agilent, Phalanx Human OneArray, 3D-Gene, CodeLink, and Illumina Genome Analyzer platforms (1 each). Detailed information on the datasets, including the biological materials, sample sizes, and microarray/sequencing platforms used in each study, is provided in Table 1.

Figure 1.

QQ-plot showing observed versus theoretical quantiles of expression in all the 10 ESCC ‘normal vs tumor’ datasets.

For any meta-analysis it is crucial to assess the quality of the datasets being used, because the inclusion of an outlying dataset could result in inefficient or wrong biological conclusions [39]. Hence, we performed a quality check on each of our collected datasets (see Methods) to ensure that no outlier samples remained and that the data were not unbalanced. Considering the stringent QC measures and the number of genes in common across the arrays, we excluded 6 datasets from further analysis: GSE63941, GSE9982, GSE45168, GSE70409, GSE22954 and GSE32424 (Supplementary Table S1). Among these, we found possible mislabeling of samples in GSE9982 and GSE45168 (Supplementary Figure S1). The number of samples excluded from each dataset and the QC status of the datasets are provided in Table 2.

Table 2

Study-specific details of the analysis methods used and the number of genes identified in the original studies and in this study

GSE ID	Quality control status of dataset (samples removed)	Normalization method used in the original study	Statistical method used in the original study	Normalization method used in this study	Statistical method used in this study	Number of DE genes discovered in the original study	Number of DE genes discovered in this study	$P$-value*
GSE20347	Passed (4T)	RMA	Unpaired $t$ -tests	MAS5	Linear modeling using LIMMA	67-up, 101-down	337-up, 445-down	$<$ 2.2e-16
GSE23964	Passed (2T)	Unpublished	Unpublished	NormExp, Cyclic Loess	Linear modeling using LIMMA	–	13-up, 17-down	8.7e-04
GSE33426	Passed (1N, 8T)	RMA with quantile normalization	Paired $t$ -statistics for each probe set and multivariate permutation test	RMA	Linear modeling using LIMMA	13-up	115-up, 492-down	$<$ 2.2e-16
GSE26886	Passed (1N, 1T)	RMA with quantile normalization	Paired $t$ -statistics for each probe set and multivariate permutation test	MAS5	Linear modeling using LIMMA	1-up	505-up, 677-down	$<$ 2.2e-16
GSE29001	Passed (2N)	Quality controlling with normalized unscaled standard error (NUSE) and relative log expression (RLE). Normalization was with RMA	Paired $t$ -test and multivariate permutation test	RMA	Linear modeling using LIMMA	24-up, 24-down	9-up, 43-down	1.4e-02
GSE23400	Passed (5N, 3T)	RMA	Paired $t$ -test	MAS5	Linear modeling using LIMMA	116-up, 43-down	213-up, 175-down	$<$ 2.2e-16
GSE17351	Passed (2T)	GeneChip Robust Multiarray Average	Significance analysis of microarrays	MAS5	Linear modeling using LIMMA	28-up, 31-down	580-up, 411-down	$<$ 2.2e-16
GSE75241	Passed (2T)	RMA	Linear modeling using LIMMA	RMA	Linear modeling using LIMMA	992-up, 332-down	169-up, 265-down	4.5e-13
GSE44021	Passed (4N, 11T)	RMA	Paired $t$ -tests	RMA	Linear modeling using LIMMA	396-up, 422-down	153-up, 154-down	$<$ 2.2e-16
GSE77861	Passed (0)	RMA, Lowess modeling	paired $t$ -test	MAS5	Linear modeling using LIMMA	340-up, 416-down	289-up, 336-down	2.08e-06
GSE32424	Passed (0), however not included in meta-analysis ${}^{@}$	RPKM	$t$-test and Baggerly test	RPKM	Linear modeling using LIMMA	–	–

GSE70409	Passed (1T), However not included in analysis ${}^{@}$	Global Lowess normalizations	Random Forests classifier	NormExp, Scale	Linear modeling using LIMMA	–	–
GSE22954	Passed (1N, 10T), However, not included in meta-analysis ${}^{@}$	Genes with signal intensity $>$ 1000 in more than 10% of the samples were selected	Permutation test	MAS5	Linear modeling using LIMMA	–	–
GSE63941	Failed ${}^{\#}$	MAS5	–	–	–	–	–
GSE9982	Failed ${}^{\$}$	Global normalization	Unpublished	–	–	–	–
GSE45168	Failed ${}^{\$}$	Unpublished	Unpublished	–	–	–	–

Note: # all the normal samples failed the QC parameters; $ mislabeling of sample status was suspected; @ datasets were removed because they contained fewer genes; N refers to ‘normal’ samples and T refers to ‘tumor’ samples; * test for symmetry of the DE gene distribution between the original study and this study, using the McNemar–Bowker chi-square test.

Figure 2.

Heatmap showing differential expression of the most consistently perturbed 255 genes across 10 datasets.

3.1.1 Differentially expressed genes in ESCC

We identified the differentially expressed genes in each dataset based on linear modelling of the tumor $\sim$ normal comparison. The quantiles of the ten selected datasets are compared against the standard normal distribution in the QQ plot (Fig. 1), which indicates no bias or confounding factors in the data. The total counts of up- and downregulated genes for all the datasets, in this study and in the original studies, are provided in Table 2. Because this contribution focuses on the meta-analysis, we combined the differential gene expression results from each dataset by selecting the common set of 5,080 genes across all the datasets, and then summarized their gene expression profiles using the REM approach (see Methods). From these, we selected the top 255 genes, representing the top 5% of the most consistently perturbed genes in the context of ESCC across all the datasets (Supplementary Table S2). These 255 genes are henceforth referred to as the candidate genes and were further explored for gene-to-gene interactions and functional characterization. Differential expression of the top 5% perturbed genes across the ten datasets is illustrated as a heatmap (Fig. 2). Among our candidate genes, 13 genes with an average fold change $\geqslant$ 1.5 and 47 genes with an average fold change $\leqslant$ $-$1.5 are shown in Table 3, where the average fold change is the midpoint of the fold change confidence interval: FC $=$ (“left limit of the fold change confidence interval” $+$ “right limit of the fold change confidence interval”)/2.
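As a worked illustration of the average fold change definition, the snippet below computes the midpoint of the 95% confidence interval for three genes, using the interval limits reported in Table 3. The function name is hypothetical.

```python
def average_fold_change(ci_left, ci_right):
    """Midpoint of the fold change confidence interval."""
    return (ci_left + ci_right) / 2


# 95% CI limits as listed in Table 3.
intervals = {
    "SPP1": (1.9, 3.9),
    "DSG1": (-2.73, -1.06),
    "MAL": (-5.84, -2.33),
}
midpoints = {gene: average_fold_change(lo, hi)
             for gene, (lo, hi) in intervals.items()}

# Genes passing the |FC| >= 1.5 cutoff used for Table 3.
passing = sorted(g for g, fc in midpoints.items() if abs(fc) >= 1.5)
```

The midpoints agree with the summary fold changes in Table 3 up to rounding of the reported interval limits (for example, the DSG1 midpoint is $-1.895$, reported as $-1.9$).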

Table 3
List of genes consistently perturbed with summary fold change $\geqslant$ 1.5 or $\leqslant$ $-$1.5

Gene symbol	Differential expression sign consistency among data sets	Summary fold change [95% CI]	$P$ -value	Rank	Status of gene perturbation in ESCC ATLAS
SPP1	8	2.9 [1.9 – 3.9]	1.33E-08	1	Known
SPINK5	–8	–3.26 [–4.5 – –2.02]	2.75E-07	2	Known
CYP4B1	–8	–2.91 [–3.99 – –1.83]	1.20E-07	3	Known
CRABP2	–10	–2.13 [–2.9 – –1.36]	5.17E-08	4	Known
MAL	–8	–4.08 [–5.84 – –2.33]	4.94E-06	5	Known
CRISP3	–8	–5.25 [–7.62 – –2.87]	1.47E-05	6	Known
SCEL	–8	–3.22 [–4.57 – –1.87]	2.75E-06	7	Known
CYP3A5	–8	–1.79 [–2.45 – –1.13]	9.67E-08	8	Known
KLK11	–8	–1.85 [–2.55 – –1.15]	2.10E-07	9	Known
SLURP1	–8	–3.4 [–4.92 – –1.89]	1.11E-05	10	Known
BLNK	–8	–1.93 [–2.7 – –1.16]	8.53E-07	11	Known
HLF	–8	–1.73 [–2.4 – –1.05]	4.84E-07	12	Known
RHCG	–8	–2.97 [–4.32 – –1.61]	1.70E-05	13	Known
PPP1R3C	–8	–2.42 [–3.48 – –1.36]	7.81E-06	15	Known
TGFBI	6	2.01 [1.16 – 2.85]	3.10E-06	16	Known
CLIC3	–8	–2.6 [–3.77 – –1.43]	1.35E-05	17	Known
CRNN	–8	–4.5 [–6.75 – –2.26]	8.50E-05	20	Known
EPCAM	8	1.5 [0.9 – 2.1]	8.73E-07	21	Known
TGM1	–8	–2.55 [–3.75 – –1.36]	2.83E-05	22	Known
ADH7	–8	–1.66 [–2.35 – –0.96]	3.01E-06	23	Known
KRT13	–8	–2.4 [–3.52 – –1.28]	2.57E-05	24	Known
DSG1	–10	–1.9 [–2.73 – –1.06]	8.21E-06	25	Novel
PTK6	–8	–2.03 [–2.94 – –1.11]	1.46E-05	28	Known
C1orf116	–8	–2.06 [–2.99 – –1.12]	1.60E-05	29	Known
EPS8L1	–8	–1.98 [–2.87 – –1.09]	1.30E-05	30	Known
TGM3	–8	–3.5 [–5.26 – –1.74]	9.93E-05	31	Known
MYBL2	8	1.67 [0.96 – 2.39]	4.90E-06	33	Known
CXCR2	–8	–2.61 [–3.87 – –1.35]	4.98E-05	34	Known
MALL	–8	–2.02 [–2.95 – –1.09]	2.26E-05	38	Known
TP53I3	–8	–1.57 [–2.25 – –0.9]	4.79E-06	39	Known
HEY1	6	1.91 [1.04 – 2.79]	1.81E-05	41	Known
SULT2B1	–8	–1.53 [–2.19 – –0.88]	4.20E-06	42	Known
GDPD3	–8	–2.1 [–3.12 – –1.09]	5.01E-05	51	Known
FNDC4	–8	–1.83 [–2.7 – –0.97]	3.03E-05	53	Known
EPHX2	–10	–1.51 [–2.17 – –0.84]	9.70E-06	54	Known
EVPL	–8	–1.53 [–2.23 – –0.83]	1.68E-05	62	Known
SERPINB13	–8	–1.57 [–2.29 – –0.84]	2.27E-05	67	Known
ABLIM1	–6	–1.61 [–2.37 – –0.85]	3.26E-05	72	Known
HOXB7	8	1.74 [0.89 – 2.58]	5.84E-05	74	Known
SLC16A6	–6	–1.76 [–2.63 – –0.9]	6.34E-05	75	Known
ANXA9	–6	–2.28 [–3.47 – –1.1]	0.000162	77	Known
PSCA	–6	–2.38 [–3.63 – –1.13]	0.000197	78	Known
COL5A2	6	1.72 [0.87 – 2.56]	6.92E-05	82	Known
GPD1L	–8	–1.57 [–2.32 – –0.81]	4.90E-05	86	Known
TJP3	–6	–1.64 [–2.45 – –0.83]	7.59E-05	94	Known
MMP13	8	2.7 [1.22 – 4.18]	0.000361	97	Known
PRSS3	–8	–1.99 [–3.04 – –0.95]	0.000184	100	Known
SERPINB4	–8	–1.9 [–2.89 – –0.91]	0.000168	102	Known
PAX9	–6	–1.52 [–2.29 – –0.75]	0.000105	118	Known
DUSP5	–8	–1.77 [–2.7 – –0.84]	0.000191	120	Known
CALB1	8	1.61 [0.78 – 2.45]	0.000157	125	Known
MMP10	6	2.21 [0.95 – 3.47]	0.0006	152	Known
LPCAT1	6	1.78 [0.79 – 2.76]	0.000401	155	Known
GABRP	–8	–2.15 [–3.39 – –0.92]	0.000617	158	Known
SLC27A6	–10	–1.55 [–2.39 – –0.71]	0.000284	159	Known
COL11A1	6	2.24 [0.94 – 3.54]	0.000744	166	Known
LAMC2	6	1.64 [0.73 – 2.54]	0.000405	170	Known
GCNT3	–8	–1.68 [–2.64 – –0.73]	0.000533	192	Known
TMPRSS11D	–8	–2.03 [–3.25 – –0.81]	0.001128	232	Known
S100P	–6	–1.98 [–3.18 – –0.78]	0.001226	250	Known

Figure 3.

A) Forest plot showing that DSG1 expression is consistently downregulated in all the studies, with at least 6 of the studies showing significant downregulation (p-value $\leqslant$ 0.05). The forest plot shows the log2 fold change values for individual studies, the fold change summary representing the meta-log2 fold change for all studies combined, and the p-values for individual studies together with the meta p-value for all studies combined. Each study is illustrated by a solid dot. B) DSG1 gene-disease network from DisGeNET showing a strong association with palmoplantar keratoderma. C) Distribution of perturbed mRNA and protein expression observed from ESCC ATLAS, along with counts of cancer, non-cancer and healthy driver genes from NSG7.0, for the 255 perturbed genes in this study. D) Distribution of molecular events from ESCC ATLAS mapped to the top 255 differentially regulated genes.

To identify whether our candidate genes were previously known in the context of ESCC or were associated with ESCC-related molecular events, we queried them against the ESCC ATLAS [13] and additional studies that were not included in the ESCC ATLAS. We found that 237 of our 255 candidate genes were indeed known and listed in ESCC ATLAS. More importantly, we also found 18 perturbed genes (DSG1, RETSAT, KCNK7, AKR1B1, LY6G6C, GLA, TTC22, DBNDD1, GPR19, PROS1, GALNT2, ESRP2, COCH, VPS13D, PRPF4, PIGN, RAD1, and GALNT6) that had not been reported in the context of ESCC and that therefore merit further investigation.

Figure 4.

Interaction network of core genes with peripheral genes. The edges connecting the core genes and peripheral genes are highlighted; the color of each node represents the average log2FC of the gene across studies. The nodes corresponding to the core genes are highlighted with distinctly colored squares, where the color of each square corresponds to a specific molecular event.

We assessed the expression differences of these novel perturbed genes between tumor cell differentiation statuses and normal tissue using the data available from GSE75241. The results revealed significant differences in the expression of the AKR1B1, DBNDD1, DSG1, ESRP2, GALNT2, GALNT6, LY6G6C, PIGN, RETSAT, and TTC22 genes (Supplementary Figure 3). The observation that these genes exhibit significant differences in expression between tumor cell differentiation statuses (Moderate, Poor, and Well) and normal tissue suggests that they may play an important role in tumor cell differentiation and morphogenesis, as well as in embryonic development, through their ability to influence intercellular signal transduction in ESCC.

Figure 5.

Functional enrichment analysis showing overrepresented terms from A) GO, B) KEGG, C) WikiPathways, and D) Reactome.

We also referred to the Cancer Dependency Map (DepMap) [40] portal, which provides effect size information representing how strongly the loss of a given gene affects the viability of the cancer cell line being studied. Of the 18 genes, only the Desmoglein-1 (DSG1) knockout in DepMap was related to the ESCC cell line, which was compared against 24 different cell lineage-specific gene sets to calculate the gene dependency scores. The Gene Effect (Chronos) score for DSG1 in ESCC was significantly different from that of the other lineage-specific gene sets and was less than zero, indicating that DSG1 is essential in ESCC (Supplementary Figure 4).

Among these, Desmoglein-1 (DSG1) showed significant downregulation (FC $= -1.89$, $p =$ 8.2e-06) in the meta-analysis. DSG1 is known to be expressed in the superficial upper layers of the skin epidermis and in the epithelial cells of the esophagus. It is the only gene that is consistently downregulated in all the datasets, and its downregulation is significant in 6 datasets (GSE20347, $p =$ 2.6e-02; GSE23400, $p =$ 3.0e-03; GSE26886, $p =$ 3.3e-02; GSE33426, $p =$ 1.0e-03; GSE44021, $p =$ 1.0e-03; GSE75241, $p =$ 9.0e-03) with FC $< -1.5$ (Fig. 3A). Interestingly, we also found that DSG1 is strongly linked with Palmoplantar keratoderma, a clinical presentation seen with ESCC [41, 42] (Fig. 3B). Further details on the candidate genes, including information on the source studies, are available in Supplementary Table S2.
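
The per-study $p$-values listed above can be pooled in several ways. The sketch below uses Fisher's method purely as an illustration; it is an assumption made here, not necessarily the procedure behind the reported meta $p$-value, and it covers only the six datasets whose $p$-values are quoted in the text, so its result is not expected to reproduce 8.2e-06. The function name `fisher_combined_p` is hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom; for even degrees of freedom the survival function has the
    closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

# Per-study DSG1 p-values quoted in the text for the six significant datasets.
dsg1_p = {
    "GSE20347": 2.6e-02, "GSE23400": 3.0e-03, "GSE26886": 3.3e-02,
    "GSE33426": 1.0e-03, "GSE44021": 1.0e-03, "GSE75241": 9.0e-03,
}
combined = fisher_combined_p(list(dsg1_p.values()))
print(f"Fisher combined p over {len(dsg1_p)} studies: {combined:.2e}")
```

Fisher's method assumes the studies are independent and ignores effect direction, which is acceptable here only because DSG1 is downregulated in every dataset.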

By intersecting our candidate gene list with NCG 7.0, which catalogues cancer and healthy driver genes, we observed 53 canonical cancer genes, 201 non-cancer genes and one healthy driver gene in the context of ESCC. This suggested that cancer genes were enriched ($\sim$20%; 53 of 255) in our candidate gene set. The distribution of cancer, non-cancer and healthy driver genes is shown in Fig. 3C.

Table 4

List of biological terms with summary statistics from the functional enrichment analysis using clusterProfiler

Ontology	ID	Description	Gene set size	Enrichment score	Adjusted $P$ -value	$q$ -values
BP	GO:0001503	Ossification	11	0.758	0.0045	0.0040
CC	GO:0062023	Collagen-containing extracellular matrix	12	0.726	0.0045	0.0040
MF	GO:0050840	Extracellular matrix binding	3	0.992	0.0045	0.0040
MF	GO:0005201	Extracellular matrix structural constituent	9	0.768	0.0097	0.0087
BP	GO:0001501	Skeletal system development	14	0.618	0.0205	0.0184
BP	GO:0051301	Cell division	17	0.566	0.0205	0.0184
CC	GO:0005788	Endoplasmic reticulum lumen	12	0.644	0.0205	0.0184
BP	GO:0016567	Protein ubiquitination	7	0.777	0.0205	0.0184
CC	GO:0005581	Collagen trimer	5	0.871	0.0205	0.0184
MF	GO:0030020	Extracellular matrix structural constituent conferring tensile strength	5	0.871	0.0205	0.0184
BP	GO:0001657	Ureteric bud development	3	0.960	0.0205	0.0184
BP	GO:0001823	Mesonephros development	3	0.960	0.0205	0.0184
BP	GO:0045596	Negative regulation of cell differentiation	11	0.672	0.0221	0.0198
CC	GO:0005583	Fibrillar collagen trimer	4	0.868	0.0312	0.0280
BP	GO:0001649	Osteoblast differentiation	6	0.774	0.0411	0.0368
KEGGPathway	hsa05165	Human papillomavirus infection	10	0.635	0.0371	0.0344
KEGGPathway	hsa04512	ECM-receptor interaction	6	0.764	0.0371	0.0344
Reactome	R-HSA-1474244	Extracellular matrix organization	14	0.785	5.27E-08	1.90E-06
Reactome	R-HSA-1474228	Degradation of the extracellular matrix	11	0.810	2.93E-07	5.27E-06
Reactome	R-HSA-69278	Cell Cycle, Mitotic	26	0.513	2.17E-05	0.00026
Reactome	R-HSA-1640170	Cell Cycle	27	0.494	3.04E-05	0.0002
Reactome	R-HSA-69620	Cell Cycle Checkpoints	15	0.577	0.00032	0.002
Reactome	R-HSA-1280218	Adaptive Immune System	11	0.576	0.00270	0.016
Reactome	R-HSA-392499	Metabolism of proteins	29	0.363	0.0035	0.018
Reactome	R-HSA-68886	M Phase	12	0.514	0.0071	0.031
Reactome	R-HSA-9006934	Signaling by Receptor Tyrosine Kinases	14	0.489	0.0081	0.032
WikiPathway	WP3932	Focal adhesion: PI3K-Akt-mTOR-signaling pathway	10	0.688	0.0097	0.0080
WikiPathway	WP474	Endochondral ossification	3	0.910	0.0260	0.0214
WikiPathway	WP4808	Endochondral ossification with skeletal dysplasias	3	0.910	0.0260	0.0214
WikiPathway	WP2911	miRNA targets in ECM and membrane receptors	3	0.896	0.0310	0.0255

3.1.2 ESCC core genes and gene-to-gene networks

Further, we segregated our candidate gene set (see Methods) into core genes (genes that are known to harbor genetic variants or undergo epigenetic regulation in ESCC ATLAS) and peripheral genes (the other perturbed genes). In total we found 31 core genes, of which 7 were previously known to harbor SNPs in the context of ESCC (namely CRNN, CYP3A5, ADH7, SULT2B1, ALDH2, CYP11A1, MMP7), whereas another 11 genes were known to be associated with CNVs in relation to ESCC (namely KRT13, ITGA6, UBE2C, CENPF, PAX9, ENAH, COL11A1, ABCC4, DHCR7, ASPN, CYP4F3). Additionally, we identified 11 genes that either carry epigenetic modifications, such as DNA methylation (PRSS3, HLTF) or histone modification (PTK6, WDHD1, KLF4), or are targets of miRNAs (FSCN1, MMP10, KLF4, DSC2, CDK4, SNAI2, BIRC5); KLF4 appears in both the histone modification and the miRNA target groups. We also found 2 genes, FAT1 and ZNF750, that were associated with mutations in the context of ESCC. The distribution of core genes with respect to molecular events is shown in Fig. 3D. Details of the core genes and their molecular events among the top 255 genes, with curated information from the original studies, are presented in Supplementary Dataset S1. We could map mRNA expression information for 215 genes and protein expression information for 24 genes in ESCC ATLAS (Fig. 3C).
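
The grouping above can be tallied with a few set operations. The following is a minimal sketch that only re-encodes the gene lists given in this paragraph; the variable names are illustrative and are not taken from the authors' analysis code.

```python
# Core genes grouped by the molecular event reported for them in the text.
core_by_event = {
    "SNP": {"CRNN", "CYP3A5", "ADH7", "SULT2B1", "ALDH2", "CYP11A1", "MMP7"},
    "CNV": {"KRT13", "ITGA6", "UBE2C", "CENPF", "PAX9", "ENAH", "COL11A1",
            "ABCC4", "DHCR7", "ASPN", "CYP4F3"},
    "DNA methylation": {"PRSS3", "HLTF"},
    "Histone modification": {"PTK6", "WDHD1", "KLF4"},
    "miRNA target": {"FSCN1", "MMP10", "KLF4", "DSC2", "CDK4", "SNAI2", "BIRC5"},
    "Mutation": {"FAT1", "ZNF750"},
}

# Distinct core genes across all event categories.
all_core = set().union(*core_by_event.values())

# Genes assigned to more than one event category.
multi_event = {
    g for g in all_core
    if sum(g in genes for genes in core_by_event.values()) > 1
}

print(len(all_core))        # number of distinct core genes
print(sorted(multi_event))  # genes listed under several events
```

Counting distinct genes rather than list entries is what reconciles the per-category counts with the total number of core genes.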

Next, we explored the gene-to-gene interaction network, in which the “core” and “peripheral” genes were interconnected based on the available evidence for functional associations or physical interactions between protein pairs, imported by the StringApp from the STRING database into Cytoscape. Of the 31 core genes, 27 interact with at least one other peripheral or core gene that is differentially regulated (Fig. 4), whereas 132 differentially regulated genes, including 7 core genes, did not have enough evidence for interaction with other genes. Within our network of interacting genes, we found a sub-network of collagen type 1, 5, 6 and 11 genes that are all upregulated in ESCC. Collagen type XI alpha 1 chain (COL11A1), which is one of the core genes, has previously been reported to promote ESCC proliferation, to be a target of miRNA-based regulation [43], and to be associated with CNV in the context of ESCC [44]. Collagen family genes are also known to be upregulated for the synthesis and assembly of collagens driven by tumor-associated macrophages (TAMs), which are in turn involved in extracellular matrix (ECM) remodeling by proteases in the primary tumor environment [45]. Another interesting gene in our network is the transcriptional regulator Krüppel-like factor 4 (KLF4), which is known to be downregulated in ESCC [46] and to be subject to miRNA-based regulation [49]. We found that KLF4 interacts with ZNF750, which is also a core gene and a known transcriptional regulator of epidermal differentiation. ZNF750 has previously been identified as a tumor suppressor in ESCC [47], and several reports have shown that ZNF750 positively regulates KLF4 expression [48, 49]. Hence it is not surprising to see these two genes downregulated together in the context of ESCC [49, 50].
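
Counting how many core genes have at least one interaction partner reduces to a neighbour lookup on the edge list exported from the network. The sketch below illustrates that step on toy input: only the KLF4-ZNF750 edge is stated in the text, while the COL11A1-COL1A1 edge and the gene named `PLACEHOLDER_GENE` are assumptions introduced here for illustration. The helper `connected_core_genes` is hypothetical and not part of StringApp or Cytoscape.

```python
from collections import defaultdict

def connected_core_genes(edges, core_genes):
    """Return the core genes that have at least one interaction partner."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {g for g in core_genes if neighbours[g]}

# Toy edge list: KLF4-ZNF750 comes from the text; the second edge is an
# illustrative assumption and PLACEHOLDER_GENE is not a real gene symbol.
edges = [("KLF4", "ZNF750"), ("COL11A1", "COL1A1")]
core = {"KLF4", "ZNF750", "COL11A1", "PLACEHOLDER_GENE"}

connected = connected_core_genes(edges, core)
print(sorted(connected))         # core genes with at least one partner
print(sorted(core - connected))  # core genes without interaction evidence
```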

3.2 Functional enrichment of genes

Functional enrichment analysis of the perturbed genes showed activation of crucial biological processes and pathways in ESCC etiology. The enrichment analysis using GO annotations showed 15 terms to be significantly (adjusted $p$-value $\leqslant$ 0.05; $|$NES$|$ $>$ 1.5) overrepresented in our candidate gene list when compared against all the genes commonly expressed in all samples (Fig. 5A). Similarly, using the pathway annotations from KEGG, Reactome, and WikiPathways, we identified 2, 9 and 4 pathways respectively that are significantly (adjusted $p$-value $\leqslant$ 0.05; $|$NES$|$ $>$ 1.5) overrepresented in our candidate gene list in comparison to the rest of the expressed genes (Fig. 5B, C and D). The summary statistics of all the overrepresented GO terms and biological pathways in our candidate gene set are listed in Table 4. Interestingly, we found that the GO term “extracellular matrix” and related pathways were significantly overrepresented in our candidate genes. Other crucial overrepresented terms among our candidate genes include “Ossification”, “protein ubiquitination”, “collagen trimer”, “cell cycle”, “cell division”, “negative regulation of cell differentiation”, “Human papillomavirus infection”, “Adaptive Immune System”, “Signaling by Receptor Tyrosine Kinases” and “Focal adhesion: PI3K-Akt-mTOR-signaling pathway” (see Supplementary Table S3 and Supplementary Figure S2 for the genes associated with the enrichment terms and their network, respectively).
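
For readers unfamiliar with over-representation testing, the idea of comparing a candidate list against a background of expressed genes can be sketched with an upper-tail hypergeometric test. This is a generic illustration and not a reproduction of the clusterProfiler run reported here. The list size of 255 and the background of 5080 commonly expressed genes come from this paper, and $k = 12$ matches one gene set size in Table 4, but the background annotation count `K = 100` is a made-up value used only to exercise the function.

```python
from math import comb

def ora_p_value(k, n, K, N):
    """Upper-tail hypergeometric p-value for over-representation.

    k: candidate genes annotated to the term
    n: size of the candidate gene list
    K: background genes annotated to the term
    N: size of the background (all expressed genes tested)
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# n and N are taken from the paper; K = 100 is a hypothetical placeholder.
p = ora_p_value(k=12, n=255, K=100, N=5080)
print(f"illustrative ORA p-value: {p:.3g}")
```

In practice the raw $p$-values of all tested terms would then be adjusted for multiple testing before applying the 0.05 cutoff.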

Figure 6.

Toxicogenomic database inferences for the candidate gene set showing gene interactions with risk factors of ESCC and with drug chemicals used in the treatment of ESCC. A strong color gradient for a gene indicates a high inference score, whereas a light color indicates a low inference score.

Evidently, literature reports support the implication of these terms in cancer etiology, and more particularly in ESCC. For example: i) Extracellular matrix remodeling: tumor-induced ECM alterations are a hallmark of malignancy; they increase tissue stiffness and desmoplasia around the tumor [45]. ii) Ossification: the abnormal development of bone tissue within periarticular soft tissue is reported in some cancers [51], albeit not well reported in ESCC. iii) Protein ubiquitination: ubiquitination is the second most common post-translational modification (PTM) of proteins, and aberrant ubiquitination may lead to cancer development and progression [52]. iv) Collagen trimer: collagen is a key structural component of the ECM, and its breakdown by matrix metalloproteinases (MMPs) and other proteases promotes angiogenesis and tumor invasion [53]. v) Cell cycle, cell division and negative regulation of cell differentiation: alterations in the cell-cycle and apoptotic machineries allow cancer cells to escape the normal control of cell proliferation and cell death. vi) Human papillomavirus infection: HPV infection is a known risk factor for ESCC [54]. vii) Adaptive Immune System: tumor-associated antigens are triggers of the immune response; they activate the T cell response, an important line of defense against tumorigenesis [55]. viii) Signaling by Receptor Tyrosine Kinases (RTKs): RTKs are a family of integral cell surface membrane receptors that are essential for mediating cell-to-cell communication and play key roles in cellular growth, differentiation, metabolism and motility [56]. ix) Focal adhesion: PI3K-Akt-mTOR-signaling pathway: this pathway plays a crucial role in the regulation of cell growth, differentiation, migration, metabolism and proliferation, and the PI3K/Akt/mTOR pathway is known to be activated in the development and progression of ESCC [57].

Figure 7.

Kaplan-Meier plots showing differences in overall survival between dysregulated and intact expression of genes. A) MCM10 and B) CEP170 showed a survival benefit, whereas C) STC2 and D) DBNDD1 showed a survival risk in 90 ESCC patients, based on their RNA-seq gene expression and clinical data from TCGA.

3.3 Toxicogenomic inferences

To investigate the chemical-gene interactions for our candidate gene set, we referred to the Comparative Toxicogenomics Database (CTD) [58, 59], which catalogs manually curated information about chemical-gene/protein interactions and chemical-disease and gene-disease relationships. We could map 240 candidate genes in the CTD. These genes were linked to various risk factors, chemical carcinogens and drugs in relation to ESCC: 223 genes were associated with Tobacco Smoke Pollution, 18 with nitrosobenzylmethylamine, 3 with 4-Nitroquinoline-1-oxide and 49 with the trace element zinc, while among chemicals of therapeutic value in ESCC treatment 168 genes were associated with Cisplatin, 54 with Fluorouracil, 25 with Mitomycin, 6 with Docetaxel, 5 with diallyl trisulfide, and 1 each with Vinorelbine, isoalantolactone, and longikaurin A. Among the 240 gene relationships with chemicals, 4 genes (KRT13, ALDH2, BLVRB, and FAT1) had ‘Direct’ evidence of chemical exposure. Of these, 3 genes (KRT13, ALDH2, and FAT1) were ‘core’ genes (see Supplementary Table S4). A network of interactions between differentially expressed genes and chemical exposures is illustrated in Fig. 6. Mapping the core genes onto the network revealed the interaction of 26 core genes (CYP3A5, CRNN, ADH7, KRT13, PTK6, SULT2B1, ZNF750, HLTF, ITGA6, PRSS3, UBE2C, CENPF, FSCN1, PAX9, ALDH2, CYP11A1, MMP10, ENAH, KLF4, DSC2, DHCR7, BIRC5, CYP4F3, FAT1, SNAI2, MMP7) with Tobacco Smoke Pollution, 6 core genes (CYP3A5, UBE2C, ALDH2, KLF4, BIRC5, SNAI2) with zinc, 4 core genes (ADH7, ALDH2, DHCR7, FAT1) with nitrosobenzylmethylamine and 1 core gene (MMP10) with 4-Nitroquinoline-1-oxide. It further revealed the interaction of 21 core genes (CYP3A5, ADH7, SULT2B1, WDHD1, ITGA6, UBE2C, CENPF, FSCN1, ALDH2, CYP11A1, MMP10, ENAH, COL11A1, KLF4, DSC2, DHCR7, CDK4, BIRC5, CYP4F3, SNAI2, MMP7) with cisplatin, 5 (UBE2C, ENAH, KLF4, BIRC5, SNAI2) with Fluorouracil, 3 (BIRC5, SNAI2, MMP7) with diallyl trisulfide, 3 (CENPF, BIRC5, FAT1) with Mitomycin and 1 (BIRC5) with Docetaxel.
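
The core gene lists above lend themselves to simple set queries, for instance to find genes shared between exposures or linked to many chemicals. The sketch below re-encodes only the lists given in this paragraph (the longer Tobacco Smoke Pollution and cisplatin lists are left out to keep it short); the variable names are illustrative and do not come from the CTD or the authors' code.

```python
from collections import Counter

# Core genes reported in the text for each chemical (tobacco smoke and
# cisplatin lists omitted here only for brevity).
core_by_chemical = {
    "Zinc": {"CYP3A5", "UBE2C", "ALDH2", "KLF4", "BIRC5", "SNAI2"},
    "nitrosobenzylmethylamine": {"ADH7", "ALDH2", "DHCR7", "FAT1"},
    "4-Nitroquinoline-1-oxide": {"MMP10"},
    "Fluorouracil": {"UBE2C", "ENAH", "KLF4", "BIRC5", "SNAI2"},
    "diallyl trisulfide": {"BIRC5", "SNAI2", "MMP7"},
    "Mitomycin": {"CENPF", "BIRC5", "FAT1"},
    "Docetaxel": {"BIRC5"},
}

# Core genes shared by two exposures.
shared = core_by_chemical["Zinc"] & core_by_chemical["nitrosobenzylmethylamine"]

# Number of the listed chemicals each core gene is linked to.
links = Counter(g for genes in core_by_chemical.values() for g in genes)

print(shared)
print(links.most_common(3))
```

Adding the two omitted lists to `core_by_chemical` extends the same queries to all nine chemicals without changing the code.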

3.3.1 Identification of candidate genes associated with prognosis in ESCC

Using gene expression data of 90 ESCC patients from The Cancer Genome Atlas (TCGA) and a Cox proportional hazards regression analysis, we predicted the clinical outcomes associated with the candidate genes in ESCC patients. In total, 4 genes (MCM10, CEP170, STC2 and DBNDD1) were significantly correlated with overall survival. Upregulation of MCM10 and CEP170 had a beneficial effect, with reported Hazard Ratio (HR) values of $-2.08$ and $-2.28$ (95% CI 0.19–0.94 and 0.10–0.83), respectively, whereas upregulation of STC2 and downregulation of DBNDD1 had a risk effect, with HR 2.82 and 2.27 (95% CI 1.70–19.02 and 1.16–8.28), respectively, on the survival of ESCC patients (see Fig. 7 and Table 5). Importantly, the differences in mean log2CPM values between tumor and normal tissue for all four genes were sign-consistent with the logFC values of the respective genes in the meta-analysis; the difference was statistically significant ($P$-value $<$ 0.05) for MCM10, CEP170 and STC2, but not for DBNDD1 (see Table 5).

Table 5
Cox proportional hazards model for prognosis of ESCC using data from 90 patients accessed from TCGA

Candidate genes shown to affect survival of patients	Meta- analysis: logFC	Meta- analysis: $P$	Log2CPM in tumor (Mean $\pm$ SD)	Log2CPM in normal (Mean $\pm$ SD)	Test for difference in Mean- $P$ -value	Hazard ratio (95% CI)	$P$ -value of survival analysis upon comparison between genes intact (reference) and perturbed
MCM10-upregulated	1.14	9.14E-09	4.050527	$-$ 0.670704	8.4e-05	$-$ 2.089 (0.19–0.94)	0.036
CEP170-upregulated	0.79	3.87E-05	5.59891	4.207195	5.7e-04	$-$ 2.285 (0.10–0.83)	0.022
STC2-upregulated	1.31	5.31E-05	4.452297	1.015182	5.0e-06	2.825 (1.70–19.02)	0.004
DBNDD1-downregulated	$-$ 0.92	6.13E-05	3.86811	4.043039	7.0e-01	2.272 (1.16–8.28)	0.023
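
A hazard ratio is positive by definition, so the signed values in the "Hazard ratio" column of Table 5 are easier to interpret alongside their confidence intervals. The sketch below assumes (this is an assumption, not something stated in the paper) that the printed 95% CIs are Wald intervals, symmetric on the log scale, and recovers from each interval an approximate hazard ratio, Wald $z$ statistic and two-sided $p$-value. The helper `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(lower, upper, z_crit=1.959964):
    """Recover HR, Wald z and two-sided p from a 95% CI of a hazard ratio.

    Assumes a Wald interval, i.e. log(HR) +/- z_crit * SE.
    """
    log_hr = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(log_hr), z, p

# 95% CIs as printed in Table 5.
table5_ci = {
    "MCM10": (0.19, 0.94), "CEP170": (0.10, 0.83),
    "STC2": (1.70, 19.02), "DBNDD1": (1.16, 8.28),
}
results = {gene: wald_from_ci(lo, hi) for gene, (lo, hi) in table5_ci.items()}
for gene, (hr, z, p) in results.items():
    print(f"{gene}: HR~{hr:.2f}, z~{z:.2f}, p~{p:.3f}")
```

Under that assumption, the recovered $z$ values can be compared directly with the tabulated column, and the recovered hazard ratios fall below 1 for MCM10 and CEP170 and above 1 for STC2 and DBNDD1, in line with the direction of effect described in the text.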

4. Discussion

Here we presented a meta-analysis of 10 globally published and unpublished mRNA expression datasets of ESCC tumor and normal tissue, and report 255 differentially regulated genes, of which we classify 31 as core genes that are associated with previously known molecular events in relation to ESCC. We explored the gene-gene interactions of the 255 candidate genes, including the core genes that are possibly involved in perturbing the interconnected genes to bring about phenotypic changes. We believe our meta-analysis and the gene interaction network presented here shed light on a new set of genes that have not been investigated in the context of ESCC so far. We report 18 such perturbed genes in ESCC etiology, of which DSG1 showed a 1.9-fold downregulation. Functionally, desmogleins are cadherin-like transmembrane glycoproteins; along with desmocollins, they form a major component of the intercellular junction known as the desmosome. Desmosomes are essential for epithelial differentiation, and the loss of intercellular adhesion facilitates tumor cell invasion and metastasis. The encoded protein has been identified as a target of auto-antibodies in the autoimmune skin blistering disease pemphigus foliaceus, a clinical presentation seen with esophageal cancer [60, 61]. Downregulation of this gene has also been observed in eosinophilic esophagitis (EE), a chronic inflammation of the esophageal mucosa, and silencing of DSG1 weakens esophageal epithelial integrity, which could result in cell separation and impaired barrier function [62]. EE may predispose patients to malignant transformation; however, there are only a few case reports on the co-occurrence of EE and ESCC [63]. In line with our observation in ESCC, DSG1 is also downregulated in anal squamous cell carcinoma [64]. Additionally, DSG1 is reported to be a receptor for the Staphylococcus aureus cell wall-anchored Serine-Aspartate repeat containing protein D (SdrD).
The evidence presented in one study [65] suggests a strong association between squamous cell carcinoma and the presence of S. aureus, a Gram-positive, spherically shaped bacterium frequently found in the upper respiratory tract and on the skin. Recent studies have also shown a crucial link between S. aureus and smoking: the normally harmless S. aureus in humans, while adapting to smoking-induced oxidative stress, becomes more virulent [66] and infects persistently [67]. When all these crucial links are put together, DSG1 could be an important early diagnostic marker that has plausibly been missed by ESCC research studies so far. Here we also point at two other differentially regulated genes, TMPRSS11D and PRSS3 (belonging to a family of serine proteases), that are known for regulating the interaction between microbes and the immune system. In human hosts, TMPRSS11D and PRSS3 are known to mediate the proteolytic activation and the replication of viruses, respectively [68, 69]. Further, while identifying a plausible bacterial role (S. aureus), we identified links to plausible HPV infection in ESCC etiology, in which the genes SPP1, HEY1, LAMC2, COL6A3, ITGA6 and COL1A1 were involved in the viral infection. In support of the molecular connections seen, a recent study in mice reported that HPV infection of the oral cavity, in conjunction with environmental pollutants and tobacco smoke, progresses to oral squamous cell carcinoma [70].

Interestingly, many of the biological processes or pathways identified from the functional enrichment analysis are ‘bittersweet’ in function, i.e., they could possess a dual regulatory role in cancer. For example, ECM components (which are consistently overrepresented in our candidate genes) possess both tumor-suppressing and tumor-promoting properties [45]. The same holds for “protein ubiquitination”: since ubiquitination is a reversible process, it can be utilized in cell reprogramming [52]. We also point at a set of collagen-related genes (specifically COL11A1, which is upregulated with a fold change of 2.23) that could be a potential target for ESCC treatment [53]. Our candidate gene set is also overrepresented in the RTK pathways. The genes involved in these pathways are potential pharmacological targets for the treatment of many malignancies associated with oncogenic activation of RTKs, and many small-molecule RTK inhibitors are clinically approved for the treatment of several cancers [56]. We have shown that the PI3K-Akt-mTOR-signaling pathway genes are enriched in our candidate gene set, and inhibitors targeting the PI3K/Akt/mTOR pathway (including Oridonin as an Akt inhibitor, Rapamycin as an mTORC1 inhibitor and BEZ235 as an ATP-competitive dual pan-PI3K and mTORC1/mTORC2 inhibitor) have been shown to play an important role in ESCC [57]. Therefore, the candidate genes we present here merit further investigation and could be of value in guiding the treatment of ESCC.

Using the CTD, we also unraveled the interaction of our candidate gene set with several cancer-causing environmental risk factors, chemicals and drug targets. One of the interesting interactions is that of GPR19 (a novel perturbed gene in relation to ESCC) with the trace element zinc. Prolonged zinc deficiency is known to cause inflammation in the esophageal mucosa. It has previously been shown that, upon treatment with low doses of the environmental carcinogen N-nitrosomethylbenzylamine (NMBA), the incidence of ESCC increased to 66% in zinc-deficient rats, whereas in zinc-sufficient rats the low doses of NMBA produced no cancers [11]. Inferences from the CTD showed that 8 other differentially regulated genes (CD24, ANXA3, DUSP5, TNFRSF12A, ALDH2, CEBPB, TMPRSS11D, and CXCL10) also interact with zinc as well as with NMBA. This suggests that zinc could be an important risk factor with great therapeutic value in ESCC.

The candidate genes, their interactions and the toxicogenomics network we present here give the opportunity to formulate several important questions for the investigation of ESCC, such as the mode of action of risk factors and drug resistance [71, 72]. For instance, toxicogenomic inferences for the differential genes have shown that Tobacco Smoke Pollution and cisplatin share a large number of genes ($n =$ 154; Fig. 6), which could explain a shared molecular basis for nicotine-mediated cisplatin resistance in cancer patients. Interestingly, we also found that important risk factors such as zinc (6 core genes, namely CYP3A5-rs17161780, UBE2C-amplification, ALDH2-rs1064933-rs886205-rs671, KLF4-Acetylation/hsa-miR-25, BIRC5-hsa-miR-203a, SNAI2-hsa-miR-203a) and N-nitrosomethylbenzylamine (4 core genes, namely ADH7-rs17028973, ALDH2-rs1064933-rs886205-rs671, DHCR7-Amplification, FAT1-p.351_351del-p.K3040fs) shared 1 core gene (ALDH2). Because ALDH2 is known to be involved in alcohol metabolism, interacts with tobacco smoking and also links zinc and N-nitrosomethylbenzylamine, it is a crucial gene for investigating interactions with environmental exposures. Additionally, we found KLF4, a zinc finger-containing transcription factor, to be downregulated (FC $= -1.05$) in ESCC. Given that KLF4 is known to carry acetylation sites and to be a target of miRNA-based regulation, and that several miRNAs are known to be implicated in zinc deficiency [73], it would be intriguing to investigate whether KLF4 could act as a molecular switch in response to zinc deficiency, such that it is either activated by acetylation or deactivated by miRNA regulation in response to zinc levels.
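
As a quick consistency check on this overlap, the counts reported for Tobacco Smoke Pollution (223 genes), cisplatin (168 genes) and their shared genes (154) can be combined by inclusion-exclusion. Here $T$ and $C$ are notation introduced only for this check, and the calculation assumes that all three counts refer to the same 240 candidate genes mapped in the CTD:

$$|T \cup C| = |T| + |C| - |T \cap C| = 223 + 168 - 154 = 237 \leqslant 240$$

Under that assumption, the two exposures together would account for 237 of the 240 mapped candidate genes.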

Among the candidate genes, the survival analysis showed that the STC2 (Stanniocalcin-2) gene has a risk effect on the survival of ESCC patients. This gene has been shown to play a critical role in calcium-phosphate homeostasis and is overexpressed in a broad spectrum of tumor cells [74, 75], including those of the esophagus. Under stress conditions such as ER stress, hypoxia and nutrient deprivation, it is stimulated and promotes cell proliferation, migration and immune response. It is further known to promote the development of acquired resistance to chemo- and radiotherapies [76]. Hence, STC2 is a potential cross-cancer biomarker and therapeutic target.

There are some potential limitations to consider in this study. Due to our use of a meta-analysis approach, we were only able to examine the common genes (5080) found in the datasets for the differential expression (DE) analysis. Unfortunately, a significant number of genes had to be excluded from the analysis due to variations in gene coverage among different technology platforms. Consequently, we were unable to identify potential post-transcriptional modifications or translational end products for many genes using ESCC ATLAS.
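
The restriction to common genes described above amounts to intersecting the gene lists covered by each platform. The sketch below illustrates that step with hypothetical toy gene sets; the platform names and their contents are placeholders chosen for illustration and are not the study's data.

```python
# Hypothetical gene coverage per platform; real inputs would be the genes
# measured by each of the 10 datasets.
platform_genes = {
    "platform_A": {"DSG1", "KLF4", "STC2", "COL11A1"},
    "platform_B": {"DSG1", "KLF4", "STC2"},
    "platform_C": {"DSG1", "KLF4", "COL11A1"},
}

# Genes measured on every platform are the only ones usable across studies.
common = set.intersection(*platform_genes.values())

# Genes measured somewhere but missing from at least one platform.
dropped = set.union(*platform_genes.values()) - common

print(sorted(common))
print(sorted(dropped))
```

Every additional platform can only shrink the common set, which is why coverage differences translate directly into excluded genes.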

5. Conclusion

In this contribution, using the mRNA expression profiles of 10 publicly available datasets (together comprising 283 tumor and 290 normal quality-controlled samples), we highlight molecular connections to the microbiota of the esophagus, environmental chemical exposures, and (deficiency of) the trace element zinc, in conjunction with core-gene-led perturbation of adjoining genes in ESCC etiology and gene interactions with therapeutic agents. Our functional enrichment analysis of the candidate genes revealed crucial ‘bittersweet’ biological pathways that could be exploited for ESCC treatment.

Availability of data and materials

The data generated in this study are available within the article and its supplementary data files.

Funding

This research project was funded by the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, grant No. (43-PRFA-P-8).

Authors’ contributions

P.H, A.A, and A.B conceptualized the study. V.P.G, P.S.G, S.R, and P.H performed the data analysis. V.P.G, J.P and P.H interpreted the analysis results. V.P.G and P.H wrote the manuscript. J.P, R.N.S, S.M, D.M, A.T, L.B.V, A.K.B and M.W.C were involved in revising the manuscript for important intellectual content. A.A, A.B and P.H supervised the research work. All authors have reviewed and approved the final manuscript.

Supplementary data

The supplementary files are available to download from http://dx.doi.org/10.3233/CBM-230145.

sj-docx-1-cbm-10.3233_CBM-230145.docx - Supplemental material

sj-xlsx-1-cbm-10.3233_CBM-230145.xlsx - Supplemental material

sj-xlsx-2-cbm-10.3233_CBM-230145.xlsx - Supplemental material

Footnotes

Conflict of interest

The authors do not have any competing interests.

References

1. Reichenbach Z.W., Murray M.G., Saxena, Farkas, Karassik E.G., Klochkova, Patel, Tice, Hall T.M., Gang, Parkman H.P., Ward S.J., Tétreault M.-P. and Whelan K.A., Clinical and translational advances in esophageal squamous cell carcinoma, Adv Cancer Res 144 (2019), 95–135.

2. Abnet C.C., Arnold and Wei W.-Q., Epidemiology of Esophageal Squamous Cell Carcinoma, Gastroenterology 154 (2018), 360–373.

3. Wang J.M., Rao J.Y., Shen H.B., Xue H.C. and Jiang Q.W., Diet habits, alcohol drinking, tobacco smoking, green tea drinking, and the risk of esophageal squamous cell carcinoma in the Chinese population, Eur J Gastroenterol Hepatol 19 (2007), 171–176.

4. Okello, Churchill, Owori, Nasasira, Tumuhimbise, Abonga C.L., Mutiibwa, Christiani D.C. and Corey K.E., Population attributable fraction of Esophageal squamous cell carcinoma due to smoking and alcohol in Uganda, BMC Cancer 16 (2016), 446.

5. Toh, Oki, Ohgaki, Sakamoto, Ito, Egashira, Saeki, Kakeji, Morita, Sakaguchi, Okamura and Maehara, Alcohol drinking, cigarette smoking, and the development of squamous cell carcinoma of the esophagus: molecular mechanisms of carcinogenesis, Int J Clin Oncol 15 (2010), 135–144.

6. Ghosh N.R. and Jones L.A., Dietary risk factors for esophageal cancer based on World Health Organization regions, Nutr Burbank Los Angel Cty Calif 95 (2022), 111552.

7. Tarazi, Chidambaram and Markar S.R., Risk Factors of Esophageal Squamous Cell Carcinoma beyond Alcohol and Smoking, Cancers 13 (2021), 1009.

8. Lin, Wang, Huang, Liu, Zhao, I.T.S. and Christiani D.C., Consumption of salted meat and its interactions with alcohol drinking and tobacco smoking on esophageal squamous-cell carcinoma, Int J Cancer 137 (2015), 582–589.

9. Amer M.H., Epidemiologic Aspects of Esophageal Cancer in Saudi Arabian Patients, Annals of Saudi Medicine 5 (1985), 69–77.

10. Domper Arnal M.J., Ferrández Arenas Á. and Lanas Arbeloa Á., Esophageal cancer: Risk factors, screening and endoscopic treatment in Western and Eastern countries, World J Gastroenterol 21 (2015), 7933–7943.

11. Taccioli, Chen, Jiang, Liu X.P., Huang, Smalley K.J., Farber J.L., Croce C.M. and Fong L.Y., Dietary zinc deficiency fuels esophageal cancer development by inducing a distinct inflammatory signature, Oncogene 31 (2012), 4550–4558.

12. Hashemi S.M., Mashhadi, Moghaddam A.A., Yousefi, Mofrad A.D., Sadeghi and Allahyari, The Relationship between Serum Selenium and Zinc with Gastroesophageal Cancers in the Southeast of Iran, Indian J Med Paediatr Oncol Off J Indian Soc Med Paediatr Oncol 38 (2017), 169–172.

13. Tungekar, Mandarthi, Mandaviya P.R., Gadekar V.P., Tantry, Kotian, Reddy, Prabha, Bhat, Sahay, Mascarenhas, Badkillaya R.R., Nagasampige M.K., Yelnadu, Pawar, Hebbar and Kashyap M.K., ESCC ATLAS: A population wide compendium of biomarkers for Esophageal Squamous Cell Carcinoma, Sci Rep 8 (2018), 12715.

14. Yang Y.-M., Hong, W.W., Q.-Y. and Li, Advances in targeted therapy for esophageal cancer, Signal Transduct Target Ther 5 (2020), 229.

15. Zhang, Yang Y.-D., Liu S.-L., J.-H. and Liu M.-Z., Comparing docetaxel plus cisplatin versus fluorouracil plus cisplatin in esophageal squamous cell carcinoma treated with neoadjuvant chemoradiotherapy, Jpn J Clin Oncol 47 (2017), 683–689.

16. Barrett, Wilhite S.E., Ledoux, Evangelista, Kim I.F., Tomashevsky, Marshall K.A., Phillippy K.H., Sherman P.M., Holko, Yefanov, Lee, Zhang, Robertson C.L., Serova, Davis and Soboleva, NCBI GEO: archive for functional genomics data sets – update, Nucleic Acids Res 41 (2013), D991–995.

17. Athar, Füllgrabe, George, Iqbal, Huerta, Ali, Snow, Fonseca N.A., Petryszak, Papatheodorou, Sarkans and Brazma, ArrayExpress update – from bulk to single-cell expression data, Nucleic Acids Res 47 (2019), D711–D715.

18. Bono, All of gene expression (AOE): An integrated index for public gene expression databases, PloS One 15 (2020), e0227076.

19. Kauffmann, Gentleman and Huber, arrayQualityMetrics – a bioconductor package for quality assessment of microarray data, Bioinforma Oxf Engl 25 (2009), 415–416.

20. Tong, Chan K.W., Bao J.Y.J., Wong K.Y., Chen J.-N., Kwan P.S., Tang K.H., Qin Y.-R., Lok, Guan X.-Y. and Ma, Rab25 is a tumor suppressor gene with antiangiogenic and anti-invasive activities in esophageal squamous cell carcinoma, Cancer Res 72 (2012), 6024–6035.

21. Gautier, Cope, Bolstad B.M. and Irizarry R.A., affy – analysis of Affymetrix GeneChip data at the probe level, Bioinforma Oxf Engl 20 (2004), 307–315.

22. Ritchie M.E., Silver, Oshlack, Holmes, Diyagama, Holloway and Smyth G.K., A comparison of background correction methods for two-colour microarrays, Bioinforma Oxf Engl 23 (2007), 2700–2707.

23. Ritchie M.E., Diyagama, Neilson, van Laar, Dobrovic, Holloway and Smyth G.K., Empirical array quality weights in the analysis of microarray data, BMC Bioinformatics 7 (2006), 261.

24. Pagès, Carlson, Falcon and Li, AnnotationDbi: Manipulation of SQLite-based annotations in Bioconductor, R package version 1.58.0 (2022).

25. Harrison P.F., Pattison A.D., Powell D.R. and Beilharz T.H., Topconfects: a package for confident effect sizes in differential expression analysis provides a more biologically useful ranked gene list, Genome Biol 20 (2019), 67.

26. McCarthy D.J. and Smyth G.K., Testing significance relative to a fold-change threshold is a TREAT, Bioinforma Oxf Engl 25 (2009), 765–771.

27. Dressler, Bortolomeazzi, Keddar M.R., Misetic, Sartini, Acha-Sagredo, Montorsi, Wijewardhane, Repana, Nulsen, Goldman, Pollitt, Davis, Strange, Ambrose and Ciccarelli F.D., Comparative assessment of genes driving cancer and somatic evolution in non-cancer tissues: an update of the Network of Cancer Genes (NCG) resource, Genome Biol 23 (2022), 35.

28. Song, Gao, Zhang, Wang, Zhou, Liu, Zhao, Huang, Fan, Dong, Chen, Yang, Chen, Zhuang, Huang, Qiu, Yin, Guo, Feng, Chen, Zhao, Luo, Chen, Tong, Wang, Liu, Lin, Zhang, Yang, Wang and Zhan, Identification of genomic alterations in oesophageal squamous cell cancer, Nature 509 (2014), 91–95.

29. Lin, D.-C., Hao, J.-J., Nagata, Shang, Meng, Sato, Okuno, Varela, A.M., Ding, L.-W., Garg, Liu, L.-Z., Yang, Yin, Shi, Z.-Z., Jiang, Y.-Y., W.-Y., Gong, Zhang, Kalid, Shacham, Ogawa, Wang, M.-R. and Koeffler, H.P., Genomic and molecular characterization of esophageal squamous cell carcinoma, Nat Genet 46 (2014), 467–473.

30. Wang, Jia, Y.-M., Zuo, Wang, Y.-D., Fan, Z.-S., Feng, Zhang, Han, Lyu, W.-J. and Ni, Z.-Y., Gene mutations of esophageal squamous cell carcinoma based on next-generation sequencing, Chin Med J (Engl) 134 (2021), 708–715.

31. Mangalaparthi, K.K., Patel, Khan, A.A., Manoharan, Karunakaran, Murugan, Gupta, Khanna-Gupta, Chaudhuri, Kumar, Nair, Kumar, R.V., Prasad, T.S.K., Chatterjee, Pandey and Gowda, Mutational Landscape of Esophageal Squamous Cell Carcinoma in an Indian Cohort, Front Oncol 10 (2020), 1457.

32. Doncheva, N.T., Morris, J.H., Gorodkin and Jensen, L.J., Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data, J Proteome Res 18 (2019), 623–632.

33. Piñero, Saüch, Sanz and Furlong, L.I., The DisGeNET cytoscape app: Exploring and visualizing disease genomics data, Comput Struct Biotechnol J 19 (2021), 2960–2967.

34. Chen, Guo, Dai, Feng, Zhou, Tang, Zhan, Liu and Yu, clusterProfiler 4.0: A universal enrichment tool for interpreting omics data, Innov Camb Mass 2 (2021), 100141.

35. Hernandez-Ferrer and Gonzalez, J.R., CTDquerier: a bioconductor R package for Comparative Toxicogenomics Database™ data extraction, visualization and enrichment of environmental and toxicological studies, Bioinforma Oxf Engl 34 (2018), 3235–3237.

36. Davis, A.P., King, B.L., Mockus, Murphy, C.G., Saraceni-Richards, Rosenstein, Wiegers and Mattingly, C.J., The Comparative Toxicogenomics Database: update 2011, Nucleic Acids Res 39 (2011), D1067–D1072.

37. Colaprico, Silva, T.C., Olsen, Garofano, Cava, Garolini, Sabedot, T.S., Malta, T.M., Pagnotta, S.M., Castiglioni, Ceccarelli, Bontempi and Noushmehr, TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data, Nucleic Acids Res 44 (2016), e71.

38. Cancer Genome Atlas Research Network, Integrated genomic characterization of oesophageal carcinoma, Nature 541 (2017), 169–175.

39. Eysenck, H.J., Meta-analysis and its problems, BMJ 309 (1994), 789–792.

40. Tsherniak, Vazquez, Montgomery, P.G., Weir, B.A., Kryukov, Cowley, G.S., Gill, Harrington, W.F., Pantel, Krill-Burger, J.M., Meyers, R.M., Ali, Goodale, Lee, Jiang, Hsiao, Gerath, W.F.J., Howell, Merkel, Ghandi, Garraway, L.A., Root, D.E., Golub, T.R., Boehm, J.S. and Hahn, W.C., Defining a Cancer Dependency Map, Cell 170 (2017), 564–576.e16.

41. Lee, Z.-E. and Modiri, Images in clinical medicine. Palmoplantar keratoderma associated with esophageal cancer, N Engl J Med 367 (2012), e35.

42. Ilhan, Erbaydar, Akdeniz and Arslan, Palmoplantar keratoderma is associated with esophagus squamous cell cancer in Van region of Turkey: a case control study, BMC Cancer 5 (2005), 90.

43. Kang, Zhu, Sun, Zhang, Liang, Kou, Zhu, Carbonelli, Sakao and Zhang, COL11A1 promotes esophageal squamous cell carcinoma proliferation and metastasis and is inversely regulated by miR-335-5p, Ann Transl Med 9 (2021), 1577.

44. Chattopadhyay, Singh, Phukan, Purkayastha, Kataki, Mahanta, Saxena and Kapur, Genome-wide analysis of chromosomal alterations in patients with esophageal squamous cell carcinoma exposed to tobacco and betel quid from high-risk area in India, Mutat Res 696 (2010), 130–138.

45. Winkler, Abisoye-Ogunniyan, Metcalf, K.J. and Werb, Concepts of extracellular matrix remodelling in tumour progression and metastasis, Nat Commun 11 (2020), 5120.

46. Yang and Katz, J.P., KLF4 is downregulated but not mutated during human esophageal squamous cell carcinogenesis and has tumor stage-specific functions, Cancer Biol Ther 17 (2016), 422–429.

47. Otsuka, Akutsu, Sakata, Hanari, Murakami, Kano, Toyozumi, Takahashi, Matsumoto, Sekino, Yokoyama, Okada, Shiraishi, Komatsu, Iida and Matsubara, ZNF750 Expression Is a Potential Prognostic Biomarker in Esophageal Squamous Cell Carcinoma, Oncology 94 (2018), 142–148.

48. Sen, G.L., Boxer, L.D., Webster, D.E., Bussat, R.T., Zarnegar, B.J., Johnston, Siprashvili and Khavari, P.A., ZNF750 is a p63 target gene that induces KLF4 to drive terminal epidermal differentiation, Dev Cell 22 (2012), 669–677.

49. Boxer, L.D., Barajas, Tao, Zhang and Khavari, P.A., ZNF750 interacts with KLF4 and RCOR1, KDM1A, and CTBP1/2 chromatin regulators to repress epidermal progenitor genes and induce differentiation genes, Genes Dev 28 (2014), 2013–2026.

50. Zhu, Yan, Rodriguez-Canales, Rosenberg, A.M., Goldstein, A.M., Taylor, P.R., Erickson, H.S., Emmert-Buck, M.R. and Tangrea, M.A., MicroRNA analysis of microdissected normal squamous esophageal epithelium and tumor cells, Am J Cancer Res 1 (2011), 574–584.

51. Guo, Collaco, C.R. and Bruera, Heterotopic ossification in critical illness and cancer: a report of 2 cases, Arch Phys Med Rehabil 83 (2002), 855–859.

52. Deng, Meng, Chen, Wei and Wang, The role of ubiquitination in tumorigenesis and targeted drug discovery, Signal Transduct Target Ther 5 (2020), 11.

53. Liu, Lai, Jiang and Huang, Collagen XI alpha 1 chain, a potential therapeutic target for cancer, FASEB J Off Publ Fed Am Soc Exp Biol 35 (2021), e21603.

54. Guo, Liu, Wang, Weiss, N.S., Madeleine, M.M., Liu, Tian, Song, Pan, Ning, Yang, Shi, Cai and Ke, Human papillomavirus infection and esophageal squamous cell carcinoma: a case-control study, Cancer Epidemiol Biomark Prev Publ Am Assoc Cancer Res Cosponsored Am Soc Prev Oncol 21 (2012), 780–785.

55. Ribas, Adaptive Immune Resistance: How Cancer Protects from Immune Attack, Cancer Discov 5 (2015), 915–919.

56. Saraon, Pathmanathan, Snider, Lyakisheva, Wong and Stagljar, Receptor tyrosine kinases and cancer: oncogenic mechanisms and therapeutic approaches, Oncogene 40 (2021), 4079–4093.

57. Luo, Liu, Huang, Dong and Li, PI3K/Akt/mTOR Signaling Pathway: Role in Esophageal Squamous Cell Carcinoma, Regulatory Mechanisms and Opportunities for Targeted Therapy, Front Oncol 12 (2022), 852383.

58. Davis, A.P., Grondin, C.J., Johnson, R.J., Sciaky, Wiegers, T.C. and Mattingly, C.J., Comparative Toxicogenomics Database (CTD): update 2021, Nucleic Acids Res 49 (2021), D1138–D1143.

59. Mattingly, C.J., Rosenstein, M.C., Colby, G.T., Forrest, J.N. and Boyer, J.L., The Comparative Toxicogenomics Database (CTD): a resource for comparative toxicological studies, J Exp Zoolog A Comp Exp Biol 305 (2006), 689–692.

60. Takahashi, Okushiba, Kondo, Morikawa, Hirano, Miyamoto, Shichinohe, Hara, Kawarada, Saito and Takeuchi, Esophageal pemphigus vulgaris with carcinoma: postoperative steroid therapy based on pemphigus-related antibodies, Dis Esophagus Off J Int Soc Dis Esophagus 18 (2005), 413–417.

61. Browne, V.C., Choi, Capitle, E.M. and Khianey, A Case of Severe Refractory Pemphigus Vulgaris in a Patient With Stable Esophageal Malignancy, Cureus 13 (2021), e14576.

62. Sherrill, J.D., Djukic, Caldwell, J.M., Stucke, E.M., Kemme, K.A., Costello, M.S., Mingler, M.K., Blanchard, Collins, M.H., Abonia, J.P., Putnam, P.E., Dellon, E.S., Orlando, R.C., Hogan, S.P. and Rothenberg, M.E., Desmoglein-1 regulates esophageal epithelial barrier function and immune responses in eosinophilic esophagitis, Mucosal Immunol 7 (2014), 718–729.

63. Fukuchi, Sakurai, Suzuki, Naitoh, Tabe, Fukasawa, Kiriyama, Yokobori and Kuwano, Esophageal squamous cell carcinoma with marked eosinophil infiltration, Case Rep Gastroenterol 5 (2011), 648–653.

64. Myklebust, M.P., Fluge, Ø., Immervoll, Skarstein, Balteskard, Bruland and Dahl, Expression of DSG1 and DSC1 are prognostic markers in anal carcinoma patients, Br J Cancer 106 (2012), 756–762.

65. Kullander, Forslund and Dillner, Staphylococcus aureus and squamous cell carcinoma of the skin, Cancer Epidemiol Biomark Prev Publ Am Assoc Cancer Res Cosponsored Am Soc Prev Oncol 18 (2009), 472–478.

66. Kulkarni, Antala, Wang, Amaral, F.E., Rampersaud, Larussa, S.J., Planet, P.J. and Ratner, A.J., Cigarette smoke increases Staphylococcus aureus biofilm formation via oxidative stress, Infect Immun 80 (2012), 3804–3811.

67. Lacoma, Edwards, A.M., Young, B.C., Domínguez, Prat and Laabei, Cigarette smoke exposure redirects Staphylococcus aureus to a virulence profile associated with persistent infection, Sci Rep 9 (2019), 10798.

68. Rattanakomol, Srimanote, Tongtawe, Khantisitthiporn, Supasorn and Thanongsaksrikul, Host neuronal PRSS3 interacts with enterovirus A71 3A protein and its role in viral replication, Sci Rep 12 (2022), 12846.

69. Sasaki, Itakura, Kishimoto, Tabata, Uemura, Ito, Sugiyama, Wastika, C.E., Orba and Sawa, Host serine proteases TMPRSS2 and TMPRSS11D mediate proteolytic activation and trypsin-independent infection in group A rotaviruses, J Virol (2021), JVI00398-21.

70. Christensen, N.D., Chen, K.-M., Stairs, D.B., Sun, Y.-W., Aliaga, Balogh, K.K., Atkins, Shearer, Brendle, S.A., Gowda, Amin, Walter, Viscidi and El-Bayoumy, The environmental pollutant and tobacco smoke constituent dibenzo[def,p]chrysene is a co-factor for malignant progression of mouse oral papillomavirus infections, Chem Biol Interact 333 (2021), 109321.

71. Manyanga, Ganapathy, Bouharati, Mehta, Sadhasivam, Acharya, Zhao and Queimado, Electronic cigarette aerosols alter the expression of cisplatin transporters and increase drug resistance in oral cancer cells, Sci Rep 11 (2021), 1821.

72. Huang, Pan, Zhang, Liu and Zhang, Nicotine inhibits apoptosis induced by cisplatin in human oral cancer cells, Int J Oral Maxillofac Surg 36 (2007), 739–744.

73. Liu, C.-M., Liang, Jin, D.-J., Zhang, Y.-C., Gao, Z.-Y. and He, Y.-T., Research progress on the relationship between zinc deficiency, related microRNAs, and esophageal carcinoma, Thorac Cancer 8 (2017), 549–557.

74. Wang and Lin, Clinical significance of high expression of stanniocalcin-2 in hepatocellular carcinoma, Biosci Rep 39 (2019), BSR20182057.

75. Lin, Guo, Wen, Lin, Cui, Sang and Pan, Survival analyses correlate stanniocalcin 2 overexpression to poor prognosis of nasopharyngeal carcinomas, J Exp Clin Cancer Res CR 33 (2014), 26.

76. Qie and Sang, Stanniocalcin 2 (STC2): a universal tumour biomarker and a potential therapeutical target, J Exp Clin Cancer Res CR 41 (2022), 161.

77. Clifford, R.J., Yang, H.H., Wang, Goldstein, A.M., Ding, Taylor, P.R. and Lee, M.P., Genome wide analysis of DNA copy number neutral loss of heterozygosity (CNNLOH) and its relation to gene expression in esophageal squamous cell carcinoma, BMC Genomics 11 (2010), 576.

78. Chen, Y.-K., Tung, C.-W., Lee, J.-Y., Hung, Y.-C., Lee, C.-H., Chou, S.-H., Lin, H.-S., M.-T. and Wu, I.-C., Plasma matrix metalloproteinase 1 improves the detection and survival prediction of esophageal squamous cell carcinoma, Sci Rep 6 (2016), 30057.

79. Aoyagi, Minashi, Igaki, Tachimori, Nishimura, Hokamura, Ashida, Daiko, Ochiai, Muto, Ohtsu, Yoshida and Sasaki, Artificially induced epithelial-mesenchymal transition in surgical subjects: its implications in clinical and basic cancer research, PloS One 6 (2011), e18196.

80. Yan, Shih, Rodriguez-Canales, Tangrea, M.A., Player, Diao, Goldstein, A.M., Wang, Taylor, P.R., Lippman, S.M., Wistuba, I.I., Emmert-Buck, M.R. and Erickson, H.S., Three-dimensional mRNA measurements reveal minimal regional heterogeneity in esophageal squamous cell carcinoma, Am J Pathol 182 (2013), 529–539.

81. Wang and Kemmner, Wdr66 is a novel marker for risk stratification and involved in epithelial-mesenchymal transition of esophageal squamous cell carcinoma, BMC Cancer 13 (2013), 137.

82. Yan, Shih, J.H., Rodriguez-Canales, Tangrea, M.A., Ylaya, Hipp, Player, Goldstein, A.M., Taylor, P.R., Emmert-Buck, M.R. and Erickson, H.S., Identification of unique expression signatures and therapeutic targets in esophageal squamous cell carcinoma, BMC Res Notes 5 (2012), 73.

83. Yang, H.H., Wang, Takikita, Wang, Q.-H., Giffen, Clifford, Hewitt, S.M., Shou, J.-Z., Goldstein, A.M., Lee, M.P. and Taylor, P.R., Global gene expression profiling and validation in esophageal squamous cell carcinoma and its association with clinical phenotypes, Clin Cancer Res Off J Am Assoc Cancer Res 17 (2011), 2955–2966.

84. Lee, J.J., Natsuizaka, Ohashi, Wong, G.S., Takaoka, Michaylira, C.Z., Budo, Tobias, J.W., Kanai, Shirakawa, Naomoto, Klein-Szanto, A.J.P., Haase, V.H. and Nakagawa, Hypoxia activates the cyclooxygenase-2-prostaglandin E synthase axis, Carcinogenesis 31 (2010), 427–434.

85. Nicolau-Neto, Da Costa, N.M., de Souza Santos Gonzaga, I.M., Ferreira, M.A., Guaraldi, Moreira, M.A., Seuánez, H.N., Brewer, Bergmann, Boroni, Mencalha, A.L., Kruel, C.D.P., Lima, S.C.S., Esposito, Simão, T.A. and Pinto, L.F.R., Esophageal squamous cell carcinoma transcriptome reveals the effect of FOXM1 on patient outcome through novel PIK3R3 mediated activation of PI3K signaling pathway, Oncotarget 9 (2018), 16634–16647.

86. Yang, Wang, Giffen, Goldstein, A.M., Lee, M.P. and Taylor, P.R., Integrated analysis of genome-wide miRNAs and targeted gene expression in esophageal squamous cell carcinoma (ESCC) and relation to prognosis, BMC Cancer 20 (2020), 388.

87. Saito, Morishima, Hoshino, Matsubara, Ishikawa, Aburatani, Fukayama, Hosoya, Sata, Lefor, A.K., Yasuda and Niki, The role of HGF/MET and FGF/FGFR in fibroblast-derived growth stimulation and lapatinib-resistance of esophageal squamous cell carcinoma, BMC Cancer 15 (2015), 82.

88. Shimokuni, Tanimoto, Hiyama, Otani, Ohtaki, Hihara, Yoshida, Noguchi, Kawahara, Natsugoe, Aikou, Okazaki, Hayashizaki, Sato, Todo, Hiyama and Nishiyama, Chemosensitivity prediction in esophageal squamous cell carcinoma: novel marker genes and efficacy-prediction formulae using their expression data, Int J Oncol 28 (2006), 1153–1162.

89. Erkizan, H.V., Johnson, Ghimbovschi, Karkera, Trachiotis, Adib, Hoffman, E.P. and Wadleigh, R.G., African-American esophageal squamous cell carcinoma expression profile reveals dysregulation of stress response and detox networks, BMC Cancer 17 (2017), 426.
