Abstract
Southeast Asia, especially India, is well known for the highest use of smokeless tobacco. These products are known to induce oral squamous cell carcinoma. However, not all long-term tobacco-chewers develop oral squamous cell carcinoma. In addition, germline variants play a crucial role in susceptibility, prognosis, development, and progression of the disease. These prompted us to study the genetic susceptibility to oral squamous cell carcinoma among the long-term tobacco-chewers. Here, we presented a retrospective study on prolonged tobacco-chewers of Northeast India to identify the potential protective or risk-associated germline variants in tobacco-related oral squamous cell carcinoma along with HPV infection. Targeted re-sequencing (n = 60) of 170 genetic regions from 75 genes was carried out in Ion-PGM™ and validation (n = 116) of the observed variants was done using Sequenom iPLEX MassARRAY™ platform followed by polymerase chain reaction–based HPV genotyping and p16-immunohistochemistry study. Subsequently, estimation of population structure, different statistical and in silico approaches were undertaken. We identified one nonsense-mediated mRNA decay transcript variant in the DFNA5 region (rs2237306), associated with Benzo(a)pyrene, as a protective factor (odds ratio = 0.33; p = 0.009) and four harmful (odds ratio > 2.5; p < 0.05) intronic variants, rs182361, rs290974, and rs169724 in SYK and rs1670661 in NELL1 region, involved in genetic susceptibility to tobacco- and HPV-mediated oral oncogenesis. Among the oral squamous cell carcinoma patients, 12.6% (11/87) were HPV positive, out of which 45.5% (5/11) were HPV16-infected, 27.3% (3/11) were HPV18-infected, and 27.3% (3/11) had an infection of both subtypes. Multifactor dimensionality reduction analysis showed that the interactions among HPV and NELL1 variant rs1670661 with age and gender augmented the risk of both non-tobacco- and tobacco-related oral squamous cell carcinoma, respectively. These suggest that HPV infection may be one of the important risk factors for oral squamous cell carcinoma in this population. Finally, we newly report a DFNA5 variant probably conferring protection via nonsense-mediated mRNA decay pathway against tobacco-related oral squamous cell carcinoma. Thus, the analytical approach used here can be useful in predicting the population-specific significant variants associated with oral squamous cell carcinoma in any heterogeneous population.
Keywords
Introduction
There is a wide geographical variation in the incidence of oral squamous cell carcinoma (OSCC) with the highest rate found in the Indian subcontinent (India, Pakistan, Sri Lanka, Bangladesh, etc.) contributing to 25% of the new incident cases. 1 Among these countries, approximately 1 million OSCC cases are being diagnosed each year in India, with a male-to-female ratio of 2:1 and the most common site is gingivobuccal complex followed by the tongue. 2 Among the total oral cancer cases across the world, nearly 30% occur in India, whereas the western world and USA have less than 10% and 3% oral cancer cases, respectively. 3
In India, oral cancer is ranked as the most common cancer types in males, with an annual incidence of 53,842 and is the fifth most common cancer in females, with a yearly incidence of 23,161. 4 Generally, 90%–95% of all the oral cancer cases are found to be squamous cell carcinomas (SCC).5,6 The wide discrepancy in the incidence, as well as mortality rates of oral cancer in different parts around the world, is due to the combined effects of the regional differences in the prevalence of disease-specific risk factors, aging of the population, and other genetic predisposing factors.3,7
Some well-established risk factors such as tobacco, betel quid chewing, smoking, alcoholism, human papilloma virus (HPV) infections,5,6,8,9 and poor oral hygiene 5 are significantly associated with this anatomical and clinical subtype of head and neck cancer. Among these factors, smokeless tobacco (SLT) is one of the single most important causes for the higher incidence of oral cancer. 10 Globally, almost 90% of OSCC cases are associated with tobacco habits in different forms. 11 It has been reported that nearly 85% of the total SLT usage burden is in Southeast Asia. 12 A recent data from national surveys showed a clear shift in the product preference from smoking to SLT in Southeast Asia. 13 Subsequently, cancers of the buccal mucosa, tongue along with hypo-pharynx, are very common, especially in India, 14 attributing to 74% of the total global SLT usage burden. 12
According to the Global Adult Tobacco Survey (GATS) of the Ministry of Health and Family Welfare released in June 2017, nearly 199 million Indians were using SLT. 15 According to the consolidated population-based cancer registry (PBCR) report published by the Indian Council of Medical Research (ICMR) through their National Cancer Registry Program (NCRP) 2016, it has been projected that by 2020, tobacco-related oral cancer will contribute to 30% of the total number of incidences (i.e. approximately 523,471 cases annually) among all the other anatomical sites. 16 Interestingly, the consumption of SLT products in Northeast (NE) India is significantly higher as compared to the other states of India. 17 Therefore, the incidence of tobacco-related oral cancer in the northeastern part is about 33%. 18 These statistical standpoints make this disease a significant public health importance to India, especially for the NE India, which additionally emphasizes the relevance of this study undertaken.
Apart from tobacco, HPV infection is also an important etiological factor in oral cancer. Approximately 26% of all oral cancer cases had high-risk (HR) type of HPV globally, and HPV16 was the major HR subtype, followed by HPV18. 19 However, this trend is population-specific and also varies with the method undertaken for detecting the viral load.9,20 The role of this virus in carcinogenesis typically involves the pE7 and pE6 proteins that hamper the p53-, p21-, and pRb-based signaling cascades. 21
In addition to tobacco and HPV infection, hereditary predisposing genetic factors are very important for the etiopathogenesis of OSCC. The genetic susceptibility is influenced by the germline single nucleotide polymorphisms (SNPs) located within the promoter or other regulatory regions of its genes and is thought to be associated with disease development and progression. Previously, a genome-wide association study (GWAS) conducted on Western Indian population identified 93 SNPs with significantly increased minor allele frequency (MAF) in the tobacco chewing OSCC patients. 22 There are also other high-throughput studies, identifying the important polymorphic genes related to this disease globally as well as in other parts of the Indian subcontinent.23,24 Nevertheless, very scanty or no information about the gene–gene or gene–environment interactions associated with tobacco-related OSCC in NE Indian population is available till date. Therefore, this study was aimed to identify the potential germline variants (protective or risk) associated with tobacco-related oral cancer for the prolonged tobacco-chewers from this population and additionally, we have tailored our analytical approach to derive findings suitable to any heterogeneous population.
In this study, we performed next generation sequencing (NGS)-based targeted re-sequencing, followed by validations using a mass spectroscopy-based high-throughput genotyping technique to identify potential SNPs associated with OSCC in the prolonged tobacco-chewers of NE India. We also performed the population structure estimation, polymerase chain reaction (PCR)-based HPV genotyping, and p16-immunohistochemistry study. Furthermore, multinomial logistic regression (MLR), multifactor dimensionality reduction (MDR) analysis, and different in silico approaches were undertaken to evaluate the effect of the identified SNPs as well as the gene–gene and gene–environment interactions.
Materials and methods
Sampling and demographic data collection
The institutional review board (IRB) of Assam University, Silchar (IEC/AUS/2015-004 dated 4 September 2015), and Cachar Cancer Hospital & Research Centre, Silchar, approved the sampling procedure from different regions of NE India with signed consent form and questionnaire. Data regarding age, gender, type, and duration of consuming smokeless/chewing tobacco of patients as well as healthy subjects were abstracted from hospital records as well as personal interviews.
In total, 50 OSCC patients (case 1) and 74 healthy individuals (without the family history of cancer and without any pre-cancerous lesion/condition) having the prolonged (≥10 years) tobacco-chewing habits along with 52 non-chewers OSCC patients (case 2) from 12 heterogeneous ethnic groups of NE India were considered for this study (see Supplementary Table S1). Surgically removed tissue (7–10 mm) and buccal swab samples were collected aseptically from the histopathologically confirmed OSCC patients and healthy individuals, respectively. In addition, 5-mL peripheral blood was collected for the genetic susceptibility study. The samples were collected during the period of 2013–2015.
Genomic DNA extraction
The genomic DNA from the collected tissue and buccal swab samples was extracted using the standard phenol/chloroform extraction method. 25 Genomic DNA from peripheral blood was extracted using QIAamp DNA Blood Mini Kit following manufacturer’s instructions (Qiagen, Hilden, Germany). The extracted DNA samples were quantitated by Qubit® 2.0 Fluorometer (Life Technologies, Grand Island, NY). The purified DNA samples with good quality were used for NGS-based targeted re-sequencing and high-throughput SNP genotyping experiments.
Targeted re-sequencing in the Ion PGMTM platform on the discovery set
The GWAS related to oral cancer have identified 170 target regions from 75 genes significantly associated with the disease globally and in other parts of Indian subcontinent.22–24 Two pools of multiplex primers were designed by AmpliSeq™ Designer, a web-based tool by Thermo Fisher Scientific, USA, to cover these 170 targets (see Supplementary Table S2). Total panel size was 31.34 Kb. The targeted re-sequencing experiment was carried out in high-throughput Ion PGMTM sequencer for 60 case-control samples comprising 20 case-1, 20 healthy, and 20 case-2 individuals as discovery set. A depth of 423.68 ± 17.01 X (mean ± standard error (SE)) with 95.85% uniformity was achieved using the Ion 318TM chip.
Validation and replication in Sequenom iPLEX MassARRAYTM platform on the confirmation set
The result obtained from the targeted re-sequencing was validated and then replicated in another set of 116 case-control samples (30 case-1 patients, 54 healthy, and 32 case-2 individuals as confirmation set) using the Sequenom iPLEX MassARRAYTM platform. For the genotyping of 115 SNPs (see Supplementary Table S3), four pools of primer pairs (see Supplementary Table S4) were designed. Totally, five sets of experiments were performed using these pools for getting the final SNP genotyping results.
PCR-based HPV genotyping
PCR amplification for the detection of HR-HPV types was carried out with the consensus primers GP5 +/GP6 + followed by the primers for genotyping of HPV16 and HPV18.26,27 The methodology adopted in this study was as per our previous study. 8
Immunohistochemistry for p16 expression
The surgically removed tissue samples from both the case groups were processed for paraffin block preparation and serial 4-µm sections were consecutively cut from the specimens. The sections were then deparaffinized in xylene and was rehydrated by passing through graded alcohols. Finally, the sections were processed for immunohistochemistry study of P16 antibody (JC-8; Santa Cruz Biotechnology) for cases, negative control (BSA), and positive (cervical cancer) control as instruction given in the kit (DAKO). The representative image is given in Supplementary Figure S1.
In each field, 100 cells were counted and area was marked using measure tool. Similarly, total of 1000 cells were counted and % positivity was calculated. All the counting was done at the objective magnification of 40X. High nuclear and high cytoplasmic staining in greater than 70% of tumor cells was considered positive as suggested by the American Society of Clinical Pathology and College of American pathologists previously.28,29 The samples were considered as HPV positive only when found positive in both the methods.
Statistical analysis
Genotypes, allele frequencies, and linkage disequilibrium were evaluated using Haploview software 30 followed by the stratified association study using PLINK. 31 The population structure was then estimated by STRUCTURE using StrAuto.32,33 The number of clusters (K) was estimated by the Delta-K method. 34 We calculated odds ratio (OR) and 95% confidence intervals (CIs) using the MLR model. To strengthen the association between variant genotypes and OSCC risk factors, the ORs were calculated after adjusting the confounding factors such as population/ethnicity estimated from the previous step. The statistical analyses were carried out in SPSS software (PASW Statistics 18, Release 18.0.0, 30 July 2009). Pearson’s chi-square test was performed to evaluate p-values and the p-value of <0.05 was considered as statistically significant.
MDR approach
The MDR approach is used to detect high-order gene–gene and gene–environment interactions associated with the disease risk. 35 The advantage of this approach over conventional parametric statistical methods is to overcome sample size limitations that often encountered in the retrospective study. 36 In this study, we used MDR software package, MDR v3.0.2 (http://multifactordimensionalityreduction.org/) to predict the multifactor models associated with both the types of OSCC susceptibility. The best predictive models were picked out on the basis of the maximum testing balance accuracy (TBA) and corresponding cross-validation consistency (CVC). The p-value of <0.05 was considered to be statistically significant.
Interaction entropy graph
The interaction entropy graph was made based on the MDR result, for better understanding and visualization of interactions among the gene-environmental factors 37 as described in our previous studies.38–40 The entropy graph was constructed to determine the types of interactions (synergistic or redundant) among various genetic as well as environmental factors and then their interaction effects could be compared with the independent effect of each variable. 41
In silico analysis
In silico functional analysis was performed to understand the biological importance of identified significant SNPs. We used Ensembl Variant Effect Predictor tool, 42 ESRsearch tool, 43 and F-SNP 44 for this. The toxicogenomic analysis was done on the Comparative Toxicogenomics Database. 45 Subsequently, BiNGO v3.0.3 and Enrichment Map v2.1.0 were also used to get the over-represented Gene Ontology (GO) categories associated with our identified genes.46,47 Finally, the pathway analysis 48 by constructing a gene–gene network in GeneMANIA v3.5.0 was done to visualize other genes interacting via some important pathways. 49
Results
Demographic characteristics of the participants: females are more susceptible to tobacco-related OSCC than males
The summary statistic of patient demography is given in Table 1. The mean age (±SE) of our studied individuals was 49.14 (± 1.22) years. Majority of the OSCC patients (72.5%; 74/102) were in the age group of >49 years, whereas most of the healthy individuals (73%; 54/74) had the age of ≤49 years. Among the female subjects, we observed that 61.1% (33/54) were prolonged (>10 years) tobacco-chewers, out of which 75.8% (25/33) were OSCC patients, whereas, among the males, 74.6% (91/121) were prolonged (>10 years) tobacco-chewers, out of which only 27.5% (25/91) had OSCC. Among the different cancer sites, the buccal mucosa was found to be the most common site (81.4%; 83/102), followed by the tongue (18.6%; 19/102). The HR-type HPV infection was observed in 12.6% (11/87) OSCC patients, out of which 45.5% (5/11) was HPV16-infected, 27.3% (3/11) had the HPV18 infection, while 27.3% (3/11) was infected by both HPV16 and HPV18. In this context, when we checked the distribution of total HR-HPV infection in our observed cancer sites, we found that buccal mucosa was the only HPV infected site and no tongue cancers had HPV infection. Apart from this, most of the OSCC specimens (>70%) were well-differentiated squamous cell carcinoma (WDSQCC) and majority were in advanced stage IV (54.9%), followed by stage III (18.6%).
Demographic characteristics of study subjects.
HPV: human papilloma virus; PCR: polymerase chain reaction; IHC: immunohistochemistry; MDSQCC: moderately differentiated squamous cell carcinoma; WDSQCC: well-differentiated squamous cell carcinoma.
p-value <0.05 is considered as statistically significant (bold); all of the p-values were estimated for case 1 and case 2 after comparing with the control group.
Identification of strongly associated SNPs from a stratified analysis: DFNA5 variant showed protective role, whereas SYK and NELL1 variants demonstrated harmful effect in OSCC
Our designed custom panel studied in the discovery set (n = 60) contained 1828 SNPs. After performing the targeted re-sequencing experiment, we found 115 SNPs significantly associated with OSCC (see Supplementary Table S3). We verified the same in confirmation set samples (n = 116) and finally obtained strongly associated SNPs are summarized in Supplementary Table S5. We have identified five SNPs significantly associated with tobacco-related OSCC in our studied population. Out of them, the variant rs2237306 from the DFNA5 region was found to have a protective role (OR = 0.33, p = 0.009) and the remaining four SNPs were positively associated (OR > 2.5, p < 0.05) with the disease. We also found that the identified SNPs present in SYK region were in complete linkage disequilibrium (D′= 1) and co-represented (r2 > 0.8) (see Supplementary Figure S2).
Risk assessment of various external factors as well as the identified significant SNPs
While estimating the structure of our studied population, we observed that our heterogeneous samples from 12 different ethnic groups were clustered into the five major language families50,51 as Indo-Aryan, Austro-Asiatic, Sino-Tibetan, Kuki-Chin, and Tibeto-Burman (see Supplementary Figure S3). Based on this knowledge, MLR analysis was performed to assess the risk of the factors considered in this study (Table 2).
Status of various external factors as well as SNPs and their association with OSCC.
SNP: single nucleotide polymorphism; OSCC: oral squamous cell carcinoma; OR: odds ratio adjusted for population; CI: confidence interval; Ref.: reference allele; RA: risk allele; MLR: multinomial logistic regression; HPV: human papilloma virus; Nd: not determined.
This SNP demonstrated the protective effect. Unlike other SNPs, the minor/variant allele of this SNP is the protective allele; p-value <0.05 is considered as statistically significant (bold). For MLR analysis, we have considered 155 samples (case 1 = 40, case 2 = 47, and control = 68). The remaining samples (21 samples) were not considered for the analysis because in those samples, the HPV typing was unsuccessful (given in Table 1).
The subjects in the age group of >49 years had the 9.05-fold increased risk of tobacco-related OSCC and 7.00-fold increased risk of non-tobacco-related OSCC than those who are under the ≤49 years age group. Females had a 4.99-fold increased risk of the former OSCC type and 5.09-fold enhanced risk of latter as compared to males.
Apart from these physical factors, a significant 3.58-fold increased risk of tobacco-related OSCC was observed in the case of risk-allele-containing genotypes CA + CC of rs2237306 as compared to non-risk homozygous AA genotype. In addition, the risk-allele-containing genotypes TC + CC of rs1670661 was found to be associated with 6.57-fold and 2.67-fold enhanced risk of tobacco- and non-tobacco-related OSCC, respectively, and the heterozygous TC genotype was found to be associated with 6.33-fold increased risk of tobacco-related OSCC as compared to non-risk homozygous TT genotype. However, the risk-allele-containing genotypes CA + AA of rs182361 was found to be associated with 2.47-fold and heterozygous CA genotype was associated with 2.50-fold more risk of non-tobacco-related OSCC as compared to homozygous AA genotype.
MDR analysis to predict the multifactor models associated with both the types of OSCC
Interactions between gene–gene and gene–environmental factors for the occurrence of OSCC risk were evaluated by MDR analysis as summarized in Table 3. This analysis suggested that aging was the best “one-factor” model for both the types of OSCC (TBA = 0.7279, CVC = 100/100, p < 0.0001; TBA = 0.5299, CVC = 77/100, p < 0.0001, correspondingly) implying that the increasing age is the most significant factor for predicting the risk of OSCC. A combination of aging and rs1670661 variant was the best “two-factor” model (TBA = 0.6471, CVC = 88/100, p < 0.0001; TBA = 0.6067, CVC = 88/100, p < 0.0001, respectively) for both the types of OSCC. It suggests that this dual combination, identified across the 88 cross-validation subsets, involved in the causation of OSCC. The best “three-factor” model added the presence of HPV to the “two-factor” model for both the types of OSCC, but the aging got replaced by gender in the case of tobacco-related OSCC (TBA = 0.7581, CVC = 97/100, p < 0.0001; TBA = 0.7760, CVC = 100/100, p < 0.0001, respectively). Generally, the model with the highest TBA is taken as the best overall model. Therefore, the “three-factor” models were considered to be the best risk-predictive models for both the types of OSCC.
Summary of MDR analysis for OSCC risk prediction.
TrBA: training balance accuracy; TBA: testing balance accuracy; CVC: cross-validation consistency.
Best model predicted for oral cancer risk with the highest test balance and corresponding maximum CVC; p-value <0.05 is considered as statistically significant.
To determine whether the interactions were synergistic, non-synergistic, or redundant among the gene–gene and gene–environment factors under investigation, the entropy-based graphical model, that is, the “Interaction Entropy Graph” for OSCC-associated risk was constructed (Figure 1). In this graphical model, the age factor demonstrated the highest individual effect of 14.53% and SYK variant rs182361 demonstrated the lowest individual effect of 0.83% in tobacco-related OSCC (Figure 1(a)). Among the factors shown in the best risk-predictive “three-factor” model in this category, the presence of HPV (11.32%) showed a correlation with the NELL1 variant rs1670661 (9.84%) and gender (11.26%), whereas the effect of this variant was independent of gender. In the non-tobacco-related OSCC (Figure 1(b)), the presence of HPV showed the highest individual effect of 20.13%, and it showed the correlation with the NELL1 variant rs1670661 (4.44%) and aging (14.69%) in the best “three-factor” risk-predictive model. Similarly, the effect of this variant was independent of age and gender (7.94%). Interestingly, we observed that the SYK variants rs182361 and rs290974 were independent of age and correlated with gender in the case of tobacco-related OSCC. However, in non-tobacco-related OSCC, these two variants interacted with age and gender in an opposite manner. Moreover, all these variants in this analysis significantly correlated with the presence of HPV.

Interaction entropy graph for showing the interactions among the various factors associated with (a) tobacco-related OSCC and (b) non-tobacco-related OSCC risk. Positive percentage of entropy indicates the synergistic interaction; negative values of entropy represent redundancy; golden lines suggest independence; and green and blue lines suggest redundancy or correlation.
Biological consequences of the variants derived by in silico approaches
According to the Ensembl VEP tool, all the variants identified from this study are intronic variants, especially the DFNA5 variant rs2237306, which is a nonsense-mediated mRNA decay (NMD) transcript variant (Transcript ID: ENST00000411476.2). The ESRsearch tool predicted that the G > T change (in reverse strand) in rs2237306 introduced a known ESR, hnRNP-B binding site. Similarly, T > C change in the NELL1 variant rs1670661 created a known ESR, SRp40 binding site and broke the potential exonic splicing silencer motif sequence. The G > T change in the SYK variant rs182361 introduced a potential exonic splicing silencer motif sequence. These results recommended a probable role of these variants in alternative splicing. The F-SNP tool used TFSEARCH v1.3 for the transcription factor binding site prediction. According to this tool, the C > A change in rs2237306 and T > C change in rs1670661 introduced the binding sites for the transcription factors, c-Myb and v-Myb, respectively. Apart from this, the enrichment map of the over-represented GO categories for SYK, DFNA5, and NELL1 genes was also generated (Figure 2) and in order to identify the biological significance of these genes, a gene–gene interaction network (Figure 3) was constructed.

Involvement of the identified three genes in different biological pathways. Enrichment map of the over-represented Gene Ontology (GO) categories for SYK, DFNA5, and NELL1 genes has been generated from BiNGO (p-value < 0.001).

Visualization of a co-expressed network. The figure represents the gene–gene network by GeneMANIA, where the significant genes interacting with our identified genes are demonstrated.
Discussion
The demographic characteristics of the studied NE Indian populations showed that the buccal mucosa was the most common (81.4%) cancer site, which is highly prevalent in India. 52 Previously, it has been suggested that when betel quid usage is extensive, the buccal mucosa and tongue are the most frequent sites for OSCC. 53 We found that the majority of our patients (72.5%) in the age group of >49 years, as expected, as the incidence of oral cancer is age specific. 3 The MLR analysis showed that the subjects older than 49 years were more susceptible to both the types of OSCC than those who were from the lower age group. MDR analysis suggested that the aging had the highest individual effect in case of tobacco-related OSCC. These results actually gave support for including only those individuals who had at least 10 years of chewing habits, as also suggested by earlier studies.22,54 It implies that the prolonged tobacco-chewing habits increase the chances of being exposed to various carcinogenic compounds for a longer period.
Our observations are similar to a previous study, 55 suggesting females likely to be using smokeless (chewed) tobacco rather than smoking as compared to males in India. This has also been shown in a report from southern India. 56 Another study on North Indian population suggested that the use of SLT is a critical risk factor for females. 57 Furthermore, it was also reported that women are more susceptible to oral cancer than men in South Asian countries. 58 This study also showed that the females were ∼5-fold more likely to have both the types of OSCC than males as suggested by MLR analysis.
In this study, the HPV-infected site was ascertained in accordance with the previous studies59,60 after taking the consensus observations from the PCR-based method as well as p16-immunohistochemistry study. Here, we observed 12.6% HR-HPV infected OSCC patients, which is lower as compared to the former studies on NE Indian population8,61 but elevated than a report on a hospital-based study in North America. 62 The evidence of HPV prevalence in oral cavity among Indian as well as global populations is emerging to be highly discordant. Studies with data based on developed nations accounted for a variable prevalence ranging from 10% to 80%. 63 While checking the global scenario on HPV prevalence, we found that the earlier population-based case-control studies in Sweden 64 and Mexico 65 reported a higher HPV prevalence (35% and 43.5%, respectively), whereas another retrospective study from Canada 66 showed a lesser HPV prevalence (19%). Similarly, in India, a study from the Southern region indicated no HPV infection, 67 whereas the prevalence of HPV in OSCC is reported to be different across the various regions with 6% in the western region, 68 33.6% in the eastern region, 69 and 28%–43.5% in the northeastern region.8,61 The higher HPV prevalence in NE India as compared to other regions of the Indian subcontinent is very likely due to the relatively higher consumption of tobacco products, 17 as some earlier reports suggested that any form of tobacco use is probably linked to an increased risk of HPV infection.70,71
Apart from the above-mentioned external factors, the identified genetic predisposing factors included the risk variants, rs290974 from SYK region and rs1670661 from NELL1 region, which are reported for the first time in this study. The other two SNPs (rs182361 and rs169724) from SYK region were previously reported in the GWAS conducted on western Indian population. 22 The MDR analysis showed that the SYK variants were significantly correlated with the presence of HPV. The interactions among the presence of HPV, NELL1 variant rs1670661 along with age and gender increased the risk of both the types of OSCC.
The observations from the stratified analysis suggested that the genes like SYK and NELL1 may not be directly associated with the tobacco-related OSCC, but the newly identified variant in the DFNA5 region showed the protective role against tobacco-related OSCC. This observation was further verified by the MLR analysis, which reconfirmed that the variant rs1670661 from NELL1 region was associated with both the types of OSCC. However, the variant rs2237306 from DFNA5 region was associated with tobacco-related OSCC, while the variant rs182361 from SYK region had an association with non-tobacco-related OSCC. These make the DFNA5 as an important gene of interest for tobacco-related OSCC. The toxicogenomic analysis also suggested that Benzo(a)pyrene, one of the major chemical components of SLT products,72,73 was involved in the increased expression of DFNA5 gene, the deafness-associated tumor suppressor gene (see Supplementary Figure S4). This chemical compound was previously known to be associated with miRNA-mediated activation of specific BaP-responsive pathways in human. 74 Interestingly, the DFNA5 gene, responsible for the autosomal dominant hearing loss 75 is inversely correlated with estrogen receptor expression in breast cancer. 76 The down-regulated DFNA5 contributes to acquired etoposide resistance in melanoma cells, as it is positively associated with etoposide-induced apoptotic pathway and caspase-3 activity. 77 Besides, this gene is positively involved in p53-induced apoptosis in response to DNA damage, 78 while mutDFNA5 (the mutated form of DFNA5)-induced cell death is mediated by the MAPK pathway, especially through the extracellular signal-regulated kinase (ERK) and the c-Jun N-terminal kinase (JNK) branch. 79
Results from different in silico tools suggested that the variants identified in this study were involved in the transcriptional regulation of the respective candidate genes. Interestingly, for rs2237306, we noted more than one transcripts of DFNA5 where this variant occurs. However, we have focused on the NMD transcript because we hypothesized that this particular NMD transcript may be predominant over other transcripts, activating the NMD pathway that can degrade other aberrant transcripts, 80 which has to be proved experimentally. Among the over-represented GO categories for SYK, DFNA5, and NELL1 genes, we observed that these genes were significantly involved in the regulation of GM-CSF biosynthetic process that has a role in immune system development by promoting defense against infections. 81 Moreover, these genes were involved in positive regulation of IL-3 biosynthetic process, differentiation, and activation of gamma-delta T-cells that link innate and adaptive immune responses 82 and also capable of phagocytosis. 83
From the pathway analysis, we infer that the DFNA5 gene that is co-expressed with the TERT gene physically interacts with SMG6 and SMG5, the NMD factors. 84 This DFNA5 gene is also involved in genetic interactions with the TEP1 and UBE3A genes. 85 However, the NELL1 gene is involved in genetic interactions with PTTG1 and SYK genes. 85 This PTTG1 gene that is co-expressed with the DFNA5 gene 86 physically interacts with TP53 gene and co-localized with SYK gene. 87 Moreover, this SYK gene that is co-localized with TP53 involved in genetic interactions with TEP1 and UBE3A genes,85,87 which physically interact with TP53 gene. Interestingly, the UBE3A gene encodes for a viral pE6-associated protein (E6AP) and the E6/E6AP complex facilitates inactivation and degradation of the p53 protein as shown before. 88 Apart from this, in combination with the viral pE7 protein, HR-HPV pE6 protein is involved in the induction of telomerase activity by inducing hTERT expression at a transcriptional level. High telomerase activity, observed in more than 85% of cancer cells, strongly represents a crucial role in tumorigenesis.89,90 However, further experimental verification of these observations is necessary.
The above-mentioned observations suggest that the NELL1 and SYK genes are probably involved in the HPV-mediated oncogenesis process and so, the SNPs on these genes represented potential risk for OSCC. However, DFNA5 gene variant has a possible protective role against tobacco-related OSCC by activating the NMD pathway, which degrades the aberrant DFNA5 transcript (Figure 4). This, in turn, activates the p53-induced apoptotic pathway in response to DNA damage. 78

Proposed model for the probable mechanism of action of the identified protective SNP, rs2237306 in the DFNA5 genetic region.
In conclusion, our study approach can help in predicting the population-specific significant polymorphisms associated with OSCC in any heterogeneous population. On a larger scale, this would also help in developing an SNP panel for the early detection of the disease. Understanding the role of SNPs in different cell-signaling pathways may lead us to new therapeutic interventions.
Supplemental Material
Addional_file_1_revised_13-06-2018 – Supplemental material for Association of DFNA5, SYK, and NELL1 variants along with HPV infection in oral cancer among the prolonged tobacco-chewers
Supplemental material, Addional_file_1_revised_13-06-2018 for Association of DFNA5, SYK, and NELL1 variants along with HPV infection in oral cancer among the prolonged tobacco-chewers by Sharbadeb Kundu, Vijayalakshmi Ramshankar, Akalesh Kumar Verma, Soundara Viveka Thangaraj, Arvind Krishnamurthy, Rajeev Kumar, Ravi Kannan and Sankar Kumar Ghosh in Tumor Biology
Supplemental Material
Additional_file_2_revised_13-06-2018 – Supplemental material for Association of DFNA5, SYK, and NELL1 variants along with HPV infection in oral cancer among the prolonged tobacco-chewers
Supplemental material, Additional_file_2_revised_13-06-2018 for Association of DFNA5, SYK, and NELL1 variants along with HPV infection in oral cancer among the prolonged tobacco-chewers by Sharbadeb Kundu, Vijayalakshmi Ramshankar, Akalesh Kumar Verma, Soundara Viveka Thangaraj, Arvind Krishnamurthy, Rajeev Kumar, Ravi Kannan and Sankar Kumar Ghosh in Tumor Biology
Supplemental Material
Supplementary_Information_revised_13-06-2018 – Supplemental material for Association of DFNA5, SYK, and NELL1 variants along with HPV infection in oral cancer among the prolonged tobacco-chewers
Supplemental material, Supplementary_Information_revised_13-06-2018 for Association of DFNA5, SYK, and NELL1 variants along with HPV infection in oral cancer among the prolonged tobacco-chewers by Sharbadeb Kundu, Vijayalakshmi Ramshankar, Akalesh Kumar Verma, Soundara Viveka Thangaraj, Arvind Krishnamurthy, Rajeev Kumar, Ravi Kannan and Sankar Kumar Ghosh in Tumor Biology
Footnotes
Acknowledgements
The authors gratefully acknowledge the Department of Biotechnology (DBT), Govt. of India, for providing research grant; Department of Science & Technology (DST), Govt. of India, for providing fellowship to S.K. under DST-INSPIRE scheme (DST/INSPIRE Fellowship/2013/862). The authors are also thankful to Bioinformatics division of Invitrogen Bioservices India Pvt. Ltd. (Thermo Fisher Scientific) for their support in NGS data analysis and Mr Yogesh Pandey from Imperial Life Sciences for his support and help with Sequenom Mass Array Experimentation.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the NER Twinning project (BT/349/NE/TBP/2012), funded by the Department of Biotechnology (DBT), Govt. of India.
Supplementary Information
Supplementary Information is available at Tumor Biology online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
