Abstract
Early and accurate diagnosis of oral potentially malignant lesions (OPMLs) is of critical importance in preventing malignant transformation. Although histopathological interpretation of the degree of epithelial dysplasia is considered the gold standard for diagnosis, this method is subjective and lacks sensitivity. Therefore, many attempts have been made to identify objective molecular biomarkers to improve diagnosis. Microarray technology can screen the expression of the whole genome, making it one of the best tools for discovering novel biomarkers. However, microarray studies of OPMLs are limited, and no review has yet highlighted and compared their findings. In this paper, we systematically review all studies that have used microarray analysis to investigate gene profile alterations in OPMLs and propose a set of genes commonly dysregulated across multiple gene expression profile studies. This list of common genes may help focus the selection of markers for further analysis of their importance in the diagnosis and prognosis of OPMLs.
Introduction
Oral squamous cell carcinoma (OSCC) usually develops from oral potentially malignant lesions (OPMLs). 1 Early and accurate diagnosis of OPMLs is of critical importance in preventing malignant transformation. 2 The current gold standard for diagnosis is histopathological interpretation of the degree of epithelial dysplasia in a biopsy specimen. 1 However, histopathologic diagnosis is subjective and lacks sensitivity: there is no agreement on which features of dysplasia are important in predicting progression, and there is both inter- and intra-observer variation in grading epithelial dysplasia.3–5 Therefore, many attempts have been made to identify objective molecular biomarkers for diagnosis using approaches such as loss of heterozygosity, DNA ploidy, telomerase activity, methylation, and gene expression analysis. Nonetheless, these efforts have failed to characterize or predict the behavior of OPMLs, since the studies analyzed only one or a few markers, whereas carcinogenesis is well known to involve altered expression of thousands of genes along complex molecular pathways. A new strategy is therefore required: discovering useful molecular biomarkers by analyzing expression across the entire genome at different stages of oral carcinogenesis.
Microarray technology (cDNA- and oligonucleotide-based microarrays) allows rapid screening of the whole genome.6,7 This technique has helped elucidate many significant genetic events that may lead to cancer and has revealed new pathways in the pathophysiology of tumorigenesis. It is also currently used in the search for novel biomarkers and has enabled the successful molecular classification of several cancers according to stage, metastasis, recurrence potential, prognostic outcome and response to therapy.8–10 The strength of microarrays lies in their ability to analyze tens of thousands of genes simultaneously, raising the probability of discovering novel markers. However, questions have been raised regarding the reproducibility and reliability of microarray experiments. Moreover, microarrays measure mRNA levels, from which protein concentration or activity cannot be predicted. 11 Despite these limitations, if appropriate candidate markers are selected, purpose-designed arrays could one day be used to obtain expression fingerprints in routine diagnostic protocols for OPMLs, similar to the commercial multigene assays (20–70 signature genes) available for breast cancer prognosis and prediction. 12
Methodology
To identify all studies that have incorporated microarray analyses in the investigation of gene profile alterations in OPMLs, we searched the PubMed medical literature database using the following query: “(oral dysplastic or oral dysplasia) OR (potentially malignant) AND (microarray or gene expression profile)”. Supplemental PubMed searches for references cited by review articles were undertaken to identify any additional manuscripts not retrieved by the primary query. After excluding unrelated articles, 15 studies were included in this review.
To define a set of genes commonly dysregulated in OPMLs across multiple gene expression profile studies, we prepared a universal spreadsheet containing all differentially expressed genes extracted from microarray studies of OPMLs. We attempted to obtain all dysregulated gene sets, but we were able to extract published tables and supplementary data from only 9 of the 15 published articles.13–21
Direct matching of repeated genes was not feasible because authors reported their results using various forms of gene identification (eg, gene name, gene symbol, GenBank accession number, Affymetrix probe set ID, or UniGene cluster ID). We therefore standardized gene identification by converting all of these forms into GenBank accession numbers using the Clone/Gene ID converter tool (http://idconverter.bioinfo.cnio.es/IDconverter.php). 22 We then searched the spreadsheet for duplicate genes and constructed a set of commonly dysregulated genes.
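The matching step can be pictured as a merge keyed on a shared identifier. The Python sketch below is a minimal illustration under stated assumptions, not the procedure actually run for this review: it assumes each study's gene list has already been reduced to (identifier, direction) pairs and that a lookup table from every reported identifier to one shared accession is available. The dictionary `ID_TO_ACCESSION` is a hypothetical stand-in for the converter output, and all identifiers, accessions and study names are invented placeholders.

```python
from collections import defaultdict

# Hypothetical stand-in for the ID-conversion step: every identifier type a
# study might report (symbol, probe-set ID, cluster ID) maps to one shared
# accession. All identifiers below are invented placeholders.
ID_TO_ACCESSION = {
    "LOR": "ACC_LOR",
    "probe_0001": "ACC_LOR",
    "ELF3": "ACC_ELF3",
    "cluster_7": "ACC_ELF3",
    "CXCL9": "ACC_CXCL9",
    "KRT19": "ACC_KRT19",
}

# Invented per-study gene lists: (reported identifier, direction of change).
STUDIES = {
    "study_A": [("LOR", "up"), ("CXCL9", "up"), ("KRT19", "up")],
    "study_B": [("probe_0001", "up"), ("ELF3", "down")],
    "study_C": [("cluster_7", "down"), ("CXCL9", "down"), ("unmapped_42", "up")],
}


def common_genes(studies, id_map, min_studies=2):
    """Return {accession: {study: direction}} for genes reported by >= min_studies studies."""
    support = defaultdict(dict)
    for study, genes in studies.items():
        for raw_id, direction in genes:
            accession = id_map.get(raw_id)
            if accession is None:
                continue  # identifiers that cannot be converted cannot be matched
            # If one study lists the same gene twice, the last entry wins.
            support[accession][study] = direction
    return {acc: calls for acc, calls in support.items() if len(calls) >= min_studies}


def is_concordant(calls):
    """True when every study reports the same direction of change."""
    return len(set(calls.values())) == 1


if __name__ == "__main__":
    for acc, calls in sorted(common_genes(STUDIES, ID_TO_ACCESSION).items()):
        status = "concordant" if is_concordant(calls) else "contradictory"
        print(acc, calls, status)
```

Keeping each study's direction of change, rather than only a count, separates genes with consistent changes from genes with contradictory reports across studies. Identifiers that cannot be converted simply drop out, which is one way conversion coverage can limit the final list.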
Results
The microarray studies included in this review are shown in Table 1. From these, we compiled a list of genes commonly dysregulated in OPMLs across multiple gene expression profile studies. The concordance between studies was low because of differences in sample number, clinical diagnosis, histologic grading, microarray platforms, experimental design, and analysis methods. This lack of agreement is not surprising, as it is a common criticism of expression profiling studies. Nevertheless, we identified 31 genes with common expression changes in at least two independent studies (Table 2). Some of these genes have roles in human carcinogenesis, supporting their potential as diagnostic markers for OPMLs. However, literature mining showed that most of these genes had not been validated by alternative experimental methods.
Table 1. Studies using microarray analysis of OPMLs.
Abbreviations: (↑) over-expression; (↓) under-expression; (NA) not applicable; (IHC) immunohistochemistry; (PCR) polymerase chain reaction; (OPML) oral potentially malignant lesion.
Table 2. Common gene alterations seen in OPMLs identified by microarray studies.
Abbreviations: (↑) over-expressed; (↓) under-expressed; (OPML) oral potentially malignant lesion.
Discussion
Several studies have incorporated microarray analyses in the investigation of the genetic profile of oral cancer,6,7,20,21,23–31 but the literature on microarray analysis of OPMLs is limited. Microarray studies of OPMLs have had different objectives. Some have aimed to identify molecular subtypes and derive predictive molecular signatures, while others have used microarray analysis to profile global gene expression and then selected one or more genes for further investigation. Only two studies identified a limited number of genes that could discriminate mild from severe dysplasia,15,18 and only one study proposed a prediction model for cancer progression. 20 In this review, we highlight some of the major findings from microarray studies of OPMLs.
In 2002, Mendez et al compared the gene expression profiles of 26 OSCC and 2 OPML samples with those of 18 normal oral tissue samples. Hierarchical clustering analysis showed that the OPML samples were more closely related to OSCC than to normal samples. 32 Although the number of OPMLs in this study was too small to draw general conclusions, a similar result was later obtained in a larger sample. 33 Ha et al found only a small difference in gene expression between OPMLs and malignant lesions compared with the difference between normal tissue and OPMLs. In addition, unsupervised hierarchical clustering revealed that the malignant and OPML groups tended to form related but distinct clusters. 33 Data from these two studies suggest that a greater proportion of transcriptional alterations occurs during the transition from normal to potentially malignant mucosa than during the transition from potentially malignant to malignant tissue.32,33
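The clustering behind such comparisons can be sketched in plain Python. This is a minimal illustration assuming average linkage and a distance of one minus the Pearson correlation (common choices for expression data, but not necessarily those used by Mendez et al or Ha et al); the four-gene profiles are invented solely to show the mechanics, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on (1 - Pearson r) distances.

    profiles: {sample name: list of expression values over the same genes}.
    Repeatedly merges the two closest clusters until n_clusters remain.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average of all sample-to-sample distances between the two clusters.
            d = sum(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because combinations() yields j > i
    return clusters


# Invented expression values for four genes in four samples.
PROFILES = {
    "normal_1": [1.0, 2.0, 3.0, 4.0],
    "normal_2": [1.1, 2.0, 2.9, 4.2],
    "opml_1": [4.0, 3.0, 2.0, 1.0],
    "oscc_1": [4.2, 2.8, 2.1, 0.9],
}

if __name__ == "__main__":
    print(average_linkage(PROFILES, 2))
```

Stopping the merges at two clusters on these toy data groups the normal samples together and the lesion samples together, which is the kind of pattern the studies above describe.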
In 2005, Banerjee et al reported a large number of significantly altered genes in OPMLs: 1300 genes were significantly activated and 400 significantly repressed. 14 The dysregulated genes included genes functionally involved in angiogenic stimulation, apoptosis, cell mitosis, cell cycle control, cell signaling, DNA repair, epithelial to mesenchymal differentiation, oncogenesis, invasiveness, immunoregulation, and protein translation. The expression of several genes was validated by real-time PCR and immunohistochemistry (IHC). 14 Although this study was not designed to identify a gene signature classifier, it provided a molecular basis for persistent proinflammatory conditions in oral premalignant tissues.
In an attempt to define a genetic signature of dysplastic progression, Carinci et al compared lingual dysplasia (5 mild and 4 severe) with 11 normal oral tissue specimens. Microarray analysis identified 270 differentially expressed genes (161 up-regulated and 109 down-regulated) between normal tissue and mild dysplasia, and 181 genes (63 up- and 118 down-regulated) between mild and severe dysplasia. The authors suggested that these genes could be used to classify dysplasia at the molecular level. 13 In another study, Carinci et al compared 9 dysplastic lesions with 8 non-metastatic tumors and found only 33 differentially expressed genes. 34 However, they did not confirm their results by additional methods, despite the well-known importance of validating gene expression results with other techniques.15,35 Moreover, the true test of a classifier is whether it predicts accurately in independent data, so a final validation in an independent cohort is vital before a gene signature can be used in clinical practice.
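The difference between apparent and independent performance can be made concrete with a toy classifier. The sketch below assumes a nearest-centroid rule (a simple stand-in, not the method used by Carinci et al or any other study reviewed here) and invented two-gene profiles labeled "mild" and "severe"; the point is only that the error rate is measured on samples that played no part in building the rule. All names and values are hypothetical.

```python
from statistics import mean


def fit_centroids(training):
    """training: list of (profile, label). Returns {label: mean profile}."""
    grouped = {}
    for profile, label in training:
        grouped.setdefault(label, []).append(profile)
    return {label: [mean(values) for values in zip(*profiles)]
            for label, profiles in grouped.items()}


def predict(centroids, profile):
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(profile, centroids[label])))


def error_rate(centroids, cohort):
    """Fraction of samples in a cohort whose predicted label is wrong."""
    wrong = sum(predict(centroids, profile) != label for profile, label in cohort)
    return wrong / len(cohort)


# Invented two-gene profiles; the validation cohort is never used for fitting.
TRAINING = [
    ([1.0, 5.0], "mild"), ([1.2, 4.8], "mild"),
    ([4.0, 1.0], "severe"), ([4.2, 1.3], "severe"),
]
VALIDATION = [
    ([1.1, 5.1], "mild"),
    ([3.9, 1.1], "severe"),
    ([1.3, 4.5], "severe"),  # a case the training-derived rule gets wrong
]

if __name__ == "__main__":
    centroids = fit_centroids(TRAINING)
    print("training error:", error_rate(centroids, TRAINING))
    print("independent validation error:", error_rate(centroids, VALIDATION))
```

In this toy example the rule makes no errors on its own training samples but misclassifies one of three independent samples, illustrating why performance on the discovery set alone is an optimistic guide.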
In 2006, Odani et al compared 4 cases of oral leukoplakia and 2 cases of OSCC with matched normal tissue obtained from sites adjacent to these lesions. All cases of leukoplakia showed over-expression of 8 genes and under-expression of 10 genes belonging to diverse functional groups. 16 Most of the over-expressed genes were part of the keratinocyte cytoskeleton network, while the down-regulated genes included genes associated with the cornified cell envelope of epithelial cells, epithelial cell adhesion, cancer antigens, keratinocyte activation, transcription in epithelial cells, and epithelial cell embryonicity. The number of altered genes was very small compared with the results of other studies. This may be because the matched normal tissue in this study was obtained from sites close to the lesions, and, according to the concept of field cancerization, adjacent areas of tissue share common genetic changes.36,37
Other microarray studies have used dysplastic cell lines. Vigneswaran et al performed a global analysis of gene expression in a set of cell lines to identify genes differentially expressed during malignant transformation of oral epithelial cells. They found that extracellular matrix metalloproteinase inducer (EMMPRIN) was markedly up-regulated in potentially malignant cells compared with normal oral and epidermal keratinocytes, and they confirmed this result by real-time PCR, Western blotting and IHC. They concluded that EMMPRIN over-expression is an important early event in oral carcinogenesis. 38 Hunter et al tested the hypothesis that mortal and immortal OSCCs involve distinct transcriptional changes by performing microarray analysis of primary cultures from 4 normal oral mucosa biopsies, 19 dysplasias, and 16 OSCCs. They concluded that there are divergent mortal and immortal pathways of OSCC development via intermediate dysplasias. 39 Nevertheless, using cell lines rather than actual premalignant samples for microarray analysis has potential limitations: although cell lines can yield relatively large amounts of good-quality RNA for comparative expression analyses, established cell lines do not completely resemble the original source tissue.
Two studies have investigated gene expression profiles of oral submucous fibrosis (OSF). Li et al demonstrated increased expression of 661 genes and decreased expression of 129 genes in OSF samples relative to normal tissue. 17 Hu et al found 716 up-regulated and 149 down-regulated genes in OSF, related to ontology groups such as immune response, inflammatory response and epithelial-mesenchymal transition. 40 In both studies, hierarchical clustering analysis separated normal and OSF samples into two clearly distinct groups according to their gene expression profiles.17,40 Because of large inter-individual variability, microarray studies require an adequate sample size, yet both studies used only 4 OSF and 4 normal samples to develop gene signatures. Significantly larger patient groups are needed before these classifiers can be applied clinically.
To generate classifiers for OSCC and leukoplakia, Kondoh et al analyzed 27 OSCCs and 19 leukoplakias using oligonucleotide microarrays. Using pooled samples, they identified an 11-gene predictor set that could best distinguish OSCC from leukoplakia, and found that seven of these predictor genes could be used to differentiate mild from severe dysplasia. 15 Kuribayashi et al showed that 16 selected genes could be used to differentiate mild from severe dysplasia. 18 However, neither study used histologically normal tissue as a control.
Watanabe et al compared the gene expression profiles of 3 oral leukoplakias and 3 early tongue cancers with that of normal epithelium, using laser micro-dissection to procure only the dysplastic cells. They found that 5 genes were significantly over-expressed and 10 genes under-expressed in oral epithelial dysplasia. 19 More recently, Sumino et al also used laser micro-dissection to compare gene expression between normal tissue and oral dysplasia, and between oral dysplasia and OSCC. They identified 15 candidate genes whose expression increased or decreased continuously during oral carcinogenesis. 21 The strength of these studies lies in the use of laser micro-dissection, which allows precise isolation of the cells of interest. However, owing to the small sample size, how well these samples represent the disease may be questioned.
In an interesting study, Saintigny et al compared gene expression in 35 oral dysplastic samples that progressed to OSCC with that in 51 samples that did not develop OSCC over a median follow-up of 6.08 years. 20 The authors developed a 29-transcript prediction model with a prediction error rate of 8%. However, a limitation of this study is that all patients were enrolled in a clinical chemoprevention trial, which may have altered the progression outcome. 20
Among the genes most consistently reported in OPMLs are components of the keratinocyte cytoskeleton network such as loricrin (LOR), calmodulin-like skin protein (CLSP), keratin 1 (KRT1), keratin 10 (KRT10) and keratin 19 (KRT19). LOR is a major constituent of the cornified cell envelope and is strongly expressed at late stages of epithelial differentiation. 41 In two microarray studies, LOR was significantly up-regulated, with the largest fold change, in oral leukoplakia 16 and OSF. 17 These results were validated by semi-quantitative reverse transcription-PCR. IHC staining of paraffin-embedded tissue showed no detectable LOR expression in normal samples, whereas 63.6% of OSF cases exhibited intense staining, and a statistically significant association with the histologic grade of OSF was also found. 17 Nevertheless, expression of LOR and keratins depends on whether the dysplasia is keratinized, which reduces the potential value of these genes in the diagnosis and prognosis of OPMLs. CLSP was found to be up-regulated in oral leukoplakia 16 and OPMLs, 14 and a role for CLSP in late keratinocyte differentiation has been suggested. 42 Changes in the type and distribution of keratins have been observed during oral carcinogenesis.43–48 E74-like factor 3 (ELF3), another gene related to keratinocyte differentiation, was the only gene found to be down-regulated in two microarray studies,16,17 although it has been found to be over-expressed in other tumors such as lung adenocarcinoma and synovial sarcoma.49,50
Other genes of note are related to the immune response, such as the C-X-C motif chemokine ligands CXCL9, CXCL10 and CXCL13, as well as USP18, IFI44, and epithelial V-like antigen 1 (EVA1). Members of the CXC chemokine family are known for their ability to regulate angiogenesis, an important hallmark of carcinogenesis.51,52 Dysregulation of these three CXC chemokines was observed in OPMLs in different microarray studies,14,15,17 suggesting a role for CXC chemokine expression in progression to oral cancer. This is supported by the finding that CXCL9 was up-regulated in OSCC, 25 and CXCL9 has also been proposed as a useful candidate biomarker for breast cancer screening. 53 In addition, other studies have implicated CXCL13 functionally in OSCC.54,55 USP18 is a member of the ubiquitin-specific protease family and is known to be an interferon-stimulated gene 15 (ISG15)-specific isopeptidase that removes ISG15 from its conjugated proteins. 56 It has been found to control IFN-β-stimulated genes, which regulate a group of immune response-related genes. 57 Dysregulation of USP18 was found in OPMLs in two microarray studies,13,15 and another study found USP18 to be significantly over-expressed in OSCC compared with normal controls. 25 Duex et al suggested that USP18 may have oncogenic properties through its control of microRNAs and cancer cell growth. 58
IFI44 is an interferon-stimulated gene, and its over-expression has been shown to cause cell cycle arrest in vitro. 59 Carinci et al and Kondoh et al both found IFI44 to be dysregulated in OPMLs.13,15 Two other studies found IFI44 to be over-expressed in OSCC, suggesting a role for this gene in oral carcinogenesis.60,61 EVA1, a member of the immunoglobulin superfamily, 62 was found to be down-regulated in mild dysplasia but up-regulated in severe dysplasia 13 and in OPMLs. 14 EVA1 has also been shown to be one of the significantly down-regulated genes in the transition from prostatic intraepithelial neoplasia to prostate cancer. 63
The list of commonly dysregulated genes also contains two heat shock proteins: heat shock 70 kDa protein 4-like (HSPA4L) and heat shock 27 kDa protein 3 (HSPB3). Heat shock proteins are synthesized by cells in response to a variety of stress conditions, including carcinogenesis. 64 Experimental evidence suggests that these proteins may be associated with tumor progression by inhibiting apoptosis. 65 Two microarray studies found HSPA4L to be up-regulated in OPMLs.14,17 Other studies found a significant correlation between heat shock 70 kDa protein expression and the severity of oral dysplasia.65,66 A five-year follow-up study showed that the median transition time from premalignancy to malignancy was significantly shorter in cases showing over-expression of heat shock 70 kDa protein. 66 Contradictory results have been found for HSPB3 in microarray studies: one study found it up-regulated in OSF, 17 while another found it down-regulated in OPMLs. 14 IHC found no significant difference in cytoplasmic expression of heat shock 27 kDa protein between oral leukoplakia with and without epithelial dysplasia. 65
Other genes commonly up-regulated in OPMLs that are also over-expressed in other cancer types include heparanase (HPSE), protein tyrosine phosphatase receptor-type Z polypeptide 1 (PTPRZ1), cartilage oligomeric matrix protein (COMP), leucine-rich repeat containing 15 (LRRC15), and dihydropyrimidinase-like 3 (DPYSL3). HPSE is an endoglycosidase that cleaves heparan sulphate, a polysaccharide glycosaminoglycan expressed on the cell surface and in the extracellular matrix. 67 High levels of HPSE activity have been found to correlate with cancer metastasis, 68 and HPSE expression could serve as an indicator of aggressive potential and poor prognosis in cervical cancer. 69 COMP is a tissue-specific non-collagenous matrix protein 70 that was highly up-regulated in OSF, with the largest fold change, and immunohistochemical analysis has shown that its expression is significantly associated with the histologic grade of OSF. 17 High COMP expression has also been observed in hepatocellular carcinoma. 71 LRRC15 is a cell surface glycoprotein normally expressed only in the invasive cytotrophoblast layer of the placenta, and various studies indicate that it is frequently over-expressed in cancers such as prostate and breast cancer.72,73 DPYSL3 is a developmentally regulated protein, strongly expressed in early embryonic post-mitotic neural cells and in adult brain regions that retain neurogenesis, such as the granular neurons of the dentate gyrus. It is believed to play a role in neuronal differentiation, axonal outgrowth, and possibly neuronal regeneration. 74 DPYSL3 expression differed significantly between metastatic and non-metastatic head and neck squamous cell carcinoma. 75
Contradictory results have been found for some genes, such as basic helix-loop-helix domain containing class B 2 (BHLHB2), periostin osteoblast specific factor (POSTN), and serpin peptidase inhibitor clade B (ovalbumin) member 1 (SERPINB1), with one study reporting significant up-regulation and another reporting down-regulation.13,14,16,17 BHLHB2 is a basic helix-loop-helix (bHLH) domain-containing protein that acts as a transcriptional repressor. Falvella et al found BHLHB3 transcript levels to be low in three human lung cancer cell lines and down-regulated in human lung adenocarcinomas compared with normal lung tissue, suggesting a potential role for BHLHB2 protein as a tumor suppressor in lung cancer. 76 POSTN is a mesenchyme-specific gene known to be over-expressed in human breast cancer. 77 SERPINB1 was found to be down-regulated in prostatic intraepithelial neoplasia. 63
Other identified genes, such as cholesterol 25-hydroxylase (CH25H) and aspartylglucosaminidase (AGA), have no well-documented role in carcinogenesis. However, their reproducible detection in multiple studies, despite the noise and error that usually affect microarray experiments, may indicate a true biological role for these novel genes in oral carcinogenesis; further studies are needed to validate this.
Examination of the common gene list (Table 2) shows that it lacks traditional markers for OPMLs. For example, it does not include cancer stem cell markers such as ALDH1 and CD44. 78 Cancer stem cells, by definition, have the ability to propagate and to differentiate into mature, functional cells, and premalignant conditions may serve as a model for the cancer stem cell concept. 79 Likewise, the list lacks cancer metabolism markers such as LDH5 and TKTL1. 80 As in most solid tumors, oral cancer displays dramatically altered glucose metabolism, 81 and various markers associated with aerobic glycolysis have been found to correlate with malignant transformation of oral epithelial dysplasia.80,81 These absences could be explained by a lack of probes for these genes on some of the diverse array platforms used in previous microarray studies, which varied widely in probe number (7000–29000). Furthermore, carcinogenesis is controlled by multiple molecular pathways, and opposing modulation of different genes can produce a similar final effect, so studies on limited samples can reach discordant findings. 13
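The probe-coverage argument can be expressed as a simple set count. This sketch assumes that each platform's probe content is available as a set of harmonized gene identifiers and, for simplicity, that each study used a different platform; the platform names and genes are invented placeholders. Under those assumptions, a gene probed on only one platform can never be reported by two independent studies, however strongly it is dysregulated.

```python
from collections import Counter


def detectable_on_at_least(platform_genes, k=2):
    """Genes probed on at least k platforms; only these can recur in k studies."""
    counts = Counter(gene for genes in platform_genes.values() for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= k}


# Invented probe contents for three hypothetical array platforms.
PLATFORMS = {
    "platform_1": {"GENE_A", "GENE_B", "GENE_C"},
    "platform_2": {"GENE_A", "GENE_B", "GENE_D"},
    "platform_3": {"GENE_A", "GENE_C"},
}

if __name__ == "__main__":
    eligible = detectable_on_at_least(PLATFORMS)
    print("can recur in two studies:", sorted(eligible))
    print("never eligible:", sorted(set().union(*PLATFORMS.values()) - eligible))
```

Here GENE_D appears on only one platform, so it could not enter a list of genes shared by at least two studies regardless of its biology.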
The applicability of the common gene list is illustrated by a recent immunohistochemistry study from our group, in which we investigated the usefulness of 5 genes (CLSP, ELF3, USP18, IFI44, CXCL13) for further analysis. We selected these genes because they were associated with human cancer but had not been examined in oral dysplasia or OSCC by other methods. We found that significant alterations in the expression of CLSP, ELF3, and IFI44, initially identified by microarray studies, were accompanied by similar changes in protein expression in epithelial cells on immunohistochemical analysis. 82
Thus, the common gene list presented in this review may help focus the selection of markers for further analysis of their importance in the diagnosis and prognosis of OPMLs. The value of this list is limited by a lack of statistical power due to low sample numbers in some studies; the unavailability of raw data in others (some authors published only part of their gene list); variation in clinical data such as age, sex, smoking/alcohol habits, and lesion site; and heterogeneity of OPML diagnostic criteria, as most studies did not grade lesions according to the degree of epithelial dysplasia proposed by the WHO. 83 Further validation of the usefulness of these markers for OPML diagnosis is required.
Author Contributions
Conceived and designed the experiments: CSF. Analyzed the data: AAA. Wrote the first draft of the manuscript: AAA. Contributed to the writing of the manuscript: AAA, CSF. Agree with manuscript results and conclusions: AAA, CSF. Jointly developed the structure and arguments for the paper: AAA, CSF. Made critical revisions and approved final version: CSF. All authors reviewed and approved the final manuscript.
Funding
Author(s) disclose no funding sources.
Competing Interests
Author(s) disclose no potential conflicts of interest.
Disclosures and Ethics
As a requirement of publication the authors have provided signed confirmation of their compliance with ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests.
