Abstract
Gene expression profiling has provided insights into different cancer types and revealed tissue-specific expression signatures. Alterations in microRNA expression contribute to the pathogenesis of many types of human diseases. Few studies have integrated all levels of gene expression, miRNA and methylation to uncover correlations between these data types. We performed an integrated profiling to discover instances of miRNAs associated with a gene expression and DNA methylation signature across multiple cancer types. Using data from The Cancer Genome Atlas (TCGA), we revealed a concordant gene expression and methylation signature associated with the microRNA hsa-miR-142 across the same samples. In all cancer types examined, we found a signature of co-expression of a gene set R and methylated sites M, which correlate positively (M+) or negatively (M–) with the expression of hsa-miR-142. The set R consistently contains many genes, such as TRAF3IP3, NCKAP1L, CD53, LAPTM5, PTPRC, EVI2B, DOCK2, LCP2, CYBB and FYB. The signature is preserved across glioblastoma, ovarian, breast, colon, kidney, lung, uterine and rectum cancer. There is 28% overlap of methylation sites in M between glioblastoma (GBM) and ovarian cancer. There is 60% overlap of genes in R between GBM and ovarian (P = 1.3e−-11). Most of the genes in R are known to be expressed in lymphocytes and haematopoietic stem cells, while M reflects membrane proteins involved in cell-cell adhesion functions. We speculate that the hsa-miR-142 associated signature may signal haematopoietic-specific processes and an accumulation of methylation events triggering a progressive loss of cell-cell adhesion. We also observed that GBM samples belonging to the proneural subtype tend to have underexpressed hsa-miR-142 and R genes, hypomethylated M+ and hypermethylated M–, while the mesenchymal samples have the opposite profile.
Introduction
Gene expression regulation through mechanisms that involve microRNAs and epigenetics (DNA methylation) has attracted much attention recently. 1 MiRNAs are a class of small RNA molecules that target mRNAs, causing translation repression. MiRNAs regulate genes associated with different biological processes, such as apoptosis, stress response, or tumorigenesis. 2 In the context of cancer, miRNAs have emerged as new molecular players involved in carcinogenesis. Deregulation of miRNAs has been shown in glioblastoma, colon and ovarian cancer.3,4 Recently, it was shown that miR-10b is upregulated in gliomas, even though it is not expressed in healthy human brain tissue. 5 Distinguishing cancer subtypes on the basis of miRNA expression may help to personalize therapies; this would enable doctors to better match a patient with the treatment the patient is likely to respond to with fewest side effects.6–11
Expression of genes is affected by miRNA expression and DNA methylation, which are known to regulate each other in both directions. 12 MiRNAs are known to be targets of epigenetic regulation, as well as to target the epigenome, allowing for self- regulatory loops. One third of all human miRNA genes have a CpG island in the upstream region, suggesting epigenetic regulation of miRNA expression.13–15 On the other hand, several miRNAs are known to drive methylation signatures via interactions with DNA methyltransferases (DNMTs) and RBL2 proteins16,17 DNMTs are responsible for the methylation of the CpG islands of genes in an RBL2–dependent manner. The over expression of DNA methyltransferases (DNMTs) is often a poor prognostic indicator for cancer, since they induce hypermethylation and under-expression of tumor suppressors.18,19 It still remains largely unknown how, exactly, miRNAs affect genome methylation at the epigenetic level. The precise epigenetic mechanisms underlying the alteration of miRNA expression also remain largely unknown.
Few studies have performed an integrated analysis of gene expression, miRNA expression and methylation using data from the same samples.2,20–24 In this study we explore relations of these three data types, in terms of their correlation to one another. Our goal was to identify particular miRNAs having strong and consistent correlative relationships with gene expression and methylation. We focused on integrated cancer datasets that are available from The Cancer Genome Atlas (TCGA) repository. The TCGA Research Network was established to generate the comprehensive catalog of genomic abnormalities driving tumorigenesis.25,26 We found that hsa-miR-142 is consistently associated with a gene expression and methylation signature in all cancer datasets.
Methods
Our analysis involved a computational method to identify miRNA, methylation sites and genes of potential importance in a set of samples, based on correlations among the different data types across samples. 27 We performed an integrated correlation analysis of the miRNA, methylation and gene expression data that is available for glioblastoma and ovarian cancer. Our method's aim is to evaluate the correlations between methylation-miRNA and mRNA-miRNA. For example, the correlation of miRNA Mi's and gene Ri's expression (across the samples for which both miRNA and gene expression are available) would show if there is a positive or negative correlation between Mi and Ri. 28
We downloaded methylation, miRNA and gene expression data for glioblastoma and ovarian cancer, which was available on The Cancer Genome Atlas (TCGA) repository in April 2011. We used the level 3 data from TCGA and filtered out the healthy samples, such that we considered only tumor samples. For gene and miRNA expression the level 3 data represent the expression for particular genes or miRNAs per sample. For methylation the level 3 data represent the methylated sites/genes per sample. Level 3 data was derived after the raw signals per probe (level 1) were normalized per probe set (level 2) and then averaged for a gene or miRNA. Supplementary file 6 shows the number of tumor samples that were available on TCGA for each combination of data types. Additionally, for breast, colon, kidney, uterine and rectum cancer we downloaded the sequencing-based RNASeq and miRNASeq level 3 data available on TCGA in November 2011. This data represents the calculated expression calls for genes and miRNAs per sample. It is derived after calculating the expression signal for all reads aligning to a particular gene or miRNA (level 1) and normalizing the reads with TMM normalization (level 2).
Discretization of gene expression, miRNAs and methylation
We performed a pair wise correlation analysis between the different data types, as Figure 1 shows. Initially, the algorithm compares the sample-specific methylation of each gene with the expression level of each miRNA, across all samples. To evaluate the pairwise Pearson correlation between different data types (gene expression, miRNA expression and methylation), we discretized all three data types. A gene's expression (similarly, a methylation site's or miRNA's) took a discrete value of –1, 0, or 1 for a sample, representing underexpression (hypomethylation), moderate, or overexpression (hypermethylation) in the sample. The discretization is based on whether the gene's (or methylation site's) expression is at least one standard deviation away from the gene's mean value over all samples. We represented a gene (or miRNA or methylation site) as a vector of discrete values across all samples. Discretization of the values serves the following two purposes: (1) it allows us to compare two data types on a common basis, ie, whether there is aberrant expression or methylation observed in the same samples. (2) It allows us to filter out high correlations if there is moderate (non-extreme) expression/methylation (0) in most or all of the samples.

Our analysis method differs from that of 29 who used z-scores for discretizing the data. The z-scores indicate how many standard deviations a value is above or below the mean; previous studies discretized a gene expression value if z-score >2 or <–2. The study then used Fisher's exact P-value to evaluate correlations between gene expression and mutations by assigning an exact P-value to a correlation, considering only up or only down regulation, and keeping P < 0.01. Pearson correlation, on the other hand, allows us to consider both up and down regulation for a pair, offering us two benefits. Using the Pearson correlation allows us to find negative correlations, as well as positive ones, such as concordantly over- expressed genes and hypomethylated sites. Moreover, Pearson correlation allows us to find negative correlations and it allows us to find subclasses of samples defined by opposite expression/methylation patterns (such as, the M+ and M– patterns). Since we are not dealing with somatic mutations, we believe Pearson correlation is a more suitable choice than Fisher's exact P-value for our analysis.
To evaluate the statistical significance of the Pearson correlation between miRNA and gene expression (or methylation) we used a two-tailed t-test. The t-test was based on the Student's t-distribution with n-2 degrees of freedom, where n is the number of samples. The t-test was two-tailed since either positive or negative correlation may be of interest. We performed Bonferroni-correction by multiplying the P-value with the number of genes or methylation sites. We set a threshold of 0.01 for the Bonferroni-corrected P-value. Additionally, we made one million random permutations of the samples and evaluated the correlation from the permutated data; a correlation higher than the original is a false discovery. We applied a cutoff false discovery rate of 0.01, such that if the calculated FDR was greater than 0.01, the original correlation was rejected as false. In other words, the FDR was derived by counting how often the permutated absolute correlation was at least as high as the original correlation.
Association with gene ontology
Gene ontology enrichment was assessed using the Database for Annotation, Visualization, and Integrated Discovery—DAVID. 30 We also used the Gene Set Enrichment Analysis tool by the Broad Institute to evaluate the annotation enrichment of M and R. The full results are given in Supplementary files 12 and 13. 31
Association with proneural and mesenchymal GBM subclasses
We used the Suppl. Info from 26 to associate R and M genes to different glioblastoma subclasses, as well as tumor samples to proneural, neural, classical or mesenchymal subclasses. We used Suppl. Table S3 to retrieve the genes highly expressed in different subclasses. We used Table S7 to retrieve samples with specific subclass associations.
Results
We first examined the relative significance of all miRNAs in the context of all miRNA-gene expression (or miRNA-methylation) Pearson correlation coefficient values. For this purpose, we first matched miRNA with gene expression data (or miRNA with methylation data) on the same samples. Then, we created histograms covering all miRNA-gene expression and miRNA-methylation correlation values, including all 17,814 genes and 27,578 methylation sites, over multiple cancers. We found that hsa-miR-142 stands out significantly among other miRNAs as having higher correlation values with more genes and methylation sites. Figure 2 shows the resulting histograms after averaging the correlation coefficients for miRNA-gene expression and miRNA-methylation across all cancer types available. Highlighted in red color are the occurrences where the histogram involving hsa-miR-142 lies outside the envelope of all other histograms. A significant amount of such histogram “tails” exist at the right side of the gene expression histograms (R), and at both sides of the methylation histograms (M+ and M–). Supplementary files 1–2 give the lists of the highest and lowest correlations for hsa-miR-142-methylation and hsa-miR-142-gene expression in all cancer types (corrected P-value <0.01). Supplementary file 5 shows the miRNA-gene expression correlation histograms for individual cancer types.

Core of the signature: R and M
After we determined the significance of hsa-miR-142 in terms of high correlation with a set of genes and methylation sites across multiple cancer types, our aim is to define the sets of genes R and methylation sites M with which hsa-miR-142 is most frequently associated.
In glioblastoma and ovarian cancer, we derived the sets R and M by ranking the genes and methylation sites in descending order by their correlation with hsa-miR-142. We derived R for breast, colon, kidney, uterine and rectum cancer by correlating the RNASeq with the miRNASeq sequencing-based data available on TCGA. Then, R and M consisted of the genes or methylation sites with a Bonferroni-corrected P-value less than 0.01.
We derived M for breast, colon, kidney, uterine, rectum and lung cancer by using a metagene approach since sequencing-based methylation data was not available to correlate with the miRNASeq data. We first aggregated the 100 genes that had the highest average correlation with has-miR-142 in GBM and ovarian; the average value of these 100 genes over all samples in a cancer gave the first metagene. In each of 11 cancers, we ranked all genes based on their mutual information with the metagene. Then, we averaged the mutual information of the genes with the metagene across all 11 cancers and took the 15 top genes to create a second metagene. We ranked the correlation of this second metagene to all methylation sites in breast, colon, kidney, lung, rectum and uterine cancer. Supplementary files 7–10 give the metagene and the derived M.
M and R signature and overlaps between cancers
Figure 3 illustrates the M+ and M– signatures in glioblastoma and ovarian cancer. Sites positively correlated with hsa-miR-142 are M+, while negatively correlated sites are M–. The M sets shown consist of the methylation sites that have a Bonferroni-corrected P-value of less than 0.01 with hsa-miR-142 over the samples. This heatmap is the result of step4 of our analysis (see Methods section).

results of step4 of our analysis method (see Methods section).
Tables 1–2 show the overlaps of the R and M signatures between all cancer types considered. As shown, we found a significant overlap of M and R between different cancer types. We matched the precise probes of the methylation sites in M between different cancers.
Overlap of R between cancers.
Overlap of M signatures between cancers.
P-values of the overlaps using the hypergeometric cumulative distribution function.
We used P-values (matlab's hygecdf function) to evaluate the significance of the overlap sizes of M and R between different cancer types, getting small P-values near zero, as Tables 1–2 show. We also evaluated the significance of the correlations between miRNA, methylation and gene expression by using permutation-based P-values. This involved repeatedly shuffling the values of one of the data types over all samples and re-computing the correlations over one million trials; the P-values were always near zero, which points to the correlations’ significance.
We further evaluated whether the R and M associations were specific for the particular genes and methylation sites in GBM. For this purpose, we reversed roles by looking for any correlation of expression of M genes with methylation of R genes. When we looked for correlation of CX3CL1 gene expression (rather than methylation) with the methylation of any genes in R_GBM, there was always zero correlation. This supports that the results are specific to M methylation and R gene expression and hsa-miR-142 is a good representative of the signature.
R is enriched in glioblastoma mesenchymal subclasses
We found the signature profile to reflect two genetically distinct subclasses, which in glioblastoma tend to correspond to the previously identified proneural and mesenchymal subclasses. Table 3 shows how the miRNAs and expressed genes (R) are differentially expressed between the different glioblastoma subclasses. The miRNA hsa-miR-142 and genes R were distinctly expressed between proneural-mesenchymal glioblastoma subclasses. Our integrated analysis revealed that the signature differs between glioblastoma subclasses in that the miRNA hsa-miR-142 and genes R are underexpressed in proneural and overexpressed in mesenchymal. The majority of samples with over-expressed R and hsa-miR-142 belong to the mesenchymal subclass and the P-value of the distributions illustrates the significance.
The distribution of glioblastoma samples between proneural, neural, classical and mesenchymal classes according to R gene and miRNA expression.
We examined the overlap of the R_GBM genes and the R_ovarian genes with the mesenchymal (216), classical (162), proneural (178) and neural (129) genes from (Verhaak and others 2010). 26 182 of the R_GBM genes are found in the mesenchymal genes and 122 of the R_ovarian genes are found in the mesenchymal genes. There was significantly less overlap of R_GBM and R_ovarian with proneural genes (53 and 15, respectively), neural genes (3 and 10) or classical genes (8 and 11). Therefore, the genes in R significantly overlap with the genes in 26 that distinguish between proneural vs. mesenchymal subtypes in glioblastoma. The P-values of the overlaps are <0.0001.
Discussion
Table 4 gives the functional annotations of the top-ranked genes in R; we derived these genes as shown in Supplementary files 8–9. The genes R are all expressed in human hematopoietic stem cells or leukocytes. Many of these genes are involved in regulating T-cell activation; T-cell is a type of lymphocyte. The gene set R is particularly enriched in genes known to be preferentially expressed in T-cell differentiation stages. 32 Some of the R genes represent membrane proteins involved in cell-cell adhesion in leukocytes. Lymphocytes are known to play a role in cancer, but their role remains quite controversial. 33 Lymphocytes are generally known to detect and eliminate cancer cells, but on the other hand infiltration of lymphocytes, such as B cells and T cells, into cancer can be an indication of poor prognosis (recurrence and metastasis). It was shown that lymphocytes that make the RANKL inflammatory protein are critical for metastasis. No metastasis was found in mice without lymphocytes, but adding RANKL in mice restored the potential for metastasis compensating for the function of lymphocytes. 34
The known functions of the top-ranked genes in R.
Table 5 shows the methylation sites that appear in M for both ovarian cancer and GBM. Similar to R, most of M consists of trans-membrane proteins, which are related to cell-cell adhesion functions. For example, SSTR3, SSTR5, TMEM149, are all trans-membrane proteins with cell-cell adhesion functions. A few of the proteins are also involved in signal transduction. The most prominent correlation was between hsa-miR-142 and the CX3CL1 methylation site, which was shown to control cell invasion in glioma. 35 Many of the genes in R are also found in the M– set, which suggests that their undermethylation affects their expression. In recent work, hsa-miR-142 was shown to be expressed in hematopoietic and leukocytic cells, which supports the miRNA's association with R and M.27,36–39
Functional annotations of the methylation sites that overlap between M_GBM and M_ovarian.
We examined the annotation enrichment of the 139 M methylation sites and 671 R genes that overlap between ovarian cancer and GBM, using the Broad Institute's Gene Set Enrichment Analysis tool. There was an overlap with several modules that were previously found to be implicated in cancer. 40 The enriched Gene Ontology annotations included plasma membrane, response to wounding, cell-cell adhesion, extracellular space, cell-surface receptor signal transduction, inflammatory response, oxidoreductase activity. The P-values of the overlaps of M and R with these modules is <1.0e-5. These results suggest the methylation of M+ and M–, as well as miRNA and gene expression, may play a role in progressive disruption of adhesion and junction molecules and loss of cell adhesion components. In several types of cancer, disruption of cell-cell adhesion molecules is a hallmark in phenotypes such as the epithelial-mesenchymal transition and invasion. 41
Table 6 shows the top clusters for a functional annotation clustering of the 671 R genes that overlap between ovarian cancer and GBM, as derived with the DAVID tool. 30 Inflammatory and defense response are the most significant annotations, which is consistent with the anti-inflammatory functions of the interleukin genes found in the top-ranked R genes (see Table 4). Integral membrane and transmembrane annotations are also among the most enriched annotations. Additionally, we examined the R genes in the Broad Institute's Gene Set Enrichment Analysis (GSEA) tool, finding that one third of the genes overlap with sets of known membrane proteins. Several subsets of these genes were previously implicated in breast cancer (74 up regulated and 43 down regulated), liver cancer (41), bladder cancer (59 up regulated) and thyroid cancer (38 up regulated). The P-values of the overlaps is low. Supplementary files 12–13 give the complete functional annotation clusters for R as derived using DAVID, as well as the enriched annotations in M using the Broad Institute's GSEA tool.
The top three functional annotation term clusters associated with the list of 671 R genes that overlap between R_ovarian and R_GBM.
Several miRNAs are known to suppress methylation via interactions with DNMTs and RBL2 proteins. DNMTs are known to be responsible for the methylation of the CpG islands of tumor suppressors in an RBL2–dependent manner.16,17 Therefore, miRNAs and their DNMT targets are known to cause epigenetic changes (or their reversal) that contribute to malignant transformation in human tumors. 19 The aberrant expression of DNA methyltransferases (DNMTs) is often a poor prognostic indicator for cancer. 42 We speculate that the M signature may be partially induced because hsa-miR-142 targets the DNA methyltransferases (DNMTs).31,43 Either DNMT3L or DNMT3B was part of our M+ signature in most cancers. DNMT3L was hypermethylated in mesenchymal GBM phenotype together with hsa-miR-142 overexpression. The miRNA hsa-miR-142 is known to target the DNA methyltransferases DNMT1 and DNMT3A, which are also regulated by DNMT3L. 44 DNMT3L is a regulatory factor of DNA methyltransferases, which is essential for the function of DNMT3A and DNMT3B.45,46 In previous studies, the coexpression of DNMT3L with DNMT3A resulted in a striking stimulation of de novo methylation by DNMT3A. 47 Whether hsa-miR-142 influences the methylation of M via its regulation of DNMTs remains to be determined in lab experiments. Additionally, hsa-miR-142 has CpG islands embedded in its promoter region. 15 Genes associated with CpG islands are believed to be targeted by DNA methyltransferases and are known to be prone to methylation in cancer 14 suggesting hsa-miR-142 may be regulated by epigenetics (see Supp. File 4).
Conclusion
By performing miRNA vs. gene expression and miRNA vs. methylation analysis, our work identified hsa-miR-142 as the miRNA most strongly and consistently correlated with a set of genes and methylation sites. Our results show that miRNA expression is associated with altered expression profiles of genes in R and methylated sites in M. The hsa-miR-142 miRNA is the best representative of this signature for all cancers (see Supp. Files 5 and 11). When we used hsa-miR-142 as a proxy in several cancers the resulting M and R overlapped significantly between cancer types. The P-value of the resulting overlaps between cancer types was close to 0. This signature appears to be associated with transformation from proneural to mesenchymal subclass in glioblastoma. The results suggest that in the mesenchymal tumor subtypes R genes tend to be over-expressed, M+ is hypermethylated, M- is hypomethylated and hsa-miR-142 is over-expressed. In proneural subtypes, on the other hand, the gene and miRNA expression and methylation are the opposite. Our study is the first that associated this signature with several cancer types, showing distinct methylation and expression patterns between the proneural and mesenchymal subclasses.
All of the R genes are known to be expressed in leukocytes and haematopoietic stem cells. Many of the M genes represent membrane proteins related to cell-cell adhesion functions. We speculate that in cancer the signature may contribute to a progressive loss of cell-cell adhesion. We found either DNA methyltransferase DNMT3L or DNMT3B to be methylated and associated strongly with hsa-miR-142 expression in the cancer types examined. From a previous study, DNA methyltransferases were shown to be associated with the methylation of cell adhesion-related genes in carcinomas. 41 Integrated microRNA and gene expression and methylation data for healthy human samples was unavailable at the time of analysis; in the future, a human dataset of healthy samples may provide additional insights.
List of Abbreviations Used
R_GBM, the R signature of gene expression for glioblastoma; R_OV, the R signature of gene expression for ovarian cancer; M_GBM, the M signature of methylation for glioblastoma; M_OV, the M signature of methylation for ovarian cancer; M_COAD, the M signature of methylation for colon cancer; M_BRCA, the M signature of methylation for breast cancer; M_UCEC, the M signature of methylation for uterine cancer; M_READ, the M signature of methylation for rectum adenocarcinoma; M_KIRC, the M signature of methylation for kidney renal clear cell carcinoma; M_KIRP, the M signature of methylation for kidney renal papillary cell carcinoma; M_LUSC, the M signature of methylation for lung squamous cell carcinoma; M+, the methylation sites in M that are positively correlated with hsa-miR-142; M-, the methylation sites in M that are negatively correlated with hsa-miR-142; miRNA, microRNA; DNMT, DNA methyltransferase; GBM, glioblastoma.
Author Contributions
Conceived and designed the experiments: BA, DA. Analysed the data: BA, DA. Wrote the first draft of the manuscript: BA. Contributed to the writing of the manuscript: BA, DA. Agree with manuscript results and conclusions: BA, DA. Jointly developed the structure and arguments for the paper: BA, DA. Made critical revisions and approved final version: BA, DA. All authors reviewed and approved of the final manuscript.
Disclosures and Ethics
As a requirement of publication author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.
Footnotes
Acknowledgements
The authors would like to thank Wei Yi Cheng for performing the analysis of RNASeq and miRNASeq data and Dr. Hoon Kim for helpful discussions.
Supplementary Files
The supplementary material for this manuscript is available from 9037SupplementaryFiles.zip.
