Abstract
Objective
To explore stable and functional microRNA (miRNA)–disease relationships using a genome-wide expression profile pattern matching strategy.
Methods
We applied the ranked microarray pattern matching strategy Gene Set Enrichment Analysis to identify miRNA permutations with similar expression patterns to diseases. We also used quantitative reverse transcription PCR to validate the predicted expression levels of miRNAs in three diseases: inflammatory bowel disease (IBD), oesophageal cancer, and colorectal cancer.
Results
We found that hsa-miR-200 c was upregulated more than 40-fold in oesophageal cancer. The expression of miR-16 and miR-124 was not consistently upregulated in IBD or colorectal cancer.
Conclusions
Our results suggest that this expression profile matching strategy can be used to identify functional miRNA–disease relationships.
Keywords
Introduction
MicroRNAs (miRNAs) are a class of single-stranded small noncoding RNAs (∼22 nucleotides long) that negatively regulate messenger RNA (mRNA) expression at the post-transcriptional level.1,2 miRNAs bind to the 3′-untranslated region of target genes through base-pairing, resulting in mRNA degradation and/or translational inhibition. 3 Accumulating evidence shows that miRNAs are involved in multiple biological processes and cellular signalling pathways,4,5 and that mutations or the dysregulation of miRNAs can cause various diseases. Recently, several studies have identified close relationships between miRNAs and disease,4,6–9 leading to the construction of dozens of miRNA-related databases. For example, miR2Disease, a resource of miRNAs that are dysregulated in human diseases, currently includes 3273 miRNA–disease relationship entries. 10
A common strategy to explore potential disease-related miRNAs is to identify miRNAs that are differentially expressed in a disease state using technologies such as quantitative reverse transcription (qRT)-PCR, miRNA microarray analysis, or small RNA deep sequencing. Alternatively, cell transcriptional responses to various treatments or perturbations can be compared using algorithms such as Gene Set Enrichment Analysis (GSEA) 11 and large gene expression profiling datasets to quantitatively calculate the relevance of different perturbations. The first large-scale effort to apply this principle was the Connectivity Map project, which aimed to find potential connections among molecule treatments, disease states, and other bioprocesses by querying large-scale expression profiling data and validated gene sets. 12 Since then, several studies have also demonstrated the feasibility of this approach in drug reposition studies.13–15
Hypothetically, because miRNAs are integrated into regulatory networks that influence target and other downstream genes, cell transcriptional responses against miRNA perturbations (either overexpression or knockdown) may reflect the transcriptional response of the related disease state to some extent. Thus, miRNA–disease relationships can be identified using this transcriptional response comparison strategy. Previously, Jiang et al. suggested that it might be possible to determine miRNA–drug links by integrating miRNA targets with the expression profiles of cancers and cell responses to small molecules. 16 However, to the best of our knowledge, no investigation has directly compared the transcriptional responses induced by both miRNA genotype variation and disease. Recently, our group developed the ExpTreeDB database, which allows users to mine relationships among manually curated perturbations such as agent treatment, genotype variation, disease state, stress, and infection. 17 In this study, we explored miRNA–disease relationships using this methodology and datasets from the Gene Expression Omnibus (GEO). We collected global transcriptional response datasets representing 40 human diseases and 30 miRNA variation treatments. Pair-wise similarities were calculated to identify putative miRNA–disease links for a literature investigation and experimental validation in three diseases: inflammatory bowel disease (IBD), oesophageal cancer, and colorectal cancer. We found that miR-200 c was significantly overexpressed in oesophageal cancer. However, we did not observe the consistently upregulated expression of miR-16 or miR-124 in IBD or colorectal cancer.
Methods
Specimens
Subjects in this study were recruited by Zhejiang Provincial People’s Hospital, and included five oesophageal cancer patients (four men and one woman; average age, 67 years), five colorectal cancer patients (three men and two women; average age, 62 years), and five IBD patients (two Crohn’s disease cases and three ulcerative colitis cases; four women and one man; average age, 38 years). The diagnosis of all patients had been confirmed by pathology. Patients with cancer were free of other malignant neoplasms, and had not undergone radiotherapy or chemotherapy.
This work was approved by the Ethics Committee of Zhejiang Provincial People’s Hospital (approval number: 2014KY059). Informed consent was obtained from all subjects prior to their participation.
Two 1×1 cm tissue samples and the 5 cm margin of each carcinoma sample were obtained during surgical resection. All samples were washed with saline solution then immediately placed into 2 ml RNAlater solution and stored at 4℃ overnight (>16h). The tissues were then stored at −80℃ until required for analysis. Tissue samples from IBD and corresponding normal tissues were collected during pretreatment endoscopic biopsies and prepared as described above.
RNA isolation from clinical tissues
Total RNA was isolated using TRIzol reagent (Invitrogen, Waltham, MA, USA). Tissues were centrifuged at 2000 rpm, speed 500 xg for 5 min, then cell pellets were homogenized in 1.0 ml TRIzol reagent and incubated at room temperature for 5 min. Each sample was treated with 200 µl of chloroform, repeatedly inverted for 15 s, then incubated at room temperature for 10 min. Samples were then centrifuged at 12,800 rpm for 10 min at 4℃, and the colourless supernatant was transferred to a fresh tube and treated with the same volume of isopropyl alcohol. After incubation at 4℃ for 10 min, samples were again centrifuged at 12,800 rpm for 10 min at 4℃, the supernatant was removed, and the pellet was washed with 75% ethanol. Samples were centrifuged twice at 11,800 rpm for 5 min at 4℃, with removal of the supernatant after the first centrifugation, then the pellet was dried at room temperature and dissolved in RNase-free water. The RNA concentration was quantified by a Nanodrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA, USA).
qRT-PCR
cDNA was synthesized according to the manufacturer’s protocol (Promega). Briefly, 10 -µl reverse transcription reactions contained 2 nM miRNA RT primers, 500 µM dNTPs, 0.2 µl M-MLV reverse transcriptase, and 1 µg total RNA. Conditions were as follows: 16℃ for 30 min followed by 42℃ for 1 h and 75℃ for 10 min. The real-time PCR system contained 10 µl SYBR Premix Ex Taq, 0.5 µl upstream primer, 0.5 µl downstream primer (2.5 µM), 1 µl cDNA, and 8 µl RNase-free H2O. The reaction was incubated at 95℃ for 30 s, followed by 45 cycles of 95℃ for 5 s, then 60℃ for 30 s. miRNA expression was analysed using the 2 −ΔΔCt method, and U6 was selected as the control gene. Primer sequences are shown in Table S1. SPSS 17.0 software was used to perform statistical analysis. Data were shown as the mean ± SD. The t-test was used for analysis between cancer tissue and adjacent normal tissue or between disease and control group. Significance was defined as P value < 0.05.
Gene expression profiling data collection
We downloaded human global gene expression profiles representing transcriptional responses to miRNA perturbations and disease states from GEO dataset (GDS) records produced by two platforms, the Affymetrix human genome U133 plus 2.0 array (GPL570) and the Affymetrix human genome U133A array (GPL96). These two human microarray platforms are the most frequently used in GEO and include over 20,000 genes, which is favourable for use with the GSEA algorithm.12,13,18,19 Human GDS records with a “disease state” subtype description were downloaded as disease state datasets and were manually examined in ExpTreeDB. 17 Datasets related to miRNA perturbations were downloaded from GEO series (GSE) resources, and detailed treatments as well as miRNA entries were manually extracted. A similar approach was used to obtain small RNA silencing datasets. We required that one gene expression profile related to a given perturbation must include a blank control group so that transcriptional responses to perturbations could be clearly defined.
Generating a ranked gene list
Within each GEO dataset, we manually selected the experimental group of a particular perturbation. The experimental groups were then manually matched to normal control groups. Specific genotype variations and disease states were defined as “perturbations” to cells that induce transcriptional responses. Following the approach used in ExpTreeDB, 17 we generated a phenotype ranked list (PRL) to denote the transcriptional response to a perturbation such as miRNA overexpression, gene silencing, or disease state. In brief, pair-wise global gene expression fold-changes were calculated, and all fold-change lists under a certain perturbation were merged into a PRL using a hierarchical majority voting scheme to prevent poor representation of heterogeneous experimental settings. 20 Thus, the genes showing the highest level of up/down-regulation (presented as microarray probe names) under each permutation were placed at the top/bottom of the corresponding PRL.
For the disease state, two perturbations were defined to a different extent: a description of the disease, such as IBD, and the provision of more detailed information about stage or subtype, such as ulcerative colitis (UC) or Crohn’s disease (CD). We generated PRLs for both definitions, and correlation calculations showed that the two PRLs had high correlation coefficients (Enrichment Score = 0.662, Figure 1). Therefore, in this study we employed a general description of disease states and perturbation definitions (Table 1 and Figure 1).
A heat map of classified disease correlation scores. Each permutation is represented by a phenotype ranked list where the genes showing the most up-regulation are placed at the top while the most down-regulated genes are at the bottom. The correlation scores were computed by measuring gene regulation in correspondence with the GSEA method to obtain a score from −1 to +1. The color scale shows positive correlations in red and negative correlations in blue. Data sets used in this study. AD, Alzheimer’s disease; ACTH, adrenocorticotropic hormone; GIP, gastric inhibitory polypeptide; ET, essential thrombocythaemia; RCC, renal cell carcinoma; LCCS, lethal congenital contracture syndrome; FKRP, fukutin-related protein; GH, growth hormone; PRL, prolactin; SCID, severe combined immunodeficiency; HCC, hepatocellular carcinoma; OE, overexpression; KD, knockdown
Similarity determinations based on the GSEA algorithm
We quantified pair-wise similarities among PRLs as correlation scores based on GSEA. Following the strategy used by Iorio et al,13,20 we extracted the top 250 and the bottom 250 probes as gene signatures. The R package Gene Expression Signature with integrated GSEA was used for the similarity calculations.
21
In brief, the enrichment of a signature in a PRL is estimated by the Kolmogorov–Smirnov test for uniform distributions. The signature of permutation A is represented as (
Results
Overview of miRNA–disease similarities
From GSE records, we obtained transcriptional responses including genotype variation of 30 miRNAs for nine knock-down and 21 overexpression treatments. Variations in Drosha, importin 8 (IPO8), and DGCR8 were also included for their functions in miRNA biogenesis. Transcriptional responses representing 42 distinct disease states were collected from GDS records. Based on a GSEA-centred pipeline, we calculated the pair-wise similarities between the transcriptional responses of miRNA variations and disease states (Figure 2(a)). We found that the overall miRNA–disease similarity values followed a normal distribution (mean = −0.002, variance = 0.074, Figure 2(b)). However, we also observed some outliers with extreme similarity scores.
Pair-wise similarity scores between the transcriptional responses of miRNA variation and disease states. (a) A heat map of similarity scores. miRNA variations and diseases were sorted by their correlations with each other in the top 5%. (b) A distribution and normal probability plot of similarity scores.
Correlated miRNA–disease links
miRNA–disease relationships within the top five percentile.
KD, knockdown; OE, overexpressionl; CS, Correlation Score; PR, Percentile Rank
We found that the overexpression of miR-210 was most strongly correlated with disease state, showing a wide range of relationships to different diseases. Of the eight diseases linked to miR-210, four were solid cancers, including lung cancer and kidney sarcoma. IBD was identified as a disease connected with the most miRNA genotype variations, and was predicted to share transcriptional phenotype relationships with IPO8, KSHV miRNA, miR-124, miR-16, and miR-338-3 p/451 dysregulation. We also identified multiple myeloma as a disease with five relationships, but connecting with only one miRNA genotype variation containing a single miRNA type: miR-335.
Experimental validation of miRNA differential expression
We selected correlated miRNA links in both disease state samples and controls. Based on bio-sample availability, we selected oesophageal cancer samples to analyse miR-200 c expression (OE), and colorectal cancer samples to assess miR-16 (OE) and miR-124 expression (OE). We also included IBD samples for miR-124 expression as a comparison with previous studies.
We observed consistently high up-regulation of miR-200 c in the oesophageal cancer samples compared with adjacent control samples (Figure 3). Of four sample pairs, miR-200 c was expressed >47-fold in adenocarcinoma, and in another pair it was expressed 261-fold compared with controls. This result strongly agreed with our prediction that microRNA-200c highly correlated with the development of oesophageal cancer. We also identified six genes that were significantly (P < 0.01) downregulated in association with miR-200 c overexpression and in oesophageal cancer samples using GEO2R of the GEO database. We queried these six genes in TarBase,
15
and found that one, Quantitative RT-PCR assay for miR-200 c expression in paired normal and tumour tissues from five oesophageal adenocarcinoma patients. The expression levels of normal tissues were standardized to 1, and the fold-changes are plotted as means ± SD of three replicates.
We did not observe consistent upregulated expression of miR-16 or miR-124 in either IBD or colorectal cancer samples compared with controls (Figure 4). In colorectal tissue samples from two CD and three UC patients, miR-124 was shown to be upregulated in two (one CD and one UC). The expression profiles were similar to those of miR-16/24 in colorectal cancer samples.
Quantitative RT-PCR assays for miR-16 and miR-124 expression in colorectal cancer patients and miR-124 expression in IBD patients. The expression levels of normal tissues are standardized to 1, and the fold-changes are plotted as means ± SD of three replicates.
Discussion
Screening results based on microarray analysis are usually unstable and less functionally related than those derived from other assays. In this study, we connected miRNAs with diseases to better determine their functional relationships. We found that hsa-miR-200 c was upregulated >47-fold in five oesophageal cancer samples compared with normal samples. We also identified the experimentally validated miRNA target
We observed only a slight increase or no change in miR-124 expression in IBD patient samples compared with controls. This is inconsistent with the findings of Koukos et al., who reported miR-124 downregulation in UC samples. This difference could be explained by individual variation in miRNA levels even among patients with the same disease, reflecting differences in development stage or genetic background. Alternatively, it could be caused by the small sample size in our study.
Relationships between IBD, miR-16, and miR-124 have previously been reported,23,24 and the dysregulation of miR-16 has been observed in both CD and UC.23,25 The decreased expression of miR-124 has also been suggested to play a role in promoting inflammation and the pathogenesis of UC through up-regulating signal transducer and activator of transcription 3. 23 Moreover, miR-335 has previously been shown to play a role in multiple myeloma, 26 while miR-210 was reported to be induced by hypoxia in breast cancer. 27 This agrees with our observation that half of the diseases associated with miR-210 in the present study were solid tumours, which are known to often be hypoxic.28,29
A large systematic collection of gene expression changes resulting from cellular exposure to perturbations such as small molecules and genetic variations would help the understanding of cellular pathways and the development of therapies and biomarkers. The Library of Integrated Network-Based Cellular Signatures project founded by the National Institutes of Health has therefore been established. This work should increase the power of simulated screening following the accumulation of expression change information under various perturbations.
The main limitation of our study is that we only investigated a small number of miRNAs in association with disease. In the future, we plan to include more diseases and miRNAs and to increase the number of patient samples. We also aim to further explore the relationships between miRNA and diseases using
Footnotes
Declaration of conflicting interest
The authors declare that there is no conflict of interest.
Funding
This work was supported by grants from the National Nature Science Foundation of China [81273488 to XC Bo, 81230089 to XC Bo, 81102419 to F Li, and 31100951 to M Ni] and the National Key Technologies R&D Program for New Drugs of China [2012ZX09301-003 to KL Liu].
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
