Abstract
BACKGROUND:
Gastrointestinal stromal tumors (GISTs) are the main mesenchymal tumors found in the gastrointestinal system. The clinical phenotypes of GISTs differ significantly, and their molecular basis is not yet completely known. MicroRNAs (miRNAs) have been implicated in carcinogenesis pathways through their regulation of gene expression at the post-transcriptional level.
OBJECTIVE:
The aim of the present study was to elucidate the expression profiles of miRNAs relevant to gastric GIST carcinogenesis, and to identify miRNA signatures that can discriminate the GIST from normal cases.
METHODS:
miRNA expression was tested with the miScript™ miRNA PCR Array Human Cancer PathwayFinder kit, and machine learning was then used to find a miRNA profile that can predict the risk for GIST development.
RESULTS:
A number of miRNAs were found to be differentially expressed in GIST cases compared to healthy controls. Among them, hsa-miR-218-5p was found to be the best predictor for GIST development in our cohort. Additionally, hsa-miR-146a-5p, hsa-miR-222-3p, and hsa-miR-126-3p exhibited significantly lower expression in GIST cases compared to controls and were among the top predictors in all our predictive models.
CONCLUSIONS:
A machine learning classification approach may be accurate in determining the risk for GIST development in patients. Our findings indicate that a small number of miRNAs, with hsa-miR-218-5p as a focus, may strongly affect the prognosis of GISTs.
Clinical and histopathological characteristics of GIST cases
Gastrointestinal stromal tumours (GISTs) are among the most frequent mesenchymal tumours of the gastrointestinal tract. The major initial event in GIST pathogenesis is linked with gain-of-function mutations of the receptor tyrosine kinase gene KIT or of the platelet-derived growth factor receptor alpha gene (PDGFRA) [1]. GISTs may develop in any part of the gastrointestinal tract, but are mainly found in the stomach [2]. These tumours present asymptomatically in 18% of cases, especially as small tumours (
Recently, non-coding RNAs have been studied for their involvement in post-transcriptional regulation of gene expression and have attracted scientific interest regarding their role in carcinogenesis. Among them, micro-RNA (miRNA) expression has been suggested to be related to carcinogenesis and to the phenotypic expression of several tumors, including GISTs [8]. In GISTs, specific miRNA expression profiles are associated with chromosome 14q loss [9, 10], anatomical site [11], KIT and/or PDGFRA mutation [10], GIST development risk [9] and overall survival [11]. Additionally, expression of specific miRNAs is related to imatinib resistance in GIST [12, 13]. Thus, a number of studies support the view that miRNAs can be used as diagnostic, prognostic and/or predictive biomarkers or have therapeutic potential. This increasing recognition of their role in GISTs opens the way for additional studies in the field that could improve clinical practice. In the present work we study the expression profiles of miRNAs relevant to carcinogenesis in a cohort of 20 patients with stomach GIST, and we employ machine learning approaches not only to help us understand the clinical features of GIST, but also to evaluate this approach for future personalized medicine applications.
Material and methods
Patients and tissue samples
Tissue samples from twenty gastric GISTs and twenty healthy gastric biopsies were included in the study. All tumors were sporadic, as assessed by personal and family histories. The criteria used to collect the samples were: 1) only gastric GISTs were included in the study, 2) all the neoplasms were primary tumors and resectable according to the preoperative evaluation, and 3) no neoadjuvant therapy had been performed. Healthy gastric biopsies were obtained from patients with suspected non-malignant diseases (e.g. gastritis). All cases were identified in the 1st Propaedeutic Department of Surgery of Hippocration General Hospital, National and Kapodistrian University of Athens between March 2015 and November 2018. Authorization for the use of these tissues for research purposes was obtained from the Hospital Review Board and all the samples were obtained with informed consent from the participants. The clinical and histopathological details of all cases are presented in Table 1.
MiRNA expression
MiRNA isolation was performed using the NucleoSpin miRNA kit (Macherey-Nagel, Germany). Reverse transcription of 500 ng of RNA was performed with the miScript II RT Kit (Qiagen), and the expression of a panel of 84 miRNAs was measured using the miScript™ miRNA PCR Array Human Cancer PathwayFinder (MIHS-102Z, Qiagen) and the miScript SYBR Green PCR Kit (Qiagen). This panel includes miRNAs that have been correlated with the diagnosis, staging, progression, or prognosis of various tumors. Each array contains six different snoRNAs/snRNAs as normalization controls for the array data (SNORD61, SNORD68, SNORD72, SNORD95, SNORD96A, RNU6-6P), a miRNA reverse transcription control (RTC) and a positive PCR control (PPC). Samples were grouped into two categories: Normal and Cancer. The miRNA relative expression was calculated by the $2^{-\Delta\Delta C_t}$ method.
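The relative-expression step can be illustrated with a minimal Python sketch of the conventional $2^{-\Delta\Delta C_t}$ calculation. It assumes that each sample's target Ct is normalized to the mean Ct of its reference small RNAs, and that fold regulation follows the common convention of reporting down-regulation as the negative inverse of the fold change. The function names and Ct values are hypothetical and are not taken from the study data.

```python
from statistics import mean

def delta_ct(ct_target, ct_refs):
    """Delta-Ct for one sample: target Ct minus the mean Ct of the reference RNAs."""
    return ct_target - mean(ct_refs)

def fold_change(case_samples, control_samples):
    """2^-ddCt fold change of the case group relative to the control group.

    Each sample is a tuple (target Ct, [reference Cts]).
    """
    dct_case = mean(delta_ct(t, refs) for t, refs in case_samples)
    dct_ctrl = mean(delta_ct(t, refs) for t, refs in control_samples)
    ddct = dct_case - dct_ctrl
    return 2 ** (-ddct)

def fold_regulation(fc):
    """Assumed convention: fc when fc >= 1, otherwise the negative inverse of fc."""
    return fc if fc >= 1 else -1.0 / fc

# Hypothetical Ct values for one miRNA (not study data).
tumor = [(28.0, [20.0, 20.0]), (28.0, [20.0, 20.0])]
normal = [(26.0, [20.0, 20.0]), (26.0, [20.0, 20.0])]

fc = fold_change(tumor, normal)
print(fc, fold_regulation(fc))
```

In this toy case the tumor group has a delta-Ct two cycles higher than the normal group, so the miRNA is reported as four-fold down-regulated.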
Machine learning modeling approach
To further assess the value of our expression results and refine them by identifying the most important miRNAs that might distinguish between our groups, we employed several classification and regression models using the caret package [14] in R [15] on the entire miRNA panel, regardless of previous differential expression results. This allowed for a wider approach which did not exclude any features (miRNAs) in advance. We used the previously calculated relative expression per miRNA per sample, after preprocessing with “scale” and “center”, to train and validate the accuracy of six models using the appropriate algorithms: two Classification And Regression Trees (rpart2 and bagtree), one Random Forest implementation (ranger), a k-Nearest Neighbors (knn), a Support Vector Machine (svm) and a C5.0 classification tree (C5.0). All training models used a leave-group-out cross-validation (LGOCV) approach with a 70%–30% partitioning and 100 iterations. Because the number and identity of the predictors used by each model differed widely, we also employed a Recursive Feature Elimination method to identify the best set of predictor miRNAs [16, 17].
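The resampling scheme described above can be sketched in plain Python. This is not the caret pipeline used in the study: it is an illustrative stand-in that assumes stratified 70%/30% splits, centering and scaling estimated on the training portion only, and a simple 1-nearest-neighbour classifier in place of the six models. All function names and the synthetic two-feature data are hypothetical.

```python
import random
from statistics import mean, pstdev

def stratified_split(labels, train_frac, rng):
    """Return (train_idx, test_idx), preserving the class proportions."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test

def center_scale(train_rows, rows):
    """Center and scale each column using training-set mean and SD only."""
    cols = list(zip(*train_rows))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]

def predict_1nn(train_x, train_y, x):
    """Label of the closest training sample (squared Euclidean distance)."""
    dist = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[dist.index(min(dist))]

def cohen_kappa(truth, pred):
    """Agreement between truth and prediction, corrected for chance."""
    n = len(truth)
    po = sum(t == p for t, p in zip(truth, pred)) / n
    pe = sum((truth.count(c) / n) * (pred.count(c) / n)
             for c in set(truth) | set(pred))
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def lgocv(X, y, train_frac=0.7, iterations=100, seed=1):
    """Mean accuracy and kappa over repeated random train/test partitions."""
    rng = random.Random(seed)
    accs, kappas = [], []
    for _ in range(iterations):
        tr, te = stratified_split(y, train_frac, rng)
        raw_train = [X[i] for i in tr]
        x_train = center_scale(raw_train, raw_train)
        x_test = center_scale(raw_train, [X[i] for i in te])
        y_train = [y[i] for i in tr]
        y_test = [y[i] for i in te]
        pred = [predict_1nn(x_train, y_train, x) for x in x_test]
        accs.append(sum(p == t for p, t in zip(pred, y_test)) / len(y_test))
        kappas.append(cohen_kappa(y_test, pred))
    return mean(accs), mean(kappas)

# Synthetic, clearly separated data: 20 "Normal" and 20 "GIST" samples.
X = ([[0.1 * i, float(i % 5)] for i in range(20)]
     + [[10 + 0.1 * i, float(i % 5)] for i in range(20)])
y = ["Normal"] * 20 + ["GIST"] * 20

acc, kappa = lgocv(X, y)
print(acc, kappa)
```

Estimating the scaling parameters inside each iteration keeps information from the held-out 30% out of the training step, which is the main reason to preprocess within the resampling loop rather than once on the full dataset.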
miRNA target identification and functional analysis
miRNA target identification was performed using the multiMiR R package [18]. Two different sets of gene targets were identified: one based on validated miRNA-target interactions from 3 databases (“mirecords” [19], “mirtarbase” [20], and “tarbase” [21]) and one using predicted miRNA-target interactions from 8 databases (“diana_microt”, “elmmo”, “microcosm”, “miranda”, “mirdb”, “pictar”, “pita”, and “targetscan”) [22]. Both gene lists were used as input to the clusterProfiler package [23] for enrichment analysis against Gene Ontology [24] terms (both GO-Biological Process and GO-Molecular Function). We also validated the target genes using Disease Ontology [25] and the DisGeNET database [26] for their associations with specific phenotypes (e.g. neoplasms). The
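The enrichment step can be illustrated with a small Python sketch of over-representation analysis: a one-sided hypergeometric test per term, followed by Benjamini-Hochberg adjustment. This is a generic sketch of the technique, not the clusterProfiler implementation; the gene identifiers, term names and universe size are hypothetical placeholders.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes among n targets,
    when K of the N genes in the universe carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical universe of 1000 genes and 40 miRNA target genes.
universe_size = 1000
targets = {f"G{i}" for i in range(40)}
terms = {
    "term_A": {f"G{i}" for i in range(0, 60)},     # 60 genes, 40 are targets
    "term_B": {f"G{i}" for i in range(30, 130)},   # 100 genes, 10 are targets
    "term_C": {f"G{i}" for i in range(500, 600)},  # 100 genes, 0 are targets
}

names = list(terms)
pvals = [hypergeom_pvalue(len(targets & terms[t]), len(targets),
                          len(terms[t]), universe_size) for t in names]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(name, p, q)
```

The choice of gene universe strongly affects the p-values, so in practice it should match the set of genes that could have appeared in the target lists.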
Results
Differential MiRNA expression
The analysis of the two sample groups (Normal and Cancer) showed, in total, 56 differentially expressed miRNAs with a
Differentially expressed miRNAs in GIST. Fold regulation is based on the $2^{-\Delta\Delta C_t}$ method, while p-values derive from a Student's t-test between GIST and healthy samples. FDR was calculated using the Benjamini-Hochberg method.
Volcano plot of all 84 miRNAs in our assay. Red dots represent miRNAs with a 
Overall Accuracy and Kappa for the seven models trained and validated on our miRNA dataset.
Top 20 predictors for each model ranked by importance.
Validation of the miRNA predictors using Disease Ontology and the DisGeNet database. Lists of validated and predicted target interactions were used and the top 30 results are represented ranked by adjusted 
As described in our methodology, we trained and validated seven classification models on the normalized miRNA expression data. For each model, the mean Accuracy and Kappa were calculated, along with the percentage of false negatives. The Random Forest (rforest), Classification And Regression Trees (rpart) and Bootstrap Aggregating of Classification And Regression Trees (CARTbag) models performed similarly, with mean accuracies of 99.58%, 99.5% and 99.33% respectively. These were followed by the C5.0 (C50), k-nearest neighbors (KNN), and support vector machine (SVM) models, which had mean accuracies of 95.91%, 89.58%, and 86.67% respectively. All models were applied without individual tuning, which might have increased their performance. Figure 2 shows the total Accuracy and Kappa for each model. More important for us was to see which miRNAs each model picked as predictors for distinguishing the sample groups. Figure 3 shows the top 20 preferred predictors (miRNAs which can predict whether a sample belongs to the normal or GIST group) along with their percent importance for each model. The models reached a consensus only on hsa-miR-218-5p, although most of them agreed on some subsets of predictors. For example, the C5.0 model used hsa-miR-218-5p as its sole predictor. For this reason we applied a Recursive Feature Elimination (RFE) algorithm to identify the subset of our miRNAs which can best explain our sample groupings. The RFE algorithm reported 100% accuracy and Kappa when using groupings of 1, 4, 5, 6, 9 and 10 miRNAs. We wanted our downstream analysis to be as broad as possible, so we selected the grouping of ten miRNAs, which included hsa-miR-218-5p, hsa-miR-222-3p, hsa-miR-196a-5p, hsa-let-7c-5p, hsa-miR-125a-5p, hsa-miR-126-3p, hsa-miR-146a-5p, hsa-miR-149-5p, hsa-miR-30c-5p, and hsa-miR-148a-3p.
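The idea behind the feature selection step can be shown with a simplified Python sketch of backward feature elimination: evaluate the current feature set, drop the least important feature, and repeat. It is not the caret implementation. It assumes a two-class problem, ranks features by a simple class-separation score, and scores each subset with leave-one-out accuracy of a nearest-centroid classifier. The feature names ("miR-A" and so on) and the data are hypothetical.

```python
from statistics import mean, pstdev

def separation(values, labels):
    """Importance score: absolute difference of class means over the overall SD."""
    a = [v for v, lab in zip(values, labels) if lab == labels[0]]
    b = [v for v, lab in zip(values, labels) if lab != labels[0]]
    sd = pstdev(values) or 1.0
    return abs(mean(a) - mean(b)) / sd

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for cls in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == cls]
            centroids[cls] = [mean(r[f] for r in rows) for f in features]
        dist = {cls: sum((X[i][f] - m) ** 2 for f, m in zip(features, cen))
                for cls, cen in centroids.items()}
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(X)

def rfe(X, y, names):
    """Return [(feature names, accuracy)] for each successively smaller subset."""
    remaining = list(range(len(names)))
    history = []
    while remaining:
        acc = loo_accuracy(X, y, remaining)
        history.append(([names[f] for f in remaining], acc))
        scores = {f: separation([row[f] for row in X], y) for f in remaining}
        remaining.remove(min(scores, key=scores.get))  # drop the weakest feature
    return history

# Hypothetical data: miR-A separates the groups strongly, miR-B weakly, miR-C not at all.
names = ["miR-A", "miR-B", "miR-C"]
X, y = [], []
for i in range(8):
    X.append([0.0 + 0.1 * i, float(i % 2), float(i % 2)])
    y.append("Normal")
    X.append([5.0 + 0.1 * i, 1.0 + i % 2, float(i % 2)])
    y.append("GIST")

for subset, acc in rfe(X, y, names):
    print(len(subset), subset, acc)
```

Note that this sketch ranks features on the full dataset, which can leak information into the accuracy estimate; a more careful version would repeat the ranking inside each resampling iteration. When several subset sizes tie on accuracy, choosing between the smallest and the largest is a judgment call of the kind described above.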
Biological background for the targets of our miRNA predictors. The validated targets of the miRNAs were used for enrichment in GO-BP and GO-MF. Top 30 results are shown ranked by adjusted 
Using the 10 miRNAs previously identified, we performed a miRNA-target analysis using databases which provide both validated and predicted interactions. The validated results included 21,648 miRNA-gene interactions, whereas the predicted results included 34,710 (
Discussion
Several studies have identified the differential expression of miRNAs in GISTs and have shown clearly different miRNA profiles between GIST and non-cancerous tissues. In the present study, in addition to studying the expression of 84 miRNAs involved in the carcinogenesis of gastric GIST, we also used machine learning to find a miRNA profile that can predict the risk for GIST development. Most of the miRNAs that we found to be differentially expressed in our samples have been previously shown to be associated with GIST [11, 27, 28, 29, 30]. In agreement with previous studies, hsa-miR-196a-5p, hsa-miR-148a-3p and hsa-miR-125a-5p were found to be significantly upregulated in our cohort, whereas hsa-let-7f-5p, hsa-miR-126-3p, hsa-miR-222-3p, hsa-miR-146a-5p and hsa-miR-218-5p, among others, were found to be significantly downregulated [11, 28, 29, 30, 31, 32]. It is important to note that these miRNAs directly target fundamental genes in GIST pathogenesis, such as those of the KIT/AKT and PDGFRA pathways, and have also been identified as crucial carcinogenesis mediators in other gastrointestinal cancers such as gastric cancer [30, 33, 34].
The miRNAs that were differentially expressed were used to construct a machine learning classifier to pinpoint the miRNAs that could be independently related to GIST risk prognosis. All the risk models we used, based on miRNA expression, appear to have a high accuracy for GIST risk prediction. Hsa-miR-218-5p was found to be the best predictor for GIST development in our cohort. Hsa-miR-218-5p serves as a tumor suppressor in numerous cancer types, and its role in GIST has been reported in a few studies. Fan et al. [30], in agreement with our findings, reported that the expression of miR-218 in tumor tissue and GIST cell lines was significantly decreased compared to normal GIST-adjacent tissue, and found that miR-218 can negatively control the expression of the KIT protein and inhibit the proliferation and invasion of GIST cells. Additionally, it has been suggested that miR-218 increases the sensitivity of GIST to imatinib: the expression of miR-218 is down-regulated in an imatinib mesylate-resistant GIST cell line (GIST430), while miR-218 over-expression can enhance the sensitivity of GIST cells to imatinib mesylate [35].
Hsa-miR-146a-5p, hsa-miR-222-3p, and hsa-miR-126-3p exhibited significantly lower expression levels in GIST cases compared to controls in our study and were among the top predictors in all of our models. The role of hsa-miR-146a-5p has not yet been investigated in GIST cases, but it is known to act as a tumor suppressor miRNA in some cancers (i.e. non-small cell lung cancer, esophageal squamous cell cancer, pancreatic cancer), and as an oncogenic miRNA in others (i.e. bladder cancer, cervical cancer, melanoma) [36]. In a number of neoplasms, however, conflicting results have been reported; for example, in gastric cancer there is evidence indicating a tumor suppressive role for miR-146a, but several studies have provided support for the opposite [37, 38]. Regarding hsa-miR-222-3p, in agreement with our results, it has been found to be reduced in most GISTs, in contrast to other tumors [39]; however, the functional role of this downregulation is not fully understood. Ihle et al. suggested that miR-222 downregulation induces apoptosis in vitro by a signaling cascade involving KIT, AKT and BCL2, and this miRNA appears to functionally counteract oncogenic signaling pathways in GIST [40]. Hsa-miR-126-3p has not been extensively studied in GISTs, but Choi et al. reported that it was down-regulated in high-risk GISTs and is implicated in cell cycle arrest, cell growth and death [9]. Expression of miR-126-3p was also decreased in tumor tissues of other cancers, such as non-small-cell lung cancer (NSCLC), hepatocellular carcinoma (HCC) and cholangiocarcinoma (CCA) [41]. Our results demonstrate that the previously mentioned miRNA signatures can be predictive indicators for GIST development.
The analysis of the miRNA-target regulatory networks shows mainly the involvement of neoplastic phenotypes and that our miRNA predictors are directly associated with cancer. The gene ontology analysis showed significant enrichment in chromatin remodeling and histone modification, whereas molecular function terms focused on the regulation of cell adhesion molecules and cadherin binding. Indeed, it is strongly believed that epigenetic phenomena, including chromatin modifications, underlie GIST tumorigenesis and influence the clinical course and response to treatment [42]. Additionally, overexpression of cell adhesion molecules such as L1 cell adhesion molecule (CD171) predicts poor prognosis in GISTs [43], and significant under-expression of E-cadherin was closely related to metastasis of GISTs [44]. Therefore, miRNA expression may influence GIST prognosis via the regulation of important pathways related to carcinogenesis.
Even though we performed comprehensive machine learning and bioinformatics analyses using the miRNA expression profile of GIST and confirmed the classification accuracy by cross-validation, there are some limitations in our study. The sample size was small, since we only used gastric GISTs in order to have a homogeneous population, and our samples came from a single surgery clinic. Due to the limited sample availability, the study lacks validation experiments to assess the expression of the predictive miRNAs and the corresponding target genes. We do not attribute our high accuracy scores to overfitting, but they may be inflated because they were estimated by cross-validation within the same small cohort rather than on an independent validation set. This can be addressed in future work by using larger datasets that can effectively be partitioned into separate training and validation sets. Therefore, further studies are needed to support our findings.
In conclusion, a machine learning classification approach may be accurate in determining the risk for GIST development in patients. Moreover, a small number of miRNAs, with hsa-miR-218-5p as a focus, may strongly affect the prognosis of GIST.
Author contributions
IKS, DT and KGT contributed to samples and data collection. IKS, ND and MG carried out the experiments, designed the model and the computational framework and analyzed the data. MG, ND, GZ and KGT wrote the manuscript with input from all authors.
Supplementary data
The supplementary files are available to download from http://dx.doi.org/10.3233/CBM-210173.
sj-xlsx-1-cbm-10.3233_CBM-210173.xlsx - Supplemental material
Acknowledgments
This study was funded by a scholarship grant to Ioannis K. Stefanou from the Greek Society of Cancer Biomarkers and Targeted Therapy, a non-profit organization.
