
Abstract

Personalized medicine promises a revolution in medicine and human biology in the 21st century. The scientific foundation for this revolution is laid by analyzing biological high-throughput data sets from genomics, transcriptomics, proteomics, and metabolomics. Currently, access to these data is limited to either rather simple Web-based tools, which do not grant much insight, or analysis by trained specialists, without firsthand involvement of the physician. Here, we present the novel Web-based tool “BioMiner,” which was developed within the scope of an international and interdisciplinary project (SYSTHER^†) and gives access to a variety of high-throughput data sets. It provides the user with convenient tools to analyze complex cross-omics data sets and offers enhanced visualization capabilities. BioMiner incorporates transcriptomic and cross-omics high-throughput data sets, with a focus on cancer. A public instance of BioMiner along with the database is available at http://systherDB.microdiscovery.de/ , login and password: “systher”; a tutorial detailing the usage of BioMiner can be found in the Supplementary File.

Keywords

data mining, multiomics, data integration, biomarker detection, pathway visualization, personalized medicine, cancer

Introduction

On the way to the medicine of the 21st century, personalized (also called individualized) medicine is a great and crucial step. It allows physicians to tailor therapy plans to the characteristics of the individual patient, providing better treatment outcomes and less drug use overall. This translates into faster recovery, fewer side effects, and lower expenses for the health care system.¹

Personalized medicine is especially promising with regard to the treatment of neoplastic disorders, which are among the most complex diseases known. Individualized medicine helps in developing pharmaceuticals and treatments that can efficiently eliminate cancer cells and at the same time cause the least amount of damage to the surrounding tissues and the patient overall. For personalized medicine and the fight against cancer, disease-centered high-throughput data sets are ideal resources. They can be used to generate or verify hypotheses, to compare one's own findings with public data, and to find novel biomarkers and identify their functional contexts. To achieve this, a complex data processing pipeline is required, comprising (at least) the following steps^2,3:

1. Searching for and downloading a suitable public data set

2. Importing data into a statistical environment

3. Performing adequate data normalization

4. Performing quality control of data and normalization

5. Formulating and implementing a statistical analysis

6. Interpreting the results

The realization of this workflow is very time consuming and requires special know-how in bioinformatics and statistics.⁴ Physicians and biologists not experienced in high-throughput data analysis can profit immensely from a tool that allows easy and intuitive access to cross-omics high-throughput data sets, granting them firsthand access to this vast resource of knowledge.

The majority of biological high-throughput data sets are stored in public data repositories such as Gene Expression Omnibus (GEO)⁵ and ArrayExpress⁶ for microarray data, PRoteomics IDEntifications (PRIDE)⁷ for proteomics data, or the Sequence Read Archive (SRA)⁸ of the National Center for Biotechnology Information for next-generation sequencing data. Each of these repositories provides storage for a single “omic” data type but does not support linking different types of omics data (cross-omics). GEO and PRIDE also provide mining tools to inspect their data sets and to perform some limited data analysis.^9,10 Workflow management systems such as KNIME,¹¹ TAVERNA,¹² Orange4WS,¹³ and Galaxy¹⁴ use graphical user interfaces to build complex multistep data-mining workflows. These tools are designed to grant some limited analysis capabilities but do not provide unified (cross-omics) access to data sets. Performing data analysis based on cross-omics data sets proves to be a challenging task, even for those experienced with data analysis. Table 1 presents a comparison of BioMiner with other systems.

Table 1.

Comparison of features between BioMiner and equivalent tools.

FEATURES	BioMiner	GEO	KNIME	GALAXY	R
Interactive Plotting	✓	X	X	X	X
Easy access to omics data	✓	X	X	X	X
Contains database with relevant information	✓	X	X	X	X
Shared data / remote access	✓	✓	X	X	X
Import own/public data	X	✓	✓	✓	✓
Programmable	X	X	✓	✓	✓

ANALYTICAL CAPABILITIES	BioMiner	GEO	KNIME	GALAXY	R
Differential analysis	✓	✓	✓	✓	✓
Correlation analysis	✓	X	✓	✓	✓
Cross-omics mapping	✓	X	X	X	X
Enrichment analysis	✓	X	✓	✓	✓
ANOVA, advanced modelling	X	X	✓	✓	✓
Clustering	X	X	✓	✓	✓

Notes: A “✓” indicates that the tool supports the feature. An “X” indicates that the feature is missing in standard installations. Please note that in KNIME, Galaxy, and R, advanced users may extend standard capabilities.

In a joint effort, the partners of the binational SYSTHER (Systems Biology Tools Development for Cell Therapy and Drug Development – www.systher.eu) project successfully investigated the molecular mechanisms of different cancer types. To analyze cancer from multiple points of view, complementary high-throughput data sets from transcriptomics, metabolomics, and proteomics were collected.^15–18 To store these different “omics” data sets together with complementary data from public resources, we created SystherDB, building on previous work.^19,20

A database can only be beneficial to the user if the stored data are well presented and easily accessible. The complex nature of biological high-throughput data makes this a challenging task. Rising to this challenge, we developed BioMiner, a Web-based toolbox for mining and visualizing data stored in SystherDB. The main objective is to give all partners in the SYSTHER project a tool that they can use to obtain their own results from high-throughput data. Regardless of their expertise, the partners can use BioMiner to interactively explore public and proprietary multiomics data sets.

A public instance of BioMiner is freely available online. It currently contains 18 different studies (Table 2), with almost 4,000 microarrays and more than 187 million measured values of genes, proteins, or metabolites. Because BioMiner was developed in the SYSTHER project, most of the studies are related to the focus of the project, glioblastoma multiforme (GBM). Most of them have been imported from different public resources. The large studies in particular highlight the potential of BioMiner to easily inspect and analyze huge data sets. We demonstrate the application by presenting typical use cases for the identification of cancer biomarkers. A graphical representation of the workflow can be found in Figure 1.

Table 2.

Studies in SystherDB: overview of the studies currently available in the SystherDB.

TITLE	TYPE	FACTORS	GEO-ID	SAMPLES	PM-ID	BIOMOLECULES
Glioma-derived stem cell factor effect on angiogenesis in the brain	Glioma	Cell Type; Tumor Grade	GSE4290	180	16616334³⁶	Genes
High-grade gliomas (HG-U133B)	Glioma	Cell Type; Tumor Grade; Necrosis; Survival Time	GSE4271	100	16530701⁴⁶	Genes
Gliomas of grades III and IV	Glioma	Cell Type; Tumor Grade	GSE4412	85	15374961⁴¹	Genes
Expression profiles of human glioblastoma frozen tumors and cell lines	Glioma	Cell Type; Cell Number	GSE9171	30	18394558⁴⁷	Genes
Glioblastoma from a homogenous cohort of patients treated within clinical trial	Glioma	Disease State; Survival Time	GSE7696	84	18565887	Genes
Feedback circuit among INK4 tumor suppressors constrains human glioblastoma development	Glioma	Cell Type	GSE9171	30	18394558	Genes
Gene expression analysis of glioblastomas identifies the major molecular basis for the prognostic benefit of younger age (HG-U133A)	Glioma	Tumor Type; Survival Time	GSE13041	191	18940004	Genes
Transcriptome profile of human colorectal adenomas	CRC	Disease State	GSE8671	64	18171984⁴³	Genes
Clinical significance of osteoprotegerin expression in human colorectal cancer	CRC	Disease State; Method	GSE21510	148	21270110⁴⁸	Genes
Human colorectal cancer cell lines treated with several inhibitors of PI3Kinase-AKT signaling pathway	CRC	Cell Type; Treatment	GSE18005	15	20546605	Genes
RAS signaling in colon carcinoma: target gene deregulation and growth control through Y-box-binding protein 1	CRC	Cell Type; Treatment	GSE18232	18	21170361	Genes
Expression data from 290 primary colorectal cancers	CRC	Tumor Grade; Survival Time	GSE14333	226	19996206	Genes
NCI60 expression profiling using the Agilent Whole Human Genome Oligo Microarray	Cancer	Cell Line; Cell Name	GSE22821	249		Genes, Proteins, Metabolites
Expression data from the Cancer Cell Line Encyclopedia (CCLE)	Cancer	Tumor Location; Tumor Histology	GSE36133	917	22460905	Genes
Gene expression profile of peripheral blood lymphocytes: comparison between melanoma patients and healthy controls	Melanoma	Disease; Cell Type	GSE6887	46	17488182⁴⁹	Genes
Global control of cell cycle transcription by coupled CDK and network oscillators	Cell Cycle	Group; Time	GSE8799	60	18463633⁵⁰	Genes
Human body index - transcriptional profiling		Tissue; Disease; State	GSE7307	677		Genes
A genomic storm in critically injured humans		Age; Sex	GSE36809	812	22110166	Genes

All analyses presented in the methods section can readily be reproduced and expanded using the public Web server http://systherDB.microdiscovery.de/ , with login and password “systher.”

Figure 1

Typical workflow in which BioMiner can be integrated. The yellow boxes represent steps within the field of bioinformatics (or in this case, done in BioMiner), the green box represents the wet-lab work, and the gray box represents the theoretical part (eg, study design) of the workflow.

Methods

BioMiner is designed as a Web application and uses Google Web Toolkit (GWT) for the graphical user interface. GWT functionality is enhanced by customized plot-applets tailored for interactive visualization of large data sets. Results are presented in two parallel views composed of a table and a plot. Both views are interactive and user-defined selections can be synchronized. Pathway visualization is achieved by extending the PathVisio library.²¹

Experimental data from genomics, proteomics, and metabolomics are stored in a large, manually curated MySQL^‡ database (SystherDB). To ensure data consistency, data import has to be performed by a dedicated specialist. Specific indexing methods have been implemented to achieve good response times. Even when querying large studies comprising several million measurements, response times are typically within a few seconds. Metabolite data are annotated using three different identifier systems: Golm Metabolome Database,²² Human Metabolome Database (HMDB),²³ and Kyoto Encyclopedia of Genes and Genomes (KEGG).²⁴ Integrating the different data types into a single outcome requires a predefined cross-omics relationship, eg, a mapping of metabolites onto genes or vice versa. Integration of semantic information is achieved by singular enrichment analysis.²⁵ In particular, pathway and functional information from Reactome,²⁶ KEGG,²⁴ WikiPathways,²⁷ and GeneOntology²⁸ is supported. Enrichment statistics with multiple testing-corrected P-values are calculated as described in Bluthgen et al.²⁹ Correlation analyses are based on Pearson correlation coefficients. To filter noise and improve response times, correlations are calculated only for the genes with the highest variance (by default, the top 500 genes).
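The variance filter mentioned above can be illustrated with a minimal sketch. It is not BioMiner's implementation: the function name `top_variance_genes`, the dictionary-based data layout, and the toy values are assumptions made purely for illustration.

```python
from statistics import pvariance


def top_variance_genes(expression, n_top=500):
    """Return the names of the n_top genes with the highest variance.

    expression: dict mapping a gene name to a list of (log-scale)
    expression values, one value per sample. This layout is an
    assumption of the sketch, not the SystherDB schema.
    """
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    # Sort by descending variance; ties are broken alphabetically so the
    # result is reproducible.
    ranked = sorted(variances, key=lambda gene: (-variances[gene], gene))
    return ranked[:n_top]


# Toy data: three hypothetical genes measured in four samples (invented values).
toy_expression = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],  # nearly constant, low variance
    "GENE_B": [0.0, 4.0, 0.0, 4.0],  # strongly varying
    "GENE_C": [2.0, 3.0, 2.0, 3.0],  # moderately varying
}
selected = top_variance_genes(toy_expression, n_top=2)
print(selected)
```

Restricting the correlation step to such a subset reduces the number of coefficients to compute and removes genes whose signal is dominated by noise, which is the motivation given in the text.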

‡

www.ensembl.org

BioMiner complies with public data management standards such as Minimum Information About a Microarray Experiment (MIAME),³⁰ Minimum Information About a Proteomics Experiment (MIAPE),³¹ and Minimum Information About a Metabolomics Experiment (MIAMET).³²

Cross-Omics Mapping

For the mapping between genes and proteins, we use the relations provided by the ENSEMBL^#³³ database, accessible via BioMart.³⁴ This mapping is updated on a regular basis (coupled with new versions of ENSEMBL). To establish relations between genes and metabolites, we use the combined information of ConsensusPathDB³⁵ and HMDB.²³ These databases provide information about metabolites (compounds) and their associated metabolic enzymes. Genes coding for the metabolic enzymes are mapped to the corresponding metabolite. This mapping is updated with new versions of HMDB.
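The gene-to-metabolite mapping described above amounts to chaining two relations: metabolite to enzyme, and enzyme to coding gene. The following sketch shows that chaining on invented placeholder identifiers; the function name and the input dictionaries are hypothetical and do not reflect the actual ConsensusPathDB or HMDB file formats.

```python
from collections import defaultdict


def map_genes_to_metabolites(metabolite_to_enzymes, enzyme_to_genes):
    """Chain metabolite -> enzyme and enzyme -> gene into gene -> metabolites."""
    gene_to_metabolites = defaultdict(set)
    for metabolite, enzymes in metabolite_to_enzymes.items():
        for enzyme in enzymes:
            # Enzymes without a known coding gene are skipped silently.
            for gene in enzyme_to_genes.get(enzyme, ()):
                gene_to_metabolites[gene].add(metabolite)
    # Sorted lists make the result deterministic and easy to compare.
    return {gene: sorted(mets) for gene, mets in gene_to_metabolites.items()}


# Placeholder identifiers, invented for this example.
metabolite_to_enzymes = {
    "metabolite_1": ["enzyme_X", "enzyme_Y"],
    "metabolite_2": ["enzyme_Y", "enzyme_Z"],
}
enzyme_to_genes = {
    "enzyme_X": ["GENE_1"],
    "enzyme_Y": ["GENE_2", "GENE_3"],
    # enzyme_Z has no gene annotation in this toy example
}
mapping = map_genes_to_metabolites(metabolite_to_enzymes, enzyme_to_genes)
print(mapping)
```

Because one enzyme can act on several metabolites and one metabolite can be processed by several enzymes, the resulting relation is many-to-many, which is why sets are used while the mapping is being built.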

# www.ensembl.org

Results

The main goal of BioMiner (beyond easily accessible storage of high-throughput cross-omics data) is to provide a set of tools for performing a range of statistical investigations. To demonstrate the intuitive usability of the application, we present typical workflows used for the identification of genes and pathways associated with GBM. The presented use cases are mainly based on data of the GBM study by Sun et al.,³⁶ containing data from 180 microarrays derived from tissues of patients with different brain tumor types and grades, as well as from tissues of nontumor controls. This study has a relatively simple experimental design with a large number of replicates. The major analysis steps of BioMiner are shown in Figure 2, including study overview, detection of differentially expressed genes, identification of pathways, visual pathway inspection, comparison of gene expression with protein expression, and correlation of genes with survival times. These steps are demonstrated in more detail below.

Figure 2

Data mining with BioMiner. Screenshots of different results from data mining with BioMiner including the following: (A) study overview, (B) detection of differentially expressed genes, (C) correlation of gene expression and survival time, (D) identification of significantly enriched pathways, (E) visual pathway inspection based on predefined layouts, and (F) biomolecule comparison of gene and protein expression. Results are typically presented in synchronized, parallel views composed of a table and a plot. The pathway inspection is shown in more detail in Figure 4.

Study Selection and Inspection

In order to access data, an experiment (study) is selected from an overview presenting key properties of all studies in the database (Fig. 2A). To find a study of interest, the table can be filtered using keywords such as “glioma” or biomolecule types such as “genes.” Once a study is selected, different biological and technical properties can be inspected, such as number and grouping of samples, experimental or clinical parameters, or the experimental technologies used.

Differential Analysis

The simplest and most common approach for the detection of tumor-associated biomarkers is a differential analysis based on statistical tests such as the t-test. Differential biomolecule identification can be configured by dragging and dropping the relevant samples. As an example, we compare GBM WHO grade IV samples versus nontumor control samples in the study of Sun et al.³⁶ The comparison is performed on the fly, and the resulting fold changes and P-values are presented in a tabular view and can be displayed in interactive plots (eg, volcano plot: Fig. 2B). The two top upregulated genes between glioma WHO grade IV and nontumor samples are IGFBP2 and CHI3L1, both of which are described as important glioma-related markers in the literature.^37–40 The table with the differential results can be filtered for a specific gene name or description. Searching for the ratio of cursive reveals a 2.6-fold upregulation in GBM WHO grade IV (Log2 ratio of 1.4).
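The relation between the Log2 ratio and the fold change quoted above (2^1.4 ≈ 2.6) can be made explicit with a short sketch. It assumes log2-scale expression values and uses Welch's form of the t statistic; the function name and the sample values are invented, and BioMiner's exact test variant is described in Supplementary File 2 rather than here. Turning the t statistic into a P-value additionally requires the t-distribution, which is omitted to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def differential_stats(case, control):
    """Return (log2 ratio, fold change, Welch t statistic) for two groups.

    case, control: lists of log2-scale expression values of one gene.
    """
    log2_ratio = mean(case) - mean(control)  # difference of log2 means
    fold_change = 2.0 ** log2_ratio          # back-transform to a ratio
    # Standard error without assuming equal variances (Welch).
    standard_error = sqrt(variance(case) / len(case)
                          + variance(control) / len(control))
    t_statistic = log2_ratio / standard_error
    return log2_ratio, fold_change, t_statistic


# Invented log2 expression values for one gene in two sample groups.
tumor_samples = [8.3, 8.5, 8.4, 8.4]
control_samples = [7.0, 7.1, 6.9, 7.0]
log2_ratio, fold_change, t_statistic = differential_stats(tumor_samples,
                                                          control_samples)
print(round(log2_ratio, 2), round(fold_change, 2), round(t_statistic, 1))
```

A volcano plot such as the one in Figure 2B typically places the Log2 ratio on the horizontal axis and the negative logarithm of the P-value on the vertical axis, so strongly and significantly regulated genes appear in the upper corners.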

Correlation Analysis

Correlation is a general and versatile tool to identify associations and predictive relationships between biomolecules or experimental parameters. For example, if a study is annotated with survival times, the analysis of correlated events allows the identification of markers associated with survival. This association may be positive (high gene expression indicates long survival) or negative (high gene expression indicates short survival). BioMiner implements different types of correlation analyses to search for an association between different biomolecules or between biomolecules and clinical parameters. As an example, Figure 2C shows the results of a correlation analysis between genes and survival time based on the study of Freije et al.⁴¹ A ranked table of highly correlated genes is complemented by a scatter plot of the top gene OSBPL11. We included several GBM- and colorectal cancer (CRC)-related studies with annotated survival times (Table 2).
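As a minimal sketch of the gene-versus-survival analysis, the following code computes Pearson correlation coefficients between each gene's expression profile and the survival times, then ranks genes by the absolute value of the coefficient. The function names, gene labels, and numbers are hypothetical; the sketch only illustrates the statistic named in the Methods section and ignores censoring of survival times.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def rank_by_survival_correlation(expression, survival):
    """Rank genes by |r| between expression and survival time, strongest first."""
    scores = {gene: pearson(values, survival)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: -abs(item[1]))


# Invented data: four patients, survival in months, three hypothetical genes.
survival_months = [10, 20, 30, 40]
expression_by_gene = {
    "GENE_POS": [1.0, 2.0, 3.0, 4.0],   # high expression, long survival
    "GENE_NEG": [4.0, 3.0, 2.0, 1.0],   # high expression, short survival
    "GENE_WEAK": [1.0, 3.0, 2.0, 2.0],  # weak association
}
ranking = rank_by_survival_correlation(expression_by_gene, survival_months)
for gene, r in ranking:
    print(gene, round(r, 3))
```

Ranking by the absolute value keeps both positively and negatively associated markers at the top of the table, matching the two kinds of association distinguished in the text.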

Working with Gene Groups

The user can define and store a specific group of biomolecules, eg, by selecting a set of genes with significant P-values or high fold changes. Defining a biomolecule group makes cross-study analyses possible, addressing, for instance, the question: How do the top upregulated genes from the data set of Sun et al.³⁶ behave in other data sets? To this end, a group of biomolecules can be loaded into existing plots. With this approach, the user can easily compare two studies. An example comparing two similar studies is given in Figure 3. This volcano plot visualizes the differential biomolecule comparison for astrocytoma grade III versus GBM grade IV using the data set from Freije et al.⁴¹ Top upregulated genes identified with the data set from Sun et al.³⁶ are highlighted in red (see the “Differential Analysis” section above). The majority of the highlighted genes show very good agreement between both GBM studies. This agreement even increases if the highlighted gene set is created from the comparison of tumor grade III with tumor grade IV in the study by Sun et al.³⁶ (data not shown). The top upregulated gene in both studies is MIR21 (rightmost red point), which has been identified recently as a useful biomarker for GBM.⁴² However, there are also some disagreements between both data sets. In particular, the gene MALAT1 (leftmost point in Fig. 3) is downregulated in the data based on Freije et al.⁴¹ while showing an upregulation in the data set from Sun et al.³⁶

Figure 3

Volcano plot for cross-study comparison. Volcano plot visualizing the differential comparison of astrocytoma grade III versus GBM grade IV using the data set from Freije et al.⁴¹ Top upregulated genes identified with the data set from Sun et al.³⁶ (GBM grade IV vs control) are highlighted to investigate the relation between different studies. The majority of the highlighted genes show a good agreement between both brain tumor experiments.

Beyond comparing closely related studies, this approach can also be used to investigate relations between more heterogeneous studies. By highlighting the top genes from Sun et al.³⁶ in the differential comparison of adenoma versus normal mucosa,⁴³ we detect general and more specific cancer markers. IGFBP2 (already reported above) and TGFBI are upregulated in both tumor types, while the gene PLOD2 shows upregulation only in GBM but not in CRC.⁴⁴
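The cross-study comparison described in this section can be thought of as a lookup of a stored gene group in the differential results of a second study. The sketch below splits a group of genes that were upregulated in one study into those that are also upregulated in the other study, those regulated in the opposite direction, and those not measured there. All names and values are invented placeholders, and the function is a hypothetical illustration rather than part of BioMiner.

```python
def compare_gene_group(group, log2_ratios_other_study):
    """Classify a stored group of upregulated genes by their direction elsewhere.

    group: iterable of gene names that were upregulated in the first study.
    log2_ratios_other_study: dict mapping gene name -> log2 ratio in the
    second study.
    """
    concordant, discordant, missing = [], [], []
    for gene in group:
        ratio = log2_ratios_other_study.get(gene)
        if ratio is None:
            missing.append(gene)      # gene not measured in the second study
        elif ratio > 0:
            concordant.append(gene)   # upregulated in both studies
        else:
            discordant.append(gene)   # not upregulated in the second study
    return concordant, discordant, missing


# Placeholder gene group and invented log2 ratios from a second study.
stored_group = ["GENE_1", "GENE_2", "GENE_3", "GENE_4"]
second_study = {"GENE_1": 2.1, "GENE_2": 0.8, "GENE_3": -1.5}
concordant, discordant, missing = compare_gene_group(stored_group, second_study)
print(concordant, discordant, missing)
```

In the interactive plot, the same information is conveyed visually: highlighted points on the right-hand side of the volcano plot correspond to the concordant genes, and highlighted points on the left-hand side to the discordant ones.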

Functional Characterization and Interactive Pathway Plot

Typically, the result of a differential analysis is not a single gene but rather a set of up- or downregulated genes that may be related in a systemic context (such as a biochemical pathway). The contextual interpretation of a given set of genes can be challenging, especially considering the multitude of different gene functions and involvements in biochemical pathways. BioMiner provides a straightforward way to identify biochemical pathways, Gene Ontology terms, or chromosomal regions significantly related to a given set of genes (Fig. 2E). Pathways from WikiPathways and KEGG can be displayed and interactively inspected based on predefined layouts. Integrating results from differential analyses in the pathway graph allows the investigation of genes and metabolites in their systemic contexts. This is illustrated for the “cell cycle” pathway in Figure 4 using the differential biomolecule identification of GBM tumor grade IV versus control (see the “Differential Analysis” section).

Figure 4

Pathway visualization. Interactive pathway visualization of the cell cycle pathway from WikiPathways repository.
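The identification of significantly related pathways rests on an enrichment test. A common formulation, shown here as a sketch, is a one-sided hypergeometric test per pathway followed by a Benjamini-Hochberg adjustment for multiple testing. This is an assumption for illustration: the procedure actually used in BioMiner follows Bluthgen et al.²⁹ and may differ in its details. All function names, pathway names, and gene identifiers below are invented.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def enrich(selected, pathways, background):
    """Return {pathway: (raw P-value, adjusted P-value)} for a gene selection."""
    background = set(background)
    selected = set(selected) & background
    names = sorted(pathways)
    raw = []
    for name in names:
        members = set(pathways[name]) & background
        overlap = len(selected & members)
        raw.append(hypergeom_upper_tail(overlap, len(members),
                                        len(selected), len(background)))
    return dict(zip(names, zip(raw, benjamini_hochberg(raw))))


# Toy universe of ten placeholder genes and two invented pathways.
background_genes = [f"g{i}" for i in range(1, 11)]
toy_pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4"],
    "pathway_B": ["g5", "g6", "g7", "g8", "g9", "g10"],
}
enrichment = enrich(["g1", "g2", "g5"], toy_pathways, background_genes)
print(enrichment)
```

The choice of the background set matters: it should contain only the genes that could have been selected in the first place, for example all genes measured on the microarray.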

Cross-Omics Analysis

The system internally contains cross-omics mappings between genes and proteins or between genes and metabolites. The complexity of the different cross-omics mappings is hidden from the user. For studies containing cross-omics data, the user can easily correlate different omics data types, eg, by creating a scatter plot comparing differential gene expression with differential protein expression (Fig. 2F).
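To make the idea of such a gene-versus-protein scatter plot concrete, the following sketch pairs differential results from two omics layers through a predefined mapping. It is an illustration only: the function name, the identifiers, and the log2 ratios are hypothetical, and the real mapping in BioMiner is derived from ENSEMBL as described in the “Cross-Omics Mapping” section.

```python
def pair_gene_protein_ratios(gene_ratios, protein_ratios, gene_to_proteins):
    """Build scatter-plot points (gene, protein, gene ratio, protein ratio).

    Only pairs for which both the gene and the protein were measured are kept.
    """
    points = []
    for gene, proteins in gene_to_proteins.items():
        if gene not in gene_ratios:
            continue  # gene not measured in the transcriptomic layer
        for protein in proteins:
            if protein in protein_ratios:
                points.append((gene, protein,
                               gene_ratios[gene], protein_ratios[protein]))
    return points


# Invented log2 ratios and a placeholder gene-to-protein mapping.
gene_log2_ratios = {"GENE_1": 1.2, "GENE_2": -0.7, "GENE_3": 0.3}
protein_log2_ratios = {"PROT_1": 0.9, "PROT_2A": -0.4}
gene_to_proteins = {
    "GENE_1": ["PROT_1"],
    "GENE_2": ["PROT_2A", "PROT_2B"],  # PROT_2B was not measured
    "GENE_4": ["PROT_4"],              # GENE_4 was not measured
}
scatter_points = pair_gene_protein_ratios(gene_log2_ratios,
                                          protein_log2_ratios,
                                          gene_to_proteins)
for point in scatter_points:
    print(point)
```

Each resulting tuple corresponds to one point in the scatter plot, with the gene ratio on one axis and the protein ratio on the other; points far from the diagonal indicate biomolecules whose transcript and protein levels change differently.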

Discussion

In this article, we demonstrate that the Web-based application BioMiner is well suited to perform statistical analyses in cancer-related high-throughput experiments. The studies selected for illustration are related to GBM. All presented analyses can be reproduced using the publicly available instance of the application.

In order to create a tool useful for a broad spectrum of life science users, our major goal was to keep BioMiner's usage as simple as possible while offering substantial flexibility in answering biological questions. Simplicity is achieved by following several different guiding principles. First, technical details are hidden from the user: ready-to-use normalized data are provided together with information on experimental design and clinical parameters. Second, experimental data are complemented by public knowledge on pathways and gene functions. Third, instead of providing an extensive repertoire of data analysis algorithms, we rather focus on a limited set of proven methods. To perform analyses beyond this set, it may be necessary to download result tables or complete study data and run analyses in more advanced statistical frameworks such as R/BioConductor.⁴⁵ Fourth, we model key aspects of the experimental design so that the main questions can be addressed rapidly. For instance, the case-control branches of a study are explicitly represented so that differential analysis can be achieved straightforwardly. More sophisticated analyses deviating from the default design are possible but may result in slower response times.

Of course, easy accessibility comes at a price: performance-optimized queries with response times of a few seconds are available only for predefined groups. All preprocessing steps, notably data normalization, have to be performed before data import and cannot be changed during an analysis. At the current state of development, there is no option for users to import their own data sets. The predefined cross-omics mappings cannot be changed or individualized by the user. They have to be updated by the system administrator on a regular basis. Given the benefit of fluent and straightforward usability, we think these limitations are acceptable.

The true potential of BioMiner is revealed when looking at the huge studies with several hundreds of microarrays. It takes only seconds and a few mouse clicks to compare 165 lung cancer samples with 56 breast cancer samples from the Cancer Cell Line Encyclopedia (CCLE) study comprising >900 microarrays. The interactive result presentation makes it very easy to inspect the results, search for additional information, or identify functional relations.

The system is able to represent personalized data such as high-throughput data (genomics, epigenomics, proteomics, and metabolomics) in combination with clinical parameters. If used in a clinical context, BioMiner is well suited to address the needs of personalized medicine. Interfacing with tumor biobanks or clinical data management systems, the system will enable clinicians to directly evaluate patient-related high-throughput data and draw therapeutic conclusions.

Conclusion and Outlook

BioMiner is a resource that enables rapid identification and assessment of relevant features in complex biological experiments. It provides clinicians and physicians with a platform integrating high-throughput data together with clinical parameters. A straightforward design and ease of use make BioMiner well suited for those who are not experts in bioinformatics and statistics. The application enables experimentalists to directly assess data, validate hypotheses, draw biological conclusions, and plan further experimental procedures. With an increasing number of experiments available for data analysis, the usefulness of BioMiner will continue to grow.

Given its intuitive usability, BioMiner may also serve as a platform for education, with instructors using the application for hands-on demonstration of high-throughput data analysis.

In the future, additional analytical capabilities of BioMiner will augment the use of this platform in the context of personalized medicine even further. Researchers will be able to submit their patients’ molecular profiles to BioMiner and receive information about the patients’ disease status and optimal course of treatment.

Author Contributions

CB, AG, and JSch developed and implemented the application. CB, KS, AG, KG, CS, MO-G, JS, and JSch drafted the manuscript. All authors read and approved the final manuscript.

Supplementary Materials

Supplementary File 1

A tutorial detailing the usage of BioMiner.

Supplementary File 2

BioMiner methods for log-expression values, means and standard deviations, differential analysis with two groups, differential analysis with four groups, multiple testing adjustment and correlation analysis.

Footnotes

Acknowledgments

We thank Sergio Grimbs and Matjaz Hren for helpful discussions.

References

1. Auffray, Chen, Hood. Systems medicine: the future of medical genomics and healthcare. Genome Med. 2009; 1(1): 2.

2. Godman, Finlayson A.E., Cheema P.K., et al. Personalizing health care: feasibility and future implications. BMC Med. 2013; 11: 179.

3. Hamburg M.A., Collins F.S. The path to personalized medicine. N Engl J Med. 2010; 363: 4.

4. Ramasamy, Mondry, Holmes C.C., Altman D.G. Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med. 2008; 5: e184.

5. Barrett, Troup D.B., Wilhite S.E., et al. NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res. 2011; 39: D1005–10.

6. Parkinson, Sarkans, Kolesnikov, et al. ArrayExpress update-an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Res. 2011; 39: D1002–4.

7. Vizcaino J.A., Côté, Reisinger, et al. A guide to the proteomics identifications database proteomics data repository. Proteomics. 2009; 9: 4276–83.

8. Leinonen, Sugawara, Shumway. The sequence read archive. Nucleic Acids Res. 2011; 39: 19–21.

9. Barrett, Troup D.B., Wilhite S.E., et al. NCBI GEO: mining tens of millions of expression profiles - database and tools update. Nucleic Acids Res. 2007; 35: D760–5.

10. Wang, Fabregat, Rios, et al. PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nat Biotechnol. 2012; 30: 135–7.

11. Berthold Michael, Cebron, Dill, et al. KNIME: the Konstanz Information Miner. Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007). Freiburg, Germany: Springer; 2007.

12. Hull, Wolstencroft, Stevens, et al. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006; 34: W729–32.

13. Podpečan, Lavrač, Mozetič, et al. SegMine workflows for semantic microarray data analysis in Orange4WS. BMC Bioinformatics. 2011; 12: 416.

14. Goecks, Nekrutenko, Taylor, Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010; 11: R86.

15. Gruden, Hren, Herman, et al. A “crossomics” study analysing variability of different components in peripheral blood of healthy caucasoid individuals. PLoS One. 2012; 7: e28761.

16. Guo K.T., Juerchott, Fu, et al. Isolation and characterization of bone marrow-derived progenitor cells from malignant gliomas. Anticancer Res. 2012; 32: 4971–82.

17. Schichor, Albrecht, Korte, et al. Mesenchymal stem cells and glioma cells form a structural as well as a functional syncytium in vitro. Exp Neurol. 2012; 234: 208–19.

18. Jürchott, Guo K.T., Catchpole, et al. Comparison of metabolite profiles in U87 glioma cells and mesenchymal stem cells. Biosystems. 2011; 105: 130–9.

19. Schuchhardt, Glintschert, Hartl, et al. BrainProfileDB - a platform for integration of functional genomics data. Proteomics. 2008; 8: 1162–4.

20. Bauer, Glintschert, Schuchhardt. ProfileDB: A resource for proteomics and cross-omics biomarker discovery. Biochim Biophys Acta. 2014; 1844(5): 960–6.

21. van Iersel M.P., Kelder, Pico A.R., et al. Presenting and exploring biological pathways with PathVisio. BMC Bioinformatics. 2008; 9: 399.

22. Kopka, Schauer, Krueger, et al. GMD@CSB.DB: the Golm Metabolome Database. Bioinformatics. 2005; 21: 1635–8.

23. Wishart D.S., Knox, Guo A.C., et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res. 2009; 37: D603–10.

24. Kanehisa, Goto, Furumichi, Tanabe, Hirakawa. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010; 38: D355–60.

25. Huang, Sherman B.T., Lempicki R.A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009; 37: 1–13.

26. Matthews, Gopinath, Gillespie, et al. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2009; 37: D619–22.

27. Pico A.R., Kelder, Iersel M.P., Hanspers, Conklin B.R., Evelo. WikiPathways: pathway editing for the people. PLoS Biol. 2008; 6: e184.

28. Ashburner, Ball C.A., Blake J.A., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000; 25: 25–9.

29. Bluthgen, Brand, Cajavec, Swat, Herzel, Beule. Biological profiling of gene groups utilizing Gene Ontology. Genome Inform. 2005; 16: 106–15.

30. Brazma, Hingamp, Quackenbush, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001; 29: 365–71.

31. Taylor C.F., Paton N.W., Lilley K.S., et al. The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol. 2007; 25: 887–93.

32. Bino R.J., Hall R.D., Fiehn, et al. Potential of metabolomics as a functional genomics tool. Trends Plant Sci. 2004; 9: 418–25.

33. Flicek, Aken B.L., Ballester, et al. Ensembl's 10th year. Nucleic Acids Res. 2010; 38: D557–62.

34. Haider, Ballester, Smedley, Zhang, Rice, Kasprzyk. BioMart Central Portal – unified access to biological data. Nucleic Acids Res. 2009; 37: W23–7.

35. Kamburov, Pentchev, Galicka, Wierling, Lehrach, Herwig. ConsensusPathDB: toward a more complete picture of cell biology. Nucleic Acids Res. 2011; 39: D712–7.

36. Sun, Hui A.M., Su, et al. Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell. 2006; 9: 287–300.

37. , Jiang, Zhang, et al. Elevated serum antibodies against insulin-like growth factor-binding protein-2 allow detecting early-stage cancers: evidences from glioma and colorectal carcinoma studies. Ann Oncol. 2012; 23: 2415–22.

38. Hsieh, Hsieh, Stea, Ellsworth. IGFBP2 promotes glioma tumor stem cell expansion and survival. Biochem Biophys Res Commun. 2010; 397: 367–72.

39. Serao N.V., Delfino K.R., Southey B.R., Beever J.E., Rodriguez-Zas S.L. Cell cycle and aging, morphogenesis, and response to stimuli genes are individualized biomarkers of glioblastoma progression and survival. BMC Med Genomics. 2011; 4: 49.

40. B.M., Lee Y.K., Ryu, et al. CHI3L1 (YKL-40) is expressed in human gliomas and regulates the invasion, growth and survival of glioma cells. Int J Cancer. 2011; 128: 1316–26.

41. Freije W.A., Castro-Vargas F.E., Fang, et al. Gene expression profiling of gliomas strongly predicts survival. Cancer Res. 2004; 64: 6503–10.

42. Ilhan-Mutlu, Wagner, Wöhrer, et al. Plasma MicroRNA-21 concentration may be a useful biomarker in glioblastoma patients. Cancer Invest. 2012; 30: 615–21.

43. Sabates-Bellver, Flier L.G., Palo, et al. Transcriptome profile of human colorectal adenomas. Mol Cancer Res. 2007; 5: 1263–75.

44. Dong, Nutt C.L., Betensky R.A., et al. Histology-based expression profiling yields novel prognostic markers in human glioblastoma. J Neuropathol Exp Neurol. 2005; 64: 948–55.

45. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.

46. Phillips H.S., Kharbanda, Chen, et al. Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell. 2006; 9: 157–73.

47. Wiedemeyer, Brennan, Heffernan T.P., et al. Feedback circuit among INK4 tumor suppressors constrains human glioblastoma development. Cancer Cell. 2008; 13: 355–64.

48. Tsukamoto, Ishikawa, Iida, et al. Clinical significance of osteoprotegerin expression in human colorectal cancer. Clin Cancer Res. 2011; 17: 2444–50.

49.

Critchley-Thorne

R.J.

, Yan

, Nacu

, Weber

, Holmes

S.P.