
Abstract

Bioinformatics analyses frequently yield results in the form of lists of genes sorted by, for example, sequence similarity to a query sequence or degree of differential expression of a gene upon a change of cellular condition. Comparison of such results may depend strongly on the particular scoring system throughout the entire list, although the crucial information resides in which genes are ranked at the top of the list. Here, we propose to reduce the lists to the mere ranking of the genes and to compare only the ranked lists. To this end, we introduce a measure of similarity between ranked lists. Our measure puts particular emphasis on finding the same items near the top of the list, while the genes further down should not have a strong influence. Our approach can be understood as a special version of a two-dimensional Kolmogorov-Smirnov statistic. We present a dynamic programming algorithm for its computation and study the distribution of the similarity values. The performance on simulated and on real biological data is studied in comparison to other available measures. Supplementary Material is available online (www.liebertonline.com/cmb).
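To make the idea of a top-weighted, two-dimensional Kolmogorov-Smirnov-style comparison concrete, here is a minimal Python sketch. It is not the R2KS definition from the article, which the abstract does not spell out. The function name `top_weighted_ks`, the per-item weight `1/rank_a + 1/rank_b`, the independence baseline, and the normalization are all assumptions made for illustration. The sketch accumulates the weighted mass of items found in both the top `i` of one list and the top `j` of the other, one row at a time, and reports the largest excess over what two independent random rankings would give on average.

```python
from typing import Hashable, Sequence


def top_weighted_ks(list_a: Sequence[Hashable], list_b: Sequence[Hashable]) -> float:
    """Illustrative 2D KS-style similarity between two rankings of the same items.

    Assumption (not taken from the article): an item at rank r_a in list_a and
    r_b in list_b carries weight 1/r_a + 1/r_b, so shared top items dominate.
    """
    n = len(list_a)
    if n == 0 or len(set(list_a)) != n or len(list_b) != n or set(list_a) != set(list_b):
        raise ValueError("lists must be non-empty rankings of the same distinct items")

    # 1-based rank of every item in the second list.
    rank_b = {item: j for j, item in enumerate(list_b, start=1)}

    # harmonic[k] = 1 + 1/2 + ... + 1/k, used for the expected mass and the total.
    harmonic = [0.0] * (n + 1)
    for k in range(1, n + 1):
        harmonic[k] = harmonic[k - 1] + 1.0 / k
    total = 2.0 * harmonic[n]  # sum of all item weights

    # row[j] holds S(i, j): weighted mass of items in top-i of A and top-j of B.
    row = [0.0] * (n + 1)
    best = 0.0
    for i, item in enumerate(list_a, start=1):
        b = rank_b[item]
        weight = 1.0 / i + 1.0 / b
        # Recurrence: S(i, j) = S(i-1, j) + weight if rank_b(item) <= j.
        for j in range(b, n + 1):
            row[j] += weight
        for j in range(1, n + 1):
            # Expected mass if list_b were a uniformly random permutation.
            expected = (j * harmonic[i] + i * harmonic[j]) / n
            best = max(best, (row[j] - expected) / total)
    return best


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5"]
    print(top_weighted_ks(genes, genes))        # identical rankings
    print(top_weighted_ks(genes, genes[::-1]))  # reversed rankings
```

The double loop costs O(n²) time and O(n) memory. Under these assumed weights, identical rankings score well above zero, while a fully reversed ranking scores approximately zero because no corner rectangle holds more mass than chance.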




R2KS: A Novel Measure for Comparing Gene Expression Based on Ranked Gene Lists
