Metric-Space Indexes as a Basis for Scalable Biological Databases

Abstract

Biochemical databases will be best served by the development of new specialized database management systems whose storage managers are based on metric-space indexing techniques, and by the development of database query languages that embody semantics derived from biochemical models of similarity and evolution. Important biochemical data types cannot be effectively mapped to the low-dimensional coordinate systems on which O(log n) indexing methods rely. It is clear from an abundance of bioinformatic discoveries that biochemical data is not random and exhibits interesting structure with respect to clustering. Metric-space indexing exploits a data set's intrinsic clustering to speed the execution of similarity queries, even when the data cannot be mapped to a coordinate system. Database management systems that seamlessly integrate semantically rich query languages with a metric-space storage and retrieval mechanism will allow biologists to develop, simply and concisely, informatic studies that have traditionally been large and labor-intensive.
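To make the idea of metric-space indexing concrete, the following is a minimal sketch of one well-known technique, pivot-based filtering with the triangle inequality. It is an illustration only and is not taken from the article: the class name `PivotIndex`, the function `edit_distance`, and the choice of Levenshtein distance as a stand-in for a biochemical similarity model are all assumptions made for this example. The index needs only a distance function that satisfies the metric axioms; it never maps sequences to coordinates.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a true metric on strings (assumed stand-in
    for a biochemical similarity model; not from the article)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


class PivotIndex:
    """Hypothetical pivot-table index over an arbitrary metric space."""

    def __init__(self, items: Sequence[str], pivots: Sequence[str],
                 dist: Callable[[str, str], int]) -> None:
        self.items = list(items)
        self.pivots = list(pivots)
        self.dist = dist
        # Precompute the distance from every item to every pivot.
        self.table = [[dist(x, p) for p in self.pivots] for x in self.items]

    def range_query(self, query: str, radius: int) -> Tuple[List[str], int]:
        """Return (items within radius of query, number of items whose
        distance to the query actually had to be computed)."""
        q_to_pivots = [self.dist(query, p) for p in self.pivots]
        hits: List[str] = []
        evaluated = 0
        for item, row in zip(self.items, self.table):
            # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x).
            # If that lower bound exceeds the radius for any pivot,
            # the item cannot match and is skipped without a comparison.
            if any(abs(qp - xp) > radius for qp, xp in zip(q_to_pivots, row)):
                continue
            evaluated += 1
            if self.dist(query, item) <= radius:
                hits.append(item)
        return hits, evaluated
```

When the data are clustered and the pivots are well chosen, most items in distant clusters are rejected by the precomputed table alone, so the expensive distance function runs on only a fraction of the data set. Calling `PivotIndex(items, pivots, edit_distance).range_query(query, radius)` returns the same matches as a full linear scan.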
