Abstract
Abstract
Searching for similar structures from a three-dimensional (3-D) structure database of proteins is one of the most important problems in post-genomic computational biology. To compare two structures, we ordinarily use a measure called the root mean square deviation (RMSD) as the similarity measure. We consider a very fundamental problem of finding all the substructures whose RMSDs to the query are within some given threshold, from a 3-D structure database. The problem also appears in many other fields, such as computer vision and robotics. In this article, we propose the first algorithm that runs in faster than linear time on average. Our new algorithm runs in average-case O(m + N/m1−ε), where N is the database size, m is the query length, and ε is an arbitrary small constant such that 0 < ε < 1. It is a significant improvement over previous algorithms on the problem, considering that the best known worst-case time complexity of the problem is O(N log m), and the best known average-case (expected) time complexity of the problem was O(N).
Get full access to this article
View all access options for this article.
