Abstract
This paper describes a novel anytime branch-and-bound or best-first threading search algorithm for gapped block protein sequence-structure alignment with general sequence residue pair interactions. The new algorithm (1) returns a good approximate answer quickly, (2) iteratively improves that answer to the global optimum if allowed more time, (3) eventually produces a proof that the final answer found is indeed the global optimum, and (4) always terminates correctly within a bounded number of steps if allowed sufficient space and time. It runs in polynomial space, which is asymptotically dominated by the O(m2ñ2) space required by the lower bound computation. Using previously published data sets and the Bryant-Lawrence (1993) objective function, the algorithm found the true (proven) global optimum in less than 5 min in all search spaces size 1025 or smaller (sequences to 478 residues), and a putative (not guaranteed) optimum in less than 5 hr in all search spaces size 1060 or smaller (sequences to 793 residues, cores to 42 secondary structure segments). The threading in the largest case studied was eventually proven to be globally optimal; the corresponding search speed in that case was the equivalent of 1.5x1056 threadings/sec, a speed-up exceeding 1025 over previously published batch branch-and-bound speeds, and exceeding 1050 over previously published exhaustive search speeds, using the same objective function and threading paradigm. Implementation-independent measures of search efficiency are defined for equivalent branching factor, depth, and probability of success per draw; empirical data on these measures are given. The general approach should apply to other alignment methodologies and search methods that use a divide-and-conquer strategy.
Keywords
Get full access to this article
View all access options for this article.
