Abstract
The challenge of similarity search in massive DNA sequence databases has inspired major
changes in BLAST-style alignment tools, which accelerate search by inspecting only pairs
of sequences sharing a common short "seed," or pattern of matching residues. Some of
these changes raise the possibility of improving search performance by probing sequence
pairs with several distinct seeds, any one of which is sufficient for a seed match. However,
designing a set of seeds to maximize their combined sensitivity to biologically meaningful
sequence alignments is computationally difficult, even given recent advances in designing
single seeds. This work describes algorithmic improvements to seed design that address the
problem of designing a set of
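The sketch below illustrates what a seed match and the combined sensitivity of a seed set mean. It assumes a simplified model that is not necessarily the one used in this work. An ungapped alignment is written as a string of `1` (match) and `0` (mismatch), and each position matches independently with probability `p`. A spaced seed uses `1` for a required match and `*` for a don't-care position. The function names `seed_hits`, `set_hits`, and `sensitivity` are hypothetical. Sensitivity is computed exactly by enumerating every alignment of a short length.

```python
from itertools import product


def seed_hits(seed: str, alignment: str) -> bool:
    """Return True if the spaced seed matches at some offset of the alignment.

    Assumed notation (illustrative): in `seed`, '1' = position must match,
    '*' = don't care; in `alignment`, '1' = match, '0' = mismatch.
    """
    span = len(seed)
    required = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(alignment) - span + 1):
        if all(alignment[start + i] == "1" for i in required):
            return True
    return False


def set_hits(seeds: list[str], alignment: str) -> bool:
    """A seed set detects the alignment if ANY one of its seeds hits."""
    return any(seed_hits(s, alignment) for s in seeds)


def sensitivity(seeds: list[str], length: int, p: float) -> float:
    """Exact probability that the seed set hits a random ungapped alignment.

    Assumes each of the `length` positions is a match independently with
    probability p. Enumerates all 2**length alignments, so keep length small.
    """
    total = 0.0
    for bits in product("01", repeat=length):
        alignment = "".join(bits)
        if set_hits(seeds, alignment):
            k = alignment.count("1")
            total += p**k * (1 - p) ** (length - k)
    return total


if __name__ == "__main__":
    one = sensitivity(["11*1"], 10, 0.7)
    two = sensitivity(["11*1", "1**11"], 10, 0.7)
    print(f"single seed: {one:.4f}, two seeds: {two:.4f}")
```

Adding a seed can only raise sensitivity, because any one seed is enough for a hit. The gain depends on how much the seeds' hit sets overlap, which is what makes choosing a good set hard. Exhaustive enumeration grows as 2^length and is only for illustration. Practical seed-design tools typically compute sensitivity by dynamic programming over alignment prefixes.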
