
Abstract

In this article, we propose a new method for computing rare maximal exact matches between multiple sequences. A rare match between k sequences S₁, …, S_k is a string that occurs at most t_i times in the sequence S_i, where the t_i > 0 are user-defined thresholds. First, the suffix tree of one of the sequences (the reference sequence) is built, and each of the other sequences is matched separately against this suffix tree. Second, the resulting pairwise exact matches are combined into multiple exact matches. A careful implementation of this method yields a fast and space-efficient program. This program can be applied to several comparative genomics tasks, such as the identification of synteny blocks between whole genomes.
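
To make the definition above concrete, the following is a minimal brute-force sketch for the pairwise case only (a reference sequence and one other sequence). It is not the suffix-tree method proposed in the article and makes no attempt at speed or space efficiency. The function names, the `min_len` parameter, and the example strings are illustrative assumptions, not taken from the article.

```python
def count_occurrences(text, word):
    """Count occurrences of word in text, including overlapping ones."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def pairwise_mems(ref, query, min_len=1):
    """Return all maximal exact matches as (ref_pos, query_pos, length).

    Brute force: every pair of start positions is tried, so the running
    time is far worse than a suffix-tree approach. Illustration only.
    """
    mems = []
    for i in range(len(ref)):
        for j in range(len(query)):
            if ref[i] != query[j]:
                continue
            # Skip matches that could be extended to the left.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (i + length < len(ref) and j + length < len(query)
                   and ref[i + length] == query[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


def rare_mems(ref, query, t_ref, t_query, min_len=1):
    """Keep only the matches whose string occurs at most t_ref times in
    ref and at most t_query times in query (the rareness thresholds)."""
    result = []
    for i, j, length in pairwise_mems(ref, query, min_len):
        word = ref[i:i + length]
        if (count_occurrences(ref, word) <= t_ref
                and count_occurrences(query, word) <= t_query):
            result.append((i, j, length))
    return result


if __name__ == "__main__":
    # Hypothetical toy sequences; thresholds t_ref = t_query = 1.
    print(rare_mems("ACGTACGT", "TACGA", 1, 1, min_len=3))
```

With both thresholds set to 1, a match is reported only if its string is unique in each sequence; raising a threshold admits matches whose string repeats in that sequence. Combining such pairwise matches into matches across k sequences, the second step described above, is not covered by this sketch.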

Keywords

alignment; algorithms; strings; suffix trees


Space Efficient Computation of Rare Maximal Exact Matches between Multiple Sequences
