Abstract
Multiple sequence alignment is an important research topic in computational biology and is widely used in DNA and protein analysis. On the one hand, as the number and length of the sequences grow in studies of copy-number variants (CNVs) and single nucleotide polymorphisms (SNPs), multiple sequence alignment becomes complex and computationally demanding; on the other hand, alignment accuracy directly affects the results of DNA or protein analysis. In this paper, a novel multiple sequence alignment algorithm based on center star alignment and the MapReduce framework is proposed. The algorithm adopts an improved center star alignment strategy to maintain accuracy, and it exploits MapReduce's data-parallel processing when constructing the center sequence and when matching the maximum common substrings of two sequences. Experimental results show that the proposed algorithm is more accurate than other existing algorithms and aligns multiple sequences relatively quickly.
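The core step of classic center star alignment is picking the center: the sequence whose summed pairwise distance to all others is smallest. The pairwise comparisons are independent, which is why this step maps naturally onto MapReduce. The sketch below simulates that in a single process. It is illustrative only and is not the authors' improved strategy. Three choices are assumptions made for the example: Levenshtein edit distance stands in for the pairwise alignment score, `map_pair` plays the role of a mapper emitting (sequence index, distance) records, and `reduce_by_key` plays the role of a reducer summing them. The paper's maximum-substring matching is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via single-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def map_pair(pair):
    """Hypothetical mapper: score one pair, credit the distance to both members."""
    (i, a), (j, b) = pair
    d = edit_distance(a, b)
    return [(i, d), (j, d)]


def reduce_by_key(records):
    """Hypothetical reducer: sum the distances emitted for each sequence index."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def choose_center(seqs):
    """Return the index of the sequence with the minimum total distance to the rest."""
    if len(seqs) == 1:
        return 0
    pairs = combinations(enumerate(seqs), 2)
    emitted = [kv for pair in pairs for kv in map_pair(pair)]
    totals = reduce_by_key(emitted)
    # Break ties by the lower index so the result is deterministic.
    return min(totals, key=lambda k: (totals[k], k))


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "ACGG", "TTTT"]
    print(choose_center(seqs))
```

For the four toy sequences, the summed distances are 5, 6, 6 and 11, so index 0 (`ACGT`) is chosen as the center. In a real MapReduce job the list comprehension would become the map phase distributed over pairs, and the dictionary accumulation would become a shuffle-and-reduce keyed by sequence index.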
