We present a fast method for calculating the Hamming distance vector, based on a simple preprocessing of the target text. For applications to protein sequences, with an alphabet of 20 or more symbols, the proposed method is an order of magnitude faster than the brute-force approach while being much simpler than previously published methods.
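The abstract does not spell out the preprocessing step, so the following is only a minimal sketch of one plausible scheme, not necessarily the authors' method. It assumes the text is preprocessed into a list of positions for each symbol; matches are then counted per alignment, and the Hamming distance is the pattern length minus that count. All function names here (`build_position_index`, `hamming_vector_indexed`, `hamming_vector_brute`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def hamming_vector_brute(text, pattern):
    """Brute-force baseline: compare the pattern at every alignment."""
    n, m = len(text), len(pattern)
    return [
        sum(1 for j in range(m) if text[i + j] != pattern[j])
        for i in range(n - m + 1)
    ]


def build_position_index(text):
    """Assumed preprocessing: map each symbol to the positions where it occurs."""
    index = defaultdict(list)
    for i, symbol in enumerate(text):
        index[symbol].append(i)
    return index


def hamming_vector_indexed(index, n, pattern):
    """Count matches per alignment using the index, then convert to distances."""
    m = len(pattern)
    if m > n:
        return []
    matches = [0] * (n - m + 1)
    for j, symbol in enumerate(pattern):
        # Only text positions holding the same symbol can contribute a match.
        for i in index.get(symbol, ()):
            start = i - j  # alignment at which text[i] faces pattern[j]
            if 0 <= start <= n - m:
                matches[start] += 1
    return [m - count for count in matches]


text = "ACDEFACDKF"
pattern = "ACDEF"
index = build_position_index(text)
print(hamming_vector_indexed(index, len(text), pattern))
```

Under this assumed scheme, each pattern symbol visits only the text positions that hold the same symbol, so on a roughly uniform 20-letter alphabet the inner loop touches about one twentieth of the text. That is consistent with the order-of-magnitude speedup over brute force stated above, though the actual gain depends on the symbol distribution.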