Sage Journals

Abstract

This paper extends a deterministic algorithm [1, 2] that was originally designed to measure the degree of similarity between DNA sequences, or more generally between any sequences over an alphabet of cardinality 4. Here, a modified and extended version that handles sequences over alphabets of cardinality greater than 4 is presented. This extension broadens the algorithm's range of applications. As a test case, we search for peptides within a protein database. Computational results on real data and a comparison with BLAST are discussed.
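The abstract does not detail how the algorithm is generalized beyond cardinality 4, although the keywords point to a numeration system. As a hedged illustration only, the sketch below assumes one such view: each symbol of an alphabet of cardinality k is read as a base-k digit, so any window of m symbols maps to a unique integer, and exact peptide occurrences are found with a Rabin-Karp-style rolling update. This is a swapped-in, well-known technique, not the authors' algorithm; the names `AMINO_ACIDS`, `encode`, `find_peptide` and `search_database` are hypothetical, and unlike BLAST the sketch reports exact matches only.

```python
# Hypothetical sketch, NOT the authors' algorithm. Each symbol of an alphabet of
# cardinality k is treated as a digit in base k, so every window of m symbols maps
# to a unique integer in [0, k**m). Exact peptide search then reduces to comparing
# integers, with a Rabin-Karp-style rolling update between consecutive windows.

# Assumed alphabet: the 20 standard amino acids (cardinality 20 > 4).
# Symbols outside the alphabet (e.g. 'X') raise KeyError.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(seq: str, alphabet: str = AMINO_ACIDS) -> int:
    """Read seq as a base-k number, most significant symbol first."""
    k = len(alphabet)
    digit = {s: i for i, s in enumerate(alphabet)}
    value = 0
    for s in seq:
        value = value * k + digit[s]
    return value


def find_peptide(peptide: str, protein: str, alphabet: str = AMINO_ACIDS) -> list[int]:
    """Return 0-based start positions where peptide occurs exactly in protein."""
    k, m = len(alphabet), len(peptide)
    if m == 0 or m > len(protein):
        return []
    digit = {s: i for i, s in enumerate(alphabet)}
    target = encode(peptide, alphabet)
    high = k ** (m - 1)  # weight of the leftmost symbol in a window
    window = encode(protein[:m], alphabet)
    hits = [0] if window == target else []
    for i in range(m, len(protein)):
        # Drop the leftmost symbol, shift one base-k place, append the new symbol.
        window = (window - digit[protein[i - m]] * high) * k + digit[protein[i]]
        if window == target:
            hits.append(i - m + 1)
    return hits


def search_database(peptide: str, database: dict[str, str],
                    alphabet: str = AMINO_ACIDS) -> dict[str, list[int]]:
    """Map each protein identifier to the positions where peptide occurs (hits only)."""
    return {pid: pos for pid, seq in database.items()
            if (pos := find_peptide(peptide, seq, alphabet))}
```

For example, `search_database("AYIA", {"P1": "MKTAYIAKQR", "P2": "GAYIAKAYIA"})` returns `{'P1': [3], 'P2': [1, 6]}`, and passing `alphabet="ACGT"` recovers the cardinality-4 (DNA) case. Because Python integers are unbounded, window values are compared exactly; a practical implementation for long windows would more likely reduce them modulo a prime and verify candidate hits.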

Keywords

BLAST, Deterministic Algorithm, Alpha-Numeric Sequence, Numeration System, Database

References

1. Kheniche, Harrison, Dowden J.M., and Salhi (2008), “A Deterministic Algorithm for DNA Sequence Comparison”, Proc. BIOCOMP'08, Arabnia H.R., Yang M.Q., and Kyng (Editors), Vol. II, pp. 848–854.

2. Kheniche, Salhi, Harrison, and Dowden (2010), “A Deterministic DNA Database Search”, in Advances in Computational Biology, Arabnia H.R. (Editor), Springer, Chapter 42, pp. 371–378.

3. Camacho H.H., and Salhi (2006), “A Redundancy Detection Approach to Mining Bioinformatics Data”, in Computer Aided Methods in Optimal Design and Operations, Bogle I.D.L. and Zilinskas (Editors), Vol. 7, pp. 89–98.

4. Camacho and Salhi (2006), “A String Metric Based on a One-to-One Greedy Matching Algorithm”, in Research in Computing Science: Advances in Computer Science and Engineering, Gelbukh, Torres, and Lopez (Editors), Vol. 19, pp. 171–182.

5. Orengo C.A., and Jones D.T. (2003), “Bioinformatics: Genes, Proteins & Computers”, BIOS Scientific Publishers Ltd.

6. Xiong (2006), “Essential Bioinformatics”, Cambridge University Press.

7. Ewens W.J., Gregory, and Grant C.A. (2005), “Statistical Methods in Bioinformatics”, Springer, 2nd Ed.

8. Baxevanis A.D., and Ouellette B.F. (2005), “Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins”, John Wiley & Sons, Inc., 3rd Ed.

9. Birkhoff and MacLane (1977), “A Survey of Modern Algebra”, Macmillan Publishing Co., 4th Ed.

10. Westhead D.R., Parish J.H., and Twyman R.M. (2002), “Bioinformatics”, BIOS Scientific Publishers Ltd.

11. Krawetz S.A., and Womble D.D. (2003), “Introduction to Bioinformatics: A Theoretical and Practical Approach”, Humana Press Inc.

12. http://blast.ncbi.nlm.nih.gov/blastoverview.shtml

13. http://www.ebi.ac.uk/fasta/

14. http://www.clcbio.com/index.php?id=995

15. http://www.mathworks.com/

16. http://bioinfolab.unl.edu/emlab/documents/blastreadme/README.bls.html

17. Altschul S.F., Madden T.L., Schäffer A.A., Zhang, Miller, and Lipman D.J. (1997), “Gapped BLAST and PSI-BLAST: A new generation of protein database search programs”, Nucleic Acids Research, 25:3389–3402.

18. Altschul S.F., Gish, Miller, Myers E.W., and Lipman D.J. (1990), “Basic local alignment search tool”, J. Mol. Biol., 215:403–410.

19. Nesvizhkii A.I., and Aebersold (2005), “Interpretation of Shotgun Proteomics Data: The protein inference problem”, Molecular and Cellular Proteomics, 4:1419–1440.

20. Domon and Aebersold (2006), “Mass Spectrometry and Protein Analysis”, Science, 312(5771):212–217.

21. Lam and Aebersold (2010), “Using Spectral Libraries for Peptide Identification from Tandem Mass Spectrometry (MS/MS) Data”, http://www.currentprotocols.com/protocol/ps2505

A Deterministic Algorithm for Alpha-Numeric Sequence Comparison with Application to Protein Sequence Detection
