Multivariate approach for protein identification based on mass spectrometric data

Abstract

Protein mass spectrometry is a powerful tool for detecting and identifying proteins, and several database-searching algorithms are available for this purpose. However, most of them rely on heuristic approaches, and probability-based or statistical approaches are used only in a very restricted way in current algorithms. In this study, we present a statistical model of the scores based on a generalized linear mixed model and provide a feasible computational method using penalized generalized weighted least squares. The model incorporates the dependency among matches into a new statistical scoring function and uses the beta-binomial distribution to derive the score. In simulation experiments and analyses of real examples, the proposed method improved protein searching performance, and its computational procedures are feasible for very large datasets. In particular, our methods may significantly increase accuracy in identifying medium and small proteins.

Keywords

protein identification, mass spectrometry, generalized linear mixed model, two-part model
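
The abstract states that the score is derived from the beta-binomial distribution. As a rough illustration of that idea only, the sketch below scores a candidate protein by the negative log10 tail probability of seeing at least k matched peptides out of n theoretical peptides under a beta-binomial model. The function names, the shape parameters `a` and `b`, and the tail-probability form of the score are all assumptions made for this example; the sketch does not reproduce the authors' generalized linear mixed model or their penalized generalized weighted least squares fitting.

```python
from math import exp, lgamma, log10


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def betabinom_pmf(k, n, a, b):
    """P(X = k) for X ~ BetaBinomial(n, a, b)."""
    return exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def match_score(k, n, a, b):
    """Hypothetical score: -log10 P(X >= k) under BetaBinomial(n, a, b).

    k: number of matched peptide masses (assumed input)
    n: number of theoretical peptides for the candidate protein
    a, b: illustrative shape parameters; in practice these would be estimated
    """
    tail = sum(betabinom_pmf(j, n, a, b) for j in range(k, n + 1))
    # Clamp to guard against floating-point drift just outside (0, 1].
    tail = min(1.0, max(tail, 1e-300))
    return -log10(tail)


if __name__ == "__main__":
    # Illustrative values only: 12 theoretical peptides, a = 0.8, b = 6.0.
    for k in (0, 3, 8):
        print(k, match_score(k, 12, 0.8, 6.0))
```

Compared with a plain binomial, the beta-binomial lets the per-peptide match probability vary from protein to protein, which is one simple way to allow for overdispersion and dependence among matches. Larger scores indicate match counts that are less likely under the assumed background model.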

