Abstract
Protein mass spectrometry is a powerful tool for detecting and identifying proteins, and several database searching algorithms are available for this purpose. However, most of them rely on heuristic approaches, and probability-based or statistical approaches have seen only limited use in current algorithms. In this study, we present a statistical model of match scores based on a generalized linear mixed model and provide a feasible computation method using penalized generalized weighted least squares. The model incorporates the dependency among matches into a new statistical scoring function and uses the beta-binomial distribution to derive the score. In simulation experiments and analyses of real examples, our methods improved protein searching performance, and the computation procedures remained feasible for very large datasets. In particular, our methods may significantly increase accuracy in identifying medium and small proteins.
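To illustrate how a beta-binomial distribution can yield a match score, the following is a minimal sketch and not the scoring function of this study, whose exact form is not given in the abstract. It assumes that `k` of `n` theoretical peptide masses match observed peaks and that the per-peak match probability follows a Beta(`alpha`, `beta`) distribution, which allows for overdispersion relative to a plain binomial. The function names, the parameters, and the choice of a negative log10 upper-tail probability as the score are all hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(X = k) for X ~ BetaBinomial(n, alpha, beta)."""
    if not 0 <= k <= n:
        return 0.0
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return math.exp(log_choose
                    + log_beta(k + alpha, n - k + beta)
                    - log_beta(alpha, beta))


def match_score(k, n, alpha, beta):
    """Hypothetical score: -log10 of P(X >= k) under the beta-binomial.

    A larger score means that k or more matches would be less likely to
    arise by chance under the assumed null parameters alpha and beta.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    tail = sum(beta_binomial_pmf(j, n, alpha, beta) for j in range(k, n + 1))
    # Clamp against rounding error so the score is never negative.
    tail = min(tail, 1.0)
    return -math.log10(tail)
```

For fixed `n`, `alpha`, and `beta`, the score is non-decreasing in `k`, so a candidate protein with more matched masses never scores lower than one with fewer. Modelling the dependency among matches, as the study describes, would require the mixed-model machinery that this sketch leaves out.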
