Abstract
Background: Many computational approaches predict protein–protein interactions using support vector machines (SVMs) and report high performance. However, the performance of currently reported methods is significantly overestimated and is affected by object repetitiveness in the datasets used.
Objective: To study the effect of object repetitiveness in datasets on prediction results.
Method: We present novel methods that use graph maximum matching on protein–protein interaction datasets to construct positive datasets with or without repeating proteins. A corresponding series of negative datasets with different protein repetitiveness is constructed using the graph adjacency matrix. We analyze the relationship between the SVM prediction results and the repeated proteins (repeat numbers and repeat rates), as well as the distributions of repeated proteins in the datasets.
Results: Protein repetitiveness in the positive and negative datasets affects the prediction results: high protein repetitiveness in either the positive or the negative dataset yields high prediction performance.
Conclusion: These results indicate that dealing with object repetitiveness in datasets is a key issue in predicting protein–protein interactions with SVMs, since real-world data contain a certain degree of repeated proteins.
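The dataset construction described in the Method can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes interactions are given as pairs of protein identifiers, and it uses a greedy maximal matching as a simple stand-in for a true maximum matching (which would need, for example, a blossom-type algorithm). The function names, the toy data, and the definition of repeat rate used here (the fraction of distinct proteins that occur in more than one pair) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def greedy_matching(pairs):
    """Select pairs so that no protein appears twice.

    Greedy maximal matching: a stand-in for maximum matching, so the
    result is repeat-free but not guaranteed to be the largest possible.
    """
    used = set()
    matched = []
    for a, b in pairs:
        if a != b and a not in used and b not in used:
            matched.append((a, b))
            used.update((a, b))
    return matched


def non_interacting_pairs(pairs):
    """List protein pairs whose adjacency-matrix entry would be 0.

    These are candidate negatives: pairs of known proteins with no
    recorded interaction between them.
    """
    proteins = sorted({p for pair in pairs for p in pair})
    known = {frozenset(pair) for pair in pairs}
    return [
        (a, b)
        for a, b in combinations(proteins, 2)
        if frozenset((a, b)) not in known
    ]


def repeat_rate(pairs):
    """Fraction of distinct proteins that occur in more than one pair
    (an assumed definition, for illustration only)."""
    counts = Counter(p for pair in pairs for p in set(pair))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c > 1) / len(counts)


# Hypothetical toy interaction list.
positives = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("E", "F")]

positives_no_repeat = greedy_matching(positives)
negatives = non_interacting_pairs(positives)
negatives_no_repeat = greedy_matching(negatives)

print(repeat_rate(positives), repeat_rate(positives_no_repeat))
print(len(negatives), negatives_no_repeat)
```

Applying the matching step to the candidate negatives gives a negative set without repeated proteins, while sampling directly from the candidates gives sets with higher repetitiveness; the two can then be compared under the same classifier.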