Support Vector Machines in HTS Data Mining: Type I MetAPs Inhibition Study

Abstract

This article reports a successful application of support vector machines (SVMs) to mining high-throughput screening (HTS) data from an inhibition study of type I methionine aminopeptidases (MetAPs). A library of 43,736 small organic molecules was used in the study, and the 1,355 compounds in the library with 40% or higher inhibition activity were considered active. The data set was randomly split into a training set and a test set (3:1 ratio). The authors ranked the compounds in the test set by the decision values predicted by SVM models built on the training set. To measure the performance of the models, they defined a novel score, PT₅₀: the percentage of the test set that must be screened to recover 50% of the actives. With carefully selected parameters, SVM models increased the hit rates significantly, and 50% of the active compounds could be recovered by screening just 7% of the test set. The authors found that the size of the training set played a significant role in the performance of the models. A training set of 10,000 compounds is likely the minimum size required to build a model with reasonable predictive power.
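The PT₅₀ score described above can be illustrated with a short sketch. This is not the authors' implementation: the function name `pt_score`, its parameters, and the toy decision values and activity labels are all assumptions made for illustration. It assumes only that compounds are ranked by descending decision value and screened from the top of the ranking.

```python
import math


def pt_score(decision_values, is_active, recovery=0.5):
    """Percentage of the ranked test set that must be screened to
    recover the given fraction of actives (PT50 when recovery=0.5).

    Hypothetical helper: name and signature are not from the article.
    """
    if len(decision_values) != len(is_active):
        raise ValueError("decision_values and is_active must have equal length")
    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("the test set contains no actives")

    # Number of actives that must be found to reach the recovery target.
    needed = max(1, math.ceil(recovery * n_actives))

    # Rank compounds from the highest to the lowest decision value.
    ranked = sorted(zip(decision_values, is_active),
                    key=lambda pair: pair[0], reverse=True)

    found = 0
    for screened, (_, active) in enumerate(ranked, start=1):
        if active:
            found += 1
        if found >= needed:
            return 100.0 * screened / len(ranked)


# Toy data (made up): 10 compounds, 4 of them active.
scores = [2.1, 1.7, 0.9, 0.4, -0.2, -0.8, -1.1, -1.5, -2.0, -2.4]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
pt50 = pt_score(scores, labels)
print(pt50)  # 2 of the 4 actives are found after 3 of 10 compounds: 30.0
```

A ranking no better than random would be expected to need roughly 50% of the test set to recover 50% of the actives, so lower values of the score indicate stronger enrichment.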

Keywords

support vector machines, high-throughput screening, MetAP, machine learning
