Gene selection by sequential search wrapper approaches in microarray cancer class prediction

Abstract

In the last years, there has been a large growth in gene expression profiling technologies, which are expected to provide insight into cancer related cellular processes. Machine Learning algorithms, which are extensively applied in many areas of the real world, are not still popular in the Bioinformatics community. We report on the successful application of four well known supervised Machine Learning methods (IB1, Naive-Bayes, C4.5 and CN2) to cancer class prediction problems in three DNA microarray datasets of huge dimensionality (Colon, Leukemia and NCI-60). The essential gene selection process in microarray domains is performed by a sequential search engine, evaluating the goodness of each gene subset by a wrapper approach which executes, by a leave-one-out process, the supervised algorithm to obtain its accuracy estimation. By the use of the gene selection procedure, the accuracy of supervised algorithms is significantly improved and the number of genes of the classification models is notably reduced for all datasets.

Get full access to this article

View all access options for this article.