Abstract
Abstract
Pattern recognition techniques suffer from a well-known curse, the dimensionality problem. The microarray data classification problem is a classical complex pattern recognition problem. Selecting relevant genes from microarray data poses a formidable challenge to researchers due to the high-dimensionality of features, multiclass categories being involved, and the usually small sample size. The goal of feature (gene) selection is to select those subsets of differentially expressed genes that are potentially relevant for distinguishing the sample classes. In this paper, information gain and chaotic genetic algorithm are proposed for the selection of relevant genes, and a K-nearest neighbor with the leave-one-out crossvalidation method serves as a classifier. The chaotic genetic algorithm is modified by using the chaotic mutation operator to increase the population diversity. The enhanced population diversity expands the GA's search ability. The proposed approach is tested on 10 microarray data sets from the literature. The experimental results show that the proposed method not only effectively reduced the number of gene expression levels, but also achieved lower classification error rates than other methods.
Get full access to this article
View all access options for this article.
