Abstract
Gene expression profiles have recently been used for cancer classification. In this work, a multi-class Support Vector Machine (multi-SVM) approach with a novel gene selection method based on Mutual Information (MI) is developed for multi-class classification in cancer diagnosis. The mutual information between each gene and the class label is computed and used to identify the discriminating genes in each category. All genes are ranked by their mutual information value, and the optimal number of genes with the highest values is chosen and fed into the classifier. The multi-SVM classifier constructs a separate classifier for each class, and the combined multi-class classifier assigns a tissue sample to the class with the highest support. The performance of the proposed Multiclass Support Vector Machine (mSVM) with mutual-information-based gene selection is evaluated on four benchmark gene expression datasets for cancer diagnosis: the Leukemia, Lymphoma, NCI60 and GCM datasets. The multi-SVM approach produces the most effective classifier among the methods compared for accurate cancer diagnosis from gene expression data, and it outperforms other popular machine learning algorithms such as k-Nearest Neighbor. The simulation study shows that the proposed approach reduces the dimension of the input features by identifying the most discriminating gene subset for each category and improves predictive accuracy for multi-class cancer classification.
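The ranking and class-assignment steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how mutual information is estimated, so equal-width binning of expression values is assumed here, and the function names, the toy data and the per-class support scores are all hypothetical. Training the per-class SVMs is omitted.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning of one gene's expression values.
    Assumption: the abstract does not specify the MI estimator."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant gene: a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete symbols."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_genes(expression, labels, n_bins=3):
    """Return (MI score, gene index) pairs, highest score first.
    expression: list of samples, each a list of gene values."""
    scores = []
    for g in range(len(expression[0])):
        column = [sample[g] for sample in expression]
        scores.append((mutual_information(discretize(column, n_bins), labels), g))
    scores.sort(key=lambda t: (-t[0], t[1]))  # ties broken by gene index
    return scores


def select_top_genes(expression, labels, k, n_bins=3):
    """Indices of the k genes with the highest mutual information."""
    return [g for _, g in rank_genes(expression, labels, n_bins)[:k]]


def assign_class(support):
    """Pick the class whose per-class classifier gives the highest support."""
    return max(support, key=support.get)


# Hypothetical toy data: 6 tissue samples, 3 genes, 3 classes.
expression = [
    [0.1, 5.0, 0.90],
    [0.2, 5.0, 0.10],
    [1.1, 5.0, 0.80],
    [1.0, 5.0, 0.20],
    [2.0, 5.0, 0.85],
    [2.1, 5.0, 0.15],
]
labels = ["A", "A", "B", "B", "C", "C"]

top_genes = select_top_genes(expression, labels, k=1)
# Hypothetical support values from three per-class classifiers.
predicted = assign_class({"A": -0.4, "B": 1.3, "C": 0.2})
```

In this toy example gene 0 separates the three classes and is ranked first, while the constant gene and the class-independent gene both score zero. In practice the number of retained genes `k` would be tuned, for example by cross-validation, which the sketch does not attempt.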
