Abstract
Feature selection is an essential part of data preprocessing. In text classification, previous feature selection algorithms rarely consider the redundancy between features. This paper focuses on eliminating that redundancy. By modifying the feature correlation formula of the original fast correlation-based filter (FCBF) and updating the algorithm's strategy, we propose a new approach named improved feature size customized fast correlation-based filter (IFSC-FCBF). In addition, we combine IFSC-FCBF with a Naive Bayes (NB) classifier for text classification and test it on four typical text corpus data sets. The results demonstrate that, for the same feature size, IFSC-FCBF achieves higher accuracy and shorter running time than the other methods.
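For orientation, the redundancy elimination performed by the original FCBF can be sketched as follows. This is a minimal illustration of the baseline algorithm only, assuming discrete feature values and the standard symmetrical uncertainty measure SU(X, Y) = 2 IG(X; Y) / (H(X) + H(Y)). It does not reproduce the modified correlation formula or the updated strategy of IFSC-FCBF, which the abstract does not specify. The function names, the `delta` threshold default, and the `max_features` cap are hypothetical choices made for this sketch.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    info_gain = hx + hy - hxy       # mutual information IG(X; Y)
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, labels, delta=0.0, max_features=None):
    """Baseline FCBF sketch.

    features: dict mapping a feature name to its list of discrete values.
    delta: relevance threshold on SU(feature, class).
    max_features: hypothetical cap on the output size (an assumption of
        this sketch, not the strategy used by IFSC-FCBF).
    """
    # Step 1: relevance of each feature to the class.
    relevance = {
        name: symmetrical_uncertainty(col, labels)
        for name, col in features.items()
    }
    candidates = sorted(
        (name for name in relevance if relevance[name] > delta),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Step 2: keep the most relevant remaining feature, then drop every
    # feature that is at least as correlated with it as with the class.
    selected = []
    while candidates:
        if max_features is not None and len(selected) >= max_features:
            break
        best = candidates.pop(0)
        selected.append(best)
        candidates = [
            name for name in candidates
            if symmetrical_uncertainty(features[best], features[name])
            < relevance[name]
        ]
    return selected


# Toy data: "b" duplicates "a" (redundant), "c" is independent of the class.
toy_labels = [0, 0, 1, 1]
toy_features = {
    "a": [0, 0, 1, 1],
    "b": [0, 0, 1, 1],
    "c": [0, 1, 0, 1],
}
print(fcbf(toy_features, toy_labels))  # ['a']
```

In the toy data, feature "b" is discarded as redundant with "a", and "c" is discarded as irrelevant to the class, so only "a" is retained.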
