Abstract
Feature selection is a crucial aspect of classification problems, especially in domains such as text classification, where the number of features is usually very large. Recently, a two-stage feature selection method for text classification, which combines class-based and corpus-based feature selection, was introduced. Based on their experiments, the authors of that work concluded which parameter values for the corpus-based and class-based stages yield a feature selection that outperforms traditional methods in text classification. In this paper, we revisit this two-stage feature selection method and, based on several experiments, reach a different conclusion: the parameters suggested in the original work do not necessarily provide the best results. Our experiments show that, by combining the best parameter value for each stage for the specific corpus under study, the two-stage selection method based on coverage policies yields a subset of features that provides a statistically significant improvement in the classifier's success rates over the traditional methods.
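To make the idea of a two-stage (class-based then corpus-based) selection concrete, the following is a minimal, hypothetical sketch. It is not the method evaluated in the paper: the scoring functions (within-class document frequency for the class-based stage, overall document frequency for the corpus-based stage) and the parameter names `k_class` and `k_corpus` are illustrative assumptions chosen for simplicity.

```python
from collections import Counter, defaultdict

def two_stage_select(docs, labels, k_class=2, k_corpus=3):
    """Illustrative two-stage feature selection sketch.

    Stage 1 (class-based): keep the top-k_class terms per class,
    scored here by within-class document frequency (an assumed,
    simplified score; class-based methods often use chi-square
    or information gain instead).
    Stage 2 (corpus-based): rank the union of stage-1 candidates
    by overall document frequency and keep the top k_corpus.
    """
    df_class = defaultdict(Counter)  # per-class document frequency
    df_all = Counter()               # corpus-wide document frequency
    for doc, label in zip(docs, labels):
        terms = set(doc.split())     # count each term once per document
        df_all.update(terms)
        df_class[label].update(terms)

    # Stage 1: class-based candidate set
    candidates = set()
    for counts in df_class.values():
        for term, _ in counts.most_common(k_class):
            candidates.add(term)

    # Stage 2: corpus-based ranking of the candidates
    ranked = sorted(candidates, key=lambda t: -df_all[t])
    return ranked[:k_corpus]
```

The point the sketch illustrates is that each stage has its own parameter (here `k_class` and `k_corpus`), and the final feature subset depends on how the two are combined, which is exactly the tuning question this paper re-examines.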
