Abstract
Massive data sets have become common in many applications, making the task of finding an optimal subset of attributes extremely difficult. Traditional feature selection techniques can be very inefficient on high-dimensional data, especially when each candidate subset is evaluated with a learning algorithm. We describe a method based on the statistical significance of adding a feature from a ranked list to the final subset. To measure the relevance of each individual feature, we propose a new, simple, and fast criterion based on the projections of the data set's elements onto each attribute.
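The abstract does not specify the criterion or the significance test, so the following is only a minimal illustrative sketch built on several assumptions.

- **Projection criterion.** This is assumed to sort the examples by their value on one attribute (their projection onto that axis) and count how often the class label changes along that order. Fewer changes are taken to mean a more relevant attribute.
- **Significance test.** This is assumed to be a paired t-test on per-fold scores returned by a user-supplied `evaluate` callable. The result is compared against a hypothetical threshold `t_critical`.

The names `label_changes`, `rank_features`, `incremental_selection`, `evaluate` and `t_critical` are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Callable, Hashable, List, Optional, Sequence


def label_changes(values: Sequence[float], labels: Sequence[Hashable]) -> int:
    """Count class-label changes after sorting examples by one attribute.

    Assumed criterion: fewer changes means the attribute separates classes
    better. Ties in `values` keep their original order (sorted() is stable).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    return sum(1 for a, b in zip(order, order[1:]) if labels[a] != labels[b])


def rank_features(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> List[int]:
    """Return feature indices ordered from most to least relevant."""
    n_features = len(X[0])
    scores = [label_changes([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j])


def incremental_selection(
    ranking: Sequence[int],
    evaluate: Callable[[List[int]], List[float]],
    t_critical: float = 2.0,  # hypothetical threshold, not from the paper
) -> List[int]:
    """Walk the ranking and keep a feature only if it significantly helps.

    `evaluate(subset)` must return per-fold scores (at least two folds,
    same fold split every call); the test used here is a paired t-test.
    """
    selected: List[int] = []
    best_scores: Optional[List[float]] = None
    for j in ranking:
        candidate = selected + [j]
        scores = evaluate(candidate)
        if best_scores is None:
            # Assumption: the top-ranked feature is accepted unconditionally.
            selected, best_scores = candidate, scores
            continue
        diffs = [new - old for new, old in zip(scores, best_scores)]
        d_mean, d_sd = mean(diffs), stdev(diffs)
        if d_sd == 0:
            significant = d_mean > 0
        else:
            significant = d_mean / (d_sd / sqrt(len(diffs))) > t_critical
        if significant:
            selected, best_scores = candidate, scores
    return selected
```

The ranking step makes one sort per attribute. Under these assumptions, the cost of the learning algorithm falls only on the incremental pass, where each ranked feature triggers a single subset evaluation.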
