New heuristics in feature selection for high dimensional data

Abstract

Massive data sets have become common in many applications, making the task of finding an optimal subset of attributes extremely difficult. Traditional feature selection techniques can be very inefficient on high-dimensional data, especially when each candidate subset is evaluated by running a learning algorithm. We describe a method based on the statistical significance of adding a feature from a ranked list to the final subset. To score individual features, we propose a new, simple, and fast criterion based on the projections of data set elements onto each attribute.
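As a rough illustration of the incremental idea only (not the authors' implementation), the sketch below walks a ranked list and keeps a feature when the per-fold scores of the enlarged subset improve significantly on the current subset. Everything named here is an assumption made for the example: the ranking is taken as given (the projection-based criterion is not reproduced), `evaluate` is a hypothetical callback returning one score per cross-validation fold, the test is a one-sided paired t-test, and `t_crit=2.132` is the 5% critical value for 5 folds (4 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic for the per-fold improvement (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)
    if sd == 0:
        # Degenerate case: every fold changed by the same amount.
        return float("inf") if mean(diffs) > 0 else 0.0
    return mean(diffs) / (sd / sqrt(len(diffs)))


def select_from_ranking(ranked, evaluate, t_crit=2.132):
    """Scan features in rank order; keep one only if it helps significantly.

    ranked   -- feature names, best-ranked first (ranking assumed given)
    evaluate -- hypothetical callback: list of features -> per-fold scores
    t_crit   -- assumed one-sided critical value (5 folds, alpha = 0.05)
    """
    selected, best = [], None
    for feature in ranked:
        scores = evaluate(selected + [feature])
        # Assumption: the top-ranked feature is always accepted.
        if best is None or paired_t(best, scores) > t_crit:
            selected.append(feature)
            best = scores
    return selected


# Toy per-fold accuracies, invented purely for this illustration.
TOY_SCORES = {
    ("a",): [0.60, 0.62, 0.61, 0.59, 0.60],
    ("a", "b"): [0.61, 0.62, 0.60, 0.60, 0.60],
    ("a", "c"): [0.70, 0.73, 0.70, 0.69, 0.71],
}


def toy_evaluate(subset):
    """Look up the invented fold scores for a candidate subset."""
    return TOY_SCORES[tuple(subset)]


if __name__ == "__main__":
    print(select_from_ranking(["a", "b", "c"], toy_evaluate))
```

With the toy scores, feature "b" changes accuracy by only a fraction of a point and is skipped, while "c" adds roughly ten points on every fold and is kept. Because each ranked feature is evaluated once, the number of learner runs grows linearly with the length of the ranked list.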

Keywords

Machine learning, preprocessing, feature ranking, feature selection
