Abstract
Directly using original high-dimensional data as input to machine learning leads to the curse of dimensionality, reduced generalization ability, and even misleading conclusions. Feature engineering, which can effectively reduce feature size and data dimensionality, is therefore a core task in data mining. Feature selection offers strong interpretability and low computational expense but cannot explore deep information, whereas feature extraction can capture deep, complex information but incurs high computational cost and poor interpretability. To integrate the advantages of the two feature engineering techniques, this paper proposes a novel method based on feature subset selection and multi-feature extraction. The proposed method first performs feature selection with an improved binary nutcracker optimization algorithm to generate initial feature subsets. These preliminarily dimension-reduced subsets are then passed through dynamic convolution for feature extraction, yielding optimal feature subsets. As an independent component, the feature selection stage is compared with five high-performing metaheuristic wrapper-based methods and five widely used filter-based methods. The complete method, which incorporates dynamic convolution for feature extraction, is compared with the proposed method without feature extraction, the proposed method without feature selection, and six other effective feature dimensionality reduction methods. All these methods are experimentally analyzed and comparatively evaluated on twenty datasets of various sizes. The results demonstrate the superior performance of the proposed method over similar techniques.
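The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: a random binary search stands in for the paper's improved binary nutcracker optimizer, a fixed 1-D convolution kernel stands in for dynamic convolution, and all data shapes, the fitness function, and the penalty weight are assumptions, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 12 features (sizes are illustrative).
X = rng.normal(size=(100, 12))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

def fitness(mask):
    # Simple wrapper fitness: correlation of the masked features' mean
    # with the label, minus a penalty on subset size. A stand-in for the
    # classifier-accuracy fitness a wrapper method would normally use.
    if mask.sum() == 0:
        return -np.inf
    score = np.abs(np.corrcoef(X[:, mask.astype(bool)].mean(axis=1), y)[0, 1])
    return score - 0.01 * mask.sum()

# Stage 1: feature subset selection via random binary search
# (placeholder for the improved binary nutcracker optimization algorithm).
best_mask, best_fit = np.ones(X.shape[1], dtype=int), -np.inf
for _ in range(200):
    mask = rng.integers(0, 2, size=X.shape[1])
    f = fitness(mask)
    if f > best_fit:
        best_mask, best_fit = mask, f

X_sel = X[:, best_mask.astype(bool)]  # initial feature subset

# Stage 2: feature extraction by convolving over the selected features
# (a fixed smoothing kernel here, standing in for dynamic convolution).
kernel = np.array([0.25, 0.5, 0.25])
X_ext = np.stack([np.convolve(row, kernel, mode="valid") for row in X_sel])

print(X.shape, X_sel.shape, X_ext.shape)
```

The key point the sketch captures is the ordering: the cheap, interpretable selection stage shrinks the feature set first, so the more expensive extraction stage operates on an already-reduced input.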
