Abstract
Data collection plays an important role in business agility; data can prove valuable and provide insights for important features. However, conventional data collection methods can be costly and time-consuming. This paper proposes a hybrid system R-EDML that combines a sequential feature selection performed by Reinforcement Learning (RL) with the evolutionary feature prioritization of Evolutionary Distance Metric Learning (EDML) in a clustering process. The goal is to reduce the features while maintaining or increasing the accuracy leading to less time complexity and future data collection time and cost reduction. In this method, features represented by the diagonal elements of EDML matrices are prioritized using a differential evolution algorithm. Further, a selection control strategy using RL is learned by sequentially inserting and evaluating the prioritized elements. The outcome offers the best accuracy R-EDML matrix with the least number of elements. Diagonal R-EDML focusing on the diagonal elements is compared with EDML and conventional feature selection. Full Matrix R-EDML focusing on the diagonal and non-diagonal elements is tested and compared with Information-Theoretic Metric Learning. Moreover, R-EDML policy is tested for each EDML generation and across all generations. Results show a significant decrease in the number of features while maintaining or increasing accuracy.
Get full access to this article
View all access options for this article.
