Abstract
Customer churn analysis is a persistent challenge for airlines. The difficulty lies in the imbalanced class distribution of customer churn data and in interference from noisy data. Although some existing sampling techniques and ensemble models handle the class imbalance problem well, noisy examples in the dataset seriously degrade both the sampling quality and the predictive accuracy of classifiers. The purpose of our work is therefore to address noise interference in imbalanced data classification and to improve the performance of the ensemble classifier. In this paper, we propose a novel noise filtering algorithm that combines Tomek-link with distance-weighted KNN (TWK), which can effectively filter noise from both the minority and majority classes of an imbalanced dataset and prevent relatively valuable samples from being rejected by mistake. We integrate TWK and feature sampling into EasyEnsemble to obtain a new ensemble model, named FSEE-TWK for short, for customer churn analysis. The introduction of feature sampling in FSEE-TWK accelerates training and helps avoid model over-fitting. We obtained imbalanced customer data from a major Chinese airline to predict potential churn customers, and we use F-Measure and G-Mean to evaluate the performance of the new ensemble model. The experimental results show that the proposed model can effectively improve classification performance on the datasets and significantly reduce the training time of the model.
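For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how F-Measure and G-Mean are commonly computed for a binary task in which the churn class is the positive (minority) class. It uses the standard textbook definitions (F-Measure as the harmonic mean of precision and recall, G-Mean as the geometric mean of sensitivity and specificity); the function names and the toy labels are illustrative assumptions, not taken from the paper, and the paper may use a weighted variant of F-Measure.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the churn (minority) label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall (the balanced F1 form)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on churners) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy imbalanced example (hypothetical labels): 2 churners among 10 customers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
f_score = f_measure(y_true, y_pred)
g_score = g_mean(y_true, y_pred)
```

Both metrics reward correct predictions on the minority class, which is why they are preferred over plain accuracy on imbalanced data: a classifier that predicts "no churn" for every customer in the toy example would reach 80% accuracy but score zero on both.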
