Abstract
Rule learning extracts knowledge from a dataset and represents it in a form that is easy for people to understand. RIPPER (Repeated Incremental Pruning to Produce Error Reduction) and PART (Partial Decision Trees) are two well-known rule-learning schemes. However, owing to RIPPER's overpruning and PART's skew sensitivity, it is difficult to use either method to learn from imbalanced datasets. To bypass these difficulties, we propose a K-L divergence-based PART (KLPART) that uses Kullback-Leibler (K-L) divergence as the splitting criterion when building partial decision trees. An experimental study is carried out over a wide range of imbalanced datasets, comparing RIPPER, PART, and KLPART, as well as the combination of each method with SMOTE preprocessing. The results, contrasted through nonparametric statistical tests, show that KLPART is robust in the presence of class imbalance, especially when combined with SMOTE. We therefore recommend using KLPART with SMOTE when learning from imbalanced datasets.
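The abstract does not spell out how the K-L divergence criterion scores a candidate split, so the following is only an illustrative sketch under an assumption: that each child's class distribution is compared against the parent node's distribution, with children weighted by size, so that splits which separate the classes more sharply receive higher scores. All function names (`kl_divergence`, `kl_split_score`) are hypothetical, not from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """K-L divergence D(p || q) between two discrete distributions.

    A small eps guards against log(0) and division by zero.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def class_distribution(labels, classes):
    """Relative frequency of each class in a list of labels."""
    n = len(labels)
    return [labels.count(c) / n for c in classes]

def kl_split_score(parent_labels, children_labels, classes):
    """Size-weighted K-L divergence of each child's class distribution
    from the parent's distribution; higher means a sharper separation.
    (Hypothetical scoring rule, sketched for illustration only.)
    """
    parent = class_distribution(parent_labels, classes)
    n = len(parent_labels)
    return sum(
        len(child) / n * kl_divergence(class_distribution(child, classes), parent)
        for child in children_labels
        if child  # skip empty children
    )

# A pure split scores higher than one that leaves both children mixed:
parent = [0, 0, 1, 1]
pure_score = kl_split_score(parent, [[0, 0], [1, 1]], classes=[0, 1])
mixed_score = kl_split_score(parent, [[0, 1], [0, 1]], classes=[0, 1])
```

Under such a criterion, a split whose children have the same class mix as the parent scores zero, which is one plausible way a divergence-based measure could reduce the sensitivity to skewed class priors that the abstract attributes to PART.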