Abstract
The object of Software Defect Prediction (SDP) is to identify modules that are prone to defect. This is achieved by training prediction models with datasets obtained by mining software historical depositories. When one acquires data through this approach, it often includes class imbalance which has an unequal class representation among their example. We hypothesize that the imbalance learning is not a problem in itself and decrease in performance is also influenced by other factors related to class distribution in the data. One of these is the existence of noisy and borderline examples. Thus, the objective of our research is to propose a novel preprocessing method using Synthetic Minority Over-Sampling Technique (SMOTE), Fuzzy-rough Instance Selection type II (FRIS-II) and Iterative Noise Filter based on the Fusion of Classifiers (INFFC) which can overcome these problems. The experimental results show that the new proposal significantly outperformed all the methods compared in this study.
Get full access to this article
View all access options for this article.
