Abstract
This paper presents a technique that improves the accuracy of classification models by enhancing the quality of the training data. The idea is to eliminate instances that are likely to be noisy and to train classification models on the resulting “clean” data. Our approach combines 25 different classification techniques into an ensemble classifier that filters noise. Using a relatively large number of base-level classifiers allows the filter to operate at several levels of conservativeness in noise removal. It also increases confidence in the noise-elimination procedure: with 25 base-level classifiers, the results are less likely to be swayed by the (possibly) inappropriate learning bias of a few algorithms than with a smaller ensemble. An empirical case study with software measurement data from a high-assurance software project demonstrates that our noise-elimination approach improves classification accuracy. We also investigate the similarities among the predictions of the 25 classifiers; preliminary results suggest that the ensemble may be effectively reduced to 13 classifiers.
