Abstract
In this paper we present IIvotes – a new framework for constructing an ensemble of classifiers from imbalanced data. IIvotes incorporates the SPIDER method for selective data pre-processing into the adaptive Ivotes ensemble. This integration is aimed at improving the balance between sensitivity and specificity (evaluated by the G-mean measure) for the minority class, in comparison with single classifiers also combined with SPIDER. Using SPIDER to pre-process specific learning samples inside the ensemble improves the sensitivity of the derived component classifiers, while the controlling mechanism of IIvotes ensures that overall accuracy (and thus specificity) is kept at a reasonable level. The proposed IIvotes ensemble was thoroughly evaluated in a series of experiments in which we tested it with symbolic (decision trees and rules) and non-symbolic (Naive Bayes) component classifiers. The results confirmed that combining SPIDER with an ensemble improved performance (in terms of the G-mean measure) over a single classifier with SPIDER for all tested types of classifiers and both SPIDER pre-processing options (weak and strong amplification). These advantages were especially evident for decision trees and rules, where the differences between single and ensemble classifiers with SPIDER were more significant for both pre-processing options than for Naive Bayes. Moreover, the results demonstrated the advantages of using a special abstaining classification strategy inside IIvotes rule ensembles, where component rule-based classifiers may refrain from predicting a class when in doubt. Abstaining rule ensembles performed much better with regard to G-mean than their non-abstaining variants.
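The G-mean measure used throughout the evaluation is the geometric mean of sensitivity (minority-class recall) and specificity (majority-class recall), so it rewards classifiers that perform well on both classes rather than sacrificing the minority class for overall accuracy. A minimal sketch of its computation from confusion-matrix counts (the function name `g_mean` is illustrative, not from the paper):

```python
import math

def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity.

    tp/fn are counts for the minority (positive) class,
    tn/fp for the majority (negative) class.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# A classifier that ignores the minority class entirely scores 0,
# even though its plain accuracy on imbalanced data may look high.
print(g_mean(0, 50, 100, 0))   # 0.0
print(g_mean(40, 10, 80, 20))  # 0.8 (both recalls equal 0.8)
```

Because the geometric mean collapses to zero whenever either recall is zero, G-mean is a common choice for evaluating learning from imbalanced data.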
