Abstract
Consider a set of training examples described by continuous or symbolic attributes and labeled with categorical classes. In this paper we present a measure of how well a region of the attribute space can serve as a rule condition for classifying unseen cases. The measure takes into account the distribution of the classes of the examples. The resulting measure, called impurity level, is inspired by a similar measure used in the instance‐based algorithm IB3 to select paradigmatic exemplars that will classify future cases in a nearest‐neighbor context. We illustrate the features of the impurity level using a version of Quinlan's well‐known C4.5 in which the information‐based heuristics are replaced by our measure. Experiments indicate that the resulting sets of classification rules reach very high accuracy while being as small as those found by RIPPER.
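The abstract does not give the formula for the impurity level, so the following is only a rough sketch of the kind of statistic IB3 relies on: a confidence interval on a proportion (the Wilson score interval). The function `impurity_upper_bound`, its use of the non-majority share of a region, and the default `z = 1.645` are illustrative assumptions, not the paper's definition.

```python
import math


def wilson_interval(successes, n, z=1.645):
    """Wilson score interval for a binomial proportion.

    z is the normal quantile; 1.645 (a two-sided 90% interval) is an
    assumed default chosen for illustration only.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = p + z * z / (2.0 * n)
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return (centre - half) / denom, (centre + half) / denom


def impurity_upper_bound(class_counts, z=1.645):
    """Hypothetical region score: upper Wilson bound on the share of
    examples in the region that do not belong to its majority class.
    Lower values suggest a purer, better-supported region.
    """
    n = sum(class_counts.values())
    minority = n - max(class_counts.values())
    return wilson_interval(minority, n, z)[1]
```

Under these assumptions, a pure region covering many examples scores lower than a pure region covering few, so the score reflects both the class distribution and the amount of evidence behind it.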
