A heuristic for learning decision trees and pruning them into classification rules

Abstract

Consider a set of training examples described by continuous or symbolic attributes and labelled with categorical classes. In this paper we present a measure that estimates how good a region of the attribute space would be as the condition of a rule for classifying unseen cases. The aim is to take into account how the classes are distributed among the examples. The resulting measure, called the impurity level, is inspired by a similar measure used in the instance-based algorithm IB3 to select suitable paradigmatic exemplars that will classify future cases in a nearest-neighbor setting. The features of the impurity level are illustrated with a version of Quinlan's well-known C4.5 in which the information-based heuristics are replaced by our measure. The experiments carried out to test the proposals show that very high accuracy is reached with sets of classification rules as small as those found by RIPPER.
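The abstract does not give the formula for the impurity level. As a rough illustration of the IB3 idea it builds on, the sketch below compares a score (Wilson) confidence interval for a rule's accuracy on the examples it covers with the interval for the prior frequency of its class. IB3 accepts an exemplar only when the lower bound of the first interval exceeds the upper bound of the second. The function `impurity_overlap` and its percentage scaling are hypothetical: an assumed way to grade how much the two intervals overlap, not the paper's definition. All function and parameter names are illustrative.

```python
import math


def wilson_interval(successes: int, trials: int, z: float) -> tuple[float, float]:
    """Score confidence interval for a proportion, the kind of interval IB3 uses."""
    if trials == 0:
        return 0.0, 1.0  # no evidence: the interval covers everything
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    # Clamp to [0, 1] to absorb floating-point round-off at p = 0 or p = 1.
    lo = max(0.0, (centre - spread) / denom)
    hi = min(1.0, (centre + spread) / denom)
    return lo, hi


def ib3_acceptable(correct: int, covered: int, class_count: int, total: int,
                   z: float = 1.0) -> bool:
    """IB3-style test: accuracy lower bound beats the class-frequency upper bound."""
    acc_lo, _ = wilson_interval(correct, covered, z)
    _, freq_hi = wilson_interval(class_count, total, z)
    return acc_lo > freq_hi


def impurity_overlap(correct: int, covered: int, class_count: int, total: int,
                     z: float = 1.0) -> float:
    """HYPOTHETICAL overlap score, not the paper's formula.

    Measures how far the class-frequency upper bound reaches into the rule's
    accuracy interval, as a percentage of that interval's width. Negative values
    mean the intervals do not overlap (the rule clearly beats the prior).
    """
    acc_lo, acc_hi = wilson_interval(correct, covered, z)
    _, freq_hi = wilson_interval(class_count, total, z)
    width = acc_hi - acc_lo  # strictly positive whenever z > 0
    return 100.0 * (freq_hi - acc_lo) / width


# A rule right on 45 of the 50 examples it covers, for a class holding 100 of 300
# training examples, versus a rule that is no better than its class's prior.
print(impurity_overlap(45, 50, 100, 300), ib3_acceptable(45, 50, 100, 300))
print(impurity_overlap(5, 10, 150, 300), ib3_acceptable(5, 10, 150, 300))
```

The confidence parameter `z` is left as an argument because IB3 uses different confidence levels for accepting and for discarding exemplars; a lower score here marks a region whose class distribution is purer than the prior would suggest.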

Keywords

Machine learning, classification rules, pruning decision trees, impurity level
