Abstract
Overfitting the training data is a major problem in machine learning, particularly when noise is present. Overfitting increases learning time and reduces both the accuracy and the comprehensibility of the generated rules, making learning from large data sets more difficult. Pruning is a technique widely used to address such problems and consequently forms an essential component of practical learning algorithms. An important class of pruning techniques is based on the minimum description length (MDL) principle. This paper presents three new techniques that use the MDL principle to prune rule sets. An important advantage of these techniques is that all of the training data can be used both for inducing and for evaluating rule sets. The performance of the techniques is evaluated using three criteria: classification accuracy, rule set complexity, and execution time. The evaluation shows that the new techniques, when incorporated into a rule induction algorithm, are more efficient and lead to accurate rule sets that are significantly smaller than the unpruned rule sets.
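To illustrate the general idea behind MDL-based pruning (this is a generic sketch, not the paper's specific techniques): a rule set is preferred when the total cost of encoding the rules plus the cost of encoding its errors on the training data is smaller. The function names and the fixed bits-per-condition cost below are illustrative assumptions.

```python
import math

def description_length(num_conditions, num_examples, num_errors,
                       bits_per_condition=8.0):
    """Total cost in bits: the model cost (rules) plus the cost of
    identifying which training examples the rule set misclassifies."""
    model_cost = num_conditions * bits_per_condition
    # Cost of specifying the error positions: log2 of the number of
    # ways to choose num_errors examples out of num_examples.
    exception_cost = math.log2(math.comb(num_examples, num_errors))
    return model_cost + exception_cost

def prefer_pruned(full, pruned):
    """MDL preference: keep the pruned rule set when its total
    description length is no larger than the full set's."""
    return description_length(*pruned) <= description_length(*full)

# Toy comparison: a 12-condition set making 2 errors on 100 examples
# versus a 5-condition pruned set making 5 errors.
full = (12, 100, 2)     # (conditions, examples, errors)
pruned = (5, 100, 5)
print(prefer_pruned(full, pruned))  # the simpler set wins here
```

Because the same training data supplies both the model cost and the exception cost, no separate pruning set needs to be held out, which is the advantage the abstract highlights.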
