An Analysis of Probabilistic Approximations for Rule Induction from Incomplete Data Sets

Abstract

The main objective of our research was to test whether probabilistic approximations should be used in rule induction from incomplete data. To this end, we designed experiments on six standard data sets. Four of the data sets were incomplete to begin with; in the other two, missing attribute values were inserted at random. For all six data sets, we used two interpretations of missing attribute values: lost values and "do not care" conditions. In addition, we used three definitions of approximations: singleton, subset and concept. Among the 36 combinations of data set, type of missing attribute value and type of approximation, the error rate of probabilistic approximations (estimated by ten-fold cross validation) was smaller than that of ordinary (lower and upper) approximations for five combinations and larger for four. For the remaining 27 combinations, the difference between these error rates was not statistically significant.
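The abstract names the three approximation types without defining them. The sketch below illustrates them using the characteristic-set formulation common in the rough-set literature on incomplete data. This formulation is an assumption here, since the abstract does not spell out its definitions. The idea is that a lost value ("?") keeps a case out of every block of that attribute, while a "do not care" value ("*") puts it into every block. A probabilistic approximation with threshold alpha then keeps characteristic sets whose conditional probability of the concept is at least alpha. With alpha = 1 this yields the lower approximation, and with a small positive alpha it yields the upper one. The toy table, attribute names and function names are hypothetical.

```python
from fractions import Fraction

LOST = "?"       # lost value: case is excluded from every block of the attribute
DONT_CARE = "*"  # "do not care": case is included in every block of the attribute


def block(data, attr, value):
    """Cases whose value of attr equals value or is a "do not care" condition."""
    return {x for x, row in data.items() if row[attr] in (value, DONT_CARE)}


def characteristic_set(data, x, attrs):
    """K(x): intersection of blocks [(a, a(x))] over attributes specified for x."""
    k = set(data)
    for a in attrs:
        v = data[x][a]
        if v not in (LOST, DONT_CARE):  # missing values do not restrict K(x)
            k &= block(data, a, v)
    return k


def pr(concept, k):
    """Conditional probability Pr(concept | K) as an exact fraction."""
    return Fraction(len(concept & k), len(k))


def singleton_approx(data, attrs, concept, alpha):
    """Cases x in U whose own K(x) reaches the threshold."""
    return {x for x in data
            if pr(concept, characteristic_set(data, x, attrs)) >= alpha}


def subset_approx(data, attrs, concept, alpha):
    """Union of qualifying K(x) over all cases x in U."""
    result = set()
    for x in data:
        k = characteristic_set(data, x, attrs)
        if pr(concept, k) >= alpha:
            result |= k
    return result


def concept_approx(data, attrs, concept, alpha):
    """Union of qualifying K(x) over cases x in the concept only."""
    result = set()
    for x in concept:
        k = characteristic_set(data, x, attrs)
        if pr(concept, k) >= alpha:
            result |= k
    return result


# Hypothetical incomplete table: case 2 has a lost value, case 3 a "do not care".
data = {
    1: {"temp": "high", "headache": "yes"},
    2: {"temp": "?", "headache": "yes"},
    3: {"temp": "high", "headache": "*"},
    4: {"temp": "normal", "headache": "no"},
    5: {"temp": "normal", "headache": "yes"},
}
attrs = ["temp", "headache"]
flu = {1, 2, 3}

for alpha in (Fraction(1), Fraction(3, 4)):
    print(alpha,
          singleton_approx(data, attrs, flu, alpha),
          subset_approx(data, attrs, flu, alpha),
          concept_approx(data, attrs, flu, alpha))
```

For this toy table, alpha = 1 gives {1, 3} for all three types. Lowering alpha to 3/4 admits K(2) = {1, 2, 3, 5}, so the subset and concept approximations grow to {1, 2, 3, 5} while the singleton approximation becomes {1, 2, 3}. This is how the choice of alpha and of the approximation type can change the rule sets induced from the same incomplete data.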
