Abstract
Classifier accuracy is critical and can be improved by enlarging the training data set. In experimental studies, however, surveying additional cases can be very costly, so keeping the sample size to a minimum is essential. Moreover, beyond a certain point very large data sets may contain little additional information, and further computational effort does not improve accuracy. Stopping at the optimal iteration uses the minimum number of observations, potentially saving both computational time and sampling costs.
For this reason, a sequential method of training classifiers can be useful. This paper proposes a sequential method that seeks to sample the minimum number of observations necessary to train a classifier that estimates the feasible minimum rate of misclassification, the Bayes error.
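As a rough illustration of the sequential idea (not the paper's SAS/IML implementation), the sketch below draws small batches of observations, retrains a classifier after each batch, and stops once the estimated misclassification rate plateaus. The batch size, stopping tolerance, synthetic Gaussian data, and the simple nearest-centroid classifier are all illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_batch(n):
    # Two overlapping Gaussian classes, so the Bayes error is nonzero.
    y = rng.integers(0, 2, n)
    x = rng.normal(loc=y * 1.5, scale=1.0, size=(n,))
    return x, y

# A fixed holdout set to estimate the misclassification rate.
x_test, y_test = make_batch(2000)

def error_rate(mu0, mu1):
    # Nearest-centroid rule: assign each point to the closer class mean.
    pred = (np.abs(x_test - mu1) < np.abs(x_test - mu0)).astype(int)
    return float(np.mean(pred != y_test))

xs = np.empty(0)
ys = np.empty(0, dtype=int)
prev_err = 1.0
for step in range(50):            # cap on the number of sequential steps
    xb, yb = make_batch(25)       # sample a small batch of new observations
    xs = np.concatenate([xs, xb])
    ys = np.concatenate([ys, yb])
    mu0, mu1 = xs[ys == 0].mean(), xs[ys == 1].mean()
    err = error_rate(mu0, mu1)
    if abs(prev_err - err) < 0.005:   # error has plateaued: stop sampling
        break
    prev_err = err

n_used = len(ys)
print(n_used, round(err, 3))
```

Under this kind of rule, sampling stops as soon as extra observations no longer change the estimated error, which is the behavior the abstract describes: the minimum number of observations is used while the error estimate approaches its feasible minimum.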
Using SAS/IML