Abstract
On-line learning systems that use incoming batches of training examples to induce rules for a classification task, such as credit card fraud detection, may have to deal with concept drift, whereby some of the underlying class definitions change over time. Identifying drift against a background of noise and maintaining the accuracy of the learned rules are challenging tasks.
We propose a methodology for handling these problems based on assessing the relevance of a time-stamp attribute (TSAR). In place of the time-windowing of examples that tends to be used in current approaches, we employ a new purging mechanism that removes examples that are no longer valid but retains valid examples regardless of age. This allows the example base to grow, thus facilitating good classification.
We describe one particular TSAR algorithm, CD3, which utilises ID3 with post-pruning. We report on trials showing that CD3 can cope very well in a variety of batch-drift scenarios.
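The purging idea can be illustrated with a small sketch. This is not CD3 itself: CD3 induces an ID3 tree with post-pruning, whereas the sketch below swaps in a much simpler stand-in that groups examples by identical attribute values and measures the information gain of a binary old/new time stamp within each group. The function names (`entropy`, `timestamp_gain`, `purge`), the grouping by exact attribute values, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def timestamp_gain(old_labels, new_labels):
    """Information gain of a binary old/new time stamp with respect to the class."""
    combined = old_labels + new_labels
    total = len(combined)
    if total == 0:
        return 0.0
    split = ((len(old_labels) / total) * entropy(old_labels)
             + (len(new_labels) / total) * entropy(new_labels))
    return entropy(combined) - split


def purge(example_base, new_batch, threshold=0.5):
    """Merge new_batch into example_base, dropping old examples that look invalid.

    Examples are (features, label) pairs, where features is a hashable tuple.
    Assumption: a "region" is the set of examples with identical features, and
    an old example is purged when the time stamp is highly relevant to the
    class in its region (gain above the hypothetical threshold).
    """
    old_by_region = defaultdict(list)
    new_by_region = defaultdict(list)
    for features, label in example_base:
        old_by_region[features].append(label)
    for features, label in new_batch:
        new_by_region[features].append(label)

    kept = []
    for features, label in example_base:
        gain = timestamp_gain(old_by_region[features],
                              new_by_region.get(features, []))
        # Low gain: the time stamp tells us nothing, so the old example stays
        # valid regardless of its age.
        if gain <= threshold:
            kept.append((features, label))
    return kept + list(new_batch)
```

With an old base in which `("high", "foreign")` transactions are labelled `"fraud"` and a new batch in which the same attribute values are labelled `"ok"`, the time stamp becomes highly relevant in that region and the old examples there are purged. Old examples in regions where the new batch agrees with them are retained, so the example base can keep growing.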
