Abstract
Uncertain data are data accompanied by existence probabilities, which makes frequent itemset mining more challenging. This paper focuses on the problem of mining probabilistic maximal frequent itemsets. We redefine the concept of the probabilistic maximal frequent itemset so that it is consistent with the traditional definition and offers a better view of how to devise pruning strategies. A tree-based index, the probabilistic maximal frequent itemset tree, is constructed to maintain the probabilistic frequent itemsets. We propose a depth-first probabilistic maximal frequent itemset mining algorithm that generates the exact results bottom-up; it uses the support and the expected support to estimate the range of the probabilistic support, so the frequency of an itemset can be inferred with less runtime and memory usage. Superset pruning is also employed to further reduce the mining cost. Nevertheless, some probabilistic supports still have to be computed exactly when the minimum support is low, which may greatly increase the mining time. This problem is addressed by our approximate probabilistic maximal frequent itemset mining method, which uses the expected support to directly compute the probabilistic support. Theoretical analysis and experimental studies demonstrate that our proposed algorithms have high accuracy, require less computational time and memory, and significantly outperform the
