Abstract
Uncertain data are data accompanied by existence probabilities, which makes frequent itemset mining more challenging. This paper focuses on the problem of mining probabilistic maximal frequent itemsets. We redefine the concept of the probabilistic maximal frequent itemset so that it is consistent with the traditional definition and offers a better view of how to devise pruning strategies. A tree-based index, the probabilistic maximal frequent itemset tree, is constructed to maintain the probabilistic frequent itemsets. We propose a depth-first probabilistic maximal frequent itemset mining algorithm that generates the exact results bottom-up; it uses the support and the expected support to estimate the range of the probabilistic support, so the frequency of an itemset can be inferred with less runtime and memory usage. Superset pruning is also employed to further reduce the mining cost. Nevertheless, some probabilistic supports still have to be computed exactly when the minimum support is low, which may greatly increase the mining time. This problem is addressed by our approximate probabilistic maximal frequent itemset mining method, which uses the expected support to directly compute the probabilistic support. Theoretical analysis and experimental studies demonstrate that our proposed algorithms have high accuracy, require less computational time and memory, and significantly outperform the
