Mining maximal frequent itemsets from data streams

Abstract

Frequent pattern mining from data streams is an active research topic in data mining. Existing research efforts often rely on a two-phase framework to discover frequent patterns: (1) using internal data structures to store meta-patterns obtained by scanning the stream data; and (2) re-mining the meta-patterns to finalize and output frequent patterns. The drawback of such a two-phase framework is that the separation into two stages is a barrier to finding frequent patterns dynamically and immediately in an online setting. A single-phase algorithm is expected to mine frequent patterns from data streams in such a way that users can see patterns immediately and dynamically, as soon as the patterns become frequent. In this paper, we propose INSTANT, a single-phase algorithm for discovering frequent itemsets from data streams. The theoretical foundation of INSTANT is a framework theory on a set of itemsets, which is also presented in the paper. The design of INSTANT employs compact data structures to mine frequent patterns from data streams in a single phase. Our experimental results demonstrate the time and space efficiency of the proposed algorithm.
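The single-phase idea, reporting a pattern at the moment it becomes frequent rather than after a separate re-mining pass, can be illustrated with a deliberately naive sketch. This is not the INSTANT algorithm and does not use its data structures; the function name `stream_frequent_itemsets` and the absolute threshold `min_count` are assumptions made only for this illustration.

```python
from collections import Counter
from itertools import combinations


def stream_frequent_itemsets(transactions, min_count):
    """Yield (position, itemset) the moment an itemset's count reaches min_count.

    Naive illustration only (hypothetical helper, not INSTANT): it counts
    every non-empty subset of each transaction, which is exponential in
    the transaction length and unsuitable for real streams.
    """
    counts = Counter()
    for position, transaction in enumerate(transactions, start=1):
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
                # Report exactly once, when the threshold is first reached.
                if counts[subset] == min_count:
                    yield position, frozenset(subset)


stream = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
reported = list(stream_frequent_itemsets(stream, min_count=2))
```

Because the generator yields inside the scan loop, a consumer sees each itemset while the stream is still being read; a two-phase design would instead store intermediate results and run a second mining step before producing any output.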

Keywords

data mining; data stream; frequent itemset; set of itemsets


