Efficiently mining high utility sequential patterns in static and streaming data

Abstract

High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs instatic databases and do not take streaming data into account, where unbounded data come continuously and often at a high speed. To efficiently deal with both problems, we propose a novel framework for mining high utility sequential patterns over static and streaming databases. In this regard, two efficient data structures named ItemUtilLists (Item Utility Lists) and HUSP-Tree (High Utility Sequential Pattern Tree) are proposed to maintain essential information for mining HUSPs in both offline and online fashions. In addition, a novel utility model called Sequence-Suffix Utility is proposed for effectively pruning the search space inHUSP mining. We propose an algorithm named HUSP-Miner (High Utility Sequential Pattern Miner) to find HUSPs in static databases efficiently. Then, a one-pass algorithm named HUSP-Stream (High Utility Sequential Pattern mining over Data Streams) is proposed to incrementally update ItemUtilLists and HUSP-Tree online and find HUSPs over data streams. To the best of our knowledge, HUSP-Stream is the first method to find HUSPs over data streams. Experimental results on both real and synthetic datasets show thatHUSP-Miner outperforms the compared algorithms substantially in terms of execution time, memory usage and number of generated candidates. The experiments also demonstrate impressive performance of HUSP-Stream to update the data structures and discover HUSPs over data streams.

Keywords

High utility sequential pattern mining data streams sliding window

Get full access to this article

View all access options for this article.

References

Agrawal

and Srikant

, Mining sequential patterns, In Proc. of the Eleventh International Conference on Data Engineering, 1995, 3-14.

Ahmed

C.F.

, Tanbeer

S.K.

and Jeong

, A novel approach for mining high-utility sequential patterns in sequence databases, Electronics and Telecommunications Research Institute Journal 32 (2010), 676-686.

Ahmed

C.F.

, Tanbeer

S.K.

and Jeong

B.-S.

, A framework for mining high utility web access sequences, IETE Technical Review 28 (2011), 3-16.

Ahmed

C.F.

, Tanbeer

S.K.

and Jeong

B.S.

, Interactive mining of high utility patterns over data streams, Expert Systems With Applications 39 (2012), 11979-11991.

Ahmed

C.F.

, Tanbeer

S.K.

, Jeong

B.S.

and Lee

Y.K.

, Efficient tree structures for high-utility pattern mining in incremental databases, IEEE Transactions on Knowledge and Data Engineering 21 (2009), 1708-1721.

Ayres

, Flannick

, Gehrke

and Yiu

, Sequential pattern mining using a bitmap representation, In Proc. of ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, 2002, 429-435.

Rassi

P.P.C.

and Teisseire

, Speed: Mining maximal sequential patterns over data streams, In Proc. of the IEEE Int'l Conference on Intelligent Systems, 2006, 546-552.

Chang

, Wang

, Yang

and Luan

, Seqstream: Mining closed sequential patterns over stream sliding windows, In Proc. of the IEEE International Conference on Data Mining, 2008, 83-92.

Chen

, Wu

and Zhu

, Mining Sequential Patterns across Data Streams, PhD thesis, University of Vermont, 2005.

10.

Chu

, Tseng

V.S.

and Liang

, An efficient algorithm for mining temporal high utility itemsets from data streams, Journal of Systems and Software 81 (2008), 1105-1117.

11.

Fournier-Viger

, Gomariz

, Gueniche

, Soltani

, Wu

and Tseng

V.S.

, Spmf: A java open-source pattern mining library, Journal of Machine Learning Research 15 (2014), 3389-3393.

12.

Fournier-Viger

, Wu

C.-W.

, Zida

and Tseng

V.S.

, FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning, 83-92. Springer International Publishing, Cham, 2014.

13.

Han

, Pei

, Mortazavi-Asl

, Chen

, Dayal

and Hsu

, Freespan: Frequent pattern-projected sequential pattern mining, In Proc.of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, 355-359.

14.

, Li

, Kuo

and Lee

, Incremental mining of sequential patterns over a stream sliding window, In Proc. of the ICDM Workshops, 2006, 677-681.

15.

H.F.

, Huang

H.Y.

, Chen

Y.C.

, Liu

Y.J.

and Lee

S.Y.

, Fast and memory efficient mining of high utility itemsets in data streams, In Proc. of the 8th IEEE Int'l Conference on Data Mining, 2008, 881-886.

16.

Liu

and Qu

, Mining high utility itemsets without candidate generation, In Proc. of the 21st ACM international Conference on Information and Knowledge Management, 2012, 55-64.

17.

Liu

, Liao

W.k.

and Choudhary

, A fast high utility itemsets mining algorithm, In Proc. of the 1st International Workshop on Utility-Based Data Mining, 2005, 90-99.

18.

Marascu

and Masseglia

, Mining sequential patterns from temporal streaming data, In Prc. of the ECML/PKDD Workshop on Mining Complex Data, 2005, 355-359.

19.

Pei

, Han

, Mortazavi-Asl

, Wang

, Pinto

, Chen

, Dayal

and Hsu

M.-C.

, Mining sequential patterns by pattern-growth: The prefixspan approach, IEEE Transactions on Knowledge and Data Engineering 16 (2004), 1424-1440.

20.

Pisharath

, Liu

, Ozisikyilmaz

, Narayanan

, Liao

W.K.

, Choudhary

and Memik

, Nu-minebench version 2.0 dataset and technical report, http://cucis.ece.northwestern.edu/projects/dms/minebench.html, 2012.

21.

Shie

, Hsiao

and Tseng

V.S.

, Efficient algorithms for discovering high utility user behavior patterns in mobile commerce environments, Knowledge and Information Systems journal 37 (2013).

22.

Shie

B.E.

, Yu

P.S.

and Tseng

V.S.

, Efficient algorithms for mining maximal high utility itemsets from data streams with different models, Expert Systems with Applications 39 (2012), 12947-12960.

23.

Srikant

and Agrawal

, Mining sequential patterns: Generalizations and performance improvements, In Proc. of the Intelligent Conference on Extending Database Technology: Advances in Database Technology, 1996, 3-17.

24.

Tseng

, Shie

, Wu

C.-W.

and Yu

, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Transactions on Knowledge and Data Engineering 25 (2013), 1772-1786.

25.

Yen

, Wu

, Lee

, Tseng

V.S.

and Hsieh

, A fast algorithm for mining frequent closed itemsets over stream sliding window, In Proc. of the Int'l Conference on Fuzzy Systems, 2011, 996-1002.

26.

Yin

, Zheng

and Cao

, Uspan: An efficient algorithm for mining high utility sequential patterns, In Proc. of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, 660-668.

27.

Zaki

M.J.

, Spade: An efficient algorithm for mining frequent sequences, Machine Learning Journal 42 (2001), 31-60.

28.

Zihayat

and An

, Mining top-k high utility patterns over data streams, Information Sciences 285 (2014), 138-161.

29.

Zihayat

, Wu

C.-W.

, An

and Tseng

V.S.

, Mining high utility sequential patterns from evolving data streams, In Proc. of the ASE BigData & SocialInformatics 2015, 2015, 52:1-52:6.