Frequent similar pattern mining using non Boolean similarity functions

Abstract

There are many problems where the objects under study are described by mixed data (numerical and non numerical features), and similarity functions other than exact matching are usually employed to compare them. Some algorithms for mining frequent patterns allow the use of Boolean similarity functions other than exact matching. However, they do not allow the use of non Boolean similarity functions. Transforming a non Boolean similarity function into a Boolean one, and then applying those algorithms, could lose some patterns and, worse, generate other patterns that should not be considered frequent similar patterns. In this paper, we extend frequent similar pattern mining by allowing the use of non Boolean similarity functions. We propose several properties for pruning the search space of frequent similar patterns and a data structure that allows computing the frequency of candidate patterns. We also propose three algorithms for mining frequent patterns using non Boolean similarity functions. Experimental results show the efficiency and efficacy of the algorithms. The proposed algorithms obtain better patterns for classification than those obtained by traditional frequent pattern miners and by miners using Boolean similarity functions.
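The effect of turning a non Boolean similarity function into a Boolean one can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the per-feature similarities, the averaging in `similarity`, the definition of `frequency` as the mean similarity over the dataset, the `AGE_RANGE` constant, and the toy data are all hypothetical.

```python
from typing import Dict, List, Optional, Union

Value = Union[float, str]
Obj = Dict[str, Value]

AGE_RANGE = 50.0  # hypothetical normalization constant for the numeric feature


def feature_similarity(a: Value, b: Value) -> float:
    """Assumed per-feature similarity in [0, 1]."""
    if isinstance(a, str) or isinstance(b, str):
        # Non numerical feature: exact matching.
        return 1.0 if a == b else 0.0
    # Numerical feature: similarity decreases linearly with the difference.
    return max(0.0, 1.0 - abs(a - b) / AGE_RANGE)


def similarity(pattern: Obj, obj: Obj) -> float:
    """Assumed non Boolean similarity: mean of the per-feature similarities."""
    sims = [feature_similarity(pattern[f], obj[f]) for f in pattern]
    return sum(sims) / len(sims)


def frequency(pattern: Obj, dataset: List[Obj],
              threshold: Optional[float] = None) -> float:
    """Assumed frequency: mean similarity between the pattern and each object.

    If `threshold` is given, the similarity is first made Boolean
    (1 if similarity >= threshold, else 0), mimicking the transformation
    discussed in the abstract.
    """
    total = 0.0
    for obj in dataset:
        s = similarity(pattern, obj)
        if threshold is not None:
            s = 1.0 if s >= threshold else 0.0
        total += s
    return total / len(dataset)


# Hypothetical toy dataset with one numerical and one non numerical feature.
DATA: List[Obj] = [
    {"age": 30, "color": "red"},
    {"age": 40, "color": "red"},
    {"age": 55, "color": "red"},
    {"age": 30, "color": "blue"},
]
PATTERN: Obj = {"age": 30, "color": "red"}

graded = frequency(PATTERN, DATA)                 # non Boolean similarity
strict = frequency(PATTERN, DATA, threshold=0.8)  # Boolean, high threshold
loose = frequency(PATTERN, DATA, threshold=0.5)   # Boolean, low threshold
print(graded, strict, loose)
```

Under these assumptions the graded frequency lies between the two thresholded ones. With a minimum frequency between the strict and graded values, the strict Boolean version would discard a pattern that the graded version keeps; with a minimum frequency between the graded and loose values, the loose Boolean version would report a pattern that the graded version rejects.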

Keywords

Data mining, frequent patterns, similarity functions, mixed data
