Mining class association rules on imbalanced class datasets

Abstract

Existing class association rule mining algorithms often struggle to discover good rule sets from imbalanced class datasets, because most of the rules they generate belong to the dominant classes. In medical applications, for example, some symptoms of illness are uncommon, yet doctors are very interested in the rules associated with them. This paper proposes a novel approach for mining class association rules (CARs) in imbalanced class datasets. First, given n classes, the training dataset is split into n corresponding groups, and the data in each group is clustered by the k-means algorithm into k clusters, where k equals the number of records in the smallest group. Second, all records obtained from the groups after clustering are combined, and the CAR-Miner-Diff algorithm is used to mine all CARs. We also propose an iterative method for obtaining a highly accurate classifier. Experiments show that the proposed approach outperforms existing algorithms while retaining a large number of useful rules in the classifier.
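The balancing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how each cluster is turned back into a record, so the sketch assumes one representative per cluster, namely the member closest to the centroid. The names `kmeans` and `balance_by_clustering` are hypothetical, records are assumed to be numeric tuples, and the subsequent CAR-Miner-Diff mining step is not shown.

```python
import random
from collections import defaultdict


def _sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's k-means. Returns (centroid, members) for each non-empty cluster."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = _mean(members)
    clusters = defaultdict(list)
    for p, a in zip(points, assignment):
        clusters[a].append(p)
    return [(centroids[c], members) for c, members in sorted(clusters.items())]


def balance_by_clustering(records, labels, seed=0):
    """Split records by class, cluster each class into k clusters
    (k = size of the smallest class), and keep one record per cluster.

    Assumption: the representative is the cluster member nearest the centroid.
    """
    groups = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[lab].append(tuple(rec))
    k = min(len(group) for group in groups.values())
    balanced = []
    for lab, group in groups.items():
        for centroid, members in kmeans(group, k, seed=seed):
            rep = min(members, key=lambda p: _sq_dist(p, centroid))
            balanced.append((rep, lab))
    return balanced


if __name__ == "__main__":
    records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5), (6, 6)]
    labels = ["neg"] * 6 + ["pos"] * 2
    for rec, lab in balance_by_clustering(records, labels):
        print(lab, rec)
```

Because k is the size of the smallest class, that class keeps all of its (distinct) records, while every larger class is reduced to at most k representatives. The balanced output would then be passed to a CAR miner.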

Keywords

Class association rules, associative classification, imbalanced class dataset, clustering, data mining
