A novel approach to identify anomalies using rough sets

Abstract

Most datasets contain objects whose attributes differ significantly from those of other objects in the same dataset. Although such objects were initially disregarded as noise, they are now defined as outliers, and detecting them benefits applications such as identifying fraudulent financial transactions or intruders. A major challenge in detecting outliers is the dimensionality and volume of the data. Rough sets can be used to clearly define the objects that need not be considered as outliers, so not every object has to be processed when the outlier detection algorithm is applied. This paper proposes a new methodology for detecting outliers using rough sets. The methodology has high potential because outliers have a low probability of lying in the boundary region, defined as the difference between the upper and lower approximations, as compared to the lower approximation. Using rough sets in this way can significantly reduce the computation time of existing algorithms.
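To make the rough-set terminology concrete, the following minimal sketch computes the lower approximation, the upper approximation, and the boundary region of a target set on a toy decision table. It illustrates the standard Pawlak definitions only and is not the method of this paper; the table, the attribute names (`amount`, `country`), the target set, and the function name `approximations` are all hypothetical.

```python
from collections import defaultdict


def approximations(objects, attrs, target):
    """Return (lower, upper, boundary) of `target` under the
    indiscernibility relation induced by the attributes in `attrs`."""
    # Objects with identical values on `attrs` are indiscernible and
    # fall into the same equivalence class.
    classes = defaultdict(set)
    for oid, row in objects.items():
        classes[tuple(row[a] for a in attrs)].add(oid)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    # Boundary region: in the upper but not the lower approximation.
    return lower, upper, upper - lower


# Hypothetical toy table: object id -> attribute values.
objects = {
    1: {"amount": "low", "country": "home"},
    2: {"amount": "low", "country": "home"},
    3: {"amount": "high", "country": "abroad"},
    4: {"amount": "high", "country": "abroad"},
    5: {"amount": "high", "country": "home"},
}
target = {1, 2, 3}  # hypothetical set of objects labelled as one concept

lower, upper, boundary = approximations(objects, ["amount", "country"], target)
print(lower, upper, boundary)
```

Once the objects are partitioned in this way, an existing outlier detection algorithm could be run on a chosen region rather than on the whole dataset. Which region to keep or discard is a design decision of the detection method and is not fixed by this sketch.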

Keywords

noise, intruders, rough sets, outliers, approximation
