Comparative study of neighbor-based methods for local outlier detection

Abstract

Neighbor-based methods have become a powerful tool for outlier detection: they assess the abnormality of a sample based on its compactness relative to neighboring samples. However, most existing methods focus on designing processes to identify outliers, while the contributions of different types of neighbors to the detection process have not been adequately explored. To address this gap, this article investigates the role of neighbors in existing outlier detection algorithms and introduces a taxonomy built on three key components (information, neighbor, and methodology) to define hybrid methods. This taxonomy provides a framework that can inspire novel neighbor-based outlier detection algorithms by combining different components from each level. Extensive comparative experiments on both synthetic and real-world datasets, including performance evaluations and case studies, demonstrate that reverse K-nearest neighbor-based methods perform well and that dynamic selection methods are particularly effective in high-dimensional spaces. Furthermore, the results confirm that strategically selecting components from this taxonomy can yield algorithms that outperform existing methods.

Keywords

neighbor-based methods; local outlier detection; taxonomy
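
The abstract's central idea, scoring a sample by its compactness relative to its neighbors, can be illustrated with a minimal sketch. The Python below is not the article's algorithm or any method from its taxonomy. It shows only two generic neighbor notions the abstract mentions: K-nearest neighbors and reverse K-nearest neighbors. The function names, the toy data, and the choice of k = 2 are all hypothetical and chosen for illustration.

```python
import math


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def knn_distance_scores(points, k):
    """Mean distance to the k nearest neighbors; larger means more isolated."""
    nbrs = knn_indices(points, k)
    return [
        sum(math.dist(points[i], points[j]) for j in nbrs[i]) / k
        for i in range(len(points))
    ]


def reverse_knn_counts(points, k):
    """Count how many points list each point among their k nearest neighbors."""
    counts = [0] * len(points)
    for nbrs in knn_indices(points, k):
        for j in nbrs:
            counts[j] += 1
    return counts


# Hypothetical toy data: a tight cluster plus one distant sample (index 5).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]

scores = knn_distance_scores(data, k=2)
rknn = reverse_knn_counts(data, k=2)
most_isolated = max(range(len(data)), key=scores.__getitem__)
```

On this toy data the distant sample has the largest mean K-nearest-neighbor distance, and no other sample counts it among its two nearest neighbors, so its reverse-neighbor count is zero. A low reverse-neighbor count is one simple signal of the kind reverse K-nearest neighbor methods build on; the brute-force search here is quadratic and meant only for small examples.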
