Evaluate KNN stream queries by cache mechanisms

Abstract

K-nearest neighbor (KNN) query algorithms are widely used in many fields. However, as data volume increases, query efficiency drops sharply. Inspired by persistent memory, we propose a new method that combines a knowledge base with cache mechanisms to evaluate exact KNN queries. First, the method caches all the dataset tuples. Next, it creates a knowledge base and uses a learning-based method to obtain the first tuple. As stream KNN queries are evaluated, the knowledge base accumulates sufficient information. Every tuple in the knowledge base is treated as a region, and these regions are clustered into larger regions. When a query is submitted, our method tries to obtain all of the results from the clustered regions. With this strategy, we minimize response time by obtaining the candidate set quickly from the clustered regions and by avoiding partial or full access to the underlying storage, such as a database or files. We conducted extensive experiments on datasets of varying dimensionality, including low-dimensional datasets (2, 3, and 4 dimensions) and high-dimensional datasets (25, 50, and 104 dimensions). The results show that our method clearly outperforms comparable approaches from the previous literature, particularly when evaluating a sequence of KNN queries. Our method is not only database friendly but can also be applied to many online systems that need fast and exact KNN retrieval.
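The idea of answering an exact KNN query from a cached region can be illustrated with a minimal sketch. It is not the method of the paper: the learning-based step and the clustering of regions are omitted, and the names `RegionCache`, `Region`, `brute_force_knn`, and the `prefetch` parameter are hypothetical. The sketch assumes Euclidean distance and models each region as a ball around a past query point that holds every dataset tuple strictly inside its radius. A new query q is answered from such a region only when dist(q, center) plus the distance to its k-th candidate is strictly less than the region radius, which guarantees that no closer tuple can lie outside the cache.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


def brute_force_knn(data: List[Point], q: Point, k: int) -> List[Point]:
    """Stand-in for the underlying system: full scan, sorted by distance to q."""
    return sorted(data, key=lambda p: math.dist(p, q))[:k]


@dataclass
class Region:
    center: Point        # query point that produced this region
    radius: float        # every tuple strictly closer than this is in `points`
    points: List[Point]  # cached tuples


class RegionCache:
    """Hypothetical cache that answers exact KNN queries from stored regions."""

    def __init__(self, data: List[Point], prefetch: int = 4):
        self.data = list(data)
        self.prefetch = prefetch  # assumption: fetch k * prefetch tuples on a miss
        self.regions: List[Region] = []
        self.hits = 0
        self.misses = 0

    def query(self, q: Point, k: int) -> List[Point]:
        for region in self.regions:
            if len(region.points) < k:
                continue
            cand = sorted(region.points, key=lambda p: math.dist(p, q))[:k]
            d_k = math.dist(cand[-1], q)
            # The ball around q with radius d_k lies strictly inside the
            # region, so the cached tuples contain the exact answer.
            if math.dist(q, region.center) + d_k < region.radius:
                self.hits += 1
                return cand
        # Miss: go to the underlying data and remember a larger region.
        self.misses += 1
        m = min(len(self.data), k * self.prefetch)
        fetched = brute_force_knn(self.data, q, m)
        # If everything was fetched, the region covers the whole dataset.
        radius = math.inf if m == len(self.data) else math.dist(fetched[-1], q)
        self.regions.append(Region(q, radius, fetched))
        return fetched[:k]
```

In this sketch a miss fetches more tuples than the query needs, so that later queries near the same point can be served from the cache. The strict inequality matters because tuples tied at exactly the region radius may not have been fetched.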

Keywords

persistent memory, cache mechanisms, region cluster
