Indexing through separable partitioning for complex data sharing in P2P systems

Abstract

Similarity search for content-based retrieval is a long-standing problem that many applications face. Most similarity measures aim to focus on the smallest possible set of elements to find an answer. In the literature, most work is based on splitting the target data set into subsets using balls. However, in the era of big data, where efficient indexing is of vital importance, the subspace volumes grow exponentially, which can degenerate the index. This problem arises from the inherent insufficiency of space partitioning, compounded by the overlap among the regions. It degrades the search algorithms and renders these methods ineffective, as it becomes hard to store, manage and analyze these quantities of data. A good topology should avoid biased allocation of objects to separable sets and should not influence the structure of the index. We put forward a novel indexing technique, the IMB-tree, which limits the volume of the space, excludes empty sets (separable partitions that contain no objects), and creates eXtended regions that are inserted into a new index, named the eXtended index, implemented in a P2P environment. These regions can reunite all objects in one of the subset partitions, either in a separable set or in the exclusion set, keeping the others empty. We also discuss the efficiency of the construction and search algorithms, as well as the quality of the index. The experimental results show interesting performance.

Keywords

Indexing, eXtended region, parallel, metric space, complex data

