Abstract
Similarity search for content-based retrieval is a long-standing problem with many applications. Most similarity search methods aim to examine the smallest possible set of elements to answer a query. In the literature, most work partitions the target data set into subsets using balls. However, in the era of big data, where efficient indexing is of vital importance, the volumes of the subspaces grow exponentially, which can degenerate the index. This problem arises from the inherent inefficiency of space partitioning, compounded by the overlap among regions. It degrades the search algorithms and renders these methods ineffective, because the resulting regions become hard to store, manage, and analyze. A good topology should avoid biased allocation of objects to separable sets and should not influence the structure of the index. We put forward a novel indexing technique, the IMB-tree, which limits the volume of the space, excludes empty sets (separable partitions that contain no objects), and creates eXtended regions that are inserted into a new index, named the eXtended index, implemented in a P2P environment. These regions can gather all objects into a single partition, either a separable set or the exclusion set, leaving the others empty. We also discuss the efficiency of the construction and search algorithms, as well as the quality of the index. The experimental results show promising performance.
