Abstract
Finding k-nearest neighbours (k-NN) is one of the most important primitives of many applications such as search engines and recommendation systems. However, its computational cost is extremely high when searching for k-NN points in a huge collection of high-dimensional points. Locality-sensitive hashing (LSH) has been introduced for an efficient k-NN approximation, but none of the existing LSH approaches clearly outperforms others. We propose a novel LSH approach, Signature Selection LSH (S2LSH), which finds approximate k-NN points very efficiently in various datasets. It first constructs a large pool of highly diversified signature regions with various sizes. Given a query point, it dynamically generates a query-specific signature region by merging highly effective signature regions selected from the signature pool. We also suggest S2LSH-M, a variant of S2LSH, which processes multiple queries more efficiently by using query-specific features and optimization techniques. Extensive experiments show the performance superiority of our approaches in diverse settings.
Keywords
Get full access to this article
View all access options for this article.
