What is it about?
The Nearest Neighbor Search (NNS) problem in high-dimensional space is fundamentally important in many applications, such as image database and data mining. In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the anchor for bucket partition. Accordingly, a query-aware LSH function under a specific l_p norm with p \in (0, 2] is a random projection coupled with query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. For each l_p norm (p \in (0,2]), based on the corresponding p-stable distribution, we propose a novel LSH scheme named query-aware LSH (QALSH) for c-ANNS over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio c > 1. In addition, we propose a heuristic variant named QALSH+ to improve the scalability of QALSH. Extensive experiments show that QALSH and QALSH+ outperform the state-of-the-art schemes, especially in high-dimensional spaces. Specifically, by using a ratio c < 2, QALSH can achieve much better query quality.
Photo by Elena Mozhvilo on Unsplash
Why is it important?
Vanilla LSH functions are constructed in a query-oblivious manner in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which leads to a high search error. Moreover, due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with an integer c ≥ 2. In order to fix those problems, we design a novel query-aware LSH function based on the query position. We also simply the function as a random projection coupled with the query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed.
Read the Original
This page is a summary of: Query-aware locality-sensitive hashing scheme for $$l_p$$ norm, The VLDB Journal, June 2017, Springer Science + Business Media,
You can read the full text:
The following have contributed to this page