What is it about?

The Nearest Neighbor Search (NNS) problem in high-dimensional space is fundamentally important in many applications, such as image database and data mining. In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the anchor for bucket partition. Accordingly, a query-aware LSH function under a specific l_p norm with p \in (0, 2] is a random projection coupled with query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. For each l_p norm (p \in (0,2]), based on the corresponding p-stable distribution, we propose a novel LSH scheme named query-aware LSH (QALSH) for c-ANNS over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio c > 1. In addition, we propose a heuristic variant named QALSH+ to improve the scalability of QALSH. Extensive experiments show that QALSH and QALSH+ outperform the state-of-the-art schemes, especially in high-dimensional spaces. Specifically, by using a ratio c < 2, QALSH can achieve much better query quality.

Featured Image

Why is it important?

Vanilla LSH functions are constructed in a query-oblivious manner in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which leads to a high search error. Moreover, due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with an integer c ≥ 2. In order to fix those problems, we design a novel query-aware LSH function based on the query position. We also simply the function as a random projection coupled with the query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed.


I hope this article could bring new thought to people who design basic data structures for NNS problem or other fundamental similarity search problems. The query-aware way for NNS is quite exciting and insightful, which might be the potential of the separate interest in other computer science fields.

Dr Qiang Huang
National University of Singapore

Read the Original

This page is a summary of: Query-aware locality-sensitive hashing scheme for $$l_p$$ norm, The VLDB Journal, June 2017, Springer Science + Business Media,
DOI: 10.1007/s00778-017-0472-7.
You can read the full text:




The following have contributed to this page