Distance Comparison Operators for Approximate Nearest Neighbor Search: Exploration and Benchmark

Zeyu Wang,Haoran Xiong,Zhenying He,Peng Wang,Wei wang
2024-03-20
Abstract:Approximate nearest neighbor search (ANNS) on high-dimensional vectors has become a fundamental and essential component in various machine learning tasks. Prior research has shown that the distance comparison operation is the bottleneck of ANNS, which determines the query and indexing performance. To overcome this challenge, some novel methods have been proposed recently. The basic idea is to estimate the actual distance with fewer calculations, at the cost of accuracy loss. Inspired by this, we also propose that some classical techniques and deep learning models can also be adapted to this purpose. In this paper, we systematically categorize the techniques that have been or can be used to accelerate distance approximation. And to help the users understand the pros and cons of different techniques, we design a fair and comprehensive benchmark, Fudist implements these techniques with the same base index and evaluates them on 16 real datasets with several evaluation metrics. Designed as an independent and portable library, Fudist is orthogonal to the specific index structure and thus can be easily utilized in the current ANNS library to achieve significant improvements.
Databases
What problem does this paper attempt to address?