Similarity search in the blink of an eye with compressed indices

Cecilia Aguerrebere, Ishwar Bhati, Mark Hildebrand, Mariano Tepper, Ted Willke
2023-07-25
Abstract: Nowadays, data is represented by vectors. Retrieving those vectors, among millions and billions, that are similar to a given query is a ubiquitous problem, known as similarity search, of relevance for a wide range of applications. Graph-based indices are currently the best performing techniques for billion-scale similarity search. However, their random-access memory pattern presents challenges to realize their full potential. In this work, we present new techniques and systems for creating faster and smaller graph-based indices. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that uses per-vector scaling and scalar quantization to improve search performance with fast similarity computations and a reduced effective bandwidth, while decreasing memory footprint and barely impacting accuracy. LVQ, when combined with a new high-performance computing system for graph-based similarity search, establishes the new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outcompetes the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x memory footprint reduction, and (2) in the high-throughput regime by 5.8x with 1.4x less memory.
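To make "per-vector scaling and scalar quantization" concrete, here is a minimal pure-Python sketch of one-level quantization of a single vector: center it with a dataset mean, take that vector's own minimum and maximum as bounds, and round each component to one of 2^B uniformly spaced levels. The function names `lvq_encode` and `lvq_decode`, and the choice to return the integer codes together with the lower bound and step size, are assumptions made for illustration; this is not the authors' implementation or the API of their library.

```python
from typing import List, Tuple


def lvq_encode(x: List[float], mu: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Quantize one vector with bounds computed from that vector alone (hypothetical helper)."""
    centered = [xi - mi for xi, mi in zip(x, mu)]    # subtract the dataset mean
    lo, hi = min(centered), max(centered)            # per-vector bounds
    levels = (1 << bits) - 1                         # 2^B - 1 steps between lo and hi
    delta = (hi - lo) / levels if hi > lo else 1.0   # guard against a constant vector
    # Round to the nearest level; clamp in case of floating-point overshoot.
    codes = [min(levels, int((c - lo) / delta + 0.5)) for c in centered]
    return codes, lo, delta


def lvq_decode(codes: List[int], lo: float, delta: float, mu: List[float]) -> List[float]:
    """Reconstruct an approximation of the original vector from its codes."""
    return [code * delta + lo + mi for code, mi in zip(codes, mu)]


if __name__ == "__main__":
    mu = [0.5, 0.5, 0.5, 0.5]
    codes, lo, delta = lvq_encode([1.0, 2.0, 4.0, 3.0], mu, bits=2)
    print(codes)                              # [0, 1, 3, 2]
    print(lvq_decode(codes, lo, delta, mu))   # [1.0, 2.0, 4.0, 3.0]
```

Because the bounds are taken per vector, each vector uses the full range of its B-bit codes, and the reconstruction error per component is at most half a step. The cost is two extra floating-point values stored alongside each vector's codes.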
Machine Learning, Information Retrieval
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the performance and memory-footprint problems of large-scale similarity search. Specifically, the authors propose a new compression method, Locally-adaptive Vector Quantization (LVQ), together with a high-performance graph-indexing system, to build faster and smaller graph-based indices. These techniques are especially suited to high-dimensional vector data at the scale of billions of vectors, as found in applications involving images, audio, video, text, genomics, and computer code.

### Background and challenges

In the era of deep learning, high-dimensional vectors have become the standard way to represent unstructured data such as images, audio, video, text, genomics, and computer code. These vectors are generated so that semantically related items lie close to each other under some similarity function. Finding the vectors most similar to a given query, known as similarity search, has therefore become a widespread problem. However, because of the scale of the data (billions of vectors, each with hundreds of dimensions) and the curse of dimensionality, exact nearest-neighbor search is impractical, so research has mainly focused on approximate methods. Graph-based approximate nearest-neighbor methods perform well in large-scale similarity search, but their random-access memory pattern limits their performance potential.

### Main contributions

1. **New compression algorithm**: The authors propose Locally-adaptive Vector Quantization (LVQ), which improves search performance through per-vector scaling and scalar quantization, reduces the effective bandwidth, and at the same time reduces the memory footprint with almost no impact on accuracy.
Under low-memory configurations, LVQ increases throughput by up to 20.7x relative to the second-best alternative while reducing the memory footprint by up to 3x; under high-throughput configurations, it increases throughput by 5.8x while using 1.4x less memory.
2. **Fast implementation**: Combined with an optimized graph-search algorithm, LVQ sets a new state of the art for performance and memory footprint in large-scale similarity search. The authors verify these contributions through experiments.
3. **Index construction**: The graph index can be built directly from LVQ-compressed vectors, which relieves memory pressure in this time-consuming step with minimal impact on index quality.
4. **Open-source framework**: The authors open-source a similarity-search library, allowing the research community to use their algorithms and large-scale search framework for experiments.
5. **New dataset and generator**: To promote similarity-search research in modern applications, the authors introduce a new dataset of 768-dimensional vectors generated by a large language model and open-source the code for generating it.

### Technical details

1. **Locally-adaptive Vector Quantization (LVQ)**:
   - **Definition**: LVQ reduces memory pressure through per-vector scaling and scalar quantization. It also includes a built-in two-level scheme, in which the residual of the first level is quantized as well, so that full-precision vectors need not be stored.
   - **Formula**: \[ Q(x) = [Q(x_1 - \mu_1; B, \ell, u), \ldots, Q(x_d - \mu_d; B, \ell, u)] \] where \( Q \) is the scalar quantization function with \( B \) bits, defined as: \[ Q(x; B, \ell, u) = \Delta \left\lfloor \frac{x - \ell}{\Delta} + \frac{1}{2} \right\rfloor + \ell, \quad \text{where} \quad \Delta = \frac{u - \ell}{2^B - 1} \] \( \mu = [\mu_1, \ldots, \mu_d] \) is the mean of all vectors, and the per-vector bounds \( u \) and \( \ell \) are defined as: \[ u = \max_j (x_j - \mu_j), \quad \ell = \min_j (x_j - \mu_j)