Abstract: Nowadays, data is represented by vectors. Retrieving those vectors, among millions and billions, that are similar to a given query is a ubiquitous problem, known as similarity search, of relevance for a wide range of applications. Graph-based indices are currently the best performing techniques for billion-scale similarity search. However, their random-access memory pattern presents challenges to realize their full potential. In this work, we present new techniques and systems for creating faster and smaller graph-based indices. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that uses per-vector scaling and scalar quantization to improve search performance with fast similarity computations and a reduced effective bandwidth, while decreasing memory footprint and barely impacting accuracy. LVQ, when combined with a new high-performance computing system for graph-based similarity search, establishes the new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outcompetes the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x memory footprint reduction, and (2) in the high-throughput regime by 5.8x with 1.4x less memory.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the performance and memory footprint of large-scale similarity search. Specifically, the authors propose a new compression method, Locally-adaptive Vector Quantization (LVQ), together with a high-performance graph-indexing system, to build faster and smaller graph-based indices. These techniques are especially suited to high-dimensional vector data at the scale of billions of vectors, as found in applications involving images, audio, video, text, genomics and computer code.

### Background and challenges

In the era of deep learning, high-dimensional vectors have become the standard way to represent unstructured data. For example, images, audio, video, text, genomics and computer code can all be represented as high-dimensional vectors. These vectors are generated so that semantically related items lie close to each other under a given similarity function. Finding the vectors most similar to a given query vector has therefore become a widespread problem, known as similarity search.

However, because of the scale of the data (billions of vectors, each with hundreds of dimensions) and the curse of dimensionality, exact nearest-neighbor search is impractical, so research has focused mainly on approximate methods. Graph-based approximate nearest-neighbor methods perform well in large-scale similarity search, but their random-access memory pattern limits their performance potential.

### Main contributions

1. **New compression algorithm**: The authors propose Locally-adaptive Vector Quantization (LVQ), which uses per-vector scaling and scalar quantization to improve search performance, reduce the effective bandwidth, and reduce the memory footprint with almost no impact on accuracy. Compared with the second-best alternatives, in the low-memory regime LVQ increases throughput by up to 20.7 times and reduces the memory footprint by up to 3 times; in the high-throughput regime, it increases throughput by 5.8 times while using 1.4 times less memory.
2. **Fast implementation**: Combined with an optimized graph-search algorithm, LVQ sets a new state of the art for performance and memory footprint in large-scale similarity search. The authors verify these contributions through experiments.
3. **Index construction**: With LVQ, a graph index can be built directly from compressed vectors, which relieves memory pressure in this time-consuming step while having minimal impact on index quality.
4. **Open-source framework**: The authors open-source a similarity-search library, allowing the research community to experiment with their algorithms and large-scale search framework.
5. **New dataset and generator**: To promote similarity-search research for modern applications, the authors introduce a new dataset of 768-dimensional vectors generated by a large language model and open-source the code for generating it.

### Technical details

1. **Locally-adaptive Vector Quantization (LVQ)**:
   - **Definition**: LVQ reduces memory pressure through per-vector scaling and scalar quantization and has a built-in two-level quantization-margin system to avoid storing full-precision vectors.
   - **Formula**: For a vector \( x = [x_1, \ldots, x_d] \),
     \[ Q(x) = [Q(x_1 - \mu_1; B, \ell, u), \ldots, Q(x_d - \mu_d; B, \ell, u)] \]
     where \( Q \) is the scalar quantization function with \( B \) bits, defined as:
     \[ Q(x; B, \ell, u) = \Delta \left\lfloor \frac{x - \ell}{\Delta} + \frac{1}{2} \right\rfloor + \ell, \quad \text{where} \quad \Delta = \frac{u - \ell}{2^B - 1} \]
     \( \mu = [\mu_1, \ldots, \mu_d] \) is the mean of all vectors, and the bounds \( u \) and \( \ell \) are computed individually for each vector \( x \) as:
     \[ u = \max_j (x_j - \mu_j), \quad \ell = \min_j (x_j - \mu_j) \]
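The formula above can be illustrated with a minimal NumPy sketch of single-level LVQ. This is not the authors' implementation or their library's API: the function names `lvq_encode` and `lvq_decode` are hypothetical, the sketch assumes \( B \le 8 \) so that each code fits in one byte, and it stores the per-vector constants \( \ell \) and \( \Delta \) as full-precision floats for simplicity. It only shows how the dataset mean, the per-vector bounds, and the rounding step fit together.

```python
import numpy as np


def lvq_encode(X, mu, B=8):
    """Quantize each row of X to B-bit integer codes (assumes B <= 8).

    Returns the codes plus the per-vector lower bound and step size
    needed to reconstruct the vectors.
    """
    R = X - mu                              # subtract the dataset mean
    lo = R.min(axis=1, keepdims=True)       # per-vector lower bound (ell)
    hi = R.max(axis=1, keepdims=True)       # per-vector upper bound (u)
    levels = 2 ** B - 1
    delta = (hi - lo) / levels              # per-vector step size (Delta)
    delta = np.where(delta == 0, 1.0, delta)  # guard against constant vectors
    # Round to the nearest level: floor((x - ell) / Delta + 1/2)
    codes = np.floor((R - lo) / delta + 0.5)
    codes = np.clip(codes, 0, levels).astype(np.uint8)
    return codes, lo, delta


def lvq_decode(codes, lo, delta, mu):
    """Reconstruct approximate vectors: Delta * code + ell + mu."""
    return codes * delta + lo + mu


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 64))          # toy data, not a real dataset
    mu = X.mean(axis=0)
    codes, lo, delta = lvq_encode(X, mu, B=8)
    X_hat = lvq_decode(codes, lo, delta, mu)
    print("max abs reconstruction error:", np.abs(X - X_hat).max())
```

Because the bounds are chosen per vector, the step size \( \Delta \) adapts to each vector's own range, and the reconstruction error of every component is at most \( \Delta / 2 \) for that vector.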

Similarity search in the blink of an eye with compressed indices

Locally-Adaptive Quantization for Streaming Vector Search

Fast Adaptive Similarity Search through Variance-Aware Quantization

Vector and Line Quantization for Billion-scale Similarity Search on GPUs

Fast Additive Quantization for Vector Compression in Nearest Neighbor Search.

An Improved Fast Encoding Method For Vector Quantization Based On Memory Efficient Data Structure

LeanVec: Searching vectors faster by making them fit

Fast Search In Large-Scale Image Database Using Vector Quantization

Billion-Scale Similarity Search with GPUs

Indexing very high-dimensional sparse and quasi-sparse vectors for similarity searches

A Fast Full Search Equivalent Encoding Method for Vector Quantization by Using Appropriate Features

Fast Image Search Using Vector Quantization

Compact Projection: Simple and Efficient Near Neighbor Search with Practical Memory Requirements

Efficient Similarity Search by Summarization in Large Video Database

A New Fast Encoding Algorithm For Data Compression

Robust Quantization for General Similarity Search.

Approximate search with quantized sparse representations

A FAST SEARCH METHOD FOR VECTOR QUANTIZATION USING 2-PIXEL-MERGING SUM PYRAMID IN RECURSIVE WAY

Fast High-dimensional Approximate Nearest Neighbor Search with Efficient Index Time and Space

Multi-stage vector quantization towards low bit rate visual search

Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment