Abstract: There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for over ten years, and similarity search for more than half a century. Driving this shift from algorithms to systems are new data-intensive applications, notably large language models, that demand vast stores of unstructured data coupled with reliable, secure, fast, and scalable query processing capability. A variety of new data management techniques now exist for addressing these needs, but no comprehensive survey yet reviews these techniques and systems. We start by identifying five main obstacles to vector data management, namely vagueness of semantic similarity, large size of vectors, high cost of similarity comparison, lack of natural partitioning that can be used for indexing, and difficulty of efficiently answering hybrid queries that require both attributes and vectors. Overcoming these obstacles has led to new approaches to query processing, storage and indexing, and query optimization and execution. For query processing, a variety of similarity scores and query types are now well understood; for storage and indexing, techniques include vector compression, namely quantization, and partitioning based on randomization, learned partitioning, and navigable partitioning; for query optimization and execution, we describe new operators for hybrid queries, as well as techniques for plan enumeration, plan selection, and hardware-accelerated execution. These techniques lead to a variety of VDBMSs across a spectrum of design and runtime characteristics, including native systems specialized for vectors and extended systems that incorporate vector capabilities into existing systems. We then discuss benchmarks, and finally we outline research challenges and point the direction for future work.
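To make the terms "similarity score" and "hybrid query" concrete, the following is a minimal sketch of an exact (brute-force) k-nearest-neighbor search with an optional attribute filter. It is an illustration only, not a method taken from the survey: the names `Record`, `knn`, and `predicate` are hypothetical, and the linear scan is exactly the cost that the indexing and quantization techniques mentioned above aim to avoid.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical record layout: (attributes, vector). Not from the survey.
Record = Tuple[Dict[str, object], Vector]


def inner_product(a: Vector, b: Vector) -> float:
    """Inner (dot) product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; larger means more similar."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / norm


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(
    query: Vector,
    records: List[Record],
    k: int,
    score: Callable[[Vector, Vector], float],
    higher_is_better: bool = True,
    predicate: Optional[Callable[[Dict[str, object]], bool]] = None,
) -> List[Tuple[int, float]]:
    """Exact k-NN by linear scan, returning (record index, score) pairs.

    If `predicate` is given, records whose attributes fail it are dropped
    before scoring (a simple pre-filtering form of a hybrid query).
    """
    scored = [
        (i, score(query, vec))
        for i, (attrs, vec) in enumerate(records)
        if predicate is None or predicate(attrs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=higher_is_better)
    return scored[:k]


records: List[Record] = [
    ({"lang": "en"}, [1.0, 0.0]),
    ({"lang": "de"}, [0.0, 1.0]),
    ({"lang": "en"}, [1.0, 1.0]),
]
query = [1.0, 0.0]

# Plain similarity query, ranked by cosine similarity.
top_cosine = knn(query, records, k=2, score=cosine_similarity)

# Hybrid query: nearest vectors by L2 distance among records with lang == "de".
top_hybrid = knn(
    query,
    records,
    k=2,
    score=euclidean_distance,
    higher_is_better=False,
    predicate=lambda attrs: attrs["lang"] == "de",
)
```

The choice of score changes the ranking direction (similarities are maximized, distances minimized), which is why the sketch passes `higher_is_better` explicitly. Filtering before scoring is only one possible strategy; a system could instead filter after retrieval or during index traversal.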

Vector Database Management Techniques and Systems

Survey of Vector Database Management Systems

A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge

Vector database management systems: Fundamental concepts, use-cases, and current challenges

When Large Language Models Meet Vector Databases: A Survey

Manu: A Cloud Native Vector Database Management System

Quantixar: High-performance Vector Data Management System

Fast Search In Large-Scale Image Database Using Vector Quantization

Approximate Vector Set Search: A Bio-Inspired Approach for High-Dimensional Spaces

Vector Spatial Big Data Storage and Optimized Query Based on the Multi-Level Hilbert Grid Index in HBase

VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search

Distributed Vector Quantization Based On Kullback-Leibler Divergence

VQ Image Coding Using Sub-Vector Techniques.

Color Image Retrieval Utilizing Extended Fast Vq Codeword Search Technique And Vector Composition-Based Feedback

The Faiss library

Foundations of Vector Retrieval

Domain-specific website recognition using hybrid vector space model

Curator: Efficient Indexing for Multi-Tenant Vector Databases

Research on Integrated Operation of Vector and Raster Data in Object-Relational Database

Modeling Image Data for Effective Indexing and Retrieval in Large General Image Databases.