Efficient Approximate Maximum Inner Product Search over Sparse Vectors

Xi Zhao,Zhonghan Chen,Kai Huang,Ruiyuan Zhang,Bolong Zheng,Xiaofang Zhou
DOI: https://doi.org/10.1109/icde60146.2024.00303
2024-01-01
Abstract:The maximum inner product search (MIPS) problem in high-dimensional vector spaces has various applications, primarily driven by the success of deep neural network-based embedding models. Existing MIPS methods designed for dense vectors using approximate techniques like locality-sensitive hashing (LSH) have been well studied, but they are not efficient and effective for searching sparse vectors due to the near-orthogonality among the sparse vectors. The solutions to MIPS over sparse vectors rely heavily on inverted lists, resulting in poor query efficiency, particularly when dealing with large-scale sparse datasets. In this paper, we introduce SOSIA, a novel framework specifically tailored to address these limitations. To handle sparsity, we propose the SOS transformation, which converts sparse vectors into a binary space while providing an unbiased estimator of the inner product between any two vectors. Additionally, we develop a minHash-based index to enhance query efficiency. We provide a theoretical analysis on the query quality of SOSIA and present extensive experiments on real-world sparse datasets to validate its effectiveness. The experimental results demonstrate its superior performance in terms of query efficiency and accuracy compared to existing methods.
What problem does this paper attempt to address?