ProMIPS: Efficient High-Dimensional C-Approximate Maximum Inner Product Search with a Lightweight Index

Yang Song,Yu Gu,Rui Zhang,Ge Yu
DOI: https://doi.org/10.1109/icde51399.2021.00143
2021-01-01
Abstract:Due to the wide applications in recommendation systems, multi-class label prediction and deep learning, the Maximum Inner Product (MIP) search problem has received extensive attention in recent years. Faced with large-scale datasets containing high-dimensional feature vectors, the state-of-the-art LSH-based methods usually require a large number of hash tables or long hash codes to ensure the searching quality, which takes up lots of index space and causes excessive disk page accesses. In this paper, we relax the guarantee of accuracy for efficiency and propose an efficient method for c-Approximate Maximum Inner Product (c-AMIP) search with a lightweight iDistance index. We project high-dimensional points to low-dimensional ones via 2-stable random projections and derive probability-guaranteed searching conditions, by which the c-AMIP results can be guaranteed in accuracy with arbitrary probabilities. To further improve the efficiency, we propose Quick-Probe for quickly determining the searching bound satisfying the derived condition in advance, avoiding the inefficient incremental searching process. Extensive experimental evaluations on four real datasets demonstrate that our method requires less pre-processing cost including index size and pre-processing time. In addition, compared to the state-of-the-art benchmark methods, it provides superior results on searching quality in terms of overall ratio and recall, and efficiency in terms of page access and running time.
What problem does this paper attempt to address?