DForest: A Minimal Dimensionality-Aware Indexing for High-Dimensional Exact Similarity Search

Lingli Li, Wenjing Sun, Baohua Wu
DOI: https://doi.org/10.1109/tkde.2024.3381111
IF: 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Similarity search in high-dimensional space is a fundamental problem with numerous applications in computer science, yet it remains challenging because of the curse of dimensionality. This paper introduces DForest, an indexing approach that addresses this challenge for both range and kNN queries on high-dimensional data. Previous similarity search approaches apply the same fixed dimensionality reduction to all objects. Our approach instead determines, for each object, the minimal dimensionality that keeps the loss within a specified threshold, and reduces each object's dimensionality individually. Query performance is further optimized by deriving upper and lower bounds for retrieved blocks and by computing distances preferentially in a low-dimensional embedding space. A theoretical analysis supports our search strategy, and extensive experiments on a variety of datasets show that DForest outperforms state-of-the-art methods.
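The abstract's two ingredients, a per-object dimensionality chosen under a loss threshold and bound-based filtering before exact distance computation, can be illustrated with a minimal sketch. It is not DForest's algorithm. It assumes the vectors have already been rotated into an orthonormal basis (for example by PCA) whose coordinates are ordered by decreasing importance, it takes an object's "loss" to be the Euclidean norm of the coordinates it drops, and the names `Reduced`, `reduce_object`, `bounds`, and `range_query` are hypothetical. Because the dropped tails of the query and the object satisfy | ‖q_t‖ − ‖x_t‖ | ≤ ‖q_t − x_t‖ ≤ ‖q_t‖ + ‖x_t‖, storing the kept prefix plus one tail norm per object yields a lower and an upper bound on the exact distance.

```python
import math
import random
from dataclasses import dataclass

# Illustrative sketch only, not DForest's actual index. Assumes every vector is
# already expressed in an orthonormal basis (e.g. after PCA) whose coordinates
# are ordered by decreasing importance. All names here are hypothetical.


@dataclass
class Reduced:
    prefix: list[float]  # first d(x) coordinates kept for this object
    tail_norm: float     # Euclidean norm of the dropped coordinates (the "loss")


def suffix_norms(v):
    """out[d] is the Euclidean norm of v[d:], for d = 0..len(v)."""
    out = [0.0] * (len(v) + 1)
    acc = 0.0
    for i in range(len(v) - 1, -1, -1):
        acc += v[i] * v[i]
        out[i] = math.sqrt(acc)
    return out


def reduce_object(x, eps):
    """Keep the shortest prefix whose dropped tail has norm <= eps (eps >= 0)."""
    tails = suffix_norms(x)
    d = next(k for k in range(len(x) + 1) if tails[k] <= eps)
    return Reduced(prefix=list(x[:d]), tail_norm=tails[d])


def bounds(q_tails, q, r):
    """Lower and upper bounds on dist(q, x) computed from the reduced form r of x."""
    d = len(r.prefix)
    # exact squared distance on the kept coordinates (zip stops after d items)
    head_sq = sum((a - b) ** 2 for a, b in zip(q, r.prefix))
    qt = q_tails[d]  # norm of the query's coordinates beyond d
    lo = math.sqrt(head_sq + (qt - r.tail_norm) ** 2)
    hi = math.sqrt(head_sq + (qt + r.tail_norm) ** 2)
    return lo, hi


def range_query(data, reduced, q, radius):
    """Exact range query: prune by lower bound, accept by upper bound, refine the rest."""
    q_tails = suffix_norms(q)  # computed once per query, reused for every d(x)
    hits, refined = [], 0
    for i, r in enumerate(reduced):
        lo, hi = bounds(q_tails, q, r)
        if lo > radius:
            continue                # cannot lie within the radius
        if hi <= radius:
            hits.append(i)          # certainly within the radius; full vector not needed
            continue
        refined += 1                # undecided: fall back to the full vector
        if math.dist(q, data[i]) <= radius:
            hits.append(i)
    return hits, refined


# Usage on synthetic data whose coordinate scale shrinks with the index.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0 / (j + 1)) for j in range(16)] for _ in range(200)]
reduced = [reduce_object(x, eps=0.3) for x in data]
hits, refined = range_query(data, reduced, data[0], radius=0.8)
print(len(hits), "hits;", refined, "of", len(data), "objects needed the full vector")
```

Objects whose lower bound exceeds the radius are pruned and those whose upper bound falls within it are accepted without touching the full vector; only the remaining objects pay for an exact distance, so the answer matches a brute-force scan while each object keeps as many coordinates as its own loss threshold requires.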
Categories: Computer Science, Information Systems; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
What problem does this paper attempt to address?