Indexing very high-dimensional sparse and quasi-sparse vectors for similarity searches

Changzhou Wang, X. Sean Wang
DOI: https://doi.org/10.1007/s007780100036
2001-01-01
Abstract: Similarity queries on complex objects are usually translated into searches among their feature vectors. This paper studies indexing techniques for very high-dimensional vectors (e.g., with hundreds of dimensions) that are sparse or quasi-sparse, i.e., vectors that each have only a small number (e.g., ten) of non-zero or significant values. Building on the R-tree, the paper introduces the xS-tree, which uses lossy compression of bounding regions to guarantee a reasonable minimum fan-out within the storage space allocated to each node. In addition, the paper studies the performance and scalability of the xS-tree via experiments.
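To make the terms in the abstract concrete, the following Python sketch shows one possible way to store sparse vectors and to shrink a bounding region lossily. It is an illustration only and is not the xS-tree algorithm from the paper: the dict-based storage, the function names (`sparse_distance`, `bounding_region`, `compress_region`, `contains`), and the "keep the k widest dimensions, share one interval for the rest" rule are all assumptions made for this example. The point it demonstrates is that a compressed region can take bounded space while still enclosing every vector it summarizes, at the cost of also admitting some vectors the exact region would exclude.

```python
from math import sqrt

def sparse_distance(u, v):
    """Euclidean distance between sparse vectors stored as {dimension: value} dicts."""
    dims = u.keys() | v.keys()  # only dimensions where either vector is non-zero
    return sqrt(sum((u.get(d, 0.0) - v.get(d, 0.0)) ** 2 for d in dims))

def bounding_region(vectors):
    """Exact per-dimension (low, high) bounds; unlisted dimensions are implicitly (0, 0)."""
    dims = set().union(*vectors)
    return {
        d: (min(v.get(d, 0.0) for v in vectors), max(v.get(d, 0.0) for v in vectors))
        for d in dims
    }

def compress_region(region, k):
    """Hypothetical lossy scheme: keep exact bounds for the k widest dimensions and
    cover every other dimension with one shared interval that includes zero."""
    ranked = sorted(region, key=lambda d: region[d][1] - region[d][0], reverse=True)
    kept = {d: region[d] for d in ranked[:k]}
    dropped = [region[d] for d in ranked[k:]]
    default = (
        min([0.0] + [lo for lo, _ in dropped]),
        max([0.0] + [hi for _, hi in dropped]),
    )
    return kept, default

def contains(compressed, vec):
    """True if vec lies inside the compressed region."""
    kept, default = compressed
    for d in kept.keys() | vec.keys():
        lo, hi = kept.get(d, default)
        if not lo <= vec.get(d, 0.0) <= hi:
            return False
    return True

# Three sparse vectors in a space with hundreds of dimensions.
vectors = [{3: 0.9, 120: 0.4}, {3: 0.7, 250: 0.2}, {17: 0.5, 120: 0.1}]
exact = bounding_region(vectors)
compressed = compress_region(exact, k=2)
```

In this sketch the compressed region stores at most `k` exact intervals plus one shared interval, so its size is bounded regardless of how many dimensions the exact region touches. The trade-off is looser bounds: a vector such as `{250: 0.3}` falls outside the exact region but inside the compressed one.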