Aggregating Tree for Searching in Billion Scale High Dimensional Data

Shicong Liu, Junru Shao, Hongtao Lu
DOI: https://doi.org/10.1109/icdmw.2016.0110
2016-01-01
Abstract: We present a novel nearest neighbor search scheme for high dimensional data, named aggregating tree (A-Tree), that uses vector quantization encodings (VQ-encodings) to build a radix tree and performs the nearest neighbor search by beam search. To search accurately and efficiently, we suggest that VQ-encodings satisfy a locally aggregating encoding criterion: for any node of the corresponding A-Tree, neighboring vectors should aggregate in fewer subtrees to make beam search efficient. We suggest two further criteria for effective VQ-encodings, which resemble the balanced and uncorrelated bit criteria for hashing codes. We use generalized residual vector quantization (GRVQ) encodings to build the A-Tree to meet the suggested criteria, and this combination shows significantly better performance. Our methods are validated on several standard benchmark datasets, including one containing a billion vectors. Experimental results show the superior efficiency and effectiveness of our proposed methods compared to the state of the art.
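The idea of indexing VQ-encodings in a tree and querying it by beam search can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain greedy residual quantization rather than GRVQ, hand-picked toy codebooks, an uncompressed trie in place of a radix tree, and hypothetical function names (encode, build_tree, beam_search) chosen only for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, codebooks):
    """Greedy residual encoding (assumption: not GRVQ).
    At each stage pick the codeword nearest to the residual, then subtract it."""
    residual = list(vec)
    code = []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda i: sq_dist(residual, book[i]))
        code.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return tuple(code)


def build_tree(codes):
    """Uncompressed trie over the codes; each leaf is a list of vector ids."""
    root = {}
    for vid, code in enumerate(codes):
        node = root
        for idx in code[:-1]:
            node = node.setdefault(idx, {})
        node.setdefault(code[-1], []).append(vid)
    return root


def beam_search(query, root, codebooks, beam_width):
    """Descend one tree level per codebook, keeping only the beam_width
    subtrees whose partial reconstruction is closest to the query."""
    beam = [(0.0, [0.0] * len(query), root)]
    for book in codebooks:
        candidates = []
        for _, recon, node in beam:
            for idx, child in node.items():
                new_recon = [r + c for r, c in zip(recon, book[idx])]
                candidates.append((sq_dist(query, new_recon), new_recon, child))
        candidates.sort(key=lambda t: t[0])
        beam = candidates[:beam_width]
    # After the last level every surviving node is a leaf (list of ids).
    return [vid for _, _, ids in beam for vid in ids]


# Toy 2-D data with two hand-picked codebooks (coarse, then fine).
codebooks = [
    [(0, 0), (10, 0), (0, 10), (10, 10)],
    [(0, 0), (1, 0), (0, 1), (1, 1)],
]
vectors = [(0, 0), (1, 1), (10, 0), (11, 1), (0, 10), (10, 11)]
codes = [encode(v, codebooks) for v in vectors]
tree = build_tree(codes)
result = beam_search((10.8, 0.9), tree, codebooks, beam_width=2)
print(result)  # ids ordered by quantized distance: [3, 2]
```

In this toy setting the two vectors nearest the query share the same first-level subtree, so a narrow beam still reaches both; that is the situation the locally aggregating criterion aims to make common.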