Fast KNN search for big data with set compression tree and best bin first

Zhenjie Chen, Jingqi Yan
DOI: https://doi.org/10.1109/CCIOT.2016.7868311
2016-01-01
Abstract: This paper proposes a k-nearest-neighbor (kNN) search method based on the set compression tree (SCT) and best bin first (BBF) to address kNN search on big data. The set compression tree achieves a high compression rate by compressing the set of descriptors jointly instead of on a per-descriptor basis, so it performs well in kNN search at a low bit rate. Best bin first (BBF), in turn, is a very efficient algorithm for finding approximate k nearest neighbors among a large number of high-dimensional feature descriptors. SCT-BBF is a novel approach that improves search performance in three respects. First, SCT-BBF has a smaller memory footprint, which is important in the big data age. Second, it increases accuracy compared with traditional methods such as the KD-tree and the original SCT; SCT-BBF can also be combined with other data processing methods such as PCA and SIFT to perform better. Third, the paper adopts a finer search to increase accuracy at a slight loss of speed. The method also extends easily to big data.
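To make the BBF idea mentioned in the abstract concrete, the following is a minimal sketch of best-bin-first search on a plain KD-tree. It is not the paper's SCT-BBF method, whose details the abstract does not give, and it involves no compression. The names `build_kdtree`, `sq_dist`, `bbf_knn`, and the `max_checks` budget are hypothetical, chosen only for illustration. Branches are visited in order of a lower bound on their distance to the query, and the search stops once the budget of examined nodes is spent, which is what makes the result approximate.

```python
import heapq
import itertools


def build_kdtree(points, depth=0):
    """Build a KD-tree as nested dicts, splitting on axes in round-robin order."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bbf_knn(root, query, k, max_checks):
    """Approximate kNN: examine at most max_checks nodes, closest branches first.

    Returns a list of (squared distance, point) pairs sorted by distance.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    # Min-heap of (lower bound on squared distance to the branch, tie, node).
    frontier = [(0, next(tie), root)]
    best = []  # max-heap of the k best so far, stored as (-squared distance, point)
    checks = 0
    while frontier and checks < max_checks:
        bound, _, node = heapq.heappop(frontier)
        if node is None:
            continue
        if len(best) == k and bound >= -best[0][0]:
            break  # every remaining branch is at least this far away
        checks += 1
        d = sq_dist(node["point"], query)
        if len(best) < k:
            heapq.heappush(best, (-d, node["point"]))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, node["point"]))
        diff = query[node["axis"]] - node["point"][node["axis"]]
        near, far = ("left", "right") if diff < 0 else ("right", "left")
        # The near side keeps the parent's bound; the far side is at least
        # diff**2 away because it lies across the splitting plane.
        heapq.heappush(frontier, (bound, next(tie), node[near]))
        heapq.heappush(frontier, (max(bound, diff * diff), next(tie), node[far]))
    return sorted((-neg_d, p) for neg_d, p in best)


# Toy data, assumed for illustration only.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(bbf_knn(tree, (9, 2), k=2, max_checks=100))
```

With a generous `max_checks` the search in this sketch is exact; lowering it trades accuracy for speed, which is the same kind of trade-off the abstract describes for its finer search.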