Large Visual Words For Large Scale Image Classification

Sheng Tang, Hui Chen, Ke Lv, Yong-Dong Zhang
DOI: https://doi.org/10.1109/ICIP.2015.7350984
2015-01-01
Abstract: Recently, using large visual vocabularies or codebooks to quantize and partition the set of local feature descriptors into a large set of disjoint subsets, termed visual words (or large visual words), has become an important research topic in solving many computer vision problems, including near-duplicate image retrieval and object retrieval, among others. Generally, large visual words impose a heavy burden in time and memory on both the construction of the large vocabulary and the search process, especially in large-scale applications. In this paper, we present an efficient approach for generating large visual words from a very compact vocabulary, namely two dictionaries learned with sparse non-negative matrix factorization (NMF). After piecewise sparse decomposition of the features with the two learned dictionaries, we map the pair of indices of the dictionary bases corresponding to the maximum elements of the two sparse codes to a large set of visual words, on the assumption that data with similar properties share the same base with the largest sparse coefficient. With an inverted file structure built over the large visual words, K-nearest neighbors (KNN) can be retrieved efficiently. We can therefore classify images very efficiently by incorporating our fast KNN search based on large visual words into the SVM-KNN method. Experiments on the public Oxford dataset and the ACM Multimedia 2013 Yahoo! image classification challenge dataset show that our approach is both effective and efficient.
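The index-pair mapping and the inverted file described in the abstract can be illustrated with a minimal Python sketch. It assumes the two sparse codes of each feature are already computed (the NMF dictionary learning is not shown), and it assumes the word id is formed as `i * K + j`, where `K` is the size of the second dictionary; the abstract only states that the index pair is mapped to a visual word, so that formula and all function names here are hypothetical.

```python
from collections import defaultdict


def argmax(values):
    """Index of the largest element (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def visual_word(code_a, code_b):
    """Map the argmax indices of two sparse codes to one word id.

    Assumption: the id is i * len(code_b) + j. The abstract only says
    the index pair is mapped to a visual word, not how.
    """
    return argmax(code_a) * len(code_b) + argmax(code_b)


def build_inverted_file(coded_features):
    """Group feature ids by visual word.

    coded_features: iterable of (feature_id, code_a, code_b) tuples.
    """
    inverted = defaultdict(list)
    for feature_id, code_a, code_b in coded_features:
        inverted[visual_word(code_a, code_b)].append(feature_id)
    return inverted


def candidates(inverted, code_a, code_b):
    """Features sharing the query's visual word (KNN candidates)."""
    return inverted.get(visual_word(code_a, code_b), [])


# Toy data: two dictionaries with 3 bases each give 3 * 3 = 9 visual words.
database = [
    ("f0", [0.9, 0.0, 0.1], [0.0, 0.7, 0.0]),
    ("f1", [0.8, 0.1, 0.0], [0.1, 0.6, 0.0]),
    ("f2", [0.0, 0.0, 0.5], [0.4, 0.0, 0.0]),
]
index = build_inverted_file(database)
query_hits = candidates(index, [0.7, 0.2, 0.0], [0.0, 0.9, 0.1])
```

In this toy case the query shares its word with `f0` and `f1`, so only those two features would need to be ranked by distance. Under the assumed mapping, two dictionaries of K bases each index K × K words while storing only 2K bases, which is one way to read the "very compact vocabulary" claim.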