Deep Learning for Image Retrieval: What Works and What Doesn't

Huafeng Wang, Yehe Cai, Yanxiang Zhang, Haixia Pan, Weifeng Lv, Hao Han
DOI: https://doi.org/10.1109/ICDMW.2015.121
2015-01-01
Abstract: Building an industrial content-based image retrieval system (CBIRs) requires careful consideration of feature extraction, feature processing, and feature indexing. Although research in recent years suggests that the convolutional neural network (CNN) holds a leading position in feature extraction and representation for CBIRs, there is little guidance based on in-depth analysis of feature-related questions: which of the feature representations provided by a CNN performs best, how well the extracted features generalize, how dimensionality reduction relates to accuracy loss, which distance measure works best, and how much coding techniques improve efficiency. This work therefore conducts several practical studies and a thorough analysis in an attempt to answer these questions. Results on both ImageNet-2012 and an industrial dataset provided by Sogou demonstrate that fc4096a and fc4096b perform best on datasets from unseen categories. Several interesting and practical conclusions are drawn; for instance, fc4096a and fc4096b are found to generalize better than other CNN features and could be considered the first choice for industrial CBIRs. Furthermore, a novel feature binarization approach is presented for better CBIRs efficiency. More specifically, the binarization reduces space usage by 31/32 relative to the original data. In sum, the conclusions appear to offer practical guidance for real industrial CBIRs.
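The 31/32 figure is consistent with replacing each 32-bit floating-point feature value by a single bit. The abstract does not describe the binarization method itself, so the following is only a minimal sketch of the storage arithmetic. It assumes float32 features, a simple fixed threshold at zero, and Hamming distance for comparing codes; the names `binarize`, `hamming`, and `DIM` are hypothetical and not taken from the paper.

```python
import random
from array import array

DIM = 4096  # assumed dimensionality of an fc4096a/fc4096b-style feature


def binarize(feature, threshold=0.0):
    """Pack one bit per dimension: 1 if value > threshold, else 0.

    Thresholding at a fixed value is an assumption made for illustration;
    it is not claimed to be the paper's binarization approach.
    """
    packed = bytearray((len(feature) + 7) // 8)
    for i, value in enumerate(feature):
        if value > threshold:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def hamming(a, b):
    """Number of differing bits between two equal-length binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Synthetic stand-in for a CNN feature vector, stored as 32-bit floats.
rng = random.Random(0)
feature = array("f", (rng.gauss(0.0, 1.0) for _ in range(DIM)))
code = binarize(feature)

float_bytes = len(feature) * feature.itemsize  # bytes used by the float vector
code_bytes = len(code)                         # bytes used by the packed bits
saving = 1 - code_bytes / float_bytes          # fraction of space saved
```

Under these assumptions a 4096-dimensional feature shrinks from 16384 bytes to 512 bytes, and binary codes can be compared with cheap bitwise operations instead of floating-point distance computations.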