Database Images Multi-modal Canonical View Mining Discrete Binary Embedding Visual Preserving Latent Structure Extraction Discrete Optimization Canonical

Lei Zhu, Zi Huang, Xiaobai Liu, Xiangnan He, Jingkuan Song, Xiaofang Zhou
2018-01-01
Abstract: Mobile landmark search (MLS) has recently received increasing attention for its great practical value. However, it remains unsolved because of two important challenges: the high bandwidth consumption of query transmission, and the large visual variations among query images sent from mobile devices. In this paper, we propose a novel hashing scheme, named canonical view based discrete multi-modal hashing (CV-DMH), that handles these problems with a three-stage learning procedure. First, a submodular function is designed to measure the visual representativeness and redundancy of a view set. With it, canonical views, which capture the key visual appearances of a landmark with limited redundancy, are efficiently discovered by an iterative mining strategy. Second, multi-modal sparse coding is applied to transform visual features from multiple modalities into an intermediate representation, which robustly and adaptively characterizes the visual contents of varied landmark images in terms of certain canonical views. Finally, compact binary codes are learned on the intermediate representation within a tailored discrete binary embedding model, which preserves the visual relations of images measured with canonical views and removes the involved noise. For this stage, we develop a new augmented Lagrangian multiplier (ALM) based optimization method that solves for the discrete binary codes directly. It not only handles the discrete constraint explicitly but also accounts for the bit-uncorrelated constraint and the balance constraint jointly. The proposed solution thus avoids the accumulated quantization errors of conventional optimization methods, which simply adopt a two-step “relaxing+rounding” framework. With CV-DMH, robust visual query processing, low-cost query transmission, and a fast search process are supported simultaneously. Experiments on real-world landmark datasets demonstrate the superior performance of CV-DMH over several state-of-the-art methods.
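
The iterative mining of canonical views in the first stage can be illustrated with a generic greedy maximization of a submodular coverage score. The abstract does not state the actual submodular function, so the sketch below substitutes the standard facility-location objective as a stand-in. The function name `greedy_canonical_views`, the `similarity` matrix, and the budget `k` are hypothetical and are not taken from the paper. Under this objective, a view that resembles an already selected view yields little marginal gain, which is one simple way to favor representativeness while limiting redundancy.

```python
import numpy as np


def greedy_canonical_views(similarity, k):
    """Greedily pick up to k views under a facility-location score.

    Assumption: `similarity` is an (n, n) array of non-negative pairwise
    similarities between images of one landmark. This objective is a
    stand-in, not the submodular function defined in the paper.
    Returns the selected indices in selection order.
    """
    similarity = np.asarray(similarity, dtype=float)
    n = similarity.shape[0]
    selected = []
    # best_cover[i]: similarity of image i to its closest selected view so far.
    best_cover = np.zeros(n)
    for _ in range(min(k, n)):
        # Marginal gain of candidate j: how much it improves coverage of each image.
        gains = np.maximum(similarity - best_cover[:, None], 0.0).sum(axis=0)
        if selected:
            gains[selected] = -np.inf  # never re-pick a chosen view
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, similarity[:, j])
    return selected


# Toy data: images 0-2 show one appearance of a landmark, images 3-4 another.
sim = np.full((5, 5), 0.1)
sim[:3, :3] = 0.9
sim[3:, 3:] = 0.9
np.fill_diagonal(sim, 1.0)
views = greedy_canonical_views(sim, k=2)
```

On this toy matrix the two selected views come from different appearance groups, because the second pick gains almost nothing from images that the first pick already covers.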