Online Cross-Modal Scene Retrieval by Binary Representation and Semantic Graph

Mengshi Qi, Yunhong Wang, Annan Li
DOI: https://doi.org/10.1145/3123266.3123311
2017-01-01
Abstract: In recent years, cross-modal scene retrieval has attracted increasing attention. However, most existing approaches neglect the semantic relationships between objects in a scene, together with the embedded spatial layouts. Moreover, these methods mostly apply a batch learning strategy, which is not suitable for processing streaming data. To address these problems, we propose a new framework for online cross-modal scene retrieval based on binary representations and a semantic graph. Specifically, we adopt cross-modal hashing based on the quantization loss of different modalities. By introducing the semantic graph, we are able to extract rich semantics and measure their correlation across different modalities. Furthermore, we propose a two-step optimization procedure based on stochastic gradient descent for online updates. Experimental results on four datasets show the superiority of our approach over the state of the art.
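The abstract gives no formulas, so the following is only a generic sketch of the ideas it names (binary codes, a quantization loss, a stochastic gradient update, and Hamming-distance retrieval), not the authors' method. The linear projection `W`, the squared-error form of the loss, the learning rate, and all function names are assumptions made for illustration. In a cross-modal setting one such projection per modality (for example image and text) would presumably be learned so that matching items receive nearby codes.

```python
def project(W, x):
    """Linear projection W @ x for a list-of-lists matrix W and a vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def binarize(v):
    """Map a real-valued vector to a {-1, +1} code by taking signs."""
    return [1 if a >= 0 else -1 for a in v]

def quantization_loss(W, x):
    """Squared distance between the projection and its binary code (assumed form)."""
    v = project(W, x)
    b = binarize(v)
    return sum((b_i - v_i) ** 2 for b_i, v_i in zip(b, v))

def sgd_step(W, x, lr=0.01):
    """One gradient step on the quantization loss for a single streamed sample.

    The code b is held fixed during the step. This is a simplification and
    not the two-step procedure the paper describes.
    """
    v = project(W, x)
    b = binarize(v)
    # Gradient of sum_i (b_i - v_i)^2 w.r.t. W[i][j] is -2 * (b_i - v_i) * x_j,
    # so the descent update adds lr * 2 * (b_i - v_i) * x_j.
    return [[w_ij + lr * 2 * (b_i - v_i) * x_j for w_ij, x_j in zip(row, x)]
            for row, b_i, v_i in zip(W, b, v)]

def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(1 for p, q in zip(a, b) if p != q)

# Hypothetical 2-bit hash of a 3-dimensional feature vector.
W = [[0.1, 0.2, -0.3], [0.4, -0.1, 0.2]]
x = [0.5, -1.0, 2.0]
loss_before = quantization_loss(W, x)
W_new = sgd_step(W, x)
loss_after = quantization_loss(W_new, x)
code = binarize(project(W_new, x))
```

Because each new sample triggers only a small update of `W`, this style of training fits streaming data, and retrieval reduces to comparing short binary codes with `hamming`.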