Learning Compact Hash Codes for Multimodal Representations Using Orthogonal Deep Structure.

Daixin Wang,Peng Cui,Mingdong Ou,Wenwu Zhu
DOI: https://doi.org/10.1109/tmm.2015.2455415
IF: 7.3
2015-01-01
IEEE Transactions on Multimedia
Abstract:As large-scale multimodal data are ubiquitous in many real-world applications, learning multimodal representations for efficient retrieval is a fundamental problem. Most existing methods adopt shallow structures to perform multimodal representation learning. Due to a limitation of learning ability of shallow structures, they fail to capture the correlation of multiple modalities. Recently, multimodal deep learning was proposed and had proven its superiority in representing multimodal data due to its high nonlinearity. However, in order to learn compact and accurate representations, how to reduce the redundant information lying in the multimodal representations and incorporate different complexities of different modalities in the deep models is still an open problem. In order to address the aforementioned problem, in this paper we propose a hashing-based orthogonal deep model to learn accurate and compact multimodal representations. The method can better capture the intra-modality and inter-modality correlations to learn accurate representations. Meanwhile, in order to make the representations compact, the hashing-based model can generate compact hash codes and the proposed orthogonal structure can reduce the redundant information lying in the codes by imposing orthogonal regularizer on the weighting matrices. We also theoretically prove that, in this case, the learned codes are guaranteed to be approximately orthogonal. Moreover, considering the different characteristics of different modalities, effective representations can be attained with different number of layers for different modalities. Comprehensive experiments on three real-world datasets demonstrate a substantial gain of our method on retrieval tasks compared with existing algorithms.
What problem does this paper attempt to address?