Towards Improving Canonical Correlation Analysis for Cross-modal Retrieval

Jie Shao, Zhicheng Zhao, Fei Su, Ting Yue
DOI: https://doi.org/10.1145/3126686.3126726
2017-01-01
Abstract: Building correlations for cross-modal retrieval, i.e., image-to-text retrieval and text-to-image retrieval, is a feasible way to bridge the semantic gap between different modalities. Methods based on canonical correlation analysis (CCA) have achieved considerable success. However, conventional 2-view CCA suffers from three inherent problems: 1) it fails to capture intra-modal semantic consistency, which is necessary for improving retrieval performance; 2) it has difficulty learning non-linear correlations between different modalities; and 3) its similarity measure is problematic, because the latent space learned by CCA is not directly optimized for any particular distance measure. To address these problems, we propose an improved CCA algorithm (ICCA) that tackles them from three aspects. First, we propose two effective semantic features, derived from text features, to improve intra-modal semantic consistency. Second, we extend traditional CCA from 2 views to 4 views and embed 4-view CCA into a progressive framework to alleviate over-fitting. This progressive framework combines the training of linear projections and nonlinear hidden layers so that good representations of the raw input data are learned at the output of the network. Third, inspired by large scale similarity learning (LSSL), we propose a similarity metric to improve the distance measure. Experiments on three publicly available data sets demonstrate the effectiveness of the proposed ICCA method.
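For reference, the objectives below sketch what "2-view" and "4-view" CCA usually mean. This is the textbook formulation, not necessarily the exact objective used in ICCA; the abstract does not state which four views are used or how they are weighted. The symbols are assumptions introduced for illustration: \Sigma_{ij} is the (cross-)covariance between centered views i and j (e.g., image and text features), and \mathbf{w}_i is the projection direction for view i. The multi-view form shown is the common "sum of pairwise correlations" generalization.

```latex
% Conventional 2-view CCA: find directions w_x, w_y maximizing the
% correlation between the projected image view X and text view Y.
\max_{\mathbf{w}_x,\,\mathbf{w}_y}\;
  \mathbf{w}_x^{\top}\Sigma_{xy}\,\mathbf{w}_y
\quad\text{s.t.}\quad
  \mathbf{w}_x^{\top}\Sigma_{xx}\,\mathbf{w}_x = 1,\;
  \mathbf{w}_y^{\top}\Sigma_{yy}\,\mathbf{w}_y = 1

% One common m-view generalization (here m = 4): maximize the sum of
% pairwise cross-view correlations under per-view variance constraints.
\max_{\mathbf{w}_1,\dots,\mathbf{w}_4}\;
  \sum_{1 \le i < j \le 4} \mathbf{w}_i^{\top}\Sigma_{ij}\,\mathbf{w}_j
\quad\text{s.t.}\quad
  \mathbf{w}_i^{\top}\Sigma_{ii}\,\mathbf{w}_i = 1,\quad i = 1,\dots,4
```

Both objectives are linear in each \mathbf{w}_i, which is why the abstract notes that plain CCA struggles with non-linear correlations and why ICCA adds nonlinear hidden layers in its progressive framework.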