Kernel-Based Mixture Mapping for Image and Text Association
Youtian Du, Xue Wang, Yunbo Cui, Hang Wang, Chang Su
DOI: https://doi.org/10.1109/tmm.2019.2930336
IF: 7.3
2019-01-01
IEEE Transactions on Multimedia
Abstract: Modeling the relationship between multimodal media, including images, videos, and text, can reduce the gap between the modalities and promote applications such as cross-media retrieval and image annotation. In this paper, we propose a new approach called kernel-based mixture mapping (KMM) to model the semantic correlations between web images and text. With this approach, we first construct latent high-dimensional feature spaces based on kernel theory to address the nonlinearity of both the data distributions in the input spaces and the cross-modal correlation. Second, we present a probabilistic neighborhood model that describes the spatial locality of semantics, assuming that proximate examples in the feature spaces generally share the same semantics, together with a conditional model that describes cross-modal conditional dependency. Finally, we build a probabilistic mixture model to jointly model the spatial locality of semantics and the conditional dependency between different modalities. By combining nonlinear transformation and probabilistic models, KMM can address the nonlinearity of cross-modal correlation, the complexity of semantic distributions at the global scale, and the continuity of semantic distributions at the local scale. We present a hybrid optimization algorithm, based on expectation-maximization and subgradient ascent, to find the solution of KMM; this algorithm avoids estimating the parameters of KMM in the high-dimensional feature space and is proved to converge to a (locally) optimal solution. We demonstrate the performance of KMM using four public datasets. The experimental results show that our approach outperforms the compared methods when modeling the relationships between images and text.
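The abstract does not give KMM's equations, so the following is only a minimal sketch of two general ideas it mentions: working in a kernel-induced feature space without computing the feature map explicitly, and assigning higher probability to proximate examples. It relies on the standard identity ||phi(x) - phi(y)||^2 = k(x, x) - 2k(x, y) + k(y, y). The RBF kernel, the `gamma` value, the softmax form of the neighborhood weights, and all function names are assumptions made for illustration, not the authors' formulation.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative bandwidth, not from the paper."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def feature_space_sq_dist(x, y, kernel=rbf_kernel):
    """Squared distance ||phi(x) - phi(y)||^2 using kernel evaluations only."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def neighborhood_probs(query, examples, kernel=rbf_kernel):
    """Hypothetical neighborhood model: normalized exp(-distance) weights,
    so examples closer to the query in feature space get higher probability."""
    weights = [math.exp(-feature_space_sq_dist(query, e, kernel)) for e in examples]
    total = sum(weights)
    return [w / total for w in weights]

# Toy 2-D inputs: two examples near the query, one far away.
examples = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
probs = neighborhood_probs((0.05, 0.0), examples)
```

In this toy setup the two nearby examples receive nearly equal probability and the distant one receives less, which is the locality behavior the abstract describes in words. The conditional model, the mixture model, and the hybrid optimization algorithm are not sketched here because the abstract does not specify them.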