Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval

Yanfei Wang,Fei Wu,Jun Song,Xi Li,Yueting Zhuang
DOI: https://doi.org/10.1145/2647868.2654901
2014-01-01
Abstract:As an important and challenging problem in the multimedia area, multi-modal data understanding aims to explore the intrinsic semantic information across different modalities in a collaborative manner. To address this problem, a possible solution is to effectively and adaptively capture the common cross-modal semantic information by modeling the inherent correlations between the latent topics from different modalities. Motivated by this task, we propose a supervised multi-modal mutual topic reinforce modeling (M$^3$R) approach, which seeks to build a joint cross-modal probabilistic graphical model for discovering the mutually consistent semantic topics via appropriate interactions between model factors (e.g., categories, latent topics and observed multi-modal data). In principle, M$^3$R is capable of simultaneously accomplishing the following two learning tasks: 1) modality-specific (e.g., image-specific or text-specific ) latent topic learning; and 2) cross-modal mutual topic consistency learning. By investigating the cross-modal topic-related distribution information, M$^3$R encourages to disentangle the semantically consistent cross-modal topics (containing some common semantic information across different modalities). In other words, the semantically co-occurring cross-modal topics are reinforced by M$^3$R through adaptively passing the mutually reinforced messages to each other in the model-learning process. To further enhance the discriminative power of the learned latent topic representations, M$^3$R incorporates the auxiliary information (i.e., categories or labels) into the process of Bayesian modeling, which boosts the modeling capability of capturing the inter-class discriminative information. Experimental results over two benchmark datasets demonstrate the effectiveness of the proposed M$^3$R in cross-modal retrieval.
What problem does this paper attempt to address?