Multi-Modal Image Annotation with Multi-Instance Multi-Label LDA.

Cam-Tu Nguyen, De-Chuan Zhan, Zhi-Hua Zhou
2013-01-01
Abstract: This paper studies the problem of image annotation in a multi-modal setting where both visual and textual information are available. We propose Multimodal Multi-instance Multi-label Latent Dirichlet Allocation (M3LDA), where the model consists of a visual-label part, a textual-label part and a label-topic part. The basic idea is that the topic decided by the visual information and the topic decided by the textual information should be consistent, leading to the correct label assignment. In particular, M3LDA is able to annotate image regions, and thus provides a promising way to understand the relation between input patterns and output semantics. Experiments on Corel5K and ImageCLEF validate the effectiveness of the proposed method.
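To make the multi-instance multi-label setting concrete, the following is a minimal sketch of how one multi-modal example might be laid out: an image is a bag of regions (the instances), it carries textual tags, and labels are attached at the region level. This is not the M3LDA model itself, only an illustration of the data layout; the names `Region`, `AnnotatedImage`, `visual_words` and `image_labels` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Region:
    """One instance in the bag: a segmented image region (hypothetical layout)."""
    visual_words: List[int]          # assumed: quantized visual feature ids
    label: Optional[str] = None      # region-level annotation, filled in by some model


@dataclass
class AnnotatedImage:
    """A multi-modal example: a bag of regions plus accompanying text."""
    regions: List[Region]
    tags: List[str] = field(default_factory=list)  # textual modality


def image_labels(image: AnnotatedImage) -> Set[str]:
    """Collect the image-level label set as the union of region-level labels."""
    return {r.label for r in image.regions if r.label is not None}
```

For example, an image with one region labeled `"tiger"`, one labeled `"grass"` and one still unlabeled has the image-level label set containing `"tiger"` and `"grass"`. Attaching labels to regions rather than to the whole image is what lets a region-level annotator expose which input patterns correspond to which output labels.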