Image Tagging Via Cross-Modal Semantic Mapping

Zhi-Hong Deng, Hongliang Yu, Yunlun Yang
DOI: https://doi.org/10.1145/2733373.2806302
2015-01-01
Abstract: Images without annotations are ubiquitous on the Internet, and recommending tags for them remains a challenging open task in image understanding. A common bottleneck in related work is the semantic gap between image and text representations. In this paper, we bridge the gap by introducing a semantic layer: a space of word embeddings in which image tags are represented as word vectors. Our model first learns the optimal mapping from the visual space to the semantic space using training sources. We then annotate test images by decoding the semantic representations of their visual features. Extensive experiments demonstrate that our model outperforms state-of-the-art approaches in predicting image tags.
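The two steps the abstract describes (learn a visual-to-semantic mapping, then decode tags from the mapped vector) can be illustrated with a minimal sketch. This is not the authors' model: the linear map, the squared-error loss fitted by gradient descent, the cosine nearest-neighbor decoding, and all names and toy values below (`fit_linear_map`, `predict_tags`, `tag_vectors`) are assumptions chosen only to make the idea concrete.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return 0.0 if nu == 0 or nv == 0 else dot(u, v) / (nu * nv)


def fit_linear_map(X, Y, lr=0.1, epochs=500):
    """Fit W (d_sem x d_vis) so that W @ x approximates y.

    Assumption: plain batch gradient descent on mean squared error.
    The paper's actual objective and optimizer may differ.
    """
    n, d_vis, d_sem = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * d_vis for _ in range(d_sem)]
    for _ in range(epochs):
        grad = [[0.0] * d_vis for _ in range(d_sem)]
        for x, y in zip(X, Y):
            pred = [dot(row, x) for row in W]
            for i in range(d_sem):
                err = pred[i] - y[i]
                for j in range(d_vis):
                    grad[i][j] += err * x[j] / n
        for i in range(d_sem):
            for j in range(d_vis):
                W[i][j] -= lr * grad[i][j]
    return W


def predict_tags(W, x, tag_vectors, k=1):
    """Map visual features x into the semantic space, then decode
    by ranking tags on cosine similarity to the mapped vector."""
    z = [dot(row, x) for row in W]
    ranked = sorted(tag_vectors, key=lambda t: cosine(z, tag_vectors[t]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 2-d "word embeddings" and 3-d "visual features".
tag_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, -1.0]}
X_train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y_train = [tag_vectors["cat"], tag_vectors["dog"], tag_vectors["car"]]

W = fit_linear_map(X_train, Y_train)
print(predict_tags(W, [0.9, 0.1, 0.0], tag_vectors, k=2))
```

In this toy setup each training image has a single tag, so its target is that tag's word vector. How the paper builds targets for images with several tags is not stated in the abstract.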