Towards Semantic Embedding In Visual Vocabulary

Rongrong Ji,Hongxun Yao,Xiaoshuai Sun,Bineng Zhong,Wen Gao
DOI: https://doi.org/10.1109/CVPR.2010.5540118
2010-01-01
Abstract:Visual vocabulary serves as a fundamental component in many computer vision tasks, such as object recognition, visual search, and scene modeling. While state-of-the-art approaches build visual vocabulary based solely on visual statistics of local image patches, the correlative image labels are left unexploited in generating visual words. In this work, we present a semantic embedding framework to integrate semantic information from Flickr labels for supervised vocabulary construction. Our main contribution is a Hidden Markov Random Field modeling to supervise feature space quantization, with specialized considerations to label correlations: Local visual features are modeled as an Observed Field, which follows visual metrics to partition feature space. Semantic labels are modeled as a Hidden Field, which imposes generative supervision to the Observed Field with WordNet-based correlation constraints as Gibbs distribution. By simplifying the Markov property in the Hidden Field, both unsupervised and supervised (label independent) vocabularies can be derived from our framework. We validate our performances in two challenging computer vision tasks with comparisons to state-of-the-arts: (1) Large-scale image search on a Flickr 60,000 database; (2) Object recognition on the PASCAL VOC database.
What problem does this paper attempt to address?