Click-through-Based Word Embedding for Large Scale Image Retrieval

Yun Chen, Victor O. K. Li
DOI: https://doi.org/10.1109/bigmm.2016.40
2016-01-01
Abstract: Learning the similarity between textual queries and visual images is a fundamental problem in large-scale image retrieval. Traditional methods rely primarily on the text surrounding an image for image search. However, as the volume of images on the web grows, that surrounding text is increasingly likely to be noisy or unavailable altogether. Thus, bridging the semantic gap between textual queries and visual images remains an open problem. Inspired by the success of neural network language models and by the use of click-through data in search engines, we address the image retrieval problem by learning word embeddings for query words in the image feature space, modeling the probability distribution over images conditioned on a given query. We call this model click-through-based word embedding (CWE); it makes a direct comparison between a textual query and a visual image feasible. In several experiments on the Microsoft Clickture dataset, CWE outperformed the other methods in both scalability and accuracy.
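The core idea of the abstract, a probability distribution over images conditioned on a query whose words are embedded in the image feature space, can be illustrated with a minimal sketch. The abstract does not give the model's exact form, so everything below is an assumption made for illustration: averaging word vectors to represent a query, scoring images by dot product, normalizing with a softmax, and using the negative log-likelihood of a clicked image as the training signal. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def query_embedding(query, word_vectors):
    """Average the vectors of the known query words (assumed composition rule)."""
    vecs = [word_vectors[w] for w in query.lower().split() if w in word_vectors]
    if not vecs:
        raise ValueError("query contains no known words")
    return np.mean(vecs, axis=0)

def image_distribution(query, word_vectors, image_features):
    """Assumed form of P(image | query): softmax over dot-product scores."""
    q = query_embedding(query, word_vectors)
    scores = image_features @ q        # one score per image
    scores = scores - scores.max()     # shift for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

def click_nll(query, clicked_index, word_vectors, image_features):
    """Negative log-likelihood of one clicked image under the assumed model."""
    probs = image_distribution(query, word_vectors, image_features)
    return -np.log(probs[clicked_index])

# Toy data (hypothetical): 2-dimensional image features and word vectors
# that live in the same space as the image features.
word_vectors = {
    "red": np.array([1.0, 0.0]),
    "car": np.array([0.0, 1.0]),
}
image_features = np.array([
    [1.0, 1.0],    # image 0: aligned with both words
    [-1.0, 0.0],   # image 1
    [0.0, -1.0],   # image 2
])

probs = image_distribution("red car", word_vectors, image_features)
ranking = np.argsort(-probs)  # images sorted from most to least probable
```

Because the query words and the images share one feature space in this sketch, retrieval reduces to ranking images by `probs`. Training under these assumptions would adjust `word_vectors` to lower `click_nll` on observed (query, clicked image) pairs.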