Abstract: Our objective is to estimate the relevance of an image to a query for image search. In this paper we address two limitations of existing image search engines. First, there is no straightforward way to bridge the gap between semantic textual queries and users' search intents on the one hand and the visual content of images on the other. Image search engines therefore rely primarily on static and textual features. Visual features are mainly used to identify potentially useful recurrent patterns or relevant training examples that complement search through image reranking. Second, image rankers are trained on query-image pairs labeled by human experts, which makes annotation intellectually expensive and time-consuming. Furthermore, the labels may be subjective when queries are ambiguous, which makes the search intention difficult to predict. We demonstrate that both problems can be mitigated by using click-through data, which can be viewed as the footprints of user search behavior, as an effective means of understanding queries. The correspondence between an image and a query is determined by whether users clicked the image when it was returned for that query in a commercial image search engine. We therefore hypothesize that an image's click count in response to a query indicates its relevance to that query. For each new image, our proposed graph-based label propagation algorithm uses neighborhood graph search to find the nearest neighbors on an image similarity graph built from visual representations produced by deep neural networks, and then aggregates the neighbors' clicked queries and click counts to obtain the labels of the new image. We conduct experiments on the MSR-Bing Grand Challenge, and the results show consistent performance gains over various baselines. In addition, the proposed approach is very efficient, annotating each query-image pair within just 15 milliseconds on a regular PC.
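The label propagation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `propagate_labels`, the parameter `k`, the brute-force neighbor search (standing in for neighborhood graph search), and the similarity-weighted sum of click counts are all assumptions made for illustration. Feature vectors are plain Python lists standing in for DNN representations.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_labels(new_feature, features, clicks, k=2):
    """Score queries for a new image from its k most similar clicked images.

    features: dict image_id -> feature vector (stand-in for DNN features)
    clicks:   dict image_id -> {query: click_count}
    """
    # Assumption: brute-force search over all images; the abstract instead
    # describes neighborhood graph search on an image similarity graph.
    neighbors = sorted(
        ((cosine(new_feature, feat), img) for img, feat in features.items()),
        reverse=True,
    )[:k]

    scores = defaultdict(float)
    for sim, img in neighbors:
        for query, count in clicks.get(img, {}).items():
            # Assumption: aggregate click counts weighted by visual similarity.
            scores[query] += sim * count
    return dict(scores)
```

Under these assumptions, a higher score for a query means that the new image's visual neighbors were clicked more often for that query, which is the sense in which click counts serve as relevance indications.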

Click-through-Based Word Embedding for Large Scale Image Retrieval

Click-Through-Based Cross-View Learning For Image Search

Clickthrough Refinement for Improved Graph Ranking

Click-through-based Subspace Learning for Image Search

Learning Cross Space Mapping Via DNN Using Large Scale Click-Through Logs

Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention.

Bag-Of-Words Based Deep Neural Network For Image Retrieval

Click prediction for web image reranking using multimodal sparse coding.

Cross-Media Similarity Evaluation for Web Image Retrieval in the Wild

Coupled Binary Embedding for Large-Scale Image Retrieval

Clickage: towards bridging semantic and intent gaps via mining click logs of search engines.

Learning of Multimodal Representations With Random Walks on the Click Graph

Image search by graph-based label propagation with image representation from DNN.

Seeing the Big Picture: Deep Embedding with Contextual Evidences

Learning to Rank Using User Clicks and Visual Features for Image Retrieval

Hierarchical Deep Click Feature Prediction for Fine-Grained Image Recognition.

Multi-Task Deep Visual-Semantic Embedding For Video Thumbnail Selection

Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking

Res-embedding for Deep Learning Based Click-Through Rate Prediction Modeling

Pre-Trained Multi-View Word Embedding Using Two-Side Neural Network

Image Search Reranking with Query-Dependent Click-Based Relevance Feedback.