Abstract: Our objective is to estimate the relevance of an image to a query for image search. In this paper we address two limitations of existing image search engines. First, there is no straightforward way to bridge the gap between the visual content of images and textual queries, which carry both semantics and the users' search intent. Image search engines therefore rely primarily on static and textual features. Visual features are mainly used to identify potentially useful recurrent patterns or relevant training examples that complement search through image reranking. Second, image rankers are trained on query-image pairs labeled by human experts, which makes annotation intellectually expensive and time-consuming. Furthermore, the labels may be subjective when queries are ambiguous, making the search intent difficult to predict. We demonstrate that both problems can be mitigated by exploiting click-through data, which can be viewed as the footprints of user search behavior, as an effective means of understanding queries. The correspondence between an image and a query is determined by whether users clicked the image when it was returned for that query in a commercial image search engine. We therefore hypothesize that the click count of an image in response to a query indicates its relevance to that query. For each new image, our proposed graph-based label propagation algorithm uses neighborhood graph search to find its nearest neighbors on an image similarity graph built from visual representations produced by deep neural networks, and then aggregates the neighbors' clicked queries and click counts to obtain labels for the new image. We conduct experiments on the MSR-Bing Grand Challenge, and the results show consistent performance gains over various baselines. In addition, the proposed approach is very efficient, annotating each query-image pair within just 15 milliseconds on a regular PC.
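
The propagation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a brute-force cosine nearest-neighbor scan stands in for the neighborhood graph search, click counts are weighted by similarity (the abstract only says they are aggregated), and the names `cosine`, `propagate_labels`, and `k` are hypothetical. Feature vectors are assumed to come from a deep network.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_labels(new_feature, indexed, k=2):
    """Label a new image from the click logs of its k most similar images.

    indexed: list of (feature_vector, {query: click_count}) pairs for
    images that already have click-through data.
    Returns {query: aggregated_score} for the new image.
    """
    # Assumption: exhaustive scan instead of neighborhood graph search.
    neighbors = sorted(
        ((cosine(new_feature, feat), clicks) for feat, clicks in indexed),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    labels = defaultdict(float)
    for sim, clicks in neighbors:
        if sim <= 0:
            continue  # ignore dissimilar neighbors
        for query, count in clicks.items():
            # Assumption: similarity-weighted sum of click counts.
            labels[query] += sim * count
    return dict(labels)


if __name__ == "__main__":
    # Toy click log with 2-dimensional features (illustrative data only).
    indexed = [
        ([1.0, 0.0], {"red apple": 10}),
        ([0.0, 1.0], {"blue car": 5}),
        ([1.0, 0.0], {"apple fruit": 4, "red apple": 2}),
    ]
    print(propagate_labels([1.0, 0.0], indexed, k=2))
```

In this toy example the new image shares its feature vector with the first and third indexed images, so it inherits their queries, while the query of the orthogonal second image is not propagated. The aggregated score for a query can then serve as the relevance estimate for that query-image pair.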

Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention

Select & Re-Rank: Effectively and Efficiently Matching Multimodal Data with Dynamically Evolving Attention

Learning to Rank Using User Clicks and Visual Features for Image Retrieval

Seeing the Big Picture: Deep Embedding with Contextual Evidences

Clickage: towards bridging semantic and intent gaps via mining click logs of search engines

Click-Through-Based Cross-View Learning For Image Search

Learning Deep Local Features with Multiple Dynamic Attentions for Large-Scale Image Retrieval

Deep Learning for Content-Based Image Retrieval: A Comprehensive Study

Click-through-based Subspace Learning for Image Search

A Search Engine Click Model Based on Deep Neural Network

Learning Deep Structure-Preserving Image-Text Embeddings

Learning Cross Space Mapping Via DNN Using Large Scale Click-Through Logs

From Document to Image: Learning a Scalable Ranking Model for Content Based Image Retrieval

Community-Aware Photo Quality Evaluation by Deeply Encoding Human Perception

Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking

Image search by graph-based label propagation with image representation from DNN

Graph Relation Embedding Network for Click-Through Rate Prediction

Hierarchical Deep Click Feature Prediction for Fine-Grained Image Recognition

User-Click-Data-Based Fine-Grained Image Recognition Via Weakly Supervised Metric Learning

Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval