Abstract: The vast number of images available on the Web calls for an effective and efficient search service to help users find relevant images. The prevalent approach is to provide a keyword interface through which users submit queries. However, the number of images without any tags or annotations is beyond the reach of manual effort. To overcome this, automatic image annotation techniques have emerged; they select a suitable set of tags for a given image without user intervention. There are three main challenges in Web-scale image annotation: scalability, noise resistance, and diversity. Scalability has a twofold meaning: first, an automatic image annotation system should scale to the billions of images on the Web; second, it should be able to identify several relevant tags for a given image from a huge tag set within seconds or even faster. Noise resistance means that the system should be robust against typos and ambiguous terms used in tags. Diversity means that image content may include both scenes and objects, which are described by multiple image features that constitute different facets in annotation. In this paper, we propose a unified framework to tackle these three challenges for automatic Web image annotation. It involves two main components: tag candidate retrieval and multi-facet annotation. In the former, content-based indexing and a concept-based codebook are leveraged to address the scalability and noise-resistance issues. In the latter, a joint feature map is designed to describe the different facets of tags in annotations and the relations between these facets. A tag graph is adopted to represent the tags in the entire annotation, and a structured learning technique is employed to construct a learning model on top of the tag graph based on the generated joint feature map. Millions of images from Flickr are used in our evaluation. Experimental results show that we achieve a 33% performance improvement over single-facet approaches in terms of three metrics: precision, recall, and F1 score.
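The abstract does not say how the three metrics are computed. As a minimal sketch, assuming a per-image, set-based definition (the function name `tag_prf` and the example tags are hypothetical and not taken from the paper), precision, recall, and F1 for one annotated image could be computed as follows:

```python
def tag_prf(predicted, relevant):
    """Set-based precision, recall and F1 for the tags of a single image.

    Assumption: tags are compared by exact string match; the paper may
    use a different matching or averaging scheme.
    """
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)  # correctly predicted tags

    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0
    # when there are no correct tags to avoid dividing by zero.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: 2 of 3 predicted tags are among 4 relevant tags.
p, r, f1 = tag_prf(["beach", "sea", "dog"], ["beach", "sea", "sunset", "sand"])
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Averaging these per-image values over a test set is one common way to report a single figure per metric; whether the evaluation here averages per image or per tag is not stated in the abstract.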

Web Image Annotation Based On Automatically Obtained Noisy Training Set

FGCM: Noisy Label Learning via Fine-Grained Confidence Modeling

Real-Time Image Annotation By Manifold-Based Biased Fisher Discriminant Analysis

Face Annotation Using Transductive Kernel Fisher Discriminant

Retrieval-Based Face Annotation by Weak Label Regularized Local Coordinate Coding

Automatic Image Annotation Based on Wordnet and Hierarchical Ensembles

Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation

Data-driven Meta-set Based Fine-Grained Visual Classification

Effective and Efficient Multi-Facet Web Image Annotation

Noise-Aware Fully Webly Supervised Object Detection

Guided by Meta-Set: A Data-Driven Method for Fine-Grained Visual Recognition

Automatic image annotation via local multi-label classification

Exploiting Web Images for Fine-Grained Visual Recognition by Eliminating Open-Set Noise and Utilizing Hard Examples

Robust Web Image Annotation Via Exploring Multi-Facet and Structural Knowledge

Web Image Semi-supervised Learning Method Based on Heterogeneous Information Fusion

Learning Image Labels On-the-fly for Training Robust Classification Models

An accurate detection is not all you need to combat label noise in web-noisy datasets

Efficient Tag Mining Via Mixture Modeling for Real-Time Search-Based Image Annotation

Data Fusing and Joint Training for Learning with Noisy Labels

A Search-Based Web Image Annotation Method

Multi-label image annotation based on multi-model