Abstract: Automatic image annotation is the process of automatically labeling image content with keywords from a predefined set; the keywords serve as descriptors of high-level image semantics and enable semantic image retrieval by keyword. A serious problem in this task is unsatisfactory annotation performance caused by the semantic gap between visual content and keywords. To address this problem, we present a new approach that incorporates lexical semantics into the image annotation process. In the training phase, given a set of images labeled with keywords, we first generate a basic visual vocabulary, consisting of visual terms extracted from the images to represent their content together with the associated keywords, using K-means clustering combined with semantic constraints obtained from WordNet. The statistical correlation between visual terms and keywords is then modeled by a two-level hierarchical ensemble model composed of probabilistic SVM classifiers and a co-occurrence language model. In the annotation phase, given an unlabeled image, the first-level classifier ensemble predicts the most likely keywords from the posterior probability of each keyword given each visual term; the second-level language model then refines the annotation using word co-occurrence statistics derived from the keywords annotated in the training images. We carried out experiments on a medium-sized image collection from the Corel Stock Photo CDs. The results show that the method outperforms some traditional annotation methods by about 7% in average precision, demonstrating the feasibility and effectiveness of the proposed approach.
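The second-level refinement step can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names, the add-alpha smoothing, the linear mixing `weight`, and the greedy selection are hypothetical choices, not the authors' model. The first-level posteriors are taken as given (in the paper they would come from the probabilistic SVM ensemble).

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(annotations):
    """Count keyword occurrences and unordered keyword-pair co-occurrences."""
    single, pair = Counter(), Counter()
    for words in annotations:
        unique = sorted(set(words))
        single.update(unique)
        pair.update(combinations(unique, 2))  # pairs are stored in sorted order
    return single, pair


def cond_prob(w, v, single, pair, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(w | v) (smoothing is an assumption)."""
    key = (w, v) if w < v else (v, w)
    return (pair[key] + alpha) / (single[v] + alpha * vocab_size)


def refine(posteriors, single, pair, top_k=2, weight=0.5):
    """Greedily pick top_k keywords, mixing first-level posteriors with
    co-occurrence evidence from the keywords already chosen.

    `weight` is a hypothetical mixing parameter; weight=0 ignores the
    language model and reduces to ranking by posterior alone.
    """
    vocab_size = max(len(single), 1)
    candidates = list(posteriors)
    chosen = [max(candidates, key=lambda w: posteriors[w])]
    while len(chosen) < min(top_k, len(candidates)):
        def score(w):
            context = sum(
                cond_prob(w, v, single, pair, vocab_size) for v in chosen
            ) / len(chosen)
            return (1 - weight) * posteriors[w] + weight * context

        rest = [w for w in candidates if w not in chosen]
        chosen.append(max(rest, key=score))
    return chosen


# Toy training annotations and toy first-level posteriors (made-up values).
train = [["sky", "plane"], ["sky", "plane", "cloud"],
         ["water", "boat"], ["sky", "cloud"]]
single, pair = cooccurrence_counts(train)
posteriors = {"sky": 0.50, "boat": 0.26, "plane": 0.24}

print(refine(posteriors, single, pair, weight=0.0))
print(refine(posteriors, single, pair, weight=0.5))
```

In this toy setup, ranking by posterior alone would pair "sky" with "boat", while the co-occurrence term favors "plane", which appears with "sky" in the training annotations. That is the kind of correction the second level is meant to provide.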

Automatic Image Annotation Based on Wordnet and Hierarchical Ensembles
