Abstract: Automatic video annotation is an important ingredient for semantic-level video browsing, search, and navigation, and it has received much attention in recent years. Research on this topic has evolved through two paradigms. In the first paradigm, each concept is annotated individually by a pre-trained binary classifier. However, this approach ignores the rich information between video concepts and achieves only limited success. The second paradigm evolved from the first: it adds an extra step on top of the individual classifiers to fuse the multiple concept detections. However, because the first-step classifiers are unreliable, the performance of these methods can be degraded by the errors incurred in that step. In this paper, a third paradigm for video annotation is proposed to address these problems. The proposed Correlative Multi-Label (CML) method annotates the concepts and models the correlations between them simultaneously, in a single step. Furthermore, since video clips are composed of temporally ordered frame sequences, we extend the proposed method to exploit the rich temporal information in videos. Specifically, a temporal kernel, based on the discriminative information between Hidden Markov Models (HMMs) learned from the video clips, is incorporated into the CML method. We compare the proposed approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set. As will be shown, the proposed method achieves superior performance.
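
The contrast between the first paradigm and a one-step joint annotation can be illustrated with a toy sketch. This is not the CML formulation from the paper: the linear scoring function, the pairwise correlation matrix `corr`, the weights, and the exhaustive search over label vectors are all assumptions chosen only to show how a correlation term can change a decision that independent classifiers would make.

```python
from itertools import product


def dot(w, x):
    """Plain inner product of two equal-length sequences."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def independent_predict(x, weights):
    """First paradigm: one linear binary classifier per concept, no interaction."""
    return tuple(1 if dot(w, x) >= 0 else -1 for w in weights)


def joint_score(x, y, weights, corr):
    """Hypothetical joint score: per-concept terms plus pairwise label terms."""
    k = len(weights)
    unary = sum(y[c] * dot(weights[c], x) for c in range(k))
    pairwise = sum(corr[a][b] * y[a] * y[b]
                   for a in range(k) for b in range(a + 1, k))
    return unary + pairwise


def joint_predict(x, weights, corr):
    """Pick the label vector in {-1, +1}^K with the highest joint score.

    Exhaustive search is only feasible for a handful of concepts.
    """
    k = len(weights)
    return max(product((-1, 1), repeat=k),
               key=lambda y: joint_score(x, y, weights, corr))


# Illustrative values (assumed): concept 0 is detected confidently,
# concept 1 is weakly rejected, and the two concepts tend to co-occur.
x = [1.0, 0.5]
weights = [[2.0, 0.0], [-0.2, 0.0]]
corr = [[0.0, 0.5], [0.5, 0.0]]

print(independent_predict(x, weights))   # (1, -1)
print(joint_predict(x, weights, corr))   # (1, 1)
```

In this toy setting the weak negative evidence for the second concept is overridden by its positive correlation with the first, which is the kind of effect a separate fusion step can only recover after the individual decisions have already been made.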

A Two-View Concept Correlation Based Video Annotation Refinement

Data-specific Concept Correlation Estimation for Video Annotation Refinement

Refining Video Annotation by Exploiting Pairwise Concurrent Relation.

Refining video annotation by exploiting inter-shot context.

Improving Video Concept Detection Using Spatio-Temporal Correlation

Correlative Multi-Label Video Annotation.

Mining Concept Relationship in Temporal Context for Effective Video Annotation.

Semantic Context Based Refinement for News Video Annotation.

Exploiting Semantic And Visual Context For Effective Video Annotation

Correlative multilabel video annotation with temporal kernels

Exploring Inter-Concept Relationship with Context Space for Semantic Video Indexing

Video Semantic Concept Detection Based on Conceptual Correlation and Boosting

Multi-view video coding based on sequence correlation

Temporal-Spatial refinements for video concept fusion

A Novel Semantic Model for Video Concept Detection

Correlative Linear Neighborhood Propagation for Video Annotation

Building a comprehensive ontology to refine video concept detection.

Concept-Enhanced Relation Network for Video Visual Relation Inference

A Unifying Multi-Label Temporal Kernel Machine with Its Application to Video Annotation