Robust Semantic Video Indexing by Harvesting Web Images

Yang Yang, Zheng-Jun Zha, Heng Tao Shen, Tat-Seng Chua
DOI: https://doi.org/10.1007/978-3-642-35725-1_7
2013-01-01
Abstract: Semantic video indexing, also known in the literature as video annotation or video concept detection, has recently attracted significant attention. Because training videos are scarce, most existing approaches struggle to achieve satisfactory performance. This paper proposes a robust semantic video indexing framework that exploits user-tagged web images to help learn robust semantic video indexing classifiers. It addresses two challenges: (a) the domain difference between images and videos, and (b) noisy web images with incorrect tags. Specifically, we first estimate the probability that each image is correctly tagged, use it as a confidence score, and filter out images with low confidence scores. We then develop a robust image-to-video indexing approach that learns reliable classifiers from a limited number of training videos together with abundant user-tagged images. A robust loss function weighted by the images' confidence scores further alleviates the influence of noisy samples. To tackle the domain difference problem, the approach automatically discovers an optimal kernel space in which the domain difference between images and videos is minimal. Experiments on the NUS-WIDE web image dataset and the Kodak consumer video corpus demonstrate the effectiveness of the proposed framework.
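The abstract combines three mechanisms: confidence-based filtering of tagged images, a confidence-weighted robust loss, and a kernel space that minimizes the image/video domain difference. The following is a minimal plain-Python sketch of how these pieces could fit into one objective. It is not the authors' method, which the abstract does not specify in detail. Assumptions: a ramp (truncated hinge) loss stands in for the unspecified robust loss, and the squared maximum mean discrepancy (MMD) under an RBF kernel stands in for the domain-difference criterion. All function names and the parameters `threshold`, `gamma`, and `lam` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch, not the authors' implementation. Assumed stand-ins:
# ramp loss for the unspecified robust loss, RBF-kernel MMD for the
# domain-difference criterion. All names and parameters are illustrative.

Vector = Sequence[float]
Sample = Tuple[Vector, int, float]  # (features, label in {+1, -1}, confidence)


def filter_by_confidence(images: List[Sample], threshold: float) -> List[Sample]:
    """Drop tagged images whose tag-correctness confidence is below threshold."""
    return [img for img in images if img[2] >= threshold]


def ramp_loss(margin: float, s: float = -1.0) -> float:
    """Truncated hinge loss min(1 - s, max(0, 1 - margin)); bounded by 1 - s."""
    return min(1.0 - s, max(0.0, 1.0 - margin))


def weighted_robust_loss(scores: Sequence[float], labels: Sequence[int],
                         confidences: Sequence[float]) -> float:
    """Sum of ramp losses, each scaled by the image's confidence score."""
    return sum(c * ramp_loss(y * f) for f, y, c in zip(scores, labels, confidences))


def rbf(x: Vector, z: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def mmd2(source: List[Vector], target: List[Vector], gamma: float) -> float:
    """Biased estimate of squared MMD between two samples in the RBF kernel space."""
    def mean_k(A: List[Vector], B: List[Vector]) -> float:
        return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)


def objective(video_scores, video_labels, image_scores, image_labels, image_conf,
              image_feats, video_feats, gamma: float, lam: float) -> float:
    """Hinge loss on videos + confidence-weighted ramp loss on images + lam * MMD^2."""
    video_term = sum(max(0.0, 1.0 - y * f) for f, y in zip(video_scores, video_labels))
    image_term = weighted_robust_loss(image_scores, image_labels, image_conf)
    return video_term + image_term + lam * mmd2(image_feats, video_feats, gamma)


if __name__ == "__main__":
    images = [([0.0, 1.0], 1, 0.9), ([1.0, 0.0], -1, 0.2), ([0.5, 0.5], 1, 0.7)]
    kept = filter_by_confidence(images, threshold=0.5)
    print(len(kept))  # 2: the image with confidence 0.2 is filtered out
    print(mmd2([x for x, _, _ in kept], [[0.2, 0.8], [0.6, 0.4]], gamma=1.0))
```

The ramp loss caps each sample's penalty at `1 - s`, so a confidently mislabeled image cannot dominate the objective the way it could under an unbounded hinge loss. Because every RBF kernel value tends to 1 as `gamma` shrinks, MMD alone can be driven toward zero by choosing an ever-wider kernel. For that reason, a sketch like this weighs the MMD term against the classification losses through `lam` rather than minimizing it in isolation.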
What problem does this paper attempt to address?