Enhancing Video Event Recognition Using Automatically Constructed Semantic-Visual Knowledge Base.
Xishan Zhang, Yang, Yongdong Zhang, Huanbo Luan, Jintao Li, Hanwang Zhang, Tat-Seng Chua
DOI: https://doi.org/10.1109/tmm.2015.2449660
IF: 7.3
2015-01-01
IEEE Transactions on Multimedia
Abstract: The task of recognizing events from video has attracted a lot of attention in recent years. However, due to the complex nature of user-defined events, purely audio-visual content analysis without domain knowledge has been found to be grossly inadequate. In this paper, we propose to construct a semantic-visual knowledge base that encodes the rich event-centric concepts and their relationships from well-established lexical databases, including FrameNet, as well as the concept-specific visual knowledge from ImageNet. Based on this semantic-visual knowledge base, we design an effective system for video event recognition. Specifically, in order to narrow the semantic gap between high-level complex events and low-level visual representations, we use the event-centric semantic concepts encoded in the knowledge base as an intermediate-level event representation, which offers both human-perceivable and machine-interpretable semantic clues for event recognition. In addition, in order to leverage the abundant ImageNet images, we propose a robust transfer learning model to learn noise-resistant concept classifiers for videos. Extensive experiments on various real-world video datasets demonstrate the superiority of our proposed system over state-of-the-art approaches.
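The idea of an intermediate-level concept representation can be illustrated with a minimal sketch. This is not the paper's implementation: the linear-plus-sigmoid concept classifiers, the mean pooling over frames, the linear event scorer, and all names and numbers below are assumptions chosen only to show the data flow from frame features to concept scores to an event score.

```python
import math


def concept_scores(frame_feature, concept_weights):
    """Score one frame against each concept (assumed linear classifier + sigmoid)."""
    scores = {}
    for concept, (weights, bias) in concept_weights.items():
        z = sum(w * x for w, x in zip(weights, frame_feature)) + bias
        scores[concept] = 1.0 / (1.0 + math.exp(-z))
    return scores


def video_representation(frame_features, concept_weights):
    """Mean-pool per-frame concept scores into one video-level concept vector.

    The vector is ordered by sorted concept name, so each dimension stays
    tied to a human-readable concept.
    """
    if not frame_features:
        raise ValueError("a video needs at least one frame feature")
    concepts = sorted(concept_weights)
    totals = {c: 0.0 for c in concepts}
    for feature in frame_features:
        scores = concept_scores(feature, concept_weights)
        for c in concepts:
            totals[c] += scores[c]
    return [totals[c] / len(frame_features) for c in concepts]


def event_score(video_vector, event_weights, bias=0.0):
    """Score an event as a weighted sum of concept responses (assumed linear)."""
    return sum(w * v for w, v in zip(event_weights, video_vector)) + bias


# Hypothetical toy data: two concepts, two-dimensional frame features.
toy_concepts = {
    "cake": ([1.0, 0.0], 0.0),
    "candle": ([0.0, 1.0], 0.0),
}
toy_frames = [[2.0, -2.0], [2.0, -2.0]]
toy_vector = video_representation(toy_frames, toy_concepts)
toy_event = event_score(toy_vector, [1.0, 1.0])
```

In this sketch the video-level vector has one entry per concept, which is what makes the representation inspectable: a high event score can be traced back to the concepts that contributed to it.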