Abstract: TRECVID Multimedia Event Detection (MED) poses an interesting but very challenging task: detecting high-level complex events (Figure 1) in user-generated videos. In this paper, we present an overview and comparative analysis of our results, which achieved top performance among all 45 submissions in TRECVID 2010. Our aim is to answer the following questions. What kind of feature is most effective for multimedia event detection? Are features from different modalities (e.g., audio and visual) complementary for event detection? Can we benefit from generic concept detection of background scenes, human actions, and audio concepts? Are sequence matching and event-specific object detectors critical? Our findings indicate that spatial-temporal features are very effective for event detection and are also highly complementary to other features such as static SIFT and audio features. As a result, our baseline run combining these three features already performs well, with a mean minimal normalized cost (MNC) of 0.586 (lower is better). Incorporating the generic concept detectors through a graph diffusion algorithm yields a marginal gain (mean MNC 0.579). Sequence matching with the Earth Mover's Distance (EMD) further improves the results (mean MNC 0.565). The event-specific detector (a batter detector), however, did not prove useful in our current re-ranking tests. We conclude that combining strong, complementary features from multiple modalities is important for multimedia event detection, and that cross-frame matching helps cope with variation in temporal order. Leveraging contextual concept detectors and foreground activities remains an attractive direction for further research.
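The results above are reported as mean minimal normalized cost (MNC). Below is a minimal sketch of how the minimal normalized detection cost for a single event could be computed. It assumes a TRECVID MED-style cost model, NDC = P_miss + β·P_FA with β = C_FA·(1 − P_target)/(C_Miss·P_target), and assumes the parameter values C_Miss = 80, C_FA = 1, P_target = 0.001. The abstract gives none of these values. The function names are hypothetical. Under this model, a system that rejects every video scores exactly 1.0, so any useful detector should score below 1.0. A mean MNC would average this value over events.

```python
from typing import Sequence

# Assumed MED-style cost parameters; not stated in the abstract.
C_MISS = 80.0
C_FA = 1.0
P_TARGET = 0.001


def normalized_detection_cost(p_miss: float, p_fa: float,
                              c_miss: float = C_MISS, c_fa: float = C_FA,
                              p_target: float = P_TARGET) -> float:
    """NDC = P_miss + beta * P_FA, with beta = C_FA (1 - P_target) / (C_miss P_target)."""
    beta = c_fa * (1.0 - p_target) / (c_miss * p_target)
    return p_miss + beta * p_fa


def minimum_normalized_cost(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Minimum NDC over all decision thresholds for one event.

    A video is declared a detection when its score >= threshold.
    labels: 1 for videos containing the event, 0 otherwise.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative video")
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(scores)) + [float("inf")]
    best = float("inf")
    for t in thresholds:
        misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        ndc = normalized_detection_cost(misses / n_pos, false_alarms / n_neg)
        best = min(best, ndc)
    return best


if __name__ == "__main__":
    # Toy scores for five videos, two of which contain the event.
    print(minimum_normalized_cost([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))
```

In the toy run, the best threshold (0.9) misses one of the two positives and raises no false alarms, which gives an MNC of 0.5. The large β (about 12.5 under these assumed costs) penalizes false alarms heavily, which pushes the optimal threshold high.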
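The abstract credits cross-frame matching with the Earth Mover's Distance (EMD) for coping with temporal order variation. The sketch below illustrates why, using only the simplest case of EMD. It assumes both videos contain the same number of frame-level feature vectors, each frame carries uniform weight, and the ground distance is Euclidean. The abstract specifies none of these choices, and the function names are hypothetical. In this equal-mass, uniform-weight case, an optimal transport plan can be taken to be a one-to-one matching of frames. The EMD therefore equals the minimum average ground distance over all permutations. The brute-force search is only meant for a handful of frames. Videos with unequal frame counts or non-uniform weights would need a general transportation (linear-programming) solver.

```python
import math
from itertools import permutations
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Assumed ground distance between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def emd_equal_uniform(frames_a: List[Vector], frames_b: List[Vector]) -> float:
    """EMD between two equal-size frame sets with uniform weights (toy, brute force).

    Frames are matched one-to-one regardless of their temporal position,
    so reordering the frames of a video does not change the distance.
    """
    n = len(frames_a)
    if n == 0 or n != len(frames_b):
        raise ValueError("this sketch needs two non-empty sets of equal size")
    best = math.inf
    for perm in permutations(range(n)):
        cost = sum(euclidean(frames_a[i], frames_b[j]) for i, j in enumerate(perm)) / n
        best = min(best, cost)
    return best


def frame_by_frame(frames_a: List[Vector], frames_b: List[Vector]) -> float:
    """Order-sensitive baseline: compare the i-th frame with the i-th frame."""
    return sum(euclidean(a, b) for a, b in zip(frames_a, frames_b)) / len(frames_a)


if __name__ == "__main__":
    video_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    video_b = list(reversed(video_a))  # same content, different temporal order
    print(emd_equal_uniform(video_a, video_b), frame_by_frame(video_a, video_b))
```

With the same frames in reversed order, the order-insensitive matching gives a distance of 0.0. The positional comparison gives about 0.667 instead. This is the kind of temporal order variation that cross-frame matching is meant to absorb.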

Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching.