Informedia E-Lamp@TRECVID 2012: Multimedia Event Detection and Recounting (MED and MER)

Si Yu, Zhongwen Xu, Duo Ding, Waito Sze, Francisco Vicente, Zhenzhong Lan, Yang Cai, Shourabh Rawat, Peter Schulam, Nisarga Markandaiah, Sohail Bahmani, Antonio García-Uceda Juárez, Wei Tong, Yi Yang, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Richard M. Stern, Teruko Mitamura, Eric Nyberg, Alexander G. Hauptmann
DOI: https://doi.org/10.1184/r1/6473450.v1
2012-01-01
Abstract: We report on the system we used for the TRECVID 2012 Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) tasks. Our MED system consists of three main steps: feature extraction, detector training, and fusion. In the feature extraction step, we extract a wide range of low-level, high-level, and text features. These features are then represented in three ways: spatial bag-of-words with standard tiling, spatial bag-of-words with feature- and event-specific tiling, and the Gaussian Mixture Model (GMM) supervector. For detector training and fusion, we employ two classifiers and three fusion methods. Both the official results and our internal evaluations show that the system performs well. Our MER system takes a subset of the features and detection results from the MED system and generates the recounting from them.
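To make the first of the three representations more concrete, here is a minimal sketch of a spatial bag-of-words with standard tiling. It assumes hard assignment of each local descriptor to its nearest codeword in a precomputed codebook, keypoint coordinates normalized to [0, 1], and hypothetical 1x1, 2x2, and 1x3 grids. The abstract does not state the actual grids, codebook size, or assignment scheme, so these are illustrative choices only, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical tiling layouts as (rows, cols). The abstract does not specify
# the grids actually used; 1x1, 2x2 and 1x3 are assumed here for illustration.
DEFAULT_TILINGS: Tuple[Tuple[int, int], ...] = ((1, 1), (2, 2), (1, 3))


def nearest_codeword(desc: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Hard-assign a descriptor to its nearest codeword (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(desc, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def spatial_bow(
    points: Sequence[Tuple[float, float]],
    descriptors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
    tilings: Sequence[Tuple[int, int]] = DEFAULT_TILINGS,
) -> List[float]:
    """Concatenate one L1-normalized BoW histogram per tile of every tiling.

    `points` holds the (x, y) location of each descriptor, normalized to [0, 1].
    """
    words = [nearest_codeword(d, codebook) for d in descriptors]
    k = len(codebook)
    feature: List[float] = []
    for rows, cols in tilings:
        # One k-bin histogram per tile, stored in row-major tile order.
        hists = [[0.0] * k for _ in range(rows * cols)]
        for (x, y), w in zip(points, words):
            # Clamp so that x == 1.0 or y == 1.0 falls into the last tile.
            r = min(int(y * rows), rows - 1)
            c = min(int(x * cols), cols - 1)
            hists[r * cols + c][w] += 1.0
        for h in hists:
            total = sum(h)
            # Empty tiles contribute an all-zero histogram.
            feature.extend(v / total if total else 0.0 for v in h)
    return feature


if __name__ == "__main__":
    codebook = [[0.0], [1.0]]
    feat = spatial_bow([(0.1, 0.1), (0.9, 0.9)], [[0.1], [0.9]], codebook)
    print(len(feat), feat)
```

With the assumed grids, the output vector has `len(codebook) * (1 + 4 + 3)` entries. The feature- and event-specific tiling mentioned in the abstract would, under this sketch, amount to choosing a different `tilings` argument per feature type and event rather than one fixed set.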