Multimedia Event Detection and Recounting
Zhen-Zhong Lan, Lu Jiang, Shoou-I Yu, Chenqiang Gao, Shourabh Rawat, Yang Cai, Shicheng Xu, Haoquan Shen, Xuanchong Li, Yipei Wang, Waito Sze, Yan Yan, Zhigang Ma, Nicolas Ballas, Deyu Meng, Wei Tong, Yi Yang, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Richard Stern, Teruko Mitamura, Eric Nyberg, Alexander Hauptmann, Alex Hauptmann
2013-01-01
Abstract: We report on our system used in the TRECVID 2013 Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) tasks. For MED, the system consists of four main steps: feature extraction, feature representation, detector training, and fusion. In the feature extraction step, we extract more than 10 low-level, high-level, and text features. These features are then represented in three different ways: spatial bag-of-words, Gaussian Mixture Model (GMM) supervectors, and Fisher vectors. For detector training and fusion, we employ two classifiers and a weighted double fusion method. The official evaluation results show that our full MED systems achieve the best scores on Ad-Hoc EK10 and EK0, and our audio systems achieve the best scores on EK100 and EK10 for both the Pre-specified and Ad-Hoc tasks. In this report, we analyze the contribution of each MED component and draw insights for video analysis. Our MER system generates recountings from a subset of the features and detection results produced by the MED system.