Informedia at TRECVID2014: MED and MER, Semantic Indexing, Surveillance Event Detection

Shoou-I Yu, Lu Jiang, Zhongwen Xu, Zhenzhong Lan, Shicheng Xu, Xiaojun Chang, Xuanching Li, Zexi Mao, Chuang Gan, Yajie Miao, Xingzhong Du, Yang Cai, Lara Martin, Nikolas Wolfe, Anurag Kumar, Huan Li, Ming Lin, Zhigang Ma, Yi Yang, Deyu Meng, Shiguang Shan, Pinar D Sahin, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Teruko Mitamura, Richard Stern, Alexander Hauptmann, Anil Armagan, Yicheng Zhao
2014-01-01
Abstract: We report on our results in the TRECVID 2011 Multimedia Event Detection (MED) and Semantic Indexing (SIN) tasks. Both tasks consist of three main steps: extracting features, training detectors, and fusing. In the feature extraction part, we extracted many low-level features, high-level features, and text features. We used the Spatial-Pyramid Matching technique to represent the low-level visual local features, such as SIFT and MoSIFT, which describe the location information of feature points. In the detector training part, besides the traditional SVM, we proposed a Sequential Boosting SVM classifier to deal with the large-scale unbalanced classification problem. In the fusion part, to take advantage of the different features, we tried three fusion methods: early fusion, late fusion, and double fusion. Double fusion is a combination of early fusion and late fusion. The experimental results demonstrated that double fusion is consistently better than or at least comparable to early fusion and late fusion.
Descriptors: adaptive training, signal processing, machine learning, automated speech recognition, supervised machine learning, artificial neural networks, detectors, detection, algorithms, vocabulary, compression, language, feature extraction, dimensionality reduction, classification, semantic models, image processing, video images, video signals
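
The three fusion strategies named in the abstract can be illustrated with a minimal sketch. The abstract only states that double fusion combines early and late fusion, so the details here are assumptions made for illustration: early fusion is taken to mean concatenating feature types before training one detector, late fusion to mean averaging the scores of per-feature detectors, and double fusion to mean averaging the per-feature scores together with the early-fusion score. A ridge-regression scorer stands in for the SVM to keep the example dependency-free, and all function names are hypothetical.

```python
import numpy as np

def fit_linear_scorer(X, y, reg=1.0):
    """Fit a ridge-regression scorer (a stand-in for an SVM, by assumption)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])       # regularized normal equations
    return np.linalg.solve(A, Xb.T @ y)

def score(w, X):
    """Return real-valued detection scores for the rows of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def per_feature_scores(train_feats, y, test_feats):
    """Train one detector per feature type and score the test set with each."""
    return [score(fit_linear_scorer(Xtr, y), Xte)
            for Xtr, Xte in zip(train_feats, test_feats)]

def early_fusion(train_feats, y, test_feats):
    """Concatenate all feature types, then train a single detector."""
    w = fit_linear_scorer(np.hstack(train_feats), y)
    return score(w, np.hstack(test_feats))

def late_fusion(train_feats, y, test_feats):
    """Average the scores of the per-feature detectors."""
    return np.mean(per_feature_scores(train_feats, y, test_feats), axis=0)

def double_fusion(train_feats, y, test_feats):
    """Average the per-feature scores together with the early-fusion score."""
    scores = per_feature_scores(train_feats, y, test_feats)
    scores.append(early_fusion(train_feats, y, test_feats))
    return np.mean(scores, axis=0)
```

Each function takes a list of feature matrices (one per feature type, rows aligned across types) and labels in {-1, +1}, and returns one score per test sample. Uniform averaging is the simplest possible combination rule; a weighted combination tuned on held-out data would be an equally plausible reading of the abstract.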