Shanghai Jiao Tong University Participation in High-Level Feature Extraction and Surveillance Event Detection at TRECVID 2009

Xiaokang Yang, Yi Xu, Rui Zhang, Erkang Chen, Qing Yan, Bo Xiao, Zhou Yu, Ning Li, Zuo Huang, Cong Zhang, Xiaolin Chen, Anwen Liu, Zhenfei Chu, Kai Guo, Jun Huang
2009-01-01
Abstract: In this paper, we describe our participation in high-level feature extraction, automatic search, and surveillance event detection at the TRECVID 2009 evaluation.

In high-level feature extraction, we establish a common feature set for all the predefined concepts, including global features and local features extracted from the keyframes. For the concepts related to person activity, space-time interest points are also used. Detection of ROI and faces is needed for some special concepts, such as playing instrument and female face close-up. Classifiers are trained using these features, and linear weighted fusion of the classification results is used as the baseline. In particular, simple average fusion already works well. Further, ASR and IB re-ranking are used to improve the overall performance. We submitted the following six runs:

• A_SJTU_ICIP_Lab317_1: Average fusion of classification results using global and local features; SVM classifiers are trained on TRECVID 2009 development data.
• A_SJTU_ICIP_Lab317_2: Linear weighted fusion of classification results using global and local features; SVM classifiers are trained on TRECVID 2009 development data.
• A_SJTU_ICIP_Lab317_3: Max of RUN1 and RUN2, re-ranked using ASR.
• A_SJTU_ICIP_Lab317_4: Max of RUN1 and RUN2, re-ranked using IB re-ranking.
• A_SJTU_ICIP_Lab317_5: Based on the result of RUN3, combining ASR and IB re-ranking.
• A_SJTU_ICIP_Lab317_6: Max of all runs.

In event detection, trajectory features obtained from human tracking and optical flow computation, together with local appearance and shape features, are employed in event model training. For particular event detection tasks, several detection rules are tested using HMM models, boosted classifiers, matching, and heuristic settings. We provide detection results for eight of the 10 required events for performance evaluation.

• SJTU_2009_retroED_EVAL09_ENG_s-camera_p-baseline_1: Event detection based on human tracking, motion detection and gesture recognition.
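
The fusion schemes named for the high-level feature extraction runs (average fusion, linear weighted fusion, and taking the max over runs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the shot identifiers, the example weights, and the choice to score a missing shot as 0.0 are all assumptions made for illustration, and the abstract does not say how the fusion weights were chosen.

```python
from typing import Dict, List, Optional

# Hypothetical representation: shot id -> classifier confidence for one concept.
Scores = Dict[str, float]


def weighted_fusion(score_lists: List[Scores],
                    weights: Optional[List[float]] = None) -> Scores:
    """Linear weighted fusion of per-feature classifier scores.

    With weights=None every classifier gets equal weight, which is
    plain average fusion. Weights are normalised to sum to 1.
    """
    if weights is None:
        weights = [1.0] * len(score_lists)
    if len(weights) != len(score_lists):
        raise ValueError("need exactly one weight per score list")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    shots = set().union(*score_lists)
    # Assumption: a shot missing from one classifier's output scores 0.0 there.
    return {
        shot: sum(w * s.get(shot, 0.0) for w, s in zip(weights, score_lists)) / total
        for shot in shots
    }


def max_fusion(runs: List[Scores]) -> Scores:
    """Keep, for every shot, the highest score it received in any run."""
    shots = set().union(*runs)
    return {shot: max(r.get(shot, 0.0) for r in runs) for shot in shots}


def rank(scores: Scores) -> List[str]:
    """Shots ordered by descending score; ties broken by shot id."""
    return sorted(scores, key=lambda shot: (-scores[shot], shot))


# Toy scores from two hypothetical classifiers (global and local features).
global_scores = {"shot1": 0.9, "shot2": 0.2, "shot3": 0.4}
local_scores = {"shot1": 0.5, "shot2": 0.8, "shot3": 0.4}

run1 = weighted_fusion([global_scores, local_scores])                # average fusion
run2 = weighted_fusion([global_scores, local_scores], [0.25, 0.75])  # weighted fusion
combined = max_fusion([run1, run2])                                  # max of both runs
ranking = rank(combined)
```

Taking the max over runs keeps a shot near the top of the ranked list if either fusion scheme scored it highly, which is one plausible reason to combine runs this way before any re-ranking step.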