Intelligent Multimedia Group of Tsinghua University at TRECVID 2006.

Jie Cao, Yanxiang Lan, Jianmin Li, Qiang Li, Xirong Li, Fuzong Lin, Xiaobing Liu, Linjie Luo, Wanli Peng, Dong Wang, Huiyi Wang, Zhikun Wang, Zhen Xiang, Jinhui Yuan, Bo Zhang, Jun Zhang, Leigang Zhang, Xiao Zhang, Wujie Zheng
2006-01-01
Abstract: Our shot boundary detection system this year is essentially the same as last year's, with three minor improvements: the detection of FOIs, flashlights, and short gradual transitions. On last year's data set, the new system outperforms the old one. On the 2006 data set, however, it did not improve as expected. We find that this is mainly due to inconsistent annotation criteria: a) the imprecise definition of FOIs, OTHs, etc.; b) the blurry distinction between CUTs and short gradual transitions; c) the inconsistent annotation of video-in-video. In the high-level feature extraction (HFE) task, rich and hierarchical (multi-granularity) visual representations are adopted. A bundle of diversified SVM classifiers is trained sequentially for each feature. These classifiers are then combined with a weight-and-select fusion algorithm. The RankBoost and StackSVM fusion algorithms are also implemented, and different approaches for representing concept context are evaluated in search of a performance gain. Our submitted runs (runid: A/B_hua) are ranked highest in MAP among all HFE participants, except run B_hua_2, which is ranked 8th. In addition, we obtain the top result for 7 of the 20 concepts. The results indicate that our weight-and-select fusion algorithm works surprisingly well, better than all variations of the RankBoost and StackSVM fusion algorithms.
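The abstract names a weight-and-select fusion of per-feature SVM classifiers but does not spell out the rule. The sketch below is only one plausible reading, not the authors' method: it assumes each classifier is weighted by its average precision on a held-out validation set, that only the `top_k` highest-weighted classifiers are kept, and that their scores are combined by a normalized weighted sum. The function names, the `top_k` parameter, and the use of plain non-interpolated average precision are all assumptions made for illustration; the official TRECVID measure may be computed differently.

```python
from typing import Dict, List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision of a ranked list (labels are 0 or 1)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def weight_and_select(
    val_scores: Dict[str, Sequence[float]],
    val_labels: Sequence[int],
    test_scores: Dict[str, Sequence[float]],
    top_k: int,
) -> List[float]:
    """Hypothetical fusion rule: weight by validation AP, keep the top_k classifiers."""
    # Weight step (assumed): one weight per classifier, equal to its validation AP.
    weights = {
        name: average_precision(scores, val_labels)
        for name, scores in val_scores.items()
    }
    # Select step (assumed): keep only the top_k classifiers by weight.
    selected = sorted(weights, key=lambda name: weights[name], reverse=True)[:top_k]
    total = sum(weights[name] for name in selected)
    if total == 0:
        # Fall back to a uniform average if every selected classifier has zero AP.
        weights = {name: 1.0 for name in selected}
        total = float(len(selected))
    n_shots = len(test_scores[selected[0]])
    # Fuse: normalized weighted sum of the selected classifiers' scores per shot.
    return [
        sum(weights[name] * test_scores[name][i] for name in selected) / total
        for i in range(n_shots)
    ]


if __name__ == "__main__":
    val_labels = [1, 0, 1, 0]
    val_scores = {
        "color": [0.9, 0.1, 0.8, 0.2],
        "texture": [0.1, 0.9, 0.2, 0.8],
        "edge": [0.9, 0.8, 0.1, 0.2],
    }
    test_scores = {
        "color": [1.0, 0.0],
        "texture": [0.5, 0.5],
        "edge": [0.0, 1.0],
    }
    print(weight_and_select(val_scores, val_labels, test_scores, top_k=2))
```

In this toy setup the weakest classifier on validation data ("texture") is dropped, and the remaining two contribute in proportion to their validation average precision. Mean average precision (MAP), the ranking measure quoted in the abstract, is the mean of such per-concept average precision values.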