Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation

Jian Tu, Zuxuan Wu, Qi Dai, Yu-Gang Jiang, Xiangyang Xue
DOI: https://doi.org/10.1109/ICMEW.2014.6890609
2014-01-01
Abstract: We participated in the Huawei Accurate and Fast Mobile Video Annotation Challenge (MoVAC) at IEEE ICME 2014. We submitted three result runs, each combining different features and classification techniques, with emphasis on both accuracy and efficiency. In this paper, we briefly summarize the techniques used in our system and the components used to generate each of the three submitted results. One novel component of our system is a specially tailored deep neural network (DNN) that explores the relationships among multiple features to improve annotation performance and that runs very efficiently in a GPU implementation. One of our DNN-based submissions needed only 18.8 seconds to process a test video. By combining the DNN with traditional SVM learning, we achieved the best accuracy among all worldwide submissions to this challenge.
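The abstract does not say how the DNN and SVM outputs are combined. As a rough illustration only, the sketch below assumes a simple late-fusion scheme: per-label DNN probabilities are averaged with SVM margins squashed through a sigmoid. The function names, the `dnn_weight` parameter, and the fusion rule itself are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Map a raw SVM margin to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(dnn_probs, svm_margins, dnn_weight=0.5):
    """Weighted average of DNN probabilities and squashed SVM margins.

    Assumption: both lists hold one score per annotation label, in the
    same label order. This is an illustrative rule, not the paper's.
    """
    if len(dnn_probs) != len(svm_margins):
        raise ValueError("score lists must have the same length")
    return [
        dnn_weight * p + (1.0 - dnn_weight) * sigmoid(m)
        for p, m in zip(dnn_probs, svm_margins)
    ]


def rank_labels(labels, fused):
    """Return labels ordered from highest to lowest fused score."""
    order = sorted(range(len(labels)), key=lambda i: fused[i], reverse=True)
    return [labels[i] for i in order]
```

In such a scheme, `dnn_weight` would typically be tuned on held-out data; a value of 0.5 weights both classifiers equally.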