The Very Deep Multi-stage Two-stream Convolutional Neural Network for Action Recognition

Xiuju Gao, Hanling Zhang
DOI: https://doi.org/10.2991/icmit-16.2016.46
2016-01-01
Abstract: In this paper, we present a very deep multi-stage two-stream convolutional neural network for action recognition in videos. The challenge in action recognition is to capture appearance and motion information that describes diverse actions efficiently, and to correctly classify videos of varying difficulty. The proposed architecture, which we call the very deep two-stream convolutional neural network, has strong model capacity and lets us extract appearance and motion information effectively from video frames. In addition, with the proposed multi-stage training strategy, multiple classifiers are jointly optimized to handle samples at different difficulty levels. Finally, a Dynamic Random Forests classifier is used in place of a Softmax or SVM classifier, yielding good classification results. Our architecture is trained and evaluated on the standard UCF-101 video action benchmark, where it is competitive with the state of the art.