Multi-Level ResNets with Stacked SRUs for Action Recognition.

ZhenXing Zheng, Gaoyun An, Qiuqi Ruan
2017-01-01
Abstract: Inspired by the consistent breakthroughs that convolutional networks have made in image classification, and noting that most existing approaches are either inefficient or hard to optimize, we propose R-SRU, a model that combines multi-level residual networks (ResNets) with stacked simple recurrent units (SRUs) and is trained end-to-end. The ResNets learn spatial information from frame appearance and the stacked SRUs learn temporal dynamics from video sequences, so the model is deep in both the spatial and the temporal dimension. We investigate the effect of diverse hyper-parameter settings in order to recommend better hyper-parameter choices to researchers who use SRUs. Additionally, we compare the low-, mid-, and high-level features produced by the ResNets, then combine these multi-level features and pass them through SRUs with various time pooling schemes, experimentally demonstrating how much each feature level contributes to action recognition. Notably, we are the first to apply SRUs to distinguishing actions. A series of experiments is carried out on two standard benchmarks, HMDB-51 and UCF-101. The experimental results show that R-SRU outperforms the majority of methods that take only RGB data as input and is competitive with the state of the art, achieving 51.31% on HMDB-51 and 81.38% on UCF-101.
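The pipeline described in the abstract (per-frame multi-level features, stacked SRUs, pooling over time) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption rather than the authors' implementation: the SRU update follows the commonly cited formulation (forget gate, reset gate, highway connection), the per-frame features are random stand-ins for ResNet outputs, the feature sizes and layer count are arbitrary, and the three pooling modes ("mean", "max", "last") are only plausible examples of "time pooling".

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def init_layer(dim, rng):
    """Random parameters for one SRU layer; input and hidden size are both dim."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    return {"W": mat(), "Wf": mat(), "Wr": mat(),
            "bf": [0.0] * dim, "br": [0.0] * dim}


def sru_layer(xs, p):
    """Run one SRU layer over a sequence of vectors and return all hidden states."""
    c = [0.0] * len(xs[0])  # internal state, carried across time steps
    hs = []
    for x in xs:
        x_tilde = matvec(p["W"], x)
        # Gates depend only on the current input, not on the previous hidden state.
        f = [sigmoid(a + b) for a, b in zip(matvec(p["Wf"], x), p["bf"])]
        r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), p["br"])]
        c = [fi * ci + (1 - fi) * xi for fi, ci, xi in zip(f, c, x_tilde)]
        # Highway connection mixes the transformed state with the raw input.
        h = [ri * math.tanh(ci) + (1 - ri) * xi for ri, ci, xi in zip(r, c, x)]
        hs.append(h)
    return hs


def stacked_sru(xs, layers):
    """Feed the output sequence of each layer into the next one."""
    for p in layers:
        xs = sru_layer(xs, p)
    return xs


def time_pool(hs, mode="mean"):
    """Collapse a sequence of hidden states into one clip-level vector."""
    if mode == "last":
        return hs[-1]
    cols = list(zip(*hs))
    if mode == "max":
        return [max(col) for col in cols]
    if mode == "mean":
        return [sum(col) / len(col) for col in cols]
    raise ValueError(f"unknown pooling mode: {mode}")


def fake_multilevel_features(rng, sizes=(2, 3, 4)):
    """Hypothetical stand-in for low-, mid-, and high-level ResNet features of one frame."""
    low, mid, high = ([rng.uniform(-1, 1) for _ in range(n)] for n in sizes)
    return low + mid + high  # combine levels by concatenation (an assumption)


rng = random.Random(0)
video = [fake_multilevel_features(rng) for _ in range(8)]  # 8 frames
dim = len(video[0])
layers = [init_layer(dim, rng) for _ in range(2)]          # 2 stacked SRU layers
hidden = stacked_sru(video, layers)
clip_mean = time_pool(hidden, "mean")
```

In a full model, the pooled vector would feed a classifier over the action classes. Because the gates in this formulation depend only on the current input, the matrix products for all time steps can be computed in parallel, which is the usual efficiency argument for SRUs over LSTMs.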