A very deep sequences learning approach for human action recognition

Lin Zhihui, Yuan Chun
DOI: https://doi.org/10.1007/978-3-319-27674-8_23
2016-01-01
Abstract: Human action recognition is a widely studied problem in computer vision. Its most difficult challenge is capturing the motion features of image sequences or videos. In recent years, deep convolutional networks have achieved great success in many image classification and recognition tasks, but deep learning has performed less well on video interpretation tasks. Earlier models [18, 19] were built on convolutional networks for human action recognition. We propose an approach based on CNNs and RNN-like models that can extract both spatial and temporal features: a CNN model produces static scores, and an LSTM or GRU layer produces dynamic class scores of human actions. Moreover, compared with a two-stream ConvNet [18, 24], our approach does not need an optical-flow CNN stream, which saves considerable time; the RNN-like models need only a few hours to converge. We also achieve remarkable performance. © Springer International Publishing Switzerland 2016.
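The abstract describes two kinds of class scores: static scores from a CNN and dynamic scores from an LSTM or GRU layer run over the frame sequence. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the per-frame feature vectors stand in for CNN outputs, the weights are random rather than trained, the names `TinyGRU` and `classify_clip` are hypothetical, biases are omitted, and the averaging rule used to combine the two score vectors is one plausible choice that the abstract does not specify.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyGRU:
    """Single GRU layer without biases (hypothetical stand-in for the temporal model)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.Wz, self.Uz = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wr, self.Ur = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wh, self.Uh = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)

    def step(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state computed from the input and the reset-gated previous state.
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the previous state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, frames):
        h = [0.0] * self.hid_dim
        for x in frames:
            h = self.step(x, h)
        return h  # final hidden state summarizes the sequence

def classify_clip(frame_features, W_static, W_dynamic, gru):
    n_classes = len(W_static)
    # Static scores: per-frame class probabilities, averaged over all frames.
    per_frame = [softmax(matvec(W_static, f)) for f in frame_features]
    static = [sum(p[c] for p in per_frame) / len(per_frame) for c in range(n_classes)]
    # Dynamic scores: class probabilities from the GRU's final hidden state.
    dynamic = softmax(matvec(W_dynamic, gru.run(frame_features)))
    # Assumed fusion rule: simple average of the two score vectors.
    fused = [(s + d) / 2 for s, d in zip(static, dynamic)]
    return static, dynamic, fused

# Toy clip: 10 frames of 8-dimensional "CNN features", 4 action classes.
rng = random.Random(0)
feat_dim, hid_dim, n_classes, n_frames = 8, 6, 4, 10
frames = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(n_frames)]
gru = TinyGRU(feat_dim, hid_dim, rng)
W_static = rand_matrix(n_classes, feat_dim, rng)
W_dynamic = rand_matrix(n_classes, hid_dim, rng)
static, dynamic, fused = classify_clip(frames, W_static, W_dynamic, gru)
predicted = max(range(n_classes), key=lambda c: fused[c])
```

In this sketch the temporal model consumes the same per-frame features as the static classifier, so no optical-flow input is computed, which mirrors the time saving the abstract attributes to dropping the optical-flow stream.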