Action recognition using three dimension convolution and long short term memory

Yu-Cheng Liu, Jian-Jiun Ding, Yao-Jen Chang, Chien-Yao Wang, Jia-Ching Wang
DOI: https://doi.org/10.1109/ICCE-China.2017.7991006
2017-01-01
Abstract: The convolutional neural network (CNN) is increasingly popular in computer vision and is widely used in acoustic signal processing, image classification, and image segmentation. This work proposes an architecture for action recognition that combines a 3-D convolutional neural network with long short-term memory (LSTM). The network stacks consecutive video frames, extracts spatial and temporal features from them, and is trained on the input dataset to achieve good recognition performance. The LSTM models the relations among frames at different times so that information from past frames is taken into account. Simulations show that the proposed algorithm outperforms other neural-network-based methods for action recognition.
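The pipeline summarized in the abstract (stack consecutive frames, extract spatio-temporal features with 3-D convolution, carry information from past frames through an LSTM) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give layer counts, kernel sizes, or training details, so the clip length, the 3x3x3 kernels, the global mean pooling, the hidden size, and the random untrained weights below are all assumptions chosen only to keep the example small and runnable.

```python
import math
import random

def conv3d_valid(clip, kernel):
    """Valid 3-D cross-correlation of a T x H x W clip with a t x h x w kernel, then ReLU."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(T - t + 1):
        plane = []
        for y in range(H - h + 1):
            row = []
            for x in range(W - w + 1):
                s = 0.0
                for dz in range(t):
                    for dy in range(h):
                        for dx in range(w):
                            s += clip[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(max(s, 0.0))  # ReLU non-linearity
            plane.append(row)
        out.append(plane)
    return out

def global_mean(volume):
    """Collapse one 3-D feature map to a single number (assumed pooling choice)."""
    vals = [v for plane in volume for row in plane for v in row]
    return sum(vals) / len(vals)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(x, h, p):
    """Compute Wx @ x + Wh @ h + b for one gate; p = (Wx, Wh, b)."""
    Wx, Wh, b = p
    return [
        sum(a * v for a, v in zip(Wx[j], x))
        + sum(a * v for a, v in zip(Wh[j], h))
        + b[j]
        for j in range(len(b))
    ]

def lstm_step(x, h, c, params):
    """One standard LSTM step: input, forget, output gates and candidate cell."""
    i = [sigmoid(v) for v in affine(x, h, params["i"])]
    f = [sigmoid(v) for v in affine(x, h, params["f"])]
    o = [sigmoid(v) for v in affine(x, h, params["o"])]
    g = [math.tanh(v) for v in affine(x, h, params["g"])]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def recognize(frames, kernels, params, W_out, clip_len=4):
    """Stack frames into clips, extract 3-D conv features, run the LSTM, classify."""
    # Non-overlapping clips of consecutive frames (assumed stacking scheme).
    clips = [frames[s:s + clip_len]
             for s in range(0, len(frames) - clip_len + 1, clip_len)]
    hidden = len(W_out[0])
    h, c = [0.0] * hidden, [0.0] * hidden
    for clip in clips:
        feats = [global_mean(conv3d_valid(clip, k)) for k in kernels]
        h, c = lstm_step(feats, h, c, params)  # state carries past-clip information
    logits = [sum(w * v for w, v in zip(row, h)) for row in W_out]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

def rand_tensor(rng, *shape):
    """Nested lists of the given shape filled with small random values."""
    if len(shape) == 1:
        return [rng.uniform(-0.5, 0.5) for _ in range(shape[0])]
    return [rand_tensor(rng, *shape[1:]) for _ in range(shape[0])]

# Toy data: 8 grayscale 6x6 frames, 2 kernels, 3 hidden units, 4 action classes.
rng = random.Random(0)
frames = rand_tensor(rng, 8, 6, 6)
kernels = [rand_tensor(rng, 3, 3, 3) for _ in range(2)]
hidden, n_classes = 3, 4
params = {
    gate: (rand_tensor(rng, hidden, len(kernels)),
           rand_tensor(rng, hidden, hidden),
           [0.0] * hidden)
    for gate in "ifog"
}
W_out = rand_tensor(rng, n_classes, hidden)
probs = recognize(frames, kernels, params, W_out)
print(probs)  # class probabilities from untrained weights
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how per-clip 3-D convolution features can feed a recurrent state that summarizes earlier frames before classification.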