A data augmentation method for human action recognition using dense joint motion images

Leiyue Yao, Wei Yang, Wei Huang
DOI: https://doi.org/10.1016/j.asoc.2020.106713
IF: 8.7
2020-12-01
Applied Soft Computing
Abstract: With the development of deep learning and neural network techniques, human action recognition has made great progress in recent years. However, it remains challenging to analyse temporal information and to identify human actions from few training samples. In this paper, an effective motion image called the dense joint motion image (DJMI) is proposed to transform an action into an image. The method is compared with state-of-the-art methods, and its contributions are reflected mainly in three characteristics. First, in contrast to the classic joint trajectory map (JTM), every pixel of the DJMI is useful and contains essential spatio-temporal information. Thus, the input parameters of the deep neural network (DNN) are reduced by an order of magnitude, and the efficiency of action recognition is improved. Second, each frame of an action video is encoded as an independent slice of the DJMI, which avoids the information loss caused by overlapping action trajectories. Third, because actions are represented as DJMIs, proven graphics and image-processing algorithms can be used to generate training samples. Compared with the original image, the generated DJMIs contain new and different spatio-temporal information, which enables DNNs to be trained well on very few samples. The method was evaluated on three benchmark datasets, namely Florence-3D, UTKinect-Action3D and MSR Action3D. The results show that it achieves a recognition speed of 37 fps with competitive accuracy on these datasets. Its time efficiency and few-shot learning capability make it suitable for real-time surveillance.
computer science, artificial intelligence, interdisciplinary applications
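The abstract states that each frame is encoded as an independent slice of the image and that ordinary image operations can then generate new training samples. The following is a minimal illustrative sketch of that general idea, not the paper's actual DJMI encoding, which the abstract does not specify. The function names `encode_motion_image` and `resample_time`, the joints-by-frames layout, the per-axis min-max scaling to 8-bit channels, and the nearest-neighbour temporal resampling are all assumptions made for illustration.

```python
import numpy as np


def encode_motion_image(joints):
    """Encode a skeleton sequence as a dense image (hypothetical layout).

    joints: array of shape (T, J, 3) holding x, y, z for J joints over T frames.
    Returns a uint8 image of shape (J, T, 3): column t is the slice for
    frame t, row j is joint j, and the three channels hold x, y, z.
    """
    joints = np.asarray(joints, dtype=np.float64)
    # Per-axis min and max over the whole sequence, shape (1, 1, 3).
    lo = joints.min(axis=(0, 1), keepdims=True)
    hi = joints.max(axis=(0, 1), keepdims=True)
    # Guard against a constant axis to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = (joints - lo) / span
    image = np.rint(scaled * 255).astype(np.uint8)
    # Move frames to the horizontal axis so each frame is one column.
    return image.transpose(1, 0, 2)


def resample_time(image, new_length):
    """Generate a new sample by resampling the time axis (assumed augmentation).

    Picks new_length columns at evenly spaced positions (nearest neighbour),
    which mimics the same action performed faster or slower.
    """
    num_frames = image.shape[1]
    idx = np.rint(np.linspace(0, num_frames - 1, new_length)).astype(int)
    return image[:, idx, :]
```

Because every column holds exactly one frame, no two frames share a pixel in this layout, and any image-level transform along the horizontal axis corresponds to a change in timing rather than to a loss of pose information.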