An Approach to Pose-Based Action Recognition

Chunyu Wang, Yizhou Wang, Alan L. Yuille
DOI: https://doi.org/10.1109/CVPR.2013.123
2013-01-01
Computer Vision and Pattern Recognition
Abstract: We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state-of-the-art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimations output by the existing method and incorporate additional segmentation cues and temporal constraints to select the "best" one. Then we group the estimated joints into five body parts (e.g., the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements (by temporal-part-sets) that are characteristic of human actions. It is interpretable, compact, and robust to errors in joint estimation. Experimental results first show that our approach localizes body joints more accurately than existing methods. Next, we show that it outperforms state-of-the-art action recognizers on the UCF Sports, Keck Gesture, and MSR-Action3D datasets.
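The abstract says the best pose is chosen from K candidates per frame using segmentation cues and temporal constraints, but it gives no formula. The sketch below is one plausible way to do that kind of selection, not the authors' method. It treats the video as a chain and runs Viterbi dynamic programming. Each candidate gets a per-frame score (`unary`), which is assumed to already fold in the estimator score and any segmentation agreement. A transition between frames is penalized by the mean joint displacement, weighted by `w`. The names `select_poses`, `mean_joint_displacement`, `unary`, `cands`, and `w` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical pose type: one (x, y) location per joint.
Pose = Sequence[Tuple[float, float]]


def mean_joint_displacement(a: Pose, b: Pose) -> float:
    # Assumed temporal cost: average Euclidean distance between corresponding joints.
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def select_poses(unary: List[List[float]], cands: List[List[Pose]], w: float = 1.0) -> List[int]:
    """Viterbi over K candidates per frame; returns the chosen candidate index per frame.

    unary[t][k]: hypothetical score of candidate k at frame t (higher is better).
    cands[t][k]: joint locations of candidate k at frame t.
    """
    T, K = len(unary), len(unary[0])
    score = [list(unary[0])]        # best cumulative score ending at each candidate
    back: List[List[int]] = [[0] * K]  # back-pointers to the best predecessor
    for t in range(1, T):
        row, ptr = [], []
        for j in range(K):
            # Previous cumulative score minus the motion penalty into candidate j.
            totals = [score[t - 1][i] - w * mean_joint_displacement(cands[t - 1][i], cands[t][j])
                      for i in range(K)]
            best_i = max(range(K), key=totals.__getitem__)
            ptr.append(best_i)
            row.append(unary[t][j] + totals[best_i])
        score.append(row)
        back.append(ptr)
    # Backtrack from the best candidate in the last frame.
    path = [max(range(K), key=score[-1].__getitem__)]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy example: 3 frames, 2 candidates, 1 joint. Candidate 1 in frame 1 scores
# slightly higher on its own but would require a large jump.
unary = [[1.0, 0.0], [0.0, 0.5], [1.0, 0.0]]
cands = [
    [[(0.0, 0.0)], [(100.0, 100.0)]],
    [[(1.0, 0.0)], [(100.0, 0.0)]],
    [[(2.0, 0.0)], [(100.0, 100.0)]],
]
greedy = [max(range(2), key=row.__getitem__) for row in unary]  # [0, 1, 0]
smoothed = select_poses(unary, cands)  # [0, 0, 0]
```

Picking the best candidate in each frame independently jumps to the outlier in frame 1. The chain model keeps the temporally consistent track instead. A chain model is used here because it can be solved exactly in O(T·K²) time. Whether the paper uses this or a different inference scheme is not stated in the abstract.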
What problem does this paper attempt to address?