Fusion of Skeletal and STIP-Based Features for Action Recognition with RGB-D Devices.

Ting Liu, Mingtao Pei
DOI: https://doi.org/10.1007/978-3-319-21963-9_29
2015-01-01
Abstract: With the spread of the Kinect sensor, marker-less body pose estimation has become far easier, and complex human actions can be recognized from 3D skeletal information. However, tracking errors and occlusion can make the obtained skeletal information noisy. In this paper, we compute posture, motion, and offset information from skeleton positions to represent the global information of an action, and build a novel depth cuboid feature (called HOGHOG) that describes the 3D cuboid around each STIP (spatiotemporal interest point) to handle cluttered backgrounds and partial occlusions. A fusion scheme is then proposed to combine the two complementary features. We test our approach on the public MSRAction3D and MSRDailyActivity3D datasets. Experimental evaluations demonstrate the effectiveness of the proposed method.
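
The abstract does not define the posture, motion, and offset features, so the following is only a minimal sketch under assumed definitions: posture as pairwise joint differences within the current frame, motion as each joint's displacement from the previous frame, and offset as each joint's displacement from the first frame. The function name `frame_features` and the input layout (a list of frames, each a list of `(x, y, z)` joint tuples) are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def frame_features(frames, t):
    """Build one flat feature vector for frame t.

    frames: list of frames; each frame is a list of (x, y, z) joints.
    The three feature groups below are assumed definitions, not the
    paper's exact formulation.
    """
    cur = frames[t]
    prev = frames[max(t - 1, 0)]  # frame 0 is compared with itself
    first = frames[0]

    # Posture: differences between every pair of joints in the current frame.
    posture = [sub(cur[i], cur[j]) for i, j in combinations(range(len(cur)), 2)]
    # Motion: displacement of each joint since the previous frame.
    motion = [sub(c, p) for c, p in zip(cur, prev)]
    # Offset: displacement of each joint since the first frame.
    offset = [sub(c, f) for c, f in zip(cur, first)]

    # Concatenate all groups into a single flat list of coordinates.
    return [v for group in (posture, motion, offset) for vec in group for v in vec]


# Two frames of a two-joint toy skeleton that shifts one unit along y.
toy = [[(0, 0, 0), (1, 0, 0)], [(0, 1, 0), (1, 1, 0)]]
features = frame_features(toy, 1)
```

For a skeleton with J joints, this sketch yields 3 * (J * (J - 1) / 2 + 2 * J) values per frame; how such a vector would be normalized or fused with the HOGHOG descriptor is not specified in the abstract.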