Learning Composite Latent Structures for 3D Human Action Representation and Recognition

Ping Wei, Hongbin Sun, Nanning Zheng
DOI: https://doi.org/10.1109/tmm.2019.2897902
IF: 7.3
2019-01-01
IEEE Transactions on Multimedia
Abstract: 3D human action representation and recognition are important issues in many multimedia applications. While latent state approaches have been widely used for action modeling, previous works assume that the latent states of actions have a single attribute. This assumption is inaccurate for representing the structures of complex actions. In this paper, we propose that latent states have composite attributes and introduce a novel composite latent structure (CLS) model to represent and recognize 3D human actions with skeleton sequences. A human action is modeled with a hierarchical graph, which represents the action sequence as sequential atomic actions. An atomic action is represented as a composite latent state, which is composed of a latent semantic attribute and a latent geometric attribute. A discriminative EM-like algorithm is proposed to learn the model parameters and the composite latent structures of human actions. Given a 3D skeleton sequence, a composite attribute iterative programming algorithm is proposed to recognize the action and infer the action's latent temporal structure. We evaluate the proposed method on three challenging 3D action datasets: the MSR 3D Action Dataset, the Multiview 3D Event Dataset, and the UTKinect-Action 3D Dataset. Extensive experimental results on these datasets demonstrate the effectiveness and advantages of the proposed method.