Fusing Shape and Motion Matrices for View Invariant Action Recognition Using 3D Skeletons

Mengyuan Liu, Qinqin He, Hong Liu
DOI: https://doi.org/10.1109/icip.2017.8296967
2017-01-01
Abstract: Action recognition under arbitrary views remains a challenge, since view variations cause severe changes in motion and appearance, which increase the ambiguity among actions of the same type. To solve this problem, we propose a new method that effectively captures view-invariant shape and motion cues. The method contains three main stages. First, we compute the distances between pairwise skeleton joints to form a distance matrix for each skeleton. Second, shape matrices (SMs) and motion matrices (MMs) are formulated to describe shape and motion cues between pairwise distance matrices, respectively. Third, Fisher Vector and Linear Discriminant Analysis (LDA) are adopted to encode the SMs and MMs as low-dimensional, highly discriminative representations, which are then fused to generate the final action representation. Experimental results on the benchmark UTKinect-Action dataset show that our method outperforms methods designed for the view-invariant action recognition task. Additionally, we collect a SmartHome dataset, on which the robustness of our method to noisy skeleton data is verified.
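The first stage can be illustrated with a minimal sketch, which is not the authors' implementation. It builds the pairwise joint distance matrix for one skeleton and checks that the matrix is unchanged when the skeleton is rotated, which is the property that makes such distances attractive as a view-invariant cue. The four-joint skeleton, its coordinates, and the helper names `distance_matrix` and `rotate_y` are assumptions made for illustration; the abstract does not specify how SMs and MMs are derived from these matrices, so that step is omitted.

```python
import math


def distance_matrix(joints):
    """Return the symmetric matrix of Euclidean distances between all joint pairs."""
    return [[math.dist(a, b) for b in joints] for a in joints]


def rotate_y(joints, angle):
    """Rotate 3D points about the y-axis by `angle` radians (simulates a view change)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]


# Hypothetical 4-joint skeleton (x, y, z) in metres; values are illustrative only.
skeleton = [
    (0.0, 1.6, 0.0),   # head
    (0.0, 1.2, 0.0),   # torso
    (0.3, 1.0, 0.1),   # left hand
    (-0.3, 1.0, 0.1),  # right hand
]

d_front = distance_matrix(skeleton)
d_side = distance_matrix(rotate_y(skeleton, math.pi / 3))

# Largest element-wise difference between the two distance matrices.
max_diff = max(
    abs(p - q)
    for row_front, row_side in zip(d_front, d_side)
    for p, q in zip(row_front, row_side)
)
print(max_diff < 1e-9)
```

Because a rotation preserves Euclidean distances, the two matrices agree up to floating-point error, so a descriptor built only from these distances does not depend on the camera's orientation around the subject.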