Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition.

Yi Tian, Yu Kong, Qiuqi Ruan, Gaoyun An, Yun Fu
DOI: https://doi.org/10.1109/tip.2017.2788196
IF: 10.6
2018-01-01
IEEE Transactions on Image Processing
Abstract: In this paper, we present a novel two-layer video representation for human action recognition that employs a hierarchical group sparse encoding technique and spatio-temporal structure. In the first layer, a new sparse encoding method named locally consistent group sparse coding (LCGSC) is proposed to make full use of the motion and appearance information of local features. The LCGSC method not only encodes the global layout of features within the same video-level group, but also captures local correlations between them, yielding expressive sparse representations of video sequences. Meanwhile, two efficient location estimation models, namely an absolute location model and a relative location model, are developed to incorporate spatio-temporal structure into the LCGSC representations. In the second layer, an action-level group is established, and a hierarchical LCGSC encoding scheme is applied to describe videos at different levels of abstraction. On the one hand, the new layer captures higher-order dependencies between video sequences; on the other hand, it takes label information into account to make the video representations more discriminative. The advantages of our hierarchical framework are demonstrated on several challenging datasets.