Action Recognition Based on Spatial-Temporal Pyramid Sparse Coding.

Xiaojing Zhang, Hua Zhang, Xiaochun Cao
2012-01-01
Abstract: This paper introduces a novel video representation, termed spatial-temporal pyramid sparse coding (STPSC), which characterizes both the spatial and temporal aspects of a video. Specifically, the co-occurrences of visual words are computed with respect to the spatial layout and the temporal ordering of the features in the video, so the representation captures both the spatial arrangement and the temporal relationship of the words. The representation is motivated by spatial pyramid matching (SPM), a technique used for scene recognition in images. We extend SPM to video analysis by combining it with sparse coding. First, dense feature points are extracted and described by displacement information from a dense optical flow field. Then sparse coding is used to quantize the feature descriptors, and a spatial-temporal pyramid is introduced to represent an action. Finally, an SVM classifies the videos. Experimental results show improvements over state-of-the-art techniques on a public action dataset.
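The quantization and pyramid steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the sparse coding solver, the pooling function, the number of pyramid levels, or the normalization. The sketch assumes ISTA for the sparse codes, max pooling of absolute codes, three levels, and L2 normalization, choices borrowed from common sparse-coding SPM pipelines. The function names `sparse_codes_ista` and `st_pyramid_pool` are hypothetical, and feature positions are assumed to be normalized to [0, 1) in x, y, and t.

```python
import numpy as np


def sparse_codes_ista(X, D, lam=0.1, n_iter=100):
    """Approximately solve min_A 0.5*||X - A D||_F^2 + lam*||A||_1 with ISTA.

    X: (N, d) descriptors; D: (K, d) dictionary with one atom per row.
    The solver choice is an assumption; the abstract does not name one.
    """
    # Step size is 1 / Lipschitz constant of the gradient (squared spectral norm of D).
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        Z = A - step * grad
        # Soft thresholding drives small coefficients to exactly zero.
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam * step, 0.0)
    return A


def st_pyramid_pool(codes, xyt, levels=3):
    """Max-pool absolute codes over a spatial-temporal pyramid (assumed pooling).

    codes: (N, K) sparse codes; xyt: (N, 3) positions normalized to [0, 1).
    Level l splits each of x, y, t into 2**l bins, giving 8**l cells.
    """
    codes = np.abs(codes)
    K = codes.shape[1]
    pooled = []
    for level in range(levels):
        bins = 2 ** level
        # Bin index of every feature along x, y, t at this level.
        idx = np.minimum((xyt * bins).astype(int), bins - 1)
        cell = (idx[:, 0] * bins + idx[:, 1]) * bins + idx[:, 2]
        level_feat = np.zeros((bins ** 3, K))
        # Per-cell maximum; cells that contain no feature stay at zero.
        np.maximum.at(level_feat, cell, codes)
        pooled.append(level_feat.ravel())
    feat = np.concatenate(pooled)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat
```

With three levels the pooled vector has (1 + 8 + 64) * K entries for a dictionary of K atoms. One such vector per video would then be passed to a classifier such as an SVM.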