Constructing Visual Vocabularies Using Sparse Coding for Action Recognition

Changhong Liu, Yang, Yong Chen
DOI: https://doi.org/10.1109/iciecs.2009.5366461
2009-01-01
Abstract: Much recent action recognition research relies on a bag-of-words (BOW) representation obtained by quantizing 3D interest points extracted from videos. The k-means algorithm is commonly used to construct the visual vocabulary, but it has two major drawbacks. First, the resulting vocabulary is sensitive to the vocabulary size and to the initialization. Second, k-means is unable to capture the salient properties of the videos, and the resulting vocabulary may contain a large amount of redundant information. In this paper, we propose a novel action recognition approach that constructs a visual vocabulary and represents a video by sparse coding followed by max pooling. Unlike the k-means algorithm, the sparse coding approach can capture the salient properties of videos owing to its powerful discriminative ability. Experiments are conducted on the KTH action dataset. The results demonstrate that our approach achieves better performance than k-means and outperforms most recently proposed methods.
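The pipeline named in the abstract (learn a vocabulary by sparse coding, encode each local descriptor, then max-pool the codes into one vector per video) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the solver, so ISTA (iterative soft thresholding) is swapped in for the encoding step and a plain least-squares update for the dictionary step. All function names, the parameter `lam`, and the random stand-in descriptors are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise shrinkage operator used by ISTA."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_encode(X, D, lam=0.1, n_iter=100):
    """Approximately solve min_A 0.5*||X - A D||_F^2 + lam*||A||_1 with ISTA.

    X: (N, d) descriptors, D: (K, d) dictionary with one atom per row.
    Returns the (N, K) sparse codes.
    """
    # Lipschitz constant of the gradient: squared spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        A = soft_threshold(A - grad / L, lam / L)
    return A


def learn_dictionary(X, n_atoms, lam=0.1, n_outer=10, seed=0):
    """Alternate between sparse encoding and a least-squares atom update.

    Assumption: this simple alternating scheme stands in for whatever
    dictionary learning procedure the paper actually uses.
    """
    rng = np.random.default_rng(seed)
    # Initialise atoms from randomly chosen descriptors, scaled to unit norm.
    D = X[rng.choice(len(X), n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12
    for _ in range(n_outer):
        A = sparse_encode(X, D, lam)
        # Least-squares solution of A @ D_new = X for the atoms.
        D_new = np.linalg.lstsq(A, X, rcond=None)[0]
        norms = np.linalg.norm(D_new, axis=1, keepdims=True)
        used = norms[:, 0] > 1e-8  # leave atoms that no code uses unchanged
        D[used] = D_new[used] / norms[used]
    return D


def max_pool(A):
    """Pool per-descriptor codes into one video-level vector of length K."""
    return np.abs(A).max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random vectors stand in for descriptors of 3D interest points.
    descriptors = rng.normal(size=(200, 16))
    D = learn_dictionary(descriptors, n_atoms=32)
    video_vector = max_pool(sparse_encode(descriptors[:50], D))
    print(video_vector.shape)
```

In this sketch the pooled vector plays the role that a word histogram plays in a k-means BOW pipeline and would be passed to a classifier. Pooling over absolute values is one common convention and is an assumption here.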