A Joint Evaluation of Dictionary Learning and Feature Encoding for Action Recognition.

Xiaojiang Peng, Limin Wang, Yu Qiao, Qiang Peng
DOI: https://doi.org/10.1109/icpr.2014.450
IF: 8
2014-01-01
Pattern Recognition
Abstract: Many mid-level representations, such as sparse coding, OMP-k with K-SVD, and Fisher vector with GMM, have been developed in the image domain to replace the traditional bag-of-words model (VQ + k-means). These approaches can be split into a dictionary learning phase and a feature encoding phase, which are often closely related. In this paper, we jointly evaluate the effect of these two phases for video-based action recognition. Specifically, we compare several dictionary learning methods and feature encoding schemes through extensive experiments on the KTH and HMDB51 datasets. Experimental results indicate that Fisher vector performs consistently better than the other encoding methods, and that sparse coding is robust to different dictionaries, even ones with random weights. In addition, we observe that the advantages of sophisticated mid-level representations come not from their specific dictionaries but from their encoding mechanisms, and that randomly selected exemplars can simply be used as dictionaries for most encoding methods. Finally, we achieve state-of-the-art results on HMDB51 and UCF101 by combining our configurations with improved dense trajectory features.
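To make the split between the two phases concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest pairing the abstract mentions: a dictionary built from randomly selected exemplars, followed by hard-assignment VQ encoding into a normalized histogram. The function names (`random_exemplar_dictionary`, `vq_encode`), the toy 4-dimensional Gaussian descriptors, and the dictionary size of 8 are all illustrative assumptions; real pipelines would use local video descriptors and much larger dictionaries.

```python
import random


def random_exemplar_dictionary(descriptors, k, seed=0):
    """Dictionary phase (assumed variant): pick k descriptors at random,
    without replacement, and use them directly as codewords."""
    rng = random.Random(seed)
    return rng.sample(descriptors, k)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(descriptors, dictionary):
    """Encoding phase (hard-assignment VQ): assign each descriptor to its
    nearest codeword and return an L1-normalized histogram of assignments."""
    hist = [0.0] * len(dictionary)
    for d in descriptors:
        nearest = min(
            range(len(dictionary)),
            key=lambda j: squared_distance(d, dictionary[j]),
        )
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Toy stand-ins for local descriptors (hypothetical data, not from the paper).
rng = random.Random(42)
train_descriptors = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
video_descriptors = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]

dictionary = random_exemplar_dictionary(train_descriptors, k=8)
encoding = vq_encode(video_descriptors, dictionary)
```

Because the dictionary is only an input to `vq_encode`, swapping in a k-means or K-SVD dictionary, or replacing the encoder with sparse coding or Fisher vectors, would leave the other phase unchanged, which is the kind of pairing the joint evaluation compares.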