Motionlets: Mid-level 3D Parts for Human Motion Recognition
LiMin Wang, Yu Qiao, Xiaoou Tang
DOI: https://doi.org/10.1109/CVPR.2013.345
2013-01-01
Computer Vision and Pattern Recognition
Abstract: This paper proposes \emph{motionlet}, a mid-level and spatiotemporal part, for human motion recognition. A motionlet can be seen as a tight cluster in motion and appearance space, corresponding to the moving process of different body parts. We postulate three key properties of motionlets for action recognition: high motion saliency, multiple scale representation, and representative-discriminative ability. Towards this goal, we develop a data-driven approach to learn motionlets from training videos. First, we extract 3D regions with high motion saliency. Then we cluster these regions and preserve the centers as candidate templates for motionlets. Finally, we examine the representative and discriminative power of the candidates, and introduce a greedy method to select effective candidates. With motionlets, we present a mid-level representation for video, called \emph{motionlet activation vector}. We conduct experiments on three datasets, KTH, HMDB51, and UCF50. The results show that the proposed methods significantly outperform state-of-the-art methods.
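The final selection step can be illustrated with a minimal sketch. The abstract does not give the selection criterion, so everything below is an assumption made for illustration: each candidate template is assumed to carry a scalar discriminative score and a set of training-video indices it represents, and the hypothetical function `greedy_select` repeatedly picks the candidate whose score times its number of not-yet-covered videos is largest. This is a generic greedy coverage heuristic, not necessarily the procedure used in the paper.

```python
from typing import Dict, List, Set


def greedy_select(
    scores: Dict[str, float],
    covers: Dict[str, Set[int]],
    k: int,
) -> List[str]:
    """Greedily pick up to k candidate templates.

    scores: assumed discriminative score per candidate (hypothetical input).
    covers: assumed set of training-video indices each candidate represents.
    At each step the gain of a candidate is its score multiplied by the
    number of videos it represents that are not yet covered.
    """
    selected: List[str] = []
    covered: Set[int] = set()
    remaining = set(scores)

    def gain(c: str) -> float:
        return scores[c] * len(covers[c] - covered)

    while remaining and len(selected) < k:
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # no remaining candidate adds new coverage
        selected.append(best)
        covered |= covers[best]
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    scores = {"a": 1.0, "b": 0.9, "c": 0.5}
    covers = {"a": {0, 1, 2}, "b": {0, 1, 2}, "c": {3, 4}}
    print(greedy_select(scores, covers, k=2))
```

In this toy input, candidate `b` scores almost as high as `a` but represents the same videos, so the heuristic prefers the lower-scoring but complementary candidate `c` in the second round. That is the kind of trade-off between representative and discriminative power that a greedy selection can express.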