Pose for Action - Action for Pose

Umar Iqbal, Martin Garbade, Juergen Gall
DOI: https://doi.org/10.48550/arXiv.1603.04037
2017-02-10
Abstract: In this work we propose to utilize information about human actions to improve pose estimation in monocular videos. To this end, we present a pictorial structure model that exploits high-level information about activities to incorporate higher-order part dependencies by modeling action specific appearance models and pose priors. However, instead of using an additional expensive action recognition framework, the action priors are efficiently estimated by our pose estimation framework. This is achieved by starting with a uniform action prior and updating the action prior during pose estimation. We also show that learning the right amount of appearance sharing among action classes improves the pose estimation. We demonstrate the effectiveness of the proposed method on two challenging datasets for pose estimation and action recognition with over 80,000 test images.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to improve the accuracy of human pose estimation in monocular videos by using information about human actions. Specifically, the authors propose a pictorial structure model that incorporates high-level action information by modeling action-specific appearance models and pose priors. Unlike methods that require an additional, expensive action recognition framework, the proposed framework estimates the action priors itself: it starts from a uniform action prior and updates that prior during pose estimation. The paper also shows that learning the right amount of appearance sharing among action classes improves pose estimation.

### Main contributions of the paper

1. **An action-conditioned pictorial structure model**: The model uses action information to improve pose estimation without requiring a separate action recognition framework.
2. **Efficient action-prior estimation**: By starting from a uniform distribution and updating the action prior during pose estimation, the method avoids the high computational cost of running an additional action recognition pipeline.
3. **Learned appearance sharing among action classes**: Learning how much appearance information to share among action classes further improves pose estimation accuracy.
4. **Experimental verification**: Experiments on two challenging datasets (J-HMDB and Penn-Action) demonstrate the effectiveness of the method.

### Method overview

- **Pictorial structure model**: Introducing action priors lets the model better capture the relationship between human poses and actions.
- **Convolutional channel features**: Features extracted by convolutional networks are used to train regression forests, replacing traditional color, HOG, and skin-color features, which significantly improves pose estimation accuracy.
- **Action-conditioned binary potentials**: The deformation cost between joints is modeled by conditional Gaussian mixture models that depend on the action priors.
- **Action classification**: A bag-of-words-based method is used for action recognition, and its output is fed back to pose estimation.

### Experimental results

- **Pose estimation performance**: On the J-HMDB and Penn-Action datasets, convolutional channel features (CCF) significantly improve pose estimation accuracy, and the APK (Average Precision of Keypoints) metric improves markedly over the baseline method.
- **Action recognition performance**: Action recognition accuracy, which builds on the estimated poses, also improves.

In conclusion, the paper improves human pose estimation in monocular videos by introducing action information and an efficient mechanism for updating the action prior.
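The action-prior mechanism described in the method overview can be illustrated with a small sketch. This is not the authors' implementation: the one-dimensional Gaussian deformation model, the softmax normalization of classifier scores, and all names (`uniform_prior`, `pairwise_score`, `update_prior`, the toy `models` table) are assumptions made for illustration. It only shows the general idea of scoring a joint offset under a prior-weighted mixture of action-specific models, first with a uniform prior and then with a prior updated from action recognition scores.

```python
import math


def uniform_prior(actions):
    """Return a uniform distribution over the given action labels."""
    p = 1.0 / len(actions)
    return {a: p for a in actions}


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; a stand-in for a deformation model."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def pairwise_score(offset, action_models, prior):
    """Prior-weighted mixture of action-specific deformation densities."""
    return sum(
        prior[a] * gaussian_pdf(offset, mean, var)
        for a, (mean, var) in action_models.items()
    )


def update_prior(classifier_scores):
    """Normalize raw per-action scores into a prior (softmax, an assumption)."""
    top = max(classifier_scores.values())
    exps = {a: math.exp(s - top) for a, s in classifier_scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Toy, hypothetical action-specific models: (mean offset, variance) per action.
actions = ["golf", "run"]
models = {"golf": (0.0, 1.0), "run": (3.0, 1.0)}

# First pass: no action knowledge, so every action is equally likely.
prior = uniform_prior(actions)
first = pairwise_score(3.0, models, prior)

# Second pass: hypothetical action recognition scores favor "run".
prior = update_prior({"golf": 0.0, "run": 2.0})
second = pairwise_score(3.0, models, prior)
```

In this toy setup the observed offset matches the "run" model, so shifting prior mass toward "run" raises the pairwise score relative to the uniform first pass.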