Learning to Refine Human Pose Estimation

Mihai Fieraru,Anna Khoreva,Leonid Pishchulin,Bernt Schiele

DOI: https://doi.org/10.48550/arXiv.1804.07909

2018-04-21

Abstract: Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lot of difficult cases where even the state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refinement step that addresses these challenging cases and can be easily applied on top of any existing method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. In order for the network to learn to refine incorrect body joint predictions, we employ a novel data augmentation scheme for training, where we model "hard" human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement over the state of the art.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the hard cases of multi-person pose estimation in images and videos. Although convolutional neural networks have greatly improved human pose estimation, even state-of-the-art models still fail to localize all body joints correctly in challenging cases. These cases include occlusion between people, people with similar appearance standing close together, rare body poses, partially visible people, and cluttered backgrounds. The paper therefore proposes an additional pose refinement step that targets these cases and can be applied on top of any existing method. Specifically, the authors introduce a pose refinement network (PoseRefiner) that takes an image and a given pose estimate as input and directly predicts a refined pose by jointly reasoning about the input-output space. To teach the network to correct wrong body joint predictions, the authors also design a new data augmentation scheme that simulates "hard" human pose cases during training.
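The two ideas in this summary (feeding the network both the image and a pose estimate, and training on synthetically corrupted poses) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration, not the authors' method: the function names `perturb_pose`, `pose_to_heatmaps` and `build_refiner_input` are hypothetical, as are the particular corruptions (Gaussian jitter, dropping a joint, swapping two joints) and the choice to encode the pose as per-joint Gaussian heatmaps stacked with the image channels. The text above does not specify any of these details.

```python
import math
import random


def perturb_pose(pose, rng, jitter=5.0, swap_prob=0.3, drop_prob=0.1):
    """Return a corrupted copy of `pose` (a list of (x, y) tuples or None).

    Assumed corruptions, chosen only for illustration:
    jitter each joint, drop a joint, or swap two joints.
    """
    noisy = []
    for joint in pose:
        if joint is None or rng.random() < drop_prob:
            noisy.append(None)  # simulate a missed detection
            continue
        x, y = joint
        noisy.append((x + rng.gauss(0.0, jitter), y + rng.gauss(0.0, jitter)))
    # Simulate a joint-confusion error by exchanging two joints.
    if len(noisy) >= 2 and rng.random() < swap_prob:
        i, j = rng.sample(range(len(noisy)), 2)
        noisy[i], noisy[j] = noisy[j], noisy[i]
    return noisy


def pose_to_heatmaps(pose, height, width, sigma=2.0):
    """Render one Gaussian heatmap per joint; missing joints give zero maps."""
    maps = []
    for joint in pose:
        if joint is None:
            maps.append([[0.0] * width for _ in range(height)])
            continue
        jx, jy = joint
        maps.append([
            [math.exp(-((x - jx) ** 2 + (y - jy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ])
    return maps


def build_refiner_input(image_channels, pose, height, width):
    """Stack image channels and pose heatmaps into one multi-channel input."""
    return list(image_channels) + pose_to_heatmaps(pose, height, width)


# One training pair: (image + corrupted pose) as input, clean pose as target.
rng = random.Random(0)
gt_pose = [(4.0, 4.0), (8.0, 6.0), (12.0, 10.0)]
noisy_pose = perturb_pose(gt_pose, rng)
image = [[[0.0] * 16 for _ in range(16)] for _ in range(3)]  # dummy RGB image
net_input = build_refiner_input(image, noisy_pose, 16, 16)
print(len(net_input))  # 3 image channels + 3 joint heatmaps = 6
```

In this sketch the refinement network would be trained to map `net_input` back to `gt_pose`. At test time the corrupted pose would be replaced by the output of any existing pose estimator, which is what would let such a refiner sit on top of other methods.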
