Abstract: Multi-person pose estimation generally follows either a top-down or a bottom-up paradigm. The top-down paradigm detects all human boxes and then performs single-person pose estimation on each ROI. The bottom-up paradigm locates identity-free keypoints and then groups them into individuals. Both use an extra stage to build the relationship between a human instance and its keypoints (human detection in the top-down case, a grouping process in the bottom-up case). This extra stage leads to a high computation cost and a redundant two-stage pipeline. To address this issue, we introduce a fine-grained body representation method. Concretely, the human body is divided into several local parts, and each part is represented by an adaptive point. This body representation is able to sufficiently encode diverse pose information and effectively model the relationship between a human instance and its keypoints in a single forward pass. With the proposed body representation, we further introduce a compact single-stage multi-person pose regression network, called AdaptivePose++, which is an extended version of the AAAI-22 paper AdaptivePose. During inference, the proposed network needs only a single-step decode operation to estimate multi-person poses, without complex post-processing or refinement. Without any bells and whistles, we achieve highly competitive accuracy and speed on the representative 2D pose estimation benchmarks MS COCO and CrowdPose. In particular, AdaptivePose++ outperforms the state-of-the-art SWAHR-W48 and CenterGroup-W48 by 3.2 AP and 1.4 AP, respectively, on COCO mini-val with faster inference speed. Furthermore, its strong performance on the 3D pose estimation datasets MuCo-3DHP and MuPoTS-3D demonstrates its effectiveness and generalizability to 3D scenes.
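
To make the idea of a single-step decode more concrete, the following is a minimal sketch of generic center-plus-offset pose decoding. It is not the AdaptivePose++ implementation: the abstract does not specify the output layout, so the center heatmap, the per-pixel keypoint offsets, the 3x3 local-maximum rule, the threshold, and the function name `decode_poses` are all assumptions made for illustration. The intermediate adaptive points described in the abstract are omitted here for brevity.

```python
def decode_poses(center_heatmap, keypoint_offsets, threshold=0.5):
    """Decode poses in one pass over a center heatmap (hypothetical layout).

    center_heatmap:   H x W nested list of person-center scores in [0, 1].
    keypoint_offsets: H x W nested list; each cell holds K (dx, dy) offsets
                      from that pixel to the K keypoints of a person
                      centered there.
    """
    height = len(center_heatmap)
    width = len(center_heatmap[0])
    poses = []
    for y in range(height):
        for x in range(width):
            score = center_heatmap[y][x]
            if score < threshold:
                continue
            # Keep only pixels that are not exceeded by any 3x3 neighbour
            # (an assumed stand-in for peak extraction).
            neighbours = [
                center_heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if any(n > score for n in neighbours):
                continue
            # Each keypoint is the center plus its regressed offset,
            # so no detection or grouping stage is needed.
            keypoints = [(x + dx, y + dy) for dx, dy in keypoint_offsets[y][x]]
            poses.append({"center": (x, y), "score": score, "keypoints": keypoints})
    return poses


# Toy example: a 4x4 map with one person center at (x=2, y=1) and K=2 keypoints.
heatmap = [[0.0] * 4 for _ in range(4)]
heatmap[1][2] = 0.9
offsets = [[[(0.0, 0.0), (0.0, 0.0)] for _ in range(4)] for _ in range(4)]
offsets[1][2] = [(-1.0, 0.5), (1.0, -0.5)]
poses = decode_poses(heatmap, offsets)
```

In this sketch each detected center directly yields a full set of keypoints, which is the property the abstract contrasts with the two-stage top-down and bottom-up pipelines.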

Instance-Level Data Augmentation for Multi-Person Pose Estimation: Improving Recognition of Individuals at Different Scales

Multi-Scale Structure-Aware Network for Human Pose Estimation

Overcoming Data Deficiency for Multi-Person Pose Estimation

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation

AID: Pushing the Performance Boundary of Human Pose Estimation with Information Dropping Augmentation

A Compact and Powerful Single-Stage Network for Multi-Person Pose Estimation

Multi-person pose estimation using atrous convolution

How to Train Your Robust Human Pose Estimator: Pay Attention to the Constraint Cue.

Improving Multiperson Pose Estimation by Mask-aware Deep Reinforcement Learning

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention

A Deconvolutional Bottom-up Deep Network for Multi-Person Pose Estimation.

Adversarial Semantic Data Augmentation for Human Pose Estimation

Rethinking on Multi-Stage Networks for Human Pose Estimation

A Lightweight Top-Down Multi-Person Pose Estimation Method Based on Symmetric Transformation and Global Matching

Multi-Person Pose Estimation with Accurate Heatmap Regression and Greedy Association

Magnify-Net for Multi-Person 2D Pose Estimation.

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

Scale-aware heatmap representation for human pose estimation

Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity

AMANet: Adaptive Multi-Path Aggregation for Learning Human 2D-3D Correspondences