Abstract: Learning human 2D-3D correspondences, namely human densepose estimation, aims to map every human pixel in a 2D image to a 3D human template. The task involves surface patch recognition (i.e., Index-to-Patch (I)) and regression of patch-specific UV coordinates. Despite recent progress, it remains challenging, especially in the wild, where RGB images capture real-world scenes with complex backgrounds, occlusions, scale variations, and diverse postures. In this paper, we address three vital problems in this task: 1) how to perceive multi-scale visual information for instances in the wild; 2) how to design learning objectives that yield a precise representation of the target instance when multiple instances appear in one bounding box; and 3) how to improve index-to-patch prediction under limited supervision. To tackle these problems, we propose an end-to-end deep Adaptive Multi-path Aggregation network (AMA-net) for human densepose estimation. First, we introduce an adaptive multi-path aggregation algorithm to extract varying-sized instance-level features, which capture multi-scale information within a bounding box and are then used to parse different instances. Second, we adopt an instance augmentation learning objective to further distinguish the target instance from interfering instances. Third, taking advantage of 2D human parsers trained with abundant annotations, we introduce a task transformer that bridges the gap between 2D human parsing and densepose estimation, thereby improving the densepose estimator. Experimental results on the challenging DensePose-COCO dataset demonstrate that our approach sets a new state of the art and significantly outperforms previous methods. Code and models are publicly available.
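The abstract describes densepose output as two coupled per-pixel predictions: a surface patch index (I) and UV coordinates that are specific to that patch. A minimal sketch of how such outputs can be decoded, assuming a DensePose-style head that emits 25 score maps (24 surface patches plus background, as in DensePose-COCO) and one U map and one V map per channel. The function names `decode_pixel` and `decode_map` and the channel-first nested-list layout are hypothetical illustrations, not taken from the AMA-net code.

```python
from typing import List, Tuple

NUM_PATCHES = 24  # DensePose-COCO body surface patches; channel 0 is background


def decode_pixel(index_scores: List[float], u_vals: List[float],
                 v_vals: List[float]) -> Tuple[int, float, float]:
    """Pick the highest-scoring patch, then read the U/V predicted for that patch."""
    expected = NUM_PATCHES + 1
    if not (len(index_scores) == len(u_vals) == len(v_vals) == expected):
        raise ValueError(f"expected {expected} values per pixel for I, U and V")
    # Index-to-Patch (I): argmax over background + 24 patch scores
    patch = max(range(expected), key=lambda k: index_scores[k])
    if patch == 0:
        return 0, 0.0, 0.0  # background pixel: no surface coordinates
    # UV regression is patch-specific, so use the channel of the predicted patch
    return patch, u_vals[patch], v_vals[patch]


def decode_map(index_maps, u_maps, v_maps):
    """Decode channel-first [C][H][W] nested lists into an H x W grid of (I, U, V)."""
    channels = len(index_maps)
    height, width = len(index_maps[0]), len(index_maps[0][0])
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [index_maps[c][y][x] for c in range(channels)]
            us = [u_maps[c][y][x] for c in range(channels)]
            vs = [v_maps[c][y][x] for c in range(channels)]
            row.append(decode_pixel(scores, us, vs))
        grid.append(row)
    return grid
```

For example, with `scores = [0.0] * 25`, `scores[3] = 5.0`, `u = [k / 32 for k in range(25)]` and `v = [1 - k / 32 for k in range(25)]`, `decode_pixel(scores, u, v)` returns patch 3 together with `u[3]` and `v[3]`. Coupling the UV lookup to the predicted index is why errors in index-to-patch prediction, the third problem the abstract targets, propagate directly into the UV estimate.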

EANet: Towards Lightweight Human Pose Estimation With Effective Aggregation Network

Context-Guided Adaptive Network for Efficient Human Pose Estimation

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

AMANet: Adaptive Multi-Path Aggregation for Learning Human 2D-3D Correspondences

Lightweight High-Resolution Network Based on Adaptive Cross-Dimensional Weighting for Human Pose Estimation

EANet: Edge-Attention 6D Pose Estimation Network for Texture-Less Objects

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention

Simple and Lightweight Human Pose Estimation

Lightweight and Effective Human Pose Estimation Model Based on Multi-Angle Knowledge Distillation

Optimized S2E Attention Block based Convolutional Network for Human Pose Estimation

LSDNet: Lightweight Stochastic Depth Network for Human Pose Estimation

Lenet: A Lightweight and Efficient High-Resolution Network for Human Pose Estimation

Implicit Decouple Network for Efficient Pose Estimation

Human Pose Estimation Based on Efficient and Lightweight High-Resolution Network (EL-HRNet)

EfficientPose: Scalable Single-Person Pose Estimation

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

An Improved Lightweight High-Resolution Network Based on Multi-Dimensional Weighting for Human Pose Estimation

Lightweight Human Pose Estimation under Resource-Limited Scenes

Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation

Complementary Feature Pyramid Network for Human Pose Estimation

A Lightweight Context-Aware Feature Transformer Network for Human Pose Estimation