Abstract: Multi-person pose estimation (MPPE), which aims to locate the key points of all persons in each frame, is an active research branch of computer vision. Variable human poses and complex scenes make MPPE dependent on both local details and global structure; missing either can cause key point feature misalignment. High-order spatial interactions, which effectively link the local and global information of features, are therefore particularly important. However, most existing methods do not model spatial interactions at all. The few that do use only low-order interactions, and they struggle to balance accuracy and complexity. To address these problems, this paper proposes a dual-residual spatial interaction network (DRSI-Net) for MPPE with high accuracy and low complexity. Unlike other methods, DRSI-Net recursively performs residual spatial information interactions on neighbouring features. This retains more useful spatial information and yields greater similarity between shallow and deep features. A channel-and-spatial dual attention mechanism in the multi-scale feature fusion further helps the network adaptively focus on features relevant to the target key points and refine the generated poses. In addition, optimising the interaction channel dimensions and splitting the gradient flow keeps the spatial interaction module lightweight, which reduces the complexity of the network. Experimental results on the COCO dataset show that DRSI-Net outperforms other state-of-the-art methods in accuracy and complexity.
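The abstract does not give DRSI-Net's equations. The NumPy sketch below illustrates what a recursive, high-order spatial interaction with a residual path can look like. It borrows the recursive gated convolution (gnConv) idea from HorNet, a separate published technique, and uses it here as a stand-in. Several details are hypothetical and not taken from DRSI-Net: the function names, the 3x3 depthwise kernel, the halving channel schedule (a rough analogue of "optimising the interaction channel dimensions"), the random weights, and the single skip connection. Each loop step multiplies the running features by a spatially filtered gate, which raises the interaction order by one.

```python
import numpy as np


def depthwise_conv3x3(x, kernels):
    """Depthwise 3x3 cross-correlation with zero padding (output size = input size).

    x: (C, H, W) feature map; kernels: (C, 3, 3), one filter per channel.
    """
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W by 1
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            # Each output pixel accumulates its 3x3 neighbourhood, per channel.
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def pointwise(weight, x):
    """1x1 convolution: weight (C_out, C_in), x (C_in, H, W) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def recursive_gated_interaction(x, order=3, seed=0):
    """Hypothetical order-n spatial interaction block with one residual path."""
    rng = np.random.default_rng(seed)
    c, h, w = x.shape
    assert c % 2 ** (order - 1) == 0, "channels must be divisible by 2**(order-1)"
    # Channel width doubles at each order, so low orders are cheap.
    dims = [c // 2 ** (order - 1 - k) for k in range(order)]

    # Random weights stand in for learned parameters (illustration only).
    w_in = rng.standard_normal((dims[0] + sum(dims), c)) * 0.1
    dw = rng.standard_normal((sum(dims), 3, 3)) * 0.1
    w_mid = [rng.standard_normal((dims[k + 1], dims[k])) * 0.1 for k in range(order - 1)]
    w_out = rng.standard_normal((c, dims[-1])) * 0.1

    proj = pointwise(w_in, x)
    p, q = proj[:dims[0]], proj[dims[0]:]
    q = depthwise_conv3x3(q, dw)  # gates carry context from neighbouring pixels
    gates = np.split(q, np.cumsum(dims)[:-1], axis=0)

    p = p * gates[0]  # first-order interaction
    for k in range(1, order):
        p = pointwise(w_mid[k - 1], p) * gates[k]  # raise the order by one
    return x + pointwise(w_out, p)  # residual path keeps the shallow features


x = np.random.default_rng(1).standard_normal((8, 5, 6))
y = recursive_gated_interaction(x, order=3)
print(y.shape)  # (8, 5, 6)
```

Because the output has the same shape as the input and includes the input through the skip connection, blocks like this can be stacked. The abstract does not say how DRSI-Net arranges its two residual paths, so only one is shown.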

DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation