Abstract: Human pose estimation plays a critical role in human-centred vision applications, with uses ranging from healthcare diagnostics and sports training to augmented reality experiences and gesture-controlled interfaces. While current approaches have achieved impressive accuracy, their high model complexity and slow detection speeds significantly limit their deployment on edge devices with limited computing power, such as mobile phones and IoT devices. In this paper, we introduce a novel lightweight network for 2D human pose estimation, called the lightweight stochastic depth network (LSDNet). Our approach is based on the observation that the majority of HRNet's parameters are located in the middle and later stages of the network. We cut these parameters by randomly removing redundant branches in those stages, with each removal decided by a draw from a Bernoulli distribution; this improves the network's efficiency while also increasing its robustness. To further reduce the parameter count, we introduce two lightweight blocks with simple yet effective architectures, which achieve significant parameter reduction while maintaining good accuracy. Furthermore, we use coordinate attention to fuse features from different branches and scales. This mechanism captures both inter-channel dependencies and spatial context, enabling the network to accurately localize keypoints across the human body. We evaluated our method on the MPII and COCO datasets, where it demonstrates superior results on human pose estimation compared to popular lightweight networks. Our code is available at: https://github.com/illusory2333/LSDNet.
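
The Bernoulli-based branch removal described in the abstract can be illustrated with a minimal sketch. This is not the LSDNet implementation (see the repository linked above for that); it follows the general stochastic depth convention, in which a branch is skipped during training when a Bernoulli draw fails and is scaled by its survival probability at inference. The function name `stochastic_branch_sum`, the parameter `keep_prob`, and the use of plain lists in place of feature maps are all assumptions made for illustration.

```python
import random


def stochastic_branch_sum(x, branches, keep_prob, training, rng=random):
    """Fuse a main path with optional branches, dropping branches at random.

    x         : list of floats, a stand-in for a feature map (illustrative).
    branches  : list of callables mapping a list of floats to a list of floats.
    keep_prob : Bernoulli survival probability per branch (hypothetical name).
    training  : if True, sample which branches run; if False, run all, scaled.
    """
    out = list(x)  # the main (identity) path is always kept
    for branch in branches:
        if training:
            # Draw b ~ Bernoulli(keep_prob); skip the branch when b == 0.
            if rng.random() >= keep_prob:
                continue
            scale = 1.0
        else:
            # At inference every branch runs, weighted by its survival rate.
            scale = keep_prob
        y = branch(x)
        out = [o + scale * v for o, v in zip(out, y)]
    return out


if __name__ == "__main__":
    double = lambda v: [2.0 * a for a in v]
    print(stochastic_branch_sum([1.0, 2.0], [double], keep_prob=0.5, training=False))
```

Skipped branches cost no computation during training, and the inference-time scaling keeps the expected output consistent with what the network saw while training. Whether LSDNet scales branches this way or removes them permanently is not stated in the abstract.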

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

Context-Guided Adaptive Network for Efficient Human Pose Estimation

A-HRNet: Attention Based High Resolution Network for Human Pose Estimation

Human Pose Estimation Based on Efficient and Lightweight High-Resolution Network (EL-HRNet)

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention

Optimized S2E Attention Block based Convolutional Network for Human Pose Estimation

Lightweight high-resolution network based on adaptive cross-dimensional weighting for human pose estimation

Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

Lightweight Super-Resolution Head for Human Pose Estimation

Research on Lightweight High-resolution Network Human Pose Estimation Based on Self-attention

Parallel Self-Attention and Spatial-Attention Fusion for Human Pose Estimation and Running Movement Recognition

LSDNet: lightweight stochastic depth network for human pose estimation

Towards Simple and Accurate Human Pose Estimation with Stair Network

Lite-HRNet: A Lightweight High-Resolution Network

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates

Ghost attentional down net: An effective lightweight top-down network for human pose estimation

Unified End-to-End YOLOv5-HR-TCM Framework for Automatic 2D/3D Human Pose Estimation for Real-Time Applications

An improved lightweight high-resolution network based on multi-dimensional weighting for human pose estimation

Simplified-attention Enhanced Graph Convolutional Network for 3D human pose estimation

Pose ResNet: 3D Human Pose Estimation Based on Self-Supervision