Abstract: Human pose estimation is a popular research area because of its wide range of application scenarios. Most existing work, however, concentrates on increasing the network's width, depth, and resolution, which produces a large number of parameters and hinders practical deployment on real-time and resource-constrained devices. Furthermore, because networks are severely constrained by unequal feature distribution, it is challenging to extract deep features. We propose an S2E-based attention module that is lightweight and easily scalable, and that aims to balance recognition accuracy and speed while using fewer computational resources. The optimized S2E attention module consists of two layers of compression modules and one layer of motivation modules. We apply this attention block to the classical ResNet-101 and HRNet backbones to build our S2E-ResNet-101 and S2E-HRNet structures. Comparative studies on the COCO and MPII datasets show that the S2E module consumes very few computational resources yet significantly improves prediction accuracy, achieving a better speed/accuracy tradeoff and greater practicality than other state-of-the-art methods. Moreover, the visual output of the qualitative comparison experiments on chaotic pose recognition further demonstrates our model's capacity to concentrate on significantly more detailed areas and to prevent erroneous recognition caused by posture crossover and occlusion. Overall, the S2E module is a simple but effective and easily scalable attention module of considerable practical value to the field of pose recognition.
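
The abstract gives only the layer counts of the attention block (two compression layers, one motivation layer), so the following is a minimal sketch rather than the authors' implementation. It assumes a layout similar to squeeze-and-excitation channel attention: global average pooling per channel, two fully connected compression layers that shrink the channel descriptor, and one motivation layer that restores the channel count and produces sigmoid gates. The reduction ratio, the ReLU and sigmoid activations, the weight shapes, and every function name here are hypothetical and not taken from the paper.

```python
import math
import random


def global_avg_pool(feature_map):
    """Collapse each channel (an H x W grid) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]


def dense(vec, weights):
    """Bias-free fully connected layer; one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, v) for v in vec]


def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]


def make_weights(n_out, n_in, rng):
    """Random weights for illustration only; a real block would learn these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def s2e_like_attention(feature_map, w_c1, w_c2, w_e):
    """Hypothetical two-compression, one-motivation channel attention."""
    pooled = global_avg_pool(feature_map)   # C channel descriptors
    z1 = relu(dense(pooled, w_c1))          # compression 1: C -> C/r (assumed)
    z2 = relu(dense(z1, w_c2))              # compression 2: C/r -> C/r^2 (assumed)
    gates = sigmoid(dense(z2, w_e))         # motivation: C/r^2 -> C gates
    # Rescale every channel of the input by its gate.
    scaled = [[[g * x for x in row] for row in ch]
              for g, ch in zip(gates, feature_map)]
    return scaled, gates


# Toy input: C channels of H x W activations (sizes chosen for illustration).
rng = random.Random(0)
C, H, W, r = 8, 4, 4, 2
features = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]

w_c1 = make_weights(C // r, C, rng)              # 8 -> 4
w_c2 = make_weights(C // (r * r), C // r, rng)   # 4 -> 2
w_e = make_weights(C, C // (r * r), rng)         # 2 -> 8

scaled, gates = s2e_like_attention(features, w_c1, w_c2, w_e)
print(len(scaled), len(scaled[0]), len(scaled[0][0]))  # output keeps the C x H x W shape
```

Under these assumptions the block adds only three small weight matrices per insertion point, which is in line with the abstract's claim that the module is lightweight; where such a block sits inside ResNet-101 or HRNet is not specified in the abstract and is left open here.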

Improved Modular Convolution Neural Network for Human Pose Estimation

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

Multi-Scale Supervised Network for Human Pose Estimation

Improving Human Pose Estimation Based on Stacked Hourglass Network

Full-Resolution Encoder-Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation

Deep Dual Consecutive Network for Human Pose Estimation

Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning

Locally Connected Network for Monocular 3D Human Pose Estimation

Human Pose Estimation Based on Parallel Atrous Convolution and Body Structure Constraints

Optimized S2E Attention Block based Convolutional Network for Human Pose Estimation

Human Pose Estimation using Global and Local Normalization

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention

Single upper limb pose estimation method based on improved stacked hourglass network

Densely Connected Attentional Pyramid Residual Network for Human Pose Estimation

Complementary Feature Pyramid Network for Human Pose Estimation

Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation

3D Human Pose Estimation Using Improved Semantic Graph Convolutional Based on Fusing Non-local Neural Network and Multi-Head Attention

A Cascaded Inception of Inception Network with Attention Modulated Feature Fusion for Human Pose Estimation

A Deconvolutional Bottom-up Deep Network for Multi-Person Pose Estimation

Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation

Motion Capture for Sporting Events Based on Graph Convolutional Neural Networks and Single Target Pose Estimation Algorithms