Abstract: Optical motion capture systems are widely used to obtain human body poses. However, two problems remain. The first is the dislocation problem, caused by joints being too close together. The second is the joint-loss problem: because of severe self-occlusion, cameras may fail to capture the target joints. Motivated by these observations, we investigate high-level constraints on human poses to solve both problems. In this work, we present a Simplified-attention Enhanced Graph Convolutional Network (SaEGC-Net) that flexibly extracts both spatial and temporal features from monocular videos. SaEGC-Net for 3D human pose estimation is U-shaped and consists of Cascaded Spatial-Temporal Graph Convolutional (CST-GC) blocks and Simplified Spatial-Temporal Attention (SST-Att) blocks, which capture long-range dependencies between unconnected joints through graph topologies and attention mechanisms, respectively. Specifically, the CST-GC block embeds two predefined graph structures into a convolutional network, thereby incorporating discriminative features from distant joints. The proposed SST-Att block is highly lightweight: it discards redundant information by sharing part of the attention map. It also models dimension-expanded joint relationships to maintain the diversity of dependencies. To evaluate the effectiveness of our method, we conduct extensive experiments on two datasets: Human3.6M and our own FDU-Motion dataset. The results demonstrate that our model achieves excellent performance and handles both problems effectively. Ablation studies further show that the network’s submodules better exploit the motion information of the human body.
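To make the idea of "embedding two predefined graph structures into a convolutional network" more concrete, here is a minimal NumPy sketch of one plausible reading: a spatial graph convolution that sums one term per fixed adjacency matrix, cascaded with a temporal convolution over frames. Everything specific here is an assumption for illustration and not the authors' CST-GC block: the 5-joint chain skeleton, the use of two-hop neighbours as the second ("distant joint") graph, the symmetric normalization with self-loops, the ReLU, the kernel size of 3, and the random weights.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (a common GCN choice, assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def spatial_graph_conv(x: np.ndarray, adjs: list, weights: list) -> np.ndarray:
    """x: (T, J, C_in); adjs: list of (J, J); weights: list of (C_in, C_out).

    Sums one graph convolution per predefined graph, then applies ReLU.
    """
    out = np.zeros((x.shape[0], x.shape[1], weights[0].shape[1]))
    for a, w in zip(adjs, weights):
        # out[t, i, d] += sum_j sum_c a[i, j] * x[t, j, c] * w[c, d]
        out = out + np.einsum("ij,tjc,cd->tid", a, x, w)
    return np.maximum(out, 0.0)


def temporal_conv(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """x: (T, J, C); kernel: (K, C, C_out) with K odd; zero 'same' padding in time."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    t = x.shape[0]
    out = np.zeros((t, x.shape[1], kernel.shape[2]))
    for i in range(k):
        # Each tap mixes channels of the frame offset by (i - pad).
        out += np.einsum("tjc,cd->tjd", xp[i:i + t], kernel[i])
    return out


# Hypothetical 5-joint chain skeleton: 0-1-2-3-4.
J, T, C_in, C_out = 5, 9, 2, 4
a1 = np.zeros((J, J))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    a1[i, j] = a1[j, i] = 1.0

# Second predefined graph (assumption): joints exactly two hops apart.
a2 = (((a1 @ a1) > 0) & (a1 == 0) & ~np.eye(J, dtype=bool)).astype(float)

rng = np.random.default_rng(0)
x = rng.standard_normal((T, J, C_in))  # T frames of J joints with 2D inputs
adjs = [normalize_adjacency(a1), normalize_adjacency(a2)]
ws = [rng.standard_normal((C_in, C_out)) * 0.1 for _ in adjs]

h = spatial_graph_conv(x, adjs, ws)  # spatial step
y = temporal_conv(h, rng.standard_normal((3, C_out, C_out)) * 0.1)  # temporal step
print(y.shape)  # (9, 5, 4)
```

Summing over separate adjacency matrices, rather than merging them into one, keeps a distinct weight matrix per graph, so features from physically connected joints and from distant joints can be weighted differently.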

Attention Residual Network with 3D convolutional neural network for 3D Human Pose Estimation

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

Densely Connected Attentional Pyramid Residual Network for Human Pose Estimation

Enhanced 3D Human Pose Estimation from Videos by Using Attention-Based Neural Network with Dilated Convolutions

Simplified-attention Enhanced Graph Convolutional Network for 3D human pose estimation

PVA-GCN: point-voxel absorbing graph convolutional network for 3D human pose estimation from monocular video

3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training

3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images

A residual semantic graph convolutional network with high-resolution representation for 3D human pose estimation in a virtual fashion show

Pose ResNet: 3D Human Pose Estimation Based on Self-Supervision

Locally Connected Network for Monocular 3D Human Pose Estimation

3D Human Pose Estimation Using Improved Semantic Graph Convolutional Based on Fusing Non-local Neural Network and Multi-Head Attention

Semi-Dynamic Hypergraph Neural Network for 3D Pose Estimation

Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation

Graph U-Shaped Network with Mapping-Aware Local Enhancement for Single-Frame 3D Human Pose Estimation

Optimizing Network Structure for 3D Human Pose Estimation

V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

Center point to pose: Multiple views 3D human pose estimation for multi-person