Abstract: 3D human reconstruction from a single image is a challenging problem. Existing methods have difficulty inferring 3D clothed human models with consistent topologies for various poses. In this paper, we propose an efficient and effective method using a hierarchical graph transformation network. To handle large deformations and avoid distorted geometries, we represent 3D human shapes not by Euclidean coordinates directly but by a vertex-based deformation representation that effectively encodes the deformation. To infer a 3D human mesh consistent with the input real image, we also use a perspective projection layer to incorporate perceptual image features into the deformation representation. Our model is easy to train, converges quickly, and has a short test time. We also present the D²Human (Dynamic Detailed Human) dataset, which includes variously posed 3D human meshes with consistent topologies and rich geometric details, together with the captured color images and SMPL models; it is useful for training and evaluating deep frameworks, particularly graph neural networks. Experimental results demonstrate that our method achieves more plausible and complete 3D human reconstruction from a single image than several state-of-the-art methods. The code and dataset are available for research purposes at http://cic.tju.edu.cn/faculty/likun/projects/MGTnet.

GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation

Hierarchical Graph Networks for 3D Human Pose Estimation

MLP-JCG: Multi-Layer Perceptron with Joint-Coordinate Gating for Efficient 3D Human Pose Estimation

GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video

Interweaved Graph and Attention Network for 3D Human Pose Estimation

Conditional Directed Graph Convolution for 3D Human Pose Estimation

PoseGTAC: Graph Transformer Encoder-Decoder with Atrous Convolution for 3D Human Pose Estimation

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

Graph U-Shaped Network with Mapping-Aware Local Enhancement for Single-Frame 3D Human Pose Estimation

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose

Multi-hop graph transformer network for 3D human pose estimation

Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation

Multi-Graph Convolution Network for Pose Forecasting

Simplified-attention Enhanced Graph Convolutional Network for 3D human pose estimation

Graph Stacked Hourglass Networks for 3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Image-Guided Human Reconstruction via Multi-Scale Graph Transformation Networks

Graph-Guided MLP-Mixer for Skeleton-Based Human Motion Prediction

Semi-Dynamic Hypergraph Neural Network for 3D Pose Estimation

3D Human Pose Estimation with Multi-Scale Graph Convolution and Hierarchical Body Pooling