A residual semantic graph convolutional network with high-resolution representation for 3D human pose estimation in a virtual fashion show
Peng Zhang, Pengfei Ding, Geng Li, Jie Zhang
DOI: https://doi.org/10.1007/s11042-024-19383-6
IF: 2.577
2024-05-23
Multimedia Tools and Applications
Abstract: 3D human pose estimation has progressed rapidly in virtual fashion shows. However, loose clothing and personalized poses in virtual fashion shows often lead to unclear body joint positions and frequent joint self-occlusion. Moreover, current methods usually focus on salient features and ignore the multi-scale high-resolution features present in virtual fashion shows. This paper proposes a residual semantic graph convolutional network (GCN) with high-resolution representation for 3D human pose estimation in virtual fashion shows. The method consists of high-resolution-driven 2D feature extraction and semantic-guided 3D pose regression. To obtain sufficiently informative 2D features, we design a multi-scale module that extracts global semantic information and local semantic information from high-resolution and low-resolution data, respectively. The feature heat map is then regressed into 2D joint coordinates by integral regression. For 3D pose regression, we devise a residual semantic GCN with a non-local module that takes the 2D joint coordinates as input and explores the neighbor relationships of body joints, mitigating vague joint positions and joint self-occlusion. In this network, each local joint is given a learnable weight so that local semantic information is learned gradually. The local semantic information is subsequently extended to the global level and blended with the previously obtained global semantic information via the non-local operation. Experiments on COCO2017, Human3.6M, and joint occlusion datasets demonstrate the effectiveness of our method.
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering
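
Illustrative note: the abstract states that the feature heat map is regressed into 2D joint coordinates by integral regression. A minimal sketch of that general idea (often called soft-argmax) is given below: the heat map is normalized into a probability distribution, and the joint location is taken as the expected pixel position. This is not the authors' implementation. The function name integral_regression, the beta sharpening factor, and the toy 5x5 heat map are all assumptions made for illustration.

```python
import math


def integral_regression(heatmap, beta=1.0):
    """Soft-argmax: expected (x, y) position under a softmax-normalized heat map.

    heatmap: list of rows (H x W) holding raw scores for a single joint.
    beta: hypothetical sharpening factor (an assumption, not from the paper).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)

    # Expected coordinates: sum over pixels of position times probability.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# Toy heat map (assumed data): a symmetric bump centered at column 3, row 1.
heatmap = [[0.0] * 5 for _ in range(5)]
heatmap[1][3] = 4.0
heatmap[1][2] = heatmap[1][4] = 2.0
heatmap[0][3] = heatmap[2][3] = 2.0

x, y = integral_regression(heatmap, beta=5.0)
print(f"estimated joint position: x={x:.4f}, y={y:.4f}")
```

Unlike a hard argmax, the expectation is differentiable and yields sub-pixel coordinates, which is why it can sit between a heat-map-based 2D stage and a coordinate-based 3D regressor such as the GCN described in the abstract.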