Two-Stage Representation Refinement Based on Convex Combination for 3D Human Poses Estimation

Luefeng Chen,Wei Cao,Biao Zheng,Min Wu,Witold Pedrycz,Kaoru Hirota
DOI: https://doi.org/10.1109/tai.2024.3432028
2024-01-01
IEEE Transactions on Artificial Intelligence
Abstract:In the human pose estimation task, on the one hand, 3D pose always has difficulty in dividing different 2D poses if the view is limited; on the other hand, it is hard to reduce the lifting ambiguity because of the lack of depth information, it is an important and challenging problem. Therefore, Two-stage representation refinement based on the convex combination for 3D human pose estimation is proposed, in which the two-stage method includes a dense-spatial-temporal convolutional network and a local-to-refine network. The former is applied to determine the features between each video frame; the latter is used to get the different scales of pose details. It aims to address the difficulty of estimating 3D human pose from 2D image sequences. In such a way, it can better use the relations between every frame in the sequence of the pose video to produce more accurate results. Finally, we combine the above network with a block called convex combination to helps refine the 3D pose location. We test the proposed approach on both Human3.6m and MPII datasets. The result confirms that our method can achieve better performance than improved CNN supervision, a simple yet effective baseline, and coarse-to-fine volumetric prediction. Besides, a robustness test experiment is carried out for the proposed method while the input is interrupted. The result verifies that our method shows better robustness.
What problem does this paper attempt to address?