Autoencoder and Masked Image Encoding-Based Attentional Pose Network.

Longhua Hu, Xiaoliang Ma, Cheng He, Lei Wang, Jun Cheng
DOI: https://doi.org/10.1007/978-981-99-8432-9_18
2024-01-01
Abstract: Despite recent advances in single-image-based 3D human pose and shape estimation, partial occlusion remains a major challenge for many methods, leading to significant prediction errors. Some existing methods fail to provide satisfactory performance for 3D human body reconstruction in occluded outdoor environments. To address these issues, we propose an autoencoder for feature extraction that integrates image masking methods to improve training stability. Our approach utilizes an attention mechanism to effectively capture the features of partially visible body parts, addressing partial occlusion. We further employ a partial attention mechanism to obtain the final features and use a regressor to estimate human model parameters. Experimental results on outdoor 3D poses in benchmark datasets demonstrate that our method outperforms state-of-the-art image-based methods in terms of robustness and efficiency. Qualitative evaluation shows that our method achieves more accurate and robust reconstruction results than existing methods, not only in occluded scenarios but also on standard benchmarks. Our approach exhibits excellent model robustness and training stability.