Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes

Katja Ludwig, Julian Lorenz, Daniel Kienzle, Tuan Bui, Rainer Lienhart
2024-09-27
Abstract: The basic body shape of a person does not change within a single video. However, most SOTA human mesh estimation (HME) models output a slightly different body shape for each video frame, which results in inconsistent body shapes for the same person. In contrast, we leverage anthropometric measurements such as those tailors have been taking from people for centuries. We create a model called A2B that converts such anthropometric measurements to body shape parameters of human mesh models. Moreover, we find that finetuned SOTA 3D human pose estimation (HPE) models outperform HME models regarding the precision of the estimated keypoints. We show that applying inverse kinematics (IK) to the results of such a 3D HPE model and combining the resulting body pose with the A2B body shape leads to superior and consistent human meshes for challenging datasets like ASPset or fit3D, where we can lower the MPJPE by over 30 mm compared to SOTA HME models. Further, replacing HME models' estimates of the body shape parameters with A2B model results not only increases the performance of these HME models, but also leads to consistent body shapes.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper aims to solve **the problem of inconsistent human body shapes produced by human mesh estimation (HME) models in videos**. Specifically:

1. **Problem background**:
   - When processing a video, an HME model usually outputs a slightly different body shape for each frame, even when the frames show consecutive movements of the same person. The same person therefore ends up with inconsistent body shapes within one video.
   - The inconsistency is most noticeable in scenarios with rapidly changing poses, such as sports, where it degrades the accuracy of 3D pose and shape estimation.

2. **Limitations of existing methods**:
   - Most current HME models are trained on single images and do not process a video sequence as a whole, so they struggle to keep one person's body shape consistent across frames.
   - Similar inconsistencies also exist in existing 3D pose and mesh datasets, which further affects model performance.

3. **Solutions**:
   - The paper proposes a model named A2B (Anthropometric to Body shape), which converts anthropometric measurements (such as those taken by tailors) into body shape parameters.
   - A2B outputs the shape parameters of common human mesh models such as SMPL-X, so the body shape of one person stays the same in all frames.
   - The paper also applies inverse kinematics (IK) to the output of a fine-tuned 3D human pose estimation (HPE) model and combines the resulting body pose with the A2B body shape to obtain a more accurate and consistent human mesh.

4. **Experimental verification**:
   - The authors conducted experiments on challenging datasets such as ASPset and fit3D.
   - According to the abstract, the IK-plus-A2B approach lowers the MPJPE (Mean Per Joint Position Error) by over 30 mm compared to SOTA HME models on these datasets. Replacing the body shape parameters estimated by HME models with A2B results also improves the performance of those models.

### Summary

The main goal of this paper is to solve the problem of inconsistent body shapes produced by HME models in videos. It does so by introducing the A2B model and a pose estimation pipeline based on 3D HPE and IK, improving both the accuracy and the consistency of 3D human mesh estimation.
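
To make the two quantities discussed above concrete, here is a minimal Python sketch of MPJPE and of a simple body shape consistency check. It is an illustration only, not the authors' evaluation code. The function names `mpjpe` and `shape_spread`, the data layout (frames as lists of `(x, y, z)` joints in millimetres), and the spread measure itself are assumptions introduced here. Root alignment, which evaluation protocols often apply before computing MPJPE, is left out.

```python
import math
from typing import Sequence

Joint = Sequence[float]   # hypothetical layout: (x, y, z) in millimetres
Frame = Sequence[Joint]   # all joints of one video frame


def mpjpe(pred: Sequence[Frame], gt: Sequence[Frame]) -> float:
    """Mean Euclidean distance between predicted and ground-truth joints."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of frames")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("frames must contain the same number of joints")
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # per-joint position error
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


def shape_spread(betas_per_frame: Sequence[Sequence[float]]) -> float:
    """Largest range (max - min) of any shape parameter across frames.

    A value of 0.0 means every frame uses exactly the same body shape.
    This measure is an assumption of this sketch, not taken from the paper.
    """
    return max(max(col) - min(col) for col in zip(*betas_per_frame))


if __name__ == "__main__":
    pred = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
    gt = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
    print(mpjpe(pred, gt))  # joint errors are 0 mm and 5 mm

    per_frame = [[0.1, -0.2], [0.3, -0.2]]  # shape estimated anew per frame
    fixed = [[0.1, -0.2]] * 2               # one shape reused for all frames
    print(shape_spread(per_frame), shape_spread(fixed))
```

Reusing one shape vector for every frame, as a measurement-derived shape would allow, drives the spread to zero by construction, whereas per-frame estimates generally do not.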