Self-supervised 3D Human Mesh Recovery from Noisy Point Clouds

Xinxin Zuo, Sen Wang, Qiang Sun, Minglun Gong, Li Cheng
DOI: https://doi.org/10.48550/arXiv.2107.07539
2021-11-27
Abstract: This paper presents a novel self-supervised approach to reconstruct human shape and pose from noisy point cloud data. Relying on large datasets with ground-truth annotations, recent learning-based approaches predict correspondences for every vertex on the point cloud; Chamfer distance is usually used to minimize the distance between a deformed template model and the input point cloud. However, Chamfer distance is quite sensitive to noise and outliers, and can therefore be unreliable for assigning correspondences. To address these issues, we model the probability distribution of the input point cloud as generated from a parametric human model under a Gaussian Mixture Model. Instead of explicitly aligning correspondences, we treat the process of correspondence search as an implicit probabilistic association by updating the posterior probability of the template model given the input. A novel self-supervised loss is further derived which penalizes the discrepancy between the deformed template and the input point cloud conditioned on the posterior probability. Our approach is very flexible: it works with both complete and incomplete point clouds, including even a single depth image as input. Compared to previous self-supervised methods, our method shows the capability to deal with substantial noise and outliers. Extensive experiments conducted on various public synthetic datasets as well as a very noisy real dataset (i.e., CMU Panoptic) demonstrate the superior performance of our approach over state-of-the-art methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of reconstructing human shape and pose from noisy point cloud data. Existing learning-based methods perform poorly on point clouds that contain noise and outliers, and they usually rely on large labeled datasets for training. They also typically require complete point clouds as input, which is rarely available in practice because acquired point clouds are often incomplete (for example, captured as a single depth image). To address these problems, the paper proposes a self-supervised method that uses a probabilistic model to handle noise and outliers and that also works with incomplete point clouds.

### Main problem summary:

1. **Sensitivity to noise and outliers**: Existing methods are very sensitive to noise and outliers, which degrades reconstruction quality.
2. **Dependence on complete point clouds**: Most existing methods require complete point clouds as input, a requirement that is hard to meet in practical applications.
3. **Dependence on a large amount of labeled data**: Existing methods usually rely on a large amount of labeled training data, which is very difficult to obtain in practice.

### Solutions:

- **Probability modeling**: Use a Gaussian Mixture Model (GMM) to model the probability distribution of the input point cloud instead of directly regressing one-to-one correspondences. This handles noise and outliers more flexibly.
- **Self-supervised loss function**: Introduce a new self-supervised loss function that penalizes the discrepancy between the deformed template and the input point cloud, conditioned on the posterior probability.
- **Handling incomplete point clouds**: The method works with incomplete point cloud data, including the case where the input is only a single depth image.

Through these improvements, the method reconstructs human shape and pose more robustly in the presence of noise, outliers, and incomplete point clouds.
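
To make the contrast between Chamfer distance and a posterior-weighted loss concrete, here is a minimal NumPy sketch. It is not the paper's implementation: this summary does not give the exact formulation, so the sketch assumes a Coherent Point Drift-style GMM in which each template vertex is an isotropic Gaussian with shared variance and a uniform component absorbs outliers. The function names and the `sigma2` and `outlier_weight` parameters are illustrative assumptions.

```python
import numpy as np


def pairwise_sq_dist(template, points):
    """Squared Euclidean distance between every template vertex and input point, shape (M, N)."""
    diff = template[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def chamfer_distance(template, points):
    """Symmetric Chamfer distance using hard nearest-neighbour assignments."""
    sq_dist = pairwise_sq_dist(template, points)
    return float(sq_dist.min(axis=1).mean() + sq_dist.min(axis=0).mean())


def soft_posterior(template, points, sigma2, outlier_weight=0.1):
    """Posterior P[m, n] that template vertex m generated input point n.

    Assumption (CPD-style, not taken from the paper): isotropic Gaussians with
    variance sigma2 centred on the template vertices, plus a uniform outlier
    component with mixing weight outlier_weight.
    """
    num_vertices, num_points = template.shape[0], points.shape[0]
    dim = points.shape[1]
    sq_dist = pairwise_sq_dist(template, points)
    gauss = np.exp(-sq_dist / (2.0 * sigma2))
    # Constant term contributed by the uniform outlier component.
    uniform = ((2.0 * np.pi * sigma2) ** (dim / 2.0)
               * outlier_weight / (1.0 - outlier_weight)
               * num_vertices / num_points)
    posterior = gauss / (gauss.sum(axis=0, keepdims=True) + uniform)
    return posterior, sq_dist


def posterior_weighted_loss(template, points, sigma2, outlier_weight=0.1):
    """Squared distances averaged under the soft posterior instead of hard matches."""
    posterior, sq_dist = soft_posterior(template, points, sigma2, outlier_weight)
    return float(np.sum(posterior * sq_dist) / max(np.sum(posterior), 1e-12))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.normal(size=(50, 3))                      # stand-in for deformed template vertices
    clean = template + 0.01 * rng.normal(size=(50, 3))       # lightly noisy observation
    outliers = rng.uniform(20.0, 30.0, size=(10, 3))         # far-away spurious points
    noisy = np.vstack([clean, outliers])

    print("Chamfer, clean:", chamfer_distance(template, clean))
    print("Chamfer, with outliers:", chamfer_distance(template, noisy))
    print("Posterior loss, clean:", posterior_weighted_loss(template, clean, 0.01))
    print("Posterior loss, with outliers:", posterior_weighted_loss(template, noisy, 0.01))
```

In this sketch, an outlier far from every template vertex receives a posterior close to zero, so it contributes almost nothing to the weighted loss, whereas the Chamfer term forces each outlier to match its nearest vertex at full cost. In a training setup the template vertices would come from the parametric human model predicted by a network, and `sigma2` would typically be annealed or estimated rather than fixed; both of those details are assumptions here.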