Abstract: High-quality and complete human motion 4D reconstruction is of great significance for immersive VR and even human operation. However, it has inevitable self-scanning constraints, and tracking under monocular settings also has strict restrictions. In this paper, we propose a human motion capture system combined with human priors and performance capture that only uses a single RGB-D sensor. To break the self-scanning constraint, we generated a complete mesh only using the front view input to initialize the geometric capture. In order to construct a correct warping field, most previous methods initialize their systems in a strict way. To maintain high fidelity while increasing the easiness of the system, we updated the model while capturing motion. Additionally, we blended in human priors in order to improve the reliability of model warping. Extensive experiments demonstrated that our method can be used more comfortably while maintaining credible geometric warping and remaining free of self-scanning constraints.

What problem does this paper attempt to address?

This paper aims to relax the strict initial-pose requirements and to overcome the self-scanning constraint in human motion capture with a single RGB-D sensor. Specifically, the authors propose a method that combines human prior knowledge with performance capture, targeting high-fidelity human surface reconstruction and motion tracking while making the system easier to use and more robust.

### Key problems solved in the paper:

1. **Reducing the requirements for the initial pose**: Traditional template-based or data-accumulation-based methods often require a strict initialization pose, which limits their flexibility in practical applications. By introducing human prior knowledge, this paper relaxes that requirement and lets users start motion capture in a more natural pose.
2. **Overcoming the self-scanning constraint**: With a single sensor, the limited viewing angle means some parts of the body may never be scanned. This paper overcomes the constraint by generating a complete mesh from the front-view input alone and using it to initialize geometric capture.
3. **Improving the accuracy of motion tracking**: To track motion accurately while maintaining high fidelity, this paper proposes a new optimization pipeline that combines human prior knowledge with volumetric fusion. The method tracks human motion and also preserves geometric detail in newly fused regions.

### Method overview:

- **Initialization stage**:
  - Use NormalGAN to generate a complete human mesh with specific details.
  - Align the generated mesh with the current depth through non-linear optimization to initialize the TSDF volume.
  - Use the SMPL model and FrankMocap to initialize the human pose, so that deformation of the complete mesh remains credible.
- **Motion capture stage**:
  - Constrain the human pose parameters with point clouds and the 3D pose, using the predicted human pose as a prior.
  - Capture non-rigid deformations and achieve high-fidelity surface reconstruction by solving the surface tracking energy function.
  - Combine human prior knowledge and depth information to refine the geometric details of the model.

### Technical contributions:

- A human volumetric capture method based on human prior knowledge, which relaxes the strict requirements on the initial pose while maintaining accurate motion tracking.
- A new optimization pipeline that combines human prior knowledge with volumetric fusion to overcome the self-scanning constraint.
- A complete human mesh with geometric details, generated by a data-driven implicit occupancy representation, which improves the accuracy of surface reconstruction and motion tracking.

In conclusion, the proposed method improves the ease of use and robustness of human motion capture with a single RGB-D sensor.
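The method overview refers to initializing and updating a TSDF volume. As a rough illustration of what volumetric fusion involves, the sketch below shows the generic weighted running-average TSDF update for a single depth frame. It is not the paper's implementation: the function name `integrate_tsdf`, the truncation distance `trunc`, the weight cap `max_weight`, and the convention that `sdf_obs` already holds per-voxel signed distances are all assumptions made for this example.

```python
import numpy as np


def integrate_tsdf(tsdf, weight, sdf_obs, trunc=0.05, max_weight=64.0):
    """Fuse one frame of signed-distance observations into a TSDF volume.

    Hypothetical helper for illustration only.
    tsdf, weight : float arrays of identical shape holding the running state
                   (both are updated in place and also returned).
    sdf_obs      : per-voxel signed distance to the observed surface in metres,
                   positive in front of the surface; NaN means no valid depth.
    """
    # Skip voxels with no observation or lying far behind the surface.
    valid = np.isfinite(sdf_obs) & (sdf_obs > -trunc)

    # Truncate the distance and normalise it to the range [-1, 1].
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)

    # Weighted running average with a unit weight for the new observation.
    w_old = weight[valid]
    tsdf[valid] = (tsdf[valid] * w_old + d[valid]) / (w_old + 1.0)

    # Cap the accumulated weight so the volume can still adapt to change.
    weight[valid] = np.minimum(w_old + 1.0, max_weight)
    return tsdf, weight
```

Capping the weight is a common design choice for dynamic scenes: it stops old observations from dominating, so the surface can keep following a moving subject. In a full system the volume would also be warped by the estimated non-rigid motion before each update, which this sketch leaves out.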

Human Motion Tracking with Less Constraint of Initial Posture from a Single RGB-D Sensor
