RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior

Mingjiang Liang, Yongkang Cheng, Hualin Liang, Shaoli Huang, Wei Liu
2024-11-01
Abstract: We present RopeTP, a novel framework that combines Robust pose estimation with a diffusion Trajectory Prior to reconstruct global human motion from videos. At the heart of RopeTP is a hierarchical attention mechanism that significantly improves context awareness, which is essential for accurately inferring the posture of occluded body parts. This is achieved by exploiting the relationships with visible anatomical structures, enhancing the accuracy of local pose estimations. The improved robustness of these local estimations allows for the reconstruction of precise and stable global trajectories. Additionally, RopeTP incorporates a diffusion trajectory model that predicts realistic human motion from local pose sequences. This model ensures that the generated trajectories are not only consistent with observed local actions but also unfold naturally over time, thereby improving the realism and stability of 3D human motion reconstruction. Extensive experimental validation shows that RopeTP surpasses current methods on two benchmark datasets, particularly excelling in scenarios with occlusions. It also outperforms methods that rely on SLAM for initial camera estimates and extensive optimization, delivering more accurate and realistic trajectories.
Computer Vision and Pattern Recognition, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper addresses the reconstruction of global human motion from monocular video, especially in the presence of occlusions. It proposes a framework named **RopeTP**, which combines robust pose estimation with a diffusion trajectory prior to improve the accuracy of pose estimates for occluded body parts and the stability of the recovered global trajectory.

#### Main challenges:

1. **Occlusion problem**: In the real world, parts of the human body may be occluded by the body itself or by other objects, which makes it difficult to estimate human pose and shape from monocular images.
2. **Global trajectory ambiguity**: A monocular camera provides no global reference, so existing monocular methods degrade significantly when recovering trajectories in 3D space.
3. **Limitations of existing methods**: Many traditional methods rely on parametric models (such as SMPL) for 3D mesh reconstruction, but they perform poorly under occlusion, and their optimization process is computationally intensive and time-consuming.

#### Main contributions of RopeTP:

1. **Combining robust pose estimation with a diffusion trajectory prior**: A diffusion generative model re-infers the global motion trajectory from local pose sequences, addressing the ambiguity of human trajectories under a monocular camera.
2. **Hierarchical attention mechanism**: Multi-scale visual cues are used to extract and synthesize features at different levels, so that the pose of occluded parts can be inferred more accurately from visible body parts.
3. **Efficiency and accuracy**: Compared with traditional optimization methods, RopeTP provides a more efficient solution and performs well on multiple benchmark datasets, especially in occlusion scenes.
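
To make the idea of a diffusion trajectory prior more concrete, the following is a minimal sketch of standard DDPM ancestral sampling conditioned on a local motion signal. It is not the paper's model: the function names, the 1-D trajectory, the linear noise schedule, and in particular `oracle_eps_model` (a toy stand-in for a trained denoising network that simply treats the running sum of per-frame displacements as the clean trajectory) are all assumptions made for illustration.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, a common DDPM default (assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def sample_trajectory(eps_model, cond, length, num_steps=50, seed=0):
    """DDPM ancestral sampling of a 1-D trajectory conditioned on `cond`."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(num_steps)
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    # Start from pure Gaussian noise, one value per frame.
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for t in reversed(range(num_steps)):
        eps_hat = eps_model(x, t, cond, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        # Posterior mean of x_{t-1} given the predicted noise.
        mean = [scale * (xi - coef * ei) for xi, ei in zip(x, eps_hat)]
        if t > 0:
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x

def oracle_eps_model(x, t, cond, alpha_bars):
    """Hypothetical stand-in for a trained denoiser (not from the paper)."""
    # Toy assumption: the clean trajectory is the running sum of `cond`.
    target, pos = [], 0.0
    for displacement in cond:
        pos += displacement
        target.append(pos)
    a = math.sqrt(alpha_bars[t])
    s = math.sqrt(1.0 - alpha_bars[t])
    # Noise that would map `target` to the current noisy sample `x`.
    return [(xi - a * ti) / s for xi, ti in zip(x, target)]

local_displacements = [0.1, 0.1, 0.2, 0.0, -0.1]
trajectory = sample_trajectory(oracle_eps_model, local_displacements, length=5)
```

Because the toy denoiser always predicts the same clean trajectory, the sampler converges to the running sum of the displacements. In a real system the denoiser would be a learned network, and the conditioning would be whatever local pose features the model was trained on.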
#### Experimental verification:

- Experimental results on two standard datasets (3DPW and Human3.6M) show that RopeTP outperforms existing methods on metrics such as MPJPE (Mean Per Joint Position Error) and MPVPE (Mean Per Vertex Position Error).
- Experiments on occlusion datasets (such as 3DPW-OCC and 3DOH) further demonstrate the robustness of the method under occlusion.

In conclusion, through its architecture design, RopeTP addresses the occlusion and trajectory ambiguity problems in human motion reconstruction from a monocular camera.
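
For readers unfamiliar with the metrics cited in the experimental results, the sketch below shows one common way to compute a mean per-point position error. Applied to joints it gives MPJPE, and applied to mesh vertices it gives MPVPE. The function names and the optional root alignment are illustrative assumptions; the exact alignment protocol used in the paper is not stated here.

```python
import math

def root_align(frame, root_index=0):
    """Translate all points so the root point sits at the origin."""
    rx, ry, rz = frame[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in frame]

def mean_per_point_error(pred, gt, root_index=None):
    """Average Euclidean distance between predicted and ground-truth points.

    `pred` and `gt` are sequences of frames; each frame is a sequence of
    (x, y, z) tuples. The result is in the same unit as the inputs.
    If `root_index` is given, each frame is root-aligned first.
    """
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        if root_index is not None:
            pred_frame = root_align(pred_frame, root_index)
            gt_frame = root_align(gt_frame, root_index)
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)
            count += 1
    return total / count

# One frame with two joints; the prediction is shifted by 1 along x
# and the second joint is additionally off by 0.3 along z.
gt = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
pred = [[(1.0, 0.0, 0.0), (1.0, 1.0, 0.3)]]
aligned_error = mean_per_point_error(pred, gt, root_index=0)
```

With root alignment the global shift is removed, so only the 0.3 offset of the second joint contributes to the error. Without it, the shift dominates, which is why local pose accuracy and global trajectory accuracy are usually reported separately.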