Diverse 3D Human Pose Generation in Scenes based on Decoupled Structure

Bowen Dang, Xi Zhao
2024-06-09
Abstract: This paper presents a novel method for generating diverse 3D human poses in scenes with semantic control. Existing methods rely heavily on human-scene interaction datasets, which limits the diversity of the generated human poses. To overcome this challenge, we propose to decouple the pose and interaction generation process. Our approach consists of three stages: pose generation, contact generation, and putting the human into the scene. We train a pose generator on the human dataset to learn a rich pose prior, and a contact generator on the human-scene interaction dataset to learn a human-scene contact prior. Finally, the placing module puts the human body into the scene in a suitable and natural manner. The experimental results on the PROX dataset demonstrate that our method produces more physically plausible interactions and exhibits more diverse human poses. Furthermore, experiments on the MP3D-R dataset validate the generalization ability of our method.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The problem this paper addresses is **how to generate diverse, natural 3D human poses in a scene while keeping the interaction with the scene plausible**. Existing methods usually rely on human-scene interaction datasets, which limits the diversity of the generated poses and makes the generation process hard to control. The poses they produce also often look unnatural or are physically implausible.

### Specific problems:

1. **Insufficient pose diversity**: Existing methods generate a limited range of poses and cannot produce poses that never appear in the interaction datasets.
2. **Poor physical plausibility**: The interaction between the generated human and the scene is often unnatural, with artifacts such as penetration or unreasonable contact.
3. **Poor controllability**: The generation process offers no fine-grained control over poses and interactions, so it is difficult to generate a specified type of action or interaction on demand.

### Solutions:

To solve these problems, the paper proposes a decoupled structure that handles pose generation and interaction generation separately. Specifically:

1. **Pose generation module**: A pose generator is trained on a large-scale human pose dataset to learn a rich pose prior, which reduces the dependence on interaction datasets and increases pose diversity.
2. **Contact generation module**: A contact generator is trained on a human-scene interaction dataset to learn the contact patterns between the human body and the scene, so that the generated poses interact plausibly with the scene.
3. **Placement module**: The generated human is placed into the scene in three sub-stages (initial position selection, physical feasibility testing, and optimization) so that the result is both natural and physically plausible.

### Method advantages:

- **Higher pose diversity**: Decoupling pose generation from interaction generation yields more diverse human poses, including uncommon poses and poses that never appear in the interaction dataset.
- **Better physical plausibility**: Physical feasibility testing and optimization keep the generated poses free of serious penetration and unreasonable contact.
- **Stronger controllability**: Conditioning on action and object types gives control over the type of pose and interaction that is generated.

### Experimental verification:

On the PROX dataset, the method produces more diverse poses and more physically plausible interactions than existing methods. Experiments on the MP3D-R dataset support its generalization ability, including in some uncommon interaction scenes.
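
The three sub-stages of the placement module described above (initial position selection, physical feasibility testing, optimization) can be illustrated with a toy sketch. This is not the authors' implementation: the body is reduced to a point set, the scene to a flat support surface plus one axis-aligned box obstacle, and the learned pose and contact generators are replaced by hand-made inputs. The names `penetration_fraction`, `place_body`, and `max_penetration` are hypothetical and chosen only for this example.

```python
import numpy as np


def penetration_fraction(points, box_min, box_max):
    """Fraction of body points lying strictly inside an axis-aligned box obstacle."""
    inside = np.all((points > box_min) & (points < box_max), axis=1)
    return float(inside.mean())


def place_body(body_points, contact_mask, candidates, box_min, box_max,
               support_height=0.0, max_penetration=0.05):
    """Toy placement: try candidate offsets, reject infeasible ones, refine the rest.

    Hypothetical stand-in for a placement module; a real system would use a body
    mesh, a scene mesh or signed distance field, and gradient-based optimization.
    """
    best_offset, best_cost = None, np.inf
    for cand in candidates:
        # Sub-stage 1: initial position selection (here: a given list of offsets).
        offset = np.asarray(cand, dtype=float)
        placed = body_points + offset

        # Sub-stage 2: physical feasibility test (reject heavy penetration).
        if penetration_fraction(placed, box_min, box_max) > max_penetration:
            continue

        # Sub-stage 3: optimization, reduced here to a closed-form vertical shift
        # that moves the contact-labelled points onto the support surface.
        gap = placed[contact_mask, 2].mean() - support_height
        offset = offset - np.array([0.0, 0.0, gap])
        placed = body_points + offset
        cost = float(np.abs(placed[contact_mask, 2] - support_height).mean())

        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset, best_cost


# Hand-made inputs standing in for the pose and contact generators (assumptions).
body = np.stack([np.zeros(18), np.zeros(18), np.linspace(0.0, 1.7, 18)], axis=1)
contact = np.zeros(18, dtype=bool)
contact[0] = True  # the lowest point plays the role of a foot contact

box_min = np.array([1.0, 1.0, 0.0])
box_max = np.array([2.0, 2.0, 1.0])
candidates = [(1.5, 1.5, 0.1), (0.0, 0.0, 0.1)]  # the first one lies inside the box

offset, cost = place_body(body, contact, candidates, box_min, box_max)
print("chosen offset:", offset, "residual contact cost:", cost)
```

In this sketch the first candidate is discarded by the feasibility test because roughly half of the body points fall inside the obstacle, and the second candidate is lowered until its contact point rests on the support surface. The decoupling shows up in the interface: `body_points` and `contact_mask` could come from any pose and contact source, and the placement step does not need to know how they were produced.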