Abstract: This paper presents a novel method for generating diverse 3D human poses in scenes with semantic control. Existing methods rely heavily on human-scene interaction datasets, which limits the diversity of the generated human poses. To overcome this challenge, we propose to decouple the pose and interaction generation processes. Our approach consists of three stages: pose generation, contact generation, and placing the human in the scene. We train a pose generator on a human dataset to learn a rich pose prior, and a contact generator on a human-scene interaction dataset to learn a human-scene contact prior. Finally, the placing module puts the human body into the scene in a suitable and natural manner. Experimental results on the PROX dataset demonstrate that our method produces more physically plausible interactions and more diverse human poses. Experiments on the MP3D-R dataset further validate the generalization ability of our method.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **how to generate diverse and natural 3D human poses in a scene while maintaining plausible interaction with the scene**. Existing methods usually rely on specific human-scene interaction datasets, which limits the diversity of the generated poses and makes the generation process hard to control. The poses these methods generate also often lack naturalness and physical plausibility.

### Specific problems:

1. **Insufficient pose diversity**: Existing methods generate a limited range of poses and cannot produce poses that never appear in the interaction datasets.
2. **Poor physical plausibility**: The interaction between the generated human poses and the scene is not natural enough, and problems such as penetration or unreasonable contact are likely to occur.
3. **Poor controllability**: The generation process lacks fine-grained control over poses and interactions, so it is difficult to generate actions and interactions of a specified type on request.

### Solutions:

To solve the above problems, the paper proposes a decoupled structure that separates pose generation from interaction generation. Specifically:

1. **Pose generation module**: A pose generator is trained on a large-scale human pose dataset to learn a rich prior over human poses, which reduces the dependence on specific interaction datasets and increases pose diversity.
2. **Contact generation module**: A contact generator is trained on a human-scene interaction dataset to learn the contact patterns between the human body and the scene, so that the generated poses can interact plausibly with the scene.
3. **Placement module**: The generated human pose is placed into the scene through three sub-stages (initial position selection, physical feasibility testing, and optimization), so that the result is both natural and physically plausible.

### Method advantages:

- **Higher pose diversity**: Decoupling pose generation from interaction generation allows more diverse human poses to be generated, including uncommon poses and poses that never appear in the interaction dataset.
- **Better physical plausibility**: The physical feasibility test and the optimization step keep the generated poses free of serious penetration and unreasonable contact.
- **Stronger controllability**: Conditioning on action and object types allows the types of generated poses and interactions to be controlled.

### Experimental verification:

The experiments show that the method generates more diverse and more physically plausible 3D human poses on multiple datasets, and that it generalizes well to some uncommon interaction scenes.
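The placement sub-stages described above (candidate positions, a feasibility test, then choosing the best fit) can be illustrated with a toy sketch. This is not the paper's implementation: the floor-plane signed distance function, the 5% penetration threshold, and every function name here are assumptions made for illustration, and the final optimization sub-stage is replaced by simply picking the candidate with the smallest contact error.

```python
# Toy placement sketch (hypothetical; not the authors' code).
# The body is a list of (x, y, z) vertices, and the scene is described by a
# signed distance function (SDF): negative inside geometry, positive outside.


def floor_sdf(point):
    """Signed distance from one (x, y, z) point to a flat floor at z = 0."""
    return point[2]


def translate(vertices, offset):
    """Shift every vertex by the same (dx, dy, dz) offset."""
    return [(x + offset[0], y + offset[1], z + offset[2]) for x, y, z in vertices]


def penetration_ratio(vertices, sdf):
    """Fraction of body vertices that lie inside scene geometry (sdf < 0)."""
    inside = sum(1 for v in vertices if sdf(v) < 0.0)
    return inside / len(vertices)


def contact_error(vertices, contact_mask, sdf):
    """Mean distance from contact-labelled vertices to the scene surface."""
    distances = [abs(sdf(v)) for v, c in zip(vertices, contact_mask) if c]
    return sum(distances) / len(distances) if distances else 0.0


def place_body(vertices, contact_mask, candidates, sdf, max_penetration=0.05):
    """Try each candidate offset, reject infeasible ones, keep the best contact fit."""
    best_offset, best_error = None, float("inf")
    for offset in candidates:
        moved = translate(vertices, offset)
        # Feasibility test: discard placements that penetrate the scene too much.
        if penetration_ratio(moved, sdf) > max_penetration:
            continue
        # Stand-in for optimization: prefer placements whose predicted
        # contact vertices actually touch the scene surface.
        error = contact_error(moved, contact_mask, sdf)
        if error < best_error:
            best_offset, best_error = offset, error
    return best_offset, best_error
```

For example, with a three-vertex "body" whose first vertex is labelled as a contact point, `place_body` rejects an offset that pushes the body below the floor and prefers the offset that rests the contact vertex on the floor over one that leaves it floating. In a real system the contact mask would come from the contact generator and the SDF from the scene mesh.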

Diverse 3D Human Pose Generation in Scenes based on Decoupled Structure
