Abstract: We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner. Existing approaches rely on training sequences that contain captured human motions and the 3D scenes they interact with. However, such interaction data are costly, difficult to capture, and can hardly cover all plausible human-scene interactions in complex environments. To address these challenges, we propose a reinforcement learning-based approach that enables virtual humans to navigate in 3D scenes and interact with objects realistically and autonomously, driven by learned motion control policies. The motion control policies employ latent motion action spaces, which correspond to realistic motion primitives and are learned from large-scale motion capture data using a powerful generative motion model. For navigation in a 3D environment, we propose a scene-aware policy with novel state and reward designs for collision avoidance. Combined with navigation mesh-based path-finding algorithms to generate intermediate waypoints, our approach enables the synthesis of diverse human motions navigating in 3D indoor scenes and avoiding obstacles. To generate fine-grained human-object interactions, we carefully curate interaction goal guidance using a marker-based body representation and leverage features based on the signed distance field (SDF) to encode human-scene proximity relations. Our method can synthesize realistic and diverse human-object interactions (e.g., sitting on a chair and then getting up) even for out-of-distribution test scenarios with different object shapes, orientations, starting body positions, and poses. Experimental results demonstrate that our approach outperforms state-of-the-art methods in terms of both motion naturalness and diversity.
Code and video results are available at: https://zkf1997.github.io/DIMOS
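
The abstract mentions SDF-based features that encode human-scene proximity at body markers. As a rough illustration only, the sketch below shows one plausible form such a feature could take: the signed distance and its finite-difference gradient evaluated at each marker position. The analytic box SDF, the function names (`box_sdf`, `sdf_gradient`, `proximity_features`), and the choice to concatenate distance and gradient are all assumptions made for this example; they are not taken from the paper or its released code.

```python
import numpy as np


def box_sdf(points, center, half_extents):
    """Signed distance from points (N, 3) to an axis-aligned box.

    Negative inside the box, positive outside. The box stands in for a
    scene object; a real scene would use a precomputed SDF grid instead.
    """
    q = np.abs(points - center) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    inside = np.minimum(q.max(axis=1), 0.0)
    return outside + inside


def sdf_gradient(points, sdf_fn, eps=1e-4):
    """Central finite-difference gradient of an SDF at points (N, 3)."""
    grads = np.zeros_like(points, dtype=float)
    for axis in range(3):
        offset = np.zeros(3)
        offset[axis] = eps
        grads[:, axis] = (sdf_fn(points + offset) - sdf_fn(points - offset)) / (2.0 * eps)
    return grads


def proximity_features(markers, sdf_fn):
    """Hypothetical per-marker feature: [signed distance, gradient] -> (N, 4)."""
    dist = sdf_fn(markers)
    grad = sdf_gradient(markers, sdf_fn)
    return np.concatenate([dist[:, None], grad], axis=1)


if __name__ == "__main__":
    # Two illustrative marker positions: one inside the box, one outside it.
    markers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    unit_box = lambda p: box_sdf(p, np.zeros(3), np.full(3, 0.5))
    print(proximity_features(markers, unit_box))
```

In this toy setup a marker inside the object gets a negative distance, which a collision-avoidance or contact reward could penalize or encourage depending on the body part; how the actual method uses these quantities is described in the paper itself.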

Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models

Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs

Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

InterGen: Diffusion-Based Multi-human Motion Generation Under Complex Interactions

Generating Continual Human Motion in Diverse 3D Scenes

Synthesizing Diverse Human Motions in 3D Indoor Scenes

Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes

Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations

Motion Mamba: Efficient and Long Sequence Motion Generation

Learning a Generative Model for Multi‐Step Human‐Object Interactions from Videos

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation

StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework

DiverseMotion: Towards Diverse Human Motion Generation Via Discrete Diffusion

Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling

DreaMoving: A Human Video Generation Framework based on Diffusion Models

Guided Motion Diffusion for Controllable Human Motion Synthesis

Object Motion Guided Human Motion Synthesis

Controllable Human-Object Interaction Synthesis

MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model

Towards Efficient and Diverse Generative Model for Unconditional Human Motion Synthesis