Abstract: We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner. Existing approaches rely on training sequences that contain captured human motions and the 3D scenes they interact with. However, such interaction data are costly, difficult to capture, and can hardly cover all plausible human-scene interactions in complex environments. To address these challenges, we propose a reinforcement learning-based approach that enables virtual humans to navigate in 3D scenes and interact with objects realistically and autonomously, driven by learned motion control policies. The motion control policies employ latent motion action spaces, which correspond to realistic motion primitives and are learned from large-scale motion capture data using a powerful generative motion model. For navigation in a 3D environment, we propose a scene-aware policy with novel state and reward designs for collision avoidance. Combined with navigation mesh-based path-finding algorithms to generate intermediate waypoints, our approach enables the synthesis of diverse human motions navigating in 3D indoor scenes and avoiding obstacles. To generate fine-grained human-object interactions, we carefully curate interaction goal guidance using a marker-based body representation and leverage features based on the signed distance field (SDF) to encode human-scene proximity relations. Our method can synthesize realistic and diverse human-object interactions (e.g., sitting on a chair and then getting up) even for out-of-distribution test scenarios with different object shapes, orientations, starting body positions, and poses. Experimental results demonstrate that our approach outperforms state-of-the-art methods in terms of both motion naturalness and diversity.
Code and video results are available at: https://zkf1997.github.io/DIMOS

Human-centric Indoor Scene Synthesis Using Stochastic Grammar

Learning 3D Scene Synthesis from Annotated RGB-D Images

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis

Holistic 3D Indoor Scene Parsing and Reconstruction from a Single RGB Image

Action-driven 3D Indoor Scene Evolution

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture

Fast 3D Indoor Scene Synthesis with Discrete and Exact Layout Pattern Extraction

A Survey of 3D Indoor Scene Synthesis

Fast 3D Indoor Scene Synthesis by Learning Spatial Relation Priors of Objects

SceneCraft: Layout-Guided 3D Scene Generation

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

Synthesizing Diverse Human Motions in 3D Indoor Scenes

RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation

HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes

Planner3D: LLM-enhanced graph prior meets 3D indoor scene explicit regularization

Integrating Function, Geometry, Appearance for Scene Parsing

Neural Rendering in a Room: Amodal 3D Understanding and Free-Viewpoint Rendering for the Closed Scene Composed of Pre-Captured Objects

Indoor Scene Generation from a Collection of Semantic-Segmented Depth Images