Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments

Jiye Lee, Hanbyul Joo
2023-09-08
Abstract: Synthesizing interaction-involved human motions has been challenging due to the high complexity of 3D environments and the diversity of possible human behaviors within. We present LAMA, Locomotion-Action-MAnipulation, to synthesize natural and plausible long-term human movements in complex indoor environments. The key motivation of LAMA is to build a unified framework to encompass a series of everyday motions including locomotion, scene interaction, and object manipulation. Unlike existing methods that require motion data "paired" with scanned 3D scenes for supervision, we formulate the problem as a test-time optimization by using human motion capture data only for synthesis. LAMA leverages a reinforcement learning framework coupled with a motion matching algorithm for optimization, and further exploits a motion editing framework via manifold learning to cover possible variations in interaction and manipulation. Throughout extensive experiments, we demonstrate that LAMA outperforms previous approaches in synthesizing realistic motions in various challenging scenarios. Project page: https://jiyewise.github.io/projects/LAMA/
Computer Vision and Pattern Recognition, Graphics, Robotics
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper aims to tackle the challenging problem of synthesizing natural and reasonable long-duration human motions in complex 3D environments. Specifically:

1. **Limitations of Existing Methods**:
   - Current methods mostly focus on sub-problems, such as static pose modeling or interaction with a single target object.
   - Some recent methods attempt to synthesize dynamic interactive motions in real 3D scenes but require "paired" motion datasets (i.e., data capturing the motion and the surrounding 3D environment simultaneously), which limits the complexity and diversity these methods can cover.

2. **Proposed New Method LAMA**:
   - LAMA (Locomotion-Action-Manipulation) is a unified framework that generates high-quality, realistic long-duration human motions, including walking, scene interaction, and object manipulation, within a given 3D scene.
   - Unlike existing methods, LAMA does not rely on motion datasets paired with 3D scenes. It treats synthesis as a test-time optimization problem that uses only human motion capture data.
   - LAMA combines a reinforcement learning framework with a motion matching algorithm to generate motions through optimization, and uses a motion editing framework based on manifold learning to cover possible variations in interaction and manipulation.

3. **Main Contributions**:
   - The first method capable of generating realistic long-duration motions, including walking, scene interaction, and object manipulation, in complex 3D scenes without requiring paired datasets.
   - A test-time optimization framework that requires only human motion capture data, combining reinforcement learning and motion matching, with a reward mechanism designed to avoid collisions and to interact with the scene.
   - State-of-the-art motion synthesis quality with durations close to 10 seconds.
   - A new, high-quality motion capture dataset, including walking and actions (such as sitting down), suitable for motion matching.

In summary, the main goal of this paper is to generate natural and reasonable long-duration human motions in complex 3D environments without the need for paired datasets.
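
The summary above names motion matching as one of LAMA's building blocks without describing it. As a rough illustration of the general technique only (not the authors' implementation), the sketch below performs the nearest-neighbour search at the core of motion matching: given a query feature vector, it returns the frame in a motion capture database whose features are closest. The feature layout, the random stand-in database, and the function names `build_feature_database` and `motion_match` are all hypothetical, chosen for this example.

```python
import random


def build_feature_database(n_frames=200, feature_dim=6, seed=0):
    """Create a stand-in feature database (hypothetical data).

    In a real system each row would hold features extracted from one
    motion capture frame, e.g. future root trajectory and foot positions.
    Here the rows are random numbers so the example is self-contained.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)]
            for _ in range(n_frames)]


def motion_match(database, query, weights=None):
    """Return (index, cost) of the frame whose features best match the query.

    The cost is a weighted squared Euclidean distance, a common choice
    in motion matching; the weights are an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0] * len(query)
    best_index, best_cost = -1, float("inf")
    for index, features in enumerate(database):
        cost = sum(w * (f - q) ** 2
                   for w, f, q in zip(weights, features, query))
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index, best_cost


# Usage: query with a slightly perturbed copy of frame 42.
database = build_feature_database()
query = [value + 0.01 for value in database[42]]
best_index, best_cost = motion_match(database, query)
```

In a pipeline that pairs this search with reinforcement learning, a learned policy could plausibly supply the query features, with rewards such as collision avoidance shaping which motions get retrieved. How LAMA actually connects the two components is not specified in the text above, so that pairing is an assumption here.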