Phone2Proc: Bringing Robust Robots Into Our Chaotic World

Matt Deitke, Rose Hendrix, Luca Weihs, Ali Farhadi, Kiana Ehsani, Aniruddha Kembhavi
DOI: https://doi.org/10.48550/arXiv.2212.04819
2022-12-09
Abstract: Training embodied agents in simulation has become mainstream for the embodied AI community. However, these agents often struggle when deployed in the physical world due to their inability to generalize to real-world environments. In this paper, we present Phone2Proc, a method that uses a 10-minute phone scan and conditional procedural generation to create a distribution of training scenes that are semantically similar to the target environment. The generated scenes are conditioned on the wall layout and arrangement of large objects from the scan, while also sampling lighting, clutter, surface textures, and instances of smaller objects with randomized placement and materials. Leveraging just a simple RGB camera, training with Phone2Proc shows massive improvements from 34.7% to 70.7% success rate in sim-to-real ObjectNav performance across a test suite of over 200 trials in diverse real-world environments, including homes, offices, and RoboTHOR. Furthermore, Phone2Proc's diverse distribution of generated scenes makes agents remarkably robust to changes in the real world, such as human movement, object rearrangement, lighting changes, or clutter.
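The conditional generation described in the abstract can be illustrated with a minimal sketch: the room layout and the category and position of each large object are held fixed from the scan, while asset instances, surface textures, lighting, and small-object clutter are resampled for every scene. Everything concrete here is an assumption for illustration, not the authors' implementation: the scan is assumed to be already reduced to room polygons plus large-object positions, and the types `LargeObject` and `Scene`, the asset pools, texture and material lists, the lighting range, and the rule of scattering clutter near large objects are all hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical types: a scan is assumed to be reduced to room polygons plus
# the category and (x, z) floor position of each large object.
@dataclass(frozen=True)
class LargeObject:
    category: str  # e.g. "Sofa"; kept fixed from the scan
    x: float       # metres, from the scan
    z: float

@dataclass
class Scene:
    rooms: dict            # room name -> floor polygon, copied from the scan
    large_objects: list    # (LargeObject, sampled asset id) pairs
    wall_texture: str
    floor_texture: str
    light_intensity: float
    small_objects: list    # (category, asset id, x, z, material) tuples

# Hypothetical asset pools and randomization choices (placeholders only).
ASSETS = {
    "Sofa": ["sofa_a", "sofa_b", "sofa_c"],
    "Table": ["table_a", "table_b"],
    "Mug": ["mug_a", "mug_b"],
    "Book": ["book_a", "book_b", "book_c"],
}
TEXTURES = ["oak", "tile", "carpet", "plaster"]
MATERIALS = ["ceramic", "plastic", "metal"]
SMALL_CATEGORIES = ["Mug", "Book"]

def sample_scene(rooms, large_objects, rng, max_clutter=5):
    """Sample one scene: layout and large-object placement are conditioned on
    the scan; asset instances, textures, lighting, and clutter are random."""
    # Keep each scanned object's category and position, but swap in a random
    # asset instance of that category.
    placed = [(obj, rng.choice(ASSETS[obj.category])) for obj in large_objects]

    # Clutter: a random number of small objects, each with a random instance,
    # material, and position jittered around a random large object (assumed rule).
    n_clutter = rng.randint(0, max_clutter) if large_objects else 0
    small = []
    for _ in range(n_clutter):
        category = rng.choice(SMALL_CATEGORIES)
        anchor = rng.choice(large_objects)
        small.append((
            category,
            rng.choice(ASSETS[category]),
            anchor.x + rng.uniform(-0.5, 0.5),
            anchor.z + rng.uniform(-0.5, 0.5),
            rng.choice(MATERIALS),
        ))

    return Scene(
        rooms=dict(rooms),
        large_objects=placed,
        wall_texture=rng.choice(TEXTURES),
        floor_texture=rng.choice(TEXTURES),
        light_intensity=rng.uniform(0.3, 1.5),  # arbitrary illustrative range
        small_objects=small,
    )

def sample_distribution(rooms, large_objects, n, seed=0):
    """Return n scene variants that all share the scanned layout."""
    rng = random.Random(seed)
    return [sample_scene(rooms, large_objects, rng) for _ in range(n)]

if __name__ == "__main__":
    rooms = {"living": [(0, 0), (5, 0), (5, 4), (0, 4)]}
    scan = [LargeObject("Sofa", 1.0, 2.0), LargeObject("Table", 3.0, 1.0)]
    for scene in sample_distribution(rooms, scan, n=3):
        print(scene.wall_texture, round(scene.light_intensity, 2),
              len(scene.small_objects))
```

Because the structural elements come from the scan and only the appearance and clutter are resampled, every generated scene shares the target environment's layout while differing in the details a robot might encounter on a given day; seeding `random.Random` keeps the sampled distribution reproducible.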
Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the generalization gap that arises when robot agents trained in simulation are deployed in the real world. Although agents trained in simulated environments have become increasingly capable, they often struggle once deployed in the physical world because they fail to adapt to differences in the real environment, such as different layouts, novel object instances, clutter, lighting changes, and human activity. These differences cause performance to drop from simulation to reality (the sim-to-real gap).

To address these challenges, the paper proposes Phone2Proc. Phone2Proc uses a phone scan of the target environment and, based on that scan, generates a set of training-scene variants that are semantically similar to the target environment. Training on these variants is meant to narrow the gap between the simulated training data and the real target environment, and thereby improve the robot's performance in the real world.

The main contributions of the paper are:
1. **Phone2Proc**: a method that effectively reduces the generalization gap between simulated training environments and the real target environment.
2. **Large-scale real-world experiments**: 234 real-world trials showing significant improvements of Phone2Proc over existing techniques.
3. **Robustness-to-change tests**: experiments demonstrating that Phone2Proc agents remain robust to real-world changes such as lighting changes, added clutter, and human activity.

Together, these contributions show that Phone2Proc not only improves robot navigation in complex real environments but also adapts well to a range of uncertainties and changes.
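To make the abstract's headline result concrete, its two sim-to-real ObjectNav success rates (34.7% before and 70.7% after training with Phone2Proc) can be compared in absolute and relative terms. The helper below is purely illustrative; the name `success_gain` is hypothetical and the only inputs are the two figures quoted in the abstract.

```python
def success_gain(baseline_pct, new_pct):
    """Return (absolute gain in percentage points, relative improvement factor)."""
    return new_pct - baseline_pct, new_pct / baseline_pct

# Sim-to-real ObjectNav success rates quoted in the abstract.
abs_gain, factor = success_gain(34.7, 70.7)
print(f"+{abs_gain:.1f} points, {factor:.2f}x")  # prints "+36.0 points, 2.04x"
```

In other words, training with Phone2Proc roughly doubles the real-world success rate, an absolute gain of 36 percentage points.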