BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark

Nikita Chernyadev, Nicholas Backshall, Xiao Ma, Yunfan Lu, Younggyo Seo, Stephen James
2024-07-12
Abstract: We introduce BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To capture real-world performance accurately, we provide human-collected demonstrations for each task, reflecting the diverse modalities found in real-world robot trajectories. BiGym supports a variety of observations, including proprioceptive data and visual inputs such as RGB and depth from 3 camera views. To validate the usability of BiGym, we thoroughly benchmark state-of-the-art imitation learning algorithms and demo-driven reinforcement learning algorithms within the environment and discuss future opportunities.
Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper presents BiGym, a new benchmark and learning environment for mobile bi-manual manipulation. BiGym consists of 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To reflect real-world performance accurately, each task comes with human-collected demonstrations that capture the varied modalities found in real robot trajectories. BiGym supports several observation types, including proprioceptive data and visual inputs such as RGB and depth images from 3 camera views.

The paper notes that existing benchmarks primarily target pure reinforcement learning, yet defining reward functions for long-horizon tasks is difficult. BiGym instead provides sparse rewards and pairs them with human demonstrations, so that both imitation learning and demo-driven reinforcement learning algorithms can be evaluated. Compared with expert demonstrations generated by planners, BiGym's human-collected demonstrations are more natural and multimodal, and therefore closer to actual robot motion trajectories.

BiGym also lets users switch between a full-body mode, which covers both locomotion and manipulation, and a bi-manual mode, which focuses on upper-body movement and manipulation while the lower body is driven by a fixed controller. This makes it easier to study and compare the capabilities of different algorithms. The paper further compares BiGym with other existing benchmarks, emphasizing its coverage of bi-manual and mobile manipulation tasks.

Through experiments, the authors validate the usability of BiGym by benchmarking state-of-the-art imitation learning and demo-driven reinforcement learning algorithms. They also discuss future research directions, such as network architectures that cope better with multimodal, noisy demonstrations and better handling of the partial observability that arises in mobile manipulation.
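The combination of sparse rewards and human demonstrations described in this summary can be illustrated with a small, generic Python sketch. It does not use BiGym's actual API: the names `Transition`, `sparse_reward`, `DemoSeededBuffer`, and the `demo_fraction` parameter are all hypothetical, and mixing demonstrations into each training batch is only one common way of doing demo-driven reinforcement learning, not necessarily the one used in the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    """One environment step (hypothetical container, not a BiGym type)."""
    observation: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    done: bool
    is_demo: bool


def sparse_reward(task_succeeded: bool) -> float:
    """Sparse reward: 1 only when the task's success condition holds, else 0."""
    return 1.0 if task_succeeded else 0.0


class DemoSeededBuffer:
    """Replay buffer that keeps demonstrations and online data separately.

    Each sampled batch draws roughly `demo_fraction` of its items from the
    demonstrations (an assumed design choice for this sketch).
    """

    def __init__(self, demo_fraction: float = 0.5, seed: int = 0) -> None:
        self.demo_fraction = demo_fraction
        self.demos: List[Transition] = []
        self.online: List[Transition] = []
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        target = self.demos if transition.is_demo else self.online
        target.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        if not self.demos and not self.online:
            raise ValueError("cannot sample from an empty buffer")
        if not self.online:
            n_demo = batch_size          # only demonstrations available
        elif not self.demos:
            n_demo = 0                   # only online data available
        else:
            n_demo = round(batch_size * self.demo_fraction)
        batch: List[Transition] = []
        if n_demo:
            # Sample with replacement so small demo sets still fill their share.
            batch += self.rng.choices(self.demos, k=n_demo)
        if batch_size - n_demo:
            batch += self.rng.choices(self.online, k=batch_size - n_demo)
        return batch


buffer = DemoSeededBuffer(demo_fraction=0.5)
for i in range(3):   # pretend these came from a human demonstration
    buffer.add(Transition((0.0,), (0.1,), sparse_reward(i == 2), i == 2, True))
for i in range(5):   # pretend these came from the agent's own rollouts
    buffer.add(Transition((0.0,), (0.2,), sparse_reward(False), False, False))
batch = buffer.sample(8)
```

Keeping demonstrations in their own list means successful, rewarded transitions remain available to the learner even when the agent's own rollouts rarely reach the sparse success signal.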