Abstract: We introduce BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To capture real-world performance accurately, we provide human-collected demonstrations for each task, reflecting the diverse modalities found in real-world robot trajectories. BiGym supports a variety of observations, including proprioceptive data and visual inputs such as RGB and depth from 3 camera views. To validate the usability of BiGym, we thoroughly benchmark state-of-the-art imitation learning algorithms and demo-driven reinforcement learning algorithms within the environment and discuss future opportunities.

What problem does this paper attempt to address?

This paper presents BiGym, a benchmark and learning environment for mobile bimanual manipulation. BiGym comprises 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To reflect real-world performance accurately, each task comes with human-collected demonstrations that capture the diverse modalities found in real robot trajectories. Supported observations include proprioceptive data and visual inputs such as RGB and depth images from 3 camera views.

The paper notes that existing benchmarks primarily target pure reinforcement learning, yet defining reward functions for long-horizon tasks is difficult. BiGym instead provides sparse rewards together with human demonstrations, so that both imitation learning and demonstration-driven reinforcement learning algorithms can be evaluated. Compared with expert demonstrations generated by planners, its human-collected demonstrations are more natural and multimodal, and closer to actual robot motion trajectories.

BiGym also lets users switch between two modes so that the capabilities of different algorithms can be studied and compared: a full-body mode, which covers both mobility and manipulation, and a bimanual mode, which focuses on upper-body movement and manipulation while a fixed controller handles the lower body. The paper compares BiGym with other existing benchmarks and emphasizes its advantages in bimanual and mobile manipulation tasks.

In experiments, the authors validate the usability of BiGym by testing state-of-the-art imitation learning and demonstration-driven reinforcement learning algorithms. They also discuss future research directions, such as network architectures that better accommodate multimodal, noisy demonstrations and better handling of partial observability in mobile manipulation.
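To make the sparse-reward, demonstration-driven setting described above more concrete, here is a minimal Python sketch. It is not BiGym's API and not one of the algorithms benchmarked in the paper. The names `Transition`, `sparse_reward`, and `DemoSeededBuffer` are hypothetical, and the observation dictionary is a placeholder. It only illustrates one common idea: a reward that is nonzero only on task success, and a replay buffer pre-filled with human demonstrations so that a learner sees successful transitions from the start.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    """One environment step (hypothetical container, not a BiGym type)."""
    obs: dict        # placeholder, e.g. {"proprioception": [...]}
    action: list
    reward: float
    done: bool
    is_demo: bool    # True if the step came from a human demonstration


def sparse_reward(task_success: bool) -> float:
    """Sparse reward: 1.0 only when the success condition holds, else 0.0."""
    return 1.0 if task_success else 0.0


class DemoSeededBuffer:
    """Replay buffer seeded with demonstration transitions (illustrative only)."""

    def __init__(self, demo_transitions, seed=0):
        self._demo = list(demo_transitions)
        if not self._demo:
            raise ValueError("at least one demonstration transition is required")
        self._online = []
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        """Store a transition collected by the learning agent."""
        self._online.append(transition)

    def sample(self, batch_size: int, demo_fraction: float = 0.5) -> list:
        """Sample with replacement, mixing demo and online transitions."""
        # Until the agent has collected data, the batch is demonstrations only.
        n_demo = batch_size if not self._online else round(batch_size * demo_fraction)
        batch = self._rng.choices(self._demo, k=n_demo)
        if batch_size - n_demo > 0:
            batch += self._rng.choices(self._online, k=batch_size - n_demo)
        return batch
```

A learner built on such a buffer would sample mixed batches at every update. The fixed 50/50 split is an arbitrary choice for illustration, not a setting taken from the paper.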

BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark
