BBSEA: An Exploration of Brain-Body Synchronization for Embodied Agents

Sizhe Yang, Qian Luo, Anumpam Pani, Yanchao Yang
DOI: https://doi.org/10.48550/arXiv.2402.08212
IF: 3.7
2024-02-13
Robotics
Abstract: Embodied agents capable of complex physical skills can improve productivity, elevate life quality, and reshape human-machine collaboration. We aim at autonomous training of embodied agents for various tasks involving mainly large foundation models. It is believed that these models could act as a brain for embodied agents; however, existing methods heavily rely on humans for task proposal and scene customization, limiting the learning autonomy, training efficiency, and generalization of the learned policies. In contrast, we introduce a brain-body synchronization (BBSEA) scheme to promote embodied learning in unknown environments without human involvement. The proposed scheme combines the wisdom of foundation models ("brain") with the physical capabilities of embodied agents ("body"). Specifically, it leverages the "brain" to propose learnable physical tasks and success metrics, enabling the "body" to automatically acquire various skills by continuously interacting with the scene. We carry out an exploration of the proposed autonomous learning scheme in a table-top setting, and we demonstrate that the proposed synchronization can generate diverse tasks and develop multi-task policies with promising adaptability to new tasks and configurations. We will release our data, code, and trained models to facilitate future studies in building autonomously learning agents with large foundation models in more complex scenarios. More visualizations are available at https://bbsea-embodied-ai.github.io.
What problem does this paper attempt to address?
The paper addresses how to autonomously train embodied agents with complex physical skills in unknown environments, using large foundation models (LFMs) to reduce reliance on human intervention, improve learning efficiency, and enhance the generalization of the learned policies. Specifically, it proposes a Brain-Body Synchronization (BBSEA) scheme that aims at autonomous learning without human involvement by combining the intelligence of foundation models ("brain") with the physical capabilities of embodied agents ("body").

### Main Issues

1. **Reducing Human Intervention**: Existing methods rely heavily on humans for task proposal and scene customization, which limits learning autonomy, training efficiency, and the generalization of the learned policies.
2. **Improving Learning Efficiency and Generalization Ability**: How to efficiently train embodied agents in unknown environments so that they adapt to new tasks and configurations.
3. **Achieving Multi-Task Policies**: How to generate diverse tasks through autonomous learning and develop multi-task policies that adapt well.

### Solution

The paper proposes a framework that achieves brain-body synchronization through three key steps:

1. **Task Proposal**: The foundation model ("brain") proposes interactive tasks based on the scene and the physical constraints of the embodied agent.
2. **Task Completion Inference**: The foundation model defines success metrics for each task, which lets the embodied agent determine whether the task has been executed successfully.
3. **Task-Conditioned Policy Learning**: The embodied agent acquires skills through continuous trial-and-error interaction with the environment and learns task-conditioned policies from the resulting feedback.
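The three steps above can be sketched as a small data-collection loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the scene is reduced to a dictionary of object locations, and `propose_tasks`, `attempt`, and `collect_demonstrations` are hypothetical names. The "brain" is a hard-coded stub where a real system would query a foundation model, and the "body" is a coin flip standing in for a physical trial.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy stand-in for a scene description: object name -> location label.
Scene = Dict[str, str]


@dataclass
class Task:
    description: str                            # language instruction for the policy
    target: str                                 # object the task manipulates
    is_success: Callable[[Scene, Scene], bool]  # (before, after) -> success flag


def propose_tasks(scene: Scene) -> List[Task]:
    """Step 1 (hypothetical 'brain' stub): propose tasks that fit the scene.

    Each task carries its own success metric (step 2).
    """
    tasks = []
    for obj, loc in scene.items():
        if loc != "bin":
            tasks.append(Task(
                description=f"move the {obj} into the bin",
                target=obj,
                # Default argument binds obj now, so each metric checks its own object.
                is_success=lambda before, after, o=obj: after[o] == "bin",
            ))
    return tasks


def attempt(scene: Scene, task: Task, rng: random.Random) -> Tuple[Scene, List[str]]:
    """Hypothetical 'body' stub: one trial that succeeds about half the time."""
    after = dict(scene)
    actions = [f"pick({task.target})"]
    if rng.random() < 0.5:
        after[task.target] = "bin"
        actions.append("place(bin)")
    else:
        actions.append("drop()")
    return after, actions


def collect_demonstrations(scene: Scene, trials_per_task: int = 20,
                           seed: int = 0) -> List[Tuple[str, List[str]]]:
    """Step 3 (data side): keep only the trials the success metric accepts."""
    rng = random.Random(seed)
    dataset = []
    for task in propose_tasks(scene):
        for _ in range(trials_per_task):
            after, actions = attempt(scene, task, rng)
            if task.is_success(scene, after):
                dataset.append((task.description, actions))
    return dataset


if __name__ == "__main__":
    demos = collect_demonstrations({"cube": "table", "ball": "table", "cup": "bin"})
    print(f"kept {len(demos)} successful trials")
```

The retained (instruction, action sequence) pairs are the kind of data a task-conditioned policy could be trained on; the training step itself is omitted here because the summary does not specify it.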
### Experimental Validation

The paper validates the proposed framework through experiments in a tabletop manipulation environment, evaluating task diversity, the feasibility of the proposed tasks, the accuracy of task completion inference, and the effectiveness and generalization of the multi-task policies. The results show that the BBSEA framework can generate diverse, human-understandable tasks and that both task proposal and success inference are reliable and accurate.

### Contributions

1. **Autonomous Learning Framework**: Proposes a framework that combines foundation models with embodied agents to achieve autonomous learning in unknown environments.
2. **Task Proposal Module**: Develops an efficient scene understanding module that automatically proposes tasks compatible with the scene and establishes evaluation criteria for task completion.
3. **Policy Learning Validation**: Validates the adaptability of the learned policies to new tasks and configurations in zero-shot and few-shot settings.