E2CL: Exploration-based Error Correction Learning for Embodied Agents

Hanlin Wang, Chak Tou Leong, Jian Wang, Wenjie Li
2024-09-29
Abstract: Language models are exhibiting increasing capability in knowledge utilization and reasoning. However, when applied as agents in embodied environments, they often suffer from misalignment between their intrinsic knowledge and environmental knowledge, leading to infeasible actions. Traditional environment alignment methods, such as supervised learning on expert trajectories and reinforcement learning, encounter limitations in covering environmental knowledge and achieving efficient convergence, respectively. Inspired by human learning, we propose Exploration-based Error Correction Learning (E2CL), a novel framework that leverages exploration-induced errors and environmental feedback to enhance environment alignment for embodied agents. E2CL incorporates teacher-guided and teacher-free explorations to gather environmental feedback and correct erroneous actions. The agent learns to provide feedback and self-correct, thereby enhancing its adaptability to target environments. Extensive experiments in the VirtualHome environment demonstrate that E2CL-trained agents outperform those trained by baseline methods and exhibit superior self-correction capabilities.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses a mismatch between the intrinsic knowledge of language model (LM) agents and the knowledge required by embodied environments, a mismatch that leads the agents to produce infeasible actions. Because these agents are trained mainly on general corpora, they lack an understanding of the environment's specific physical constraints. For example, an agent that is already holding two objects should not try to grab a third, yet a language-model-based agent may still generate that action. This misalignment limits how such agents can be applied in real-world environments. To address it, the paper proposes Exploration-based Error Correction Learning (E2CL), a framework that improves environment alignment by leveraging the errors produced during exploration together with environmental feedback. E2CL combines teacher-guided and teacher-free exploration to collect environmental feedback and correct erroneous actions. The agent thereby learns both to provide feedback and to self-correct, which improves its adaptability to the target environment. Experiments in the VirtualHome environment show that agents trained with E2CL outperform the baseline methods on multiple evaluation metrics and exhibit stronger self-correction capabilities.
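The two-object example can be made concrete with a minimal Python sketch of a feasibility check that returns environmental feedback. It is a toy illustration only, not the VirtualHome API or the paper's implementation: the names ToyAgentState, check_grab, step, and MAX_HELD are hypothetical, and the holding limit of two is taken from the example in the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumption from the example in the text: an agent can hold at most two objects.
MAX_HELD = 2


@dataclass
class ToyAgentState:
    """Hypothetical agent state: the objects currently held."""
    held: List[str] = field(default_factory=list)


def check_grab(state: ToyAgentState, obj: str) -> Tuple[bool, str]:
    """Return (feasible, feedback) for a hypothetical 'grab' action."""
    if obj in state.held:
        return False, f"Cannot grab {obj}: already holding it."
    if len(state.held) >= MAX_HELD:
        return False, f"Cannot grab {obj}: already holding {MAX_HELD} objects."
    return True, f"Grabbed {obj}."


def step(state: ToyAgentState, obj: str) -> Tuple[bool, str]:
    """Apply the grab only if it is feasible; always return the feedback."""
    feasible, feedback = check_grab(state, obj)
    if feasible:
        state.held.append(obj)
    return feasible, feedback


if __name__ == "__main__":
    state = ToyAgentState()
    for obj in ["cup", "plate", "fork"]:
        print(step(state, obj))
```

In this sketch the third grab is rejected and the state is left unchanged. A feedback string of this kind is the sort of signal that, per the summary above, is collected during exploration and used to correct erroneous actions.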