LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments

Zixia Jia, Mengmeng Wang, Baichen Tong, Song-Chun Zhu, Zilong Zheng
2024-06-24
Abstract: Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs. However, it remains unclear how well LLMs can function as few-shot or zero-shot embodied agents in dynamic interactive environments. To address this gap, we introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds. Compared with previous LLM-based testbeds, LangSuitE (i) offers adaptability to diverse environments without multiple simulation engines, (ii) evaluates agents' capacity to develop "internalized world knowledge" with embodied observations, and (iii) allows easy customization of communication and action strategies. To address the embodiment challenge, we devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information. Comprehensive benchmark results illustrate challenges and insights of embodied planning. LangSuitE represents a significant step toward building embodied generalists in the context of language models.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper asks how well large language models (LLMs) perform in dynamic interactive environments, and in particular whether they can act as embodied agents and complete complex tasks when the world is presented as text, so that multimodal perception errors do not confound the evaluation. The paper introduces LangSuitE, a simulation-free testbed for evaluating LLMs on a range of embodied tasks in textual embodied worlds. LangSuitE has the following features:

1. **Adaptability**: It adapts to diverse environments without relying on multiple simulation engines.
2. **Internal Knowledge Formation**: It evaluates agents' capacity to develop "internalized world knowledge" from embodied observations.
3. **Customization**: It allows easy customization of communication and action strategies.

To address the embodiment challenge, the researchers propose a new chain-of-thought (CoT) schema, EmMem, which summarizes the embodied state with respect to history information. The benchmark results illustrate the challenges of embodied planning and offer insights into it, and the authors present LangSuitE as a significant step toward building embodied generalist agents.