Asking Before Action: Gather Information in Embodied Decision Making with Language Models

Xiaoyu Chen, Shenao Zhang, Pushi Zhang, Li Zhao, Jianyu Chen
DOI: https://doi.org/10.48550/arxiv.2305.15695
2023-01-01
Abstract: With strong reasoning capabilities and a broad understanding of the world, Large Language Models (LLMs) have demonstrated immense potential in building versatile embodied decision-making agents capable of executing a wide array of tasks. Nevertheless, when deployed in unfamiliar environments, we show that LLM agents encounter challenges in efficiently gathering essential information, leading to suboptimal performance. Conversely, humans often seek additional information from their peers prior to taking action, harnessing external knowledge to avoid unnecessary trial and error. Drawing inspiration from this behavior, we propose Asking Before Acting (ABA), a method that empowers the agent to proactively inquire with external sources for pertinent information using natural language during its interactions within the environment. In this way, the agent is able to enhance its efficiency and performance by circumventing potentially laborious steps and combating the difficulties associated with exploration in unfamiliar environments and vagueness of the instructions. We conduct extensive experiments involving a spectrum of environments including text-based household everyday tasks, robot arm manipulation tasks, and real-world open-domain image-based embodied tasks. The experiments involve various models from Vicuna to GPT-4. The results demonstrate that, even with modest prompt modifications, ABA exhibits substantial advantages in both performance and efficiency over baseline LLM agents. Further finetuning ABA with reformulated metadata (ABA-FT) facilitates learning the rationale for asking and allows for additional enhancements, especially in tasks that baselines struggle to solve.
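The core idea in the abstract, asking an external source before acting instead of exploring by trial and error, can be illustrated with a toy sketch. This is not the paper's implementation: there is no LLM here, and `run_episode`, the `ask` callback, the location names, and the step-counting rule are all hypothetical stand-ins chosen only to show why one question can replace several exploratory actions.

```python
from typing import Callable, List, Optional


def run_episode(
    locations: List[str],
    target_location: str,
    ask: Optional[Callable[[str], str]] = None,
) -> int:
    """Return the number of environment steps used to find the target.

    If `ask` is given, the agent first poses one natural-language question
    and searches any location named in the answer before the others.
    Without it, the agent checks locations in the given order.
    """
    order = list(locations)
    if ask is not None:
        answer = ask("Where is the object I need?")
        # Move locations mentioned in the answer to the front of the search order.
        hinted = [loc for loc in order if loc in answer]
        order = hinted + [loc for loc in order if loc not in hinted]

    steps = 0
    for loc in order:
        steps += 1  # one environment action: go to `loc` and look
        if loc == target_location:
            return steps
    raise ValueError("target is not in any known location")


# Hypothetical household setup (assumption, not taken from the paper).
locations = ["drawer", "shelf", "cabinet", "countertop"]


def human(question: str) -> str:
    """Hypothetical external information source."""
    return "It is on the countertop."


baseline_steps = run_episode(locations, "countertop")         # trial and error
aba_steps = run_episode(locations, "countertop", ask=human)   # ask first
print(baseline_steps, aba_steps)
```

In this toy setup the asking agent reaches the target in fewer environment steps because the answer reorders its search. In the paper, the decision of when and what to ask is made by the language model itself, which this sketch does not attempt to model.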