AdaPlanner: Adaptive Planning from Feedback with Language Models

Haotian Sun, Yuchen Zhuang, Lingkai Kong, Bo Dai, Chao Zhang
DOI: https://doi.org/10.48550/arXiv.2305.16653
2023-05-26
Abstract: Large language models (LLMs) have recently demonstrated potential to act as autonomous agents for sequential decision-making tasks. However, most existing methods either take actions greedily without planning or rely on static plans that are not adaptable to environmental feedback. Consequently, the sequential decision-making performance of LLM agents degenerates as problem complexity and plan horizons increase. We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback. In AdaPlanner, the LLM agent adaptively refines its plan from feedback with both in-plan and out-of-plan refinement strategies. To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities. Furthermore, we propose a skill discovery mechanism that leverages successful plans as few-shot exemplars, enabling the agent to plan and refine with fewer task demonstrations. Our experiments in the ALFWorld and MiniWoB++ environments demonstrate that AdaPlanner outperforms state-of-the-art baselines by 3.73% and 4.11% while utilizing 2x and 600x fewer samples, respectively.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the limitations of existing autonomous agents based on large language models (LLMs) in sequential decision-making tasks. Specifically:

1. **Lack of Adaptability**: Existing methods either act greedily (taking actions directly without planning) or rely on static plans that cannot be adjusted in response to environmental feedback. As a result, the sequential decision-making performance of LLM agents declines as problem complexity and planning horizon increase.
2. **Insufficient Utilization of Feedback**: Although some methods use environmental feedback to adjust decisions, they usually update only the action currently being executed rather than the entire plan. Such methods can adapt to environmental changes in the short term but may hurt performance in the long run.
3. **Inaccurate Initial Planning**: Even if the locally optimal action is taken at each step, errors in the initial plan may still lead to task failure or non-completion.

To solve these problems, the authors propose a closed-loop method, AdaPlanner, which allows an LLM agent to adaptively refine its self-generated plan in response to environmental feedback. It does so through the following mechanisms:

- **Planning and Refinement**: The LLM agent in AdaPlanner not only generates the initial plan but also adjusts it dynamically during execution according to environmental feedback. Two refinement strategies are used: **in-plan refinement** (handling expected feedback) and **out-of-plan refinement** (handling unexpected feedback).
- **Code-Style Prompt Structure**: To reduce hallucination (the model generating untrue or irrelevant information), AdaPlanner adopts a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
- **Skill Discovery Mechanism**: AdaPlanner also introduces a skill discovery mechanism that uses successful plans as few-shot exemplars, enabling the agent to plan and refine with fewer task demonstrations.

Experimental results show that AdaPlanner outperforms existing state-of-the-art baselines in the ALFWorld and MiniWoB++ environments by 3.73% and 4.11% respectively, while using 2x and 600x fewer samples respectively. These results demonstrate the effectiveness and efficiency of AdaPlanner in using environmental feedback for plan refinement.
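The closed loop described above can be illustrated with a toy sketch. This is not the authors' implementation: `ToyEnv`, `stub_planner`, `run_episode`, and the container names are all hypothetical, and the stub planner is a hard-coded function standing in for an LLM call. The sketch only shows how the three ideas could fit together: expected feedback fills in details during execution (in-plan), a failed goal check triggers a revised plan (out-of-plan), and a successful trajectory is stored for later reuse (skill discovery).

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical text environment: a key is hidden in one container."""
    key_location: str
    holding_key: bool = False

    def step(self, action: str) -> str:
        verb, _, target = action.partition(" ")
        if verb == "open":
            if target == self.key_location:
                return f"You see a key in the {target}."
            return f"The {target} is empty."
        if verb == "take" and target == self.key_location:
            self.holding_key = True
            return "You pick up the key."
        return "Nothing happens."


ALL_CONTAINERS = ["drawer", "shelf", "cabinet", "safe"]


def stub_planner(feedback, tried):
    """Stand-in for the LLM planner (assumption: a real agent would prompt an LLM)."""
    if feedback is None:
        return ["drawer", "shelf"]  # initial plan, which may be wrong
    # Revised plan: search everything that has not been tried yet.
    return [c for c in ALL_CONTAINERS if c not in tried]


def run_episode(env, skill_memory, max_revisions=3):
    feedback, tried = None, []
    for _ in range(max_revisions + 1):
        plan = stub_planner(feedback, tried)
        for container in plan:
            observation = env.step(f"open {container}")
            tried.append(container)
            # In-plan refinement: expected feedback decides the next step.
            if "key" in observation:
                env.step(f"take {container}")
                break
        if env.holding_key:
            # Skill discovery (toy version): keep the successful trajectory.
            skill_memory.append(list(tried))
            return True
        # Out-of-plan refinement: the goal check failed, so report it and replan.
        feedback = f"Plan finished without the key; searched {tried}."
    return False
```

Calling `run_episode(ToyEnv("cabinet"), [])` exercises both paths: the initial plan searches the drawer and shelf and fails the goal check, and the revised plan then finds the key in the cabinet.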