Abstract: Planning and acting to solve `real' tasks using large language models (LLMs) in interactive environments has become a new frontier for AI methods. While recent advances have allowed LLMs to interact with online tools, solve robotics tasks, and much more, long-range reasoning tasks remain a problem for LLMs. Existing methods that address this issue are very resource intensive and require additional data or human-crafted rules. Instead, we propose a simple method based on few-shot in-context learning alone to enhance `chain-of-thought' with state-tracking for planning and acting with LLMs. We show that our method establishes the new state-of-the-art on Alfworld for in-context learning methods (\textbf{+14\%} over the previous best few-shot in-context learning method) and performs on par with methods that use additional training data and additional tools such as code-execution. We also demonstrate that our enhanced `chain-of-states' allows the agent both to solve longer-horizon problems and to be more efficient in the number of steps required to solve a task. We show that our method works across a variety of LLMs, both API-based and open source. Finally, we also conduct ablation studies and show that `chain-of-thoughts' helps state-tracking accuracy, while a json-structure harms overall performance. We open-source our code and annotations at \url{https://github.com/ai-nikolai/StateAct}.

What problem does this paper attempt to address?

This paper addresses the difficulty that large language models (LLMs) have with long-horizon tasks. The authors point out that although LLMs have made remarkable progress in interacting with online tools, solving robotics tasks, and similar settings, they still struggle with tasks that require long-range reasoning. Existing methods usually require additional data or hand-written rules, which makes them resource-intensive and difficult to scale.

To solve these problems, the authors propose **StateAct**, a method based on few-shot in-context learning alone. It enhances chain-of-thought with a "goal reminder" and "state tracking". The method requires no additional training data or external tools, and it improves LLM performance on long-range reasoning tasks.

#### Main contributions:

1. **Introduced "goal reminder" and "state tracking"**: At every reasoning step the model is explicitly reminded of the current goal and tracks its own state (such as location and inventory), which helps it plan and reason over long horizons.
2. **Improved performance on long-range reasoning tasks**: In the Alfworld environment, StateAct achieves a 14% higher success rate than the previous best few-shot in-context learning method and performs on par with methods that use additional training data and additional tools such as code execution.
3. **Reduced the number of steps required to complete a task**: Experiments show that StateAct can solve longer-horizon tasks while also needing fewer steps per task, which improves efficiency.

#### Specific problem descriptions:

- **Challenges in long-range reasoning tasks**: Existing LLMs perform poorly on long-range reasoning tasks, especially without additional resources.
- **Resource-intensive solutions**: Existing solutions usually require additional data or hand-written rules, which makes them difficult to apply widely.
- **Improving efficiency and accuracy**: The open question is how to improve the performance and efficiency of LLMs on long-range reasoning tasks without adding resources.

With these improvements, StateAct provides a simple and effective method that enables LLMs to perform long-range reasoning tasks better in complex interactive environments.
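As a rough illustration of the "goal reminder" and "state tracking" idea described above, the following Python sketch renders one agent turn in which the goal, location, and inventory are restated before the thought and action. The field labels, the `AgentState` class, and the `render_step` and `parse_step` helpers are hypothetical names chosen for this example; they are not taken from the StateAct code, and the actual prompt format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical container for what the agent restates at every step."""
    goal: str
    location: str = "unknown"
    inventory: list[str] = field(default_factory=list)


def render_step(state: AgentState, thought: str, action: str) -> str:
    """Format one agent turn: goal reminder, tracked state, thought, action."""
    inventory = ", ".join(state.inventory) if state.inventory else "nothing"
    return "\n".join([
        f"goal: {state.goal}",                 # goal reminder
        f"current location: {state.location}",  # tracked state
        f"current inventory: {inventory}",      # tracked state
        f"thought: {thought}",                  # chain-of-thought
        f"action: {action}",                    # command sent to the environment
    ])


def parse_step(text: str) -> dict[str, str]:
    """Split a rendered turn back into its labelled fields."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


# Example turn in an Alfworld-style household task (illustrative values only).
state = AgentState(
    goal="put a clean mug on the desk",
    location="countertop 1",
    inventory=["mug 1"],
)
step = render_step(
    state,
    thought="The mug needs cleaning, so I should go to the sink.",
    action="go to sinkbasin 1",
)
print(step)
```

In a few-shot setting, turns of this shape would appear in the in-context examples so that the model continues the pattern; only the parsed `action` field would be passed on to the environment.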

StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models

ReAct: Synergizing Reasoning and Acting in Language Models

LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning

Statler: State-Maintaining Language Models for Embodied Reasoning

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

Sequential Planning in Large Partially Observable Environments guided by LLMs

Improving Planning with Large Language Models: A Modular Agentic Architecture

Reasoning with Language Model is Planning with World Model

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs

ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Learning to Act with Affordance-Aware Multimodal Neural SLAM

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents

Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought

StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows

Theory of Mind for Multi-Agent Collaboration via Large Language Models

Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors

Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation