ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs

Minh Pham Dinh, Munira Syed, Michael G Yankoski, Trenton W. Ford
2024-10-12
Abstract: Planning and performing interactive tasks, such as conducting experiments to determine the melting point of an unknown substance, is straightforward for humans but poses significant challenges for autonomous agents. We introduce ReasonPlanner, a novel generalist agent designed for reflective thinking, planning, and interactive reasoning. This agent leverages LLMs to plan hypothetical trajectories by building a World Model based on a Temporal Knowledge Graph. The agent interacts with the environment using a natural language actor-critic module, where the actor translates the imagined trajectory into a sequence of actionable steps, and the critic determines if replanning is necessary. ReasonPlanner significantly outperforms previous state-of-the-art prompting-based methods on the ScienceWorld benchmark by more than 1.8 times, while being more sample-efficient and interpretable. It relies solely on frozen weights, thus requiring no gradient updates. ReasonPlanner can be deployed and utilized without specialized knowledge of Machine Learning, making it accessible to a wide range of users.
Computation and Language, Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the challenges of planning and executing complex interactive tasks in dynamic environments. Specifically, it proposes a novel general-purpose agent named ReasonPlanner, which aims to enhance autonomous planning capabilities, especially when conducting experimental tasks in simulated environments, such as determining the melting point of an unknown substance.

#### Main problems

1. **Complex task planning**: For humans, planning and executing complex interactive tasks (such as experimental operations) is relatively simple, but it is very challenging for autonomous agents. Traditional reinforcement learning (RL) methods perform poorly when dealing with large-scale, non-discrete action spaces, especially in text-based environments where the action space may grow polynomially.
2. **Dynamic environmental changes**: In dynamic environments, agents need to be able to "foresee" future scenarios and replan according to environmental changes. Existing RL and large language model (LLM) methods have limitations in this regard.
3. **Sample efficiency and interpretability**: Existing methods usually require a large amount of sample data for training and lack interpretability, making them difficult to understand and debug.

#### Solutions

1. **Temporal Knowledge Graph (TKG)**: ReasonPlanner uses a Temporal Knowledge Graph to store and update environmental information as its World Model (WM). This enables the agent to construct an internal representation of the environment and use it to predict future scenarios.
2. **Natural-language actor-critic module**: The agent interacts with the environment through a natural-language actor-critic module. The actor converts the imagined trajectory into a series of executable steps, while the critic evaluates the difference between the actual and predicted results and decides whether replanning is required.
3. **No weight update required**: ReasonPlanner relies on a pre-trained LLM with frozen weights and does not require gradient updates, thereby improving sample efficiency and interpretability.

#### Experimental results

ReasonPlanner significantly outperforms existing prompt-based methods on the ScienceWorld benchmark, with an average score of over 65 (out of 100) and full marks on multiple tasks. ReasonPlanner also performs well in terms of sample efficiency and interpretability, making it easier to deploy and use.

### Summary

By combining a Temporal Knowledge Graph with large language models, ReasonPlanner addresses the challenges of planning and executing complex tasks in dynamic environments while improving sample efficiency and interpretability, making it a promising autonomous planning solution.
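
To make the Temporal Knowledge Graph and critic ideas above more concrete, here is a minimal sketch in Python. It is not the paper's implementation. The names `Fact`, `TemporalKG`, `assert_fact`, `query`, and `needs_replan` are hypothetical, and the critic is reduced to a plain equality check, whereas ReasonPlanner's critic is described as a natural-language (LLM-based) module. The sketch only shows how time-stamped facts can serve as a world model and how a mismatch between prediction and observation could trigger replanning.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Fact:
    """A (subject, relation, object) triple with a validity interval."""
    subject: str
    relation: str
    obj: str
    start: int                  # time step at which the fact became true
    end: Optional[int] = None   # time step at which it stopped holding (None = still true)

    def holds_at(self, t: int) -> bool:
        return self.start <= t and (self.end is None or t < self.end)


class TemporalKG:
    """Hypothetical world model: a list of time-stamped facts."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, subject: str, relation: str, obj: str, t: int) -> None:
        # Close any open fact for the same (subject, relation) that is now outdated.
        for fact in self.facts:
            if (fact.subject, fact.relation) == (subject, relation) and fact.end is None:
                if fact.obj == obj:
                    return  # already known, nothing changes
                fact.end = t
        self.facts.append(Fact(subject, relation, obj, start=t))

    def query(self, subject: str, relation: str, t: int) -> Optional[str]:
        # Return the object that held for (subject, relation) at time t, if any.
        for fact in self.facts:
            if (fact.subject, fact.relation) == (subject, relation) and fact.holds_at(t):
                return fact.obj
        return None


def needs_replan(predicted: Dict[Tuple[str, str], str], kg: TemporalKG, t: int) -> bool:
    # Stand-in critic (assumption): replan if any predicted fact disagrees
    # with what the world model recorded at time t.
    return any(kg.query(s, r, t) != obj for (s, r), obj in predicted.items())


# Observations from a melting-point style task, recorded as they arrive.
kg = TemporalKG()
kg.assert_fact("substance", "state", "solid", t=0)
kg.assert_fact("substance", "location", "stove", t=1)
kg.assert_fact("substance", "state", "liquid", t=3)

# The imagined trajectory expected the substance to be liquid.
predicted = {("substance", "state"): "liquid"}
replan_at_2 = needs_replan(predicted, kg, t=2)  # prediction not yet met
replan_at_3 = needs_replan(predicted, kg, t=3)  # prediction matches observation
```

Because outdated facts are closed with an end time instead of being deleted, the graph can answer questions about earlier time steps as well as the current one. That property is what makes a temporal graph usable as a record of how the environment changed.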