PRAct: Optimizing Principled Reasoning and Acting of LLM Agent

Zhiwei Liu, Weiran Yao, Jianguo Zhang, Rithesh Murthy, Liangwei Yang, Zuxin Liu, Tian Lan, Ming Zhu, Juntao Tan, Shirley Kokane, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong
2024-10-24
Abstract: We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We develop the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, two RPO methods, RPO-Traj and RPO-Batch, are introduced to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, effectively learns and applies action principles to enhance performance.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to improve the reasoning and acting capabilities of large language model (LLM) agents, in particular their decision-making when they face contradictory observations during multi-step tasks. The authors propose a new framework, Principled Reasoning and Acting (PRAct), which guides an agent's behavior with action principles and refines those principles through the Reflective Principle Optimization (RPO) framework to improve performance across tasks.

### Problem Background

Existing LLM agents can take actions and reason over multiple steps, but their decision-making can degrade on complex tasks, especially when observations are contradictory or inconsistent. PRAct addresses this by attaching action principles to the agent's behavior, and RPO updates those principles after execution.

### Main Contributions

1. **PRAct Framework**: Described as the first LLM-agent framework that takes action principles into account. Each action is associated with specific conditions and guidelines, so that the agent can better understand and execute tasks.
2. **RPO Optimization Methods**: Two optimization methods, RPO-Traj and RPO-Batch, are proposed to suit different settings. RPO-Traj optimizes after each trajectory, while RPO-Batch optimizes once after aggregating all reflection results.

### Experimental Verification

The authors ran experiments in four environments (WebShop, Academia, Movie, and Weather) to verify the effectiveness of the PRAct framework. The results show that the PRAct agent performs well across these tasks; in the WebShop environment in particular, the PRAct-B method outperforms the other methods.
### Summary

By introducing action principles and a mechanism for optimizing them, the PRAct framework improves the decision-making and task execution of LLM agents on complex tasks, according to the reported experiments. The RPO framework provides a way to keep refining these principles after execution, which further improves agent performance.
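
The difference between RPO-Traj and RPO-Batch can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `reflect` and `optimize` callables, and the toy stand-ins are all hypothetical, and the sketch assumes trajectories have already been collected, whereas the real agent would generate them by acting in an environment. In the paper, the reflector and optimizer roles are played by LLM calls.

```python
from typing import Callable, List

# Hypothetical signatures (assumptions, not from the paper):
# a reflector maps (principles, trajectory) to a textual critique,
# an optimizer maps (principles, critiques) to updated principles.
Reflector = Callable[[str, str], str]
Optimizer = Callable[[str, List[str]], str]


def rpo_traj(principles: str, trajectories: List[str],
             reflect: Reflector, optimize: Optimizer) -> str:
    """Trajectory-wise variant: update the principles after every trajectory."""
    for traj in trajectories:
        critique = reflect(principles, traj)
        principles = optimize(principles, [critique])
    return principles


def rpo_batch(principles: str, trajectories: List[str],
              reflect: Reflector, optimize: Optimizer) -> str:
    """Batch variant: gather all critiques first, then update once."""
    critiques = [reflect(principles, traj) for traj in trajectories]
    return optimize(principles, critiques)


# Toy stand-ins so the sketch runs without an LLM.
def toy_reflect(principles: str, trajectory: str) -> str:
    return f"critique({trajectory})"


def toy_optimize(principles: str, critiques: List[str]) -> str:
    return principles + " | " + "+".join(critiques)


if __name__ == "__main__":
    trajs = ["t1", "t2"]
    print(rpo_traj("P0", trajs, toy_reflect, toy_optimize))
    print(rpo_batch("P0", trajs, toy_reflect, toy_optimize))
```

In the trajectory-wise loop each later reflection sees principles that earlier trajectories have already changed, while the batch loop critiques every trajectory against the same starting principles and applies a single update.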