Abstract: We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We develop the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, two RPO methods, RPO-Traj and RPO-Batch, are introduced to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, effectively learns and applies action principles to enhance performance.

What problem does this paper attempt to address?

This paper addresses how to improve the reasoning and acting of large language model (LLM) agents, particularly their decision-making when they encounter contradictory observations during multi-step tasks. The authors propose a new framework, Principled Reasoning and Acting (PRAct), which guides agent behavior with action principles and continuously refines those principles through the Reflective Principle Optimization (RPO) framework to improve performance across tasks.

### Problem Background

Existing LLM agents can take actions and reason over multiple steps, but their decision-making can degrade on complex tasks, especially when observations are contradictory or inconsistent. PRAct addresses this by giving the agent explicit action principles and refining them with RPO.

### Main Contributions

1. **PRAct Framework**: Described as the first LLM-agent framework that takes action principles into account. Each action is associated with specific conditions and guidelines, which helps the agent understand and execute tasks.
2. **RPO Optimization Methods**: Two optimization methods, RPO-Traj and RPO-Batch, are proposed for different scenarios. RPO-Traj optimizes after each trajectory, while RPO-Batch optimizes once after aggregating all reflection results.

### Experimental Verification

The authors ran experiments in four environments (WebShop, Academia, Movie, and Weather) to verify the effectiveness of the PRAct framework. The results show that the PRAct agent performs well across multiple tasks, especially in the WebShop environment, where the PRAct-B method outperforms the other methods.

### Summary

By introducing action principles and a mechanism for optimizing them, the PRAct framework improves the decision-making and task execution of LLM agents on complex tasks. The RPO framework provides a way to keep refining these principles, which further improves agent performance.
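The difference between RPO-Traj and RPO-Batch is easiest to see as two update loops. The following is a minimal sketch based only on the description above, not the authors' implementation. The names `Trajectory`, `Reflector`, `Optimizer`, `rpo_traj`, `rpo_batch`, and the toy stubs are all hypothetical. In a real system the reflector and optimizer would wrap LLM calls. A `reward` of `None` stands in for the Self-RPO setting, where no environmental reward is available.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    """One executed episode: the steps taken and an optional reward."""
    steps: List[str]
    reward: Optional[float] = None  # None models Self-RPO (no external reward)


# Hypothetical interfaces (assumptions, not from the paper).
Reflector = Callable[[str, Trajectory], str]  # (principles, trajectory) -> critique
Optimizer = Callable[[str, List[str]], str]   # (principles, critiques) -> new principles


def rpo_traj(principles: str, trajectories: List[Trajectory],
             reflect: Reflector, optimize: Optimizer) -> str:
    """Update the principles after every single trajectory."""
    for traj in trajectories:
        critique = reflect(principles, traj)
        principles = optimize(principles, [critique])
    return principles


def rpo_batch(principles: str, trajectories: List[Trajectory],
              reflect: Reflector, optimize: Optimizer) -> str:
    """Collect critiques for all trajectories, then update once."""
    critiques = [reflect(principles, traj) for traj in trajectories]
    return optimize(principles, critiques)


def toy_reflect(principles: str, traj: Trajectory) -> str:
    """Stand-in reflector: reports the last step and the reward signal."""
    signal = "no reward" if traj.reward is None else f"reward={traj.reward}"
    return f"after '{traj.steps[-1]}' ({signal})"


def make_toy_optimizer(call_log: List[int]) -> Optimizer:
    """Stand-in optimizer that appends critiques and logs each call's size."""
    def optimize(principles: str, critiques: List[str]) -> str:
        call_log.append(len(critiques))
        return principles + "".join(f"\n- revise {c}" for c in critiques)
    return optimize


if __name__ == "__main__":
    demo = [Trajectory(["search", "click"], 0.4),
            Trajectory(["search", "buy"], 1.0)]
    log: List[int] = []
    print(rpo_traj("Initial principles", demo, toy_reflect, make_toy_optimizer(log)))
    print("optimizer calls:", log)
```

In this sketch `rpo_traj` calls the optimizer once per trajectory, so each later reflection sees principles that have already been revised, while `rpo_batch` reflects on every trajectory against the same principles and calls the optimizer a single time.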

PRACT: Optimizing Principled Reasoning and Acting of LLM Agent

Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

PRIMER: Perception-Aware Robust Learning-based Multiagent Trajectory Planner

Devil's Advocate: Anticipatory Reflection for LLM Agents

ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy

Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Reflective Policy Optimization

From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization

ReAct Meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training

Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

Efficient Reinforcement Learning via Decoupling Exploration and Utilization

Dynamic Planning for LLM-based Graphical User Interface Automation

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

Multi-Agent Cooperation Via Reasoning About The Behavior Of Others

PreAct: Prediction Enhances Agent's Planning Ability

STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making