Abstract: Answering complex natural language questions often necessitates multi-step reasoning and integrating external information. Several systems have combined knowledge retrieval with a large language model (LLM) to answer such questions. These systems, however, suffer from various failure cases, and we cannot directly train them end-to-end to fix such failures, as interaction with external knowledge is non-differentiable. To address these deficiencies, we define a ReAct-style LLM agent with the ability to reason and act upon external knowledge. We further refine the agent through a ReST-like method that iteratively trains on previous trajectories, employing growing-batch reinforcement learning with AI feedback for continuous self-improvement and self-distillation. Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model that achieves comparable performance on challenging compositional question-answering benchmarks with two orders of magnitude fewer parameters.

What problem does this paper attempt to address?

This paper addresses the following problem: how to improve the performance and robustness of multi-step reasoning large language model (LLM) agents when answering complex natural language questions. Specifically, the authors focus on:

1. **Multi-step Reasoning and Integration of External Knowledge**: Many complex questions can only be answered through multi-step reasoning combined with external information. Existing systems that pair knowledge retrieval with large language models suffer from various failure cases, and because interaction with external knowledge is non-differentiable, they cannot be trained end-to-end to fix those failures.
2. **Lack of High-Quality Multi-step Labeled Data**: For systems based on process supervision, high-quality multi-step labeled data is difficult and expensive to obtain, which limits how far the model can be improved.

To address these problems, the authors propose a method that combines ReAct-style reasoning with ReST-style iterative self-training:

- Define a ReAct-style agent with self-critique ability that performs multi-step reasoning and acts upon external knowledge.
- Apply a ReST-style method that iteratively trains on previous trajectories, using growing-batch reinforcement learning with AI feedback, for continuous self-improvement and self-distillation.
- Starting from a prompted large model, after only two iterations of the algorithm, produce a fine-tuned small model with two orders of magnitude fewer parameters whose performance on challenging compositional question-answering benchmarks is comparable to that of the large model.

In this way, the authors aim to improve the ability of multi-step reasoning LLM agents to handle complex questions while reducing the dependence on manually labeled data.
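The iterative procedure summarized above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `Trajectory`, `collect`, `select_best`, and `self_improve` are hypothetical, the `score` callback stands in for the AI feedback, and `fine_tune` stands in for whatever training step produces the next model from the selected trajectories.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trajectory:
    """One multi-step attempt at a question (hypothetical structure)."""
    question: str
    steps: List[str]   # interleaved reasoning and action steps
    answer: str


# A policy maps a question to one sampled trajectory (assumption).
Policy = Callable[[str], Trajectory]


def collect(policy: Policy, questions: List[str], samples_per_question: int) -> List[Trajectory]:
    """Grow step: sample several trajectories per question from the current policy."""
    return [policy(q) for q in questions for _ in range(samples_per_question)]


def select_best(
    trajectories: List[Trajectory],
    score: Callable[[Trajectory], float],
    keep_per_question: int = 1,
) -> List[Trajectory]:
    """Improve step: keep the top-scoring trajectories for each question.

    `score` is a stand-in for AI feedback, e.g. a model-based ranker.
    """
    by_question: Dict[str, List[Trajectory]] = {}
    for t in trajectories:
        by_question.setdefault(t.question, []).append(t)
    kept: List[Trajectory] = []
    for group in by_question.values():
        ranked = sorted(group, key=score, reverse=True)
        kept.extend(ranked[:keep_per_question])
    return kept


def self_improve(
    initial_policy: Policy,
    fine_tune: Callable[[List[Trajectory]], Policy],
    score: Callable[[Trajectory], float],
    questions: List[str],
    iterations: int = 2,
    samples_per_question: int = 4,
) -> Policy:
    """Outer loop: alternate trajectory collection, filtering, and fine-tuning."""
    policy = initial_policy
    for _ in range(iterations):
        batch = collect(policy, questions, samples_per_question)
        data = select_best(batch, score)
        # `fine_tune` is a hypothetical callback returning the next policy.
        policy = fine_tune(data)
    return policy
```

In this sketch, self-distillation would correspond to a `fine_tune` callback that trains a smaller model on trajectories produced by a larger one; how the actual system samples, ranks, and trains is not specified in the summary above.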

ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
