WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Zehan Qi, Xiao Liu, Iat Long Iong, Hanyu Lai, Xueqiao Sun, Xinyue Yang, Jiadai Sun, Yu Yang, Shuntian Yao, Tianjie Zhang, Wei Xu, Jie Tang, Yuxiao Dong

DOI: https://doi.org/10.48550/arXiv.2411.02337

2024-11-05

Abstract: Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents: the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and from 6.1% to 43% for GLM-4-9B. These open models significantly surpass the performance of GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%). Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.
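To put the reported numbers side by side, the short Python sketch below computes the percentage-point gain for each open model and its margin over each baseline. All figures are copied from the abstract; the helper names `point_gain` and `margin_over` are illustrative and not from the paper.

```python
# Success rates (%) on WebArena-Lite, copied from the abstract.
before = {"Llama-3.1-8B": 4.8, "GLM-4-9B": 6.1}
after = {"Llama-3.1-8B": 42.4, "GLM-4-9B": 43.0}
baselines = {"GPT-4-Turbo": 17.6, "GPT-4o": 13.9, "AutoWebGLM": 18.2}


def point_gain(model: str) -> float:
    """Percentage-point improvement of a model after WebRL training."""
    return round(after[model] - before[model], 1)


def margin_over(model: str, baseline: str) -> float:
    """Percentage-point lead of a trained model over a baseline."""
    return round(after[model] - baselines[baseline], 1)


if __name__ == "__main__":
    for model in after:
        print(f"{model}: +{point_gain(model)} points")
        for baseline in baselines:
            print(f"  vs {baseline}: +{margin_over(model, baseline)} points")
```

These are differences in percentage points, not relative improvements.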

Computation and Language

What problem does this paper attempt to address?

This paper asks how open-source large language models (LLMs) can be trained into effective web agents, and it answers with WebRL, a self-evolving online curriculum reinforcement learning framework. Specifically, the paper aims to overcome three key challenges:

1. **Scarcity of training tasks**: Unlike offline datasets, online benchmarks such as WebArena usually provide only a limited test set for evaluation, which leaves agents with few tasks to train on in these environments.
2. **Sparsity and cost of feedback signals**: Without task-specific evaluation functions, it is difficult to judge whether an arbitrary web-browsing task was completed successfully. Moreover, tasks in WebArena are long-horizon, requiring about 10 steps on average, so useful signals are sparse during online exploration.
3. **Policy distribution drift in online learning**: Because there is no predefined training set, the agent must learn through online exploration, which inevitably shifts the policy distribution and can cause catastrophic forgetting and performance degradation.

To address these challenges, WebRL generates new tasks through a self-evolving curriculum and combines a robust outcome-supervised reward model (ORM) with adaptive reinforcement learning strategies to ensure consistent improvement. Experimental results show that WebRL significantly improves the success rate of open-source LLMs on WebArena-Lite, surpassing proprietary LLM APIs such as GPT-4-Turbo and GPT-4o as well as previous open-LLM web agents such as AutoWebGLM.
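The three challenges can be made concrete with a toy Python sketch. It is not the WebRL implementation. The agent is replaced by a coin flip and the ORM by a lookup of the true outcome. Every name here (`rollout`, `outcome_reward`, `propose_tasks`, `run_phase`, the `skill` parameter) is a hypothetical stand-in. Keeping successful trajectories in a replay list is one common way to limit forgetting, assumed here for illustration only; the summary says no more than "adaptive reinforcement learning strategies".

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    task: str
    steps: int
    success: bool


def rollout(task: str, skill: float, rng: random.Random, steps: int = 10) -> Trajectory:
    """Toy stand-in for an agent episode: succeeds with probability min(skill, 1)."""
    return Trajectory(task, steps, rng.random() < skill)


def outcome_reward(traj: Trajectory) -> float:
    """Stand-in for an ORM: one binary judgment for the whole trajectory."""
    return 1.0 if traj.success else 0.0


def per_step_rewards(traj: Trajectory) -> List[float]:
    """Sparse feedback: zero at every step except the last one."""
    return [0.0] * (traj.steps - 1) + [outcome_reward(traj)]


def propose_tasks(failed_task: str) -> List[str]:
    """Hypothetical task generator: derive a new task from a failed one."""
    return [f"{failed_task} (variant)"]


def run_phase(tasks: List[str], skill: float, replay: List[Trajectory],
              rng: random.Random) -> List[str]:
    """One curriculum phase: failures seed the next task set, successes are kept."""
    next_tasks: List[str] = []
    for task in tasks:
        traj = rollout(task, skill, rng)
        if traj.success:
            replay.append(traj)  # assumed replay store to revisit past successes
        else:
            next_tasks.extend(propose_tasks(task))
    return next_tasks


if __name__ == "__main__":
    rng = random.Random(0)
    replay: List[Trajectory] = []
    tasks = ["task-a", "task-b", "task-c"]
    for phase in range(3):
        # Fall back to the current tasks if nothing failed in this phase.
        tasks = run_phase(tasks, skill=0.5, replay=replay, rng=rng) or tasks
        print(f"phase {phase}: {len(tasks)} tasks queued, {len(replay)} successes stored")
```

A 10-step trajectory yields a single nonzero reward at most, which is the sparsity described in challenge 2. The task set grows only from failures, mirroring the "new tasks from unsuccessful attempts" idea in the abstract.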
