AutoWebGLM: A Large Language Model-based Web Navigating Agent

Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang

2024-10-12

Abstract: Large language models (LLMs) have fueled many intelligent web agents, but most existing ones perform far from satisfactorily in real-world web navigation tasks due to three factors: (1) the complexity of HTML text data, (2) the versatility of actions on webpages, and (3) task difficulty due to the open-domain nature of the web. In light of these challenges, we develop the open AutoWebGLM based on ChatGLM3-6B. AutoWebGLM can serve as a powerful automated web navigation agent that outperforms GPT-4. Inspired by human browsing patterns, we first design an HTML simplification algorithm to represent webpages succinctly while preserving vital information. We then employ a hybrid human-AI method to build web browsing data for curriculum training. Finally, we bootstrap the model by reinforcement learning and rejection sampling to further facilitate webpage comprehension, browser operations, and efficient task decomposition by itself. For comprehensive evaluation, we establish a bilingual benchmark, AutoWebBench, for real-world web navigation tasks. We evaluate AutoWebGLM across diverse web navigation benchmarks, demonstrating its potential to tackle challenging tasks in real environments. Related code, model, and data are released at https://github.com/THUDM/AutoWebGLM.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the poor performance of existing large language model (LLM)-based web agents in real-world web navigation tasks. Specifically, these agents fall short on three challenges:

1. **Complexity of HTML Text Data**: Web pages contain large amounts of lengthy, structurally complex HTML, making it difficult for LLMs to understand and manipulate web content effectively.
2. **Diversity of Actions on Web Pages**: Web pages support a wide variety of interactive actions, including clicking, scrolling, and text input, which existing agents struggle to cover comprehensively.
3. **Difficulty of Open-Domain Tasks**: The openness and diversity of the internet make task completion harder, and existing agents lack the ability to reason correctly and check their own work in open-domain environments.

To address these challenges, the authors developed AutoWebGLM, an automated web navigation agent based on ChatGLM3-6B. By designing an HTML simplification algorithm, constructing a training dataset with a hybrid human-AI method, and applying reinforcement learning and rejection sampling fine-tuning, the authors report that AutoWebGLM performs strongly across a range of web navigation tasks, even surpassing GPT-4. The authors also created a bilingual benchmark, AutoWebBench, to evaluate the agent's performance on real-world web navigation tasks.
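To make the idea of HTML simplification concrete, here is a minimal sketch in Python. It is not the paper's algorithm, which this summary does not describe in detail. The tag sets, the kept attributes, the sequential `id` labels, and the names `Simplifier` and `simplify_html` are all assumptions chosen for illustration. The sketch drops script and style content, keeps visible text, and keeps only interactive elements with a handful of attributes.

```python
from html.parser import HTMLParser

# Hypothetical choices for illustration only; not the rules used by AutoWebGLM.
SKIP_TAGS = {"script", "style", "noscript"}
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
KEEP_ATTRS = {"href", "type", "placeholder", "value", "name"}


class Simplifier(HTMLParser):
    """Collects visible text and interactive elements, discarding the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # pieces of the simplified page, in document order
        self.skip_depth = 0  # > 0 while inside a script/style/noscript element
        self.next_id = 0     # assumed labeling scheme so an agent can refer to elements

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in INTERACTIVE_TAGS:
            # Keep only a small set of attributes that describe the element.
            kept = "".join(f' {k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v)
            self.parts.append(f"<{tag} id={self.next_id}{kept}>")
            self.next_id += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in INTERACTIVE_TAGS and tag != "input":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self.skip_depth == 0:
            self.parts.append(text)


def simplify_html(html: str) -> str:
    """Return a compact text rendering of the page (illustrative only)."""
    parser = Simplifier()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = (
    "<html><head><style>.a{}</style><script>var x=1;</script></head>"
    '<body><div class="nav"><a href="/home" class="x">Home</a></div>'
    '<input type="text" placeholder="Search"><button>Go</button></body></html>'
)
simplified = simplify_html(page)
print(simplified)
```

The output keeps the link, the input box, and the button together with their visible text, while layout containers, styling attributes, and script content disappear. A real system would need more careful rules, for example for hidden elements or for elements made clickable by event handlers.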

AutoGLM: Autonomous Foundation Agents for GUIs

OpenWebAgent: An Open Toolkit to Enable Web Agents on Large Language Models

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Large Language Model Powered Agents in the Web

Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

Large Language Model-Brained GUI Agents: A Survey

Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents

CogAgent: A Visual Language Model for GUI Agents

AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents

Large Language Models Can Self-Improve At Web Agent Tasks

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Large Language Models Empowered Personalized Web Agents

LASER: LLM Agent with State-Space Exploration for Web Navigation

GPT-4V(ision) is a Generalist Web Agent, if Grounded

WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences

AcawebAgent: A Large Language Model-Powered Assistant for Early Academic Research