WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

Yao Zhang, Zijian Ma, Yunpu Ma, Zhen Han, Yu Wu, Volker Tresp

2024-08-29

Abstract: LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.

Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the difficulties that current web agents based on large language models (LLMs) face when performing complex web tasks, especially tasks that require dynamic interaction. Existing LLM-based web agents often rely on rigid policies designed for specific states and actions, which makes it hard for them to adapt to unseen tasks or to handle uncertainty and incomplete information. To solve these problems, the paper proposes WebPilot, a multi-agent system that uses a dual optimization strategy (global optimization and local optimization) to improve classical Monte Carlo Tree Search (MCTS) so that it copes better with complex web environments.

The main contributions of WebPilot are:

1. An autonomous multi-agent system, WebPilot, that combines global and local MCTS heuristic optimization strategies, giving agents human-like exploration, adaptation, and decision-making capabilities.
2. A hierarchical reflection mechanism, consisting of strategic reflection in global optimization and tactical reflection in local optimization, that enhances adaptive learning and decision-making in changing environments.
3. A new granular bidirectional self-reward mechanism that guides MCTS by integrating action effects with goal-oriented potential, giving more precise evaluations in dynamic and ambiguous environments.
4. State-of-the-art performance on challenging benchmarks such as WebArena, improving the ability of general autonomous agents to handle complex tasks in realistic web environments.
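The two-phase structure described above can be illustrated with a minimal sketch. It is not WebPilot's implementation: the local phase here is a one-level search using the classical UCT selection rule rather than the paper's tailored MCTS, the global phase is a fixed list of subtasks rather than a plan that is continuously refined, and `reward_fn` is a hypothetical stand-in for the self-reward mechanism. All names (`Node`, `uct_score`, `local_search`, `run`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate action with its search statistics."""
    action: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child, parent_visits, c=math.sqrt(2)):
    """Classical UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited actions are always tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent):
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))


def backpropagate(path, reward):
    """Update visit counts and reward sums along the selected path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


def local_search(subtask, actions, reward_fn, iterations=50):
    """Local phase (simplified): search over actions for one subtask."""
    root = Node(action="<root>", children=[Node(a) for a in actions])
    for _ in range(iterations):
        child = select_child(root)
        # reward_fn is a hypothetical evaluator, not the paper's mechanism.
        backpropagate([root, child], reward_fn(subtask, child.action))
    # Return the most-visited action, a common MCTS final choice.
    return max(root.children, key=lambda ch: ch.visits).action


def run(plan, actions, reward_fn):
    """Global phase (simplified): execute a fixed plan subtask by subtask."""
    return [(subtask, local_search(subtask, actions, reward_fn))
            for subtask in plan]
```

Splitting the task into subtasks before searching means each local search only has to rank actions for one narrow goal, which is the intuition behind using the plan to limit the action space that the search must cover.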

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

Tree Search for Language Model Agents

WebArena: A Realistic Web Environment for Building Autonomous Agents

SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement

Multi-UAV Cooperative Search in Multi-Layered Aerial Computing Networks: A Multi-Agent Deep Reinforcement Learning Approach

LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

A Multi-Agent Collaboration Scheme for Energy-Efficient Task Scheduling in a 3D UAV-MEC Space

Realization of Multi-Agent Planning System for Autonomous Spacecraft

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Self-Motivated Multi-Agent Exploration

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

DIMASS: A Delaunay-Inspired, Hybrid Approach to a Team of Agents Search Strategy

Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents