Abstract: Autonomous agents have demonstrated significant potential in automating complex multistep decision-making tasks. However, even state-of-the-art vision-language models (VLMs), such as GPT-4o, still fall short of human-level performance, particularly in intricate web environments and long-horizon tasks. To address these limitations, we present ExACT, an approach that combines test-time search and self-learning to build o1-like models for agentic applications. We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test-time algorithm designed to enhance AI agents' ability to explore the decision space on the fly. R-MCTS extends traditional MCTS by 1) incorporating contrastive reflection, allowing agents to learn from past interactions and dynamically improve their search efficiency; and 2) using multi-agent debate for reliable state evaluation. Next, we introduce Exploratory Learning, a novel learning strategy that teaches agents to search at inference time without relying on any external search algorithm. On the challenging VisualWebArena benchmark, our GPT-4o-based R-MCTS agent achieves a 6% to 30% relative improvement across various tasks compared to the previous state of the art. Additionally, we show that the knowledge and experience gained from test-time search can be effectively transferred back to GPT-4o via fine-tuning. After Exploratory Learning, GPT-4o 1) demonstrates the ability to explore the environment, evaluate a state, and backtrack to viable ones when it detects that the current state cannot lead to success, and 2) matches 87% of R-MCTS's performance while using significantly less compute. Notably, our work demonstrates compute scaling properties at both training time (data collection with R-MCTS) and test time. These results suggest a promising research direction for enhancing VLMs' capabilities in agentic applications via test-time search and self-learning.
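For readers unfamiliar with the traditional MCTS that R-MCTS extends, the following is a minimal sketch of its two core steps, UCT-based selection and value backpropagation. It is a generic textbook baseline, not the R-MCTS algorithm: contrastive reflection and multi-agent debate are not modeled, and the state value is simply a float supplied by the caller. All names (`Node`, `uct_score`, `select_action`, `backpropagate`) and the exploration constant `c = 1.4` are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One search-tree node (hypothetical structure for illustration)."""
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    @property
    def mean_value(self) -> float:
        # Average of all values backed up through this node.
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited actions first
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.mean_value + bonus


def select_action(node: Node, c: float = 1.4) -> str:
    """Pick the action whose child has the highest UCT score."""
    return max(node.children,
               key=lambda a: uct_score(node.children[a], node.visits, c))


def backpropagate(node: Optional[Node], value: float) -> None:
    """Add a state-value estimate to every node on the path to the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent
```

In this sketch the `value` passed to `backpropagate` stands in for whatever state evaluator the search uses; per the abstract, R-MCTS obtains that estimate through multi-agent debate.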

Efficient Searching With MCTS and Imitation Learning: A Case Study in Pommerman

A Fast Evolutionary adaptation for MCTS in Pommerman

Multi-Agent Training for Pommerman: Curriculum Learning and Population-based Self-Play Approach

Large Scale Pursuit-Evasion under Collision Avoidance Using Deep Reinforcement Learning

Know your Enemy: Investigating Monte-Carlo Tree Search with Opponent Models in Pommerman

School of hard knocks: Curriculum analysis for Pommerman with a fixed computational budget

An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search

A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees

Thompson Sampling Based Monte-Carlo Planning in POMDPs

Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions

A Self-Learning Monte Carlo Tree Search Algorithm for Robot Path Planning

A Partially Observable Monte Carlo Planning Algorithm Based on Path Modification

Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment

Pommerman: A Multi-Agent Playground

ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning

Towards a Characterisation of Monte-Carlo Tree Search Performance in Different Games

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

SCRIMP: Scalable Communication for Reinforcement- and Imitation-Learning-Based Multi-Agent Pathfinding

Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Competitive Games