Abstract: Recently, large language models (LLMs) have demonstrated remarkable potential as intelligent agents. However, existing research mainly focuses on enhancing the agent's reasoning or decision-making abilities through well-designed prompt engineering or task-specific fine-tuning, ignoring the procedure of exploration and exploitation. When addressing complex tasks within open-world interactive environments, these methods exhibit limitations. First, the lack of global information about the environment leads to greedy decisions, resulting in sub-optimal solutions. Second, irrelevant information acquired from the environment not only introduces noise but also incurs additional cost. This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to enhance LLM agents in solving open-world interactive tasks. Concretely, WESE decouples the exploration and exploitation processes, employing a cost-effective weak agent to perform exploration tasks for global knowledge. A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, improving the stronger agent's success rate and efficiency on the exploitation task. Our approach is flexible enough to incorporate diverse tasks, and obtains significant improvements in both success rates and efficiency across four interactive benchmarks.

What problem does this paper attempt to address?

The paper aims to address two main issues faced by large language models (LLMs) acting as intelligent agents in open-world interaction tasks:

1. **Lack of global information leading to suboptimal decisions**: Because the agent starts without an overall understanding of the environment, it may make suboptimal decisions, such as getting stuck in loops or taking inefficient paths when searching for specific items.
2. **A large amount of irrelevant information in the acquired data**: The information collected during exploration often contains many details that are not directly related to the current task. This irrelevant information not only interferes with the LLM's decision-making but also adds extra cost.

To address these challenges, the paper proposes a new method called "Weak Exploration to Strong Exploitation (WESE)." The core ideas of this method include:

- **Decoupling exploration and exploitation**: The exploration and exploitation processes are separated, and two different LLM agents perform them respectively. The exploration phase interacts with the environment to obtain information that helps solve the problem, while the exploitation phase reasons and makes decisions based on the acquired knowledge.
- **Knowledge graph compression and retrieval strategy**: Information obtained during exploration is stored and organized in the form of a knowledge graph. A one-hop retrieval method is used to extract task-relevant information from the graph, thereby reducing the impact of irrelevant information.
- **Cost-effective weak exploration**: The paper observes that weaker LLMs (such as models with fewer parameters) are sufficient to complete exploration tasks. It therefore proposes using cost-effective weak LLMs for exploration and then leveraging the knowledge they acquire to enhance the performance of stronger LLMs during the exploitation phase.

Experimental results show that the WESE method achieves significant improvements in success rate, efficiency, and cost across four open-world interaction benchmarks, particularly excelling in balancing effectiveness, efficiency, and cost.
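
The knowledge graph storage and one-hop retrieval described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class `KnowledgeGraph`, the function `retrieve_for_task`, the triple format, and the example facts are all hypothetical, and matching task entities by simple substring lookup is an assumption made here to keep the example self-contained (the summary does not say how task entities are identified).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A fact gathered during exploration, stored as (head, relation, tail).
Triple = Tuple[str, str, str]


class KnowledgeGraph:
    """Hypothetical in-memory store for facts found by the weak explorer."""

    def __init__(self) -> None:
        # Index triples by head and by tail so that a one-hop lookup
        # can find every edge touching a given entity.
        self._by_head: Dict[str, Set[Triple]] = defaultdict(set)
        self._by_tail: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, head: str, relation: str, tail: str) -> None:
        triple = (head.lower(), relation.lower(), tail.lower())
        self._by_head[triple[0]].add(triple)
        self._by_tail[triple[2]].add(triple)

    def one_hop(self, entity: str) -> List[Triple]:
        """Return all triples in which `entity` is the head or the tail."""
        key = entity.lower()
        hits = self._by_head.get(key, set()) | self._by_tail.get(key, set())
        return sorted(hits)


def retrieve_for_task(
    graph: KnowledgeGraph, task: str, entities: List[str]
) -> List[Triple]:
    """Collect one-hop neighbours of every known entity named in the task.

    Assumption: an entity is task-relevant if its name appears as a
    substring of the task description.
    """
    task_lower = task.lower()
    seeds = [e for e in entities if e.lower() in task_lower]
    found: Set[Triple] = set()
    for seed in seeds:
        found.update(graph.one_hop(seed))
    return sorted(found)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    # Made-up observations standing in for the weak agent's exploration log.
    kg.add("mug", "is in", "cabinet 2")
    kg.add("cabinet 2", "is next to", "sink")
    kg.add("apple", "is on", "countertop 1")

    task = "Put a clean mug on the desk"
    relevant = retrieve_for_task(kg, task, ["mug", "apple", "desk"])
    # Only the retrieved triples would be placed in the strong agent's prompt.
    for head, relation, tail in relevant:
        print(f"{head} {relation} {tail}")
```

In this sketch the triple about the apple is dropped because the task never mentions it, and the fact that cabinet 2 is next to the sink is dropped because it lies two hops away from the mug. Restricting retrieval to one hop is what keeps the context passed to the stronger agent small.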

WESE: Weak Exploration to Strong Exploitation for LLM Agents

Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration

ExpeL: LLM Agents Are Experiential Learners

WToE: Learning When to Explore in Multiagent Reinforcement Learning

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks

EVOLvE: Evaluating and Optimizing LLMs For Exploration

Understanding the Weakness of Large Language Model Agents within a Complex Android Environment

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

AgentBench: Evaluating LLMs as Agents

Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning

Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

Agent Planning with World Knowledge Model

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

Enabling Efficient Interaction between an Algorithm Agent and an LLM: A Reinforcement Learning Approach