WESE: Weak Exploration to Strong Exploitation for LLM Agents

Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Defu Lian, Yasheng Wang, Ruiming Tang, Enhong Chen
2024-04-11
Abstract: Recently, large language models (LLMs) have demonstrated remarkable potential as intelligent agents. However, existing research mainly focuses on enhancing the agent's reasoning or decision-making abilities through well-designed prompt engineering or task-specific fine-tuning, ignoring the procedure of exploration and exploitation. When addressing complex tasks within open-world interactive environments, these methods exhibit limitations. First, the lack of global information about the environment leads to greedy decisions, resulting in sub-optimal solutions. Second, irrelevant information acquired from the environment not only introduces noise but also incurs additional cost. This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to enhance LLM agents in solving open-world interactive tasks. Concretely, WESE decouples the exploration and exploitation processes, employing a cost-effective weak agent to perform exploration tasks for global knowledge. A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, improving the stronger agent's success rate and efficiency on the exploitation task. Our approach is flexible enough to incorporate diverse tasks, and obtains significant improvements in both success rate and efficiency across four interactive benchmarks.
Artificial Intelligence, Multiagent Systems
What problem does this paper attempt to address?
The paper addresses two main issues that large language models (LLMs) face as intelligent agents in open-world interactive tasks:

1. **Lack of global information leading to suboptimal decisions**: Without an initial overall understanding of the environment, an LLM agent may make suboptimal decisions, such as getting stuck in loops or taking inefficient paths when searching for specific items.
2. **A large amount of irrelevant information in the acquired data**: The information collected during exploration often contains many details that are not directly related to the current task. This irrelevant information not only interferes with the LLM's decision-making but also adds extra cost.

To address these challenges, the paper proposes a new method called "Weak Exploration to Strong Exploitation (WESE)." The core ideas of this method are:

- **Decoupling exploration and exploitation**: The exploration and exploitation processes are separated, and two different LLM agents perform them. The exploration phase interacts with the environment to obtain information that helps solve the problem, while the exploitation phase reasons and makes decisions based on the acquired knowledge.
- **Knowledge graph compression and retrieval strategy**: Information obtained during exploration is stored and organized as a knowledge graph. A one-hop retrieval method extracts task-relevant information from the graph, reducing the impact of irrelevant information.
- **Cost-effective weak exploration**: The paper observes that weaker LLMs (such as models with fewer parameters) are sufficient to complete exploration tasks. It therefore proposes using cost-effective weak LLMs for exploration and then leveraging the knowledge they acquire to enhance the performance of stronger LLMs during the exploitation phase.
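The knowledge graph storage and one-hop retrieval idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `KnowledgeGraph` class, the `retrieve_relevant` function, the substring-based entity matching, and the household example triples are all assumptions made here for illustration. In the paper's setting, the triples would come from the weak agent's exploration, and the retrieved subset would be passed to the stronger agent.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical store for (head, relation, tail) triples from exploration."""

    def __init__(self):
        # Map each entity to the set of triples that mention it as head or tail.
        self._by_entity = defaultdict(set)

    def add(self, head, relation, tail):
        triple = (head, relation, tail)
        self._by_entity[head].add(triple)
        self._by_entity[tail].add(triple)

    def entities(self):
        return list(self._by_entity)

    def one_hop(self, entity):
        # Return only triples that directly touch the entity (one hop).
        return set(self._by_entity.get(entity, ()))


def retrieve_relevant(graph, task):
    """Return triples within one hop of entities named in the task text.

    Assumption: an entity is task-relevant if its name appears in the task
    description as a case-insensitive substring. A real system would likely
    use a more robust entity-linking step.
    """
    task_lower = task.lower()
    seeds = [e for e in graph.entities() if e.lower() in task_lower]
    triples = set()
    for seed in seeds:
        triples |= graph.one_hop(seed)
    return sorted(triples)


def build_demo_graph():
    """Hand-written triples standing in for a weak agent's exploration output."""
    kg = KnowledgeGraph()
    kg.add("mug", "located_in", "cabinet 2")
    kg.add("cabinet 2", "part_of", "kitchen")
    kg.add("coffee machine", "located_on", "countertop 1")
    kg.add("newspaper", "located_on", "sofa 1")
    return kg


if __name__ == "__main__":
    kg = build_demo_graph()
    for triple in retrieve_relevant(kg, "Put a mug in the coffee machine"):
        print(triple)
```

In this sketch, the triple about the newspaper is never returned for the mug task, and the triple linking the cabinet to the kitchen is also skipped because it is two hops away from any entity named in the task. That filtering is the mechanism by which one-hop retrieval keeps the exploitation prompt short and limits noise.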
Experimental results show that the WESE method achieves significant improvements in success rate, efficiency, and cost across four open-world interaction benchmarks, particularly excelling in balancing effectiveness, efficiency, and cost.