Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration

Yun Qu, Boyuan Wang, Yuhang Jiang, Jianzhun Shao, Yixiu Mao, Cheems Wang, Chang Liu, Xiangyang Ji
2024-10-03
Abstract: With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, the redundant effort caused by exploration without properly chosen guidance poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, which channels informative task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner and at low LLM inference cost. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR) to guide agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. By reducing redundant exploration, LEMAE outperforms existing SOTA approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.
Artificial Intelligence, Multiagent Systems
What problem does this paper attempt to address?
This paper addresses efficient exploration in multi-agent reinforcement learning (MARL). Specifically, it focuses on how to explore efficiently in environments with large state-action spaces while avoiding redundant exploration.

### Main problems:

1. **Redundant exploration**: Traditional exploration methods that pursue novelty, diversity, or uncertainty may spend effort on exploration that is irrelevant to the task, especially in complex environments.
2. **Lack of effective guidance**: In multi-agent systems, the state-action space grows exponentially, so exploration without task-relevant guidance is inefficient.
3. **Requirements for real-world applications**: Practical applications such as MOBA games, social sciences, and multi-vehicle control require more efficient multi-agent exploration methods.

### Solutions:

The paper proposes a new framework, LEMAE (Large Language Model Enables Efficient Multi-Agent Exploration), which addresses the above problems by introducing task-relevant guidance from large language models (LLMs). The main contributions of LEMAE include:

1. **Building a bridge**: Combine LLM knowledge with RL in a systematic framework, LEMAE, for efficient multi-agent exploration.
2. **Key-state location**: Design a computationally efficient reasoning strategy that uses the LLM to discriminate key states crucial for task completion, which then serve as sub-goals for targeted exploration.
3. **Organized exploration**: Introduce the Key State Memory Tree (KSMT) to track the transitions between key states, and design the Subspace-based Hindsight Intrinsic Reward (SHIR) to encourage agents to move toward key states and reduce redundant exploration.

### Method overview:

- **Key-state location**: Use the LLM to generate discriminator functions that identify key states in trajectories.
- **Key-state-guided exploration**: Use SHIR to increase reward density and guide agents toward key states; at the same time, use KSMT to track transitions between key states and organize the exploration process.

### Experimental results:

Experiments show that LEMAE outperforms existing state-of-the-art (SOTA) methods by a large margin on challenging benchmarks such as SMAC and MPE, with a 10x acceleration in certain scenarios. In complex multi-agent exploration tasks in particular, LEMAE reduces redundant exploration and improves exploration efficiency.

### Conclusion:

By integrating task-relevant prior knowledge from an LLM, LEMAE reduces redundant multi-agent exploration and improves exploration efficiency, which suggests potential for real-world applications.
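
### Illustrative sketch:

The three ingredients named in the method overview (discriminator functions, a hindsight intrinsic reward computed in a subspace, and a tree of key-state transitions) can be illustrated on a toy problem. The following is a minimal Python sketch, not the authors' implementation. The grid task, the `at_switch` and `at_door` discriminators, the `subspace` projection, and the specific reward formula (reduction in subspace distance to the key state reached later) are all assumptions made for illustration; the summary above does not specify them.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy task: reach a switch, then a door, on a 2-D grid.
SWITCH, DOOR = (2, 0), (2, 3)

# Discriminator functions of the kind an LLM might be prompted to write.
# Each returns True when a state counts as a key state (names are assumptions).
def at_switch(state):
    return state["pos"] == SWITCH

def at_door(state):
    return state["pos"] == DOOR

DISCRIMINATORS = {"switch": at_switch, "door": at_door}

def subspace(state):
    """Project a state onto the features relevant to key states (here: position)."""
    return state["pos"]

def locate_key_states(trajectory):
    """Return (timestep, name) for the first time each key state is reached."""
    found, seen = [], set()
    for t, state in enumerate(trajectory):
        for name, is_key in DISCRIMINATORS.items():
            if name not in seen and is_key(state):
                seen.add(name)
                found.append((t, name))
    return found

def hindsight_rewards(trajectory, key_hits):
    """Relabel transitions in hindsight (assumed form): reward each step by how
    much it reduces the subspace distance to the next key state actually reached."""
    rewards = [0.0] * (len(trajectory) - 1)
    start = 0
    for hit_t, _ in key_hits:
        goal = subspace(trajectory[hit_t])
        for t in range(start, hit_t):
            before = math.dist(subspace(trajectory[t]), goal)
            after = math.dist(subspace(trajectory[t + 1]), goal)
            rewards[t] = before - after
        start = hit_t
    return rewards

@dataclass
class Node:
    """One key state in a memory tree of observed key-state transitions."""
    name: str
    visits: int = 0
    children: dict = field(default_factory=dict)

def update_tree(root, key_hits):
    """Record the ordered chain of key states from one trajectory as a root-to-leaf path."""
    node = root
    for _, name in key_hits:
        node = node.children.setdefault(name, Node(name))
        node.visits += 1
    return root

# Usage on a single hand-written trajectory.
trajectory = [{"pos": p} for p in [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]]
key_hits = locate_key_states(trajectory)
rewards = hindsight_rewards(trajectory, key_hits)
root = update_tree(Node("root"), key_hits)
```

In this toy run every step moves one cell closer to the next key state, so each transition receives a nonzero intrinsic reward instead of a single sparse reward at the end, and the tree records the path root, switch, door. A real MARL setting would apply the same idea per agent or per team and feed the relabeled rewards to the learner; those details are beyond what the summary states.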