Abstract: With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, redundant efforts brought by exploration without proper guidance choices pose a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, choosing to channel informative task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner at low LLM inference costs. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR) to guide agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. Benefiting from diminishing redundant explorations, LEMAE outperforms existing SOTA approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.
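To make the idea of tracking transitions between key states concrete, here is a minimal Python sketch of a tree that records the order in which key states were reached in each episode. It is an illustration only, not the paper's KSMT: the class names `Node` and `KeyStateTree`, the integer key-state ids, and the visit counters are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Node:
    """One key state in the tree (hypothetical structure; -1 marks the root)."""
    key_state: int
    visits: int = 0
    children: Dict[int, "Node"] = field(default_factory=dict)


class KeyStateTree:
    """Records which key state followed which, episode by episode."""

    def __init__(self) -> None:
        self.root = Node(key_state=-1)

    def record(self, reached: Iterable[int]) -> None:
        """Insert the ordered key-state ids reached in one episode."""
        node = self.root
        node.visits += 1
        for k in reached:
            # Create the child on first visit, then count the transition.
            node = node.children.setdefault(k, Node(key_state=k))
            node.visits += 1

    def branches(self) -> List[List[int]]:
        """Return every root-to-leaf sequence of key states seen so far."""
        out: List[List[int]] = []

        def walk(node: Node, prefix: List[int]) -> None:
            if not node.children:
                if prefix:
                    out.append(prefix)
                return
            for k, child in sorted(node.children.items()):
                walk(child, prefix + [k])

        walk(self.root, [])
        return out


tree = KeyStateTree()
tree.record([0, 1])  # episode 1 reached key state 0, then 1
tree.record([0, 2])  # episode 2 reached key state 0, then 2
print(tree.branches())
```

A structure like this lets an exploration scheme see which key-state sequences have already been tried and how often, which is one plausible way to organize exploration around them.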

What problem does this paper attempt to address?

This paper addresses efficient exploration in multi-agent reinforcement learning (MARL). Specifically, it focuses on how to explore efficiently in environments with large state-action spaces while avoiding redundant exploration.

### Main problems:

1. **Redundant exploration**: Traditional exploration methods that pursue novelty, diversity, or uncertainty may spend effort on exploration that is irrelevant to the task, especially in complex environments.
2. **Lack of effective guidance**: In multi-agent systems, the state-action space grows exponentially, so the absence of task-related guidance leads to low exploration efficiency.
3. **Requirements for real-world applications**: Practical applications such as MOBA games, social sciences, and multi-vehicle control require more efficient multi-agent exploration methods.

### Solutions:

The paper proposes a new framework, LEMAE (Large Language Model Enables Efficient Multi-Agent Exploration), which addresses these problems by introducing task-related guidance from large language models (LLMs). The main contributions of LEMAE include:

1. **Building a bridge**: Combine the knowledge of LLMs with RL in a systematic framework, LEMAE, for efficient multi-agent exploration.
2. **Key-state location**: Design a computationally efficient reasoning strategy that uses the LLM to discriminate key states crucial for task completion, which serve as sub-goals for targeted exploration.
3. **Organized exploration**: Introduce the Key State Memory Tree (KSMT) to track the transitions between key states, and design the Subspace-based Hindsight Intrinsic Reward (SHIR) to encourage agents to move toward key states and reduce redundant exploration.

### Method overview:

- **Key-state location**: Generate discriminator functions through the LLM to identify key states from trajectories.
- **Key-state-guided exploration**: Use SHIR to increase the reward density and guide agents toward key states; at the same time, organize the exploration process through KSMT, which tracks the transitions between key states.

### Experimental results:

Experiments show that LEMAE outperforms existing state-of-the-art (SOTA) methods by a large margin on challenging benchmarks (e.g., SMAC and MPE), achieving a 10x acceleration in certain scenarios. In particular, in complex multi-agent exploration tasks, LEMAE reduces redundant exploration and improves exploration efficiency.

### Conclusion:

By integrating the task-related prior knowledge of an LLM, LEMAE reduces redundancy in multi-agent exploration and improves exploration efficiency, showing its potential for real-world applications.
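As an illustration of the key-state location and hindsight-reward ideas summarized above, the following is a minimal Python sketch, not the paper's implementation. Several things here are assumptions made for the example: the discriminator `near_switch` is a hand-written stand-in for a function an LLM might generate, states are plain tuples of floats, and the reward is taken to be the per-step reduction in Euclidean distance to the key state that was eventually reached, measured only over a chosen subset of state dimensions. The paper's actual SHIR formula may differ.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

State = Sequence[float]
Discriminator = Callable[[State], bool]


def near_switch(state: State) -> bool:
    """Hypothetical stand-in for an LLM-generated discriminator:
    true when the (x, y) part of the state is close to a switch at (1, 1)."""
    return math.dist(state[:2], (1.0, 1.0)) < 0.1


def first_key_state(
    trajectory: Sequence[State], discriminators: Sequence[Discriminator]
) -> Optional[Tuple[int, int]]:
    """Return (time step, discriminator index) of the first key state hit."""
    for t, state in enumerate(trajectory):
        for k, disc in enumerate(discriminators):
            if disc(state):
                return t, k
    return None


def project(state: State, dims: Sequence[int]) -> List[float]:
    """Keep only the state dimensions assumed relevant to the key state."""
    return [state[d] for d in dims]


def hindsight_rewards(
    trajectory: Sequence[State],
    discriminators: Sequence[Discriminator],
    dims: Sequence[int],
    scale: float = 1.0,
) -> List[float]:
    """Assumed reward form: after the episode, reward each earlier step by
    how much closer it moved (in the subspace) to the key state reached."""
    rewards = [0.0] * (len(trajectory) - 1)
    hit = first_key_state(trajectory, discriminators)
    if hit is None:
        return rewards  # no key state reached, no intrinsic reward
    t_key, _ = hit
    goal = project(trajectory[t_key], dims)
    for t in range(t_key):
        before = math.dist(project(trajectory[t], dims), goal)
        after = math.dist(project(trajectory[t + 1], dims), goal)
        rewards[t] = scale * (before - after)
    return rewards


# Toy trajectory: (x, y, unrelated feature); the key state is hit at step 2.
traj = [(0.0, 0.0, 5.0), (0.5, 0.5, 7.0), (1.0, 1.0, 9.0), (2.0, 2.0, 1.0)]
print(hindsight_rewards(traj, [near_switch], dims=(0, 1)))
```

In this sketch every step before the key state receives a nonzero signal even if the environment reward is sparse, which is the sense in which such a scheme increases reward density. Restricting the distance to `dims` keeps unrelated state features (the third coordinate here) from influencing the reward.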

Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration

S2rl

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning

Multi-agent Exploration with Sub-state Entropy Estimation

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning

EVOLvE: Evaluating and Optimizing LLMs For Exploration

From Laws to Motivation: Guiding Exploration through Law-Based Reasoning and Rewards

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Self-Motivated Multi-Agent Exploration

Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

Option-based Multi-agent Exploration

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Improving Cooperative Multi-Agent Exploration via Surprise Minimization and Social Influence Maximization

Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving

CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings

Words as Beacons: Guiding RL Agents with High-Level Language Prompts

YOLO-MARL: You Only LLM Once for Multi-agent Reinforcement Learning

Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning

Enabling Efficient Interaction between an Algorithm Agent and an LLM: A Reinforcement Learning Approach