Abstract: Policy diversity, the variety of policies an agent can adopt, improves reinforcement learning (RL) by fostering more robust, adaptable, and innovative problem-solving in the environment. The environment in which standard RL operates is usually modeled as a Markov Decision Process (MDP). However, in many real-world scenarios the rewards depend on the agent's history of states and actions, which leads to a non-MDP. Under the premise of policy diffusion initialization, non-MDPs may have an unstructured, expanding solution space because of varying historical information and temporal dependencies. As a result, solutions in non-MDPs have non-equivalent closed forms. In this paper, deriving diverse solutions for non-MDPs requires policies to break through the boundaries of the current solution space through gradual dispersion, with the goal of expanding the solution space and thereby obtaining more diverse policies. Specifically, we first model the sequences of states and actions with a transformer-based method to learn policy embeddings for dispersion in the solution space, since the transformer handles sequential data well and captures the long-range dependencies of non-MDPs. Then, we stack the policy embeddings to construct a dispersion matrix, which serves as the policy diversity measure that induces policy dispersion in the solution space and yields a set of diverse policies. Finally, we prove that if the dispersion matrix is positive definite, the dispersed embeddings effectively enlarge the disagreements across policies, yielding a diverse expression of the original policy embedding distribution. Experimental results in both non-MDP and MDP environments show that this dispersion scheme obtains more expressive, diverse policies by expanding the solution space and performs more robustly than recent learning baselines.
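The idea of stacking policy embeddings into a matrix and checking positive definiteness can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's construction: the function names (`dispersion_matrix`, `is_positive_definite`, `diversity_score`) are hypothetical, the matrix is taken to be the Gram matrix of L2-normalized embeddings, the score is taken to be its log-determinant, and the transformer encoder is replaced by hand-written placeholder vectors.

```python
import numpy as np

def dispersion_matrix(embeddings):
    """Stack one embedding per policy (rows) and return their Gram matrix.

    Assumption: cosine-similarity Gram matrix of L2-normalized rows.
    The abstract does not specify the actual construction.
    """
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E @ E.T

def is_positive_definite(M, tol=1e-10):
    """A symmetric matrix is positive definite iff all eigenvalues are > 0."""
    return bool(np.linalg.eigvalsh(M).min() > tol)

def diversity_score(M):
    """Log-determinant of M (assumed score); -inf if M is not positive definite."""
    if not is_positive_definite(M):
        return float("-inf")
    _, logdet = np.linalg.slogdet(M)
    return float(logdet)

# Placeholder embeddings standing in for transformer outputs.
spread = dispersion_matrix([[1.0, 0.0], [0.0, 1.0]])     # orthogonal policies
close = dispersion_matrix([[1.0, 0.0], [1.0, 1.0]])      # similar policies
collapsed = dispersion_matrix([[1.0, 0.0], [1.0, 0.0]])  # identical policies
```

Under these assumptions, well-separated embeddings give a positive definite matrix with a higher score than nearly aligned ones, while identical embeddings give a singular matrix, which matches the intuition that positive definiteness signals genuine disagreement across policies.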

Progressive Diversifying Policy for Multi-Agent Reinforcement Learning

Multiagent Reinforcement Learning for Strictly Constrained Tasks Based on Reward Recorder

Policy Diversity for Cooperative Agents

Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Celebrating Diversity in Shared Multi-Agent Reinforcement Learning

Measuring Policy Distance for Multi-Agent Reinforcement Learning

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning

Dueling Network Architecture for Multi-Agent Deep Deterministic Policy Gradient

Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

A Collaborative Multiagent Reinforcement Learning Method Based on Policy Gradient Potential

Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

HyperMARL: Adaptive Hypernetworks for Multi-Agent RL

Improving Multi-agent Reinforcement Learning with Stable Prefix Policy

Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning

An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control

Diversifying Policies With Non-Markov Dispersion to Expand the Solution Space

Boosting Weak-to-Strong Agents in Multiagent Reinforcement Learning via Balanced PPO

VMAPD: Generate Diverse Solutions for Multi-Agent Games with Recurrent Trajectory Discriminators