Abstract: Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). Maintaining a pool of diverse policies via open-ended learning is therefore an attractive solution, since it can generate auto-curricula that avoid being exploited. However, conventional open-ended learning algorithms have no widely accepted definition of diversity, which makes it hard to construct and evaluate diverse policies. In this work, we summarize previous concepts of diversity and work towards a unified measure of diversity in multi-agent open-ended learning that covers all elements of Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD). At the trajectory distribution level, we redefine BD in the state-action space as the discrepancy between occupancy measures. For the reward dynamics, we propose RD to characterize diversity through the responses of policies when encountering different opponents. We also show that many current diversity measures fall into one of the two categories, BD or RD, but not both. With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity for seeking best responses in open-ended learning. We validate our methods in relatively simple games, namely a matrix game and a non-transitive mixture model, as well as in the complex Google Research Football environment. The population found by our methods achieves the lowest exploitability and highest population effectivity in the matrix game and the non-transitive mixture model, as well as the largest goal difference when interacting with opponents of various levels in Google Research Football.

Quantifying the effects of environment and population diversity in multi-agent reinforcement learning

System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning

Celebrating Diversity in Shared Multi-Agent Reinforcement Learning

Social diversity and social preferences in mixed-motive reinforcement learning

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

A Unified Diversity Measure for Multiagent Reinforcement Learning

Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

Quality-Similar Diversity via Population Based Reinforcement Learning

Iteratively Learning Novel Strategies with Diversity Measured in State Distances

Evolutionary Multi-agent Reinforcement Learning in Group Social Dilemmas

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

Modelling Behavioural Diversity for Learning in Open-Ended Games

Diversity Induced Environment Design via Self-Play

Effects of Different Optimization Formulations in Evolutionary Reinforcement Learning on Diverse Behavior Generation

Measuring Policy Distance for Multi-Agent Reinforcement Learning

Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi

Multi-Agent Reinforcement Learning and Genetic Policy Sharing

Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning

Diversity is Strength: Mastering Football Full Game with Interactive Reinforcement Learning of Multiple AIs

Celebrating Diversity with Subtask Specialization in Shared Multiagent Reinforcement Learning