VMAPD: Generate Diverse Solutions for Multi-Agent Games with Recurrent Trajectory Discriminators.

Shiyu Huang, Chao Yu, Bin Wang, Dong Li, Yu Wang, Ting Chen, Jun Zhu
DOI: https://doi.org/10.1109/cog51982.2022.9893722
2022-01-01
Abstract: Recent algorithms designed for multi-agent tasks focus on finding a single optimal solution for all the agents. However, in many tasks (e.g., matrix games and transportation dispatching), more than one optimal solution may exist, and previous algorithms can converge to only one of them. In many practical applications, it is important to develop reasonable agents with diverse behaviors. In this paper, we propose "variational multi-agent policy diversification" (VMAPD), an on-policy framework for discovering diverse policies for coordination patterns of multiple agents. By taking advantage of latent variables and exploiting the connection between variational inference and multi-agent reinforcement learning, we derive a tractable evidence lower bound (ELBO) on the trajectories of all agents. Our algorithm uses policy iteration to maximize the derived lower bound and can be implemented simply by adding a pseudo reward during centralized learning; the trained agents do not need access to the pseudo reward during decentralized execution. We demonstrate the effectiveness of our algorithm on several popular multi-agent testbeds. Experimental results show that VMAPD finds more solutions than the baselines at similar sample complexity.
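The abstract does not give the form of the pseudo reward, so the following is only a minimal sketch of one common choice in latent-variable diversity methods: the log-probability that a discriminator assigns to the sampled latent code, minus the log-prior of that code. The uniform prior, the weight `alpha`, and the function names are all assumptions made for illustration; they are not taken from the paper, and the discriminator itself (recurrent over the joint trajectory in the paper's title) is represented here only by its output logits.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))


def pseudo_reward(disc_logits, z, num_latents):
    """Assumed pseudo reward: log q(z | trajectory) - log p(z).

    disc_logits: discriminator logits over latent codes (hypothetical output).
    z:           index of the latent code the agents were conditioned on.
    num_latents: number of latent codes; p(z) is assumed uniform.
    """
    log_q = log_softmax(np.asarray(disc_logits, dtype=float))[z]
    log_p = -np.log(num_latents)
    return float(log_q - log_p)


def shaped_reward(env_reward, disc_logits, z, num_latents, alpha=0.1):
    """Training-time reward: environment reward plus weighted pseudo reward.

    alpha is an assumed trade-off weight, used only during centralized
    learning; nothing here is needed at execution time.
    """
    return env_reward + alpha * pseudo_reward(disc_logits, z, num_latents)
```

Under these assumptions, a discriminator that cannot tell the latent codes apart (uniform logits) yields a pseudo reward of zero, while trajectories that make the sampled code easy to identify earn a positive bonus. Because the bonus enters only through the training reward, the executed policies depend on their own observations and the latent code alone.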