Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

Zhenggang Tang, Chao Yu, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Du, Yu Wang, Yi Wu
DOI: https://doi.org/10.48550/arXiv.2103.04564
2021-03-12
Abstract: We propose a simple, general, and effective technique, Reward Randomization, for discovering diverse strategic policies in complex multi-agent games. Combining reward randomization and policy gradient, we derive a new algorithm, Reward-Randomized Policy Gradient (RPG). RPG is able to discover multiple distinctive human-interpretable strategies in challenging temporal trust dilemmas, including grid-world games and a real-world game, Agar.io, where multiple equilibria exist but standard multi-agent policy gradient algorithms always converge to a fixed one with a sub-optimal payoff for every player, even when using state-of-the-art exploration techniques. Furthermore, with the set of diverse strategies from RPG, we can (1) achieve higher payoffs by fine-tuning the best policy from the set; and (2) obtain an adaptive agent by using this set of strategies as its training opponents. The source code and example videos can be found on our website: https://sites.google.com/view/staghuntrpg.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?