Hierarchical Deep Reinforcement Learning Agent with Counter Self-play on Competitive Games

Huazhe Xu,Keiran Paster,Qibin Chen,Haoran Tang,Pieter Abbeel,Trevor Darrell,Sergey Levine
2018-01-01
Abstract:Deep Reinforcement Learning algorithms lead to agents that can solve difficult decision making problems in complex environments. However, many difficult multi-agent competitive games, especially real-time strategy games are still considered beyond the capability of current deep reinforcement learning algorithms, although there has been a recent effort to change this\citep {openai_2017_dota, vinyals_2017_starcraft}. Moreover, when the opponents in a competitive game are suboptimal, the current\textit {Nash Equilibrium} seeking, self-play algorithms are often unable to generalize their strategies to opponents that play strategies vastly different from their own. This suggests that a learning algorithm that is beyond conventional self-play is necessary. We develop Hierarchical Agent with Self-play (HASP), a learning approach for obtaining hierarchically structured policies that can achieve higher performance than conventional self-play on competitive games through the use of a diverse pool of sub-policies we get from Counter Self-Play (CSP). We demonstrate that the ensemble policy generated by HASP can achieve better performance while facing unseen opponents that use sub-optimal policies. On a motivating iterated Rock-Paper-Scissor game and a partially observable real-time strategic game (http://generals. io/), we are led to the conclusion that HASP can perform better than conventional self-play as well as achieve 77% win rate against FloBot, an open-source agent which has ranked at position number 2 on the online leaderboards.
What problem does this paper attempt to address?