Learn Adaptive Dynamic Policy under Mixed Multi-Agent Environment

Zheng Xiao, Shiyong Zhang
DOI: https://doi.org/10.1109/cit.2008.4594682
2008-01-01
Abstract: Equilibrium-based approaches to multi-agent policy learning assume that all agents use the same algorithm; otherwise, other agents can easily exploit an agent's policy. Adaptive learning performs well against fixed or equilibrium policies and in self-play, but it is deficient against other kinds of adaptive players. To address this mixed environment, this paper proposes a novel algorithm that learns a time-dependent policy capable of adapting to the dynamics of other agents' policies. Planning the policy with respect to time is the key difference from earlier learning methods, which produce a stationary policy. Experiments on two well-known zero-sum games demonstrate that agents using this algorithm obtain higher utility against some adaptive players and perform well in self-play.
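The exploitability of a stationary policy, which motivates the time-dependent policy above, can be illustrated with a minimal sketch. This is not the paper's algorithm: the game (matching pennies), the frequency-counting opponent, and the names `column_values`, `best_response_value`, and `play_against_counter` are all assumptions chosen for illustration, since the abstract does not name the two zero-sum games or the adaptive players it uses.

```python
import random

# Row player's payoffs in matching pennies (assumed example game);
# the column player receives the negative of each entry.
PAYOFF = [[1.0, -1.0],
          [-1.0, 1.0]]

def column_values(row_policy):
    """Expected row payoff against each pure column action."""
    return [sum(row_policy[r] * PAYOFF[r][c] for r in range(2))
            for c in range(2)]

def best_response_value(row_policy):
    """Row player's expected payoff when the column player best-responds."""
    return min(column_values(row_policy))

def play_against_counter(row_policy, rounds=2000, seed=0):
    """Average row payoff against a hypothetical adaptive opponent that
    best-responds to the empirical frequencies of the row player's actions."""
    rng = random.Random(seed)
    counts = [1, 1]  # uniform prior over the row player's two actions
    total = 0.0
    for _ in range(rounds):
        belief = [c / sum(counts) for c in counts]
        values = column_values(belief)
        col = values.index(min(values))  # column action that hurts row most
        row = 0 if rng.random() < row_policy[0] else 1
        total += PAYOFF[row][col]
        counts[row] += 1
    return total / rounds

if __name__ == "__main__":
    print(best_response_value([0.5, 0.5]))   # equilibrium policy
    print(best_response_value([0.7, 0.3]))   # biased stationary policy
    print(play_against_counter([0.7, 0.3]))  # same policy vs. the counter
```

In this sketch the uniform policy cannot be exploited, while any biased stationary policy loses in expectation once the opponent has estimated the bias. A policy that changes over time does not present such a fixed target, which is the property the abstract attributes to the proposed approach.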