Stability of the Nash Equilibrium under Gradient Ascent Learning Algorithms in Two-Agent Two-Action Games

Amit Bhaya, Rodrigo Brandolt Sodre de Macedo, Lucas Shiguemitsu Shigueoka
DOI: https://doi.org/10.1109/cca.2013.6662854
2013-01-01
Abstract: This paper provides a unified view and stability analysis of reinforcement learning algorithms for general-sum games that have been proposed in the literature. Specifically, the gradient ascent learning algorithm proposed by Singh, Kearns, and Mansour and the variant proposed by Bowling and Veloso are shown to converge to the Nash equilibrium, using a switching control viewpoint and a unified Lyapunov function analysis. Furthermore, a proof of stability of the Nash equilibrium under the weighted policy learning (WPL) algorithm, which Abdallah and Lesser proposed without formal proof, is also obtained using a Lyapunov function approach; it involves the novel feature of an analysis of the virtual equilibrium points. A stability proof for the WPL dynamics is important because WPL allows agents to reach a Nash equilibrium in two-agent, two-action games using only each agent's own reward as feedback: no agent uses knowledge of the rewards or actions of other agents, or any a priori information on the location of the Nash equilibrium.
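To make the WPL dynamics mentioned in the abstract concrete, the following is a minimal Python sketch of one discrete-time update for a two-agent, two-action game. It is an illustration under stated assumptions, not the paper's formulation: the names `payoff_gradients`, `wpl_step`, `R`, `C`, and `eta` are hypothetical, the exact payoff gradient is used in place of the reward-based estimate an agent would form in practice, and strategies are simply clipped to [0, 1]. Here `alpha` and `beta` are the probabilities that the row and column agents play their first action.

```python
def payoff_gradients(alpha, beta, R, C):
    """Exact gradients of the expected payoffs with respect to each agent's own strategy.

    R and C are 2x2 payoff matrices (row agent, column agent), indexed as
    [row action][column action].
    """
    u_r = R[0][0] - R[0][1] - R[1][0] + R[1][1]
    u_c = C[0][0] - C[0][1] - C[1][0] + C[1][1]
    grad_alpha = beta * u_r + (R[0][1] - R[1][1])
    grad_beta = alpha * u_c + (C[1][0] - C[1][1])
    return grad_alpha, grad_beta


def clip(x):
    """Keep a probability inside [0, 1] (a simplifying assumption of this sketch)."""
    return min(1.0, max(0.0, x))


def wpl_step(alpha, beta, R, C, eta=0.01):
    """One WPL-style update: the gradient is weighted by (1 - p) when it is
    positive and by p otherwise, so steps shrink as p approaches the boundary
    it is moving toward."""
    g_a, g_b = payoff_gradients(alpha, beta, R, C)
    w_a = (1.0 - alpha) if g_a > 0 else alpha
    w_b = (1.0 - beta) if g_b > 0 else beta
    return clip(alpha + eta * g_a * w_a), clip(beta + eta * g_b * w_b)


# Matching pennies: the unique Nash equilibrium is the mixed strategy (0.5, 0.5).
R = [[1, -1], [-1, 1]]
C = [[-1, 1], [1, -1]]
```

In this sketch the mixed Nash equilibrium of matching pennies is a fixed point of `wpl_step`, because both gradients vanish there. Whether iterates started elsewhere approach it depends on the step size `eta` and is not claimed here; the paper's stability result concerns the WPL dynamics themselves.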