Timesharing-tracking: A New Framework for Decentralized Reinforcement Learning in Cooperative Multi-Agent Systems

Fu Bo, Chen Xin, He Yong, Wu Min
DOI: https://doi.org/10.1109/jas.2014.7004541
2014-01-01
IEEE/CAA Journal of Automatica Sinica
Abstract: The paper discusses how to learn the optimal cooperative policy in a decentralized way with known immediate individual rewards. We propose a timesharing-tracking framework (TTF), in which agents learn their optimal policies alternately on different states, in order to realize macroscopic simultaneous learning. Then the algorithm of joint-state Q-learning with best response to companions (BRQ-learning) is proposed. Further, the BRQ-learning algorithm is integrated into the TTF, yielding a mechanism named multi-agent BRQ-learning with timesharing-tracking (BRQL-TT) that achieves the optimal group policy. The simulation results illustrate that the proposed algorithm can learn the optimal joint behavior with less computation and at a faster speed than two other classical learning algorithms.
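The idea of agents taking turns to learn on different states can be illustrated with a minimal sketch. It is not the paper's BRQL-TT algorithm, whose details the abstract does not give. Everything in it is an assumption made for illustration: the two-agent, two-state toy task, the shared reward table `REWARD`, the deterministic state flip, the fixed turn length `period`, uniform exploration by the learner, and all function names. In each joint state exactly one agent holds the learning turn and updates a Q-table over (joint state, own action), while its companion plays its current greedy action, so the update is a best response to a temporarily fixed companion.

```python
import random

# Hypothetical toy task (not from the paper): 2 agents, 2 states, 2 actions each.
# REWARD[s][a0][a1] is the immediate reward both agents receive in state s.
REWARD = [
    [[1.0, 0.5], [0.5, 0.0]],  # state 0: best joint action is (0, 0)
    [[0.0, 0.5], [0.5, 1.0]],  # state 1: best joint action is (1, 1)
]
N_AGENTS, N_STATES, N_ACTIONS = 2, 2, 2


def greedy(q_row):
    """Index of the largest Q-value; ties resolve to the lowest index."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def train(steps=4000, period=50, alpha=0.5, gamma=0.5, seed=0):
    rng = random.Random(seed)
    # q[i][s][a]: agent i's value of its own action a in joint state s.
    q = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(N_AGENTS)]
    visits = [0] * N_STATES
    state = 0
    for _ in range(steps):
        # Timesharing (assumed schedule): one agent holds the learning turn
        # per state, and the turn passes on after `period` visits. The state
        # offset makes different agents learn in different states at once.
        learner = (visits[state] // period + state) % N_AGENTS
        visits[state] += 1
        # The learner explores uniformly; its companion acts greedily.
        actions = [
            rng.randrange(N_ACTIONS) if i == learner else greedy(q[i][state])
            for i in range(N_AGENTS)
        ]
        reward = REWARD[state][actions[0]][actions[1]]
        next_state = 1 - state  # assumed dynamics: the state flips each step
        # Only the learner updates: a best response to the fixed companion.
        target = reward + gamma * max(q[learner][next_state])
        a = actions[learner]
        q[learner][state][a] += alpha * (target - q[learner][state][a])
        state = next_state
    return q


def joint_policy(q):
    """Greedy joint action per state, one component per agent."""
    return [
        tuple(greedy(q[i][s]) for i in range(N_AGENTS)) for s in range(N_STATES)
    ]


if __name__ == "__main__":
    print(joint_policy(train()))
```

The reward table is chosen so that alternating best responses climb step by step to the best joint action. In games where a unilateral change cannot improve the reward, this naive turn-taking can stall at a suboptimal joint action, which is the kind of difficulty a full framework has to address.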