Decentralized Multi-agent Reinforcement Learning with Multi-time Scale of Decision Epochs

Junjie Wu, Kuo Li, Qing-Shan Jia
DOI: https://doi.org/10.1109/cdc42340.2020.9304323
2020-01-01
Abstract: Multi-agent reinforcement learning (MARL) has attracted increasing attention in recent years and is now widely applied in fields including cyber-physical systems, smart grids, finance, and social networks. Current research on MARL mainly focuses on the single-time-scale setting, in which all agents share the same decision epochs. In real applications, however, agents commonly make decisions at different frequencies, and different agents may play separate roles in the system. In this paper, we propose a more general MARL framework by introducing multiple time scales of decision epochs. We assume that agents share information, including state, action, and reward, with their neighbors; the common assumption of global observability of states and actions is not required. We propose a decentralized Q-learning algorithm and a modified MADDPG algorithm to solve the problem. The main contributions of this paper are as follows. First, we formulate the multi-time scale multi-agent reinforcement learning (MTMARL) problem, which provides a general framework for related systems and problems. Second, we provide a networked, decentralized, multi-time scale multi-agent Q-learning algorithm to solve the problem and prove its convergence. Third, we test the algorithm numerically. The results show that the proposed algorithm performs better than the existing QD-learning algorithm and is only slightly worse than the centralized algorithm.
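The abstract does not give the update rule of the proposed algorithm, so the following is only a minimal illustrative sketch of the multi-time-scale setting, not the authors' method. It assumes tabular Q-learning; that agent i picks a new action only at multiples of its own decision period and holds that action in between; that "sharing reward with neighbors" means averaging rewards over the agent and its neighbors; and a semi-Markov style target that discounts the reward accumulated over each held interval. The coordination-game environment, the periods, and every function name (`is_decision_epoch`, `shared_reward`, `smdp_q_update`, `simulate`) are hypothetical. The sketch also omits the inter-agent Q-value consensus step used in QD-learning.

```python
import random
from collections import defaultdict


def is_decision_epoch(t, period):
    """Agent with this decision period acts at global steps 0, period, 2*period, ..."""
    return t % period == 0


def shared_reward(i, rewards, neighbors):
    """Average reward over agent i and its neighbors (assumed local reward sharing)."""
    group = [i] + list(neighbors[i])
    return sum(rewards[j] for j in group) / len(group)


def smdp_q_update(q, s, a, disc_reward, elapsed, s_next, actions, alpha, gamma):
    """Tabular Q-learning step over a decision interval of `elapsed` base steps.

    disc_reward = sum_{k < elapsed} gamma**k * r_k, so the bootstrap term is
    discounted by gamma**elapsed (a semi-Markov style target).
    """
    target = disc_reward + gamma ** elapsed * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


def simulate(periods, neighbors, steps=200, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Hypothetical toy game: each agent earns 1 when its action matches all neighbors'."""
    rng = random.Random(seed)
    actions = (0, 1)
    n = len(periods)
    q = [defaultdict(float) for _ in range(n)]
    act = [rng.choice(actions) for _ in range(n)]
    last_obs = [None] * n   # local observation at each agent's last decision epoch
    acc = [0.0] * n         # discounted shared reward since that epoch
    elapsed = [0] * n       # base steps since that epoch
    history = []
    for t in range(steps):
        prev = list(act)    # agents observe neighbors' actions from the previous step
        for i in range(n):
            if not is_decision_epoch(t, periods[i]):
                continue    # between its epochs the agent holds its current action
            obs = tuple(prev[j] for j in neighbors[i])
            if last_obs[i] is not None:
                # Update the value of the action held over the interval that just ended
                smdp_q_update(q[i], last_obs[i], act[i], acc[i], elapsed[i],
                              obs, actions, alpha, gamma)
            if rng.random() < eps:
                act[i] = rng.choice(actions)                          # explore
            else:
                act[i] = max(actions, key=lambda b: q[i][(obs, b)])   # exploit
            last_obs[i], acc[i], elapsed[i] = obs, 0.0, 0
        history.append(tuple(act))
        rewards = [float(all(act[i] == act[j] for j in neighbors[i])) for i in range(n)]
        for i in range(n):
            acc[i] += gamma ** elapsed[i] * shared_reward(i, rewards, neighbors)
            elapsed[i] += 1
    return q, history


if __name__ == "__main__":
    q, history = simulate(periods=[1, 2, 4], neighbors=[[1], [0, 2], [1]])
    print(history[:8])
```

With periods `[1, 2, 4]` on a three-agent line graph, agent 1 can change its action only at even steps and agent 2 only at multiples of 4, so the slower agents see a longer, more heavily discounted interval per update. Each agent uses only its neighbors' actions and rewards, which mirrors the abstract's point that global observability is not required.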