Decentralized Multi-Agent Reinforcement Learning with Networked Agents: Recent Advances

Kaiqing Zhang, Zhuoran Yang, Tamer Başar
DOI: https://doi.org/10.48550/arXiv.1912.03821
2019-12-09
Abstract: Multi-agent reinforcement learning (MARL) has long been a significant and enduring research topic in both machine learning and control. With the recent development of (single-agent) deep RL, there is a resurgence of interest in developing new MARL algorithms, especially those backed by theoretical analysis. In this paper, we review some recent advances in a sub-area of this topic: decentralized MARL with networked agents. Specifically, multiple agents perform sequential decision-making in a common environment, without the coordination of any central controller. Instead, the agents are allowed to exchange information with their neighbors over a communication network. Such a setting finds broad applications in the control and operation of robots, unmanned vehicles, mobile sensor networks, and the smart grid. This review builds upon several of our research endeavors in this direction, together with progress made by other researchers along the same line. We hope this review inspires more research effort in this exciting yet challenging area.
Machine Learning, Artificial Intelligence, Multiagent Systems, Systems and Control, Optimization and Control
What problem does this paper attempt to address?
This paper addresses theoretical analysis and algorithm design for multi-agent reinforcement learning (MARL), specifically in the decentralized setting with networked agents. In this setting, multiple agents make sequential decisions in a common environment and optimize long-term rewards by exchanging information with neighboring agents, without the coordination of a central controller. Such settings are common in applications such as robot control, self-driving vehicles, mobile sensor networks, and smart grids. The paper identifies the following challenges:

1. **Handling inconsistent goals**: Because the agents' goals may not align, the learning objective in MARL is not one-dimensional. Beyond reward optimization, it must account for equilibrium points and for other performance criteria, such as communication/coordination efficiency and robustness against potential adversaries.
2. **Coping with the non-stationarity of the environment**: The environment each agent faces is non-stationary, because it is shaped not only by the evolution of the underlying system but also by the decisions of other agents that are simultaneously improving their own strategies. This non-stationarity invalidates most theoretical analysis frameworks developed for single-agent RL.
3. **Solving the scalability problem**: The joint action space grows exponentially with the number of agents, so MARL algorithms inherently face scalability problems.
4. **Handling complex information structures**: The information structure is more complex in the multi-agent setting, because some observations may not be shared with other agents and information is sometimes stored in a decentralized manner.
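The scalability point can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the paper: it assumes every agent has the same number of discrete actions (the function names and the choice of 4 actions per agent are hypothetical), in which case the joint action space has `actions_per_agent ** num_agents` elements.

```python
from itertools import product


def joint_action_space_size(num_agents: int, actions_per_agent: int) -> int:
    """Number of joint actions when each agent picks one of
    `actions_per_agent` discrete actions (assumed identical across agents)."""
    return actions_per_agent ** num_agents


def enumerate_joint_actions(num_agents: int, actions_per_agent: int):
    """Explicitly list every joint action as a tuple of per-agent action
    indices. Only feasible for very small problems."""
    return list(product(range(actions_per_agent), repeat=num_agents))


if __name__ == "__main__":
    # Hypothetical setting: 4 actions per agent, growing team size.
    for n in (1, 2, 5, 10, 20):
        print(n, joint_action_space_size(n, 4))
```

With 4 actions per agent, 10 agents already yield 1,048,576 joint actions, which is why methods that enumerate or tabulate joint actions stop being practical as the team grows.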
To meet the above challenges, the paper reviews recent progress in decentralized MARL with networked agents, focusing on algorithms that are supported by theoretical analysis. It also explores the potential of these algorithms in practical applications, for example improving the performance of multi-agent systems by reducing communication costs and enhancing scalability and robustness.
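To give a feel for what "exchanging information with neighbors, without a central controller" can look like, here is a minimal sketch of neighbor averaging on a communication graph. It is an illustration under stated assumptions, not an algorithm from the paper: the ring topology, the equal weighting of an agent's own value and its neighbors' values, and the scalar local estimates are all hypothetical choices made for this example.

```python
def ring_neighbors(num_agents: int) -> dict:
    """Hypothetical topology: each agent talks only to its two ring neighbors."""
    return {
        i: [(i - 1) % num_agents, (i + 1) % num_agents]
        for i in range(num_agents)
    }


def neighbor_average_step(values: list, neighbors: dict) -> list:
    """One synchronous communication round: every agent replaces its local
    value with the plain average of its own value and its neighbors' values.
    No agent ever sees values from non-neighbors."""
    updated = []
    for i, own in enumerate(values):
        group = [own] + [values[j] for j in neighbors[i]]
        updated.append(sum(group) / len(group))
    return updated


def run_rounds(values: list, neighbors: dict, rounds: int) -> list:
    """Repeat the local averaging step for a fixed number of rounds."""
    for _ in range(rounds):
        values = neighbor_average_step(values, neighbors)
    return values


if __name__ == "__main__":
    local_estimates = [1.0, 2.0, 3.0, 4.0, 10.0]  # one scalar per agent
    final = run_rounds(local_estimates, ring_neighbors(5), rounds=60)
    print(final)  # every entry approaches the network-wide mean
```

On this ring every agent has the same number of neighbors, so the averaging preserves the network-wide mean and each agent's value approaches it using only local communication. Graphs with unequal degrees would generally need different weights to reach the exact average.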