Observer-Based Multiagent Deep Reinforcement Learning: A Fully Distributed Training Scheme

Lifu Ding, Qixuan Mao, Gangfeng Yan
DOI: https://doi.org/10.1109/tii.2024.3465601
IF: 12.3
2024-01-01
IEEE Transactions on Industrial Informatics
Abstract: Although multiagent reinforcement learning (MARL) has been applied extensively, it encounters significant obstacles in more complex engineering applications. Realistic MARL environments are only partially observable to the agents. Previous algorithms that address the instability associated with partial observability are not fully distributed, which makes them difficult to deploy. In this article, a theoretical analysis of partial observability in policy-based methods is given, and a fully distributed multiagent proximal policy optimization algorithm based on a distributed observer is proposed. In the algorithm, each agent has its own independent critic and actor and performs policy iteration locally using its observers to achieve global optimization. With a fully distributed mapping method, actions are bound to adjacent spaces that satisfy the constraints. Tests on a generalized constrained optimization model show that the algorithm achieves high stability and strong decision-making performance.
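For orientation, the proximal policy optimization mentioned in the abstract is usually built on a clipped surrogate objective. The minimal sketch below shows that standard objective as a single agent's independent actor might evaluate it on its own local batch. It is not the paper's observer-based variant, which the abstract does not specify; the function name `ppo_clip_objective`, its parameter names, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over one agent's local batch.

    Assumption: this is the standard PPO-Clip objective, used here only to
    illustrate a local policy update; the paper's exact loss may differ.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated and the behavior policy.
        ratio = math.exp(ln - lo)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Each agent would evaluate this on its own samples only.
local_value = ppo_clip_objective(
    logp_new=[-1.0, -0.5],
    logp_old=[-1.0, -0.5],
    advantages=[1.0, -1.0],
)
```

Because each call depends only on one agent's own log-probabilities and advantage estimates, the computation can run locally, which is in the spirit of the fully distributed scheme the abstract describes.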