Abstract: Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents learn concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, whereas sequential learning, in which agents update their policies one after another, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, alleviating the non-stationarity that simultaneous updates to other agents' policies would cause. However, sequential learning can be slow because only one agent learns at any time, so it may not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, the other agents are allowed to update their policies as well, but at a slower rate. This speeds up learning compared with sequential learning, while still minimizing the non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the EPyMARL (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.
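The fast/slow learning-rate idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical two-agent common-payoff matrix game (`PAYOFF`), softmax policies updated with exact policy gradients instead of deep networks, and a fast-learner role that rotates between agents every `period` steps. The names `multi_timescale_train`, `fast_lr`, `slow_ratio`, and `period` are illustrative only. In this toy setup, `slow_ratio=0` reduces to sequential learning and `slow_ratio=1` reduces to independent (simultaneous) learning.

```python
import math
from typing import List, Tuple

# Hypothetical common-payoff game: both agents receive PAYOFF[a1][a2].
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(p1: List[float], p2: List[float]) -> float:
    """Expected shared reward under independent policies p1 and p2."""
    return sum(p1[a] * p2[b] * PAYOFF[a][b]
               for a in range(len(p1)) for b in range(len(p2)))


def policy_gradients(theta1: List[float], theta2: List[float]):
    """Exact gradient of the expected return w.r.t. each agent's logits."""
    p1, p2 = softmax(theta1), softmax(theta2)
    j = expected_return(p1, p2)
    # Action values of each agent given the other agent's current policy.
    q1 = [sum(p2[b] * PAYOFF[a][b] for b in range(len(p2))) for a in range(len(p1))]
    q2 = [sum(p1[a] * PAYOFF[a][b] for a in range(len(p1))) for b in range(len(p2))]
    g1 = [p1[a] * (q1[a] - j) for a in range(len(p1))]
    g2 = [p2[b] * (q2[b] - j) for b in range(len(p2))]
    return g1, g2


def multi_timescale_train(steps: int = 2000, fast_lr: float = 0.5,
                          slow_ratio: float = 0.1,
                          period: int = 100) -> Tuple[List[float], List[float]]:
    """Both agents update every step; the 'fast' agent uses fast_lr and the
    other uses fast_lr * slow_ratio. The fast role alternates every `period` steps."""
    theta = [[0.0, 0.0], [0.0, 0.0]]  # start from uniform policies
    for t in range(steps):
        fast = (t // period) % 2  # index of the agent on the fast timescale
        # Gradients are computed before any update, so updates are simultaneous.
        grads = policy_gradients(theta[0], theta[1])
        for i in range(2):
            lr = fast_lr if i == fast else fast_lr * slow_ratio
            theta[i] = [w + lr * g for w, g in zip(theta[i], grads[i])]
    return softmax(theta[0]), softmax(theta[1])


if __name__ == "__main__":
    p1, p2 = multi_timescale_train()
    print("agent 1:", p1, "agent 2:", p2, "return:", expected_return(p1, p2))
```

The key design parameter in this sketch is the ratio between the two learning rates: a small `slow_ratio` keeps each agent's learning target close to stationary over the fast agent's phase while still letting every agent make some progress, rather than freezing the others entirely as in sequential learning.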

Coordinated learning based on time-sharing tracking framework and Gaussian regression for continuous multi-agent systems

TraCo: Learning Virtual Traffic Coordinator for Cooperation with Multi-Agent Reinforcement Learning

Multi-agent Continual Coordination Via Progressive Task Contextualization


Timesharing-tracking: A New Framework for Decentralized Reinforcement Learning in Cooperative Multi-Agent Systems

Experimental Study on Decentralized Concurrent Learning for Multi-Agent System with Complex Dynamics

A Cooperative Multi-Agent Reinforcement Learning Method Based on Coordination Degree

Multi-Agent Reinforcement Learning in Time-varying Networked Systems

Convex Temporal Convolutional Network-Based Distributed Cooperative Learning Control for Multiagent Systems

Multi-Agent Concentrative Coordination with Decentralized Task Representation

Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Decentralized Multi-agent Reinforcement Learning with Multi-time Scale of Decision Epochs

Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning

Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning

Towards Efficient Multi-Agent Learning Systems

Entropy Enhanced Multi-Agent Coordination Based on Hierarchical Graph Learning for Continuous Action Space

An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control

Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning

Tracking Algorithms for Multiagent Systems

Provable distributed adaptive temporal-difference learning over time-varying networks

Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization