Abstract: To improve the efficiency of reinforcement learning (RL), we propose a novel asynchronous federated reinforcement learning (FedRL) framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To address the challenge of lagged policies in asynchronous settings, we design a delay-adaptive lookahead technique \textit{specifically for FedRL} that can effectively handle heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG and characterize the advantage of the proposed algorithm in terms of both sample complexity and time complexity. Specifically, AFedPG achieves an average per-agent sample complexity of $O(\frac{\epsilon^{-2.5}}{N})$ for global convergence. Compared to the single-agent setting with $O(\epsilon^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $O(\frac{t_{\max}}{N})$ to $O((\sum_{i=1}^{N} \frac{1}{t_{i}})^{-1})$, where $t_{i}$ denotes the time consumed per iteration at agent $i$, and $t_{\max}$ and $t_{\min}$ are the largest and smallest of these times. The latter complexity is never larger than the former, because $\sum_{i=1}^{N} \frac{1}{t_{i}} \ge \frac{N}{t_{\max}}$, and the improvement becomes significant in large-scale federated settings with heterogeneous computing power ($t_{\max}\gg t_{\min}$). Finally, we empirically verify the improved performance of AFedPG in four widely used MuJoCo environments with varying numbers of agents. We also demonstrate the advantages of AFedPG in various computing heterogeneity scenarios.
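
The time-complexity comparison in the abstract can be made concrete with a small numerical sketch. The snippet below only evaluates the two stated expressions, $\frac{t_{\max}}{N}$ and $(\sum_{i=1}^{N} \frac{1}{t_{i}})^{-1}$, with constants dropped; it is not the AFedPG algorithm. The function names and the per-iteration times are hypothetical and chosen purely for illustration.

```python
def sync_time_per_update(times):
    """Synchronous expression t_max / N: each round waits for the slowest agent."""
    return max(times) / len(times)


def async_time_per_update(times):
    """Asynchronous expression (sum_i 1/t_i)^(-1): agent i delivers updates
    at rate 1/t_i, and the rates of all agents add up."""
    return 1.0 / sum(1.0 / t for t in times)


# Hypothetical per-iteration times (arbitrary units) for N = 4 agents,
# three fast agents and one straggler that is 10x slower.
times = [1.0, 1.0, 1.0, 10.0]

sync_cost = sync_time_per_update(times)    # 10 / 4
async_cost = async_time_per_update(times)  # 1 / (1 + 1 + 1 + 0.1)
speedup = sync_cost / async_cost

print(f"synchronous:  {sync_cost:.4f}")
print(f"asynchronous: {async_cost:.4f}")
print(f"ratio:        {speedup:.2f}")
```

With identical agents (all $t_i$ equal) the two expressions coincide, so the gap in this sketch comes entirely from the single slow agent, which matches the abstract's point that the gain grows when $t_{\max}\gg t_{\min}$.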

Policy evaluation for reinforcement learning over asynchronous multi-agent networks

Fully asynchronous policy evaluation in distributed reinforcement learning over networks

Observer-Based Multiagent Deep Reinforcement Learning: A Fully Distributed Training Scheme

Target-Value-Competition-Based Multi-Agent Deep Reinforcement Learning Algorithm for Distributed Nonconvex Economic Dispatch

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation

Distributed Policy Evaluation Under Multiple Behavior Strategies

Scalable and Sample Efficient Distributed Policy Gradient Algorithms in Multi-Agent Networked Systems

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Data-Based Optimal Consensus Control for Multiagent Systems With Policy Gradient Reinforcement Learning

Asynchronous reinforcement learning algorithms for solving discrete space path planning problems

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Offline Decentralized Multi-Agent Reinforcement Learning

Reinforcement learning for multi-agent with asynchronous missing information fusion method

Communication-Efficient Policy Gradient Methods for Distributed Reinforcement Learning

A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments

Multi-Agent Reinforcement Learning in Stochastic Networked Systems

Efficient Communications for Multi-Agent Reinforcement Learning in Wireless Networks

Multi-Agent Reinforcement Learning in Time-varying Networked Systems

Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis