Abstract: Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named $\texttt{MFPO}$, that utilizes momentum, importance sampling, and additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that, with a proper selection of momentum parameters and interaction frequency, $\texttt{MFPO}$ can achieve $\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})$ interaction complexity and $\tilde{\mathcal{O}}(\epsilon^{-1})$ communication complexity, where $N$ is the number of agents. The interaction complexity achieves linear speedup in the number of agents, and the communication complexity matches the best achieved by existing first-order FL algorithms. Extensive experiments corroborate the substantial performance gains of $\texttt{MFPO}$ over existing methods on a suite of complex and high-dimensional benchmarks.
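To make the scaling in these bounds concrete, the short sketch below evaluates the two rates with all constants and logarithmic factors dropped. The function names are hypothetical, and reading $H$ as the episode horizon is an assumption, since the abstract does not define it. The numbers only illustrate how the bounds scale and are not predictions of actual sample counts.

```python
def interaction_bound(horizon, num_agents, eps):
    """H * N^-1 * eps^-1.5, ignoring constants and log factors."""
    return horizon * eps ** -1.5 / num_agents


def communication_bound(eps):
    """eps^-1, ignoring constants and log factors."""
    return 1.0 / eps


if __name__ == "__main__":
    # Hypothetical setting: H = 100 and target accuracy eps = 0.01.
    for n in (1, 10, 100):
        print(n, interaction_bound(100, n, 0.01), communication_bound(0.01))
```

Under these assumptions, going from 1 to 10 agents divides the interaction bound by 10, while the communication bound does not depend on $N$.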

What problem does this paper attempt to address?

The paper aims to jointly optimize interaction complexity and communication complexity in Federated Reinforcement Learning (FRL). Because of the spatio-temporal non-stationarity of data distributions, current FRL methods typically run into two main problems:

1. **High interaction/sampling cost**: Continuous exploration requires interacting with the real system, which may be slow, expensive, or fragile, so existing methods tend to incur excessive interaction and sampling costs.
2. **Gradient drift and increased communication complexity**: Shifting data distributions cause the stochastic gradients to vary significantly across agents, which destabilizes convergence and substantially increases communication complexity.

To address these problems, the paper proposes a new FRL algorithm, Momentum-assisted Federated Policy Optimization (MFPO). By combining momentum, importance sampling, and an additional server-side adjustment, MFPO controls the shift of stochastic policy gradients and improves the efficiency of data utilization.

The main contributions include:

- **Optimization of interaction complexity and communication complexity**: With an appropriate choice of momentum parameters and interaction frequency, MFPO achieves an interaction complexity of $\tilde{O}(HN^{-1}\epsilon^{-3/2})$ and a communication complexity of $\tilde{O}(\epsilon^{-1})$. The interaction complexity achieves linear speedup in the number of agents $N$, and the communication complexity matches the best achieved by existing first-order FL algorithms.
- **Theoretical analysis and experimental verification**: The paper provides a rigorous theoretical analysis of MFPO and shows that it outperforms existing baseline methods on a series of complex, high-dimensional benchmarks (such as Classic Control, MuJoCo, and image-based Atari games).

In summary, the paper introduces MFPO to overcome the excessive interaction and communication costs of existing FRL methods, thereby improving learning efficiency and performance.
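The summary above does not give MFPO's update rule, so the following is only a minimal sketch of how momentum and importance sampling can be combined in a federated policy-gradient loop. It uses a generic STORM-style recursive estimator on a hypothetical 3-armed bandit with a softmax policy; the estimator form, the toy environment, and all names (`momentum_is_gradient`, `run`, `beta`, `lr`) are assumptions for illustration, not the paper's algorithm, and the server-side adjustment is reduced here to plain averaging.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def grad_log_prob(theta, action):
    """Gradient of log pi_theta(action) for a softmax policy over arms."""
    g = -softmax(theta)
    g[action] += 1.0
    return g


def momentum_is_gradient(theta, theta_prev, u_prev, action, reward, beta):
    """STORM-style estimate: beta*g_new + (1-beta)*(u_prev + g_new - w*g_old)."""
    g_new = reward * grad_log_prob(theta, action)
    g_old = reward * grad_log_prob(theta_prev, action)
    # Importance weight: the action was sampled under theta, not theta_prev.
    w = softmax(theta_prev)[action] / softmax(theta)[action]
    return beta * g_new + (1.0 - beta) * (u_prev + g_new - w * g_old)


def run(num_agents=4, rounds=500, lr=0.1, beta=0.2, seed=0):
    rng = np.random.default_rng(seed)
    mean_rewards = np.array([0.1, 0.5, 0.9])  # hypothetical toy environment
    theta = np.zeros(3)
    theta_prev = theta.copy()
    u = [np.zeros(3) for _ in range(num_agents)]
    for _ in range(rounds):
        probs = softmax(theta)
        for i in range(num_agents):
            a = rng.choice(3, p=probs)
            r = mean_rewards[a] + 0.1 * rng.standard_normal()
            u[i] = momentum_is_gradient(theta, theta_prev, u[i], a, r, beta)
        # Server step (assumed): average agent estimates, one ascent step.
        theta_prev, theta = theta, theta + lr * np.mean(u, axis=0)
    return softmax(theta)


if __name__ == "__main__":
    print(run())
```

In this sketch each agent reuses its previous estimate `u_prev` and corrects it with the importance-weighted gradient at the old parameters, which is one common way to reduce variance without drawing extra samples. Averaging over agents on the server is what would give the variance reduction in $N$ in a setting like this.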

Momentum-Based Federated Reinforcement Learning with Interaction and Communication Efficiency

Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments

Communication-Efficient Consensus Mechanism for Federated Reinforcement Learning

The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication

Federated Offline Policy Optimization with Dual Regularization

Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning

FedLog: Personalized Federated Classification with Less Communication and More Flexibility

Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence

A Fair Federated Learning Framework with Reinforcement Learning

Boosting Communication Efficiency in Federated Learning for Multiagent-Based Multimicrogrid Energy Management

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations

Federated Offline Reinforcement Learning with Proximal Policy Evaluation

BR-DeFedRL: Byzantine-Robust Decentralized Federated Reinforcement Learning with Fast Convergence and Communication Efficiency

An Optimization Method for Non-IID Federated Learning Based on Deep Reinforcement Learning

Federated Ensemble Model-Based Reinforcement Learning in Edge Computing

Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates

Federated Reinforcement Learning: Techniques, Applications, and Open Challenges

A Multi-agent Reinforcement Learning Approach for Efficient Client Selection in Federated Learning

Federated Deep Reinforcement Learning for RIS-Assisted Indoor Multi-Robot Communication Systems