Momentum-Based Federated Reinforcement Learning with Interaction and Communication Efficiency

Sheng Yue, Xingyuan Hua, Lili Chen, Ju Ren
2024-05-29
Abstract: Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named $\texttt{MFPO}$, that utilizes momentum, importance sampling, and additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that, with a proper selection of momentum parameters and interaction frequency, $\texttt{MFPO}$ can achieve interaction and communication complexities of $\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})$ and $\tilde{\mathcal{O}}(\epsilon^{-1})$, respectively ($N$ is the number of agents). The interaction complexity achieves linear speedup in the number of agents, and the communication complexity matches the best achieved by existing first-order FL algorithms. Extensive experiments corroborate the substantial performance gains of $\texttt{MFPO}$ over existing methods on a suite of complex and high-dimensional benchmarks.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper aims to jointly reduce the interaction complexity and the communication complexity of Federated Reinforcement Learning (FRL). Because the data distribution is spatio-temporally non-stationary, current FRL methods usually run into two main problems:

1. **High interaction/sampling cost**: FRL requires continual exploration of the environment, and interaction with a real system may be slow, expensive, or fragile, so existing methods are prone to excessive interaction/sampling costs.
2. **Gradient drift and increased communication complexity**: The changing data distribution causes the stochastic gradients to vary significantly across agents, resulting in unstable convergence and a substantial increase in communication complexity.

To address these problems, the paper proposes a new FRL algorithm, Momentum-assisted Federated Policy Optimization (MFPO). By introducing momentum, importance sampling, and additional server-side adjustment, MFPO controls the shift of stochastic policy gradients and improves data utilization efficiency. The main contributions include:

- **Optimization of interaction complexity and communication complexity**: With an appropriate choice of the momentum parameters and interaction frequency, MFPO achieves an interaction complexity of \(\tilde{O}(HN^{-1}\epsilon^{-3/2})\) and a communication complexity of \(\tilde{O}(\epsilon^{-1})\), where \(N\) is the number of agents. The interaction complexity achieves a linear speedup in the number of agents, and the communication complexity matches the best achieved by existing first-order FL algorithms.
- **Theoretical analysis and experimental verification**: The paper provides a rigorous theoretical analysis of MFPO and shows experimentally that it outperforms existing baseline methods on a series of complex, high-dimensional benchmarks (such as Classic Control, MuJoCo, and image-based Atari games).

In summary, the paper introduces MFPO to overcome the excessive interaction and communication costs of existing FRL methods, thereby improving learning efficiency and performance.
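
The three ingredients named above (momentum, importance sampling, and a server-side step) can be illustrated with a minimal sketch. It assumes a generic STORM-style momentum estimator and plain parameter averaging on the server, both of which may differ from MFPO's actual update rules. The function names, the weight convention, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def importance_weight(logp_under_old: float, logp_under_new: float) -> float:
    """Ratio p(tau | theta_old) / p(tau | theta_new) for a trajectory tau
    that was sampled under the NEW policy (assumed convention)."""
    return math.exp(logp_under_old - logp_under_new)


def momentum_estimate(d_prev: Vector, g_new: Vector, g_old: Vector,
                      weight: float, beta: float) -> Vector:
    """Generic STORM-style estimator (an assumption, not MFPO's exact rule):

        d_t = beta * g_new + (1 - beta) * (d_prev + g_new - weight * g_old)

    g_new and g_old are gradients evaluated on the same fresh sample at the
    new and old parameters; `weight` corrects g_old for the policy shift.
    """
    return [beta * gn + (1.0 - beta) * (dp + gn - weight * go)
            for dp, gn, go in zip(d_prev, g_new, g_old)]


def server_average(vectors: List[Vector]) -> Vector:
    """Plain coordinate-wise averaging across agents, used here as a
    stand-in for the server-side adjustment described in the paper."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


if __name__ == "__main__":
    # Two hypothetical agents, each holding a 2-dimensional estimate.
    w = importance_weight(logp_under_old=-1.2, logp_under_new=-1.0)
    d_a = momentum_estimate([0.1, 0.2], [0.3, 0.1], [0.25, 0.15], w, beta=0.5)
    d_b = momentum_estimate([0.0, 0.3], [0.2, 0.2], [0.10, 0.25], w, beta=0.5)
    print(server_average([d_a, d_b]))
```

With `beta = 1` the estimator reduces to the plain stochastic gradient, and with `beta = 0` it carries the previous estimate forward and adds only the importance-weighted gradient difference. A momentum parameter between the two trades the variance of fresh samples against the bias of stale ones.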