Abstract: When the data used for reinforcement learning (RL) are collected by multiple agents in a distributed manner, federated versions of RL algorithms allow collaborative learning without the need for agents to share their local data. In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning. In both cases, our bounds exhibit a linear speedup with respect to the number of agents and near-optimal dependencies on other salient problem parameters. In the asynchronous setting, existing analyses of federated Q-learning, which adopt an equally weighted averaging of local Q-estimates, require that every agent covers the entire state-action space. In contrast, our improved sample complexity scales inversely with the minimum entry of the average stationary state-action occupancy distribution of all agents, and thus only requires the agents to collectively cover the entire state-action space. This unveils the blessing of heterogeneity in enabling collaborative learning by relaxing the coverage requirement of the single-agent case. However, the sample complexity under equally weighted averaging still suffers when the local trajectories are highly heterogeneous. In response, we propose a novel federated Q-learning algorithm with importance averaging, which gives larger weights to more frequently visited state-action pairs and achieves a robust linear speedup as if all trajectories were centrally processed, regardless of the heterogeneity of local behavior policies.
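
The two aggregation rules contrasted in the abstract can be illustrated with a minimal sketch. It assumes tabular Q-estimates stored as nested lists indexed by state and action, and it uses weights directly proportional to each agent's visit counts in the current aggregation period. That weighting is an illustrative assumption, not necessarily the exact weights used in the paper, and the function names `equal_average` and `importance_average` are hypothetical.

```python
def equal_average(local_qs):
    """Equally weighted average of the agents' local Q-tables."""
    num_agents = len(local_qs)
    num_states = len(local_qs[0])
    num_actions = len(local_qs[0][0])
    return [
        [sum(q[s][a] for q in local_qs) / num_agents for a in range(num_actions)]
        for s in range(num_states)
    ]


def importance_average(local_qs, visit_counts, previous_q):
    """Per-entry weighted average of local Q-tables.

    Assumption: weights are proportional to how often each agent visited
    (s, a) since the last aggregation. Entries that no agent visited keep
    their value from previous_q.
    """
    merged = [row[:] for row in previous_q]  # copy so the input is not mutated
    for s in range(len(previous_q)):
        for a in range(len(previous_q[0])):
            total = sum(counts[s][a] for counts in visit_counts)
            if total > 0:
                weighted = sum(
                    counts[s][a] * q[s][a]
                    for counts, q in zip(visit_counts, local_qs)
                )
                merged[s][a] = weighted / total
    return merged


# Two agents, one state, two actions. Each agent only ever tries one action,
# so neither covers the state-action space alone, but together they do.
local_qs = [[[1.0, 0.0]], [[0.0, 2.0]]]
visit_counts = [[[4, 0]], [[0, 4]]]
previous_q = [[0.0, 0.0]]

equal_result = equal_average(local_qs)
importance_result = importance_average(local_qs, visit_counts, previous_q)
```

In this toy case, equal averaging dilutes each updated entry with the other agent's untouched initial value, while the count-weighted rule keeps the estimate from the agent that actually visited the pair.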

Federated Reinforcement Learning with Environment Heterogeneity

Federated Reinforcement Learning with Constraint Heterogeneity

Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments

On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations

Data Quality Aware Hierarchical Federated Reinforcement Learning Framework for Dynamic Treatment Regimes

Federated Temporal Difference Learning with Linear Function Approximation under Environmental Heterogeneity

Client Selection for Federated Policy Optimization with Environment Heterogeneity

On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments

Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning

FedAPEN: Personalized Cross-silo Federated Learning with Adaptability to Statistical Heterogeneity

Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

Reward Shaping Based Federated Reinforcement Learning

Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Federated Stochastic Approximation under Markov Noise and Heterogeneity: Applications in Reinforcement Learning

The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

Federated Ensemble Model-Based Reinforcement Learning in Edge Computing

Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

Towards Personalized Federated Learning via Heterogeneous Model Reassembly

Dynamic Fair Federated Learning Based on Reinforcement Learning

CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening