Abstract: In large-scale distributed machine learning, recent works have studied the effects of compressing gradients in stochastic optimization to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? We investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our work makes three important technical contributions. First, we prove that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. Second, we show that our analysis framework extends seamlessly to nonlinear stochastic approximation schemes that subsume Q-learning. Third, we prove that for multi-agent TD learning, one can achieve linear convergence speedups with respect to the number of agents while communicating just $\tilde{O}(1)$ bits per iteration. Notably, these are the first finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our proofs hinge on the construction of novel Lyapunov functions that capture the dynamics of a memory variable introduced by error-feedback.
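
To make the idea of a compressed TD update with error-feedback concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's exact algorithm: it assumes TD(0) with linear function approximation, a top-k sparsifier as one possible compression operator, and a small Markov reward process sampled along a single trajectory. The names `top_k`, `ef_td0`, `P`, `R`, and `Phi` are hypothetical and chosen only for this example. The memory variable `e` accumulates whatever the compressor dropped and feeds it back into the next update.

```python
import numpy as np


def top_k(v, k):
    """Assumed compression operator: keep the k largest-magnitude entries (k >= 1)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out


def ef_td0(P, R, Phi, gamma=0.9, alpha=0.05, k=1, steps=20000, seed=0):
    """TD(0) with linear features, top-k compression, and error-feedback.

    P   : (n, n) transition matrix of the Markov chain (rows sum to 1)
    R   : (n,) reward received in each state
    Phi : (n, d) feature matrix, one row per state
    Returns the final parameter vector and the final memory (error) vector.
    """
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    theta = np.zeros(d)  # value-function parameters
    e = np.zeros(d)      # memory variable introduced by error-feedback
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])  # Markovian sampling
        td_error = R[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        g = td_error * Phi[s]           # uncompressed TD update direction
        h = top_k(e + g, k)             # compress direction plus accumulated error
        e = e + g - h                   # store what the compressor discarded
        theta = theta + alpha * h       # apply only the compressed direction
        s = s_next
    return theta, e


if __name__ == "__main__":
    # A hypothetical 3-state chain with tabular (identity) features.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])
    R = np.array([1.0, 0.0, 0.0])
    theta, e = ef_td0(P, R, np.eye(3), k=1)
    print("estimated values:", theta)
    print("residual memory:", e)
```

With `k` equal to the feature dimension, the compressor passes everything through, the memory stays at zero, and the loop reduces to plain TD(0); smaller `k` transmits fewer coordinates per step while the memory defers the rest to later iterations.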

State-Temporal Compression in Reinforcement Learning with the Reward-Restricted Geodesic Metric

Episodic Reinforcement Learning with Expanded State-reward Space

Robust Predictable Control

Time Series Compression based on Reinforcement Learning

Reinforcement Learning with Generalizable Gaussian Splatting

Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning

Quasimetric Value Functions with Dense Rewards

RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

Density-based Curriculum for Multi-goal Reinforcement Learning with Sparse Rewards

Quantile Regression Hindsight Experience Replay

Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning

TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations

TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning

Learning Long-Term Reward Redistribution via Randomized Return Decomposition

RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space

Reinforcement Learning for Robust Header Compression under Model Uncertainty

Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning

Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations

Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization

Compressed Federated Reinforcement Learning with a Generative Model

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards