Abstract: Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect. In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions with excessive "aliasing", in stark contrast to the supervised learning case. We back up these findings empirically, showing that feature representations learned by a deep network value function trained via bootstrapping can indeed become degenerate, aliasing the representations for state-action pairs that appear on either side of the Bellman backup. To address this issue, we derive the form of this implicit regularizer and, inspired by this derivation, propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer. When combined with existing offline RL methods, DR3 substantially improves performance and stability, alleviating unlearning in Atari 2600 games, D4RL domains and robotic manipulation from images.
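The abstract does not spell out the form of the regularizer, so the following is only a minimal sketch of one way to quantify and penalize the "aliasing" it describes. It assumes the aliasing is measured as the mean dot product between the learned features of a state-action pair and those of the pair on the other side of the Bellman backup. The names `feature_coadaptation`, `regularized_loss`, and the coefficient `coeff` are hypothetical and chosen for illustration.

```python
import numpy as np


def feature_coadaptation(phi_sa, phi_next):
    """Mean dot product between features of (s, a) and (s', a') over a batch.

    Assumption: both inputs have shape (batch, feature_dim) and come from the
    penultimate layer of the value network. A large value suggests the two
    sides of the Bellman backup share similar (aliased) representations.
    """
    phi_sa = np.asarray(phi_sa, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    return float(np.mean(np.sum(phi_sa * phi_next, axis=1)))


def regularized_loss(td_loss, phi_sa, phi_next, coeff=0.01):
    """TD loss plus an explicit penalty on feature similarity.

    `coeff` is an illustrative hyperparameter, not a value from the source.
    """
    return td_loss + coeff * feature_coadaptation(phi_sa, phi_next)
```

In this sketch, orthogonal features on the two sides of the backup incur no penalty, while identical features incur the largest penalty for a given feature norm.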

Parseval Regularization for Continual Reinforcement Learning

STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization

Continual Learning in Human Activity Recognition: an Empirical Analysis of Regularization

A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning

Loss of Plasticity in Continual Deep Reinforcement Learning

Learning Continually by Spectral Regularization

Normalization and effective learning rates in reinforcement learning

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

Neuroplastic Expansion in Deep Reinforcement Learning

Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

Control Regularization for Reduced Variance Reinforcement Learning

Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint

RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Latent Spectral Regularization for Continual Learning

The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

RTRA: Rapid Training of Regularization-based Approaches in Continual Learning

Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization

Regularization Shortcomings for Continual Learning

Continual Task Allocation in Meta-Policy Network via Sparse Prompting