Abstract: Recent progress has been made in understanding optimisation dynamics in neural networks trained with full-batch gradient descent with momentum with the uncovering of the edge of stability phenomenon in supervised learning. The edge of stability phenomenon occurs as the leading eigenvalue of the Hessian reaches the divergence threshold of the underlying optimisation algorithm for a quadratic loss, after which it starts oscillating around the threshold, and the loss starts to exhibit local instability but decreases over long time frames. In this work, we explore the edge of stability phenomenon in reinforcement learning (RL), specifically off-policy Q-learning algorithms across a variety of data regimes, from offline to online RL. Our experiments reveal that, despite significant differences to supervised learning, such as non-stationarity of the data distribution and the use of bootstrapping, the edge of stability phenomenon can be present in off-policy deep RL. Unlike supervised learning, however, we observe strong differences depending on the underlying loss, with DQN -- using a Huber loss -- showing a strong edge of stability effect that we do not observe with C51 -- using a cross entropy loss. Our results suggest that, while neural network structure can lead to optimisation dynamics that transfer between problem domains, certain aspects of deep RL optimisation can differentiate it from domains such as supervised learning.

What problem does this paper attempt to address?

The paper asks whether the "edge of stability" phenomenon occurs in reinforcement learning (RL). The phenomenon was first observed in supervised learning: when a neural network is trained with full-batch gradient descent, the leading eigenvalue of the loss Hessian gradually increases until it reaches the divergence threshold of the optimization algorithm. The eigenvalue then oscillates around that threshold, and the loss becomes locally unstable but still decreases over long time frames. By studying two off-policy RL algorithms, DQN and C51, across data regimes ranging from offline to online, the paper examines whether the phenomenon carries over to RL and how its behavior there compares with supervised learning. Specifically, the paper focuses on the following aspects:

1. **Existence of the Edge of Stability Phenomenon**: Test whether DQN and C51 exhibit the edge of stability phenomenon in different RL settings (such as offline and online learning).
2. **Impact of Algorithm Differences**: Compare how DQN (which uses a Huber loss) and C51 (which uses a cross-entropy loss) behave with respect to the edge of stability, and examine how the choice of loss function affects the optimization dynamics.
3. **Impact of Data Distribution**: Study how changes in the data distribution (such as the transition from offline to online) affect whether the phenomenon occurs, and in particular whether it persists when the data distribution is non-stationary.
4. **Characteristics of Optimization Dynamics**: Analyze how techniques such as bootstrapping make the optimization dynamics of RL differ from those of supervised learning, and how these differences affect the edge of stability phenomenon.

Through this study, the paper aims to deepen the understanding of optimization dynamics in RL and to inform the design of more effective training strategies.
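The "divergence threshold" can be made concrete on a one-dimensional quadratic loss. The sketch below is illustrative and is not code from the paper: the function names, learning rate, and momentum value are assumptions chosen for the example. It relies on the standard result that heavy-ball gradient descent with learning rate `lr` and momentum `beta` on f(x) = 0.5 * curvature * x^2 is stable only when the curvature is below (2 + 2 * beta) / lr, which reduces to 2 / lr for plain gradient descent.

```python
def final_distance(curvature, lr, beta=0.0, steps=200, x0=1.0):
    """Run heavy-ball gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns |x| after `steps` updates; beta = 0 recovers plain gradient descent.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    x, velocity = x0, 0.0
    for _ in range(steps):
        grad = curvature * x                    # gradient of the quadratic
        velocity = beta * velocity - lr * grad  # heavy-ball momentum update
        x = x + velocity
    return abs(x)


def divergence_threshold(lr, beta=0.0):
    """Largest stable curvature for heavy-ball momentum on a quadratic."""
    return (2.0 + 2.0 * beta) / lr


lr, beta = 0.1, 0.9                      # assumed values for the demonstration
threshold = divergence_threshold(lr, beta)
below = final_distance(0.95 * threshold, lr, beta)  # curvature just under the threshold
above = final_distance(1.05 * threshold, lr, beta)  # curvature just over the threshold
print(f"threshold={threshold:.1f}  below={below:.3e}  above={above:.3e}")
```

With curvature slightly below the threshold the iterate shrinks toward the minimum, and slightly above it the iterate grows without bound. In a neural network the role of `curvature` is played by the leading Hessian eigenvalue, which changes during training. This is why the edge of stability regime shows that eigenvalue hovering around the threshold instead of the loss simply diverging.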

Investigating the Edge of Stability Phenomenon in Reinforcement Learning

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

Understanding and Diagnosing Deep Reinforcement Learning

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

Maximum Entropy Reinforcement Learning with Evolution Strategies

Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces

Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD

Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities

Dissecting Deep RL with High Update Ratios: Combatting Value Divergence

Learning to Optimize for Reinforcement Learning

Survival Instinct in Offline Reinforcement Learning

Conformal Symplectic Optimization for Stable Reinforcement Learning

Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning

Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Instabilities of Offline RL with Pre-Trained Neural Representation

EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge

A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks

Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL

Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization