Abstract: Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. Overall, our work introduces new tools that open a direction towards developing a theory of learning dynamics in reinforcement learning.
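To make the setting concrete, the following is a minimal sketch of semi-gradient temporal difference learning, TD(0), for policy evaluation with a linear value function on a small Markov chain. It is not the paper's experimental code: the function name `td0_linear`, the toy transition matrix, the reward vector, the tabular (identity) features, and the per-step online update are all assumptions chosen for illustration. It records the value error after each sampled episode, which is the kind of learning curve the abstract refers to.

```python
import numpy as np

def td0_linear(P, r, features, gamma=0.9, lr=0.05, episodes=200, horizon=50, seed=0):
    """Semi-gradient TD(0) with a linear value function V(s) = features[s] @ w.

    P        : (n_states, n_states) transition matrix under a fixed policy
    r        : (n_states,) reward received in each state
    features : (n_states, n_features) feature matrix
    Returns the learned weights and the value error after each episode
    (index 0 holds the error before any update).
    """
    rng = np.random.default_rng(seed)
    n_states, n_features = features.shape
    w = np.zeros(n_features)

    # Exact value function for reference: V = (I - gamma * P)^{-1} r
    v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

    errors = [np.mean((features @ w - v_true) ** 2)]
    for _ in range(episodes):
        s = rng.integers(n_states)  # random start state
        for _ in range(horizon):
            s_next = rng.choice(n_states, p=P[s])
            # TD error: the bootstrapped target is treated as a constant,
            # which is what makes this a semi-gradient update.
            td_error = r[s] + gamma * features[s_next] @ w - features[s] @ w
            w += lr * td_error * features[s]
            s = s_next
        errors.append(np.mean((features @ w - v_true) ** 2))
    return w, np.array(errors)

# Hypothetical toy problem: 5 states, random transitions, rewards in [0, 1).
toy_rng = np.random.default_rng(1)
n = 5
P = toy_rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # normalize rows into probabilities
r = toy_rng.random(n)
features = np.eye(n)  # tabular features, an assumption for simplicity

w, errors = td0_linear(P, r, features)
print("initial value error:", errors[0])
print("final value error:  ", errors[-1])
```

With a constant learning rate, the sampling noise in the updates keeps the value error from decaying to zero. Decaying `lr` over episodes would be one simple way to experiment with the learning rate annealing mentioned above.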

A Multi-step Loss Function for Robust Learning of the Dynamics in Model-based Reinforcement Learning

Combating the Compounding-Error Problem with a Multi-step Model

Towards a Simple Approach to Multi-step Model-based Reinforcement Learning

Investigating Compounding Prediction Errors in Learned Dynamics Models

Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning

Loss Dynamics of Temporal Difference Reinforcement Learning

Improved deep learning of chaotic dynamical systems with multistep penalty losses

Learning Long-Horizon Predictions for Quadrotor Dynamics

Robust Reinforcement Learning under Diffusion Models for Data with Jumps

In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning

Dynamic Loss For Robust Learning

Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales

Learning Dynamics Models for Model Predictive Agents

Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning

Model-Based Reinforcement Learning via Meta-Policy Optimization

A Note on Loss Functions and Error Compounding in Model-based Reinforcement Learning

Deep Incremental Model Based Reinforcement Learning: A One-Step Lookback Approach for Continuous Robotics Control

Dynamics-Aware Loss for Learning with Label Noise

A Two-Stage Multi-Objective Deep Reinforcement Learning Framework

Adaptive Asynchronous Control Using Meta-learned Neural Ordinary Differential Equations