Abstract: A key challenge in lifelong reinforcement learning (RL) is the loss of plasticity, where previous learning progress hinders an agent's adaptation to new tasks. While regularization and resetting can help, they require precise hyperparameter selection at the outset and environment-dependent adjustments. Building on the principled theory of online convex optimization, we present a parameter-free optimizer for lifelong RL, called TRAC, which requires no tuning or prior knowledge about the distribution shifts. Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary.

What problem does this paper attempt to address?

This paper addresses **loss of plasticity** in lifelong reinforcement learning (lifelong RL). When the task distribution in the environment changes, previous learning progress can impede the agent's ability to adapt to new tasks. The agent may then perform poorly on new tasks and can even suffer negative transfer, where earlier learning experience actively harms performance on later tasks.

### Core problems of the paper

1. **Loss of plasticity**: In lifelong RL, as the environment changes, the agent's parameters may gradually drift away from values that work well, leaving it unable to adapt quickly to new tasks. This is known as "loss of plasticity": the agent loses the ability to respond to new changes in the environment.
2. **Difficulty of hyperparameter tuning**: Existing methods such as regularization and resetting can alleviate loss of plasticity to some extent, but they rely on precise hyperparameter selection and must be adjusted for each environment. In lifelong RL the specific changes in the environment are not known in advance, so effective hyperparameter optimization is difficult.

### Solutions

To address these problems, the paper proposes a parameter-free optimizer named **TRAC (Adaptive Regularization in Continual environments)**. Its main features are:

- **No hyperparameter tuning**: TRAC builds on Online Convex Optimization (OCO) theory and requires no tuning or prior knowledge of the distribution shifts. It adaptively selects the regularization strength, which keeps the parameters stable and mitigates loss of plasticity.
- **Adaptation to nonconvex, nonstationary optimization problems**: Although the optimization problems in lifelong RL are nonconvex and nonstationary, TRAC still adapts to them effectively.

Experimental results show that TRAC performs well in multiple benchmark environments and adapts quickly to changes in task distribution.

### Experimental verification

The paper reports extensive experiments in multiple benchmark environments to verify the effectiveness of TRAC:

- **Procgen environment**: TRAC PPO performs well across multiple game environments, avoids loss of plasticity, and rapidly increases reward on new tasks.
- **Atari environment**: TRAC PPO shows a significant performance improvement when switching between games with different action spaces. Within the first few million steps in particular, its average reward is much higher than that of the baseline methods.
- **Gym Control environment**: TRAC PPO maintains good performance under extreme distribution changes, while other methods such as Adam PPO and CReLU degrade severely.

In conclusion, by introducing the TRAC optimizer, this paper mitigates loss of plasticity in lifelong RL and demonstrates strong performance across a variety of complex environments.
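To make the idea of a "regularization strength" from the Solutions section more concrete, here is a minimal Python sketch. It rests on an assumption the summary does not spell out: that regularization means pulling the parameters toward a fixed reference point (for example, the initialization), controlled by a single scalar. The function name `interpolate` and the argument names are hypothetical, and the rule TRAC uses to choose the scalar automatically is not reproduced here.

```python
def interpolate(theta_ref, theta_base, scale):
    """Blend a reference point with a base optimizer's iterate.

    Assumed form (not confirmed by the summary):
        theta = theta_ref + scale * (theta_base - theta_ref)

    scale = 0 returns the reference point (strongest pull back),
    scale = 1 returns the base optimizer's iterate (no pull back).
    """
    return [r + scale * (b - r) for r, b in zip(theta_ref, theta_base)]


# Hypothetical values: a reference point and an iterate proposed by
# some base optimizer such as Adam.
theta_ref = [1.0, 1.0]
theta_base = [3.0, -1.0]

for scale in (0.0, 0.25, 1.0):
    print(scale, interpolate(theta_ref, theta_base, scale))
```

In this picture, a hand-tuned method would fix `scale` (or an equivalent penalty weight) in advance, whereas a parameter-free method would adjust it online from the observed gradients.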

Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning

DROP: Conservative Model-based Optimization for Offline Reinforcement Learning

An Off-Policy Trust Region Policy Optimization Method with Monotonic Improvement Guarantee for Deep Reinforcement Learning

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Demonstration Data-Driven Parameter Adjustment for Trajectory Planning in Highly Constrained Environments

Train Trajectory Optimization with High-Risk State Space Boundaries: A Safe Reinforcement Learning Approach

A Tractable Inference Perspective of Offline RL

Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Can Learned Optimization Make Reinforcement Learning Less Difficult?

Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning

Metatrace: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Reparameterized Policy Learning for Multimodal Trajectory Optimization

Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories

TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents

Learning to Optimize for Reinforcement Learning

Online hyperparameter optimization by real-time recurrent learning

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

Trajectory-Oriented Policy Optimization with Sparse Rewards

PTDRL: Parameter Tuning using Deep Reinforcement Learning

Real-Time Recurrent Reinforcement Learning