Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, Dacheng Tao
2024-05-20
Abstract: Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning (VRL). Although methods like resetting and regularization can potentially mitigate plasticity loss, the influences of various components within the VRL framework on the agent's plasticity are still poorly understood. In this work, we conduct a systematic empirical exploration focusing on three primary underexplored facets and derive the following insightful conclusions: (1) data augmentation is essential in maintaining plasticity; (2) the critic's plasticity loss serves as the principal bottleneck impeding efficient training; and (3) without timely intervention to recover the critic's plasticity in the early stages, its loss becomes catastrophic. These insights suggest a novel strategy to address the high replay ratio (RR) dilemma, where exacerbated plasticity loss hinders the potential improvements in sample efficiency brought by increased reuse frequency. Rather than setting a static RR for the entire training process, we propose Adaptive RR, which dynamically adjusts the RR based on the critic's plasticity level. Extensive evaluations indicate that Adaptive RR not only avoids catastrophic plasticity loss in the early stages but also benefits from more frequent reuse in later phases, resulting in superior sample efficiency.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses plasticity loss in Visual Reinforcement Learning (VRL), that is, the gradual decline in a network's ability to keep learning from new data. It investigates four points:

1. **The role of Data Augmentation (DA) in maintaining the plasticity of VRL agents**: The experiments show that data augmentation is crucial for preventing plasticity loss, and that it preserves plasticity more effectively than other interventions such as resetting network parameters.
2. **Plasticity loss in the Critic Module as the main bottleneck for training efficiency**: By comparing plasticity loss across modules (encoder, actor, critic), the paper finds that the critic's plasticity loss is what limits training most. This contradicts the earlier assumption that plasticity loss in the encoder was the main cause.
3. **Irreversibility of early-stage plasticity loss**: If the critic's plasticity is not restored by timely intervention in the early stages of training, the loss becomes catastrophic and irreversible. Maintaining plasticity early in training is therefore crucial.
4. **Dynamically adjusting the Replay Ratio (RR) to address the high RR dilemma**: The paper proposes Adaptive RR, which adjusts the replay ratio dynamically based on the critic's plasticity level. The aim is to gain the sample-efficiency benefit of a higher replay ratio without the aggravated plasticity loss that a high ratio causes early in training.

In summary, the paper analyzes how plasticity loss arises in VRL, where agents must learn from high-dimensional image observations, and uses that analysis to propose a strategy for improving sample efficiency.
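The idea of adjusting the replay ratio from the critic's plasticity level can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that plasticity is approximated by the fraction of active (positive) activations in the critic, and that training starts at a low RR and switches to a high RR once that measurement stops changing between checks. The names `fraction_active_units` and `AdaptiveReplayRatio`, and the values `low_rr=0.5`, `high_rr=2.0` and `threshold=0.001`, are hypothetical and chosen only for illustration.

```python
def fraction_active_units(activations):
    """Share of activations that are strictly positive.

    `activations` is a list of rows (one per input sample), each a list of
    post-activation outputs of the critic's hidden units. Using this share
    as a plasticity proxy is an assumption of this sketch.
    """
    total = sum(len(row) for row in activations)
    if total == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for row in activations for a in row if a > 0)
    return active / total


class AdaptiveReplayRatio:
    """Start with a low replay ratio; switch to a high one, once, when the
    plasticity proxy has stabilised between two consecutive checks."""

    def __init__(self, low_rr=0.5, high_rr=2.0, threshold=0.001):
        # All three defaults are illustrative assumptions.
        self.low_rr = low_rr
        self.high_rr = high_rr
        self.threshold = threshold
        self.rr = low_rr
        self._previous = None  # proxy value seen at the previous check

    def update(self, plasticity):
        """Record a new measurement and return the replay ratio to use."""
        stabilised = (
            self._previous is not None
            and abs(plasticity - self._previous) < self.threshold
        )
        if self.rr == self.low_rr and stabilised:
            self.rr = self.high_rr  # one-way switch: never drops back
        self._previous = plasticity
        return self.rr


# Usage: feed one measurement per check interval.
scheduler = AdaptiveReplayRatio()
for measurement in [0.60, 0.50, 0.45, 0.4495, 0.40]:
    rr = scheduler.update(measurement)
```

In a training loop, the returned value would decide how many gradient updates to run per environment step. The one-way switch reflects the summary's point that the early stage is where a high replay ratio does the most damage, while later phases can benefit from more frequent reuse.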