Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency

Yanxiao Zhao, Yangge Qian, Tianyi Wang, Jingyang Shan, Xiaolin Qin
2024-03-12
Abstract: Deep reinforcement learning (DRL) algorithms require substantial samples and computational resources to achieve high performance, which restricts their practical application and poses challenges for further development. Given limited resources, it is essential to leverage existing computational work (e.g., learned policies, samples) to improve sample efficiency and reduce the computational cost of DRL algorithms. Previous approaches to leveraging existing computational work require intrusive modifications to existing algorithms and models, and are tailored to particular algorithms, so they lack flexibility and generality. In this paper, we present the Snapshot Reinforcement Learning (SnapshotRL) framework, which improves sample efficiency by altering only the environment, without any modification to algorithms or models. By allowing student agents to choose states in teacher trajectories as initial states for sampling, SnapshotRL uses teacher trajectories to assist student training, allowing student agents to explore a larger state space in the early training phase. We propose a simple and effective SnapshotRL baseline algorithm, S3RL, which integrates well with existing DRL algorithms. Our experiments demonstrate that integrating S3RL with the TD3, SAC, and PPO algorithms on the MuJoCo benchmark significantly improves sample efficiency and average return, without extra samples or additional computational resources.
Machine Learning
What problem does this paper attempt to address?
This paper aims to improve sample efficiency and reduce computational resource consumption in deep reinforcement learning (DRL). Existing DRL algorithms need large numbers of samples and substantial compute to reach high performance, which limits their practical application and further development. When resources are limited, it becomes particularly important to reuse existing computational work (such as learned policies and samples). However, previous approaches usually require intrusive modifications to existing algorithms and models, and because these modifications are designed for specific algorithms, they lack flexibility and generality. To address this, the authors propose the Snapshot Reinforcement Learning (SnapshotRL) framework, which improves sample efficiency by changing only the environment, with no modification to algorithms or models. The student agent is allowed to select states from the teacher's trajectories as initial states for sampling, so the teacher's trajectories assist the student's training and let the student explore a larger state space in the early training stage. The paper also proposes a simple and effective SnapshotRL baseline algorithm, S3RL, and reports experiments in which combining S3RL with the TD3, SAC, and PPO algorithms on the MuJoCo benchmark significantly improves sample efficiency and average return, without extra samples or additional computational resources.
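The core idea described above (starting student episodes from states found in teacher trajectories, by changing only the environment) can be illustrated with a small sketch. This is not the authors' S3RL algorithm, whose details are not given here; it is a minimal illustration under stated assumptions. The toy environment `ToyChainEnv`, its `set_state` method, the wrapper `SnapshotResetWrapper`, and the `snapshot_episodes` parameter are all hypothetical names introduced for this example, and the sketch assumes the simulator can restore a previously visited state.

```python
import random
from typing import Callable, List, Tuple


class ToyChainEnv:
    """Hypothetical 1-D chain: start at 0, reward 1.0 on reaching `goal`."""

    def __init__(self, goal: int = 10, max_steps: int = 20) -> None:
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self) -> int:
        # Default reset: always start from position 0.
        self.pos, self.t = 0, 0
        return self.pos

    def set_state(self, pos: int) -> int:
        # Restore a saved state (a "snapshot"). Assumption: the
        # simulator supports restoring arbitrary visited states.
        self.pos, self.t = pos, 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action is -1 (left) or +1 (right); position is clipped to [0, goal].
        self.pos = max(0, min(self.goal, self.pos + action))
        self.t += 1
        reached = self.pos == self.goal
        done = reached or self.t >= self.max_steps
        return self.pos, (1.0 if reached else 0.0), done


def collect_teacher_states(env: ToyChainEnv,
                           policy: Callable[[int], int]) -> List[int]:
    """Roll out a teacher policy once and record every non-terminal state."""
    states: List[int] = []
    state, done = env.reset(), False
    while not done:
        states.append(state)
        state, _, done = env.step(policy(state))
    return states


class SnapshotResetWrapper:
    """Environment wrapper: for the first `snapshot_episodes` episodes,
    reset() starts from a random teacher state instead of the default
    initial state. The learning algorithm itself is left untouched."""

    def __init__(self, env: ToyChainEnv, teacher_states: List[int],
                 snapshot_episodes: int, seed: int = 0) -> None:
        self.env = env
        self.teacher_states = list(teacher_states)
        self.snapshot_episodes = snapshot_episodes
        self.rng = random.Random(seed)
        self.episodes = 0

    def reset(self) -> int:
        self.episodes += 1
        if self.episodes <= self.snapshot_episodes and self.teacher_states:
            return self.env.set_state(self.rng.choice(self.teacher_states))
        return self.env.reset()

    def step(self, action: int) -> Tuple[int, float, bool]:
        return self.env.step(action)


if __name__ == "__main__":
    env = ToyChainEnv()
    teacher_states = collect_teacher_states(env, lambda s: 1)  # teacher always moves right
    wrapped = SnapshotResetWrapper(env, teacher_states, snapshot_episodes=3)
    for episode in range(5):
        print("episode", episode, "starts at state", wrapped.reset())
```

Because the wrapper exposes the same `reset` and `step` interface as the environment it wraps, any agent written against that interface can train on it unchanged, which is the sense in which the framework avoids modifying algorithms or models. How teacher states are selected and when to switch back to the default initial state are design choices that this sketch makes arbitrarily (uniform sampling, fixed episode count).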