Dataset Distillation for Offline Reinforcement Learning

Jonathan Light, Yuanzhe Liu, Ziniu Hu
2024-08-01
Abstract: Offline reinforcement learning often requires a quality dataset that we can train a policy on. However, in many situations, it is not possible to get such a dataset, nor is it easy to train a policy to perform well in the actual environment given the offline data. We propose using data distillation to train and distill a better dataset which can then be used for training a better policy model. We show that our method is able to synthesize a dataset where a model trained on it achieves similar performance to a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io. We also provide our implementation at https://github.com/ggflow123/DDRL.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses a key challenge in offline reinforcement learning (offline RL): how to train a well-performing policy model without interacting with the environment online. Specifically, it proposes using dataset distillation to synthesize a higher-quality offline dataset for training better policy models.

#### Main problems:

1. **Offline data quality**: Offline RL typically relies on a dataset generated by an expert policy, but in practice the available data is often generated by sub-optimal policies, which limits how well a trained policy can perform.
2. **Distributional shift**: A policy trained offline may induce a data distribution that differs from the one in the original dataset, which degrades its generalization ability.
3. **Data volume and efficiency**: Traditional offline RL methods usually require a large amount of data. The proposed method improves sample efficiency by synthesizing a smaller but higher-quality dataset.

#### Solutions:

- **Data distillation method**: The paper uses data distillation to train and distill a higher-quality offline dataset. A model trained on the synthesized dataset achieves similar or even better performance than a model trained on the complete dataset or with percentile behavioral cloning.
- **Gradient matching loss**: To make the synthesized dataset effective, the authors introduce a gradient matching loss that minimizes the difference between the gradients computed on the real dataset and those computed on the synthesized dataset.

#### Experimental verification:

- **Experimental environment**: The authors ran experiments in Procgen, a suite of 16 procedurally generated environments developed by OpenAI. These environments make it possible to evaluate a model's adaptability and generalization across different scenarios.
- **Experimental results**: In multiple environments, the student model trained on the synthesized dataset performs comparably to, or even better than, the student model trained on the complete dataset or with percentile behavioral cloning, with a significant improvement in sample efficiency.

In summary, the core contribution of this paper is to improve data quality in offline reinforcement learning through data distillation, thereby improving the training effectiveness and generalization ability of the policy model.
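The gradient matching loss described under Solutions can be illustrated with a small, standard-library-only sketch. This is not the authors' implementation. It makes several assumptions for illustration: the policy is a linear softmax classifier trained by behavioral cloning, the matching loss is the squared L2 distance between gradients, the policy weights are held fixed, and the synthetic states are updated by finite differences rather than automatic differentiation. The names `bc_gradient`, `gradient_matching_loss`, and `distill_step` are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bc_gradient(weights, states, actions):
    """Gradient of the mean cross-entropy (behavioral cloning) loss w.r.t. weights."""
    n_actions, n_features = len(weights), len(weights[0])
    grad = [[0.0] * n_features for _ in range(n_actions)]
    for s, a in zip(states, actions):
        logits = [sum(w * x for w, x in zip(row, s)) for row in weights]
        probs = softmax(logits)
        for k in range(n_actions):
            err = probs[k] - (1.0 if k == a else 0.0)
            for j in range(n_features):
                grad[k][j] += err * s[j] / len(states)
    return grad

def gradient_matching_loss(weights, real, synthetic):
    """Squared L2 distance between real-data and synthetic-data gradients (assumed form)."""
    g_real = bc_gradient(weights, *real)
    g_syn = bc_gradient(weights, *synthetic)
    return sum((r - s) ** 2
               for row_r, row_s in zip(g_real, g_syn)
               for r, s in zip(row_r, row_s))

def distill_step(weights, real, synthetic, lr, eps=1e-4):
    """One finite-difference descent step on the synthetic states; labels stay fixed."""
    syn_states, syn_actions = synthetic
    base = gradient_matching_loss(weights, real, synthetic)
    new_states = [list(s) for s in syn_states]
    for i, s in enumerate(syn_states):
        for j in range(len(s)):
            bumped = [list(t) for t in syn_states]
            bumped[i][j] += eps
            bumped_loss = gradient_matching_loss(weights, real, (bumped, syn_actions))
            new_states[i][j] = s[j] - lr * (bumped_loss - base) / eps
    return new_states, syn_actions

# Toy data: 20 "real" transitions labeled by a made-up rule, 4 synthetic ones.
random.seed(0)
real_states = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
real = (real_states, [0 if s[0] > 0 else 1 for s in real_states])
synthetic = ([[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)], [0, 1, 0, 1])
weights = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]

initial_loss = gradient_matching_loss(weights, real, synthetic)
final_loss, lr = initial_loss, 0.5
for _ in range(30):
    candidate = distill_step(weights, real, synthetic, lr)
    candidate_loss = gradient_matching_loss(weights, real, candidate)
    if candidate_loss < final_loss:   # accept only steps that improve the match
        synthetic, final_loss = candidate, candidate_loss
    else:                             # otherwise shrink the step size
        lr *= 0.5
print("matching loss before:", initial_loss, "after:", final_loss)
```

Here the 4 synthetic states are adjusted so that their behavioral cloning gradient approaches the gradient from the 20 real transitions. A practical setup would more likely match gradients across many sampled or partially trained networks and use automatic differentiation. The fixed weights and finite differences only keep the sketch dependency-free.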