Abstract: Offline reinforcement learning often requires a quality dataset that we can train a policy on. However, in many situations, it is not possible to get such a dataset, nor is it easy to train a policy to perform well in the actual environment given the offline data. We propose using data distillation to train and distill a better dataset which can then be used for training a better policy model. We show that our method is able to synthesize a dataset where a model trained on it achieves similar performance to a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io. We also provide our implementation at https://github.com/ggflow123/DDRL.
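The abstract names percentile behavioral cloning as a baseline without defining it. As it is commonly understood, it keeps only the highest-return fraction of the offline episodes and runs ordinary behavioral cloning on what remains. The following is a minimal sketch of the filtering step under that assumption. The episode layout (a dict with `states`, `actions`, and `return` keys) and the function names are hypothetical and are not taken from the paper's repository.

```python
import math


def top_percent_episodes(episodes, percent):
    """Keep the top `percent` of episodes, ranked by episode return.

    `episodes` is a list of dicts with hypothetical keys
    'states', 'actions', and 'return'.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ranked = sorted(episodes, key=lambda ep: ep["return"], reverse=True)
    # Always keep at least one episode so the cloning set is never empty.
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:keep]


def to_bc_pairs(episodes):
    """Flatten episodes into (state, action) pairs for supervised cloning."""
    return [
        (state, action)
        for ep in episodes
        for state, action in zip(ep["states"], ep["actions"])
    ]


# Toy offline dataset: four one-step episodes with different returns.
toy_episodes = [
    {"states": [[0.0]], "actions": [0], "return": 1.0},
    {"states": [[1.0]], "actions": [1], "return": 5.0},
    {"states": [[2.0]], "actions": [0], "return": 3.0},
    {"states": [[3.0]], "actions": [1], "return": 9.0},
]

best_half = top_percent_episodes(toy_episodes, percent=50)
bc_pairs = to_bc_pairs(best_half)
print([ep["return"] for ep in best_half])  # [9.0, 5.0]
```

The resulting `bc_pairs` would then be fed to any supervised classifier or regressor that maps states to actions.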

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses a key challenge in offline reinforcement learning (offline RL): how to train a well-performing policy model without interacting online with the environment. Specifically, the paper proposes using dataset distillation to synthesize a higher-quality offline dataset for training better policy models.

#### Main problems:

1. **Quality of offline data**: Offline RL depends on a dataset generated by an expert policy, but in practice we can often only obtain data generated by suboptimal policies, which limits how well a trained policy can perform.
2. **Distributional shift**: A policy trained offline may induce a data distribution that differs from the original dataset, which degrades generalization.
3. **Data volume and efficiency**: Traditional offline RL methods usually require a large amount of data. The method proposed in this paper can improve sample efficiency by synthesizing a smaller but higher-quality dataset.

#### Solutions:

- **Data distillation method**: The paper proposes using data distillation to train and distill a higher-quality offline dataset. A model trained on the synthesized dataset achieves similar or even better performance than one trained on the complete dataset or with percentile behavioral cloning.
- **Gradient matching loss**: To make the synthesized dataset effective, the authors introduce a gradient matching loss that minimizes the difference between the gradients computed on the real dataset and those computed on the synthesized dataset.

#### Experimental verification:

- **Experimental environment**: The authors conducted experiments in Procgen, a suite of 16 procedurally generated environments developed by OpenAI. These environments make it possible to evaluate the adaptability and generalization of the model across different scenarios.
- **Experimental results**: The student model trained on the synthesized dataset shows comparable or even better performance in multiple environments than the student model trained on the complete dataset or with percentile behavioral cloning, with a notable improvement in sample efficiency.

In summary, the core goal of this paper is to improve data quality in offline reinforcement learning through data distillation, thereby improving the training outcome and generalization ability of the policy model.
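The gradient matching idea described in the summary can be illustrated with a small, self-contained sketch. Everything specific here is an assumption made for illustration rather than the authors' implementation: the policy is a linear softmax classifier trained with a behavioral cloning cross-entropy loss, the gradient difference is measured as a squared L2 distance, and the function names are hypothetical. In a full distillation loop the synthetic states and actions would be updated to reduce this loss, and that step is omitted here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_gradient(weights, states, actions):
    """Gradient of the mean cross-entropy of a linear softmax policy.

    `weights` is an n_actions x n_features list of lists; the returned
    gradient has the same shape.
    """
    n_actions, n_features = len(weights), len(weights[0])
    grad = [[0.0] * n_features for _ in range(n_actions)]
    for state, action in zip(states, actions):
        logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
        probs = softmax(logits)
        for k in range(n_actions):
            # d(cross-entropy)/d(logit_k) = p_k - 1[k == action]
            err = probs[k] - (1.0 if k == action else 0.0)
            for j in range(n_features):
                grad[k][j] += err * state[j] / len(states)
    return grad


def gradient_matching_loss(weights, real_batch, synthetic_batch):
    """Squared L2 distance between real and synthetic policy gradients."""
    g_real = bc_gradient(weights, *real_batch)
    g_syn = bc_gradient(weights, *synthetic_batch)
    return sum(
        (r - s) ** 2
        for row_r, row_s in zip(g_real, g_syn)
        for r, s in zip(row_r, row_s)
    )


# Two actions, two features, zero-initialized policy weights.
weights = [[0.0, 0.0], [0.0, 0.0]]
real = ([[1.0, 0.0], [0.0, 1.0]], [0, 1])
matching_synthetic = ([[1.0, 0.0], [0.0, 1.0]], [0, 1])
swapped_synthetic = ([[1.0, 0.0], [0.0, 1.0]], [1, 0])

loss_same = gradient_matching_loss(weights, real, matching_synthetic)
loss_swapped = gradient_matching_loss(weights, real, swapped_synthetic)
print(loss_same)     # 0.0
print(loss_swapped)  # 1.0
```

A synthetic batch that induces the same gradient as the real data yields zero loss, while one with swapped action labels pushes the policy in the opposite direction and is penalized. Other distance measures, such as cosine distance, would fit the same structure.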

Dataset Distillation for Offline Reinforcement Learning
