GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

Jaewoo Lee, Sujin Yun, Taeyoung Yun, Jinkyoo Park

2024-06-12

Abstract: Offline Reinforcement Learning (Offline RL) presents challenges of learning effective decision-making policies from static datasets without any online interactions. Data augmentation techniques, such as noise injection and data synthesizing, aim to improve Q-function approximation by smoothing the learned state-action region. However, these methods often fall short of directly improving the quality of offline datasets, leading to suboptimal results. In response, we introduce **GTA**, Generative Trajectory Augmentation, a novel generative data augmentation approach designed to enrich offline data by augmenting trajectories to be both high-rewarding and dynamically plausible. GTA applies a diffusion model within the data augmentation framework. GTA partially noises original trajectories and then denoises them with classifier-free guidance via conditioning on amplified return value. Our results show that GTA, as a general data augmentation strategy, enhances the performance of widely used offline RL algorithms in both dense and sparse reward settings. Furthermore, we conduct a quality analysis of data augmented by GTA and demonstrate that GTA improves the quality of the data. Our code is available at https://github.com/Jaewoopudding/GTA
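
For orientation, the two operations named in the abstract (partial noising and classifier-free guidance) can be written in the standard DDPM and classifier-free-guidance notation. This is a sketch under assumptions: the symbols $\bar{\alpha}_k$, $w$, $\tilde{R}$, and $\epsilon_\theta$ are conventional choices rather than the paper's confirmed notation, and the rule that produces the amplified return $\tilde{R}$ is not specified here.

```latex
% Partial noising: diffuse a clean trajectory x^0 only up to an intermediate step k < K
x^{k} = \sqrt{\bar{\alpha}_{k}}\, x^{0} + \sqrt{1 - \bar{\alpha}_{k}}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Classifier-free guidance: mix unconditional and return-conditioned noise predictions
\hat{\epsilon} = \epsilon_\theta(x^{t}, t, \varnothing)
  + w \left( \epsilon_\theta(x^{t}, t, \tilde{R}) - \epsilon_\theta(x^{t}, t, \varnothing) \right)
```

With $w = 0$ the prediction is purely unconditional and with $w = 1$ it is purely conditional; larger $w$ pushes samples further toward the conditioning return.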

Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The paper addresses data augmentation in Offline Reinforcement Learning (Offline RL). Specifically:

1. **Challenges of Offline Reinforcement Learning**: Offline RL learns decision-making policies from static datasets without online interaction. These datasets often fail to cover the state-action space sufficiently, which leads to extrapolation errors (i.e., overestimation of Q-values).

2. **Limitations of Existing Data Augmentation Methods**: Traditional data augmentation methods (such as noise injection and data synthesis) can improve the approximation of the Q-function but often fail to directly enhance the quality of the offline dataset, resulting in suboptimal outcomes. Generative data augmentation methods expand the dataset with synthetic data but remain limited by the support of the original data, so the generated data is often of low quality.

3. **Proposed New Method**: The paper introduces GTA (Generative Trajectory Augmentation), a generative trajectory augmentation method that uses a conditional diffusion model within a data augmentation framework to generate high-quality, dynamically plausible trajectories. GTA produces high-reward trajectories by partially noising the original trajectories and then denoising them with classifier-free guidance, conditioned on an amplified return value.

In summary, the paper targets the difficulty existing offline RL data augmentation methods have in generating high-quality, high-reward trajectories, and it demonstrates the effectiveness of GTA in both dense and sparse reward settings.
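
The noise-then-denoise procedure summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the deterministic DDIM-style reverse step, the multiplicative amplification factor `1.2`, and all function names are hypothetical, and `toy_eps_model` is a closed-form stand-in for a trained noise-prediction network.

```python
import numpy as np

def make_alpha_bar(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def partial_noise(x0, k, alpha_bar, rng):
    """Diffuse a clean trajectory x0 forward to an intermediate step k."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[k]) * x0 + np.sqrt(1.0 - alpha_bar[k]) * eps

def cfg_eps(eps_model, x, t, cond, guidance_weight):
    """Classifier-free guidance: blend unconditional and conditional predictions."""
    eps_uncond = eps_model(x, t, None)
    eps_cond = eps_model(x, t, cond)
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)

def guided_denoise(x_k, k, alpha_bar, eps_model, cond, guidance_weight):
    """Deterministic (DDIM-style) reverse pass from step k down to step 0."""
    x = x_k
    for t in range(k, -1, -1):
        eps = cfg_eps(eps_model, x, t, cond, guidance_weight)
        # Estimate the clean trajectory implied by the current noise prediction.
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        if t == 0:
            return x0_hat
        # Re-noise the estimate to the previous (less noisy) step.
        x = np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

def toy_eps_model(x, t, cond):
    """Stand-in for a trained network (hypothetical): it behaves as if the clean
    trajectory were all zeros unconditionally, or a constant equal to `cond`."""
    target = 0.0 if cond is None else cond
    return (x - np.sqrt(alpha_bar[t]) * target) / np.sqrt(1.0 - alpha_bar[t])

trajectory = rng.standard_normal((8, 4))   # (horizon, features), made-up data
original_return = 1.0                      # made-up return of the trajectory
amplified_return = 1.2 * original_return   # amplification rule is an assumption
k = 25                                     # noise only halfway through the schedule

noised = partial_noise(trajectory, k, alpha_bar, rng)
augmented = guided_denoise(noised, k, alpha_bar, toy_eps_model,
                           amplified_return, guidance_weight=1.0)
```

The choice of `k` controls the trade-off the summary describes: a small `k` keeps the augmented trajectory close to the original, while a larger `k` gives the return-conditioned guidance more room to change it.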

DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning

Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning

Uncertainty-Aware Data Augmentation for Offline Reinforcement Learning

Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning

Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

Model-based Trajectory Stitching for Improved Offline Reinforcement Learning

Automatic Data Augmentation for Generalization in Reinforcement Learning

Augmenting Offline Reinforcement Learning with State-only Interactions

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation

Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments

Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

Offline Trajectory Generalization for Offline Reinforcement Learning

Automatic Data Augmentation by Learning the Deterministic Policy

Generalization of Reinforcement Learning with Policy-Aware Adversarial Data Augmentation

A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning

TrajGAIL: Generating Urban Vehicle Trajectories using Generative Adversarial Imitation Learning

Offline Imitation Learning with Model-based Reverse Augmentation

Guiding Online Reinforcement Learning with Action-Free Offline Pretraining