Abstract: Offline Goal-Conditioned Reinforcement Learning (Offline GCRL) is an important problem in RL that focuses on acquiring diverse goal-oriented skills solely from pre-collected behavior datasets. In this setting, reward feedback is typically absent except when the goal is achieved, which makes it difficult to learn policies, especially from a finite dataset of suboptimal behaviors. In addition, realistic scenarios involve long-horizon planning, which necessitates the extraction of useful skills within sub-trajectories. Recently, the conditional diffusion model has been shown to be a promising approach to generating high-quality long-horizon plans for RL. However, the practicality of such methods in the goal-conditioned setting is still limited by a number of technical assumptions they make. In this paper, we propose SSD (Sub-trajectory Stitching with Diffusion), a model-based offline GCRL method that leverages the conditional diffusion model to address these limitations. In summary, we use a diffusion model that generates future plans conditioned on the target goal and value, with the target value estimated from the goal-relabeled offline dataset. We report state-of-the-art performance on the standard benchmark set of GCRL tasks, and demonstrate the capability to successfully stitch segments of suboptimal trajectories in the offline data to generate high-quality plans.
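The abstract mentions a goal-relabeled offline dataset with reward only when the goal is achieved. The following is a minimal sketch of what hindsight-style relabeling can look like, assuming each goal is a future state drawn from the same trajectory and the reward is 1 only when the next state matches that goal. The function name `relabel_with_future_goals`, the tolerance `tol`, and the sampling scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def relabel_with_future_goals(states, rng, tol=1e-6):
    """Relabel one trajectory with goals sampled from its own future states.

    Illustrative only: the goal-sampling scheme and the reward threshold
    are assumptions, not the paper's exact procedure.
    """
    T = len(states)
    transitions = []
    for t in range(T - 1):
        # Pick a future time index in [t + 1, T - 1] to serve as the goal.
        k = int(rng.integers(t + 1, T))
        goal = states[k]
        # Sparse reward: 1 only if the next state reaches the goal.
        reached = np.linalg.norm(states[t + 1] - goal) <= tol
        reward = 1.0 if reached else 0.0
        transitions.append((states[t], states[t + 1], goal, reward))
    return transitions


# Toy 1-D trajectory with states 0, 1, 2, 3, 4.
rng = np.random.default_rng(0)
states = np.arange(5, dtype=float).reshape(-1, 1)
relabeled = relabel_with_future_goals(states, rng)
```

In this toy run every reward is 0 or 1, and most relabeled transitions carry a reward of 0, which is the sparsity the abstract describes.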

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve two main challenges in offline goal-conditioned reinforcement learning (Offline GCRL):

1. **Reward Sparsity**: In offline GCRL, the reward signal is usually sparse, and rewards are only provided when the goal is achieved. This makes it difficult to learn a policy from a limited dataset of sub-optimal behaviors.
2. **Unrealistic Trajectory Generation in Long-Horizon Planning Tasks**: Although existing diffusion-model-based methods show potential in generating high-quality long-horizon plans, they tend to generate unrealistic trajectories, especially in complex environments.

To address these challenges, the authors propose a new method called SSD (Sub-trajectory Stitching with Diffusion), which uses a conditional diffusion model to generate sub-trajectories and stitches these sub-trajectories together to achieve the goal. Specifically, SSD solves the problem in the following ways:

- **Multi-step Goal Chaining**: Enhance the ability to extract useful skills from sub-optimal trajectory fragments through multi-step goal chaining. This method can more effectively capture and integrate valuable parts, thereby generating high-quality behaviors.
- **Value-Conditional Diffusion Model**: Introduce a conditional diffusion model that generates future plans based on the goal and the estimated action values. This design avoids the need for optimal plan length, explicit sub-goals, or hierarchical architectures, improving the realism and practicality of the generated trajectories.

### Core Contributions of the Paper

1. **Innovative Architecture**: Propose the Condition-Prompted-Unet architecture, which combines the Unet structure and Transformer blocks to better capture complex patterns and preserve spatial information, thereby generating more realistic trajectories.
2. **Efficient Training Method**: Solve the problems of reward sparsity and unrealistic trajectory generation by alternately training the goal-conditioned value function and the value-conditional diffusion model.
3. **Excellent Experimental Results**: Demonstrate state-of-the-art performance on standard benchmark datasets such as Maze2D and Multi2D, and also achieve significant results in robotic manipulation tasks (such as the Fetch environment).

### Summary

SSD successfully solves the problems of reward sparsity and unrealistic trajectory generation in long-horizon planning tasks in offline GCRL by combining the conditional diffusion model and multi-step goal chaining, providing a new solution for the field of offline reinforcement learning.
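To make the idea of stitching through a goal-conditioned value function more concrete, here is a toy tabular sketch. It assumes deterministic transitions, a sparse reward of 1 on reaching the goal, and a discount factor `gamma`; the function `goal_conditioned_values` and the five-state example are hypothetical and stand in for the learned value function, not for the paper's training procedure. It shows how bootstrapped value targets can assign a nonzero value to a start-goal pair that never appears together in a single trajectory fragment.

```python
import numpy as np


def goal_conditioned_values(transitions, n_states, gamma=0.9, sweeps=50):
    """Tabular V[s, g] from dataset transitions (deterministic toy setting).

    The reward is 1 when the next state equals the goal and 0 otherwise.
    This is an illustrative stand-in, not the paper's value learner.
    """
    V = np.zeros((n_states, n_states))
    for _ in range(sweeps):
        for s, s_next in transitions:
            for g in range(n_states):
                if s_next == g:
                    target = 1.0  # goal reached on this step
                else:
                    target = gamma * V[s_next, g]  # bootstrap from next state
                V[s, g] = max(V[s, g], target)
    return V


# Two separate fragments that only share state 2: 0->1->2 and 2->3->4.
fragments = [[0, 1, 2], [2, 3, 4]]
transitions = [(a, b) for frag in fragments for a, b in zip(frag, frag[1:])]
V = goal_conditioned_values(transitions, n_states=5)

# State 0 and goal 4 never co-occur in one fragment, yet V[0, 4] is positive
# because the value backs up through the shared state 2.
stitched_value = V[0, 4]
```

In a value-conditioned planner, an estimate of this kind could serve as the value that the diffusion model is conditioned on, alongside the goal. How SSD computes and uses that target in practice is described in the paper itself.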

Stitching Sub-Trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL

DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching

Curriculum Goal-Conditioned Imitation for Offline Reinforcement Learning

Goal-Conditioned Predictive Coding for Offline Reinforcement Learning

Diffusion Models as Optimizers for Efficient Planning in Offline RL

MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL

Reasoning with Latent Diffusion in Offline Reinforcement Learning

Offline Policy Learning via Skill-step Abstraction for Long-horizon Goal-Conditioned Tasks

SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning

Context-Former: Stitching via Latent Conditioned Sequence Modeling

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning

Off-dynamics Conditional Diffusion Planners

Model-based Trajectory Stitching for Improved Offline Reinforcement Learning

Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning

Learning to Reach Goals via Diffusion

Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Efficient Diffusion Policies for Offline Reinforcement Learning

Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning

Goal-conditioned offline reinforcement learning through state space partitioning