Abstract: One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution ratio between the optimized policy and the data-collection policy. In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution. Based on this, we propose a novel approach, Diffusion-DICE, that directly performs this transformation using diffusion models. We find that the optimal policy's score function can be decomposed into two terms: the behavior policy's score function and the gradient of a guidance term that depends on the optimal distribution ratio. The first term can be obtained from a diffusion model trained on the dataset, and we propose an in-sample learning objective to learn the second term. Because the optimal policy distribution is multi-modal, the transformation in Diffusion-DICE may guide actions toward locally optimal modes. We therefore generate a few candidate actions and carefully select among them to approach the global optimum. Unlike all other diffusion-based offline RL methods, the guide-then-select paradigm in Diffusion-DICE uses only in-sample actions for training and incurs minimal exploitation of errors in the value function. We use a didactic toy example to show how previous diffusion-based methods fail to generate optimal actions because they exploit these errors, and how Diffusion-DICE successfully avoids that. We then conduct extensive experiments on benchmark datasets to show the strong performance of Diffusion-DICE. Project page: https://ryanxhr.github.io/Diffusion-DICE/
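
The score decomposition described in the abstract can be sketched as follows. This is a minimal derivation under assumed notation that does not appear in the abstract itself: $\pi^D$ is the behavior policy, $\pi^*$ the optimal policy, $w^*(s,a)$ the optimal distribution ratio, $p_t(a_t \mid a_0)$ the forward diffusion kernel, and $\pi_t$ the corresponding noised marginal at diffusion time $t$. The paper's exact formulation may differ.

```latex
\begin{align*}
% For a fixed state s, the ratio reweights the behavior policy
% (the state-dependent normalizer does not depend on a):
\pi^*(a \mid s) &\propto \pi^D(a \mid s)\, w^*(s, a), \\
% Noising both sides with the same forward kernel:
\pi^*_t(a_t \mid s)
  &= \int p_t(a_t \mid a_0)\, \pi^*(a_0 \mid s)\, \mathrm{d}a_0
   \;\propto\; \pi^D_t(a_t \mid s)\,
     \mathbb{E}_{a_0 \sim \pi^D(a_0 \mid a_t, s)}\!\left[ w^*(s, a_0) \right], \\
% Taking the log and the gradient in a_t removes the normalizer:
\nabla_{a_t} \log \pi^*_t(a_t \mid s)
  &= \underbrace{\nabla_{a_t} \log \pi^D_t(a_t \mid s)}_{\text{behavior score}}
   + \underbrace{\nabla_{a_t} \log
       \mathbb{E}_{a_0 \sim \pi^D(a_0 \mid a_t, s)}\!\left[ w^*(s, a_0) \right]}_{\text{guidance term}}.
\end{align*}
```

Under these assumptions, the expectation in the guidance term is taken over the behavior policy's posterior, so it involves only actions supported by the dataset, which is consistent with the in-sample property the abstract emphasizes.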

D3D: Conditional Diffusion Model for Decision-Making under Random Frame Dropping

Decision Transformer under Random Frame Dropping

DROP: Conservative Model-based Optimization for Offline Reinforcement Learning

Off-dynamics Conditional Diffusion Planners

Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning

CMDRL: A Markovian Distributed Rate Limiting Algorithm in Cloud Networks

Efficient Diffusion Policies for Offline Reinforcement Learning

Diffusion Policies creating a Trust Region for Offline Reinforcement Learning

Diffusion Models as Optimizers for Efficient Planning in Offline RL

Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning

Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning

Diversification of Adaptive Policy for Effective Offline Reinforcement Learning

Reasoning with Latent Diffusion in Offline Reinforcement Learning

Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning

Policy Representation via Diffusion Probability Model for Reinforcement Learning

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

On the Foundation of Distributionally Robust Reinforcement Learning

SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation

DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation

Learning from Random Demonstrations: Offline Reinforcement Learning with Importance-Sampled Diffusion Models

Diffusion Actor-Critic with Entropy Regulator