Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions

Jiayu Chen, Bhargav Ganguly, Yang Xu, Yongsheng Mei, Tian Lan, Vaneet Aggarwal

2024-02-21

Abstract: Deep generative models (DGMs) have demonstrated great success across various domains, particularly in generating texts, images, and videos using models trained from offline data. Similarly, data-driven decision-making and robotic control also necessitate learning a generator function from the offline data to serve as the strategy or policy. In this case, applying deep generative models in offline policy learning exhibits great potential, and numerous studies have explored in this direction. However, this field still lacks a comprehensive review and so developments of different branches are relatively independent. Thus, we provide the first systematic review on the applications of deep generative models for offline policy learning. In particular, we cover five mainstream deep generative models, including Variational Auto-Encoders, Generative Adversarial Networks, Normalizing Flows, Transformers, and Diffusion Models, and their applications in both offline reinforcement learning (offline RL) and imitation learning (IL). Offline RL and IL are two main branches of offline policy learning and are widely-adopted techniques for sequential decision-making. Specifically, for each type of DGM-based offline policy learning, we distill its fundamental scheme, categorize related works based on the usage of the DGM, and sort out the development process of algorithms in that field. Subsequent to the main content, we provide in-depth discussions on deep generative models and offline policy learning as a summary, based on which we present our perspectives on future research directions. This work offers a hands-on reference for the research progress in deep generative models for offline policy learning, and aims to inspire improved DGM-based offline RL or IL algorithms.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses how deep generative models (DGMs) can improve decision-making and robot control in offline policy learning, that is, learning effective policies from pre-existing static datasets. It focuses on the two main branches of offline policy learning: offline reinforcement learning (offline RL) and imitation learning (IL). Offline RL uses batches of experience data collected by other policies or human operators and aims to learn a policy that maximizes cumulative reward, which may require deviating from the behavior patterns observed in the training data. IL instead trains a policy to imitate expert behavior, using trajectories demonstrated by experts. These trajectories typically consist of state-action pairs (Learning from Demonstrations, LfD) or state-next-state pairs (Learning from Observations, LfO).

The paper points out that although DGMs have achieved significant success in fields such as computer vision (CV) and natural language processing (NLP), their application to offline policy learning lacks a systematic review. It therefore provides a comprehensive review covering five mainstream types of DGMs: Variational Auto-Encoders (VAEs), Generative Adversarial Networks (GANs), Normalizing Flows, Transformers, and Diffusion Models, together with their applications in offline RL and IL. Through this review, the authors hope to inspire further research on DGM-based offline RL and IL algorithms.
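To make the data settings mentioned above concrete, here is a minimal Python sketch of how one logged trajectory could be viewed in each setting. It is not taken from the paper: the `Transition` record, the helper names, the integer states and actions, and the discount factor `gamma` are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One logged step of a trajectory (hypothetical record layout)."""
    state: int
    action: int
    reward: float
    next_state: int


def lfd_pairs(traj: List[Transition]) -> List[Tuple[int, int]]:
    """Learning from Demonstrations: keep (state, action) pairs."""
    return [(t.state, t.action) for t in traj]


def lfo_pairs(traj: List[Transition]) -> List[Tuple[int, int]]:
    """Learning from Observations: keep (state, next state) pairs, no actions."""
    return [(t.state, t.next_state) for t in traj]


def discounted_return(traj: List[Transition], gamma: float = 0.99) -> float:
    """Cumulative discounted reward of a trajectory, the quantity that
    offline RL tries to maximize (gamma is an assumed discount factor)."""
    total = 0.0
    for t in reversed(traj):
        total = t.reward + gamma * total
    return total


# A toy three-step trajectory with a single reward at the end.
trajectory = [
    Transition(state=0, action=1, reward=0.0, next_state=1),
    Transition(state=1, action=0, reward=0.0, next_state=2),
    Transition(state=2, action=1, reward=1.0, next_state=3),
]

print(lfd_pairs(trajectory))
print(lfo_pairs(trajectory))
print(discounted_return(trajectory, gamma=0.5))
```

In this sketch, an IL method would see only the pairs returned by `lfd_pairs` or `lfo_pairs`, whereas an offline RL method would also use the logged rewards.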

Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling

DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning

A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Boosting Offline Reinforcement Learning with Residual Generative Modeling

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Enhancing Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization

Model Generation with Provable Coverability for Offline Reinforcement Learning

Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases

Analyzing the Training Processes of Deep Generative Models

Offline Model-Based Optimization via Policy-Guided Gradient Search

Offline Learning for Planning: A Summary

Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations

Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Distance Weighted Supervised Learning for Offline Interaction Data

The Generalization Gap in Offline Reinforcement Learning

Is Value Learning Really the Main Bottleneck in Offline RL?