Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions

Jiayu Chen, Bhargav Ganguly, Yang Xu, Yongsheng Mei, Tian Lan, Vaneet Aggarwal
2024-02-21
Abstract: Deep generative models (DGMs) have demonstrated great success across various domains, particularly in generating texts, images, and videos using models trained from offline data. Similarly, data-driven decision-making and robotic control also necessitate learning a generator function from the offline data to serve as the strategy or policy. In this case, applying deep generative models in offline policy learning exhibits great potential, and numerous studies have explored this direction. However, this field still lacks a comprehensive review, and so its different branches have developed relatively independently. Thus, we provide the first systematic review on the applications of deep generative models for offline policy learning. In particular, we cover five mainstream deep generative models, including Variational Auto-Encoders, Generative Adversarial Networks, Normalizing Flows, Transformers, and Diffusion Models, and their applications in both offline reinforcement learning (offline RL) and imitation learning (IL). Offline RL and IL are two main branches of offline policy learning and are widely adopted techniques for sequential decision-making. Specifically, for each type of DGM-based offline policy learning, we distill its fundamental scheme, categorize related works based on the usage of the DGM, and sort out the development process of algorithms in that field. Following the main content, we provide in-depth discussions on deep generative models and offline policy learning as a summary, based on which we present our perspectives on future research directions. This work offers a hands-on reference for the research progress in deep generative models for offline policy learning, and aims to inspire improved DGM-based offline RL or IL algorithms.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper attempts to address is how to use Deep Generative Models (DGMs) to improve decision-making and robotic control in offline policy learning. Specifically, the paper explores the application of DGMs to Offline Reinforcement Learning (offline RL) and Imitation Learning (IL), the two main branches of offline policy learning. Offline policy learning learns effective policies for (robot) control or decision-making from pre-existing static datasets. Offline RL uses batches of experience data collected by other policies or by human operators and aims to learn a policy that maximizes cumulative reward, which may require deviating from the behavior patterns observed in the training data. Imitation learning, by contrast, trains policies to imitate expert behavior, so its training data should be trajectories demonstrated by experts. These trajectories typically consist of state-action pairs (Learning from Demonstrations, LfD) or state-next-state pairs (Learning from Observations, LfO).

The paper points out that although DGMs have achieved significant success in fields such as computer vision (CV) and natural language processing (NLP), their application to offline policy learning still lacks a systematic review. The paper therefore provides a comprehensive review covering the mainstream types of DGMs, namely Variational Auto-Encoders (VAEs), Generative Adversarial Networks (GANs), Normalizing Flows, Transformers, and Diffusion Models, together with their applications in offline RL and IL. Through this review, the authors hope to inspire further research on DGM-based offline RL and IL algorithms.
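The difference between the two imitation-learning data formats (LfD versus LfO) can be made concrete with a small sketch. It assumes a hypothetical `Trajectory` container that stores states s_0..s_T and actions a_0..a_{T-1}; the names `Trajectory`, `lfd_pairs`, and `lfo_pairs` are illustrative and do not come from the paper. LfD keeps each state with the action taken in it, while LfO discards the actions and keeps only consecutive state transitions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical types for illustration: a state is a tuple of floats,
# an action is a discrete integer.
State = Tuple[float, ...]
Action = int


@dataclass
class Trajectory:
    """One expert episode: states s_0..s_T and actions a_0..a_{T-1} (assumed layout)."""
    states: List[State]
    actions: List[Action]

    def __post_init__(self) -> None:
        # Action a_t moves the agent from states[t] to states[t + 1],
        # so there must be exactly one more state than actions.
        if len(self.states) != len(self.actions) + 1:
            raise ValueError("expected len(states) == len(actions) + 1")


def lfd_pairs(trajs: Sequence[Trajectory]) -> List[Tuple[State, Action]]:
    """Learning from Demonstrations: collect (s_t, a_t) state-action pairs."""
    return [(s, a) for tr in trajs for s, a in zip(tr.states[:-1], tr.actions)]


def lfo_pairs(trajs: Sequence[Trajectory]) -> List[Tuple[State, State]]:
    """Learning from Observations: collect (s_t, s_{t+1}) pairs; actions are unused."""
    return [
        (s, s_next)
        for tr in trajs
        for s, s_next in zip(tr.states[:-1], tr.states[1:])
    ]


if __name__ == "__main__":
    demo = Trajectory(states=[(0.0,), (1.0,), (2.0,)], actions=[1, 1])
    print(lfd_pairs([demo]))  # [((0.0,), 1), ((1.0,), 1)]
    print(lfo_pairs([demo]))  # [((0.0,), (1.0,)), ((1.0,), (2.0,))]
```

The same trajectory yields both datasets; the LfO view is what remains when expert actions were never recorded, which is why it is the weaker of the two supervision signals.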