Offline Reinforcement Learning with Causal Structured World Models

Zheng-Mao Zhu, Xiong-Hui Chen, Hong-Long Tian, Kun Zhang, Yang Yu
DOI: https://doi.org/10.48550/arXiv.2206.01474
2022-06-03
Abstract: Model-based methods have recently shown promise for offline reinforcement learning (RL), aiming to learn good policies from historical data without interacting with the environment. Previous model-based offline RL methods learn fully connected nets as world-models that map the states and actions to the next-step states. However, it is sensible that a world-model should adhere to the underlying causal effect such that it will support learning an effective policy generalizing well in unseen states. In this paper, we first provide theoretical results that causal world-models can outperform plain world-models for offline RL by incorporating the causal structure into the generalization error bound. We then propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structure (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly. Consequently, it performs better than the plain model-based offline RL algorithms and other causal model-based RL algorithms.
Machine Learning
What problem does this paper attempt to address?
The paper asks how causal structure can improve the generalization of world models in offline reinforcement learning (offline RL). Previous model-based offline RL methods typically use fully connected networks as world models. When predicting the next state, such a model may rely on spurious variables, that is, inputs that are correlated with the target in the offline data but do not causally influence it. Because these correlations need not hold outside the data distribution, the model can perform poorly in unseen states. The paper argues that incorporating causal structure into the world model avoids the influence of such spurious correlations, thereby improving the model's generalization ability and learning efficiency.

To achieve this, the authors propose FOCUS (oFfline mOdel-based reinforcement learning with CaUsal Structure). The algorithm first learns the causal structure from the data and then uses that structure to guide the learning of the world model. According to the reported experiments on two benchmarks, FOCUS reconstructs the underlying causal structure accurately and robustly, and it outperforms both plain model-based offline RL algorithms and other causal model-based RL algorithms.

The main contributions of the paper are:
1. **Theoretical support**: The paper proves that, in offline RL, causal world models have a better generalization error bound than plain (non-causal) world models.
2. **Practical algorithm**: The paper proposes FOCUS, demonstrating that it is feasible to learn and use causal structure in the offline setting.
3. **Experimental verification**: Experiments on two benchmarks support the theoretical analysis, with FOCUS outperforming plain model-based baselines and other causal model-based RL algorithms.
In summary, the paper aims to improve offline reinforcement learning by introducing causal structure into the world model, with the main benefit being better generalization to unseen states.
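
The two-stage idea described above (first recover which inputs causally affect each next-state dimension, then fit a world model restricted to those inputs) can be illustrated with a small sketch. This is not the FOCUS implementation: the summary does not specify how the causal structure is learned, so the sketch substitutes a simple stand-in (thresholding least-squares coefficients on a toy linear system) for the causal discovery step. The function names `learn_parent_mask` and `fit_masked_model`, the threshold value, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def learn_parent_mask(inputs, next_states, threshold=0.1):
    """Stage 1 (stand-in, NOT the paper's method): mark input j as a parent
    of next-state dimension i when its least-squares coefficient is large."""
    coef, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)  # (d_in, d_out)
    return np.abs(coef.T) > threshold  # boolean mask of shape (d_out, d_in)


def fit_masked_model(inputs, next_states, mask):
    """Stage 2: fit each next-state dimension using only its selected parents."""
    d_out, d_in = mask.shape
    weights = np.zeros((d_out, d_in))
    for i in range(d_out):
        parents = np.flatnonzero(mask[i])
        if parents.size == 0:
            continue  # no parents selected: leave this row at zero
        w, *_ = np.linalg.lstsq(inputs[:, parents], next_states[:, i], rcond=None)
        weights[i, parents] = w
    return weights


# Toy offline dataset (assumed for illustration): two state variables, one action.
rng = np.random.default_rng(0)
n = 500
s1, s2, a = rng.normal(size=(3, n))
inputs = np.column_stack([s1, s2, a])  # columns: s1, s2, a

# Assumed ground truth: next s1 depends on (s1, a); next s2 depends on s2 only.
next_s1 = 0.8 * s1 + 0.5 * a + 0.05 * rng.normal(size=n)
next_s2 = 0.9 * s2 + 0.05 * rng.normal(size=n)
next_states = np.column_stack([next_s1, next_s2])

mask = learn_parent_mask(inputs, next_states)
weights = fit_masked_model(inputs, next_states, mask)

print(mask.astype(int))      # rows: next s1, next s2; columns: s1, s2, a
print(np.round(weights, 2))  # non-parent entries are exactly zero
```

The point of the masking step is that each prediction ignores inputs outside its parent set by construction, so a change in a non-parent variable in an unseen state cannot affect that prediction. A neural world model could apply the same mask to its inputs; the linear model here only keeps the sketch short.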