Abstract: Model-based methods have recently shown promise for offline reinforcement learning (RL), which aims to learn good policies from historical data without interacting with the environment. Previous model-based offline RL methods learn fully connected networks as world models that map states and actions to next-step states. However, it is sensible that a world model should adhere to the underlying causal effects, so that it supports learning an effective policy that generalizes well to unseen states. In this paper, we first provide theoretical results showing that causal world models can outperform plain world models for offline RL, by incorporating the causal structure into the generalization error bound. We then propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structure (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly. Consequently, it performs better than plain model-based offline RL algorithms and other causal model-based RL algorithms.

What problem does this paper attempt to address?

This paper addresses how to use causal structure to improve the generalization of world models in offline reinforcement learning (offline RL). Traditional model-based offline RL methods usually use fully connected networks as world models. When predicting the next state, such models may rely on spurious variables, meaning inputs that are correlated with the target in the dataset but do not causally influence it. Because these variables do not reflect the underlying causal relationships between states and actions, the model can perform poorly in unseen states. The paper argues that incorporating causal structure into the world model avoids the influence of such spurious correlations and thereby improves generalization.

To this end, the authors propose an algorithm named FOCUS (oFfline mOdel-based reinforcement learning with CaUsal Structure). The algorithm first learns the causal structure from the data and then uses this structure to guide the learning of the world model. According to the paper, FOCUS reconstructs the underlying causal structure accurately and robustly, and it outperforms plain model-based offline RL algorithms and other causal model-based RL algorithms on two benchmarks.

The main contributions of the paper are:

1. **Theoretical support**: The paper shows that, in offline RL, causal world models have an advantage over plain world models in terms of the generalization error bound.
2. **Practical algorithm**: A practical algorithm, FOCUS, is proposed to demonstrate the feasibility of learning and using causal structure in the offline setting.
3. **Experimental verification**: Experimental results on two benchmarks support the theoretical analysis, with FOCUS outperforming plain model-based baselines and other causal model-based RL algorithms.

In summary, the paper aims to improve the performance of offline RL by introducing causal structure into the world model, with the main benefit being better generalization to unseen states.
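
The effect of a spurious input on generalization can be illustrated with a small toy sketch. This is not the FOCUS implementation: the linear dynamics, the behaviour policy that copies `s1` into the action, the ridge regression, and the hand-written `causal_mask` are all assumptions made for illustration, and the structure-learning step is replaced here by simply supplying the mask. The sketch fits a "plain" model on all inputs and a "causal" model on the true parents only, then queries both on a state-action pair that never occurs in the offline data.

```python
import random

def ridge_fit(rows, targets, lam=1e-3):
    """Solve (X^T X + lam*I) w = X^T y by Gaussian elimination with partial pivoting."""
    d = len(rows[0])
    # Augmented normal-equation matrix [X^T X + lam*I | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(d)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(d)
    ]
    for col in range(d):
        pivot = max(range(col, d), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, d):
            factor = aug[k][col] / aug[col][col]
            for j in range(col, d + 1):
                aug[k][j] -= factor * aug[col][j]
    w = [0.0] * d
    for i in reversed(range(d)):
        rest = sum(aug[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (aug[i][d] - rest) / aug[i][i]
    return w

def select(x, mask):
    """Keep only the inputs that the mask marks as parents."""
    return [v for v, m in zip(x, mask) if m]

def fit_masked(data_x, data_y, mask):
    return ridge_fit([select(x, mask) for x in data_x], data_y)

def predict(w, mask, x):
    return sum(wi * vi for wi, vi in zip(w, select(x, mask)))

# Hypothetical offline dataset: inputs are (s0, s1, a), target is next s0.
# Assumed true dynamics: s0' = s0 + a, so s1 is NOT a cause of s0'.
# Assumed behaviour policy: a = s1, which makes s1 spuriously predictive.
rng = random.Random(0)
data_x, data_y = [], []
for _ in range(200):
    s0, s1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    a = s1
    data_x.append([s0, s1, a])
    data_y.append(s0 + a)

plain_mask = [1, 1, 1]   # plain world model: every input feeds the prediction
causal_mask = [1, 0, 1]  # assumed causal parents of s0': s0 and a only

w_plain = fit_masked(data_x, data_y, plain_mask)
w_causal = fit_masked(data_x, data_y, causal_mask)

# Unseen query: a differs from s1, which never happens in the dataset.
unseen = [0.0, 0.0, 1.0]
true_next = 1.0
plain_pred = predict(w_plain, plain_mask, unseen)
causal_pred = predict(w_causal, causal_mask, unseen)
print("plain error: ", abs(plain_pred - true_next))
print("causal error:", abs(causal_pred - true_next))
```

In this contrived dataset the plain model cannot tell `s1` and `a` apart, so the regularized fit splits the weight between them and its prediction on the unseen query falls well short of the true value, while the masked model stays close to it. Real offline datasets are rarely this degenerate, so the sketch should be read only as an illustration of why removing spurious inputs can help on unseen states.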

Offline Reinforcement Learning with Causal Structured World Models
