Abstract: One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of algorithms. However, these results are often derived from a purely causal viewpoint, which may overlook the specific RL context. We revisit this research line and find that incorporating RL-specific context can reduce unnecessary assumptions in previous identifiability analyses for latent states. More importantly, removing these assumptions allows algorithm design to go beyond the earlier boundaries constrained by them. Leveraging these insights, we propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation. With the two constraints, the proposed algorithm is guaranteed to disentangle state and noise in a way that is faithful to the underlying dynamics. Empirical evidence from extensive benchmark control tasks demonstrates the superiority of our approach over existing counterparts in effectively disentangling state belief from noise.
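
One way to picture the two constraints named in the abstract is as two prediction losses that are computed from the state part of a latent vector only, so that the noise part cannot influence them. The sketch below is an illustration under stated assumptions, not the paper's objective: the latent layout (`split_latent`), the deterministic `transition_fn`, the `reward_fn` signature, and the squared-error form are all hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def split_latent(z: Sequence[float], state_dim: int) -> Tuple[Vector, Vector]:
    """Split a latent vector into (state, noise). The layout is an assumption."""
    return list(z[:state_dim]), list(z[state_dim:])


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def preservation_losses(
    z: Sequence[float],
    action: float,
    reward: float,
    z_next: Sequence[float],
    state_dim: int,
    transition_fn: Callable[[Vector, float], Vector],
    reward_fn: Callable[[Vector, float], float],
) -> Tuple[float, float]:
    """Return (transition_loss, reward_loss) using only the state part of z."""
    state, _noise = split_latent(z, state_dim)
    next_state, _next_noise = split_latent(z_next, state_dim)
    # Transition preservation: the state alone should predict the next state.
    transition_loss = squared_error(transition_fn(state, action), next_state)
    # Reward preservation: the state alone should predict the reward.
    reward_loss = (reward_fn(state, action) - reward) ** 2
    return transition_loss, reward_loss


# Toy (hypothetical) dynamics: the action shifts every state coordinate,
# and the reward is the sum of the state coordinates.
def toy_transition(state: Vector, action: float) -> Vector:
    return [x + action for x in state]


def toy_reward(state: Vector, action: float) -> float:
    return sum(state)


# Last coordinate of each latent plays the role of noise.
losses = preservation_losses(
    z=[1.0, 2.0, 9.0], action=0.5, reward=3.0, z_next=[1.5, 2.5, -4.0],
    state_dim=2, transition_fn=toy_transition, reward_fn=toy_reward,
)
# Same transition with different noise values in the latent.
losses_other_noise = preservation_losses(
    z=[1.0, 2.0, -7.0], action=0.5, reward=3.0, z_next=[1.5, 2.5, 100.0],
    state_dim=2, transition_fn=toy_transition, reward_fn=toy_reward,
)
```

In this toy setup both losses are zero and do not change when the noise coordinate changes, which is the behaviour the two constraints are meant to encourage in a learned representation.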

What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

The paper "Rethinking State Disentanglement in Causal Reinforcement Learning" addresses an important challenge in reinforcement learning (RL) under noise: estimating latent states from observational data. Specifically, it focuses on how to separate latent states from noise in partially observable Markov decision processes (POMDPs).

#### Main problems:

1. **Noise interference**: In real-world applications, observational data usually contains noise, which interferes with the accurate estimation of latent states and therefore degrades policy learning.
2. **Limitations of existing methods**: Most previous studies take a purely causal perspective. They assume that latent states can be divided into independent subsets and impose strict conditions, such as the invertibility of the observation function. These assumptions often do not hold in actual RL scenarios, which limits the design and performance of algorithms.

#### Core contributions of the paper:

- **Re-examining state disentanglement in causal RL**: The authors bring the specific context of RL into the analysis, which lets them relax some unnecessary assumptions and makes algorithm design more flexible and practical.
- **Simplifying constraints**: The complicated structural constraints of previous methods are replaced with two simple constraints (transition-preserving and reward-preserving), which ensure that state and noise are disentangled faithfully to the underlying dynamics.
- **Theoretical and experimental verification**: Theoretical analysis and extensive experiments on multiple benchmark control tasks support the superiority of the new method.

#### Specific improvements:

1. **Removing unnecessary assumptions**: For example, latent states are no longer required to be divisible into independent subsets, and the observation function is no longer assumed to be invertible.
2. **Introducing Belief-MDP**: Non-invertible POMDPs are converted into equivalent belief-MDPs, which yields more general identifiability results.
3. **Algorithm design**: Based on the theoretical analysis above, the authors propose a new algorithm framework that separates states and noise more effectively in complex environments.

In summary, by rethinking the state disentanglement problem in causal RL, the paper proposes a method better suited to practical application scenarios and supports its effectiveness through theory and experiment.
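
The belief-MDP conversion mentioned above rests on a standard idea: when observations do not determine the latent state, the agent tracks a probability distribution (a belief) over states and updates it with Bayes' rule after each action and observation. The sketch below shows that textbook update for a small discrete POMDP. It is a generic illustration, not the paper's algorithm; the table layouts `T[a][s][s2]` and `O[s2][o]` and the two-state example values are assumptions made for this example.

```python
from typing import List

Belief = List[float]


def belief_update(
    belief: Belief,
    action: int,
    observation: int,
    T: List[List[List[float]]],
    O: List[List[float]],
) -> Belief:
    """Bayes-filter belief update for a discrete POMDP.

    T[a][s][s2] = P(s2 | s, a)   (assumed table layout)
    O[s2][o]    = P(o | s2)      (assumed table layout)
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each state by the observation likelihood.
    unnormalized = [O[s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state, one-action POMDP. Both states can emit both
# observations, so the observation function is not invertible.
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
O = [[0.8, 0.2],
     [0.3, 0.7]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
```

The updated belief `b1` depends only on the previous belief, the action, and the observation, so the belief itself behaves as a fully observed state. That is what allows a POMDP to be treated as an MDP over beliefs.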
