Abstract: Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions, while we investigate a broader issue: the false correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty, which is critical for eliminating the false correlations that cause suboptimality. Theoretically, we justify the rationality of the proposed method and prove its convergence to the optimal policy with a sublinear rate under mild assumptions.
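The abstract names an annealing behavior cloning regularizer but does not give its form. The sketch below assumes a TD3+BC-style actor objective, in which the policy maximizes a critic's estimate while a squared-error term keeps its actions close to the dataset actions, and it assumes "annealing" means the weight on that term decays linearly to zero over training. The names `annealed_bc_weight`, `actor_loss`, `beta_start`, and `beta_end` are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Sequence


def annealed_bc_weight(step: int, total_steps: int,
                       beta_start: float = 1.0, beta_end: float = 0.0) -> float:
    """Hypothetical linear schedule: BC weight decays from beta_start to beta_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # training progress clipped to [0, 1]
    return beta_start + frac * (beta_end - beta_start)


def actor_loss(q_values: Sequence[float],
               policy_actions: Sequence[Sequence[float]],
               data_actions: Sequence[Sequence[float]],
               bc_weight: float) -> float:
    """Batch mean of -Q(s, pi(s)) + bc_weight * ||pi(s) - a||^2 (assumed form)."""
    total = 0.0
    for q, pi_a, data_a in zip(q_values, policy_actions, data_actions):
        # Squared distance between the policy's action and the logged action.
        bc_term = sum((p - d) ** 2 for p, d in zip(pi_a, data_a))
        # Maximize Q (hence the minus sign) while staying close to the data.
        total += -q + bc_weight * bc_term
    return total / len(q_values)


if __name__ == "__main__":
    for step in (0, 500, 1000):
        w = annealed_bc_weight(step, total_steps=1000)
        loss = actor_loss([1.0, 2.0], [[0.5], [0.0]], [[0.0], [0.0]], w)
        print(step, w, loss)
```

Under this assumed design, the large early weight keeps the policy near the data, where uncertainty estimates are more reliable; as the weight decays, the critic's estimate increasingly drives policy improvement. This rationale is a reading of the abstract's claim, not a description of SCORE's exact schedule or loss.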

### What problem does this paper attempt to solve?

This paper addresses the problem of false correlation in offline reinforcement learning (RL). Specifically:

1. **Background and Challenges**:
   - The goal of offline RL is to learn an optimal policy from a pre-collected dataset without interacting with the environment.
   - Most existing studies focus only on defending against out-of-distribution (OOD) actions, that is, actions that are not adequately covered by the dataset. These methods do not fully address the suboptimality caused by false correlations between epistemic uncertainty and decision-making.
2. **False Correlation**:
   - Because the dataset has insufficient coverage, epistemic uncertainty becomes correlated with decision-making, so the agent tends to choose actions that look good under its estimates but are actually suboptimal.
   - This false correlation is not caused only by OOD actions; insufficient coverage of the state space can also produce it. Moreover, even among in-distribution samples, uncertainty varies from sample to sample, and high-uncertainty samples can introduce suboptimality when the agent greedily pursues the maximum estimated value.
3. **Proposed Method**:
   - The paper proposes an algorithm named SCORE (falSe COrrelation REduction) to reduce false correlation in offline RL.
   - SCORE introduces an annealing behavior cloning regularizer that helps produce high-quality uncertainty estimates, which are then used to eliminate false correlations.
   - Theoretically, the authors prove that SCORE converges to the optimal policy at a sublinear rate under mild assumptions, without needing to sample OOD samples or compute their specific values.
4. **Experimental Results**:
   - Experiments show that SCORE reaches state-of-the-art (SoTA) performance on multiple tasks in the standard D4RL benchmark, with a 3.1x speedup over existing methods.

In summary, this paper addresses the suboptimality caused by false correlation in offline RL by proposing the SCORE algorithm, and it supports the algorithm's effectiveness with both theoretical and experimental evidence.
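The point about in-distribution samples with different uncertainty can be made concrete with a small example. The sketch assumes uncertainty is measured by the disagreement (standard deviation) of a hypothetical Q-ensemble and penalized with a lower-confidence bound, a common pessimism device in offline RL. The summary does not specify SCORE's uncertainty estimator or penalty coefficient, and `pessimistic_value`, `q_ensemble`, and the numbers below are illustrative only.

```python
import statistics
from typing import Dict, List


def pessimistic_value(ensemble_estimates: List[float], penalty: float = 1.0) -> float:
    """Lower-confidence-bound estimate: ensemble mean minus penalty * ensemble std."""
    mean = statistics.fmean(ensemble_estimates)
    spread = statistics.pstdev(ensemble_estimates)  # disagreement as an uncertainty proxy
    return mean - penalty * spread


# Hypothetical Q-ensemble outputs for two in-distribution actions in the same state.
q_ensemble: Dict[str, List[float]] = {
    "rarely_seen": [9.0, 1.0, 10.0, 2.0],   # mean 5.5, members disagree strongly
    "well_covered": [5.0, 5.2, 4.8, 5.0],   # mean 5.0, members agree
}

# Greedy selection on the mean estimate favors the high-uncertainty action.
greedy_choice = max(q_ensemble, key=lambda a: statistics.fmean(q_ensemble[a]))

# Penalizing uncertainty reduces its influence on the decision.
pessimistic_choice = max(q_ensemble, key=lambda a: pessimistic_value(q_ensemble[a]))

if __name__ == "__main__":
    print("greedy:", greedy_choice)
    print("pessimistic:", pessimistic_choice)
```

Greedy selection on the ensemble mean picks `rarely_seen` (5.5 versus 5.0) even though its ensemble members disagree widely, while the penalized estimates (about 1.47 versus about 4.86) pick `well_covered`. This is the kind of correlation between uncertainty and decision-making that the summary describes, shown here with an assumed estimator rather than SCORE's own.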

False Correlation Reduction for Offline Reinforcement Learning

DROP: Conservative Model-based Optimization for Offline Reinforcement Learning

Sparsity-based Safety Conservatism for Constrained Offline Reinforcement Learning

Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression

Exploring and Addressing Reward Confusion in Offline Preference Learning

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

Solving Continual Offline Reinforcement Learning with Decision Transformer

CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Robust Offline Reinforcement Learning from Low-Quality Data

Boosting Offline Reinforcement Learning with Action Preference Query

Goal-conditioned Offline Reinforcement Learning through State Space Partitioning

Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation

Interpretable performance analysis towards offline reinforcement learning: A dataset perspective

Boosting Offline Reinforcement Learning via Data Rebalancing

Offline Fictitious Self-Play for Competitive Games

Is Pessimism Provably Efficient for Offline Reinforcement Learning?

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

A Minimalist Approach to Offline Reinforcement Learning

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction