False Correlation Reduction for Offline Reinforcement Learning

Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang
2023-11-01
Abstract: Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions while we investigate a broader issue, the false correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty which is critical for eliminating false correlations from suboptimality. Theoretically, we justify the rationality of the proposed method and prove its convergence to the optimal policy with a sublinear rate under mild assumptions.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the problem of false correlation in offline reinforcement learning (RL). Specifically:

1. **Background and Challenges**:
   - The goal of offline reinforcement learning is to learn the optimal policy from a pre-collected dataset without interacting with the environment.
   - Most existing studies focus only on defending against out-of-distribution (OOD) actions, that is, actions that are not sufficiently covered by the dataset. These methods do not fully resolve the suboptimality caused by the false correlation between epistemic uncertainty and decision-making.
2. **False Correlation**:
   - Because the dataset has insufficient coverage, a correlation arises between epistemic uncertainty and decision-making. This leads the agent to prefer policies that look good but are actually suboptimal.
   - This false correlation is not caused by OOD actions alone; it may also stem from insufficient coverage of the state space. In addition, uncertainty varies even among in-distribution samples, and samples with high uncertainty may introduce suboptimality when the agent greedily pursues the maximum estimated value.
3. **Proposed Method**:
   - The paper proposes an algorithm named SCORE (falSe COrrelation REduction) to reduce false correlation in offline reinforcement learning.
   - SCORE introduces an annealing behavior cloning regularizer that helps produce high-quality uncertainty estimates, which in turn are used to eliminate false correlation.
   - Theoretically, the authors prove that SCORE converges to the optimal policy at a sublinear rate under mild assumptions, and that it does not need to sample OOD data points or compute their specific values.
4. **Experimental Results**:
   - Experiments on multiple tasks in the standard D4RL benchmark show that SCORE reaches state-of-the-art (SoTA) performance with a 3.1x acceleration.

In summary, this paper tackles the suboptimality caused by false correlation in offline reinforcement learning by proposing the SCORE algorithm, and it provides theoretical and experimental evidence to support the algorithm's effectiveness.
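To make the two ingredients named in the summary more concrete, here is a minimal sketch of how an annealed behavior cloning weight and an uncertainty penalty could fit together in a single policy objective. It is an illustration under stated assumptions, not the authors' implementation: the summary does not specify the annealing schedule, the uncertainty estimator, or the loss form. The geometric decay, the ensemble standard deviation as an uncertainty proxy, the squared-error behavior cloning term, and all function and parameter names (`annealed_bc_weight`, `pessimistic_q`, `actor_loss`, `beta`, `decay`) are hypothetical choices.

```python
import numpy as np


def annealed_bc_weight(step, initial_weight=1.0, decay=0.99):
    """Hypothetical geometric schedule: the behavior cloning weight
    starts at `initial_weight` and shrinks as training proceeds."""
    return initial_weight * decay ** step


def pessimistic_q(q_ensemble, beta=1.0):
    """Subtract an uncertainty penalty from the mean value estimate.

    Assumption: uncertainty is approximated by the standard deviation
    across an ensemble of Q-estimates with shape (n_members, batch).
    """
    q = np.asarray(q_ensemble, dtype=float)
    return q.mean(axis=0) - beta * q.std(axis=0)


def actor_loss(q_ensemble, policy_actions, dataset_actions, step,
               beta=1.0, initial_weight=1.0, decay=0.99):
    """Policy objective: maximize the penalized value while staying close
    to dataset actions, with the closeness term annealed over time."""
    policy_actions = np.asarray(policy_actions, dtype=float)
    dataset_actions = np.asarray(dataset_actions, dtype=float)
    # Squared-error behavior cloning term, averaged over the batch.
    bc_term = np.mean(np.sum((policy_actions - dataset_actions) ** 2, axis=-1))
    lam = annealed_bc_weight(step, initial_weight, decay)
    return -pessimistic_q(q_ensemble, beta).mean() + lam * bc_term


# Usage: two ensemble members, a batch of two state-action pairs.
q_ensemble = [[1.0, 1.0], [3.0, 1.0]]
policy_actions = [[0.5], [0.2]]
dataset_actions = [[0.0], [0.2]]
early = actor_loss(q_ensemble, policy_actions, dataset_actions, step=0)
late = actor_loss(q_ensemble, policy_actions, dataset_actions, step=500)
print(early, late)
```

In this sketch the behavior cloning term dominates early in training, when value and uncertainty estimates are least reliable, and fades later so that the penalized value estimate drives the policy. The first sample, where the ensemble members disagree, has its value lowered more than the second, which is how a high-uncertainty estimate is kept from being greedily exploited.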