Abstract: Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can often be trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
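The core mechanism, restarting episodes from a mixture of default initial states and explanation-identified critical states, can be sketched as follows. This is a minimal illustration under stated assumptions, not RICE's implementation: it assumes an environment that can be reset to a stored state, and it uses a precomputed per-step importance score as a stand-in for the explanation method's output. The names `select_critical_states`, `sample_mixed_initial_state`, and `p_critical` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def select_critical_states(
    trajectory: Sequence[State], importance: Sequence[float], k: int
) -> List[State]:
    """Keep the k states with the highest importance scores.

    `importance` is a hypothetical stand-in for the per-step scores an
    explanation method would assign to the agent's own trajectory.
    """
    ranked = sorted(range(len(trajectory)), key=lambda i: importance[i], reverse=True)
    return [trajectory[i] for i in ranked[:k]]


def sample_mixed_initial_state(
    default_sampler: Callable[[], State],
    critical_states: Sequence[State],
    p_critical: float,
    rng: random.Random,
) -> State:
    """Draw from p_critical * Uniform(critical_states) + (1 - p_critical) * rho.

    Falls back to the default initial-state sampler (rho) when no critical
    states are available. p_critical is an assumed, hypothetical parameter.
    """
    if critical_states and rng.random() < p_critical:
        return rng.choice(critical_states)
    return default_sampler()


# Usage on a toy trajectory of labelled states.
rng = random.Random(0)
trajectory = ["s0", "s1", "s2", "s3", "s4"]
importance = [0.1, 0.7, 0.2, 0.9, 0.05]
critical = select_critical_states(trajectory, importance, k=2)  # ["s3", "s1"]
starts = [
    sample_mixed_initial_state(lambda: "s0", critical, p_critical=0.5, rng=rng)
    for _ in range(1000)
]
```

Keeping the default distribution in the mixture means the agent still practices from ordinary starts, while the critical-state restarts concentrate exploration where the explanation indicates the policy matters most.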

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the training bottleneck that deep reinforcement learning (DRL) encounters in complex tasks, especially in sparse-reward environments. Specifically:

1. **Training Bottleneck Problem**: When DRL agents are trained on complex tasks, they often get stuck at a bottleneck stage and cannot improve their performance further. This manifests as the agent performing poorly in some critical states or being unable to reach the final goal.
2. **Challenges in Sparse-Reward Environments**: In sparse-reward environments, agents rarely receive enough feedback to guide their learning, which leads to low training efficiency and poor performance.

To solve these problems, the authors propose RICE, a refining scheme for reinforcement learning that incorporates explanation methods. The main idea of RICE is to identify critical states through explanation methods and construct a new initial-state distribution that combines the default initial states with these critical states, thereby encouraging the agent to start exploring from the mixed initial states.

### Core Contributions of RICE

1. **Breaking Through the Training Bottleneck**: By combining explanation methods with a mixed initial-state distribution, RICE helps the agent escape local optima and improves overall performance.
2. **Theoretical Guarantee**: RICE is designed to have a tighter sub-optimality bound; that is, the gap between the value of the refined policy and the value of the optimal policy is bounded more tightly.
3. **Improved Explanation Method**: The authors improve the existing StateMask explanation method so that it trains more efficiently while maintaining explanation accuracy.
4. **Extensive Experimental Verification**: The authors evaluate RICE in multiple simulated games and real-world applications, and the results show that RICE significantly outperforms existing refinement methods.

### Formula Representation

The formulas involved in the paper include, but are not limited to, the following:

- **Value Function and Q-Function**:
  \[
  V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}\gamma^{t}R(s_{t},a_{t})\mid s_{0}=s\right]
  \]
  \[
  Q^{\pi}(s,a)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}\gamma^{t}R(s_{t},a_{t})\mid s_{0}=s,a_{0}=a\right]
  \]
- **Advantage Function**:
  \[
  A^{\pi}(s,a)=Q^{\pi}(s,a)-V^{\pi}(s)
  \]
- **State Occupancy Distribution**:
  \[
  d_{\rho}^{\pi}(s)=(1-\gamma)\sum_{t=0}^{\infty}\gamma^{t}\Pr_{\pi}(s_{t}=s\mid s_{0}\sim\rho)
  \]
- **Sub-Optimality** (the quantity that the bound controls), where \(V^{\pi}(\rho)=\mathbb{E}_{s_{0}\sim\rho}\left[V^{\pi}(s_{0})\right]\) and \(\pi'\) is the refined policy:
  \[
  \text{SubOpt}:=V^{\pi^{*}}(\rho)-V^{\pi'}(\rho)
  \]

Using these definitions, the authors argue both theoretically and empirically that RICE alleviates the training bottleneck in DRL.
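To make these definitions concrete, the following sketch evaluates them exactly on a hypothetical two-state, two-action MDP (not taken from the paper) using NumPy. It assumes a tabular setting where the transition tensor `P`, reward table `R`, and policy are known arrays; the helper names are illustrative. The policy `pi_prime` is deliberately poor: it never leaves state 0, so with \(\rho\) concentrated on state 0 its sub-optimality is \(V^{\pi^*}(0)-V^{\pi'}(0)=0.9\cdot 10-0=9\).

```python
import numpy as np


def policy_evaluation(P, R, pi, gamma):
    """Exact V^pi and Q^pi for a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    pi: (S, A) action probabilities, gamma: discount in [0, 1).
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)          # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)  # V = r_pi + gamma * P_pi V
    Q = R + gamma * P @ V                  # Q(s,a) = R(s,a) + gamma * E_{s'}[V(s')]
    return V, Q, P_pi


def occupancy(P_pi, rho, gamma):
    """d_rho^pi = (1 - gamma) * rho^T (I - gamma P_pi)^{-1}, as a vector."""
    S = P_pi.shape[0]
    return (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, rho)


def optimal_values(P, R, gamma, iters=2000):
    """V^{pi*} by value iteration (converges since gamma < 1)."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = np.max(R + gamma * P @ V, axis=1)
    return V


# Hypothetical toy MDP: in state 1, action 0 stays and earns reward 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in 1
P[1, 1, 0] = 1.0  # state 1, action 1: move back to 0
R = np.array([[0.0, 0.0], [1.0, 0.0]])
gamma = 0.9
rho = np.array([1.0, 0.0])                      # always start in state 0
pi_prime = np.array([[1.0, 0.0], [1.0, 0.0]])   # always take action 0

V_pi, Q_pi, P_pi = policy_evaluation(P, R, pi_prime, gamma)
A_pi = Q_pi - V_pi[:, None]                     # advantage A = Q - V
d_pi = occupancy(P_pi, rho, gamma)
V_star = optimal_values(P, R, gamma)
subopt = rho @ V_star - rho @ V_pi              # 9 for this toy MDP
```

Here the occupancy \(d_{\rho}^{\pi'}\) puts all its mass on state 0, so an agent that only starts from \(\rho\) never experiences the rewarding state; restarting some episodes from a critical state such as state 1 is the kind of intervention a mixed initial-state distribution provides.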

RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
