Identifying Critical States by the Action-Based Variance of Expected Return

Izumi Karino, Yoshiyuki Ohmura, Yasuo Kuniyoshi

DOI: https://doi.org/10.1007/978-3-030-61609-0_29

2020-11-08

Abstract: The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulties in deciding when to choose exploitation as well as in extracting useful points for a brief explanation of its operation. One reason for the difficulties is that these approaches treat all states the same way. Here, we show that identifying critical states and treating them specially is commonly beneficial to both problems. These critical states are the states at which the action selection changes the potential of success and failure substantially. We propose to identify the critical states using the variance in the Q-function for the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a grid world with cliffs and two baseline tasks of deep RL. Our results also demonstrate that the identified critical states are intuitively interpretable regarding the crucial nature of the action selection. Furthermore, our analysis of the relationship between the timing of the identification of especially critical states and the rapid progress of learning suggests there are a few especially critical states that have important information for accelerating RL rapidly.

Machine Learning

What problem does this paper attempt to address?

This paper addresses two problems in reinforcement learning (RL): balancing exploration and exploitation, and explainability.

1. **Balance between exploration and exploitation**: Deciding when to explore new behavior and when to exploit what has already been learned is a core issue in RL. Basic RL methods have difficulty deciding when to exploit, especially in states where the choice of action strongly affects later success or failure, because they treat all states the same way and give no special attention to critical ones.
2. **Explainability**: Deploying RL agents in human society requires that their behavior be explainable. Basic RL methods, however, have difficulty extracting the useful points needed for a brief explanation of how the agent operates, which limits their acceptability and usability in practice.

To address both problems, the paper proposes identifying critical states, using the variance of the Q-function across actions, and exploiting with high probability in those states. Critical states are states where the action selection substantially changes the potential for success or failure. The authors report that this accelerates learning in a grid world with cliffs and in two deep RL baseline tasks, and that the identified critical states are intuitively interpretable.
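The idea of flagging states by the variance of Q-values across actions can be illustrated with a small tabular sketch. This is not the authors' implementation: the quantile-based threshold, the function names (`action_variance`, `critical_mask`, `select_action`), and the parameters `quantile`, `epsilon`, and `exploit_prob` are all assumptions made for illustration, since the page does not give the paper's exact criterion or hyperparameters.

```python
import numpy as np


def action_variance(q_values):
    """Variance of Q(s, a) over the action axis, one value per state."""
    return np.var(np.asarray(q_values, dtype=float), axis=-1)


def critical_mask(q_values, quantile=0.9):
    """Flag states whose action-variance exceeds a quantile threshold.

    The quantile rule is an assumption for this sketch, not the paper's rule.
    """
    variance = action_variance(q_values)
    threshold = np.quantile(variance, quantile)
    return variance > threshold


def select_action(q_row, is_critical, rng, epsilon=0.3, exploit_prob=0.95):
    """Epsilon-greedy choice that exploits more often in critical states.

    epsilon and exploit_prob are hypothetical values chosen for illustration.
    """
    p_greedy = exploit_prob if is_critical else 1.0 - epsilon
    if rng.random() < p_greedy:
        return int(np.argmax(q_row))        # exploit: greedy action
    return int(rng.integers(len(q_row)))    # explore: uniform random action


# Toy Q-table with 4 states and 3 actions; only state 1 has one action
# that is far better than the others.
q_table = np.array([
    [1.0, 1.0, 1.0],
    [0.0, 10.0, 0.0],
    [2.0, 2.1, 2.0],
    [0.0, 0.0, 0.1],
])
mask = critical_mask(q_table)
rng = np.random.default_rng(0)
action = select_action(q_table[1], bool(mask[1]), rng)
```

In this toy table only state 1 is flagged, because it is the only state where the action choice changes the expected return by a large amount. In a deep RL setting the same variance would be computed from the Q-network's outputs for the current state rather than from a table.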


Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning

Criticality and Safety Margins for Reinforcement Learning

Is High Variance Unavoidable in RL? A Case Study in Continuous Control

Exploration in Feature Space for Reinforcement Learning

Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes

Criticality-Based Varying Step-Number Algorithm for Reinforcement Learning

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

Careful at Estimation and Bold at Exploration

Extremum-Seeking Action Selection for Accelerating Policy Optimization

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

MADE: Exploration via Maximizing Deviation from Explored Regions

Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory

How does Your RL Agent Explore? An Optimal Transport Analysis of Occupancy Measure Trajectories

LiFE: Deep Exploration Via Linear-Feature Bonus in Continuous Control

Reinforcement Learning with Probabilistically Complete Exploration

RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration