Identifying Critical States by the Action-Based Variance of Expected Return

Izumi Karino, Yoshiyuki Ohmura, Yasuo Kuniyoshi
DOI: https://doi.org/10.1007/978-3-030-61609-0_29
2020-11-08
Abstract: The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulties in deciding when to choose exploitation as well as in extracting useful points for a brief explanation of its operation. One reason for the difficulties is that these approaches treat all states the same way. Here, we show that identifying critical states and treating them specially is commonly beneficial to both problems. These critical states are the states at which the action selection changes the potential of success and failure substantially. We propose to identify the critical states using the variance in the Q-function for the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a grid world with cliffs and two baseline tasks of deep RL. Our results also demonstrate that the identified critical states are intuitively interpretable regarding the crucial nature of the action selection. Furthermore, our analysis of the relationship between the timing of the identification of especially critical states and the rapid progress of learning suggests there are a few especially critical states that have important information for accelerating RL rapidly.
Machine Learning
What problem does this paper attempt to address?
This paper addresses two problems in reinforcement learning (RL): the balance between exploration and exploitation, and explainability.

1. **Balance between exploration and exploitation**: Balancing the exploration of new strategies against the exploitation of strategies already known to work well is a core issue in RL. Basic RL methods have difficulty deciding when to exploit, especially in states where the choice of action has a large impact on future success or failure. The reason is that these methods usually treat all states the same way and give no special attention to critical states.
2. **Explainability**: Deploying RL agents in human society also requires that their behavior be explainable. Basic RL methods have difficulty extracting the useful points needed for a brief explanation of their operation, which limits their acceptability and usability in practical applications.

To address both problems, the paper proposes identifying critical states, using the variance of the Q-function across actions, and exploiting with high probability in those states. Critical states are states in which the choice of action substantially changes the likelihood of future success or failure. With this approach, the authors aim to accelerate learning while providing more intuitive, easy-to-understand explanations that help people understand and use RL systems.
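
The idea of flagging a state as critical when its Q-values vary strongly across actions, and then exploiting there with high probability, can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed `threshold`, the `exploit_prob` and `epsilon` values, and all function names are assumptions made for illustration, and the summary above does not state how the paper actually sets its criterion.

```python
import random
from statistics import pvariance


def action_variance(q_values):
    """Population variance of Q(s, a) over the actions of one state."""
    return pvariance(q_values)


def is_critical(q_values, threshold):
    """Flag a state as critical when its action-wise Q variance is large.

    `threshold` is a hypothetical fixed cutoff chosen for this sketch.
    """
    return action_variance(q_values) > threshold


def select_action(q_values, rng, epsilon=0.3, threshold=1.0, exploit_prob=0.95):
    """Epsilon-greedy selection that exploits more often in critical states."""
    greedy = max(range(len(q_values)), key=q_values.__getitem__)
    # Critical states use the (assumed) high exploitation probability;
    # all other states fall back to ordinary epsilon-greedy.
    if is_critical(q_values, threshold):
        p_greedy = exploit_prob
    else:
        p_greedy = 1.0 - epsilon
    if rng.random() < p_greedy:
        return greedy
    return rng.randrange(len(q_values))


rng = random.Random(0)
safe_state = [0.50, 0.52, 0.49, 0.51]   # all actions look about equally good
cliff_state = [1.0, 0.9, -10.0, 1.1]    # one action leads to failure

for name, q in [("safe", safe_state), ("cliff", cliff_state)]:
    print(name, is_critical(q, threshold=1.0), select_action(q, rng))
```

In the example, the state next to a cliff has one action with a much lower Q-value than the others, so its variance is large and the agent mostly takes the greedy action there, while the low-variance state keeps the usual exploration rate.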