Entropy-Aware Model Initialization for Effective Exploration in Deep Reinforcement Learning

Sooyoung Jang, Hyung-Il Kim
DOI: https://doi.org/10.48550/arXiv.2108.10533
2021-08-24
Abstract: Encouraging exploration is a critical issue in deep reinforcement learning. We investigate the effect of initial entropy that significantly influences the exploration, especially at the earlier stage. Our main observations are as follows: 1) low initial entropy increases the probability of learning failure, and 2) this initial entropy is biased towards a low value that inhibits exploration. Inspired by the investigations, we devise entropy-aware model initialization, a simple yet powerful learning strategy for effective exploration. We show that the devised learning strategy significantly reduces learning failures and enhances performance, stability, and learning speed through experiments.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses is how to promote effective exploration in deep reinforcement learning (DRL). Specifically, the authors study how initial entropy affects exploratory behavior, especially in tasks with discrete action spaces. The paper observes that low initial entropy increases the probability of learning failure, and that initial entropy is biased towards low values, which suppresses exploration. It therefore proposes an entropy-aware model initialization strategy that aims to reduce learning failures and improve performance, stability, and learning speed by raising the initial entropy.

### Main Findings and Contributions

1. **Reveal the Root Cause of Learning Failure**:
   - The paper shows experimentally that low initial entropy significantly increases the probability of learning failure.
   - Initial entropy differs significantly across tasks and across model initializations, which makes it difficult to control in discrete control tasks.
2. **Propose an Entropy-Aware Model Initialization Strategy**:
   - The strategy repeatedly initializes the model and measures its entropy until the initial entropy exceeds a set threshold.
   - It can be combined with any reinforcement learning algorithm because it only supplies a well-initialized model.

### Experimental Results

- **Reduce Learning Failure**:
  - With the proposed entropy-aware model initialization strategy, the number of learning failures drops significantly. For example, in the Pong task it falls from 22 with the default method to 6 with the new strategy; in the Breakout task it falls from 15 to 1.
- **Improve Performance**:
  - Performance improves by a factor of 1.98 in the Pong task and by a factor of 2.04 in the Breakout task.
- **Reduce Performance Fluctuation**:
  - The new strategy reduces performance fluctuations between different experiments.
- **Accelerate Learning Speed**:
  - The new strategy significantly improves learning speed, enabling the model to reach higher rewards more quickly.

### Implementation Method

- **Algorithm Flow**:
  1. Initialize the model.
  2. Collect action selection probabilities through multiple actors.
  3. Calculate the average entropy.
  4. If the average entropy exceeds the set threshold, stop the initialization process and continue with DRL training; otherwise, reset the random seed, re-initialize the model, and repeat the above process.

### Conclusion

The paper experimentally studies the impact of initial entropy in the DRL framework, especially for tasks with discrete action spaces. The results show that low initial entropy leads to frequent learning failures, and that the proposed entropy-aware model initialization strategy substantially reduces these failures while improving the success rate, performance, and speed of learning. The strategy has practical value because it is easy to implement and applicable to multiple reinforcement learning algorithms.
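
The algorithm flow listed under Implementation Method can be illustrated with a short sketch. This is not the authors' implementation: the one-layer softmax policy (`init_policy`), the batch of observations standing in for states gathered by multiple actors, the `scale` and `threshold` values, and the `max_tries` safeguard are all assumptions made for illustration. Only the loop itself (initialize, measure average entropy, compare with a threshold, re-seed and retry) follows the summary above.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def mean_entropy(probs, eps=1e-12):
    # Shannon entropy (in nats) of each action distribution,
    # averaged over the batch of observations.
    return float(-(probs * np.log(probs + eps)).sum(axis=-1).mean())


def init_policy(seed, obs_dim, n_actions, scale):
    # Hypothetical stand-in for a network initializer: a single linear
    # layer whose weights are drawn from a seeded normal distribution.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, scale, size=(obs_dim, n_actions))


def entropy_aware_init(observations, n_actions, threshold, scale=1.0,
                       start_seed=0, max_tries=1000):
    # Re-seed and re-initialize until the average initial entropy
    # exceeds the threshold. max_tries is an added safeguard (assumption).
    obs_dim = observations.shape[1]
    for seed in range(start_seed, start_seed + max_tries):
        weights = init_policy(seed, obs_dim, n_actions, scale)
        probs = softmax(observations @ weights)  # action probabilities
        entropy = mean_entropy(probs)
        if entropy > threshold:
            return weights, seed, entropy
    raise RuntimeError("no initialization exceeded the entropy threshold")


if __name__ == "__main__":
    rng = np.random.default_rng(123)
    obs = rng.normal(size=(64, 8))        # placeholder observations
    n_actions = 6
    threshold = 0.9 * np.log(n_actions)   # arbitrary fraction of max entropy
    weights, seed, entropy = entropy_aware_init(
        obs, n_actions, threshold, scale=0.1
    )
    print(f"accepted seed={seed}, mean entropy={entropy:.3f} nats")
```

The maximum entropy of a distribution over `n_actions` actions is `log(n_actions)`, so a threshold above that value can never be met; expressing the threshold as a fraction of this maximum is one convenient way to keep it feasible. Because the function only returns initial weights, the accepted model can then be handed to any training algorithm, which matches the summary's point that the strategy is algorithm-agnostic.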