Abstract: Encouraging exploration is a critical issue in deep reinforcement learning. We investigate the effect of initial entropy that significantly influences the exploration, especially at the earlier stage. Our main observations are as follows: 1) low initial entropy increases the probability of learning failure, and 2) this initial entropy is biased towards a low value that inhibits exploration. Inspired by the investigations, we devise entropy-aware model initialization, a simple yet powerful learning strategy for effective exploration. We show that the devised learning strategy significantly reduces learning failures and enhances performance, stability, and learning speed through experiments.

What problem does this paper attempt to address?

The paper asks how to promote exploration effectively in deep reinforcement learning (DRL). Specifically, the authors study how the initial entropy of the policy affects exploratory behavior, especially in tasks with discrete action spaces. They point out that low initial entropy increases the probability of learning failure, and that the initial entropy tends to be biased toward low values, which suppresses exploration. The paper therefore proposes an entropy-aware model initialization strategy that raises the initial entropy in order to reduce learning failures and improve performance, stability, and learning speed.

### Main Findings and Contributions

1. **Reveal the Root Cause of Learning Failure**:
   - The paper observes through experiments that low initial entropy significantly increases the probability of learning failure.
   - Initial entropy differs significantly across tasks and model initializations, which makes it difficult to control in discrete control tasks.
2. **Propose an Entropy-Aware Model Initialization Strategy**:
   - The strategy repeatedly initializes the model and measures its entropy until the initial entropy exceeds a set threshold.
   - The strategy can be combined with any reinforcement learning algorithm because it only supplies a well-initialized model.

### Experimental Results

- **Reduce Learning Failure**:
  - With the proposed entropy-aware model initialization strategy, the number of learning failures drops significantly. For example, in the Pong task the default method fails 22 times while the new strategy fails only 6 times; in the Breakout task the number of learning failures falls from 15 to 1.
- **Improve Performance**:
  - In the Pong task, performance improves by a factor of 1.98; in the Breakout task, by a factor of 2.04.
- **Reduce Performance Fluctuation**:
  - The new strategy reduces performance fluctuations between different experiments.
- **Accelerate Learning Speed**:
  - The new strategy significantly improves learning speed, enabling the model to reach higher rewards more quickly.

### Implementation Method

- **Algorithm Flow**:
  - Initialize the model.
  - Collect action selection probabilities through multiple actors.
  - Calculate the average entropy.
  - If the average entropy exceeds the set threshold, stop the initialization process and continue with DRL training; otherwise, reset the random seed, re-initialize the model, and repeat the above process.

### Conclusion

The paper experimentally studies the impact of initial entropy in the DRL framework, especially for tasks with discrete action spaces. The research shows that low initial entropy leads to frequent learning failures, and that the proposed entropy-aware model initialization strategy addresses this problem and significantly improves the success rate, performance, and speed of learning. The strategy has practical value because it is easy to implement and applies to multiple reinforcement learning algorithms.
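
To make the algorithm flow under Implementation Method concrete, here is a minimal Python sketch of the re-initialization loop. It is an illustration under stated assumptions, not the authors' implementation: the policy is a hypothetical single linear layer with a softmax output, the observations are random vectors standing in for what multiple actors would collect, and the function names, the weight scale, the threshold of 70% of the maximum entropy, and the `max_tries` cap are all chosen here for illustration. The average entropy is taken as the mean of $-\sum_a \pi(a \mid s) \log \pi(a \mid s)$ over the collected observations.

```python
import numpy as np


def mean_entropy(probs):
    """Average Shannon entropy (in nats) over a batch of action distributions."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return float(-(probs * np.log(probs)).sum(axis=1).mean())


def init_policy(seed, obs_dim, n_actions, scale):
    """Hypothetical stand-in for model initialization: one random linear layer."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, scale, size=(obs_dim, n_actions))


def action_probs(weights, observations):
    """Softmax over the logits of the linear policy."""
    logits = observations @ weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def entropy_aware_init(observations, n_actions, threshold, scale=0.3, max_tries=1000):
    """Re-initialize with a new seed until the mean initial entropy exceeds threshold."""
    obs_dim = observations.shape[1]
    for seed in range(max_tries):
        weights = init_policy(seed, obs_dim, n_actions, scale)
        entropy = mean_entropy(action_probs(weights, observations))
        if entropy > threshold:
            return weights, seed, entropy
    # Assumed safeguard: the summary does not describe a cap on retries.
    raise RuntimeError("no initialization exceeded the entropy threshold")


# Random observations standing in for those gathered by multiple actors.
observations = np.random.default_rng(0).normal(size=(64, 8))
n_actions = 4
threshold = 0.7 * np.log(n_actions)  # assumed: 70% of the maximum entropy log(4)
weights, seed, entropy = entropy_aware_init(observations, n_actions, threshold)
print(f"accepted seed={seed}, mean entropy={entropy:.3f}, threshold={threshold:.3f}")
```

Because the loop only returns initial weights, the accepted model could then be handed to any training algorithm unchanged, which matches the summary's point that the strategy is independent of the reinforcement learning method used afterwards.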

Entropy-Aware Model Initialization for Effective Exploration in Deep Reinforcement Learning

Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

Exploration Entropy for Reinforcement Learning

The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective

Maximum Entropy Reinforcement Learning with Evolution Strategies

Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

An Effective Maximum Entropy Exploration Approach for Deceptive Game in Reinforcement Learning.

Deterministic Exploration via Stationary Bellman Error Maximization

Fast Rates for Maximum Entropy Exploration

A Temporally Correlated Latent Exploration for Reinforcement Learning

Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle

Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning

Understanding the impact of entropy on policy optimization

Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework.

Robotic Exploration using Generalized Behavioral Entropy

Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration

Random curiosity-driven exploration in deep reinforcement learning

ELEMENT: Episodic and Lifelong Exploration via Maximum Entropy

Random Latent Exploration for Deep Reinforcement Learning

Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures