Abstract: Prioritized experience replay (PER) is an important technique in deep reinforcement learning (DRL). It improves the sample efficiency of various DRL algorithms and achieves strong performance. PER uses the temporal difference error (TD-error) to measure the value of experiences and adjusts their sampling probability accordingly. Although PER can sample valuable experiences according to the TD-error, freshness is also an important characteristic of experiences, because it implicitly reflects their potential value. Fresh experiences are produced by the current networks, so they are more valuable for updating the current network parameters than older ones. Sampling fresh experiences to train the neural networks can increase the learning speed of the agent, but few algorithms do this efficiently. To address this issue, this paper proposes a novel experience replay method. We first define experience freshness as negatively correlated with the number of times an experience has been replayed. A new hyper-parameter, the freshness discounted factor μ, is introduced into PER to measure experience freshness. Further, a novel experience replacement strategy for the replay buffer is proposed to increase replacement efficiency. In our method, the sampling probability of fresh experiences is increased by raising their priority appropriately, so the algorithm is more likely to choose fresh experiences to train the neural networks during learning. We evaluated this method on both discrete and continuous control tasks in OpenAI Gym. The experimental results show that our method achieves better performance on both types of task.
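The freshness idea described above can be illustrated with a minimal sketch. The abstract does not state the priority formula, so the code assumes one plausible form: the absolute TD-error is multiplied by μ raised to the number of times the experience has been replayed. The function names, the small constant `eps`, the exponent `alpha`, and all numeric values are hypothetical choices made for illustration. The replacement strategy is not sketched because the abstract gives no detail about it.

```python
import numpy as np

def freshness_priorities(td_errors, replay_counts, mu=0.9, eps=1e-6):
    """Assumed priority: mu**replay_count * |TD-error| + eps (not taken from the paper)."""
    td_errors = np.asarray(td_errors, dtype=float)
    replay_counts = np.asarray(replay_counts, dtype=float)
    # With 0 < mu < 1, each additional replay lowers the priority.
    return (mu ** replay_counts) * np.abs(td_errors) + eps

def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization as in standard PER: P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = np.asarray(priorities, dtype=float) ** alpha
    return scaled / scaled.sum()

# Four experiences with identical TD-error but different replay counts.
td_errors = np.array([1.0, 1.0, 1.0, 1.0])
replay_counts = np.array([0, 1, 5, 20])

probs = sampling_probabilities(freshness_priorities(td_errors, replay_counts))

# Draw a mini-batch of indices, then record that those experiences were replayed.
rng = np.random.default_rng(0)
batch = rng.choice(len(probs), size=2, replace=False, p=probs)
replay_counts[batch] += 1
```

Under these assumptions, experiences with equal TD-error are sampled less often the more they have been replayed, which matches the stated negative correlation between freshness and replay count. Setting `mu=1.0` removes the discount and recovers ordinary TD-error-based prioritization.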

NFSP-PER: an Efficient Sampling NFSP-based Method with Prioritized Experience Replay

NFSP-PLT: Solving Games with a Weighted NFSP-PER-Based Method

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games.

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

Optimize Neural Fictitious Self-Play in Regret Minimization Thinking

Leveraging Efficiency Through Hybrid Prioritized Experience Replay in Door Environment.

Prioritized experience replay based on dynamics priority

Prioritised Experience Replay Based on Sample Optimisation

Finding nash equilibrium for imperfect information games via fictitious play based on local regret minimization

High-Value Prioritized Experience Replay For Off-Policy Reinforcement Learning

Attention Loss Adjusted Prioritized Experience Replay

Ddper - Decentralized Distributed Prioritized Experience Replay.

Fresher Experience Plays a More Important Role in Prioritized Experience Replay

Investigating the Interplay of Prioritized Replay and Generalization

Strategy Optimization of Imperfect Information Games Based on NFSP with DDQN

Directly Attention Loss Adjusted Prioritized Experience Replay.

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Deep Reinforcement Learning from Self-Play in No-limit Texas Hold'em Poker

Reinforcement Nash Equilibrium Solver