Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable success in solving complex decision-making problems by combining the representational capabilities of deep learning with the decision-making power of reinforcement learning. However, learning in sparse-reward environments remains challenging because the agent receives too little feedback to guide its optimization, especially in real-life environments with high-dimensional states. To tackle this issue, experience replay is commonly used to improve learning efficiency by reusing past experiences. Nonetheless, current experience replay methods, whether based on uniform or prioritized sampling, often suffer from suboptimal learning efficiency and insufficient utilization of samples. This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages the determinantal point process to prioritize diverse samples in state realizations. We conducted extensive experiments on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. The results show that our method not only significantly improves learning efficiency but also achieves superior performance in sparse-reward environments with high-dimensional states, providing a simple yet effective solution for this field.
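As a rough illustration of the diversity measure behind DBER, the sketch below scores a single trajectory by the determinant of a DPP kernel matrix built from its states. Near-duplicate states make the matrix close to singular (score near 0), while well-spread states keep the determinant large. This is a minimal sketch under our own assumptions, not the paper's implementation: the Gaussian (RBF) kernel, the `bandwidth` and `jitter` parameters, and the helper names `rbf_kernel`, `cholesky`, and `diversity_score` are all hypothetical. The determinant is obtained through a Cholesky factorization, using the identity log det(L) = 2 · Σ log C[i][i].

```python
import math
from typing import List, Sequence

State = Sequence[float]


def rbf_kernel(states: List[State], bandwidth: float = 1.0,
               jitter: float = 1e-6) -> List[List[float]]:
    """Hypothetical DPP kernel: L[i][j] = exp(-||s_i - s_j||^2 / (2 * bandwidth^2)).

    The small diagonal jitter keeps L strictly positive definite even when a
    trajectory revisits the exact same state (a numerical safeguard only).
    """
    n = len(states)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(states[i], states[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        kernel[i][i] += jitter
    return kernel


def cholesky(matrix: List[List[float]]) -> List[List[float]]:
    """Lower-triangular C with matrix = C @ C^T; matrix must be symmetric positive definite."""
    n = len(matrix)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                c[i][i] = math.sqrt(matrix[i][i] - partial)
            else:
                c[i][j] = (matrix[i][j] - partial) / c[j][j]
    return c


def diversity_score(states: List[State], bandwidth: float = 1.0) -> float:
    """det(L) for one trajectory, computed as exp(2 * sum(log C[i][i]))."""
    c = cholesky(rbf_kernel(states, bandwidth))
    log_det = 2.0 * sum(math.log(c[i][i]) for i in range(len(c)))
    return math.exp(log_det)


if __name__ == "__main__":
    stuck = [[0.0, 0.0], [0.0, 0.0], [0.01, 0.0]]   # agent barely moves
    moving = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]   # agent visits new states
    # The moving trajectory gets the larger score, so it would be replayed more often.
    print(diversity_score(stuck), diversity_score(moving))
```

In a replay buffer, such scores could be normalized into sampling probabilities so that trajectories covering more distinct states are replayed more often; the exact kernel and normalization used by DBER may differ.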

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper addresses the low learning efficiency of deep reinforcement learning (DRL) in sparse-reward environments. In high-dimensional state spaces, the agent receives too little feedback to guide its optimization, so learning is slow and inefficient. The problem is especially pronounced in real-world environments, which typically combine high-dimensional state representations with sparse reward signals.

### Solutions

To address this problem, the paper proposes a new method, Diversity-based Experience Replay (DBER). DBER uses Determinantal Point Processes (DPPs) to preferentially select diverse trajectories, thereby improving sample utilization and learning efficiency. Compared with traditional uniform or prioritized sampling, DBER does not rely on the temporal-difference (TD) error to select samples; instead, it uses diversity to keep the replayed samples representative and informative.

### Main contributions

1. **A DPP-based experience replay strategy**: By preferentially selecting diverse trajectories, the strategy improves learning efficiency and is especially suited to sparse-reward environments with high-dimensional state spaces.
2. **Extensive experimental verification**: Experiments in multiple simulation environments, including the AI Habitat platform, Atari games, and MuJoCo, verify the method's effectiveness and adaptability.

### Technical details

- **Basic principle of experience replay**: Storing and revisiting past experiences breaks the temporal correlation of data in online learning and improves sample efficiency.
- **Application of DPPs**: DPPs are probabilistic models that capture the diversity of a set of points. The diversity of a trajectory is evaluated by computing the determinant of the kernel matrix.
- **Improvement of computational efficiency**: Cholesky decomposition and rejection sampling are used to reduce computational complexity, making the algorithm more practical to apply.

### Experimental results

- **Continuous control tasks**: On the Fetch robotic arm and Shadow Dexterous Hand tasks in MuJoCo, DBER significantly improves learning speed and success rate.
- **Discrete-action games**: On Atari games, DBER performs well across multiple benchmarks, especially in environments that require extensive exploration.
- **Real-world environments**: In visual navigation tasks on the AI Habitat platform, DBER achieves higher success rates and greater robustness across multiple complex, realistic indoor environments.

### Conclusion

By introducing the concept of diversity, DBER effectively addresses the learning efficiency problem in sparse-reward environments with high-dimensional state spaces. The experimental results show that DBER not only improves learning efficiency but also performs strongly across multiple tasks and environments.
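The summary credits rejection sampling with part of DBER's efficiency gain but does not say exactly where it is applied. One plausible use, sketched below purely as an assumption, is drawing replay trajectories with probability proportional to their diversity scores without computing the normalizing sum over the whole buffer. The function name `rejection_sample`, its parameters, and the example scores are hypothetical and not taken from the paper.

```python
import random
from typing import List, Optional


def rejection_sample(scores: List[float], batch_size: int,
                     upper_bound: Optional[float] = None,
                     rng: Optional[random.Random] = None,
                     max_proposals: int = 10_000_000) -> List[int]:
    """Hypothetical helper: draw batch_size indices (with replacement), P(i) ∝ scores[i].

    Proposal: pick i uniformly; accept it with probability scores[i] / upper_bound.
    Accepted draws follow scores / sum(scores) without ever computing that sum.
    upper_bound must be >= every score; if omitted, max(scores) is used.
    """
    if not scores or batch_size < 0:
        raise ValueError("need a non-empty score list and a non-negative batch size")
    rng = rng if rng is not None else random.Random()
    bound = max(scores) if upper_bound is None else upper_bound
    if bound <= 0:
        raise ValueError("upper bound must be positive")
    picked: List[int] = []
    proposals = 0
    while len(picked) < batch_size:
        if proposals >= max_proposals:
            raise RuntimeError("too many rejections; are all scores (near) zero?")
        proposals += 1
        i = rng.randrange(len(scores))          # uniform proposal over the buffer
        score = scores[i]
        if not 0.0 <= score <= bound:
            raise ValueError(f"score {score} lies outside [0, {bound}]")
        if rng.random() < score / bound:        # accept with probability score / bound
            picked.append(i)
    return picked


if __name__ == "__main__":
    # Hypothetical diversity scores for four stored trajectories.
    scores = [0.0, 0.2, 0.5, 1.0]
    # For a kernel whose diagonal entries are all 1, Hadamard's inequality gives
    # det(L) <= 1, so 1.0 is a valid bound known in advance (no scan over the buffer).
    print(rejection_sample(scores, batch_size=8, upper_bound=1.0, rng=random.Random(0)))
```

The design trades extra random draws for skipping the O(N) normalization: the expected number of proposals per accepted sample is `upper_bound` divided by the mean score, so a tight bound matters.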

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

ACDER: Augmented Curiosity-Driven Experience Replay

Experience Selection In Multi-Agent Deep Reinforcement Learning

Locality-Sensitive State-Guided Experience Replay Optimization for Sparse Rewards in Online Recommendation

Leveraging Efficiency Through Hybrid Prioritized Experience Replay in Door Environment.

Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Balanced Prioritized Experience Replay in Off-Policy Reinforcement Learning

High-Value Prioritized Experience Replay For Off-Policy Reinforcement Learning

Ddper - Decentralized Distributed Prioritized Experience Replay.

Episodic Reinforcement Learning with Expanded State-reward Space

ROER: Regularized Optimal Experience Replay

Multi-Input Autonomous Driving Based on Deep Reinforcement Learning with Double Bias Experience Replay

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Soft Hindsight Experience Replay

Prioritized Generative Replay

Re-attentive experience replay in off-policy reinforcement learning

Advances in Experience Replay

Parallel Curriculum Experience Replay in Distributed Reinforcement Learning.

Replay across Experiments: A Natural Extension of Off-Policy RL