Sample-Efficient Deep Reinforcement Learning Via Balance Sample

Haiyang Yang, Tao Wang, Zhiyong Tan, Yao Yu
DOI: https://doi.org/10.1109/yac57282.2022.10023918
2022-01-01
Abstract: In this paper, we propose two algorithms that improve sample efficiency by focusing on late-stage samples within episodes. The first is Balanced Sample Experience Replay (BSER). Unlike the traditional random sampling approach, it learns more from the late-stage experience of each episode, which improves the final score and stability in the environment. The second is weight-corrected DQN (WCDQN). Unlike the traditional approach, which updates all training samples in the same way, it applies differentiated updates to the samples used for training, which likewise improves the final score and stability in the environment. We tested both algorithms in a classic Atari game environment and demonstrated their effectiveness.
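The abstract does not spell out how BSER selects samples, so the following is only a minimal sketch of the general idea of drawing late-stage transitions more often than early ones. It is not the authors' method: the class name `LateStageReplayBuffer`, the `late_stage_bias` parameter, and the linear weighting by position within the episode are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class LateStageReplayBuffer:
    """Hypothetical buffer that favors transitions from late in an episode."""

    late_stage_bias: float = 0.5  # assumed knob: 0.0 gives uniform sampling
    transitions: List[Any] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)

    def add_episode(self, episode: List[Any]) -> None:
        """Store one finished episode, weighting each step by its position."""
        length = len(episode)
        for step, transition in enumerate(episode):
            # Relative position in (0, 1]; the final step of the episode gets 1.0.
            position = (step + 1) / length
            self.transitions.append(transition)
            # Assumed linear blend between uniform and position-based weights.
            self.weights.append(
                (1.0 - self.late_stage_bias) + self.late_stage_bias * position
            )

    def sample(self, batch_size: int, rng: random.Random) -> List[Any]:
        """Draw a batch with replacement, proportional to the stored weights."""
        return rng.choices(self.transitions, weights=self.weights, k=batch_size)
```

With `late_stage_bias=0.0` every transition has the same weight, which recovers ordinary random sampling; larger values shift each batch toward the end of each episode. A real implementation would also need a capacity limit and eviction policy, which this sketch omits.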