DDPER: Decentralized Distributed Prioritized Experience Replay

Sidun Liu, Peng Qiao, Yong Dou, Rongchun Li
DOI: https://doi.org/10.1109/ICME51207.2021.9428188
2021-01-01
Abstract: In off-policy reinforcement learning, prioritized experience replay plays an important role. However, a centralized prioritized experience replay becomes the bottleneck for efficient training. We propose to approximate the centralized prioritized experience replay in a distributed and decentralized way under certain mild assumptions. Specifically, each actor stores samples in its local replay in the same way as prioritized experience replay, and the learner fetches a batch of samples from these replays following a certain strategy. We implement a Deep Q-Learning off-policy algorithm on top of the proposed framework. The comparison experiments are performed on a commonly used subset of the Atari-57 learning environment. The experimental results show that the proposed framework speeds up training as the number of actors increases. With the same algorithm and hyper-parameter settings, the proposed framework with 16 actors achieves performance superior to that of Ape-X with 32 or even more actors.
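The abstract does not say which strategy the learner uses to fetch samples, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes proportional prioritization within each local replay and a two-stage fetch: the learner first picks a replay with probability proportional to its total priority, then samples one transition inside it. The class `LocalPrioritizedReplay`, the function `fetch_batch`, and the parameters `capacity` and `alpha` are hypothetical names introduced for illustration.

```python
import random


class LocalPrioritizedReplay:
    """Per-actor replay with proportional prioritized sampling (hypothetical sketch)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # assumed priority exponent
        self.items = []
        self.priorities = []

    def add(self, transition, td_error):
        # Priority grows with the magnitude of the TD error; the small
        # constant keeps every stored transition sampleable.
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.items) >= self.capacity:
            # Drop the oldest transition when the buffer is full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def total_priority(self):
        return sum(self.priorities)

    def sample(self, rng):
        # Draw one transition with probability proportional to its priority.
        return rng.choices(self.items, weights=self.priorities, k=1)[0]


def fetch_batch(replays, batch_size, rng):
    """Assumed learner strategy: pick a replay by total priority, then sample in it."""
    totals = [replay.total_priority() for replay in replays]
    chosen = rng.choices(range(len(replays)), weights=totals, k=batch_size)
    return [replays[index].sample(rng) for index in chosen]


# Usage: four actors each fill their own replay; the learner fetches one batch.
rng = random.Random(0)
replays = [LocalPrioritizedReplay(capacity=100) for _ in range(4)]
for actor_id, replay in enumerate(replays):
    for step in range(10):
        replay.add((actor_id, step), td_error=rng.random())

batch = fetch_batch(replays, batch_size=8, rng=rng)
```

Under this assumed scheme, a transition with priority p in a replay whose total priority is T_i is drawn with probability (T_i / sum of all T) * (p / T_i) = p / (sum of all T), which matches proportional sampling from a single centralized replay while only requiring the learner to know each replay's total priority. A practical system would likely use a sum-tree instead of the linear scans shown here.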