Efficient Exploitation for Off-Policy Reinforcement Learning Via a Novel Multi-scale Tabular Buffer

Shicheng Guo, Wei Xue, Wei Zhao, Yuanxia Shen, Gaohang Yu
DOI: https://doi.org/10.1109/acie58528.2023.00007
2023-01-01
Abstract: This paper is concerned with sample efficiency in off-policy reinforcement learning. To exploit samples efficiently, we propose a novel buffer-update method that allows important samples to remain in the buffer longer. Specifically, we first propose a new metric for determining the importance of a sample, which takes into account the value of the state and the period confidence level. Second, based on this metric, we propose the multi-scale tabular buffer method, which sets up multiple tables within the original buffer, with table sizes that change over time. Experiments show that the proposed method outperforms some state-of-the-art methods on a range of benchmark tasks. In particular, on some tasks, our method requires only half as many samples to achieve state-of-the-art performance.
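The general idea of letting important samples stay in the buffer longer can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the importance metric or the rule by which table sizes change, so the class name `MultiTableBuffer`, the `thresholds` parameter, and the eviction rule below are all hypothetical. Here the caller supplies an importance score, samples are routed to one of several FIFO tables by that score, and when the shared capacity is exceeded the oldest sample in the least important non-empty table is dropped, so table sizes shift as samples arrive.

```python
import random
from collections import deque


class MultiTableBuffer:
    """Toy replay buffer split into several FIFO tables ranked by importance.

    Hypothetical illustration only; the scoring and sizing rules are assumptions.
    """

    def __init__(self, capacity, thresholds):
        # thresholds: ascending importance cut-offs, giving len(thresholds) + 1 tables.
        self.capacity = capacity
        self.thresholds = list(thresholds)
        self.tables = [deque() for _ in range(len(self.thresholds) + 1)]

    def __len__(self):
        return sum(len(table) for table in self.tables)

    def _table_index(self, importance):
        # Count the cut-offs the score reaches; a higher index means more important.
        return sum(importance >= th for th in self.thresholds)

    def add(self, sample, importance):
        self.tables[self._table_index(importance)].append(sample)
        # Over capacity: drop the oldest sample from the least important
        # non-empty table, so high-importance samples survive longer.
        while len(self) > self.capacity:
            for table in self.tables:
                if table:
                    table.popleft()
                    break

    def sample(self, batch_size, rng=random):
        # Uniform sampling over all stored samples (an assumption of this sketch).
        pool = [s for table in self.tables for s in table]
        return rng.sample(pool, min(batch_size, len(pool)))


buf = MultiTableBuffer(capacity=3, thresholds=[0.5])
buf.add("a", 0.9)  # goes to the high-importance table
buf.add("b", 0.1)
buf.add("c", 0.2)
buf.add("d", 0.3)  # buffer is full, so "b" (oldest low-importance sample) is evicted
```

In this sketch the early high-importance sample "a" outlives the later low-importance sample "b", which is the behavior the abstract describes at a high level; a faithful implementation would need the metric and the table-size schedule from the full paper.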