Data Efficient Deep Reinforcement Learning with Action-Ranked Temporal Difference Learning
Qi Liu, Yanjie Li, Yuecheng Liu, Ke Lin, Jianqi Gao, Yunjiang Lou
DOI: https://doi.org/10.1109/tetci.2024.3369641
2024-01-01
IEEE Transactions on Emerging Topics in Computational Intelligence
Abstract: In value-based deep reinforcement learning (RL), value function approximation errors lead to suboptimal policies. Temporal difference (TD) learning is one of the most important methods for approximating the state-action ($Q$) value function. In TD learning, it is critical to estimate the $Q$ values of greedy actions accurately, because a more accurate target $Q$ value improves the accuracy of the $Q$ value estimate. To improve this accuracy, we propose an action-ranked TD learning method that enhances the performance of deep RL by weighting each TD error according to the rank of its corresponding state-action pair's value among all the $Q$ values in that state. The proposed method provides more accurate target values for TD learning, which in turn makes the $Q$ value estimates more accurate. We apply the proposed method to a representative value-based deep RL algorithm, and the results show that it outperforms baselines on 31 out of 40 Atari games. Furthermore, we extend the proposed method to multi-agent deep RL. To adaptively determine the hyperparameter in action-ranked TD learning, we propose a meta action-ranked TD learning method. A series of experiments quantitatively verifies that our methods outperform baselines on Atari games, StarCraft-II, and Grid World environments.
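The rank-based weighting described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weighting function, so the geometric form `decay ** rank`, the `decay` parameter, and all function names below are hypothetical choices made for illustration only; the sketch merely assumes that TD errors of higher-ranked (closer to greedy) actions should receive larger weights.

```python
def action_ranks(q_values):
    """Rank each action by its Q value in one state; rank 0 is the greedy action."""
    order = sorted(range(len(q_values)), key=lambda a: q_values[a], reverse=True)
    ranks = [0] * len(q_values)
    for rank, action in enumerate(order):
        ranks[action] = rank
    return ranks


def rank_weight(rank, decay=0.5):
    """Hypothetical weight: 1 for the greedy action, shrinking geometrically with rank."""
    return decay ** rank


def weighted_td_loss(q_values, action, reward, next_q_values,
                     gamma=0.99, decay=0.5, done=False):
    """Squared one-step TD error for the taken action, scaled by its rank weight."""
    # Standard Q-learning target: reward plus discounted greedy value of the next state.
    target = reward + (0.0 if done else gamma * max(next_q_values))
    td_error = target - q_values[action]
    weight = rank_weight(action_ranks(q_values)[action], decay)
    return weight * td_error ** 2


if __name__ == "__main__":
    q = [1.0, 3.0, 2.0]  # Q values of three actions in the current state
    print(action_ranks(q))
    print(weighted_td_loss(q, action=2, reward=1.0, next_q_values=[0.0, 0.0], gamma=0.5))
```

In this sketch the greedy action keeps its full TD error, while errors for lower-ranked actions are down-weighted. The `decay` value stands in for the kind of hyperparameter that the abstract says the meta variant determines adaptively; how the paper actually parameterizes it is not stated here.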