Approximation Error Back-Propagation for Q-Function in Scalable Reinforcement Learning with Tree Dependence Structure

Yuzi Yan, Yu Dong, Kai Ma, Yuan Shen
DOI: https://doi.org/10.1109/ICASSP49357.2023.10096433
2023-01-01
Abstract: This paper applies the exponential decay property from scalable RL theory to a specific scenario in which the network structure is a tree, and uses KL (Kullback-Leibler) divergence to analyze how approximation error propagates along this structure over time, in order to quantify the result of tracing that error back. We find that most of the approximation error originates from inaccurate estimation of the states of the source nodes (the root in Top-Down mode and the leaves in Bottom-Up mode), and that this error can be largely recovered by establishing long-hop communication links.
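As a rough illustration of why error at the source node dominates, the sketch below propagates a true and an estimated root-state distribution down a tree in Top-Down mode and reports the KL divergence at every node. It is not the paper's model: it assumes each edge is a fixed row-stochastic transition kernel, and the names `kl_divergence`, `push_forward`, and `top_down_kl`, along with the example tree and numbers, are hypothetical. Under these assumptions the KL divergence can only shrink along each edge (the data processing inequality), which matches the decay the abstract describes.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def push_forward(dist, kernel):
    """Propagate a distribution through kernel[i][j] = P(child = j | parent = i).

    Assumption: every edge of the tree is a fixed row-stochastic matrix.
    """
    n_out = len(kernel[0])
    return [sum(dist[i] * kernel[i][j] for i in range(len(dist))) for j in range(n_out)]


def top_down_kl(tree, kernels, root, p_root, q_root):
    """Return {node: KL(true || estimated)} after propagating from the root.

    p_root is the true root-state distribution, q_root the (inaccurate)
    estimate; both are pushed through the same edge kernels (depth-first).
    """
    result = {}
    stack = [(root, p_root, q_root)]
    while stack:
        node, p, q = stack.pop()
        result[node] = kl_divergence(p, q)
        for child in tree.get(node, []):
            kernel = kernels[(node, child)]
            stack.append((child, push_forward(p, kernel), push_forward(q, kernel)))
    return result


# Hypothetical example: root 0, depth-1 nodes 1 and 2, depth-2 leaves 3 and 4.
TREE = {0: [1, 2], 1: [3], 2: [4]}
K = [[0.8, 0.2], [0.3, 0.7]]  # same illustrative kernel on every edge
KERNELS = {(parent, child): K for parent, children in TREE.items() for child in children}

# True root distribution vs. an inaccurate estimate of it.
kl_by_node = top_down_kl(TREE, KERNELS, root=0, p_root=[0.9, 0.1], q_root=[0.5, 0.5])

for node in sorted(kl_by_node):
    print(f"node {node}: KL = {kl_by_node[node]:.4f}")
```

In this toy setting the divergence falls from about 0.368 nats at the root to about 0.021 nats at the leaves. If the root estimate were replaced by the true distribution (loosely analogous to a long-hop link delivering the source state directly), every node's KL would drop to zero.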