State-Temporal Compression in Reinforcement Learning with the Reward-Restricted Geodesic Metric

Shangqi Guo, Qi Yan, Xin Su, Xiaolin Hu, Feng Chen
DOI: https://doi.org/10.1109/tpami.2021.3069005
IF: 23.6
2022-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: It is difficult for reinforcement learning (RL) algorithms to solve complex tasks that involve large state spaces and long-term decision processes. A common and promising method to address this challenge is to compress a large RL problem into a small one. Towards this goal, the compression should be state-temporal and optimality-preserving (i.e., the optimal policy of the compressed problem should correspond to that of the uncompressed problem). In this paper, we propose a reward-restricted geodesic (RRG) metric, which can be learned by a neural network, to perform state-temporal compression in RL. We prove that compression based on the RRG metric is approximately optimality-preserving for the raw RL problem endowed with temporally abstract actions. With this compression, we design an RRG metric-based reinforcement learning (RRG-RL) algorithm to solve complex tasks. Experiments in both discrete (2D Minecraft) and continuous (Doom) environments demonstrate the superiority of our method over existing RL approaches.
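The abstract does not spell out how the RRG metric is defined or learned (the paper learns it with a neural network). As a rough, hypothetical illustration of the general idea that a reward-restricted distance can drive state compression, the sketch below assumes a small toy grid world and a stand-in distance: the shortest-path length between two cells, where a path may not pass *through* a rewarding cell. States are then greedily grouped around anchors within a hypothetical `radius`. `GRID`, `restricted_distances`, `compress`, and `radius` are all illustrative assumptions, not the authors' RRG-RL algorithm, and the sketch ignores the temporally abstract actions and the optimality guarantee.

```python
import math
from collections import deque

# Hypothetical toy grid (not from the paper): '#' = wall,
# '.' = zero-reward cell, 'R' = rewarding cell.
GRID = [
    "....#...",
    "....#.R.",
    "........",
    "....#...",
]


def neighbors(grid, cell):
    """Yield the 4-connected, in-bounds, non-wall neighbours of `cell`."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def restricted_distances(grid, source):
    """BFS path lengths from `source`. Assumption: a path may start or end on a
    rewarding cell 'R' but may not pass through one as an intermediate step."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        # Stop expanding at any rewarding cell other than the source.
        if cell != source and grid[cell[0]][cell[1]] == "R":
            continue
        for nxt in neighbors(grid, cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return dist


def compress(grid, radius):
    """Greedy cover: each state joins the first existing anchor within `radius`
    under the restricted distance; otherwise it becomes a new anchor.
    Returns a dict mapping each raw state to its anchor (abstract state)."""
    states = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))
              if grid[r][c] != "#"]
    assignment, anchor_dists = {}, {}
    for s in states:
        for anchor, dist in anchor_dists.items():
            if dist.get(s, math.inf) <= radius:
                assignment[s] = anchor
                break
        else:  # no anchor close enough: s starts a new abstract state
            anchor_dists[s] = restricted_distances(grid, s)
            assignment[s] = s
    return assignment


abstract = compress(GRID, radius=2)
print(len(set(abstract.values())), "abstract states for", len(abstract), "raw states")
```

Because this stand-in distance refuses to route through the rewarding cell, the two cells flanking `R` in the second row are 4 steps apart rather than 2, so a shortcut through a reward does not count as closeness. That is only the intuition; the paper's learned metric and its approximate optimality-preservation result are not reproduced here.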
What problem does this paper attempt to address?