Offline Reinforcement Learning with Policy Guidance and Uncertainty Estimation.

Lan Wu, Quan Liu, Lihua Zhang, Zhigang Huang
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447908
2024-01-01
Abstract: Offline reinforcement learning turns static datasets into powerful decision engines. Because the agent cannot interact with the environment online, the learned policy is subject to distribution shift. Previous approaches address this problem by keeping the current policy as close as possible to the behavior policy, but this constraint severely limits the generalization ability of Q-functions. To address these concerns, we propose offline reinforcement learning with policy guidance and uncertainty estimation (PGUE). PGUE introduces a fine-grained adjustment approach that uses a perturbation model to improve the generalization ability of Q-functions. Out-of-distribution generalization of Q-functions is enhanced through implicit guidance of the state space via a deterministic latent policy, while integrating uncertainty estimation into the loss function improves their in-distribution generalization. On the D4RL benchmark, PGUE outperforms the baselines. We further verify the state distribution and the out-of-distribution generalization ability of PGUE.