Offline Reinforcement Learning Based on Next State Supervision.

Jie Yan, Quan Liu, Lihua Zhang
DOI: https://doi.org/10.1109/ICASSP48485.2024.10446781
2024-01-01
Abstract: Offline reinforcement learning aims to make the most of static offline datasets to train agents without interacting with the environment. To cope with distribution shift, most approaches avoid out-of-distribution actions through strong constraints and do not consider generalization or learning in out-of-distribution regions. To address this limitation, we propose a novel method, offline reinforcement learning based on next state supervision (NSS), which consists of two main components: a guidance policy and an adaptive coefficient. The guidance policy outputs the next state with the highest value within a certain range around the current state, and the adaptive coefficient regulates the weight of the penalty term in the learned policy. Empirical studies show that the method improves the performance of the constraint-based baseline method while retaining some generalization ability.
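The two components described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the form of the guidance policy, the penalty term, or the rule that adapts the coefficient. Here the guidance policy is replaced by a brute-force search over candidate next states, the penalty is assumed to be a squared distance between a predicted and a guided next state, and the coefficient `alpha` is passed in as a plain argument. All function names, `radius`, and the toy value function are hypothetical.

```python
import math


def guidance_next_state(state, candidate_states, value_fn, radius):
    """Return the candidate within `radius` of `state` that has the highest value.

    Brute-force stand-in (assumption) for the learned guidance policy.
    Falls back to `state` itself if no candidate lies within the radius.
    """
    nearby = [s for s in candidate_states if math.dist(s, state) <= radius]
    if not nearby:
        return tuple(state)
    return max(nearby, key=value_fn)


def policy_loss(q_value, predicted_next, guided_next, alpha):
    """Negative Q-value plus an alpha-weighted penalty (assumed squared distance)."""
    penalty = sum((p - g) ** 2 for p, g in zip(predicted_next, guided_next))
    return -q_value + alpha * penalty


def toy_value(s):
    """Hypothetical value function: higher when closer to the goal (1, 1)."""
    return -math.dist(s, (1.0, 1.0))


candidates = [(0.1, 0.0), (0.2, 0.2), (0.9, 0.9)]
target = guidance_next_state((0.0, 0.0), candidates, toy_value, radius=0.5)
loss = policy_loss(q_value=1.0, predicted_next=(0.2, 0.1),
                   guided_next=target, alpha=0.5)
```

In this toy setup the candidate `(0.9, 0.9)` has the highest value but lies outside the radius, so the search selects `(0.2, 0.2)`. A larger `alpha` pulls the learned policy more strongly toward the guided next state, and a smaller one lets the Q-value term dominate.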