Weighting Online Decision Transformer with Episodic Memory for Offline-to-Online Reinforcement Learning

Ma Xiao,Li Wu-Jun
DOI: https://doi.org/10.1109/icra57147.2024.10610701
2024-01-01
Abstract:Offline reinforcement learning (RL) has been shown to be successfully modeled as a sequence modeling problem, drawing inspiration from the success of Transformers. Offline RL is often limited by the quality of the offline dataset, so offline-to-online RL is a more realistic setting. Online decision transformer (ODT) is an effective and representative sequence modeling-based offline-to-online RL method. Despite its effectiveness, ODT still suffers from the sample inefficiency problem during the online fine-tuning phase. This sample inefficiency problem arises because the agent treats all state-action pairs in the replay buffer equally when trying to learn from the replay buffer. In this paper, we propose a simple yet effective method, called weighting online decision transformer with episodic memory (WODTEM), to improve sample efficiency. We first attempt to introduce an episodic memory (EM) mechanism into the sequence modeling-based RL methods. By utilizing the EM mechanism, we propose a novel training objective with a weighting function, based on ODT, to improve sample efficiency. Experimental results on multiple tasks show that WODTEM can improve sample efficiency.
What problem does this paper attempt to address?