SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning

Jiaheng Feng, Mingxiao Feng, Haolin Song, Wengang Zhou, Houqiang Li
DOI: https://doi.org/10.1609/aaai.v38i11.29083
2024-03-24
Proceedings of the AAAI Conference on Artificial Intelligence
Abstract: Offline-to-online reinforcement learning (RL) provides a promising solution to improving suboptimal offline pre-trained policies through online fine-tuning. However, one efficient method, unconstrained fine-tuning, often suffers from severe policy collapse due to excessive distribution shift. To ensure stability, existing methods retain offline constraints and employ additional techniques during fine-tuning, which hurts efficiency. In this work, we introduce a novel perspective: eliminating the policy collapse without imposing constraints. We observe that such policy collapse arises from the mismatch between unconstrained fine-tuning and the conventional RL training framework. To this end, we propose Stabilized Unconstrained Fine-tuning (SUF), a streamlined framework that benefits from the efficiency of unconstrained fine-tuning while ensuring stability by modifying the Update-To-Data ratio. With just a few lines of code adjustments, SUF demonstrates remarkable adaptability to diverse backbones and superior performance over state-of-the-art baselines.
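For readers unfamiliar with the term, the Update-To-Data (UTD) ratio is the number of gradient updates performed per transition collected from the environment. The sketch below only illustrates how such a ratio can parameterize a training loop. It is not the authors' implementation, and it does not reproduce the specific UTD settings SUF uses, which the abstract does not state. The names `UpdateSchedule`, `updates_due`, and `count_updates` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UpdateSchedule:
    """Tracks how many gradient updates are owed after each environment step.

    Hypothetical helper for illustration; not taken from the paper.
    """
    utd_ratio: float      # gradient updates per collected transition
    _credit: float = 0.0  # fractional updates carried over between steps

    def updates_due(self) -> int:
        """Call once per environment step; returns the updates to run now."""
        self._credit += self.utd_ratio
        n = int(self._credit)
        self._credit -= n
        return n


def count_updates(env_steps: int, utd_ratio: float) -> int:
    """Total gradient updates performed over `env_steps` environment steps."""
    schedule = UpdateSchedule(utd_ratio)
    total = 0
    for _ in range(env_steps):
        # A real agent would collect one transition here, then run
        # `updates_due()` gradient steps on batches from the replay buffer.
        total += schedule.updates_due()
    return total
```

With this accounting, a ratio of 1 corresponds to one update per environment step, a ratio above 1 trains more often than it collects data, and a ratio below 1 skips updates on some steps. In an actor-critic backbone, separate schedules could in principle be kept for different networks, though whether and how SUF does so is not specified here.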