Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process

Ning Ding, Kai Tian, Xingtai Lv, Bowen Zhou, Ermo Hua, Yue Yu, Kaiyan Zhang, Biqing Qi
Abstract: Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) are two fundamental processes for enhancing the capabilities of Language Models (LMs) after pre-training and aligning them better with human preferences. Although SFT is more efficient to train, RLHF delivers better alignment, so the two are often combined. However, common practice simply applies them sequentially without unifying their optimization targets, resulting in trade-offs between fitting different objectives. This approach ignores the opportunity to bridge the paradigm gap and take the strengths of both. To obtain a unified understanding, we interpret SFT and RLHF in terms of two sub-processes, Preference Estimation and Transition Optimization, defined at the token level within the Markov Decision Process (MDP) framework. This modeling shows that SFT is only a specialized case of RLHF with inferior estimation and optimization. RLHF evaluates the quality of the model’s entire generated answer, whereas SFT only scores predicted tokens based on preceding tokens from target answers. Therefore, SFT overestimates the model’s ability, leading to inferior optimization. Building on this view, we introduce Intuitive Fine-Tuning (IFT) to integrate SFT and RLHF into a single process. IFT captures LMs’ intuitive sense of entire answers through a temporal residual connection, while relying solely on a single policy and the same volume of non-preference-labeled data as SFT. Our experiments show
Computer Science