ARGS: Alignment as Reward-Guided Search

Maxim Khanov, Jirayu Burapacheep, Yixuan Li
2024-01-24
Abstract: Aligning large language models with human objectives is paramount, yet common approaches including RLHF suffer from unstable and resource-intensive training. In response to this challenge, we introduce ARGS, Alignment as Reward-Guided Search, a novel framework that integrates alignment into the decoding process, eliminating the need for expensive RL training. By adjusting the model's probabilistic predictions using a reward signal, ARGS generates texts with semantic diversity while being aligned with human preferences, offering a promising and flexible solution for aligning language models. Notably, ARGS demonstrates consistent enhancements in average reward compared to baselines across diverse alignment tasks and various model dimensions. For example, under the same greedy-based decoding strategy, our method improves the average reward by 19.56% relative to the baseline and secures a preference or tie score of 64.33% in GPT-4 evaluation. We believe that our framework, emphasizing decoding-time alignment, paves the way for more responsive language models in the future. Code is publicly available at: \url{
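The "19.56% relative to the baseline" figure is a relative change in average reward. Assuming the standard definition of relative improvement (the abstract does not state it explicitly), and writing $\bar{r}$ for the mean reward over the evaluation prompts:

```latex
\Delta_{\mathrm{rel}} = \frac{\bar{r}_{\mathrm{ARGS}} - \bar{r}_{\mathrm{baseline}}}{\bar{r}_{\mathrm{baseline}}} \times 100\%,
\qquad
\Delta_{\mathrm{rel}} = 19.56\% \;\Longleftrightarrow\; \bar{r}_{\mathrm{ARGS}} = 1.1956\,\bar{r}_{\mathrm{baseline}}.
```

Under this reading, the percentage is only meaningful when the baseline's average reward is positive.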
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of aligning large language models (LLMs) with human objectives. Specifically:

1. **Background**:
   - Large language models perform well across a wide range of tasks, but because their training data is so diverse, they can produce misleading or harmful outputs.
   - Mainstream methods such as RLHF (Reinforcement Learning from Human Feedback) are effective, but their training is unstable and resource-intensive.
2. **Proposed Method**:
   - The paper proposes ARGS (Alignment as Reward-Guided Search), a framework that adjusts the model's probabilistic predictions directly at decoding time using a reward signal, so that the generated text aligns with human preferences.
   - Because alignment happens during decoding, the method avoids the instability and high cost of RL training and lets the model adapt quickly to new requirements without retraining.
3. **Main Contributions**:
   - ARGS improves the average reward of the generated text while also improving its diversity and coherence.
   - Experiments show that ARGS consistently outperforms baseline methods in average reward across various model architectures and scales.
   - The paper emphasizes alignment at decoding time, offering a new perspective for future AI safety research.

Through these improvements, ARGS offers a flexible and efficient way for language models to respond to evolving demands without extensive retraining.
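The decoding-time mechanism described under **Proposed Method** can be illustrated with a small sketch. It assumes one common way to combine the two signals, which this summary does not confirm: keep the top-k next tokens under the language model, score each candidate as its log-probability plus `w` times a reward computed on the extended sequence, and greedily take the highest score. The toy model `toy_lm`, the reward `toy_reward`, the helper names, and the parameters `w` and `k` are all hypothetical stand-ins for a real LLM and a trained reward model.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces for this sketch (not the paper's API):
# a language model maps a token prefix to next-token log-probabilities,
# and a reward function scores a (partial) token sequence.
LanguageModel = Callable[[Sequence[str]], Dict[str, float]]
RewardFn = Callable[[Sequence[str]], float]


def reward_guided_step(prefix: Sequence[str], lm: LanguageModel, reward: RewardFn,
                       w: float = 1.0, k: int = 3) -> str:
    """Choose the next token by combining LM log-probability with a weighted reward."""
    log_probs = lm(prefix)
    # Keep only the k most likely tokens, so the reward is evaluated k times per step.
    candidates = sorted(log_probs, key=log_probs.get, reverse=True)[:k]
    # Assumed score: log p(v | prefix) + w * reward(prefix + [v]).
    scores = {v: log_probs[v] + w * reward(list(prefix) + [v]) for v in candidates}
    return max(scores, key=scores.get)


def reward_guided_greedy_decode(prompt: Sequence[str], lm: LanguageModel, reward: RewardFn,
                                w: float = 1.0, k: int = 3, max_new_tokens: int = 5,
                                eos: str = "<eos>") -> List[str]:
    """Append the best-scoring token until EOS is chosen or the token budget runs out."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = reward_guided_step(tokens, lm, reward, w, k)
        tokens.append(next_token)
        if next_token == eos:
            break
    return tokens


# Toy stand-ins (hypothetical): a fixed next-token distribution and a
# reward that prefers "polite" tokens over "rude" ones.
def toy_lm(prefix: Sequence[str]) -> Dict[str, float]:
    probs = {"rude": 0.5, "polite": 0.3, "neutral": 0.15, "<eos>": 0.05}
    return {tok: math.log(p) for tok, p in probs.items()}


def toy_reward(tokens: Sequence[str]) -> float:
    return float(tokens.count("polite") - tokens.count("rude"))


print(reward_guided_greedy_decode(["Hi"], toy_lm, toy_reward, w=0.0, max_new_tokens=3))
# ['Hi', 'rude', 'rude', 'rude']
print(reward_guided_greedy_decode(["Hi"], toy_lm, toy_reward, w=1.0, max_new_tokens=3))
# ['Hi', 'polite', 'polite', 'polite']
```

Setting `w=0` recovers ordinary greedy decoding, so the first call picks the likelier "rude" token. With `w=1`, the weighted reward gap between "polite" and "rude" (2 units) outweighs their log-probability gap (about 0.51), so the choice flips. Because only the reward function and its weight change, the alignment target can be swapped without retraining the model, which is the flexibility the summary emphasizes.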