Towards a Unified View of Preference Learning for Large Language Models: A Survey

Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang
2024-10-31
Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors in achieving this success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to understand. The relationships between different methods have been under-explored, limiting the development of preference alignment. In light of this, we break down the existing popular alignment strategies into different components and provide a unified framework to study the current alignment strategies, thereby establishing connections among them. In this survey, we decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and also opens up possibilities to synergize the strengths of different strategies. Furthermore, we present detailed working examples of prevalent existing algorithms to facilitate a comprehensive understanding for the readers. Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.
Computation and Language
What problem does this paper attempt to address?
This paper addresses how large language models (LLMs) can be better aligned with human preferences. Although LLMs have demonstrated impressive capabilities in many fields, they still face challenges in ethics, safety, and reasoning. Many alignment methods have been proposed to address these challenges, but the relationships among them have not been fully explored, which has limited the development of alignment strategies. The authors therefore propose a unified perspective on existing preference-learning methods, decomposing these strategies into four components: Model, Data, Feedback, and Algorithm. This framework helps clarify the connections between existing algorithms and suggests new directions for future research.

The main contributions of the paper are:

1. **Proposing a unified perspective**: Existing preference-learning strategies are decomposed into four components, giving a unified framework for studying current alignment strategies and establishing connections between different methods.
2. **Detailed analysis of common algorithms**: Working examples of popular existing algorithms are provided to help readers understand them.
3. **Exploring future research directions**: Based on the unified perspective, the paper explores the challenges and future research directions for aligning large language models with human preferences.

In summary, the paper aims to help researchers better understand and explore the preference-alignment problem for large language models by providing a systematic framework.