Annotation-Efficient Preference Optimization for Language Model Alignment

Yuu Jinnai, Ukyo Honda
2024-05-22
Abstract: Preference optimization is a standard approach to fine-tuning large language models to align with human preferences. The quality, diversity, and quantity of the preference dataset are critical to the effectiveness of preference optimization. However, obtaining a large amount of high-quality and diverse preference annotations is difficult in many applications. This raises the question of how to use the limited annotation budget to create an effective preference dataset. To this end, we propose Annotation-Efficient Preference Optimization (AEPO). Instead of exhaustively annotating preference over all available response texts, AEPO selects a subset of responses that maximizes quality and diversity from the available responses, and then annotates preference over the selected ones. In this way, AEPO focuses the annotation budget on labeling preference over a smaller subset of responses with diversity and of high quality. We evaluate the performance of Direct Preference Optimization (DPO) using AEPO and show that it outperforms models trained using a standard DPO with the same annotation budget. Our code is available at
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to build an effective preference dataset under a limited annotation budget when aligning large language models (LLMs). Specifically, it studies how to reduce the annotation workload by selecting diverse, high-quality responses while maintaining or improving model performance. Existing methods such as Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF) rely on large amounts of high-quality preference-annotated data, which is costly and time-consuming to collect. The paper therefore proposes Annotation-Efficient Preference Optimization (AEPO), a new preference optimization method that reduces annotation requirements through an efficient subsampling strategy, so that a more effective, more diverse, and higher-quality preference dataset can be constructed under a limited budget.

The main contributions of AEPO are as follows:

1. **Reducing annotation costs**: Compared with the traditional West-of-N (WoN) strategy, AEPO significantly reduces the required number of annotations by selecting a subset of diverse and high-quality responses for annotation instead of annotating all responses.
2. **Improving model performance**: The experimental results show that AEPO outperforms the traditional DPO method on multiple datasets (such as AlpacaFarm and Anthropic's Helpfulness and Harmlessness datasets), especially when the number of responses is large.
3. **Applicable to multiple tasks**: AEPO not only performs well on language model alignment tasks but also generalizes well to other benchmarks (such as ARC, HellaSwag, TruthfulQA, and WinoGrande).

Through these improvements, AEPO provides a feasible method for efficiently training and optimizing large language models in resource-constrained environments.
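The select-then-annotate idea described above can be illustrated with a small sketch. This is not the authors' algorithm: it swaps in a simple greedy heuristic, a token-level Jaccard similarity, and caller-supplied quality estimates (a cheap proxy available before annotation), all of which are assumptions made for illustration. The names `jaccard`, `select_subset`, `build_preference_pair`, `annotate`, and `diversity_weight` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity; a stand-in for any text similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_subset(
    responses: Sequence[str],
    quality: Sequence[float],
    k: int = 2,
    diversity_weight: float = 1.0,
) -> List[int]:
    """Greedily pick k response indices, trading quality against redundancy.

    `quality` holds proxy scores known before annotation (an assumption).
    """
    if len(responses) != len(quality):
        raise ValueError("responses and quality must have the same length")
    if not 1 <= k <= len(responses):
        raise ValueError("k must be between 1 and len(responses)")

    # Start from the response with the highest proxy quality.
    selected = [max(range(len(responses)), key=lambda i: quality[i])]
    while len(selected) < k:
        remaining = [i for i in range(len(responses)) if i not in selected]

        def gain(i: int) -> float:
            # Penalize candidates that resemble something already selected.
            max_sim = max(jaccard(responses[i], responses[j]) for j in selected)
            return quality[i] - diversity_weight * max_sim

        selected.append(max(remaining, key=gain))
    return selected


def build_preference_pair(
    responses: Sequence[str],
    selected: Sequence[int],
    annotate: Callable[[str], float],
) -> Tuple[str, str]:
    """Annotate only the selected responses and return (chosen, rejected)."""
    scored = [(annotate(responses[i]), responses[i]) for i in selected]
    scored.sort(key=lambda pair: pair[0])  # order by annotation score only
    return scored[-1][1], scored[0][1]
```

With N candidate responses and k = 2, `annotate` is called twice per prompt rather than N times, which is where the budget saving comes from. The resulting (chosen, rejected) pair can then be used as one training example for DPO.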