Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models

Hongbang Yuan, Yubo Chen, Pengfei Cao, Zhuoran Jin, Kang Liu, Jun Zhao
2024-06-27
Abstract: Large language models (LLMs) have achieved remarkable success but still tend to generate factually erroneous responses, a phenomenon known as hallucination. A recent trend is to use preference learning to fine-tune models to align with factuality. However, existing work primarily evaluates fine-tuned models on in-domain (ID) datasets, and factuality on out-of-domain (OOD) datasets remains underexplored. In this paper, we conduct a comprehensive evaluation of the factuality of different models tuned by various preference learning algorithms and demonstrate that their performance on OOD datasets either increases minimally or decreases. Subsequently, by analyzing the token distribution shift of the models before and after tuning, we reveal that the main cause of the models' failure to uphold factuality under a distribution shift is under-alignment, rather than over-alignment. Finally, we propose APEFT (Atomic Preference Enhanced Factuality Tuning), a framework that enhances the model's awareness of factuality at the granularity of individual facts. Extensive experiments demonstrate that APEFT improves model performance by an average of 3.45% on both ID and OOD datasets, which is highly effective.
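The abstract bases its under-alignment diagnosis on the token distribution shift between the model before and after tuning, but it does not spell out the metric. As a minimal sketch, assuming a common rank-based scheme: at each position of the tuned model's output, rank the token it emitted under the base (untuned) model's next-token distribution for the same prefix, and label the position unshifted (rank 1), marginal (rank 2 or 3), or shifted (anything lower). The function names and thresholds are illustrative assumptions, not the paper's code, and the distributions are plain Python lists so the example runs without a model.

```python
from collections import Counter
from typing import Sequence


def token_rank(base_probs: Sequence[float], token_id: int) -> int:
    """1-based rank of token_id under the base model's next-token distribution.
    Ties favour token_id: only strictly larger probabilities push it down."""
    p = base_probs[token_id]
    return 1 + sum(1 for q in base_probs if q > p)


def classify_shift(rank: int) -> str:
    # Illustrative thresholds (assumed): 1 = unshifted, 2-3 = marginal, else shifted.
    if rank == 1:
        return "unshifted"
    if rank <= 3:
        return "marginal"
    return "shifted"


def shift_profile(base_dists: Sequence[Sequence[float]],
                  tuned_tokens: Sequence[int]) -> dict[str, float]:
    """Fraction of positions falling into each shift category.

    base_dists[t] is the base model's next-token distribution given the tuned
    model's own prefix tuned_tokens[:t]; tuned_tokens[t] is the token the
    tuned model actually emitted at position t."""
    if not tuned_tokens or len(base_dists) != len(tuned_tokens):
        raise ValueError("need one base distribution per emitted token")
    counts = Counter(classify_shift(token_rank(d, tok))
                     for d, tok in zip(base_dists, tuned_tokens))
    n = len(tuned_tokens)
    return {k: counts[k] / n for k in ("unshifted", "marginal", "shifted")}


if __name__ == "__main__":
    # Toy 5-token vocabulary; each row is a base-model distribution.
    base_dists = [
        [0.5, 0.2, 0.1, 0.1, 0.1],     # tuned token 0 -> rank 1
        [0.1, 0.6, 0.2, 0.05, 0.05],   # tuned token 2 -> rank 2
        [0.4, 0.3, 0.2, 0.06, 0.04],   # tuned token 4 -> rank 5
        [0.25, 0.25, 0.3, 0.1, 0.1],   # tuned token 1 -> rank 2
    ]
    tuned_tokens = [0, 2, 4, 1]
    print(shift_profile(base_dists, tuned_tokens))
    # {'unshifted': 0.25, 'marginal': 0.5, 'shifted': 0.25}
```

Read loosely, a profile dominated by unshifted tokens on OOD prompts would indicate that tuning changed the model's behaviour there very little, which fits an under-alignment reading; how the paper itself separates under-alignment from over-alignment is not stated in the abstract.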
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses hallucination, the tendency of large language models (LLMs) to generate factually erroneous responses. Existing work fine-tunes models with preference learning to improve their factuality, but it mainly evaluates the tuned models on in-domain (ID) datasets drawn from the same distribution as the training data, and factuality on out-of-domain (OOD) datasets has received little attention. The paper shows that, on OOD datasets, the factuality of models tuned with existing methods either improves only slightly or declines. By analyzing the token distribution shift of the models before and after tuning, the authors find that the main cause of this failure under distribution shift is "under-alignment" rather than "over-alignment". Building on this finding, they propose Atomic Preference Enhanced Factuality Tuning (APEFT), a framework that strengthens the model's awareness of factuality at the granularity of individual facts. Experiments show that APEFT improves performance by an average of 3.45% across both ID and OOD datasets.
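APEFT is described here only as tuning that works at the granularity of individual facts, so the sketch below does not reproduce the authors' procedure. It assumes, hypothetically, that each response has already been split into atomic facts labeled supported or unsupported by some external fact checker, forms fact-level preference pairs for the same prompt, and scores a pair with the standard Direct Preference Optimization (DPO) loss, used as a stand-in for the preference learning algorithms the paper evaluates. `AtomicFact`, `atomic_pairs`, and the default `beta=0.1` are illustrative choices; the log-probabilities passed to `dpo_loss` would come from the policy being tuned and a frozen reference model.

```python
import math
from dataclasses import dataclass


@dataclass
class AtomicFact:
    # Hypothetical record: one atomic claim extracted from a response,
    # labeled by an external fact checker assumed to run upstream.
    text: str
    supported: bool


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # preferred (supported) atomic statement
    rejected: str  # dispreferred (unsupported) atomic statement


def atomic_pairs(prompt: str, facts: list[AtomicFact]) -> list[PreferencePair]:
    """Illustrative fact-level pairing: every supported fact is preferred over
    every unsupported fact for the same prompt (not APEFT's published recipe)."""
    supported = [f.text for f in facts if f.supported]
    unsupported = [f.text for f in facts if not f.supported]
    return [PreferencePair(prompt, s, u) for s in supported for u in unsupported]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, -log sigmoid(beta * margin), where the
    arguments are summed log-probabilities of each response under the tuned
    policy and the frozen reference model."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-beta * margin)


if __name__ == "__main__":
    facts = [
        AtomicFact("Marie Curie won two Nobel Prizes.", True),
        AtomicFact("Marie Curie was born in Vienna.", False),
        AtomicFact("Marie Curie was born in Warsaw.", True),
    ]
    for pair in atomic_pairs("Tell me about Marie Curie.", facts):
        print(pair.chosen, ">", pair.rejected)
    print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931 (zero margin)
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss is log 2 (about 0.693); raising the policy's log-probability of the chosen fact relative to the reference lowers it. Pairing every supported fact with every unsupported one is the simplest possible rule; a real pipeline would more plausibly pair facts about the same entity or attribute.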