Aligning Large Language Model with Direct Multi-Preference Optimization for Recommendation

Zhuoxi Bai, Ning Wu, Fengyu Cai, Xinyi Zhu, Yun Xiong
DOI: https://doi.org/10.1145/3627673.3679611
2024-01-01
Abstract: Large Language Models (LLMs) have shown impressive performance in various domains, prompting researchers to explore their potential application in recommendation systems. However, directly applying LLMs to recommendation tasks has proven less effective because of the significant gap between the data used to pre-train LLMs and the specific requirements of recommendation tasks. In this study, we propose Direct Multi-Preference Optimization (DMPO), a streamlined framework that bridges this gap and improves the alignment of LLMs for recommendation tasks. DMPO can be viewed as a pair-wise ranking loss that distinguishes between positive and negative samples in recommendation tasks. Furthermore, DMPO improves the performance of LLM-based recommenders by simultaneously maximizing the probability of positive samples and minimizing the probability of multiple negative samples. Experiments compare DMPO with traditional recommendation methods and other LLM-based recommendation methods. The results show that DMPO significantly enhances the recommendation capabilities of LLMs across three real-world public datasets in few-shot scenarios. The experiments also demonstrate that DMPO exhibits superior generalization ability in cross-domain recommendation. A case study elucidates the reasons behind these consistent improvements and underscores DMPO's potential as an explainable recommendation system. Our code and data are available at https://github.com/BZX667/DMPO.
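The abstract describes DMPO as a pair-wise ranking loss that raises the probability of a positive sample while lowering the probability of several negative samples at once, but it does not state the formula. The sketch below is one plausible reading, assuming a DPO-style objective in which each sample's implicit reward is the scaled log-probability ratio between the policy and a frozen reference model, and assuming the rewards of the negatives are averaged. The function name, argument names, the averaging choice, and the default `beta` are all illustrative assumptions, not taken from the paper or its repository.

```python
import math


def multi_negative_preference_loss(pos_logp, pos_ref_logp,
                                   neg_logps, neg_ref_logps, beta=0.1):
    """Illustrative DPO-style loss with one positive and several negatives.

    All names here are hypothetical. Inputs are sequence log-probabilities
    under the policy (``*_logp``) and a frozen reference model (``*_ref_logp``).
    Averaging the negative rewards is an assumption of this sketch.
    """
    if not neg_logps or len(neg_logps) != len(neg_ref_logps):
        raise ValueError("need matching, non-empty lists of negative log-probs")

    # Implicit reward of the positive sample: scaled log-ratio to the reference.
    pos_reward = beta * (pos_logp - pos_ref_logp)

    # Implicit rewards of the negative samples, averaged (assumed aggregation).
    neg_rewards = [beta * (lp - ref) for lp, ref in zip(neg_logps, neg_ref_logps)]
    margin = pos_reward - sum(neg_rewards) / len(neg_rewards)

    # -log(sigmoid(margin)) computed as softplus(-margin) in a stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

Under these assumptions, the loss falls when the policy assigns more probability to the positive item than the reference does, and rises when it assigns more probability to any of the negatives, which matches the pair-wise ranking behaviour the abstract describes.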