Data-efficient Fine-tuning for LLM-based Recommendation

Xinyu Lin, Wenjie Wang, Yongqi Li, Shuo Yang, Fuli Feng, Yinwei Wei, Tat-Seng Chua

2024-06-04

Abstract: Leveraging Large Language Models (LLMs) for recommendation has recently garnered considerable attention, where fine-tuning plays a key role in LLMs' adaptation. However, the cost of fine-tuning LLMs on rapidly expanding recommendation data limits their practical application. To address this challenge, few-shot fine-tuning offers a promising approach to quickly adapt LLMs to new recommendation data. We propose the task of data pruning for efficient LLM-based recommendation, aimed at identifying representative samples tailored for LLMs' few-shot fine-tuning. While coreset selection is closely related to the proposed task, existing coreset selection methods often rely on suboptimal heuristic metrics or entail costly optimization on large-scale recommendation data. To tackle these issues, we introduce two objectives for the data pruning task in the context of LLM-based recommendation: 1) high accuracy aims to identify the influential samples that can lead to high overall performance; and 2) high efficiency emphasizes the low cost of the data pruning process. To pursue the two objectives, we propose a novel data pruning method based on two scores, i.e., influence score and effort score, to efficiently identify the influential samples. Particularly, the influence score is introduced to accurately estimate the influence of sample removal on the overall performance. To keep the data pruning process cheap, we use a small surrogate model in place of LLMs to obtain the influence score. Considering the potential gap between the surrogate model and LLMs, we further propose an effort score to prioritize some hard samples specifically for LLMs. Empirical results on three real-world datasets validate the effectiveness of our proposed method. In particular, the proposed method uses only 2% of the samples to surpass full-data fine-tuning, reducing time costs by 97%.
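
The following is a minimal sketch of the influence score's underlying idea. It assumes the classic first-order influence-function approximation and a toy surrogate. The surrogate is a hypothetical 1-D ridge regression, not a recommender. Influence is measured on a held-out validation loss, which is an assumption. The names `fit_ridge_1d` and `removal_influence` are illustrative only and are not the paper's formulation. The point is that the score is computed from the cheap surrogate's gradients and Hessian, so no LLM retraining is needed to rank samples.

```python
from typing import List, Sequence


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Closed-form minimiser of (1/n) * sum 0.5*(theta*x - y)^2 + 0.5*lam*theta^2."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)


def removal_influence(
    xs: Sequence[float],
    ys: Sequence[float],
    val_xs: Sequence[float],
    val_ys: Sequence[float],
    lam: float = 0.1,
) -> List[float]:
    """First-order estimate of the validation-loss change when each training sample is removed.

    Hypothetical surrogate: 1-D ridge regression, so the Hessian H is a scalar.
    Standard influence-function approximation:
        delta_theta ~= (1/n) * H^{-1} * grad_loss(z_i)
        delta_val   ~= grad_val * delta_theta
    Negative values mean removing the sample is expected to LOWER validation loss.
    """
    n = len(xs)
    theta = fit_ridge_1d(xs, ys, lam)
    # Hessian of the regularised training objective (constant for squared loss).
    hessian = sum(x * x for x in xs) / n + lam
    # Gradient of the validation loss sum 0.5*(theta*x - y)^2 at the fitted theta.
    grad_val = sum((theta * x - y) * x for x, y in zip(val_xs, val_ys))
    influences = []
    for x, y in zip(xs, ys):
        grad_i = (theta * x - y) * x  # per-sample training-loss gradient
        delta_theta = grad_i / (n * hessian)  # parameter shift from dropping sample i
        influences.append(grad_val * delta_theta)
    return influences
```

Consider `xs = [1, 2, 3, 4]`, `ys = [1, 2, 3, 10]`, and a single validation point `(5, 5)`. The noisy last sample gets a clearly negative estimate (about -7.39), so its removal is expected to help. The three clean samples get positive estimates. With a real multi-parameter surrogate, H^{-1} is usually approximated (for example with Hessian-vector products) rather than inverted directly.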

Information Retrieval

What problem does this paper attempt to address?

The paper addresses the high computational and time costs of fine-tuning large language models (LLMs) on large-scale recommendation data. Specifically:

1. **Efficient Fine-Tuning**: Recommendation data differs from the data LLMs see during pre-training, and it is continuously updated. As a result, LLMs must be fine-tuned frequently. Each round of fine-tuning consumes substantial computational resources and time, which limits the practicality of LLMs in real-world applications.
2. **Sample Selection**: To solve this, the paper proposes the task of "data pruning." The goal is to identify a small set of representative samples from large-scale recommendation data for effective few-shot fine-tuning of LLMs. Fine-tuning on only these samples can significantly reduce time and computational costs.
3. **Limitations of Coreset Selection**: Existing coreset selection methods fall into two groups. Heuristic methods fail to capture a sample's actual effect on empirical risk. Optimization-based methods are difficult to scale to large datasets. Moreover, both select the coreset using a model trained on the entire dataset. This is infeasible for LLM-based recommendation given the high cost of training LLMs.

To address these issues, the paper proposes a new data pruning method called DEALRec. It combines an influence score and an effort score to efficiently identify the most influential samples for fine-tuning LLMs, improving fine-tuning efficiency while maintaining good recommendation performance.
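
The sketch below shows one way the two scores could be combined. It rests on three assumptions:

- The effort score is taken to be the L2 norm of the LLM's loss gradient on a sample, where a larger norm means harder for the LLM.
- The two scores are merged by a weighted sum with a hypothetical weight `lam`.
- Selection simply keeps the top `budget_ratio` fraction of samples. The default of 2% mirrors the budget reported in the abstract.

`effort_score` and `select_samples` are illustrative names, not an API from the paper, and the paper's actual combination and selection rules may differ.

```python
import math
from typing import List, Sequence


def effort_score(grad: Sequence[float]) -> float:
    """Hypothetical effort score: L2 norm of the LLM's loss gradient on one sample."""
    return math.sqrt(sum(g * g for g in grad))


def select_samples(
    removal_impact: Sequence[float],
    llm_grads: Sequence[Sequence[float]],
    budget_ratio: float = 0.02,
    lam: float = 1.0,
) -> List[int]:
    """Keep the top budget_ratio fraction of samples ranked by impact + lam * effort.

    removal_impact[i]: estimated loss increase if sample i were removed, e.g. from a
        small surrogate model (higher = more valuable to keep). Assumed input.
    llm_grads[i]: gradient of the LLM's loss on sample i (assumed precomputed).
    lam: hypothetical trade-off weight between the two scores.
    Returns the selected sample indices in ascending order.
    """
    if len(removal_impact) != len(llm_grads):
        raise ValueError("one gradient vector is needed per sample")
    scores = [imp + lam * effort_score(g) for imp, g in zip(removal_impact, llm_grads)]
    # Always keep at least one sample, even for tiny budgets.
    k = max(1, math.ceil(budget_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Consider impacts `[0.5, 0.1, 0.4, -0.2]`, LLM gradients `[[0.0, 0.1], [3.0, 4.0], [0.0, 0.0], [0.1, 0.0]]`, `budget_ratio=0.5`, and `lam=0.1`. The call returns `[0, 1]`. Sample 1 has a low surrogate impact but a large LLM gradient, so the effort term lifts it above sample 2. Setting `lam=0.0` falls back to the influence-only choice `[0, 2]`.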

P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training

Pruning Foundation Models for High Accuracy without Retraining

Finetuning Large Language Model for Personalized Ranking

TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation

Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

LLM-enhanced Reranking in Recommender Systems

Speculative Coreset Selection for Task-Specific Fine-tuning

Maybe Only 0.5 Training Data Instruction Tuning

Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts

Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models

LLMRec: Benchmarking Large Language Models on Recommendation Task

Reassessing Layer Pruning in LLMs: New Insights and Methods

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data

Beware of Calibration Data for Pruning Large Language Models

RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

LLM-based Federated Recommendation