Abstract: Recommender Systems (RS) play a pivotal role in boosting user satisfaction by providing personalized product suggestions in domains such as e-commerce and entertainment. This study examines the integration of multimodal data, specifically text and audio, into large language models (LLMs) with the aim of enhancing recommendation performance. Traditional text and audio recommenders encounter limitations such as the cold-start problem, and recent advancements in LLMs, while promising, are computationally expensive. To address these issues, Low-Rank Adaptation (LoRA) is introduced, which enhances efficiency without compromising performance. The ATFLRec framework is proposed to integrate audio and text modalities into a multimodal recommendation system, utilizing various LoRA configurations and modality fusion techniques. Results indicate that ATFLRec outperforms baseline models, including traditional and graph neural network-based approaches, achieving higher AUC scores. Furthermore, separate fine-tuning of audio and text data with distinct LoRA modules yields optimal performance, with different pooling methods and Mel filter bank numbers significantly impacting performance. This research offers valuable insights into optimizing multimodal recommender systems and advancing the integration of diverse data modalities in LLMs.

What problem does this paper attempt to address?

This paper aims to improve the performance of recommender systems, focusing on the cold-start problem and on multimodal data fusion. Specifically:

1. **Cold-start problem**: Traditional text and audio recommender systems struggle to provide accurate personalized recommendations for new users or new items because they lack sufficient historical data. Large language models (LLMs) show potential here, but they are computationally costly, and fine-tuning all of their parameters to adapt them to a recommendation task is impractical and expensive.
2. **Multimodal data fusion**: Most current recommender systems rely on a single modality (such as text or audio) and do not make combined use of several modalities (such as text, audio, and images). Multimodal recommender systems can capture richer user and content information, but how to integrate the different modalities effectively remains a challenge.

To address these problems, the paper proposes the ATFLRec framework, which aims to improve recommendation performance in the following ways:

- **Low-rank adaptation (LoRA)**: Improves efficiency by updating only a small set of parameters, without affecting the running time of the recommender system. LoRA makes efficient fine-tuning possible even in settings with little GPU memory.
- **Multimodal fusion**: Integrates audio and text data into large language models, using different LoRA configurations and modality-fusion techniques to enhance recommendation performance.

The main contributions of the paper are:

1. Proposing a multimodal recommender system that integrates audio content into large language models.
2. Exploring the impact of different LoRA modules on large language models and providing empirical insights into multimodal model fine-tuning.
3. Studying the impact of different audio stacking pooling methods, multimodal data-fusion pooling methods, and the number of Mel filter banks on recommendation performance.

With these improvements, ATFLRec significantly outperforms traditional deep-learning recommendation methods in the few-shot setting and achieves higher AUC scores.
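
As a rough illustration of the low-rank adaptation idea summarized above, the following NumPy sketch shows a single linear layer whose pretrained weight stays frozen while only two small matrices are trainable. It is not ATFLRec's implementation: the class name `LoRALinear`, the layer sizes, the rank `r`, and the scaling factor `alpha` are all assumptions chosen for illustration, and the "pretrained" weight is a random stand-in.

```python
import numpy as np


class LoRALinear:
    """Hypothetical linear layer: frozen W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen weight; a random stand-in for a pretrained LLM weight matrix.
        self.W = rng.normal(size=(d_out, d_in))
        # Trainable low-rank factors. B starts at zero, so the adapted layer
        # initially computes exactly the same output as the frozen layer.
        self.A = rng.normal(scale=0.01, size=(r, d_in))
        self.B = np.zeros((d_out, r))
        self.scale = alpha / r

    def forward(self, x):
        # x has shape (batch, d_in); the low-rank path adds x A^T B^T, scaled.
        frozen_out = x @ self.W.T
        lora_out = (x @ self.A.T) @ self.B.T
        return frozen_out + self.scale * lora_out

    def trainable_params(self):
        # Only A and B would receive gradient updates during fine-tuning.
        return self.A.size + self.B.size

    def frozen_params(self):
        return self.W.size


layer = LoRALinear(d_in=64, d_out=64, r=4)
x = np.ones((2, 64))
y = layer.forward(x)
print(y.shape, layer.trainable_params(), layer.frozen_params())
```

In this toy configuration the adapter trains 512 values against 4096 frozen ones, which is the source of the memory savings that make fine-tuning feasible on small GPUs. Using one such adapter per modality would correspond loosely to the separate audio and text LoRA modules the abstract mentions, though the actual module placement in ATFLRec is not specified here.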

ATFLRec: A Multimodal Recommender System with Audio-Text Fusion and Low-Rank Adaptation via Instruction-Tuned Large Language Model

Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation

CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation

Harnessing Large Language Models for Text-Rich Sequential Recommendation

MMREC: LLM Based Multi-Modal Recommender System

Representation Learning with Large Language Models for Recommendation

Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation

Collaborative Cross-modal Fusion with Large Language Model for Recommendation

RLRF4Rec: Reinforcement Learning from Recsys Feedback for Enhanced Recommendation Reranking

LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking

Large Language Model Can Interpret Latent Space of Sequential Recommender

TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation

Collaborative Large Language Model for Recommender Systems

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

LLaRA: Aligning Large Language Models with Sequential Recommenders

CALRec: Contrastive Alignment of Generative LLMs for Sequential Recommendation

ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

LLMRec: Large Language Models with Graph Augmentation for Recommendation

LLaRA: Large Language-Recommendation Assistant

Large Language Models meet Collaborative Filtering: An Efficient All-round LLM-based Recommender System

A Multi-facet Paradigm to Bridge Large Language Model and Recommendation