A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model

Ao Xiang,Bingjie Huang,Xinyu Guo,Haowei Yang,Tianyao Zheng
2024-07-12
Abstract:Recommendation systems have become an important solution to information search problems. This article proposes a neural matrix factorization recommendation system model based on the multimodal large language model called BoNMF. This model combines BoBERTa's powerful capabilities in natural language processing, ViT in computer in vision, and neural matrix decomposition technology. By capturing the potential characteristics of users and items, and after interacting with a low-dimensional matrix composed of user and item IDs, the neural network outputs the results. recommend. Cold start and ablation experimental results show that the BoNMF model exhibits excellent performance on large public data sets and significantly improves the accuracy of recommendations.
Information Retrieval,Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to address the issue of information filtering in recommendation systems, particularly the challenges of accuracy with multimodal data in a big data environment, as well as the problems of cold start, sparsity, and scalability in traditional recommendation systems (such as collaborative filtering). The authors propose a neural matrix factorization recommendation system model based on a large multimodal language model (BoNMF). This model combines the powerful capabilities of BoBERTa in natural language processing, the advantages of ViT in the field of computer vision, and neural matrix factorization technology to improve the accuracy and personalization level of recommendation systems. Specifically, the BoNMF model addresses the aforementioned issues in the following ways: 1. **Utilizing BoBERTa and ViT to extract text and image features**: High-dimensional feature vectors are extracted from relevant textual information of users and items using BoBERTa, while ViT is used to extract relevant dimensional feature vectors from item images. 2. **Fusing multimodal information**: User IDs and item IDs are mapped to a low-dimensional embedding layer and fused with the feature vectors extracted by BoBERTa and ViT to form comprehensive feature representations of users and items. 3. **Addressing the cold start problem**: The prior knowledge of large pre-trained models (such as BoBERTa) is used to alleviate the issue of new users or new items lacking historical interaction data. 4. **Improving recommendation accuracy**: Neural networks are used to capture the interaction relationships between users and items and output predicted scores, thereby improving the relevance and ranking quality of recommendation results. In summary, this research aims to develop a new method that can effectively address the challenges of modern recommendation scenarios by integrating multimodal data and deep learning technologies.