Enhancing E-Commerce Recommendation using Pre-Trained Language Model and Fine-Tuning

Nuofan Xu, Chenhui Hu
DOI: https://doi.org/10.48550/arXiv.2302.04443
2023-02-09
Abstract: Pretrained Language Models (PLMs) have been greatly successful on a broad range of natural language processing (NLP) tasks. However, they have only recently begun to be applied to the domain of recommendation systems. Traditional recommendation algorithms fail to incorporate the rich textual information in e-commerce datasets, which hinders the performance of those models. We present a thorough investigation of the effect of various strategies for incorporating PLMs into traditional recommender algorithms on one e-commerce dataset, and we compare the results with vanilla recommender baseline models. We show that the application of PLMs and domain-specific fine-tuning lead to an increase in the predictive capability of the combined models. These results accentuate the importance of utilizing textual information in the context of e-commerce and provide insight into how to better apply PLMs alongside traditional recommender system algorithms. The code used in this paper is available on GitHub: https://github.com/NuofanXu/bert_retail_recommender
Machine Learning, Computation and Language, Information Retrieval
What problem does this paper attempt to address?
The main problem this paper attempts to solve is that traditional recommendation algorithms in e-commerce recommendation systems fail to make effective use of rich text information. Specifically, when dealing with e-commerce datasets, traditional recommendation algorithms have difficulty integrating text information such as product descriptions and user reviews into the recommendation model, which limits the performance of the recommendation system. By introducing pre-trained language models (such as RoBERTa) and fine-tuning them on the specific domain, the paper aims to enhance the predictive ability of the recommendation system and improve recommendation quality.

### Background and Motivation of the Paper

With the rapid development of Internet technology, the information overload faced by users is becoming increasingly serious, which affects the quality and efficiency of users' decision-making. To meet this challenge, researchers have developed various recommendation systems that generate customized information according to users' preferences. However, in the field of e-commerce, recommendation systems face some unique challenges:

1. **Unclear User Requirements**: Potential customers are often unclear about what they want to buy, and it is difficult for them to express their needs accurately through limited attribute classifications.
2. **Under-utilization of Text Information**: Traditional recommendation algorithms rely mainly on user-item interaction data and cannot effectively use text information such as product descriptions and user reviews, which limits recommendation quality.
3. **Lack of Negative Feedback**: In the online retail environment, the fact that a user does not purchase a product does not necessarily mean that they dislike it, which makes it difficult for the recommendation system to capture users' actual preferences accurately.
### Solutions

The paper proposes a method that combines pre-trained language models (PLMs) with traditional recommendation algorithms. The specific steps are as follows:

1. **Selection and Fine-tuning of Pre-trained Language Models**:
   - Select a high-performance pre-trained language model (such as RoBERTa).
   - Fine-tune the model on the specific domain so that it better understands the text information in e-commerce.
2. **Improvement of Recommendation Algorithms**:
   - **Matrix Factorization**: Introduce sentence embeddings to enhance the model's use of text information.
   - **XGBoost**: Use Random Forest for feature selection to reduce the influence of noise and irrelevant features and improve the recommendation quality of the XGBoost model.
3. **Experimental Design**:
   - Use the Online Retail dataset as the main research object.
   - Conduct experiments with multiple model variants (such as explicit recommendation, implicit recommendation, and low-rank models) to evaluate the effects of the different methods.

### Main Contributions

1. **Comprehensive Research**: Explore how to use rich text information in the e-commerce field to improve the performance of recommendation algorithms.
2. **Detailed Evaluation**: Conduct detailed performance evaluations and visual analysis of traditional recommendation algorithms and BERT-based recommendation models.
3. **Domain-Adaptive Pre-training**: Study the influence of different pre-training strategies, such as Task-Adaptive Pre-training (TAPT) and Domain-Adaptive Pre-training (DAPT), on recommendation quality.

### Experimental Results

The experimental results show that combining pre-trained language models (such as RoBERTa) with traditional recommendation algorithms can significantly improve the performance of the recommendation system. In both the matrix factorization and XGBoost models, the introduction of sentence embeddings consistently improves the model's predictive ability.
In addition, the combined use of domain-adaptive pre-training and task-adaptive pre-training further improves recommendation accuracy.

### Conclusion

By introducing pre-trained language models and fine-tuning them on the specific domain, the paper addresses the problem that traditional algorithms in e-commerce recommendation systems fail to use text information effectively, and it offers new ideas and methods for improving the performance of recommendation systems.
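
The matrix factorization variant described under Solutions, in which sentence embeddings are added to a factorization model, can be illustrated with a minimal sketch. This is not the authors' implementation (the summary does not say how the embeddings enter the model). It assumes one common formulation: each item vector is a free latent vector plus a learned linear projection of a fixed text embedding. The names `train_hybrid_mf`, `item_vector`, `predict`, `ITEM_EMB`, and `RATINGS` are hypothetical, and the two-dimensional vectors in `ITEM_EMB` are hand-made stand-ins for RoBERTa sentence embeddings of product descriptions.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def item_vector(Q, W, emb, i):
    """Item representation: free latent factors plus a projected text embedding."""
    return [Q[i][f] + dot(W[f], emb[i]) for f in range(len(W))]


def train_hybrid_mf(ratings, emb, n_users, n_items,
                    k=4, lr=0.05, reg=0.01, epochs=200, seed=0):
    """SGD matrix factorization with text embeddings as item side information.

    ratings: list of (user, item, score) triples.
    emb:     one fixed text-embedding vector per item (assumed precomputed).
    Returns user factors P, item factors Q, projection W (k x d),
    and the per-epoch mean squared training error.
    """
    rng = random.Random(seed)
    d = len(emb[0])
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    W = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(k)]
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for u, i, r in ratings:
            vec = item_vector(Q, W, emb, i)
            err = r - dot(P[u], vec)
            sq_err += err * err
            for f in range(k):
                p_uf = P[u][f]  # keep the pre-update value for the other gradients
                P[u][f] += lr * (err * vec[f] - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * Q[i][f])
                for j in range(d):
                    W[f][j] += lr * (err * p_uf * emb[i][j] - reg * W[f][j])
        history.append(sq_err / len(ratings))
    return P, Q, W, history


def predict(P, Q, W, emb, u, i):
    """Predicted score of item i for user u."""
    return dot(P[u], item_vector(Q, W, emb, i))


# Toy data (illustrative only): items 0-1 share one description style,
# items 2-3 another, which the stand-in embeddings encode.
ITEM_EMB = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
RATINGS = [(0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0), (1, 2, 1.0),
           (1, 3, 1.0), (1, 0, 0.0), (2, 0, 1.0), (2, 3, 0.0)]

P, Q, W, history = train_hybrid_mf(RATINGS, ITEM_EMB, n_users=3, n_items=4)
print(f"first-epoch MSE {history[0]:.3f}, last-epoch MSE {history[-1]:.3f}")
print(f"user 0, unseen item 3: {predict(P, Q, W, ITEM_EMB, 0, 3):.3f}")
```

Under this assumed formulation, the shared projection `W` lets items with similar descriptions receive similar vectors even when they have few interactions, which is one plausible way text could help a sparse retail dataset. In practice the stand-in vectors would be replaced by embeddings from the fine-tuned language model.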