Toward Improving the Prediction Accuracy of Product Recommendation System Using Extreme Gradient Boosting and Encoding Approaches

Zeinab Shahbazi, Debapriya Hazra, Sejoon Park, Yung Cheol Byun
DOI: https://doi.org/10.3390/sym12091566
2020-09-22
Symmetry
Abstract: With the spread of COVID-19, the “untact” culture in South Korea is expanding and customers are increasingly seeking online services. A recommendation system serves as a decision-making indicator that helps users by suggesting items to be purchased in the future by exploring the symmetry between multiple user activity characteristics. A plethora of approaches are employed by the scientific community to design recommendation systems, including collaborative filtering, stereotyping, and content-based filtering. The current paradigm of recommendation systems favors collaborative filtering due to its significant potential to closely capture the interest of a user as compared to other approaches. Collaborative filtering harnesses features like user-profile details, visited pages, and click information to determine the interest of a user, thereby recommending the items that are related to the user’s interest. The existing collaborative filtering approaches exploit implicit and explicit features and report good outcomes for either classification or prediction. These systems fail to exhibit good results for both measures at the same time. We believe that avoiding the recommendation of items that have already been purchased could contribute to overcoming this issue. In this study, we present a collaborative filtering-based algorithm to tackle big data of users with symmetric purchasing orders and repeatedly purchased products. The proposed algorithm relies on combining the extreme gradient boosting machine learning architecture with the word2vec mechanism to explore the purchased products based on the click patterns of users. Our algorithm improves the accuracy of predicting the relevant products that customers are likely to buy and that should therefore be recommended to them. The results are evaluated on a dataset that contains click-based features of users from an online shopping mall on Jeju Island, South Korea.
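The abstract's key idea of not recommending items a user has already bought can be illustrated as a post-processing step on model scores. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function `recommend_unpurchased`, the score dictionary (standing in for whatever a trained model such as XGBoost would output), the product IDs, and the top-`k` cutoff are all hypothetical.

```python
from typing import Dict, Iterable, List


def recommend_unpurchased(
    scores: Dict[str, float],
    purchased: Iterable[str],
    k: int = 3,
) -> List[str]:
    """Return the k highest-scoring product IDs the user has not bought yet.

    Hypothetical helper: `scores` stands in for a trained model's predicted
    purchase scores, keyed by product ID; `purchased` is the user's history.
    """
    already_bought = set(purchased)
    # Drop every product that appears in the purchase history.
    candidates = [pid for pid in scores if pid not in already_bought]
    # Highest score first; ties broken by product ID for a stable result.
    candidates.sort(key=lambda pid: (-scores[pid], pid))
    return candidates[:k]


if __name__ == "__main__":
    predicted = {"P1": 0.92, "P2": 0.85, "P3": 0.40, "P4": 0.77, "P5": 0.10}
    history = ["P1", "P3"]
    print(recommend_unpurchased(predicted, history, k=2))  # ['P2', 'P4']
```

Filtering after scoring leaves the model itself unchanged; dropping purchased items before scoring would be an alternative that saves computation on large catalogs.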
We evaluated Mean Absolute Error, Mean Square Error, and Root Mean Square Error for our proposed methodology and for other machine learning algorithms. Our proposed model produced the lowest error rates and improved the prediction accuracy of the recommendation system compared to other traditional approaches.
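The three error measures named above have standard definitions: MAE averages absolute errors, MSE averages squared errors, and RMSE is the square root of MSE. The following dependency-free Python sketch computes them; the function name `error_metrics` and the sample values are illustrative assumptions, not data from the paper.

```python
import math
from typing import Sequence, Tuple


def error_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (MAE, MSE, RMSE) for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    mse = sum(e * e for e in errors) / n  # mean squared error
    rmse = math.sqrt(mse)  # root mean squared error
    return mae, mse, rmse


if __name__ == "__main__":
    actual = [3.0, 1.0, 4.0, 2.0]
    predicted = [3.0, 1.0, 4.0, 6.0]
    print(error_metrics(actual, predicted))  # (1.0, 4.0, 2.0)
```

Because the single error of 4 dominates the squared terms, RMSE (2.0) is twice MAE (1.0) on this toy input, which is one reason the metrics are usually reported together.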
### What problems does this paper attempt to solve?

This paper addresses the limited prediction and classification accuracy of recommendation systems. Although existing collaborative filtering (CF) methods capture users' interests well, they do not achieve good prediction and classification accuracy at the same time. The authors also argue that recommending products users have already purchased lowers the usefulness of recommendations and degrades the user experience. To address these problems, they propose a collaborative filtering algorithm based on Extreme Gradient Boosting (XGBoost) and encoding techniques. Its main objectives are:

1. **Avoid recommending purchased products**: Exclude products that users have already purchased to improve the relevance and accuracy of recommendations.
2. **Combine XGBoost and Word2vec**: Use the XGBoost machine learning architecture and the Word2vec mechanism to explore purchased products based on users' click patterns, thereby improving prediction accuracy.
3. **Handle large datasets**: Target large datasets in which users have symmetric purchase orders and repeatedly purchase the same products, to improve the recommendation system's performance.

### Method overview

- **Data collection and preprocessing**: Collect users' click histories and purchase records from an online shopping mall on Jeju Island, South Korea, then clean and preprocess the data.
- **Feature extraction**: Extract features such as user IP, click information, visit date, time, page, product name, type, and ID.
- **Model construction**:
  - Encode the data with Word2vec to produce a vector-space representation.
  - Apply XGBoost for prediction and classification to estimate which products users are likely to purchase.
- **Evaluation metrics**: Evaluate model performance with Mean Absolute Error (MAE), Mean Square Error (MSE), and Root Mean Square Error (RMSE).

### Main contributions

- Proposed a collaborative filtering algorithm combining XGBoost and Word2vec that improves the prediction and classification accuracy of the recommendation system.
- Avoided recommending products that users had already purchased, improving the relevance of recommendations.
- Validated the model on a real online shopping dataset; the results show a lower error rate and higher prediction accuracy than traditional methods.

Through these improvements, the research aims to provide a more effective recommendation system that helps users discover products of interest while improving merchants' sales efficiency.
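The data-collection and Word2vec steps in the method overview require click logs to be turned into ordered sequences of product tokens. The Python sketch below shows one way to do that; grouping by user IP, the `Click` record fields, and the sample log are hypothetical assumptions rather than details from the paper, and the embedding and XGBoost stages are left out to keep the example dependency-free.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, NamedTuple


class Click(NamedTuple):
    """Hypothetical click-log row; field names are assumptions."""

    user_ip: str
    timestamp: str  # ISO 8601, e.g. "2020-05-01T10:15:00"
    product_id: str


def click_sequences(clicks: List[Click]) -> Dict[str, List[str]]:
    """Group clicks by user IP and order each user's product IDs by time.

    Each list is a "sentence" of product tokens, the input shape that
    word2vec-style embedding models train on.
    """
    per_user: Dict[str, List[Click]] = defaultdict(list)
    for click in clicks:
        per_user[click.user_ip].append(click)
    sequences: Dict[str, List[str]] = {}
    for user, rows in per_user.items():
        # Chronological order preserves each user's click pattern.
        rows.sort(key=lambda c: datetime.fromisoformat(c.timestamp))
        sequences[user] = [c.product_id for c in rows]
    return sequences


if __name__ == "__main__":
    log = [
        Click("10.0.0.1", "2020-05-01T10:20:00", "P7"),
        Click("10.0.0.2", "2020-05-01T09:00:00", "P2"),
        Click("10.0.0.1", "2020-05-01T10:05:00", "P3"),
        Click("10.0.0.2", "2020-05-01T09:30:00", "P9"),
    ]
    for user, seq in click_sequences(log).items():
        print(user, seq)
    # 10.0.0.1 ['P3', 'P7']
    # 10.0.0.2 ['P2', 'P9']
```

Each resulting list could be passed, as one of the `sentences`, to a word2vec implementation such as gensim's `Word2Vec` to learn product vectors, which could then serve as input features for an XGBoost model.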