Real-time Incremental Recommendation for Streaming Data Based on Apache Flink.

Zhuo Tang,Zeyu Liu,Kenli Li,Keqin Li
DOI: https://doi.org/10.3233/ida-184330
IF: 1.7
2019-01-01
Intelligent Data Analysis
Abstract:Collaborative filtering (CF), one of the most famous methods for building recommendation systems, recommends relevant items to users or predicting ratings of users' unknown items. Matrix factorization (MF) models are well-known model to deal with predicting the rating problem. However, the recommendation system based on matrix factorization is hard to keep up with the rapidly changing real-world data. When ratings on new users or new items come, the static model can not fit well on new data. As a consequence, if the current thing does not apply, the prediction accuracy will lose. In addition, it is a significant computation cost to rebuild the model on the whole data. To capture these changes, in this paper, we construct an online-and-offline Collaborative Filtering with a multi-method model to improve the traditional CF method, called Online SGD with Offline Knowledge (OSGDO for short). Besides, we propose a real-time incremental recommendation framework on Apache Flink, which is a scalable stream and batch data processing platform. Meanwhile, we implement our proposed method on our proposed framework. Our method proves to be good at online training when new observations arrive. And the results of experiments show that the dynamic training process we proposed is more efficient than rebuild the model on all the data. At the same time, our algorithm performs well in practice and can achieve impressive accuracy quickly when it is tested with the well-known data sets of MoviesLens and Netflix.
What problem does this paper attempt to address?