Pipeline Item-Based Collaborative Filtering Based on MapReduce

Zhi-Lin Zhao, Chang-Dong Wang, Yuan-Yu Wan, Zi-Wei Huang, Jian-Huang Lai
DOI: https://doi.org/10.1109/bdcloud.2015.15
2015-01-01
Abstract: We live in an era of information explosion, in which people are constantly exposed to huge amounts of information, so there is an urgent need to pick out useful and interesting information quickly. Recommendation systems arose to solve this problem. Among the existing recommendation algorithms, item-based collaborative filtering is the most widely used. It is based on users' evaluations of items: its purpose is to find the similarity between users and to recommend items to the target user according to the records of similar users. However, the numbers of customers and products keep growing rapidly, which increases the cost of computing the recommendation list for each user. A single ordinary computer cannot meet the efficiency requirement, and a supercomputer costs too much. To solve this problem, we propose implementing the recommendation system with MapReduce. In addition, we distribute the job across several computer clusters, where the input of each cluster depends only on the output of the previous cluster or on the original input, so pipelining is adopted to improve efficiency further. Experiments show that the method can combine the capacity of several ordinary PCs to process large-scale data in a short time.
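The chained-job idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it simulates MapReduce in memory rather than on a cluster, it runs the jobs one after another rather than overlapping them, and every name in it (run_mapreduce, map_by_user, reduce_cosine, recommend, the toy ratings) is hypothetical. It follows the common item-based formulation, in which similarity is computed between items, and it assumes cosine similarity restricted to co-rating users, which the abstract does not specify. What it does show is the dependency structure: each job reads only the output of the previous job or the original input.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: collect each user's ratings into one record.
def map_by_user(record):
    user, item, rating = record
    yield user, (item, rating)


def reduce_user_vector(user, item_ratings):
    # Sorting by item keeps the (i, j) pair order consistent across users.
    yield user, sorted(item_ratings)


# Job 2: emit every co-rated item pair, then reduce to a similarity.
def map_item_pairs(record):
    _user, item_ratings = record
    for (i, ri), (j, rj) in combinations(item_ratings, 2):
        yield (i, j), (ri * rj, ri * ri, rj * rj)


def reduce_cosine(pair, terms):
    # Assumed variant: cosine over the users who rated both items.
    dot = sum(t[0] for t in terms)
    norm_i = sqrt(sum(t[1] for t in terms))
    norm_j = sqrt(sum(t[2] for t in terms))
    yield pair, dot / (norm_i * norm_j)


def recommend(user_vectors, similarities, user, top_n=3):
    """Score unrated items by a similarity-weighted average of the user's ratings."""
    sim = {}
    for (i, j), s in similarities:
        sim[(i, j)] = sim[(j, i)] = s
    rated = dict(dict(user_vectors)[user])
    all_items = {item for pair in sim for item in pair}
    scores = []
    for candidate in sorted(all_items - set(rated)):
        pairs = [(sim[(candidate, j)], r) for j, r in rated.items()
                 if (candidate, j) in sim]
        weight = sum(s for s, _ in pairs)
        if weight > 0:
            scores.append((candidate, sum(s * r for s, r in pairs) / weight))
    return sorted(scores, key=lambda x: -x[1])[:top_n]


# Toy (user, item, rating) input, invented for illustration only.
ratings = [
    ("u1", "A", 5), ("u1", "B", 3),
    ("u2", "A", 4), ("u2", "B", 2), ("u2", "C", 5),
    ("u3", "B", 4), ("u3", "C", 4),
]

# The second job consumes only the first job's output.
user_vectors = run_mapreduce(ratings, map_by_user, reduce_user_vector)
similarities = run_mapreduce(user_vectors, map_item_pairs, reduce_cosine)
print(recommend(user_vectors, similarities, "u1"))
```

Because the second job depends only on the records produced by the first, a real deployment could start feeding it as soon as earlier output becomes available, which is the property the abstract relies on when it adopts pipelining.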