CluCF: a Clustering CF Algorithm to Address Data Sparsity Problem
Chengyuan Yu, Linpeng Huang
DOI: https://doi.org/10.1007/s11761-016-0191-8
2016-01-01
Service Oriented Computing and Applications
Abstract: In QoS-based Web service recommendation, predicting Quality of Service (QoS) values for users greatly aids service selection and discovery. Collaborative filtering (CF) is an effective method for Web service selection and recommendation, but data sparsity is an important challenge for CF algorithms. Although model-based algorithms can address the data sparsity problem, their models are often time-consuming to build and update, so they are ill-suited to highly dynamic, large-scale environments such as Web service recommendation systems. To overcome this drawback, this paper proposes a novel approach, CluCF, which employs user clusters and service clusters to address the data sparsity problem and classifies each new user (or new service) by a location factor to lower the time complexity of updating clusters. Additionally, to improve prediction accuracy, CluCF employs a time factor: a time-aware user-service matrix M_{u,s}(t_k, d) is introduced, together with time-aware similarity measurement and time-aware QoS prediction. Because the QoS performance of Web services is highly related to invocation time owing to time-varying factors (e.g., service status, network condition), time-aware similarity measurement and time-aware QoS prediction are more trustworthy than their traditional counterparts. Since similarity measurement and QoS prediction are the two key steps of neighborhood-based CF, time-aware CF will be more accurate than traditional CF. Moreover, our approach systematically combines user-based and item-based methods and employs influence weights to balance the two predicted values automatically. To validate our algorithm, this paper conducts a series of large-scale experiments on a real-world Web service QoS dataset. Experimental results show that our approach is capable of alleviating the data sparsity problem.
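The two ideas named in the abstract, weighting QoS observations by invocation time and balancing the user-based and item-based predictions, can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the exponential decay form, the `decay` parameter, the similarity-based confidence used as an influence weight, and all function names are hypothetical and are not taken from CluCF itself.

```python
import math


def time_weight(t_obs, t_now, decay=0.1):
    """Assumed exponential decay: older invocations count less."""
    return math.exp(-decay * (t_now - t_obs))


def time_aware_mean(observations, t_now, decay=0.1):
    """Time-weighted mean of (timestamp, qos_value) observations.

    Assumes a non-empty list whose timestamps are not later than t_now.
    """
    weights = [time_weight(t, t_now, decay) for t, _ in observations]
    total = sum(weights)
    return sum(w * q for w, (_, q) in zip(weights, observations)) / total


def confidence(sims):
    """Similarity-weighted average similarity of a neighbor set.

    Assumes non-negative similarities; an empty set has zero confidence.
    """
    s = sum(sims)
    return sum(x * x for x in sims) / s if s > 0 else 0.0


def blend(user_pred, item_pred, user_sims, item_sims):
    """Blend the two predictions with weights proportional to confidence.

    This stands in for the paper's influence weights, whose actual
    definition is not given in the abstract.
    """
    cu, ci = confidence(user_sims), confidence(item_sims)
    if cu + ci == 0:
        # No usable neighbors on either side: fall back to a plain average.
        return (user_pred + item_pred) / 2
    wu = cu / (cu + ci)
    return wu * user_pred + (1 - wu) * item_pred


if __name__ == "__main__":
    # Response times (seconds) observed at time slots 0 and 10.
    history = [(0, 1.0), (10, 3.0)]
    print(time_aware_mean(history, t_now=10))
    print(blend(1.0, 3.0, user_sims=[0.9, 0.8], item_sims=[0.3]))
```

In this sketch the more recent observation dominates the time-aware mean, and the side with the more similar neighbors receives the larger weight in the blend, so no weight has to be tuned by hand.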