SAKU: A distributed system for data analysis in large-scale dataset based on cloud computing

Lei Qin, Bin Wu, Qing Ke, Yuxiao Dong
DOI: https://doi.org/10.1109/FSKD.2011.6019711
2011-01-01
Abstract: Data analysis is widely used in enterprises for its efficiency and accuracy, especially in the telecommunications industry, in tasks such as user behavior analysis and customer churn prediction. However, with the exponential growth of data, traditional data analysis tools cannot handle such large-scale datasets. Furthermore, as business requirements grow more complicated, there is increasing demand for integrating different data analysis tools. In addition, traditional analysis tools lack visualization, which makes their results hard to understand. We propose a distributed system named SAKU that addresses these problems. In this paper, we implement several algorithms on the MapReduce framework in order to process large-scale data, and we discuss each component of the system. We also present a new report framework based on cloud computing for visualizing large-scale data. Most importantly, we apply the system to a scenario that meets real-world requirements, using a large volume of data obtained from telecom operators, which demonstrates the high efficiency and scalability of the system.