Performance Analysis of Clustering Algorithm under Two Kinds of Big Data Architecture.

Beibei Li, Bo Liu, Weiwei Lin, Ying Zhang
DOI: https://doi.org/10.3233/jhs-170556
2017-01-01
Journal of High Speed Networks
Abstract: To compare the performance of a clustering algorithm on two data processing architectures, this paper first presents implementations of the k-means clustering algorithm on two big data architectures, Hadoop MapReduce and Spark. It then analyzes the differences in the theoretical performance of k-means on the two architectures from a mathematical point of view. The theoretical analysis shows that Spark is superior to Hadoop in average execution time and I/O time. Finally, a text data set of user behavior from a social networking site is used to run the algorithm experiments. The results show that k-means takes significantly less execution time and I/O time on Spark than on MapReduce. The theoretical analysis and the implementation techniques presented in this paper offer a useful reference for applying big data technology.
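As an illustration of the kind of computation being compared, the sketch below expresses one k-means iteration as a map step (assign each point to its nearest centroid) and a reduce step (average each cluster into a new centroid). It is a minimal single-machine sketch in plain Python, not the paper's implementation: the function names `nearest`, `map_phase`, `reduce_phase`, and `kmeans`, the use of squared Euclidean distance, and the stopping rule are all assumptions made for illustration.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map step: emit one (cluster index, point) pair per input point."""
    return [(nearest(p, centroids), p) for p in points]


def reduce_phase(pairs, centroids):
    """Reduce step: average the points of each cluster into a new centroid."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Repeat map + reduce until the centroids stop changing (hypothetical stopping rule)."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Each pass through the loop corresponds to one map/reduce round over the full data set. As general background rather than a claim taken from the paper: iterative jobs on Hadoop MapReduce typically read input from and write output to disk on every round, whereas Spark can keep the working data set in memory between rounds, which is one plausible source of the I/O time gap the abstract reports.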