Clustering Large Scale Data Set Based on Distributed Local Affinity Propagation on Spark

Wei Lu, Peng Cao
DOI: https://doi.org/10.14257/ijdta.2016.9.10.20
2016-01-01
International Journal of Database Theory and Application
Abstract: Affinity Propagation (AP) is a new clustering method that clusters data sets efficiently. This paper proposes a Distributed Local Affinity Propagation method (DLAP) to address the hardware bottleneck and long running time of AP. DLAP refines AP by reducing the amount of data processed in each iteration while preserving high clustering quality. The method is implemented on the Apache Spark distributed computation framework, and Spark's efficient handling of iterative computation gives it a marked advantage in time cost. Experiments on two-dimensional data show that LAP on a single machine takes less time than two other methods, FSAP and FAP, and that DLAP on Spark performs better than DLAP on Hadoop.