DBSCAN Optimization Algorithm Based on KD-tree Partitioning in Cloud Computing

景维鹏,程逸群,陈广胜
DOI: https://doi.org/10.3969/j.issn.1000-3428.2017.04.004
2017-01-01
Abstract: The parallel RDD-DBSCAN algorithm accesses the data set repeatedly during its data partitioning and region query steps, which reduces its efficiency. To address this problem, a parallel DBSCAN algorithm based on a data partitioning and fusion strategy (DBSCAN-PSM) is proposed. It introduces a KD-tree to partition the data and merges the partitioning and region query steps, which reduces the number of accesses to the data set and lessens the impact of I/O on the algorithm. Data fusion is performed by determining the clustering characteristics of the spatial boundary points, which avoids the time overhead of global labeling. Experimental results show that DBSCAN-PSM runs 18% faster than RDD-DBSCAN and handles massive-data clustering more effectively.
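
The KD-tree partitioning idea described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' DBSCAN-PSM implementation, which runs in parallel on a cloud platform. The function names `kd_partition` and `boundary_points`, the `max_size` threshold, and the choice of the midpoint between the two middle points as the split position are all assumptions made for illustration. The sketch splits the points recursively along alternating axes and then flags the points that lie within `eps` of a split plane, which are the spatial boundary points a fusion step would need to examine.

```python
import math

def kd_partition(points, max_size, bounds=None, depth=0):
    """Split points KD-tree style; return a list of (points, bounds) leaves.

    Hypothetical helper: assumes a non-empty list of equal-length tuples
    and max_size >= 1. bounds holds one (low, high) pair per axis.
    """
    dims = len(points[0])
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * dims
    if len(points) <= max_size:
        return [(points, bounds)]
    axis = depth % dims  # cycle through the axes, as in a KD-tree
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    # Assumed split position: halfway between the two middle points.
    split = (ordered[mid - 1][axis] + ordered[mid][axis]) / 2
    left_bounds = list(bounds)
    left_bounds[axis] = (bounds[axis][0], split)
    right_bounds = list(bounds)
    right_bounds[axis] = (split, bounds[axis][1])
    return (kd_partition(ordered[:mid], max_size, left_bounds, depth + 1)
            + kd_partition(ordered[mid:], max_size, right_bounds, depth + 1))

def boundary_points(points, bounds, eps):
    """Return the points lying within eps of any face of their partition."""
    return [
        p for p in points
        if any(p[a] - lo <= eps or hi - p[a] <= eps
               for a, (lo, hi) in enumerate(bounds))
    ]

data = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
parts = kd_partition(data, max_size=3)
for pts, box in parts:
    print(pts, boundary_points(pts, box, eps=4.5))
```

Only the flagged boundary points can have neighbors in an adjacent partition, so a fusion step restricted to them does not need to revisit every point globally.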