PS-DBSCAN: An Efficient Parallel DBSCAN Algorithm Based on Platform Of AI (PAI)

Xu Hu, Jun Huang, Minghui Qiu, Cen Chen, Wei Chu
DOI: https://doi.org/10.48550/arXiv.1711.01034
2017-11-03
Abstract: We present PS-DBSCAN, a communication-efficient parallel DBSCAN algorithm that combines the disjoint-set data structure with the Parameter Server framework in Platform of AI (PAI). Data points within the same cluster may be distributed over different workers, which results in several disjoint sets, and merging them incurs large communication costs. In our algorithm, we employ a fast global union approach to union the disjoint sets and alleviate this communication burden. Experiments over datasets of different scales demonstrate that PS-DBSCAN outperforms PDSDBSCAN with a 2-10 times speedup in communication efficiency. We have released PS-DBSCAN in an algorithm platform called Platform of AI (PAI, https://pai.base.shuju.aliyun.com/) in Alibaba Cloud, and we have also demonstrated how to use the method in PAI.
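The abstract names two ingredients: a disjoint-set (union-find) structure, and a global union step that merges the disjoint sets produced by different workers. The single-process Python sketch below shows how these pieces can fit together. It assumes a PDSDBSCAN-style union rule: a core point is unioned with its core neighbors, and each border point is attached to exactly one core neighbor by the worker that owns it. It is not the authors' PS-DBSCAN. Workers are simulated as round-robin index partitions, so one cluster is deliberately spread over several workers and each worker ends up with only partial sets. Neighborhoods are computed by brute force over the whole dataset. The merge simply replays every worker's (point, root) pairs into a fresh structure instead of using PAI's Parameter Server. All function names are hypothetical.

```python
import math


class DisjointSet:
    """Union-find with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


# Hypothetical helper names; not the PAI / PS-DBSCAN API.
def region_query(points, i, eps):
    """Brute-force eps-neighborhood of point i (includes i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def local_union(n, owned, core, neighbors):
    """Unions performed by one simulated worker on the points it owns.

    Only core-core edges plus one edge per owned border point are added,
    so a border point can never bridge two clusters during the merge.
    """
    ds = DisjointSet(n)
    for p in owned:
        if core[p]:
            for q in neighbors[p]:
                if core[q]:
                    ds.union(p, q)
        else:
            core_nbrs = [q for q in neighbors[p] if core[q]]
            if core_nbrs:
                ds.union(p, core_nbrs[0])
    return ds


def global_union(n, local_sets):
    """Merge per-worker forests: union every point with its local root."""
    merged = DisjointSet(n)
    for ds in local_sets:
        for i in range(n):
            merged.union(i, ds.find(i))
    return merged


def dbscan_disjoint_set(points, eps, min_pts, n_workers=2):
    n = len(points)
    neighbors = [region_query(points, i, eps) for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    # Round-robin partition: members of one cluster land on different workers.
    partitions = [range(w, n, n_workers) for w in range(n_workers)]
    local_sets = [local_union(n, part, core, neighbors) for part in partitions]
    merged = global_union(n, local_sets)

    labels, root_to_label = [], {}
    for i in range(n):
        if not core[i] and not any(core[j] for j in neighbors[i]):
            labels.append(-1)  # noise: not core and no core neighbor
            continue
        root = merged.find(i)
        if root not in root_to_label:
            root_to_label[root] = len(root_to_label)
        labels.append(root_to_label[root])
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 0),
       (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan_disjoint_set(pts, eps=1.5, min_pts=3, n_workers=3))
# prints [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Here (2.2, 0) is a border point attached to the first cluster, and (50, 50) is noise. Because every worker contributes only the unions for points it owns, the per-worker forests can be merged in any order and still give the same clusters. The abstract identifies this merge as the source of large communication costs. This sketch does not model communication.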
Subjects: Databases; Distributed, Parallel, and Cluster Computing