DACA: Distributed Adaptive Grid Decision Graph Based Clustering Algorithm

Jing He, Jun Zhou, Haoyu Wang, Li Cai
DOI: https://doi.org/10.1002/spe.3060
2022-01-01
Abstract: Clustering algorithms play an important role in machine learning. With the development of big‐data artificial intelligence, distributed parallel algorithms have become an important research field. To reduce the computational complexity and running time of clustering large‐scale datasets, this study proposes a distributed clustering algorithm, DACA (distributed adaptive grid decision graph based clustering algorithm). In a distributed environment, DACA uses relative entropy to adaptively mesh the data into clearly separated sparse and dense grids. A decision graph is then used to determine the grid objects that serve as cluster centers. Finally, a KD‐tree is used to accelerate determining the cluster center of each sparse point, which completes the clustering. The algorithm is implemented on the Apache Spark computing framework. Compared with other distributed clustering algorithms, DACA adaptively divides the grid according to the data distribution and thereby obtains a better clustering effect, while the KD‐tree speeds up the determination of cluster centers. Experiments show that DACA achieves excellent performance and accuracy on six standard datasets and on real GPS trajectory datasets.
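The decision-graph step can be illustrated with a small single-machine sketch. It is not the authors' implementation. It assumes a uniform grid in place of the relative-entropy adaptive meshing, and it assumes the decision graph follows the density-peaks convention: each cell gets a density rho and a distance delta to the nearest denser cell, and cells are ranked by rho * delta. The name `grid_decision_graph` and the parameters `cell_size` and `n_centers` are hypothetical.

```python
from collections import Counter
from math import dist, floor


def grid_decision_graph(points, cell_size, n_centers):
    """Return (rho, delta, centers) for the occupied cells of a uniform grid.

    Assumption: a fixed cell_size stands in for DACA's adaptive meshing.
    """
    # rho: number of points that fall into each grid cell
    rho = Counter(tuple(floor(c / cell_size) for c in p) for p in points)

    # Order cells from densest to sparsest (ties broken by cell index)
    cells = sorted(rho, key=lambda c: (-rho[c], c))

    delta = {}
    for i, cell in enumerate(cells):
        if i == 0:
            # Densest cell: use the largest distance to any other cell
            delta[cell] = max((dist(cell, o) for o in cells[1:]), default=0.0)
        else:
            # Distance (in cell units) to the nearest cell ranked as denser
            delta[cell] = min(dist(cell, o) for o in cells[:i])

    # Cells with both high rho and high delta are candidate cluster centers
    ranked = sorted(cells, key=lambda c: rho[c] * delta[c], reverse=True)
    return rho, delta, ranked[:n_centers]
```

In this sketch, a dense cell that sits far from every denser cell scores highest, so two well-separated groups of points yield one center cell each. Assigning the remaining sparse points to their nearest center is the step that the abstract accelerates with a KD‐tree.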