Distributed hash table-based distributed subspace clustering algorithm

曲琳,周凡,田翔,陈耀武
DOI: https://doi.org/10.3785/j.issn.1008-973X.2010.02.003
2010-01-01
Abstract: A distributed subspace clustering (DISCLUS) algorithm based on a distributed hash table (DHT) is proposed. Each node performs subspace clustering on its local data, and the per-node results are then combined to form the final clustering result of the distributed system. Dataset reduction and pruning schemes that exploit the characteristics of subspace clustering are proposed to reduce communication between nodes. A DHT-based distributed voting (DDV) algorithm is proposed to combine the per-node clustering results. It uses the topology of the underlying overlay network to collect voting information hierarchically, so that every node in the system is covered without redundancy. Theoretical and experimental results show that the clustering error and communication cost of the DISCLUS algorithm scale well with dataset size, number of nodes, and dimensionality.
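The hierarchical, redundancy-free vote collection described in the abstract can be illustrated with a minimal sketch. It is not the paper's DDV algorithm: it assumes a Chord-like ring whose nodes are numbered 0 to n-1, in which each node forwards to neighbours at power-of-two offsets and hands each one a disjoint range of the ring. The function name `collect_votes`, the vote labels, and the ring layout are all hypothetical.

```python
from collections import Counter

def collect_votes(node, limit, local_votes):
    """Aggregate the votes of ring nodes in [node, limit), visiting each once.

    Hypothetical sketch: `node` delegates disjoint sub-ranges to children at
    offsets 1, 2, 4, ... (Chord-like fingers) and sums what they return.
    Returns (vote_counts, number_of_nodes_visited).
    """
    total = Counter(local_votes[node])  # this node's own votes
    visited = 1

    # Children are the nodes at power-of-two offsets that fall inside the range.
    children = []
    step = 1
    while node + step < limit:
        children.append(node + step)
        step *= 2

    # Each child covers the range up to the next child (or the range end),
    # so no node is ever reached through two different paths.
    for i, child in enumerate(children):
        child_limit = children[i + 1] if i + 1 < len(children) else limit
        sub_total, sub_visited = collect_votes(child, child_limit, local_votes)
        total += sub_total
        visited += sub_visited

    return total, visited

# Ten nodes, each voting for one hypothetical candidate cluster label.
local_votes = [["A"] if i % 2 == 0 else ["B"] for i in range(10)]
counts, visited = collect_votes(0, len(local_votes), local_votes)
```

In this sketch the root reaches all ten nodes exactly once, and the tree depth grows logarithmically with the number of nodes, which is the property the abstract attributes to collecting votes along the overlay topology.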