Parallel Image Scaling Density-based Clustering.

Wenhao Bi, An Zhang, Fei Gao
DOI: https://doi.org/10.1109/smc42975.2020.9282985
2020-01-01
Abstract: Clustering is one of the most important methods for discovering the intrinsic grouping in a set of unlabeled data. As data becomes easier to collect from more varied sources, the amount of data to be processed is increasing exponentially, and the data is more likely to be distributed across different clients. Traditional clustering methods cannot process a large dataset in a single pass because of memory limits. In this paper, an Image Scaling Density-based Clustering (ISDC) algorithm is proposed. ISDC can run on a single client or in parallel on several clients to handle data located at different clients. The ISDC algorithm does not require any manually specified parameters; it determines them from the statistical features of the dataset. In Parallel ISDC (PISDC), the data block at each client is clustered independently to form intermediate clusters. A border detection algorithm then forms representative clusters from the points at the edges of the intermediate clusters. In the global clustering step, the server merges the representative clusters from all clients. Border detection reduces the communication cost between the clients and the server and makes global clustering more efficient. Finally, the server sends the clustering information back to the clients to complete the clustering. Our experimental results verify the effectiveness and efficiency of PISDC and ISDC.
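The client/server flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how ISDC or the border detection work internally, so both are replaced here by simple stand-ins (connected components under a distance threshold, and a neighbor-count test). The function names and the manual `eps` parameter are assumptions made for illustration only; ISDC itself is described as deriving its parameters from the data.

```python
from itertools import combinations
from math import dist


def find(parent, x):
    """Union-find lookup with path halving (works for lists and dicts)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def local_cluster(points, eps):
    """Stand-in for ISDC (assumption): link points whose distance is <= eps."""
    parent = list(range(len(points)))
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(parent, i), []).append(p)
    return list(groups.values())


def border_points(cluster, eps):
    """Stand-in border detection (assumption): keep points that have fewer
    eps-neighbors than the densest point; fall back to the whole cluster."""
    counts = [sum(dist(p, q) <= eps for q in cluster) - 1 for p in cluster]
    densest = max(counts)
    border = [p for p, c in zip(cluster, counts) if c < densest]
    return border or list(cluster)


def merge_on_server(reps, eps):
    """Global step: reps maps (client_id, local_id) to representative points.
    Two representative clusters merge if any pair of their points is within eps.
    Returns a mapping from (client_id, local_id) to a global label."""
    parent = {key: key for key in reps}
    for a, b in combinations(reps, 2):
        if any(dist(p, q) <= eps for p in reps[a] for q in reps[b]):
            parent[find(parent, a)] = find(parent, b)
    roots = {}
    return {key: roots.setdefault(find(parent, key), len(roots)) for key in reps}


def run_pipeline(client_data, eps):
    """Local clustering, border extraction, server merge, then feedback."""
    local = {cid: local_cluster(pts, eps) for cid, pts in client_data.items()}
    reps = {
        (cid, i): border_points(cluster, eps)
        for cid, clusters in local.items()
        for i, cluster in enumerate(clusters)
    }
    labels = merge_on_server(reps, eps)
    # Feedback: each client learns the global label of its intermediate clusters.
    return {
        cid: [(labels[(cid, i)], cluster) for i, cluster in enumerate(clusters)]
        for cid, clusters in local.items()
    }


if __name__ == "__main__":
    data = {
        "A": [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)],
        "B": [(3, 0), (4, 0), (20, 0)],
    }
    for client, clusters in run_pipeline(data, eps=1.5).items():
        print(client, clusters)
```

In this toy run, only the border points of each intermediate cluster are sent to the server, which is the source of the communication saving the abstract mentions; the interior point `(1, 0)` of client A's first cluster never leaves the client.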