A boosted clustering algorithm for distributed homogeneous data mining

Li Chengan, Wu Tiejun
DOI: https://doi.org/10.1109/WCICA.2006.1714221
2006-01-01
Abstract: A new distributed clustering algorithm based on boosting techniques is presented to efficiently integrate multiple partitions constructed over very large, distributed homogeneous databases that cannot be merged at a single location. In the proposed method, individual clustering solutions are first produced from the disjoint datasets at each boosting round. The cluster prototypes, rather than the partition matrices, are then transferred to a single site to generate a global cluster prototype, which is broadcast to all distributed sites and used to partition the data at each site. Finally, all the individual solutions are combined into a weighted voting ensemble on each disjoint dataset. Experimental results demonstrate that the proposed distributed clustering method achieves clustering accuracy comparable to or slightly better than that of algorithms in which boosting techniques are applied to the centralized data. In addition, the communication cost of the proposed algorithm is very small. © 2006 IEEE.
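The prototype exchange described in the abstract can be illustrated with a minimal sketch of a single round. The abstract does not name the base clusterer, the rule for merging prototypes, or the boosting weights, so everything below is an assumption made for illustration: plain k-means stands in for the local clusterer, each site also sends a per-prototype point count, and the coordinating site merges the pooled prototypes with a count-weighted k-means. The boosting resampling and the final weighted voting are omitted, and the names `kmeans`, `merge_prototypes`, `site_a`, and `site_b` are hypothetical.

```python
import random


def nearest(point, prototypes):
    """Index of the prototype closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda j: sum((p - q) ** 2 for p, q in zip(point, prototypes[j])),
    )


def kmeans(points, k, weights=None, iters=20, seed=0):
    """Weighted k-means (assumed base clusterer, not taken from the paper).

    Returns the prototypes and the total weight assigned to each one.
    """
    rng = random.Random(seed)
    weights = weights or [1.0] * len(points)
    prototypes = [list(p) for p in rng.sample(points, k)]
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, prototypes)
            mass[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        for j in range(k):
            if mass[j] > 0:  # an empty cluster keeps its previous prototype
                prototypes[j] = [s / mass[j] for s in sums[j]]
    return prototypes, mass


def merge_prototypes(local_results, k):
    """Coordinator step: cluster the pooled local prototypes, weighted by size.

    The count-weighted merge is an assumption; the abstract only says that a
    global cluster prototype is generated from the transferred prototypes.
    """
    pooled = [p for protos, _ in local_results for p in protos]
    weights = [m for _, mass in local_results for m in mass]
    global_protos, _ = kmeans(pooled, k, weights)
    return global_protos


# Two sites holding disjoint samples of the same two-cluster distribution.
site_a = [(0.0, 0.1), (0.2, 0.0), (10.0, 10.1), (9.9, 10.0)]
site_b = [(0.1, 0.2), (-0.1, 0.0), (10.2, 9.8), (10.0, 10.0)]
k = 2

# Each site clusters locally and sends only k prototypes plus k counts.
local_results = [kmeans(site, k) for site in (site_a, site_b)]

# The coordinator builds the global prototypes and broadcasts them back.
global_protos = merge_prototypes(local_results, k)

# Every site partitions its own data against the broadcast prototypes.
labels = [[nearest(p, global_protos) for p in site] for site in (site_a, site_b)]
```

In this sketch each site transmits `k` prototypes and `k` counts per round regardless of how many points it holds, which is one way to read the abstract's claim that the communication cost is very small compared with shipping partition matrices.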