Distributed clustering algorithm based on ensemble learning

Genlin Ji, Xiaohan Ling, Ming Yang
DOI: https://doi.org/10.3321/j.issn:1001-0505.2007.04.008
2007-01-01
Abstract: A distributed clustering model based on ensemble learning is proposed. A typical distributed clustering scenario under the model is a two-stage process: clustering is performed first at the local sites and then at the global site. The local clustering results transmitted to the server site form an ensemble, and the combining schemes of ensemble learning use this ensemble to generate the global clustering results. The model thereby converts distributed clustering into a combinatorial optimization problem. As an implementation of the model, a novel distributed K-means algorithm called DK-means is presented. DK-means first clusters the data at each local site using K-means; the global site then receives the clustering results from the local sites and clusters them by applying K-means again. The algorithm works well regardless of how the data distribution varies across local sites. Experimental results show that DK-means is effective and efficient, which also serves as an empirical validation of the model.
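The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say what each local site transmits, so the sketch assumes that only the local centroids are sent, unweighted. The function names `kmeans` and `dk_means`, the parameters `k_local` and `k_global`, and the deterministic farthest-point initialization are all assumptions introduced here for reproducibility.

```python
from math import dist


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means on a list of equal-length numeric tuples."""
    # Farthest-point initialization: an assumption made for determinism,
    # not something stated in the abstract.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: put each point in the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def dk_means(local_datasets, k_local, k_global):
    """Two-stage clustering: local K-means per site, then K-means on the results."""
    # Stage 1: each site clusters its own data. Only the centroids leave the
    # site (assumed here; the abstract says only "clustering results").
    ensemble = []
    for data in local_datasets:
        ensemble.extend(kmeans(data, k_local))
    # Stage 2: the global site clusters the pooled local centroids.
    return kmeans(ensemble, k_global)
```

Calling `dk_means([site_a, site_b], k_local=2, k_global=2)` with two lists of 2-D tuples returns two global centroids without any raw data point leaving its site. Because the pooled centroids are unweighted in this sketch, a local cluster with one point counts as much as one with a thousand; weighting each centroid by its cluster size would be one possible refinement.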