A new approach to generate diversified clusters for small data sets

Chun-Cheng Peng, Cheng-Jung Tsai, Ting-Yi Chang, Jen-Yuan Yeh, Po-Wei Hua
DOI: https://doi.org/10.1016/j.asoc.2020.106564
IF: 8.7
2020-10-01
Applied Soft Computing
Abstract: Clustering is a common data mining technique whose main principle is that samples within a cluster are similar to one another and dissimilar to those in other clusters. In other words, samples in the same cluster are highly homogeneous, while different clusters are highly heterogeneous. However, some users instead require diversified clustering. In contrast to traditional clustering, diversified clustering aims to make samples within the same cluster highly heterogeneous while making different clusters highly homogeneous with one another. Diversified clustering has practical applications in everyday settings such as regular class grouping, grouping students for learning activities, cluster sampling, balanced diets, and job assignment. Nevertheless, our survey of the data mining literature found no previously proposed research on diversified clustering. In this paper, we formally define the problem of diversified clustering and propose a new method to solve it. Experimental results showed that our method generates good diversified clusters. However, the method is currently suitable only for small data sets, because its execution time grows quickly as the number of diversified clusters increases. We hope this paper will stimulate further research on effective methods for generating diversified clusters in data mining.
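To make the definition of diversified clustering concrete, the following is a minimal Python sketch of one way the problem could be posed and solved. It is not the authors' method. The objective (the sum of squared Euclidean distances between samples in the same cluster), the balanced cluster sizes, and the pairwise-exchange local search are all assumptions, and the names `sq_dist`, `within_diversity`, and `diversified_clusters` are hypothetical.

```python
import itertools
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two samples (assumed dissimilarity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_diversity(data: List[Point], labels: List[int], k: int) -> float:
    """Hypothetical objective: sum of pairwise squared distances inside each
    cluster. Higher values mean more heterogeneous clusters."""
    total = 0.0
    for c in range(k):
        members = [data[i] for i, lab in enumerate(labels) if lab == c]
        total += sum(sq_dist(a, b) for a, b in itertools.combinations(members, 2))
    return total


def diversified_clusters(data: List[Point], k: int, seed: int = 0) -> List[int]:
    """Hypothetical exchange heuristic (not the paper's algorithm):
    start from a random balanced assignment, then swap pairs of samples
    between clusters as long as the within-cluster diversity increases."""
    rng = random.Random(seed)
    n = len(data)
    labels = [i % k for i in range(n)]  # cluster sizes differ by at most one
    rng.shuffle(labels)
    best = within_diversity(data, labels, k)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(n), 2):
            if labels[i] == labels[j]:
                continue  # swapping within one cluster changes nothing
            labels[i], labels[j] = labels[j], labels[i]
            score = within_diversity(data, labels, k)
            if score > best + 1e-12:
                best = score  # keep the improving swap
                improved = True
            else:
                labels[i], labels[j] = labels[j], labels[i]  # undo the swap
    return labels


if __name__ == "__main__":
    # Two tight groups far apart; a conventional clustering would separate them.
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    labels = diversified_clusters(data, k=2)
    print(labels)
    print(within_diversity(data, labels, 2))  # prints 202.0
```

On this toy input, each resulting cluster mixes one point from the left group with one from the right, which is the opposite of what a conventional clustering would return. Because swaps never change cluster sizes and the sum of all pairwise distances is fixed, maximizing the within-cluster sum is the same as minimizing the between-cluster sum. When the number of samples is divisible by k, it is also equivalent to pulling every cluster centroid toward the global mean, so the clusters come to resemble one another. The loop re-scores the whole partition after every trial swap, so a sketch like this is practical only for small data sets.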
computer science, artificial intelligence, interdisciplinary applications