Clustering description learning: a comparative study

Chengzhi Zhang, Huilin Wang, Hongjiao Xu, Dan Wu
2009-01-01
Journal of Information and Computational Science
Abstract: Clustering description is one of the key issues in document clustering applications. Traditional clustering algorithms can cluster objects, but they cannot give a concept description for the clustered results. Document clustering description labels the clustered results of a document collection, helping users determine whether the clusters are relevant to their information requirements. Labeling a clustered set of documents is therefore an important and challenging task in document clustering applications. To address the weak readability of document clustering results, a method for automatically labeling document clusters based on machine learning, i.e. clustering description learning (CDL), is put forward. Experimental results show that the support vector machine (SVM) model outperforms other machine learning methods, such as the multiple linear regression model, in the task of clustering description learning. The factors affecting SVM-based clustering description learning are also compared in this paper. Experimental results show that DF*ICF (the product of the document frequency and the inverse cluster frequency of a descriptive phrase in the current cluster) plays an important role in CDL and that distinctiveness is a comparatively important requirement in the task of CDL. 1548-7741/ Copyright © 2009 Binary Information Press.
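To make the DF*ICF feature concrete, the following is a minimal Python sketch of how such a score could be computed for a candidate phrase. The abstract only states that DF*ICF is the product of document frequency and inverse cluster frequency in the current cluster; the exact ICF formula is not given here, so the logarithmic form below (modeled on the usual inverse document frequency) is an assumption, as are the function name `df_icf` and the toy data layout.

```python
import math
from typing import Dict, List

# Hypothetical data layout: cluster id -> list of documents,
# where each document is a list of candidate descriptive phrases.
Clusters = Dict[str, List[List[str]]]


def df_icf(phrase: str, clusters: Clusters, current: str) -> float:
    """Score `phrase` as a label candidate for cluster `current`."""
    # DF: number of documents in the current cluster that contain the phrase.
    df = sum(1 for doc in clusters[current] if phrase in doc)
    # CF: number of clusters in which the phrase occurs at least once.
    cf = sum(
        1 for docs in clusters.values() if any(phrase in doc for doc in docs)
    )
    if df == 0 or cf == 0:
        return 0.0
    # ICF (assumed form, not taken from the paper): log(|C| / CF) + 1,
    # so a phrase found in every cluster still keeps a non-zero weight.
    icf = math.log(len(clusters) / cf) + 1.0
    return df * icf


if __name__ == "__main__":
    toy: Clusters = {
        "a": [["svm", "kernel"], ["svm"]],
        "b": [["regression"], ["kernel"]],
    }
    # Rank the candidate phrases of cluster "a" by their DF*ICF score.
    candidates = {p for doc in toy["a"] for p in doc}
    for p in sorted(candidates, key=lambda p: df_icf(p, toy, "a"), reverse=True):
        print(p, round(df_icf(p, toy, "a"), 3))
```

Under this assumed form, a phrase that is frequent inside the current cluster but rare across other clusters scores highest, which matches the abstract's point that distinctiveness matters for a good cluster label.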