A Fused Multi-feature Based Co-training Approach for Document Clustering

Yuanqing Wang, Wenjun Wang, Weidi Dai, Pengfei Jiao, Wei Yu
DOI: https://doi.org/10.1109/ICISCE.2016.19
2016-01-01
Abstract: Document clustering is a popular topic in data mining and information retrieval. Most models and methods for this problem compute the similarity between pairs of documents, represented either in the space of all terms or in a new feature space obtained by applying a topic modeling technique to a given corpus. In this paper, we regard these two ideas as clustering on term features and on semantic features, and assume that they can contribute to each other in clustering. We propose a co-training approach for spectral clustering that takes both features into account. Experiments on four real-world datasets show the feasibility and efficacy of the proposed approach compared with a number of baseline methods.
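As a minimal illustration of the term-feature view mentioned in the abstract, the sketch below computes pairwise cosine similarity between documents represented as bag-of-words count vectors. It is not the authors' method: the whitespace tokenizer, the toy documents, and the helper names (`term_vector`, `cosine_similarity`, `similarity_matrix`) are assumptions made for illustration only. A semantic-feature view would, under the same assumptions, replace the count vectors with topic proportions from a topic model, and a spectral clustering step would then operate on the resulting similarity matrix.

```python
import math
from collections import Counter


def term_vector(text):
    """Toy bag-of-words: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Pairwise term-space similarity for a list of raw document strings."""
    vectors = [term_vector(d) for d in docs]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical toy corpus, not taken from the paper's datasets.
docs = [
    "spectral clustering of documents",
    "clustering documents with spectral methods",
    "topic models for text corpora",
]
S = similarity_matrix(docs)
```

Here `S` is a symmetric matrix of the kind a spectral clustering method takes as input; the first two toy documents share three terms and score well above the third, which shares none with the first.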