Pseudo-Supervised Approach for Text Clustering Based on Consensus Analysis

Peixin Chen, Wu Guo, Lirong Dai, Zhenhua Ling
DOI: https://doi.org/10.1109/icassp.2018.8462376
2018-01-01
Abstract: In recent years, neural networks (NN) have achieved remarkable performance improvements in text classification, owing to their powerful ability to encode discriminative features by incorporating label information into model training. Inspired by the success of NN in text classification, we propose a pseudo-supervised neural network approach for text clustering. The neural network is trained in a supervised fashion with pseudo-labels, which are the cluster labels obtained by pre-clustering unsupervised document representations. To improve the quality of the pseudo-labels, a consensus analysis is employed to select the training samples for the neural network. Experimental results demonstrate that the proposed approach can significantly improve clustering performance.
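The abstract does not say how the consensus analysis is carried out. One plausible reading, assumed here purely for illustration and not taken from the paper, is that several pre-clustering runs (for example, different random seeds or different document representations) are mapped onto a common label space, and only documents on which every run agrees are kept as pseudo-labelled training samples. The sketch below covers only that selection step; the function names `align_labels` and `consensus_select` are hypothetical, cluster labels are assumed to be integers in `range(k)`, and the neural network training that follows is not shown.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def align_labels(reference: Sequence[int], labels: Sequence[int], k: int) -> List[int]:
    """Relabel `labels` so they best match `reference`.

    Cluster IDs are arbitrary across runs, so we try every permutation of
    the k IDs and keep the one with the most agreements. This brute force
    is only practical for small k (assumption for illustration).
    """
    best_perm, best_hits = None, -1
    for perm in permutations(range(k)):
        hits = sum(1 for r, lab in zip(reference, labels) if perm[lab] == r)
        if hits > best_hits:  # first best permutation wins ties
            best_perm, best_hits = perm, hits
    return [best_perm[lab] for lab in labels]


def consensus_select(
    partitions: Sequence[Sequence[int]], k: int
) -> Tuple[List[int], List[int]]:
    """Return (indices, pseudo_labels) for samples on which all runs agree.

    The first partition serves as the reference label space; every other
    partition is aligned to it before comparing per-sample votes.
    """
    reference = list(partitions[0])
    aligned = [reference] + [align_labels(reference, p, k) for p in partitions[1:]]
    indices, pseudo_labels = [], []
    for i in range(len(reference)):
        votes = {run[i] for run in aligned}
        if len(votes) == 1:  # unanimous: keep as a training sample
            indices.append(i)
            pseudo_labels.append(reference[i])
    return indices, pseudo_labels


if __name__ == "__main__":
    runs = [
        [0, 0, 1, 1, 2, 2],  # run A (reference)
        [1, 1, 0, 0, 2, 2],  # run B: same partition, permuted IDs
        [2, 2, 0, 1, 1, 1],  # run C: disagrees on sample 3
    ]
    print(consensus_select(runs, k=3))  # ([0, 1, 2, 4, 5], [0, 0, 1, 2, 2])
```

Sample 3 is dropped because, after alignment, run C assigns it to a different cluster than runs A and B. For larger k, the permutation search would typically be replaced by an optimal assignment solver such as SciPy's `linear_sum_assignment` on the cluster co-occurrence matrix.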