Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels

Zhi Chen, Wu Guo, Li-Rong Dai, Zhen-Hua Ling, Jun Du
DOI: https://doi.org/10.21437/Interspeech.2019-1417
2019-01-01
Abstract: In this paper, a deep learning framework is applied to text clustering, an unsupervised task in natural language processing (NLP). Since no predefined labels are available for text clustering, the deep neural network is trained in a pseudo-supervised fashion with labels generated by a pre-clustering step. To address the incorrect labels produced by the pre-clustering step, we adopt soft pseudo-labels instead of hard one-hot ones, and these labels are dynamically updated during training. In addition, we build a document-level attention mechanism over multiple documents, based on the dynamic soft pseudo-labels, to further reduce the impact of the wrong labels. Experimental results on three public databases show that our model outperforms state-of-the-art systems.
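The ideas in the abstract (soft pseudo-labels from pre-clustering, dynamic label updates, and down-weighting unreliable documents) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the distance-based softmax used to initialise the labels, the moving-average update with a `momentum` parameter, the confidence-based attention weights, and all function names are hypothetical stand-ins rather than the authors' formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def initial_soft_labels(features, centroids, temperature=1.0):
    """Soft pseudo-labels from a pre-clustering step (assumed form):
    softmax over negative squared distances to each cluster centroid."""
    diff = features[:, None, :] - centroids[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)  # shape: (n_docs, n_clusters)
    return softmax(-sq_dist / temperature, axis=1)


def update_soft_labels(soft_labels, predictions, momentum=0.9):
    """Dynamic update (assumed form): blend the current soft labels
    with the network's predicted cluster distribution."""
    mixed = momentum * soft_labels + (1.0 - momentum) * predictions
    return mixed / mixed.sum(axis=1, keepdims=True)


def attention_weighted_loss(soft_labels, predictions, eps=1e-12):
    """Cross-entropy against soft labels, weighted across documents.
    The weights (assumed form) are a softmax over each document's
    label confidence, so ambiguous documents contribute less."""
    cross_entropy = -(soft_labels * np.log(predictions + eps)).sum(axis=1)
    confidence = soft_labels.max(axis=1)
    weights = softmax(confidence, axis=0)  # sums to 1 over the batch
    return float((weights * cross_entropy).sum()), weights


# Toy data: three documents near a centroid, one halfway between both.
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [2.4, 2.6]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

labels = initial_soft_labels(features, centroids)

# Stand-in for network outputs; a real model would produce these.
rng = np.random.default_rng(0)
predictions = softmax(rng.normal(size=labels.shape), axis=1)

loss, weights = attention_weighted_loss(labels, predictions)
labels = update_soft_labels(labels, predictions)
```

In this toy setup the fourth document is equidistant from both centroids, so it receives the smallest weight in the loss, which is one simple way to reduce the influence of a likely mislabelled document.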