A diversifying hidden units method based on NMF for document representation

X. Jiang, H. Zhang, R. Liu, Y. Zuo
DOI: https://doi.org/10.1109/ICKEA.2016.7803001
2016-01-01
Abstract: Document modeling with hidden units, also known as topics, is very popular. Non-negative matrix factorization (NMF) is one of the most important techniques for document representation; it decomposes a document-term matrix into a document-topic matrix and a topic-term matrix. Because an orthogonality constraint would force each term to occur in only one topic, we abandon this strong constraint. Furthermore, to represent documents with a fixed number of topics that carry more semantic information, we add a diversifying regularization term and a sparsity constraint to NMF, which yields a marked improvement in text classification and clustering. Finally, we plot the pairwise similarities between topics and list the 20 highest-weighted words in each topic to show that diversifying regularization effectively reduces the number of terms shared across topics.
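The abstract does not state the objective function or the update rules, so the following is only a minimal NumPy sketch under assumed choices. It factors a documents × terms matrix X as W H with W, H ≥ 0 and no orthogonality constraint. It assumes an L1 penalty `alpha * sum(W)` as the sparsity constraint on the document-topic matrix, and a penalty `beta * sum_{k≠l} <H[k], H[l]>` on the overlap between topic-term rows as the diversifying regularizer. It solves the problem with Lee-Seung-style multiplicative updates, a swapped-in solver that is not necessarily the authors' method. The names `diversified_nmf`, `objective`, `topic_similarity` and `top_words`, and the parameters `alpha` and `beta`, are hypothetical.

```python
import numpy as np


def objective(X, W, H, alpha, beta):
    """Assumed objective (hypothetical form, not confirmed by the paper):
    ||X - W H||_F^2 + alpha * sum(W) + beta * sum_{k != l} <H[k], H[l]>."""
    gram = H @ H.T
    overlap = gram.sum() - np.trace(gram)  # off-diagonal inner products between topics
    return np.linalg.norm(X - W @ H, "fro") ** 2 + alpha * W.sum() + beta * overlap


def diversified_nmf(X, k, alpha=0.1, beta=0.1, n_iter=200, seed=0, eps=1e-10):
    """Factor X (documents x terms) as W @ H with W, H >= 0.

    W: document-topic matrix (L1 sparsity penalty alpha, an assumption).
    H: topic-term matrix (overlap/diversity penalty beta, an assumption).
    Returns W, H and the objective value after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_terms)) + eps
    off_diag = np.ones((k, k)) - np.eye(k)  # (E - I): links each topic to all others
    history = []
    for _ in range(n_iter):
        # W update: the reconstruction gradient carries a factor of 2,
        # so the L1 gradient alpha enters the denominator as alpha / 2.
        W *= (X @ H.T) / (W @ H @ H.T + alpha / 2 + eps)
        # H update: the diversity gradient 2 * beta * (E - I) H goes into the
        # denominator, shrinking term weights that other topics also use.
        H *= (W.T @ X) / (W.T @ W @ H + beta * off_diag @ H + eps)
        history.append(objective(X, W, H, alpha, beta))
    return W, H, history


def topic_similarity(H):
    """Cosine similarity between the topic-term rows of H (a k x k matrix)."""
    unit = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)
    return unit @ unit.T


def top_words(H, vocab, n=20):
    """The n highest-weighted terms of each topic, heaviest first."""
    return [[vocab[j] for j in np.argsort(row)[::-1][:n]] for row in H]
```

For example, `W, H, hist = diversified_nmf(X, k=5)` on a term-count matrix gives document representations in the rows of `W`. `topic_similarity(H)` and `top_words(H, vocab, n=20)` produce the kinds of summaries the abstract describes. Both penalty gradients are entrywise non-negative, so they can go in the denominator of the multiplicative updates, which keeps W and H non-negative without projection.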