Probabilistic Text Categorization using Sparse Topical Encoding

Xipeng Qiu, Xuanjing Huang, Lide Wu
2009-01-01
Abstract: In this paper, we propose a topic-based probabilistic text categorization model that can be decomposed into two steps. First, we present a sparse non-negative matrix factorization (SNMF) algorithm, which automatically extracts a sparse topical encoding for documents and has an intuitive interpretation. Second, we calculate the similarity between documents by integrating the probabilistic similarities of their topics; each topic can be assigned a different weight by additive logistic regression.
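To make the first step concrete, the following is a minimal sketch of one common way to obtain sparse non-negative topic encodings: multiplicative-update NMF with an L1 penalty on the encoding matrix. It is an illustration under stated assumptions, not necessarily the SNMF formulation used in the paper. The function name `sparse_nmf`, the `sparsity` parameter, the terms-by-documents layout of `V`, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np


def sparse_nmf(V, n_topics, sparsity=0.1, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (terms x docs) as W @ H.

    W (terms x topics) holds topic bases; H (topics x docs) holds the
    per-document topical encodings. This sketch uses multiplicative updates
    for 0.5 * ||V - W @ H||^2 + sparsity * sum(H). It is an assumed,
    generic formulation, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, n_topics)) + eps
    H = rng.random((n_topics, n_docs)) + eps

    for _ in range(n_iter):
        # Update encodings; the L1 penalty appears as `sparsity` in the
        # denominator and pushes small entries of H toward zero.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        # Update topic bases with the standard least-squares rule.
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        # Rescale so each topic column of W has unit L2 norm, moving the
        # scale into H; the product W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]

    return W, H


# Toy usage: 20 terms, 12 documents, 3 topics (random data, illustration only).
V = np.random.default_rng(42).random((20, 12))
W, H = sparse_nmf(V, n_topics=3, sparsity=0.1)
reconstruction_error = np.linalg.norm(V - W @ H)
```

Each column of `H` can then be read as a document's topical encoding. The second step described in the abstract, weighting per-topic similarities with additive logistic regression, would operate on these encodings.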