An NMF-framework for Unifying Posterior Probabilistic Clustering and Probabilistic Latent Semantic Indexing

Zhong-Yuan Zhang, Tao Li, Chris Ding, Jie Tang
DOI: https://doi.org/10.1080/03610926.2012.714034
2014-01-01
Abstract: In document clustering, a document may be assigned to multiple clusters, and the probabilities of a document belonging to different clusters are directly normalized. We propose a new Posterior Probabilistic Clustering (PPC) model that has this normalization property. The clustering model is based on Nonnegative Matrix Factorization (NMF) and is flexible: if we use class-conditional probability normalization instead, the model reduces to Probabilistic Latent Semantic Indexing (PLSI). Systematic comparison and evaluation indicate that PPC is competitive with other state-of-the-art clustering methods. Furthermore, the results of PPC are sparser and more orthogonal, both of which are highly desirable properties.
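The two normalizations the abstract contrasts can be illustrated with a minimal sketch, assuming a term-document count matrix X (terms x documents) factored as X ~ W @ H. The factorization here is generic NMF under the generalized KL divergence, fitted with the Lee-Seung multiplicative updates. It is NOT the PPC objective or update rules from the paper, which the abstract does not give. The helper names `class_conditional_normalize` and `posterior_memberships` are hypothetical. They apply the two normalizations after fitting rather than building them into the model as PPC does. Class-conditional normalization (as in PLSI) makes each column of W sum to 1, so it reads as P(w|z). Posterior normalization makes each document's cluster weights sum to 1, so they read as P(z|d).

```python
import numpy as np


def kl_nmf(X, k, n_iter=300, seed=0, eps=1e-10):
    """Generic NMF X ~ W @ H under the generalized KL divergence, fitted with
    Lee-Seung multiplicative updates. X is terms x docs, W is terms x k,
    H is k x docs. Illustrative only: this is not the paper's PPC algorithm."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # H <- H * (W^T (X / WH)) / (column sums of W)
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
        # W <- W * ((X / WH) H^T) / (row sums of H)
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def class_conditional_normalize(W, H):
    """Hypothetical helper (PLSI-style reading): rescale so each column of W
    sums to 1, i.e. P(w|z), and push the scale into H so W @ H is unchanged."""
    s = W.sum(axis=0)              # column sums of W, shape (k,)
    return W / s, H * s[:, None]


def posterior_memberships(H):
    """Hypothetical helper: normalize each document's column of H so its
    cluster weights sum to 1, giving a distribution P(z|d) per document."""
    return H / H.sum(axis=0, keepdims=True)


# Toy counts: terms 0-2 mostly occur in docs 0-2, terms 3-5 in docs 3-5.
X = np.array([
    [3, 2, 4, 0, 0, 1],
    [2, 3, 3, 0, 1, 0],
    [4, 1, 2, 1, 0, 0],
    [0, 0, 1, 3, 4, 2],
    [0, 1, 0, 2, 3, 4],
    [1, 0, 0, 4, 2, 3],
], dtype=float)

W, H = kl_nmf(X, k=2)
Wn, Hn = class_conditional_normalize(W, H)
P = posterior_memberships(Hn)      # rows: clusters, columns: documents
print(np.round(P, 2))
print(P.argmax(axis=0))            # hard assignment = most probable cluster
```

Because P holds a full distribution over clusters for each document, a document can be assigned to several clusters by thresholding its column of P instead of taking the argmax, which is the multi-cluster setting the abstract describes.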