PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce.

Ning Li, Fuzhen Zhuang, Qing He, Zhongzhi Shi
DOI: https://doi.org/10.1007/978-3-642-32891-6_8
2012-01-01
Abstract: PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. Due to the increasing prevalence of large datasets, there is a need to improve the scalability of computation in PLSA. In this paper, we propose a parallel PLSA algorithm called PPLSA that handles large corpora within the MapReduce framework. Our solution efficiently distributes computation and is relatively simple to implement.
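The abstract does not describe how PPLSA partitions the work, so the following is only a minimal sketch of one way the standard PLSA EM update can be phrased as a map step and a reduce step. It is not the authors' algorithm. It assumes the mapper runs once per document and that the current parameters P(w|z) and P(z|d) are available to every mapper. All names (`init_params`, `map_document`, `reduce_sum`, `em_iteration`, `log_likelihood`) are hypothetical, and the whole thing runs in a single process rather than on a real MapReduce cluster.

```python
import math
import random
from collections import defaultdict


def init_params(docs, num_topics, seed=0):
    """Random, strictly positive starting values for P(w|z) and P(z|d)."""
    rng = random.Random(seed)
    vocab = sorted({w for counts in docs.values() for w in counts})

    def random_dist(keys):
        raw = {k: rng.random() + 0.1 for k in keys}
        total = sum(raw.values())
        return {k: v / total for k, v in raw.items()}

    p_w_z = [random_dist(vocab) for _ in range(num_topics)]
    p_z_d = {d: random_dist(range(num_topics)) for d in docs}
    return p_w_z, p_z_d


def map_document(doc_id, counts, p_w_z, p_z_given_doc):
    """Mapper (assumed design): E-step for one document.

    Emits expected counts n(d,w) * P(z|d,w) under two kinds of keys.
    """
    num_topics = len(p_w_z)
    for word, n in counts.items():
        joint = [p_z_given_doc[z] * p_w_z[z][word] for z in range(num_topics)]
        total = sum(joint)
        for z in range(num_topics):
            expected = n * joint[z] / total
            yield ("w|z", z, word), expected
            yield ("z|d", doc_id, z), expected


def reduce_sum(pairs):
    """Reducer: sum all values that share a key."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    return sums


def em_iteration(docs, p_w_z, p_z_d):
    """One EM iteration: map over documents, reduce, then normalize (M-step)."""
    num_topics = len(p_w_z)
    emitted = (pair for d, counts in docs.items()
               for pair in map_document(d, counts, p_w_z, p_z_d[d]))
    sums = reduce_sum(emitted)

    new_p_w_z = []
    for z in range(num_topics):
        row = {w: sums[("w|z", z, w)] for w in p_w_z[z]}
        total = sum(row.values())
        new_p_w_z.append({w: v / total for w, v in row.items()})

    new_p_z_d = {}
    for d in docs:
        row = {z: sums[("z|d", d, z)] for z in range(num_topics)}
        total = sum(row.values())
        new_p_z_d[d] = {z: v / total for z, v in row.items()}
    return new_p_w_z, new_p_z_d


def log_likelihood(docs, p_w_z, p_z_d):
    """Sum over (d, w) of n(d,w) * log sum_z P(z|d) P(w|z)."""
    return sum(
        n * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z))))
        for d, counts in docs.items() for w, n in counts.items()
    )
```

In this sketch each mapper touches only its own document's word counts, which is what makes the E-step easy to distribute. The cost is that the full P(w|z) table has to be shipped to every mapper on each iteration, a trade-off a real implementation would need to address.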