Online Belief Propagation Algorithm for Probabilistic Latent Semantic Analysis.
Yun Ye,Shengrong Gong,Chunping Liu,Jia Zeng,Ning Jia,Yi Zhang
DOI: https://doi.org/10.1007/s11704-013-2360-7
IF: 2.6688
2013-01-01
Frontiers of Computer Science
Abstract:Probabilistic latent semantic analysis (PLSA) is a topic model for text documents, which has been widely used in text mining, computer vision, computational biology and so on. For batch PLSA inference algorithms, the required memory size grows linearly with the data size, and handling massive data streams is very difficult. To process big data streams, we propose an online belief propagation (OBP) algorithm based on the improved factor graph representation for PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments, and uses the estimated parameters of previous segments to calculate the gradient descent of the current segment. Because OBP removes each segment from memory after processing, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets, and demonstrate that OBP is competitive in both speed and accuracy for online expectation maximization (OEM) in PLSA, and can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.