Self-organizing Weighted Incremental Probabilistic Latent Semantic Analysis
Ning Li,Wenjuan Luo,Kun Yang,Fuzhen Zhuang,Qing He,Zhongzhi Shi
DOI: https://doi.org/10.1007/s13042-017-0681-9
2017-01-01
International Journal of Machine Learning and Cybernetics
Abstract:PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique which has been widely applied to text mining applications to discover the underlying topics embedded in the data corpus. However, due to the variability of increasing data, it is necessary to discover the dynamic topics and process the large dataset incrementally. Moreover, PLSA models suffer from the problem of inferencing new documents. To overcome these problems, in this paper, we propose a novel Weighted Incremental PLSA algorithm called WIPLSA to dynamically discover topics and incrementally learn the topics from new documents. The experiments verify that the proposed WIPLSA could capture the dynamic topics hidden in the dynamic updating data corpus. Compared with PLSA, MAP PLSA and QB PLSA, WIPLSA performs better in perspexity on large dataset, which make it applicable for big data mining. In addition, WIPLSA has good performance in the application of document categorization.