An Improved Latent Dirichlet Allocation Model for Hot Topic Extraction

Guolong Liu, Xiaofei Xu, Ying Zhu, Li Li
DOI: https://doi.org/10.1109/BDCloud.2014.55
2014-01-01
Abstract: Microblogging is fast becoming a dominant form of social media, and its impact is evident in our daily lives. A massive amount of information is produced every day, and detecting hot topics can help people get essential information quickly. However, because of short and sparse features, a flood of meaningless tweets, and other characteristics of microblogs, traditional topic detection methods cannot achieve a desirable level of performance. In this paper, we propose a multi-attribute latent Dirichlet allocation (MA-LDA) model, a topic analysis model that incorporates the time and tag attributes of microblogs into the LDA model. By introducing a time variable for the time attribute, the MA-LDA model can decide whether a word should appear in hot topics. Applying the tag attribute allows the MA-LDA model to rank core words high in the results, improving the expressiveness of the output over the traditional LDA model. Empirical evaluation on real data sets demonstrates that our method detects hot topics accurately and efficiently and finds more terms associated with each hot topic. Our study provides strong evidence of the importance of the temporal factor in hot topic extraction.