Bursty Event Detection from Microblog: a Distributed and Incremental Approach

Jianxin Li, Jianfeng Wen, Zhenying Tai, Richong Zhang, Weiren Yu
DOI: https://doi.org/10.1002/cpe.3657
2015-01-01
Concurrency and Computation: Practice and Experience
Abstract: As a new form of social media, microblogs (e.g., Twitter and Weibo) play an important role in people's daily lives. With the rise in popularity and size of microblogs, there is a need for distributed approaches that can detect bursty events with low latency from the short‐text data stream. In this paper, we propose a distributed and incremental temporal topic model for microblogs called Bursty Event dEtection (BEE+). BEE+ is able to detect bursty events from short‐text datasets and model the temporal information. BEE+ also processes the post‐stream incrementally to track the topic drift of events over time, so the latent semantic indices are preserved from one time period to the next. In order to achieve real‐time processing, we design a distributed execution framework based on the Spark engine. To verify its ability to detect bursty events, we conduct experiments on a Weibo dataset of 6,360,125 posts. The results show that BEE+ outperforms the baselines in detecting meaningful bursty events and tracking topic drift. Copyright © 2015 John Wiley & Sons, Ltd.