Online Hot Topic Discovery and Hotness Evaluation

Chunhui Deng, Huifang Deng, Yuxin Liu
DOI: https://doi.org/10.1145/3331453.3361319
2019-01-01
Abstract: This paper analyzes the inadequacies of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) method and, taking into account location information, named entities, and feature-term burstiness, puts forward an improved weight calculation formula, i.e., a new TF-IDF, that updates feature-term weights in real time. In this way, the accuracy of the news representation model can be improved to some extent. An incremental k-means clustering method based on a time window and a multi-center topic model is proposed to tackle the topic center drift problem and reduce the error caused by an inadequate topic model, thereby improving clustering accuracy. Finally, we define an improved energy accumulation formula and, based on media attention, topic competition, topic burstiness magnitude, and topic cohesiveness, construct a topic hotness evaluation model that quantifies topic hotness and better distinguishes hot topics from cold ones. The experimental results demonstrate the effectiveness of our approaches and models.
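The abstract does not state the improved weighting formula, so the following is only a minimal sketch of the general idea: it computes the classical TF-IDF baseline and then applies boosts for term location, named-entity status, and burstiness. The function names, the multiplicative form, and the boost values (`title_boost`, `entity_boost`) are all assumptions made for illustration, not the authors' formula.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Classical TF-IDF baseline: tf(t, d) * log(N / df(t)).

    `docs` is a list of token lists, one per news document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def boosted_weight(base, in_title=False, is_entity=False, burstiness=0.0,
                   title_boost=1.5, entity_boost=1.3):
    """Hypothetical adjustment of a baseline weight (NOT the paper's formula).

    Assumes each factor named in the abstract acts as a multiplicative boost:
    location (term appears in the title), named-entity status, and a
    non-negative burstiness score.
    """
    weight = base
    if in_title:
        weight *= title_boost
    if is_entity:
        weight *= entity_boost
    return weight * (1.0 + burstiness)


docs = [["quake", "hits", "city"], ["city", "council", "vote"]]
baseline = tfidf_weights(docs)
quake = boosted_weight(baseline[0]["quake"], in_title=True, burstiness=0.5)
```

In this toy corpus "city" appears in every document, so its baseline weight is zero, while "quake" keeps a positive weight that the assumed boosts then scale up. A real-time variant would recompute document frequencies as each time window of news arrives.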