Efficient Storage Management for Social Network Events Based on Clustering and Hot/Cold Data Classification

Yulai Xie, Shuai Tong, Pan Zhou, Yuli Li, Dan Feng
DOI: https://doi.org/10.1109/tcss.2022.3146310
2022-01-01
IEEE Transactions on Computational Social Systems
Abstract: Social network events bear on the national economy and people's livelihood, so timely perception and processing of massive social network event data is becoming increasingly important for public opinion analysis. Fully exploiting the access characteristics of event information when storing and managing social network events is of great significance for accurate, real-time query analysis. We propose a social network event storage management method based on microblog text clustering and hot/cold data classification. First, for the microblog text data, we construct a keyword provenance graph, using information entropy to measure the weight of the edges between keyword nodes. Then, we cluster the events using provenance-based community partition (PCP) with local modularity to improve event clustering accuracy. In addition, we filter noisy data via incremental clustering, enable hot/cold event data classification and dynamic migration, and compress cold data to save space on a hybrid storage architecture. The experimental results show that the clustering purity can exceed 93% and that the query time can be reduced by more than 70% when using the clustering and hybrid storage policy.
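The abstract reports clustering purity without defining it. The sketch below computes purity under its standard definition: each cluster is credited with the size of its largest ground-truth class, and the credits are summed and divided by the number of items. The function name `clustering_purity` and the sample event labels are illustrative assumptions, not taken from the paper, and the paper's exact evaluation procedure may differ.

```python
from collections import Counter


def clustering_purity(cluster_ids, true_labels):
    """Return purity: the fraction of items that fall in the majority class of their cluster.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count, per cluster, how many items carry each ground-truth label.
    label_counts = {}
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Credit each cluster with the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Made-up example: 8 microblog posts, 3 clusters, 3 true events.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["flood", "flood", "match", "match", "match", "vote", "vote", "flood"]
purity = clustering_purity(clusters, labels)
print(f"purity = {purity:.2f}")
```

In this toy input each cluster has two items from its majority event, so 6 of 8 posts count as correctly grouped. Purity rises trivially as the number of clusters grows, so it is usually read alongside the cluster count.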