LiveIndex: A Distributed Online Index System for Temporal Microblog Data
Haifei Huang, Jianxin Li, Richong Zhang, Weiren Yu, Wuyang Ju
DOI: https://doi.org/10.1109/hpcc-css-icess.2015.70
2015-01-01
Abstract: Billions of microblogs are generated on social media platforms such as Twitter and Weibo. Making new microblogs available to the search engine immediately is a critical and challenging problem. Most existing studies put all terms' posting lists together when building the index, which leads to low index update performance and high query latency. In addition, time is a key feature of microblogs, and most applications, including event detection, need only the most recent data. In this paper, we design a distributed online index system for temporal microblog data, named LiveIndex, which can significantly reduce the time cost of queries over a specific time range, such as queries in event tracing. First, our index is organized as Time Range Partitions to reduce update cost. Second, in every partition, a hash table maps each term's posting list to the corresponding sub-partition. Finally, to further reduce the index cost, we adopt an index chain to merge terms with the same posting list. Experiments on a real dataset demonstrate the effectiveness and efficiency of our proposed method.
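The design outlined in the abstract can be illustrated with a minimal single-machine sketch. This is not the LiveIndex implementation: the class name `TimePartitionedIndex`, the fixed `partition_seconds` width, the whitespace tokenizer, and the `merged_terms` helper are all assumptions made for illustration, and the distributed layer and sub-partitions are omitted. The sketch shows only the core idea that a time-range query touches just the partitions overlapping the range, and that terms sharing an identical posting list within a partition can be grouped.

```python
from collections import defaultdict


class TimePartitionedIndex:
    """Toy in-memory inverted index split into fixed-width time partitions.

    Hypothetical illustration only; names and layout are not from the paper.
    """

    def __init__(self, partition_seconds=3600):
        self.partition_seconds = partition_seconds
        # partition id -> hash table: term -> posting list of (timestamp, doc_id)
        self.partitions = defaultdict(lambda: defaultdict(list))

    def _partition_id(self, timestamp):
        return timestamp // self.partition_seconds

    def add(self, doc_id, timestamp, text):
        """Index one microblog; only its own time partition is updated."""
        part = self.partitions[self._partition_id(timestamp)]
        for term in set(text.lower().split()):
            part[term].append((timestamp, doc_id))

    def search(self, term, start, end):
        """Return doc ids containing `term` with start <= timestamp <= end."""
        hits = []
        first, last = self._partition_id(start), self._partition_id(end)
        for pid in range(first, last + 1):
            part = self.partitions.get(pid)  # .get avoids creating empty partitions
            if part is None:
                continue
            for ts, doc_id in part.get(term, []):
                if start <= ts <= end:
                    hits.append((ts, doc_id))
        return [doc_id for _, doc_id in sorted(hits)]

    def merged_terms(self, pid):
        """Group terms in one partition that share an identical posting list.

        A loose stand-in for the 'index chain' idea, assumed for illustration.
        """
        groups = defaultdict(list)
        for term, postings in self.partitions.get(pid, {}).items():
            groups[tuple(postings)].append(term)
        return {p: sorted(t) for p, t in groups.items() if len(t) > 1}


idx = TimePartitionedIndex(partition_seconds=3600)
idx.add("m1", 100, "earthquake tokyo")
idx.add("m2", 4000, "earthquake osaka")
idx.add("m3", 7300, "tokyo marathon")

print(idx.search("earthquake", 0, 3599))
print(idx.search("tokyo", 3600, 8000))
print(idx.merged_terms(0))
```

In this sketch an insert modifies a single partition, so older partitions stay untouched, and a query restricted to a recent time window skips every partition outside that window.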