Provenance-based Indexing Support in Micro-blog Platforms
Junjie Yao, Bin Cui, Zijun Xue, Qingyun Liu
DOI: https://doi.org/10.1109/icde.2012.36
2012-01-01
Abstract: Recently, many micro-blog message sharing applications have emerged on the web. Users can publish short messages freely and are notified instantly through their subscriptions. Prominent examples include Twitter, Facebook's statuses, and Sina Weibo in China. The micro-blog platform has become a useful service for real-time information creation and propagation. However, the short length and dynamic nature of these messages pose great challenges for effective content understanding. Additionally, noise and fragmentation make it difficult to discover the temporal propagation trail and to explore how micro-blog messages develop. Provenance refers to data origin identification and transformation logging, and has proven of great value in recent database and workflow systems. In this paper, we propose a provenance model for micro-blog platforms that captures connections between messages, and design an indexing scheme to support provenance-based message discovery and maintenance. To cope with the real-time deluge of micro-messages, we introduce a novel virtual annotation grouping approach to encode and maintain the provenance information. Furthermore, we design a summary index and several adaptive pruning strategies to make provenance updates efficient. Based on this index, our provenance solution supports rich query retrieval and intuitive message tracking for effective message organization. Experiments conducted on real data verify the effectiveness and efficiency of our approach.