TI: an efficient indexing mechanism for real-time search on tweets.

Chun Chen,Feng Li,Beng Chin Ooi,Sai Wu
DOI: https://doi.org/10.1145/1989323.1989391
2011-01-01
Abstract:Real-time search dictates that new contents be made available for search immediately following their creation. From the database perspective, this requirement may be quite easily met by creating an up-to-date index for the contents and measuring search quality by the time gap between insertion time and availability of the index. This approach, however, poses new challenges for micro-blogging systems where thousands of concurrent users may upload their micro-blogs or tweets simultaneously. Due to the high update and query loads, conventional approaches would either fail to index the huge amount of newly created contents in real time or fall short of providing a scalable indexing service. In this paper, we propose a tweet index called the TI (Tweet Index), an adaptive indexing scheme for microblogging systems such as Twitter. The intuition of the TI is to index the tweets that may appear as a search result with high probability and delay indexing some other tweets. This strategy significantly reduces the indexing cost without compromising the quality of the search results. In the TI, we also devise a new ranking scheme by combining the relationship between the users and tweets. We group tweets into topics and update the ranking of a topic dynamically. The experiments on a real Twitter dataset confirm the efficiency of the TI.
What problem does this paper attempt to address?