Abstract: Real-time search dictates that new contents be made available for search immediately following their creation. From the database perspective, this requirement may be quite easily met by creating an up-to-date index for the contents and measuring search quality by the time gap between insertion time and availability of the index. This approach, however, poses new challenges for micro-blogging systems where thousands of concurrent users may upload their micro-blogs or tweets simultaneously. Due to the high update and query loads, conventional approaches would either fail to index the huge amount of newly created contents in real time or fall short of providing a scalable indexing service. In this paper, we propose a tweet index called the TI (Tweet Index), an adaptive indexing scheme for microblogging systems such as Twitter. The intuition of the TI is to index the tweets that may appear as a search result with high probability and delay indexing some other tweets. This strategy significantly reduces the indexing cost without compromising the quality of the search results. In the TI, we also devise a new ranking scheme by combining the relationship between the users and tweets. We group tweets into topics and update the ranking of a topic dynamically. The experiments on a real Twitter dataset confirm the efficiency of the TI.
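The indexing intuition stated in the abstract (index tweets that are likely to appear in search results right away, and delay the rest) can be illustrated with a minimal sketch. This is not the TI implementation from the paper. The class name `AdaptiveTweetIndex`, the `hot_terms` set, and the rule "a tweet is likely to be retrieved if it shares a term with recent queries" are assumptions introduced here for illustration only, and the sketch omits the ranking scheme and topic grouping entirely.

```python
from collections import defaultdict, deque


class AdaptiveTweetIndex:
    """Toy index: index likely-to-be-retrieved tweets now, defer the rest."""

    def __init__(self, hot_terms):
        # hot_terms: hypothetical set of terms seen in recent queries.
        # It stands in for the paper's notion of "high probability of
        # appearing in a search result"; it is not the paper's criterion.
        self.hot_terms = {t.lower() for t in hot_terms}
        self.inverted = defaultdict(set)  # term -> set of tweet ids
        self.pending = deque()            # tweets whose indexing is delayed

    def _index(self, tweet_id, terms):
        for term in terms:
            self.inverted[term].add(tweet_id)

    def insert(self, tweet_id, text):
        """Return True if the tweet was indexed immediately, else False."""
        terms = set(text.lower().split())
        if terms & self.hot_terms:
            # Shares a term with recent queries: index in real time.
            self._index(tweet_id, terms)
            return True
        # Otherwise postpone the indexing work to a later batch step.
        self.pending.append((tweet_id, terms))
        return False

    def flush(self):
        """Batch-index all delayed tweets; return how many were indexed."""
        count = len(self.pending)
        while self.pending:
            tweet_id, terms = self.pending.popleft()
            self._index(tweet_id, terms)
        return count

    def search(self, term):
        """Return the sorted ids of indexed tweets containing the term."""
        return sorted(self.inverted.get(term.lower(), set()))
```

In this sketch a delayed tweet stays invisible to `search` until `flush` runs, which makes the trade-off concrete: the per-insert cost drops for tweets judged unlikely to be retrieved, at the price of a short window in which they cannot be found.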

Exploring Tweets Normalization and Query Time Sensitivity for Twitter Search

Microblog Search and Filtering with Real-Time Dynamics Based on BM25

Microblog Search and Filtering with Time Sensitive Feedback and Thresholding Based on BM25.

Microblog Track 2011 of FDU.

Compact Indexing and Judicious Searching for Billion-Scale Microblog Retrieval.

QCRI at TREC 2013 Microblog Track.

ICTNET at Microblog Track in TREC 2014.

Learning to Rank Microblog Posts for Real-Time Ad-Hoc Search.

QCRI at TREC 2014: Applying the KISS Principle for the TTG Task in the Microblog Track.

Real-Time Summarization of Twitter

Processing Long Queries Against Short Text

TI: an efficient indexing mechanism for real-time search on tweets.

Real-Time Search over a Microblogging System

TAKer: Fine-Grained Time-Aware Microblog Search with Kernel Density Estimation.

Automatic Query Optimization for Retrieving Traffic Tweets

Towards A Quality-Oriented Real-Time Web Crawler

Real-time Targeted Influence Maximization for Online Advertisements

Leveraging Tweet Ranking in an Optimization Framework for Tweet Timeline Generation.

Real-time Filtering on Interest Profiles in Twitter Stream
