Abstract: The amount of data managed in today's Cloud systems has reached an unprecedented scale. To speed up query processing, an effective mechanism is to build indexes on the attributes used in query predicates. However, conventional indexing schemes fail to provide a scalable service: because the size of each index is proportional to the data size, it is not space-efficient to build many indexes. It therefore becomes crucial to develop effective indexes that provide scalable database services in the Cloud. In this paper, we propose a compact bitmap indexing scheme for a large-scale data store. The scheme combines state-of-the-art bitmap compression techniques, such as WAH encoding and bit-sliced encoding. To further reduce the index cost, it adopts a novel, query-efficient partial indexing technique that dynamically refreshes the index to handle updates and process queries. The intuition behind our approach is to maximize the number of indexed attributes, so that a wider range of queries, including range and join queries, can be supported efficiently. Our indexing scheme is lightweight, and its creation can be seamlessly grafted onto the MapReduce processing engine without incurring significant running cost. Moreover, its compactness allows us to keep the bitmap indexes in memory, so the performance overhead of index access is minimal. We implement our indexing scheme on top of the underlying Distributed File System (DFS) and evaluate its performance on an in-house cluster. We compare our index-based query processing with HadoopDB and show its superior performance. Our experimental results confirm the effectiveness, efficiency and scalability of the indexing scheme.

Compact Indexing and Judicious Searching for Billion-Scale Microblog Retrieval

TI: an efficient indexing mechanism for real-time search on tweets

Real-Time Search over a Microblogging System

Scalable Top-K Spatial Keyword Search

Learning to Rank Microblog Posts for Real-Time Ad-Hoc Search

A Content-Based Intelligent Ranking Model for Microblog

Finding Influential Local Users with Similar Interest from Geo-Tagged Social Media Data

Processing Long Queries Against Short Text

Large scale microblog mining using distributed MB-LDA

Exploring Tweets Normalization and Query Time Sensitivity for Twitter Search

An Efficient and Compact Indexing Scheme for Large-Scale Data Store

Microblog Track 2011 of FDU

Real Time Personalized Search on Social Networks

A tweet-centric approach for topic-specific author ranking in micro-blog

An Uncertainty-Aware Approach for Exploratory Microblog Retrieval

Microblog Search and Filtering with Real-Time Dynamics Based on BM25

An Efficient Publish/Subscribe Index for E-Commerce Databases

QCRI at TREC 2013 Microblog Track

Real-time and Personalized Search over a Microblogging System

Discerning Influence Patterns with Beta-Poisson Factorization in Microblogging Environments

The grand information flows in micro-blog