Abstract: The amount of data managed in today's Cloud systems has reached an unprecedented scale. An effective way to speed up query processing is to build indexes on the attributes used in query predicates. However, conventional indexing schemes fail to provide a scalable service: because the size of these indexes is proportional to the data size, building many indexes is not space efficient. It therefore becomes crucial to develop an effective index that provides scalable database services in the Cloud. In this paper, we propose a compact bitmap indexing scheme for a large-scale data store. The scheme combines state-of-the-art bitmap compression techniques, such as WAH encoding and bit-sliced encoding. To further reduce the index cost, we adopt a novel, query-efficient partial indexing technique that dynamically refreshes the index to handle updates and process queries. The intuition behind our approach is to maximize the number of indexed attributes, so that a wider range of queries, including range and join queries, can be supported efficiently. Our indexing scheme is lightweight, and its creation can be seamlessly grafted onto the MapReduce processing engine without incurring significant running cost. Moreover, its compactness allows us to maintain the bitmap indexes in memory, so that the performance overhead of index access is minimal. We implement our indexing scheme on top of the underlying Distributed File System (DFS) and evaluate its performance on an in-house cluster. We compare our index-based query processing with HadoopDB to show its superior performance. Our experimental results confirm the effectiveness, efficiency, and scalability of the indexing scheme.
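To make the compression idea concrete, the following is a minimal sketch of an equality-encoded bitmap index whose bit vectors are compressed with a WAH-style word-aligned encoding. It is an illustration under stated assumptions, not the paper's implementation: the function names (`build_bitmaps`, `wah_encode`, `wah_decode`) and the sample column are hypothetical, the word layout follows the commonly described 32-bit WAH format (a literal word carries 31 bitmap bits, a fill word carries a fill bit and a run length), and bit-sliced encoding and partial indexing are not covered.

```python
from typing import Dict, List

GROUP = 31                   # WAH packs the bitmap into 31-bit groups, one per 32-bit word
ALL_ONES = (1 << GROUP) - 1  # a group whose 31 bits are all 1
MAX_RUN = (1 << 30) - 1      # run length lives in the low 30 bits of a fill word


def build_bitmaps(column: List[str]) -> Dict[str, List[int]]:
    """Equality-encoded bitmap index: one bit vector per distinct value."""
    bitmaps: Dict[str, List[int]] = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps


def wah_encode(bits: List[int]) -> List[int]:
    """Compress a bit list into 32-bit words (literal words and fill words)."""
    words: List[int] = []
    for start in range(0, len(bits), GROUP):
        chunk = bits[start:start + GROUP]
        chunk = chunk + [0] * (GROUP - len(chunk))  # zero-pad the last group
        group = int("".join(map(str, chunk)), 2)
        if group in (0, ALL_ONES):
            fill_bit = 1 if group else 0
            header = (1 << 31) | (fill_bit << 30)   # top bit 1 marks a fill word
            same_fill = bool(words) and (words[-1] >> 30) == (header >> 30)
            if same_fill and (words[-1] & MAX_RUN) < MAX_RUN:
                words[-1] += 1                      # extend the current run
            else:
                words.append(header | 1)            # start a new run of length 1
        else:
            words.append(group)                     # literal word: top bit is 0
    return words


def wah_decode(words: List[int], n_bits: int) -> List[int]:
    """Expand the words back into the original bit list of length n_bits."""
    bits: List[int] = []
    for word in words:
        if word >> 31:                              # fill word
            fill_bit = (word >> 30) & 1
            bits.extend([fill_bit] * (GROUP * (word & MAX_RUN)))
        else:                                       # literal word
            bits.extend(int(b) for b in format(word, "031b"))
    return bits[:n_bits]


if __name__ == "__main__":
    # Hypothetical low-cardinality column with long runs of the same value.
    column = ["us"] * 70 + ["sg"] + ["us"] * 29
    index = {v: wah_encode(b) for v, b in build_bitmaps(column).items()}
    print({v: len(words) for v, words in index.items()})
```

Long runs of identical bits collapse into a single fill word, which is why sparse or clustered bitmaps compress well and can plausibly be kept in memory, as the abstract argues. A production encoder would operate on machine words rather than Python lists and would evaluate AND/OR directly on the compressed form.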
