Abstract: The amount of data managed in today's Cloud systems has reached an unprecedented scale. To speed up query processing, an effective mechanism is to build indexes on attributes that are used in query predicates. However, conventional indexing schemes fail to provide a scalable service: because the size of these indexes is proportional to the data size, it is not space-efficient to build many indexes. As such, it becomes more crucial to develop an effective index to provide scalable database services in the Cloud. In this paper, we propose a compact bitmap indexing scheme for a large-scale data store. The bitmap indexing scheme combines state-of-the-art bitmap compression techniques, such as WAH encoding and bit-sliced encoding. To further reduce the index cost, a novel and query-efficient partial indexing technique is adopted, which dynamically refreshes the index to handle updates and process queries. The intuition of our indexing approach is to maximize the number of indexed attributes, so that a wider range of queries, including range and join queries, can be efficiently supported. Our indexing scheme is lightweight, and its creation can be seamlessly grafted onto the MapReduce processing engine without incurring significant running cost. Moreover, the compactness allows us to maintain the bitmap indexes in memory so that the performance overhead of index access is minimal. We implement our indexing scheme on top of the underlying Distributed File System (DFS) and evaluate its performance on an in-house cluster. We compare our index-based query processing with HadoopDB to show its superior performance. Our experimental results confirm the effectiveness, efficiency and scalability of the indexing scheme.
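To make the idea of a compressed bitmap index concrete, the following is a minimal sketch, not the scheme proposed in the paper. It assumes an equality-encoded index (one bitmap per distinct attribute value), answers a range query by OR-ing bitmaps, and applies a simplified WAH-style compression in which runs of identical 31-bit groups collapse into a single fill word. All function names (`build_bitmap_index`, `range_query`, `wah_encode`, `wah_decode`) are hypothetical, and words are kept as Python tuples rather than packed 32-bit integers for readability.

```python
from collections import defaultdict

GROUP = 31  # payload bits per 32-bit WAH word (assumed word size)
FULL = (1 << GROUP) - 1


def build_bitmap_index(values):
    """One bitmap (a Python int) per distinct value; bit i is set if row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def range_query(index, lo, hi):
    """Bitmap of rows with lo <= value <= hi, obtained by OR-ing matching bitmaps."""
    result = 0
    for value, bitmap in index.items():
        if lo <= value <= hi:
            result |= bitmap
    return result


def rows(bitmap):
    """Row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def wah_encode(bitmap, nbits):
    """Simplified WAH: all-0 or all-1 groups merge into ('fill', bit, count) words."""
    words = []
    ngroups = (nbits + GROUP - 1) // GROUP
    for g in range(ngroups):
        chunk = (bitmap >> (g * GROUP)) & FULL
        if chunk in (0, FULL):
            bit = 1 if chunk == FULL else 0
            if words and words[-1][0] == "fill" and words[-1][1] == bit:
                words[-1] = ("fill", bit, words[-1][2] + 1)  # extend the current run
            else:
                words.append(("fill", bit, 1))
        else:
            words.append(("literal", chunk))
    return words


def wah_decode(words):
    """Inverse of wah_encode: rebuild the bitmap from fill and literal words."""
    bitmap, pos = 0, 0
    for word in words:
        if word[0] == "fill":
            _, bit, count = word
            if bit:
                bitmap |= ((1 << (count * GROUP)) - 1) << pos
            pos += count * GROUP
        else:
            bitmap |= word[1] << pos
            pos += GROUP
    return bitmap


if __name__ == "__main__":
    index = build_bitmap_index([3, 1, 3, 2, 1])
    print(rows(range_query(index, 2, 3)))
    sparse = 1 << 200
    print(wah_encode(sparse, 256))
```

A sparse bitmap, which is typical for a high-cardinality attribute, compresses to a handful of words in this sketch. That is the property that makes it plausible to keep many such indexes in memory, as the abstract argues.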

LIFOSS: a learned index scheme for streaming scenarios

Distributed scheduling and storage scheme based on LSM-OCTree for spatiotemporal stream

A Workload-Controllable Dynamic Spatio-Temporal Index Scheme for Streaming Processing

A Scalable Learned Index Scheme in Storage Systems

SALI: A Scalable Adaptive Learned Index Framework based on Probability Models

SA-LSM: Optimize Data Layout for LSM-tree Based Storage using Survival Analysis

Frame-Level Video Caching and Transmission Scheduling Via Stochastic Learning

LISA: A Learned Index Structure for Spatial Data

Buffer Allocation Algorithms for Embedded Real-Time Streaming File System

An Efficient and Compact Indexing Scheme for Large-Scale Data Store.

Efficient Locality-Sensitive Hashing over High-Dimensional Streaming Data.

UpLIF: An Updatable Self-Tuning Learned Index Framework

Lc‐Stream: An elastic scheduling strategy with latency constraints in geo‐distributed stream computing environments

Cube-based Incremental Outlier Detection for Streaming Computing

A Simple Yet High-Performing On-disk Learned Index: Can We Have Our Cake and Eat it Too?

Updatable Learned Index with Precise Positions

Inference on High-dimensional Single-index Models with Streaming Data

A prefetching indexing scheme for in-memory database systems

On the Local Cache Update Rules in Streaming Federated Learning

Revisiting Learned Index with Byte-addressable Persistent Storage

CELOF: Effective and fast memory efficient local outlier detection in high-dimensional data streams