Abstract: The massive amounts of time series data continuously generated and collected by applications call for large-scale distributed time series processing systems. Indexing plays a critical role in speeding up the time series similarity queries on which various analytics and applications rely. However, the state-of-the-art indexing techniques, which are iSAX-based structures, do not scale well: their small (binary) fan-out leads to a very deep index tree, and searching through many internal nodes is expensive. More seriously, the iSAX character-level cardinality adopted by these indices poorly preserves the proximity relationships among time series objects, which leads to severe accuracy degradation for approximate similarity queries. In this paper, we propose the TARDIS distributed indexing framework to overcome these limitations. TARDIS introduces a novel iSAX index tree based on a new word-level variable cardinality. The proposed index ensures a compact structure, efficient search and comparison, and good preservation of the similarity relationships. TARDIS is suitable for indexing and querying billion-scale time series datasets. It is composed of one centralized global index and distributed local indices, one for each data partition across the cluster. TARDIS uses both the global and local indices to efficiently support exact match and kNN approximate queries. The system is implemented using Apache Spark, and extensive experiments are conducted on benchmark and real-world datasets. Evaluation results demonstrate that, for a dataset of over one billion time series (TB scale), the construction of a clustered index is about 83% faster than with existing techniques. Moreover, the average response time of exact match queries decreases by 50%, and the accuracy of kNN approximate queries increases more than 10-fold (from 3% to 40%) compared to existing techniques.
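
To make the SAX vocabulary in the abstract concrete, the following is a minimal sketch of a plain SAX word in which every symbol uses the same number of bits. It is not the TARDIS index or its implementation. It reads "word-level cardinality" as "one cardinality for the whole word", which is an assumption, and the function names `paa`, `sax_word`, and `reduce_word` are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by the segment count")
    step = n // segments
    return [mean(series[i:i + step]) for i in range(0, n, step)]


def sax_word(series, segments, bits):
    """SAX symbols (as ints) with the same cardinality 2**bits for every segment."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series maps to all zeros
    normalized = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]
    cardinality = 2 ** bits
    # Breakpoints split the standard normal into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / cardinality)
                   for k in range(1, cardinality)]
    # A symbol is the number of breakpoints lying below the segment mean.
    return [sum(v > b for b in breakpoints) for v in paa(normalized, segments)]


def reduce_word(word, from_bits, to_bits):
    """Coarsen a whole word at once by dropping low-order bits of every symbol."""
    shift = from_bits - to_bits
    return [s >> shift for s in word]


series = [1, 2, 3, 4, 5, 6, 7, 8]
fine = sax_word(series, segments=4, bits=2)
coarse = reduce_word(fine, from_bits=2, to_bits=1)
print(fine, coarse)
```

Because the breakpoints at cardinality 2^b are a subset of those at 2^(b+1), dropping the low-order bit of every symbol yields the same word as recomputing it at the coarser cardinality. That nesting is what lets a tree refine a word by one bit per level.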

TIIS: A Two-Level Inverted-Index Scheme for Large-Scale Data Processing in the Parallel Database System

Implementation of large-scale distributed information retrieval system

A Two-Tier Distributed Full-Text Indexing System

Mpdbs: A Multi-Level Parallel Database System Based on B-Tree

On Implementing a Text-Database-as-a-Service

Parallel Text Categorization of Massive Text Based on Hadoop

ITISS: An Efficient Framework for Querying Big Temporal Data

TARDIS: Distributed Indexing Framework for Big Time Series Data

A Tree-structured Database Machine for Large Relational Database Systems

STLIS: A Scalable Two-Level Index Scheme for Big Data in IoT

Research of Massive Internet Text Data Real-Time Loading and Index System

A Study of Performance Optimization Method for Massive Spatio-temporal Data Based on Spatio-temporal Partition Clustering

Scalable Top-K Spatial Keyword Search

Parallel indexing technique for spatio-temporal data

An Efficient and Compact Indexing Scheme for Large-Scale Data Store

Cluster Based Parallel Database Management System for Data Intensive Computing

A Distributed Inverted Indexing Scheme for Large-Scale RDF Data

Parallel spatial index algorithm based on Hilbert partition

The Distributed System for Inverted Multi-Index Visual Retrieval

The Design of Apache IoTDB Distributed Framework