Abstract: The massive amounts of time series data continuously generated and collected by applications call for large-scale distributed time series processing systems. Indexing plays a critical role in speeding up the time series similarity queries on which various analytics and applications rely. However, the state-of-the-art indexing techniques, which are iSAX-based structures, do not scale well: their small (binary) fan-out leads to a very deep index tree and an expensive search through many internal nodes. More seriously, the iSAX character-level cardinality adopted by these indices poorly preserves the proximity relationships among the time series objects, which leads to severe accuracy degradation for approximate similarity queries. In this paper, we propose the TARDIS distributed indexing framework to overcome these limitations. TARDIS introduces a novel iSAX index tree that is based on a new word-level variable cardinality. The proposed index ensures a compact structure, efficient search and comparison, and good preservation of the similarity relationships. TARDIS is suitable for indexing and querying billion-scale time series datasets. TARDIS is composed of one centralized global index and distributed local indices, one per data partition across the cluster. TARDIS uses both the global and local indices to efficiently support exact match and kNN approximate queries. The system is implemented using Apache Spark, and extensive experiments are conducted on benchmark and real-world datasets. Evaluation results demonstrate that, for a dataset of over one billion time series (TB scale), the construction of a clustered index is about 83% faster than with the existing techniques. Moreover, the average response time of exact match queries is decreased by 50%, and the accuracy of the kNN approximate queries is increased more than 10-fold (from 3% to 40%) compared to the existing techniques.

Time Series Data Encoding for Efficient Storage

Time series data encoding in Apache IoTDB: comparative analysis and recommendation

Multimodal Data Encoding and Compression in Apache IoTDB

Frequency Domain Data Encoding in Apache IoTDB

TSCache

Apache IoTDB: A Time Series Database for IoT Applications

REGER: Reordering Time Series Data for Regression Encoding

TVStore: Automatically Bounding Time Series Storage via Time-Varying Compression

Grouping Time Series for Efficient Columnar Storage

Apache IoTDB: time-series database for internet of things

The Embedded IoT Time Series Database for Hybrid Solid-State Storage System

A Fast Lightweight Time-Series Store for IoT Data

Performance Study of Time Series Databases

MOST: Model-Based Compression with Outlier Storage for Time Series Data

An Efficient NoSQL-Based Storage Schema for Large-Scale Time Series Data

In-Network Time-Series Data Compression for Electric Internet of Things

SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things

TARDIS: Distributed Indexing Framework for Big Time Series Data

Semantically Enhanced Time Series Databases in IoT-Edge-Cloud Infrastructure

Optimizing Time Series Queries with Versions

Time-tiered compaction: An elastic compaction scheme for LSM-tree based time-series database