Scalable Time Series Compound Infrastructure

Noura S. Alghamdi,Liang Zhang,Elke A. Rundensteiner,Mohamed Y. Eltabakh
DOI: https://doi.org/10.1145/3514221.3517888
2022-01-01
Abstract:Objects ranging from a patient's history of medical tests to an IoT device's series of sensor maintenance records leave digital traces in the form of big time series. These time series objects do not only span exceedingly long time periods (sometimes years), but are also characterized by intermittent yet interrelated time series measurements punctuated by long gaps of silence. This prevalent data type, which we refer to as Time Series Compound objects (or, TSC), has been largely overlooked in the literature. Unique challenges arise when managing, querying and analyzing repositories of these big TSC objects. These include appropriate similarity semantics with time misalignment resiliency, efficient storage of excessively long and complex objects, and TSC-holistic indexing. We demonstrate that state-of-the-art time series systems, although effective at indexing and searching regular time series data, fail to support such big TSC data. In this work, we introduce the first comprehensive solution for managing TSC objects as first class citizen. We introduce new similarity-match semantics as well as a compact misalignment-resilient representation for TSCs. Upon this foundation, we then design a TSC-aware distributed indexing infrastructure Sloth that supports scalable storage, indexing and querying of TB-scale TSC datasets. Our experimental study demonstrates that for TB-scale datasets, the query response time of Sloth is up to one order of magnitude faster than that of existing systems, while the mean average precision (mAP) for approximate kNN similarity match query results by Sloth is 70% more accurate than existing solutions.
What problem does this paper attempt to address?