REGER: Reordering Time Series Data for Regression Encoding

Jinzhao Xiao, Wendi He, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang
DOI: https://doi.org/10.1109/icde60146.2024.00100
2024-01-01
Abstract: Regression models are employed in lossless compression of time series data by storing the residual of each point, a technique known as regression encoding. Owing to value fluctuation, the regression residuals can be large and thus occupy considerable space. Compared to the fluctuating values, time intervals are often regular and easy to compress, especially in IoT scenarios where sensor data are collected at a preset frequency. In this sense, there is a trade-off between storing the regular timestamps and the fluctuating values. Intuitively, rather than keeping the points in time order, we may exchange data points in the series so that nearby ones have smoother timestamps and values together, leading to lower residuals. In this paper, we propose to reorder the time series data for better regression encoding. Rather than recomputing from scratch, we devise efficient updates of the residuals after some points are moved. An experimental comparison over various real-world datasets, either public or collected by our industrial partners, illustrates the superiority of the proposal in compression ratio. The method, REGression Encoding with Reordering (REGER), has now become an encoding method in an open-source time series database, Apache IoTDB.
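The trade-off described in the abstract can be illustrated with a small sketch. It is not the REGER algorithm: it assumes the simplest possible predictor (each point is predicted by its predecessor, i.e. delta encoding) in place of a fitted regression model, and it uses a hypothetical per-residual bit count as a rough proxy for storage cost. The data points and all function names are invented for illustration.

```python
def residuals(seq):
    """Residuals under the assumed predictor x[i] ~ x[i-1] (delta encoding)."""
    return [b - a for a, b in zip(seq, seq[1:])]


def bit_cost(res):
    """Rough proxy for storage: magnitude bits plus one sign bit per residual."""
    return sum(abs(r).bit_length() + 1 for r in res)


def encoding_cost(points):
    """Total proxy cost of the timestamp residuals plus the value residuals."""
    times = [t for t, _ in points]
    values = [v for _, v in points]
    return bit_cost(residuals(times)) + bit_cost(residuals(values))


# Hypothetical series: perfectly regular timestamps, strongly fluctuating values.
points = [(1, 10), (2, 50), (3, 12), (4, 52), (5, 14), (6, 54)]

# Time order: tiny timestamp residuals, large value residuals.
time_order_cost = encoding_cost(points)

# Reordered by value: timestamps become less regular, values much smoother.
value_order = sorted(points, key=lambda p: p[1])
value_order_cost = encoding_cost(value_order)

print(time_order_cost)   # 45
print(value_order_cost)  # 34
```

In this toy case, sorting by value makes the timestamp residuals slightly more expensive (10 to 15 bits under the proxy) but cuts the value residuals substantially (35 to 19 bits), so the total drops. Since each point keeps its own timestamp, the original time order can still be recovered after decoding. Sorting entirely by value is only one extreme of the trade-off; the abstract describes moving some points and updating the residuals incrementally rather than recomputing them from scratch.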