Near Lossless Time Series Data Compression Methods using Statistics and Deviation

Vidhi Agrawal, Gajraj Kuldeep, Dhananjoy Dey
DOI: https://doi.org/10.48550/arXiv.2209.14162
2022-10-01
Abstract: The last two decades have seen tremendous growth in data collections because of the realization of recent technologies, including the internet of things (IoT), E-Health, industrial IoT 4.0, autonomous vehicles, etc. The challenge of data transmission and storage can be handled by utilizing state-of-the-art data compression methods. Recent data compression methods are proposed using deep learning methods, which perform better than conventional methods. However, these methods require a lot of data and resources for training. Furthermore, it is difficult to materialize these deep learning-based solutions on IoT devices due to the resource-constrained nature of IoT devices. In this paper, we propose lightweight data compression methods based on data statistics and deviation. The proposed method performs better than the deep learning method in terms of compression ratio (CR). We simulate and compare the proposed data compression methods for various time series signals, e.g., accelerometer, gas sensor, gyroscope, electrical power consumption, etc. In particular, it is observed that the proposed method achieves 250.8%, 94.3%, and 205% higher CR than the deep learning method for the GYS, Gactive, and ACM datasets, respectively. The code and data are available at https://github.com/vidhi0206/data-compression.
Information Theory, Signal Processing
What problem does this paper attempt to address?
The paper addresses the data transmission and storage challenges faced by Internet of Things (IoT) devices. Sensors and devices in fields such as IoT, E-Health, Industry 4.0, and self-driving vehicles generate vast amounts of data. Transmitting and storing this data requires efficient compression to reduce bandwidth and storage costs while preserving data consistency and accuracy.

### Specific problems:

1. **Cost of data transmission and storage**: IoT devices generate so much data that traditional transmission and storage methods become costly.
2. **Limitations of existing compression methods**:
   - **Deep-learning methods**: Although they perform well, they require large amounts of data and computing resources for training and are difficult to deploy on resource-constrained IoT devices.
   - **Traditional compression methods**: Methods such as LFZip, CA, and SZ do compress the data, but their compression ratio is not high enough in some cases.

### Method proposed in the paper:

The paper proposes a lightweight time-series data compression method based on statistics and deviation transformation. It aims to address the problems above in the following ways:

- **High compression ratio**: Compared with existing deep-learning methods and other compression algorithms, the method achieves a higher compression ratio while consuming fewer computing resources.
- **Suitability for IoT devices**: The method is designed to be lightweight so that it can run effectively on resource-constrained devices.
- **Flexibility**: The method can select different compression strategies according to the characteristics of the data (such as volatility or trend), thereby improving compression.
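The summary does not spell out the paper's algorithm, so the following is only a minimal sketch of the general idea of statistics-plus-deviation compression, not the authors' version 1 or version 2. It assumes a single statistic (the mean), a uniform quantizer whose step is twice a user-chosen error bound, and `zlib` (DEFLATE) standing in for the entropy coder; the function names `compress`, `decompress`, and `compression_ratio` are hypothetical.

```python
import math
import struct
import zlib

HEADER = "<ddI"  # mean (float64), quantizer step (float64), sample count (uint32)


def compress(samples, max_error):
    """Quantize each sample's deviation from the mean, then DEFLATE the codes."""
    mean = sum(samples) / len(samples)
    step = 2.0 * max_error  # rounding to the nearest bin keeps |error| <= step / 2
    codes = [round((x - mean) / step) for x in samples]
    payload = struct.pack(f"<{len(codes)}i", *codes)  # assumes codes fit in int32
    # zlib is an assumption here: it stands in for a dedicated entropy coder.
    return struct.pack(HEADER, mean, step, len(codes)) + zlib.compress(payload, 9)


def decompress(blob):
    """Invert compress(): recover the codes and map them back to sample values."""
    mean, step, n = struct.unpack_from(HEADER, blob, 0)
    payload = zlib.decompress(blob[struct.calcsize(HEADER):])
    codes = struct.unpack(f"<{n}i", payload)
    return [mean + c * step for c in codes]


def compression_ratio(samples, blob):
    """CR = original size / compressed size, assuming 8-byte float64 samples."""
    return (8 * len(samples)) / len(blob)


if __name__ == "__main__":
    # Synthetic, slowly varying signal used purely for illustration.
    signal = [20.0 + 0.5 * math.sin(i / 10.0) for i in range(1000)]
    blob = compress(signal, max_error=0.01)
    restored = decompress(blob)
    worst = max(abs(a - b) for a, b in zip(signal, restored))
    print(f"CR = {compression_ratio(signal, blob):.2f}, max error = {worst:.4f}")
```

Because the deviations of a slowly varying signal cluster around a few small integers after quantization, the code stream is highly repetitive, which is what lets a general-purpose coder shrink it. The reconstruction is near lossless in the sense that every sample stays within `max_error` of its original value, up to floating-point rounding.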
### Main contributions:

- Two time-series data compression methods based on statistics and deviation transformation (version 1 and version 2), each combined with an entropy coding framework to achieve efficient compression.
- Experimental verification on multiple time-series datasets, showing a compression ratio significantly higher than that of existing compression methods. The gains are largest on the ACM and GYS datasets, where the abstract reports a CR that is 205% and 250.8% higher, respectively, than that of the deep learning method.

### Conclusion:

The proposed method offers both a higher compression ratio and lower computing resource consumption, which makes it suitable for resource-constrained IoT devices. Future work will extend the method to multi-dimensional, multi-sensor time-series data.
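To make the percentage gains quoted in the abstract easier to compare, the short sketch below converts "p% higher CR" into a multiplicative factor over the baseline. It assumes the percentages are relative increases, that is, new CR = baseline CR × (1 + p/100); the source does not state this definition explicitly, and `cr_factor` is a hypothetical helper.

```python
def cr_factor(percent_higher):
    """Convert 'p% higher CR' into a factor over the baseline CR.

    Assumes a relative increase: new_cr = baseline_cr * (1 + p / 100).
    """
    return 1.0 + percent_higher / 100.0


# Percentages as reported in the abstract for each dataset.
reported = [("GYS", 250.8), ("Gactive", 94.3), ("ACM", 205.0)]

for name, pct in reported:
    print(f"{name}: {cr_factor(pct):.3f}x the baseline CR")
```

Under that reading, the GYS and ACM results correspond to roughly 3.5 and 3.05 times the deep learning method's CR, and Gactive to a little under 2 times.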