Machete: an Efficient Lossy Floating-Point Compressor Designed for Time Series Databases

Yang Shi, Xiangyu Zou, Xinyu Chen, Sian Jin, Dingwen Tao, Cai Deng, Yufan Chen, Wen Xia
DOI: https://doi.org/10.1109/dcc58796.2024.00061
2024-01-01
Abstract: As time series data become popular, their volume increases rapidly. Time series databases are designed for such data, and they process data in short slices, meaning that the compression units for compressors are small. How to compress short slices of floating-point values while preserving a high compression ratio and a high decompression speed remains a problem. To solve this problem, we propose Machete, a lossy compressor. It uses an efficient hybrid encoder that combines Huffman encoding and variable length quantity (VLQ). Adaptive encoding selection makes it excel in compression ratio on short-slice data, while the simple framework ensures fast decompression. We also find a limitation in VLQ and propose the optimal VLQ to further improve the compression ratio. Our evaluation on four real-world datasets shows that Machete outperforms state-of-the-art compressors by 32%-80% on compression ratio and achieves the fastest decompression speed on two datasets. When applied to InfluxDB, a well-known time series database, Machete reduces disk usage by up to 79% and improves the query performance of InfluxDB by saving I/O.
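For readers unfamiliar with VLQ, the following is a minimal sketch of a textbook variable-length quantity code in Python (the little-endian, LEB128-style variant with 7 payload bits per byte). It is not Machete's hybrid encoder and not the "optimal VLQ" the abstract proposes; the function names `vlq_encode` and `vlq_decode` are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def vlq_encode(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte.

    The high bit of each byte is a continuation flag: 1 means more bytes follow.
    This is the generic scheme, not the variant proposed in the paper.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, continuation flag cleared
            return bytes(out)


def vlq_decode(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one integer starting at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7
```

Small values occupy a single byte and larger ones grow in 7-bit steps, which is why a code of this kind can suit quantized values that are mostly small; how Machete chooses between it and Huffman coding is not specified in the abstract.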