Data-Aware Adaptive Compression for Stream Processing
Yu Zhang, Feng Zhang, Hourun Li, Shuhao Zhang, Xiaoguang Guo, Yuxing Chen, Anqun Pan, Xiaoyong Du
DOI: https://doi.org/10.1109/tkde.2024.3377710
IF: 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Stream processing is in widespread use, and one of its most common application scenarios is SQL queries on streams. By 2021, the global deployment of IoT endpoints had reached 12.3 billion, indicating a surge in data generation. However, increasing data volume and evolving user requirements make it challenging for stream processing systems to meet the escalating demands for high throughput and low latency. We present a compression-based stream processing engine, called CompressStreamDB, which enables adaptive fine-grained stream processing directly on compressed streams and significantly enhances the performance of existing stream processing solutions. CompressStreamDB utilizes nine diverse compression methods tailored to different stream data types and integrates a cost model to automatically select the most efficient compression schemes. CompressStreamDB provides high throughput with low latency in stream SQL processing by identifying and eliminating redundant data among streams. Our evaluation demonstrates that CompressStreamDB improves average performance by 3.84× and reduces average delay by 68.0% compared to the state-of-the-art stream processing solution for uncompressed streams, along with 68.7% space savings. In addition, our edge trials show an average throughput/price ratio of 9.95× and a throughput/power ratio of 7.32× compared to the cloud design.
computer science, information systems, artificial intelligence, engineering, electrical & electronic