Meces: Latency-efficient Rescaling via Prioritized State Migration for Stateful Distributed Stream Processing Systems

Rong Gu,Han Yin,Weichang Zhong,Chunfeng Yuan,Yihua Huang
2022-01-01
Abstract:Stateful distributed stream processing engines (SPEs) usually call for dynamic rescaling due to varying workloads. However, existing state migration approaches suffer from latency spikes, or high resource usage, or major disruptions as they ignore the order of state migration during rescaling. This paper reveals the importance of state migration order to the latency performance in SPEs. Based on that, we propose Meces, an on-the-fly state migration mechanism which prioritizes the state migration of hot keys (those being processed or about to be processed by downstream operator tasks) to achieve smooth rescaling. Meces leverages a fetch-on-demand design which migrates operator states at record-granularity for state consistency. We further devise a hierarchical state data structure and gradual strategy for migration efficiency. Meces is implemented on Apache Flink and evaluated with diversified benchmarks and scenarios. Compared to state-of-the-art approaches, Meces improves stream processing performance in terms of latency and throughput during rescaling by orders of magnitude, with negligible overhead and no disruption to non-rescaling periods.
What problem does this paper attempt to address?