Marabunta: Continuous Distributed Processing of Skewed Streams

Bing Li, Zhibin Zhang, Tianqi Zheng, Qiaoling Zhong, Qun Huang, Xueqi Cheng
DOI: https://doi.org/10.1109/ccgrid49817.2020.00-68
2020-01-01
Abstract: Current stream processing systems (SPSs) suffer from imbalanced load and limited parallelism due to skewed data distributions and imbalanced computational resources. We observed that the cause of these problems is that current SPSs partition their workloads statically. To address this problem, we design Marabunta, a distributed stream processing system for skewed streams. Marabunta performs dynamic scaling and load balancing automatically at runtime. Large partitions in a skewed data distribution can be processed in parallel or migrated to idle machines to achieve load balancing. Moreover, Marabunta uses a new execution model that accelerates execution by increasing parallelism and the utilization of computational resources. We implemented Marabunta in C++ and optimized it for modern hardware. Our evaluations on typical streaming workloads show that Marabunta achieves higher throughput and better elasticity on both uniform and skewed datasets than state-of-the-art SPSs such as Flink and Heron.
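To make the contrast between static partitioning and splitting large partitions concrete, the following is a minimal sketch and not Marabunta's actual algorithm, which the abstract does not specify. The function names `static_partition` and `split_hot_keys`, the modulo-based placement, and the greedy "fill the least-loaded worker" rule are all assumptions chosen for illustration.

```python
from collections import Counter


def static_partition(keys, num_workers):
    """Static partitioning: every key is pinned to one worker (key modulo workers)."""
    load = [0] * num_workers
    for key in keys:
        load[key % num_workers] += 1
    return load


def split_hot_keys(keys, num_workers):
    """Hypothetical greedy rebalancing, for illustration only.

    Key groups are placed largest first on the least-loaded worker. A group
    that does not fit under the ideal per-worker share is split across
    several workers, so a single hot key can be processed in parallel.
    """
    counts = Counter(keys)
    target = -(-len(keys) // num_workers)  # ceiling of the ideal share
    load = [0] * num_workers
    assignment = {}  # key -> list of (worker, tuple_count)
    for key, count in counts.most_common():
        remaining = count
        while remaining > 0:
            worker = min(range(num_workers), key=lambda i: load[i])
            chunk = min(remaining, target - load[worker])
            load[worker] += chunk
            assignment.setdefault(key, []).append((worker, chunk))
            remaining -= chunk
    return load, assignment


# A skewed stream: key 0 accounts for 70% of the tuples.
stream = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 10
print("static load:    ", static_partition(stream, 4))
print("rebalanced load:", split_hot_keys(stream, 4)[0])
```

Under static partitioning the worker that owns the hot key receives most of the stream while the others sit mostly idle. Splitting a key across workers evens out the load, but for stateful operators it generally means the partial results for that key have to be merged afterwards, a cost this sketch does not model.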