Hone: Mitigating Stragglers in Distributed Stream Processing With Tuple Scheduling
Wenxin Li,Duowen Liu,Kai Chen,Keqiu Li,Heng Qi
DOI: https://doi.org/10.1109/tpds.2021.3051059
IF: 5.3
2021-08-01
IEEE Transactions on Parallel and Distributed Systems
Abstract:Low latency stream processing on large clusters consisting of hundreds to thousands of servers is an increasingly important challenge. A crucial barrier to tackling this challenge is stragglers, i.e., tasks that are significantly straggling behind others in processing the stream data. However, prior straggler mitigation solutions have significant limitations. They balance streaming workloads among tasks but may incur imbalanced backlogs when the workloads exhibit variance, causing stragglers as well. Fortunately, we observe that carefully scheduling the outgoing tuples of different tasks can yield benefits for balancing backlogs, and thus avoids stragglers. To this end, we present Hone, a tuple scheduler that aims to minimize the maximum queue backlog of all tasks over time. Hone leverages an online Largest-Backlog-First (LBF) algorithm with a provable good competitive ratio to perform efficient tuple scheduling. We have implemented Hone based on Apache Storm and evaluated it extensively via both simulations and testbed experiments. Our results show that under the same workload balancing strategy–shuffle grouping, Hone outperforms the original Storm significantly, with the end-to-end tuple processing latency reduced by 78.7 percent on average.
computer science, theory & methods,engineering, electrical & electronic