Runtime-Aware Adaptive Scheduling in Stream Processing

Yuan Liu, Xuanhua Shi, Hai Jin
DOI: https://doi.org/10.1002/cpe.3661
2015-01-01
Concurrency and Computation: Practice and Experience
Abstract: Long-running stream applications usually share the same fundamental computational infrastructure. To improve the efficiency of data processing in stream processing systems, a data analysis operator can be partitioned into n parallel tasks. The partitioned tasks are usually deployed on m nodes, where they coexist with operators of other applications. Node performance can vary in unpredictable ways (i.e., (1) stream input rates may fluctuate and (2) computational resource availability varies as other applications are affected), so the nodes have different processing steps, and the slowest node determines the operator performance. Hence, the tasks should be redistributed at runtime for stream applications to meet their strict latency requirements. Our key idea is to redistribute the tasks to the best node dynamically, adapting to resource or load fluctuations. In this paper, we present a runtime-aware adaptive scheduling mechanism that aims to minimize both the operator processing latency and the latency difference between tasks on different nodes. We propose a new abstraction called performance cost ratio (PCR) that evaluates node performance. The higher a node's PCR, the less it costs the node to process one tuple, and the more tasks should be deployed on it. In each scheduling round, we first sort the tasks in descending order of load and sort the nodes by their PCR. We then reassign the amount of computation according to each node's PCR, so that a node's share of the input rate equals, or is close to, its share of the total PCR. The PCR-based quantitative algorithm quantizes task loads to the processing capacity of the nodes, moves the minimum number of the operator's tasks, and keeps tasks local at the same time. We have implemented a runtime-aware adaptive scheduler as an extension to Storm and evaluated this strategy. We achieve the optimization goal using fewer computational resources. Copyright © 2015 John Wiley & Sons, Ltd.
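The scheduling step described in the abstract (sort tasks by load, then give each node a share of the load proportional to its PCR) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `assign_tasks`, the inputs `task_loads` and `node_pcr`, and the greedy "largest remaining deficit" rule are all assumptions made for illustration, and the sketch ignores the paper's additional goals of moving as few tasks as possible and keeping tasks local.

```python
def assign_tasks(task_loads, node_pcr):
    """Greedy PCR-proportional placement (illustrative assumption only).

    task_loads: dict mapping task name -> load (e.g. tuples per second)
    node_pcr:   dict mapping node name -> performance cost ratio (higher is better)
    Returns (placement, assigned): task -> node, and node -> total assigned load.
    """
    total_load = sum(task_loads.values())
    total_pcr = sum(node_pcr.values())

    # Each node's target load is proportional to its share of the total PCR.
    target = {n: total_load * p / total_pcr for n, p in node_pcr.items()}
    assigned = {n: 0.0 for n in node_pcr}
    placement = {}

    # Heaviest tasks first; ties are broken by task name for determinism.
    ordered = sorted(task_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, load in ordered:
        # Choose the node furthest below its target; ties go to the higher PCR.
        best = max(node_pcr, key=lambda n: (target[n] - assigned[n], node_pcr[n]))
        placement[task] = best
        assigned[best] += load

    return placement, assigned


# Hypothetical example: five tasks, three nodes with PCRs in the ratio 3:2:1.
placement, assigned = assign_tasks(
    {"t1": 4, "t2": 3, "t3": 2, "t4": 2, "t5": 1},
    {"A": 3, "B": 2, "C": 1},
)
print(placement)
print(assigned)
```

In this made-up example the total load is 12 and the PCRs sum to 6, so the targets are 6, 4, and 2, and the greedy pass happens to hit them exactly. With less convenient loads the result only approximates the PCR proportions, which matches the abstract's "the same or in similar proportion" wording.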