Dynamic Resource Management In a Massively Parallel Stream Processing Engine

Kasper Grud Skat Madsen,Yongluan Zhou
DOI: https://doi.org/10.1145/2806416.2806449
2015-10-17
Abstract:The emerging interest in Massively Parallel Stream Processing Engines (MPSPEs), which are able to process long-standing computations over data streams with ever-growing velocity at a large-scale cluster, calls for efficient dynamic resource management techniques to avoid any waste of resources and/or excessive processing latency. In this paper, we propose an approach to integrate dynamic resource management with passive fault-tolerance mechanisms in a MPSPE so that we can harvest the checkpoints prepared for failure recovery to enhance the efficiency of dynamic load migrations. To maximize the opportunity of reusing checkpoints for fast load migration, we formally define a checkpoint allocation problem and provide a pragmatic algorithm to solve it. We implement all the proposed techniques on top of Apache Storm, an open-source MPSPE, and conduct extensive experiments using a real dataset to examine various aspects of our techniques. The results show that our techniques can greatly improve the efficiency of dynamic resource reconfiguration without imposing significant overhead or latency to the normal job execution.
What problem does this paper attempt to address?