Enhancing MapReduce Fault Recovery Through Binocular Speculation

Huansong Fu,Yue Zhu,Amit Kumar Nath,Md. Muhib Khan,Weikuan Yu
DOI: https://doi.org/10.48550/arXiv.1901.07715
2019-01-23
Distributed, Parallel, and Cluster Computing
Abstract:MapReduce speculation plays an important role in finding potential task stragglers and failures. But a tacit dichotomy exists in MapReduce due to its inherent two-phase (map and reduce) management scheme in which map tasks and reduce tasks have distinctly different execution behaviors, yet reduce tasks are dependent on the results of map tasks. We reveal that speculation policies for fault handling in MapReduce do not recognize this dichotomy between map and reduce tasks, which leads to an issue of speculation myopia for MapReduce fault recovery. These issues cause significant performance degradation upon network and node failures. To address the speculation myopia caused by MapReduce dichotomy, we introduce a new scheme called binocular speculation to help MapReduce increase its assessment scope for speculation. As part of the scheme, we also design three component techniques including neighborhood glance, collective speculation and speculative rollback. Our evaluation shows that, with these techniques, binocular speculation can increase the coordination of map and reduce phases, and enhance the efficiency of MapReduce fault recovery.
What problem does this paper attempt to address?