Using the Macroflow Abstraction to Minimize Machine Slot-time Spent on Networking in Hadoop

Bingchuan Tian, Chen Tian, Jiajun Sun, Junhua Yan, Yizhou Tang, Wei Wang, Haipeng Dai, Nai Xia, Guihai Chen, Wanchun Dou
DOI: https://doi.org/10.1145/3232565.3234504
2018-01-01
Abstract: Machine slot-time spent on data transmission has a direct impact on average job completion time (JCT). In this paper, we propose Macroflow, a networking abstraction that captures the primitive scheduling granularity of machine slot-time. We demonstrate that minimizing machine slot-time is equivalent to minimizing the average macroflow completion time (MCT). We prove that minimizing MCT is strongly NP-hard and therefore focus on developing effective heuristics. We propose the Smallest-Macroflow-First (SMF) and Smallest-Average-Macroflow-First (SAMF) heuristics, which greedily schedule macroflows based on their network footprint. To work with existing commodity switches, priority discretization is performed to classify macroflows into a small number of priority queues.
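The greedy ordering and priority discretization described in the abstract can be illustrated with a minimal Python sketch under stated assumptions. All names here (`Macroflow`, `footprint`, `smf_order`, `average_mct`, `discretize`) are hypothetical and not taken from the paper. The "network footprint" is assumed to be the total bytes of a macroflow's flows, and the queue thresholds are purely illustrative. The paper's setting is a multi-port fabric (where the problem is strongly NP-hard). This toy instead serves macroflows one after another on a single bottleneck link, a case in which smallest-first ordering is known to minimize average completion time. SAMF is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Macroflow:
    # Hypothetical representation: a named group of flows with sizes in bytes.
    name: str
    flow_sizes: List[int]

    @property
    def footprint(self) -> int:
        # Assumption: "network footprint" = total bytes the macroflow must move.
        return sum(self.flow_sizes)


def smf_order(macroflows: List[Macroflow]) -> List[Macroflow]:
    """Greedy Smallest-Macroflow-First: smallest footprint is scheduled first."""
    return sorted(macroflows, key=lambda m: m.footprint)


def average_mct(order: List[Macroflow], capacity: float) -> float:
    """Average completion time when macroflows are served sequentially
    on one bottleneck link of the given capacity (bytes per second)."""
    if not order:
        return 0.0
    elapsed = 0.0
    total = 0.0
    for m in order:
        elapsed += m.footprint / capacity  # this macroflow finishes at `elapsed`
        total += elapsed
    return total / len(order)


def discretize(macroflows: List[Macroflow], thresholds: List[int]) -> Dict[str, int]:
    """Map each macroflow to a priority queue (0 = highest) by counting how
    many footprint thresholds it exceeds; yields len(thresholds) + 1 queues."""
    return {m.name: sum(m.footprint > t for t in thresholds) for m in macroflows}


jobs = [
    Macroflow("A", [400, 400]),  # 800 bytes
    Macroflow("B", [100]),       # 100 bytes
    Macroflow("C", [200, 100]),  # 300 bytes
]
print([m.name for m in smf_order(jobs)])                 # ['B', 'C', 'A']
print(round(average_mct(jobs, capacity=100.0), 2))        # 9.67 (arrival order)
print(round(average_mct(smf_order(jobs), 100.0), 2))      # 5.67 (SMF order)
print(discretize(jobs, thresholds=[150, 500]))            # {'A': 2, 'B': 0, 'C': 1}
```

In this toy instance, serving the small macroflows first lowers the average completion time from about 9.67 s to about 5.67 s. Discretization then replaces an unbounded ranking with a fixed number of queues, which is the kind of mapping a commodity switch with a few strict-priority queues can enforce.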
What problem does this paper attempt to address?