A Delay-Aware Heartbeat Scheduling Mechanism for Cloud Datacenter
Ming Tang, Ling Wang, Chunlei Xu, Haodong Zou, Chenwei Su, Mingtao Ji
DOI: https://doi.org/10.1109/icdsca56264.2022.9987900
2022-01-01
Abstract: Distributed systems such as big data processing systems (Spark, Hadoop, etc.) and distributed training systems (PyTorch, Parameter Server, TensorFlow, etc.) send heartbeat packets to monitor the operation of tasks and detect the status of nodes. This paper studies how to design an efficient heartbeat mechanism for cloud datacenters and offers a new approach to the problem based on the programmable switch. We first analyze the format of the heartbeat packet and the process by which heartbeat packets are aggregated. Based on this analysis, we design an adaptive heartbeat mechanism by formulating an optimization problem whose objective is to minimize the aggregation cost. We then propose a heuristic scheduling algorithm and implement it on a simulation platform. The simulation results show that the proposed algorithm performs much better than the default heartbeat mechanism: it effectively reduces both the processing overhead and the network load.