Global-view based Task Migration for Deep Learning Processor

Jinyu Cheng, Kai Zhao, Yuanchao Xu
DOI: https://doi.org/10.1109/ispa-bdcloud-socialcom-sustaincom52081.2021.00128
2021-09-01
Abstract: Deep neural network applications are characterized by big data and intensive computation and memory access. To cope with them, deep learning processors (DLPs) often adopt NUMA and multi-core architectures, which alleviate the bandwidth bottleneck and contention of a single storage node and improve system parallelism. This design also complicates task scheduling, which must consider both data affinity and DLP utilization. When the DLP load is uneven, existing local-view scheduling looks ahead only at the tasks queued for execution; rather than following the principle of data affinity, it forcibly migrates tasks to DLPs in other quadrants with lower load. It is observed that this strategy produces unnecessary task migration under a heterogeneous programming model, because a load that looks uneven from the local view can be even from the global view. The unnecessary migration causes bandwidth fluctuation and degrades overall performance. To overcome this problem, this paper proposes global-view scheduling, which schedules according to the status of both the tasks queued for execution and the tasks queued for scheduling. Lazy migration is adopted when the DLP load is balanced from the global view, and eager migration is adopted immediately when it is unbalanced from the global view. Experimental results show that this scheduling method reduces unnecessary migration, alleviates bandwidth fluctuation, and improves overall performance without reducing NPU utilization.
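The contrast between the local and global views can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the representation of queues as Python lists, the use of queue length as the load measure, and the `threshold` parameter are all assumptions made for illustration. Each quadrant has an execution queue, and each task still waiting to be scheduled is represented only by its home quadrant (the one holding its data).

```python
from collections import Counter

def local_view_imbalanced(exec_queues, threshold=1):
    """Local view (assumed model): compare only tasks already queued for execution."""
    loads = [len(q) for q in exec_queues]
    return max(loads) - min(loads) > threshold

def global_loads(exec_queues, pending_homes):
    """Global view (assumed model): execution queues plus tasks still
    waiting to be scheduled, counted against their home quadrant."""
    pending = Counter(pending_homes)
    return [len(q) + pending[i] for i, q in enumerate(exec_queues)]

def choose_quadrant(home, exec_queues, pending_homes, threshold=1):
    """Pick the quadrant for a task whose data lives in quadrant `home`."""
    loads = global_loads(exec_queues, pending_homes)
    if max(loads) - min(loads) <= threshold:
        # Balanced from the global view: lazy migration, keep data affinity.
        return home
    # Unbalanced from the global view: eager migration to the lightest quadrant.
    return loads.index(min(loads))

# Quadrant 0 has three tasks queued for execution, quadrant 1 has none,
# but three tasks bound for quadrant 1 are still waiting to be scheduled.
exec_queues = [["t0", "t1", "t2"], []]
pending_homes = [1, 1, 1]

looks_uneven = local_view_imbalanced(exec_queues)
stay = choose_quadrant(0, exec_queues, pending_homes)
move = choose_quadrant(0, exec_queues, [0])
```

In this example the local view reports an imbalance and would migrate the task, while the global view sees equal loads and leaves the task on its home quadrant. When the pending tasks instead pile up on quadrant 0, the global view is also unbalanced and the task moves to quadrant 1.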