Deep Learning Compiler Load Balancing Optimization Method for Model Training

WANG Li, GAO Kai, ZHAO Yaqian, LI Rengang, CAO Fang, GUO Zhenhua
DOI: https://doi.org/10.3778/j.issn.1673-9418.2209026
2024-01-01
Abstract: For computing-intensive artificial intelligence (AI) training tasks, the computational graph is more complex, and load balancing in data loading, in the task division of the computational graph, and in task scheduling has become a key factor affecting computing performance. This paper proposes three optimization methods that bring the task scheduling of model training in deep learning compilers to a load-balanced state. Firstly, load balance between the CPU and the back-end computing devices is achieved by automatically establishing an efficient pipeline for data loading and model training, which improves the overall energy efficiency of the system. Secondly, a layered optimization technique for the computational graph is used to balance the load of the computational graph when it is scheduled onto the back-end devices. Finally, the resource utilization of the back-end devices is improved by automatically establishing an efficient pipeline between layers. Experimental results show that the proposed optimization methods achieve system load balancing in the process of automatically mapping training tasks to the underlying hardware devices. Compared with traditional deep learning frameworks and compilers such as TensorFlow and nGraph, the proposed approach achieves a 2% to 10% performance improvement in the training of different AI models, and the overall power consumption of the training system can be reduced by more than 10%.
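The first optimization, overlapping CPU-side data loading with back-end training, can be illustrated with a generic producer-consumer sketch. This is not the paper's implementation: the names `prefetch`, `load_batches`, and `train_step`, the buffer depth, and the toy data are all assumptions made for illustration. A background thread fills a bounded queue while the consumer runs training steps, so neither side needs to wait for the other as long as the buffer is neither empty nor full.

```python
import queue
import threading


def prefetch(batches, depth=2):
    """Yield items from `batches`, loading up to `depth` of them ahead of the consumer."""
    buf = queue.Queue(maxsize=depth)  # bounded buffer caps how far loading runs ahead
    sentinel = object()               # marks the end of the stream

    def producer():
        try:
            for batch in batches:
                buf.put(batch)        # blocks while the buffer is full
        finally:
            buf.put(sentinel)         # always signal completion, even on error

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()              # blocks while the buffer is empty
        if item is sentinel:
            return
        yield item


def load_batches(n):
    """Hypothetical stand-in for CPU-side data loading and preprocessing."""
    for i in range(n):
        yield [i, i + 1]


def train_step(batch):
    """Hypothetical stand-in for one training step on a back-end device."""
    return sum(batch)


# Loading of the next batches overlaps with the current training step.
losses = [train_step(b) for b in prefetch(load_batches(4))]
print(losses)
```

The bounded queue is the design choice that matters here: a larger `depth` hides more loading latency at the cost of memory, while a depth of zero would make the queue unbounded in Python and remove the back-pressure on the loader.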