HSAS: Efficient task scheduling for large scale heterogeneous systolic array accelerator cluster
Kaige Yan, Yanshuang Song, Tao Liu, Jingweijia Tan, Xiaohui Wei, Xin Fu
DOI: https://doi.org/10.1016/j.future.2024.01.023
IF: 7.307
2024-05-01
Future Generation Computer Systems
Abstract: Efficiently processing a large number of deep neural network models can be challenging because of significant differences among models and even among layers. Systolic arrays have become a common architecture for processing neural networks. With this architecture, different array sizes can lead to large differences in hardware utilization for the same network. Therefore, to achieve optimal processing efficiency across a large number of models, a heterogeneous systolic array accelerator cluster could be more advantageous than a homogeneous architecture. In this work, we propose such a heterogeneous architecture and design its scheduling algorithm, HSAS. HSAS uses our systolic array performance and energy models to evaluate how well models fit each systolic array. HSAS also takes load balance and preemption into consideration. We further introduce a task decomposition algorithm and a subtask priority management table to enable finer-grained subtask-level scheduling. Our evaluation shows that task-level HSAS can improve average normalized turnaround time, system throughput, and fairness by up to more than 80% compared with classic and state-of-the-art methods, while subtask-level HSAS can achieve 18%–63% improvement compared with other methods.
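Illustrative note (not from the paper): the claim that array size affects utilization for the same network can be sketched with a simple tiling estimate. The code below is not the HSAS performance model. It assumes a layer reduces to an m x n weight matrix that is tiled onto a rows x cols array, with partially filled edge tiles leaving processing elements idle. The names `utilization` and `best_array` and the example sizes are hypothetical.

```python
import math


def utilization(m, n, rows, cols):
    """Fraction of processing elements doing useful work (assumed tiling model).

    An m x n weight matrix is split into ceil(m/rows) * ceil(n/cols) tiles,
    each occupying the full rows x cols array for one pass.
    """
    tiles = math.ceil(m / rows) * math.ceil(n / cols)
    return (m * n) / (tiles * rows * cols)


def best_array(layer, arrays):
    """Pick the array shape with the highest estimated utilization for a layer."""
    m, n = layer
    return max(arrays, key=lambda a: utilization(m, n, a[0], a[1]))


if __name__ == "__main__":
    layer = (96, 96)  # hypothetical layer dimensions
    for shape in [(128, 128), (64, 64), (32, 32)]:
        print(shape, utilization(*layer, *shape))
```

Under this assumed model, a 96 x 96 layer fills only about 56% of a 128 x 128 array but fully occupies a 32 x 32 array, which is the kind of mismatch a heterogeneous cluster could exploit.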
computer science, theory & methods