ONE-SA: Enabling Nonlinear Operations in Systolic Arrays for Efficient and Flexible Neural Network Inference

Ruiqi Sun, Yinchen Ni, Xin He, Jie Zhao, An Zou
2024-02-01
Abstract: The computation- and memory-intensive nature of DNNs limits their use in many mobile and embedded contexts. Application-specific integrated circuit (ASIC) hardware accelerators employ matrix multiplication units (such as systolic arrays) and dedicated nonlinear function units to speed up DNN computations. A close examination of these ASIC accelerators reveals that the designs are often specialized and lack versatility across different networks, especially when the networks have different types of computation. In this paper, we introduce a novel systolic array architecture that is capable of executing nonlinear functions. By encompassing both inherent linear and newly enabled nonlinear functions within the systolic array, the proposed architecture facilitates versatile network inference, substantially enhancing computational power and energy efficiency. Experimental results show that employing this systolic array enables seamless execution of entire DNNs, incurring only a negligible loss in network inference accuracy. Furthermore, evaluation on FPGAs reveals that integrating nonlinear computation capacity into a systolic array introduces no notable increase (less than 1.5%) in block memories (BRAMs), look-up tables (LUTs), or digital signal processors (DSPs), and requires only 13.3% - 24.1% more flip-flops (FFs). Executing networks with the proposed systolic array, which supports different network models, yields up to 25.73x, 5.21x, and 1.54x the computational efficiency of general-purpose CPUs, GPUs, and SoCs respectively, while achieving comparable (83.4% - 135.8%) performance to conventional accelerators that are designed for specific neural network models.
Hardware Architecture, Signal Processing
What problem does this paper attempt to address?
The paper addresses the limits on computational resources and energy that deep neural networks (DNNs) face when deployed on mobile and embedded devices. Application-specific integrated circuit (ASIC) hardware accelerators speed up DNN computation by combining matrix multiplication units (such as systolic arrays) with dedicated nonlinear function units, but these designs are often too specialized and lack flexibility across network models, especially when the networks involve different types of computation. The paper proposes a novel systolic array architecture, called ONE-SA, that can also execute nonlinear functions. By handling both the inherent linear operations and the newly enabled nonlinear operations inside the systolic array, the architecture supports inference for diverse networks and substantially improves computational power and energy efficiency. Experimental results show that the systolic array can execute entire DNNs with only a negligible loss in inference accuracy. FPGA-based evaluation shows that adding nonlinear computation to the systolic array introduces no notable increase (less than 1.5%) in block memory (BRAM), look-up table (LUT), or digital signal processor (DSP) usage, and increases flip-flop (FF) usage by only 13.3% to 24.1%. Compared with existing approaches, the proposed systolic array achieves up to 25.73x, 5.21x, and 1.54x the computational efficiency of general-purpose CPUs, GPUs, and SoCs respectively, while delivering performance comparable (83.4% to 135.8%) to conventional accelerators designed for specific neural network models.
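The abstract does not say how nonlinear functions are mapped onto the array, so the following is only an illustrative sketch of one well-known technique, piecewise linear approximation, and not a description of the ONE-SA design. It shows why a nonlinear function can, in principle, run on multiply-accumulate hardware: once a segment is selected, evaluation is a single multiply and add. The function names, the choice of sigmoid, the input range [-8, 8], and the 64 segments are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Reference nonlinear function used for the illustration."""
    return 1.0 / (1.0 + math.exp(-x))


def build_segments(fn, lo, hi, n):
    """Precompute (slope, intercept) pairs for n equal-width segments of fn on [lo, hi]."""
    width = (hi - lo) / n
    table = []
    for i in range(n):
        x0 = lo + i * width
        x1 = x0 + width
        slope = (fn(x1) - fn(x0)) / width
        intercept = fn(x0) - slope * x0
        table.append((slope, intercept))
    return table


def pwl_eval(x, table, lo, hi):
    """Approximate fn(x) with one table lookup and one multiply-add."""
    n = len(table)
    x = min(max(x, lo), hi)  # clamp: acceptable here because sigmoid saturates
    idx = min(int((x - lo) / (hi - lo) * n), n - 1)
    slope, intercept = table[idx]
    return slope * x + intercept  # the only arithmetic is a multiply-accumulate


# Hypothetical parameters chosen for this sketch only.
LO, HI, SEGMENTS = -8.0, 8.0, 64
TABLE = build_segments(sigmoid, LO, HI, SEGMENTS)

# Worst-case absolute error over a sweep of inputs from -10 to 10.
max_err = max(
    abs(pwl_eval(i / 100.0, TABLE, LO, HI) - sigmoid(i / 100.0))
    for i in range(-1000, 1001)
)
```

More segments reduce the approximation error at the cost of a larger coefficient table, which is the kind of accuracy versus resource trade-off the FPGA figures above are concerned with.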