ONE-SA: Enabling Nonlinear Operations in Systolic Arrays for Efficient and Flexible Neural Network Inference

Ruiqi Sun, Yinchen Ni, Xin He, Jie Zhao, An Zou

2024-02-01

Abstract: The computation- and memory-intensive nature of DNNs limits their use in many mobile and embedded contexts. Application-specific integrated circuit (ASIC) hardware accelerators employ matrix multiplication units (such as systolic arrays) and dedicated nonlinear function units to speed up DNN computations. A close examination of these ASIC accelerators reveals that the designs are often specialized and lack versatility across different networks, especially when the networks have different types of computation. In this paper, we introduce a novel systolic array architecture that is capable of executing nonlinear functions. By encompassing both inherent linear and newly enabled nonlinear functions within the systolic arrays, the proposed architecture facilitates versatile network inferences, substantially enhancing computational power and energy efficiency. Experimental results show that employing this systolic array enables seamless execution of entire DNNs, incurring only a negligible loss in the network inference accuracy. Furthermore, assessment and evaluation with FPGAs reveal that integrating nonlinear computation capacity into a systolic array introduces no notable increase (less than 1.5%) in block memories (BRAMs), look-up tables (LUTs), or digital signal processors (DSPs), and requires only 13.3% - 24.1% more flip-flops (FFs). In comparison to existing methodologies, executing the networks with the proposed systolic array, which enables the flexibility of different network models, yields up to 25.73x, 5.21x, and 1.54x the computational efficiency of general-purpose CPUs, GPUs, and SoCs respectively, while achieving comparable (83.4% - 135.8%) performance with the conventional accelerators which are designed for specific neural network models.

Hardware Architecture, Signal Processing

What problem does this paper attempt to address?

The paper addresses the computational resource and energy constraints that deep neural networks (DNNs) face when deployed on mobile and embedded devices. Application-specific integrated circuit (ASIC) hardware accelerators speed up DNN computations by combining matrix multiplication units (such as systolic arrays) with dedicated nonlinear function units, but these designs are often too specialized and lack flexibility across network models, especially when the networks require different types of computation. The paper proposes a novel systolic array architecture, called ONE-SA, that can also execute nonlinear functions. By handling both the inherent linear operations and the newly enabled nonlinear operations within the systolic array, the architecture supports versatile network inference and substantially improves computing power and energy efficiency. Experimental results show that the systolic array can execute entire DNNs seamlessly, with only a negligible loss in inference accuracy. FPGA-based evaluation further shows that integrating nonlinear computation into the systolic array adds no notable (less than 1.5%) block memory (BRAM), look-up table (LUT), or digital signal processor (DSP) usage, and only 13.3% to 24.1% more flip-flops (FFs). Compared with existing approaches, the proposed systolic array achieves up to 25.73x, 5.21x, and 1.54x the computational efficiency of general-purpose CPUs, GPUs, and SoCs respectively, while delivering performance comparable (83.4% to 135.8%) to conventional accelerators designed for specific neural network models.
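The abstract does not say how nonlinear functions are mapped onto the array, so the following is only an illustrative sketch under one assumption: that a nonlinear function is approximated piecewise-linearly, so each evaluation reduces to a table lookup plus one multiply-accumulate, the same primitive a systolic array's processing elements already provide. The function names (`build_pwl_table`, `pwl_eval`), the choice of sigmoid, the range, and the segment count are all hypothetical and not taken from the paper.

```python
import math

def build_pwl_table(fn, lo, hi, segments):
    """Return one (slope, intercept) pair per uniform segment of [lo, hi]."""
    step = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * step
        x1 = x0 + step
        slope = (fn(x1) - fn(x0)) / step      # chord through the segment endpoints
        intercept = fn(x0) - slope * x0
        table.append((slope, intercept))
    return table

def pwl_eval(x, table, lo, hi):
    """Approximate fn(x) with one table lookup and one multiply-accumulate."""
    segments = len(table)
    step = (hi - lo) / segments
    x_clamped = min(max(x, lo), hi)           # saturate inputs outside the table range
    idx = min(int((x_clamped - lo) / step), segments - 1)
    slope, intercept = table[idx]
    return slope * x_clamped + intercept      # the MAC a processing element would perform

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical parameters chosen for illustration only.
LO, HI, SEGMENTS = -8.0, 8.0, 64
TABLE = build_pwl_table(sigmoid, LO, HI, SEGMENTS)

# Largest absolute deviation from the exact sigmoid on a dense grid over [LO, HI].
max_err = max(
    abs(pwl_eval(x / 100.0, TABLE, LO, HI) - sigmoid(x / 100.0))
    for x in range(-800, 801)
)
```

Under this assumption, accuracy is traded against table size: more segments shrink `max_err` at the cost of more stored coefficients, which is one plausible reading of how a "negligible loss" in inference accuracy could coexist with modest extra hardware.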

A Systolic SNN Inference Accelerator and Its Co-optimized Software Framework

DaDianNao: A Machine-Learning Supercomputer

On the Difficulty of Designing Processor Arrays for Deep Neural Networks

A Survey of Design and Optimization for Systolic Array-based DNN Accelerators

Addressing the issue of processing element under-utilization in general-purpose systolic deep learning accelerators

A Hybrid Heterogeneous Neural Network Accelerator Based on Systolic Array

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU

Dual-Line-Systolic Array for High Performance CNN Accelerator

A High-Performance Systolic Array Accelerator Dedicated for CNN

ReSA: Reconfigurable Systolic Array for Multiple Tiny DNN Tensors

Systolic Array Based Accelerator and Algorithm Mapping for Deep Learning Algorithms

Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks

ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs

Neural Synaptic Plasticity-Inspired Computing: A High Computing Efficient Deep Convolutional Neural Network Accelerator

Heterogeneous Systolic Array Architecture for Compact CNNs Hardware Accelerators

Revealing Untapped DSP Optimization Potentials for FPGA-Based Systolic Matrix Engines

Scale-out Systolic Arrays

Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing