Abstract: The systolic array architecture is one of the most popular choices for convolutional neural network (CNN) hardware accelerators. Its biggest advantage is its simple and efficient design principle: without complicated control logic or dataflow, an accelerator built on a systolic array can compute traditional convolution very efficiently. However, this simplicity also brings new challenges. When computing special types of convolution, such as small-scale convolution or depthwise convolution, the processing element (PE) utilization rate of the array drops sharply, mainly because the simple architecture limits the flexibility of the systolic array. In this article, we design a configurable multi-directional systolic array (CMSA) to address these issues. First, we add a data path to the systolic array that allows users to split the array through configuration, speeding up the calculation of small-scale convolution. Second, we redesign the PE so that the array supports multiple data transmission modes and dataflow strategies, which allows users to switch the dataflow of the PE array to speed up the calculation of depthwise convolution. In addition, unlike other works, we make only a few modifications to the existing systolic array architecture. This avoids additional hardware overhead and makes CMSA easy to deploy in application scenarios that require small systolic arrays, such as mobile terminals. Based on our evaluation, CMSA increases the PE utilization rate by up to 1.6 times compared with a typical systolic array when running the last layers of ResNet-18, and by up to 14.8 times when running depthwise convolution in MobileNet. At the same time, CMSA and traditional systolic arrays are similar in area and energy consumption.
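To make the utilization problem concrete, here is a minimal sketch of a simplified analytical model. It is written for illustration and is not taken from CMSA's evaluation. It assumes a weight-stationary mapping: the reduction dimension (kernel height × kernel width × input channels) occupies the array rows, output channels occupy the columns, and output pixels are streamed through. Each tile is assumed to cost M + rows + cols - 2 cycles, covering streaming plus pipeline fill and drain, and weight loading is ignored. The function names `gemm_utilization` and `depthwise_utilization`, the cycle model, and the example layer shapes are all hypothetical. Splitting is modeled simply as independent, equal-sized sub-arrays that work on different tiles in parallel. This is an assumption, not CMSA's actual data path.

```python
from math import ceil


def gemm_utilization(m: int, k: int, n: int, rows: int, cols: int,
                     num_subarrays: int = 1) -> float:
    """Fraction of PE-cycles doing useful MACs for an M x K x N GEMM.

    Hypothetical weight-stationary model (not the paper's): K maps to rows,
    N maps to columns, and the M output pixels are streamed through. Each
    tile costs m + rows + cols - 2 cycles (streaming plus fill and drain).
    rows/cols give the shape of ONE sub-array; num_subarrays such arrays
    process different tiles in parallel.
    """
    tiles = ceil(k / rows) * ceil(n / cols)
    rounds = ceil(tiles / num_subarrays)          # tiles run in parallel batches
    cycles = rounds * (m + rows + cols - 2)
    total_pes = rows * cols * num_subarrays
    useful_macs = m * k * n
    return useful_macs / (total_pes * cycles)


def depthwise_utilization(m: int, kernel: int, rows: int, cols: int) -> float:
    """Naive depthwise mapping: each channel is its own M x (kernel*kernel) x 1
    GEMM run on the full array. All channels share one shape, so the
    per-channel utilization equals the layer average."""
    return gemm_utilization(m, kernel * kernel, 1, rows, cols)


# Illustrative last-layer-like 3x3 conv: 7x7 output (M = 49), 512 -> 512 channels.
whole = gemm_utilization(49, 3 * 3 * 512, 512, rows=16, cols=16)
split = gemm_utilization(49, 3 * 3 * 512, 512, rows=8, cols=8, num_subarrays=4)
# Illustrative depthwise 3x3 layer with a 112x112 output on a 16x16 array.
dw = depthwise_utilization(m=112 * 112, kernel=3, rows=16, cols=16)

print(f"one 16x16 array:       {whole:.3f}")
print(f"four 8x8 sub-arrays:   {split:.3f}")
print(f"depthwise 3x3, 16x16:  {dw:.3f}")
```

Under these assumptions, the 7×7-output layer reaches 49/79 ≈ 0.62 utilization on a single 16×16 array and 49/63 ≈ 0.78 on four 8×8 sub-arrays, because the fill and drain cost is amortized over only 49 streamed pixels and smaller sub-arrays shorten it. The naive depthwise mapping can never exceed 9/256 ≈ 3.5%, since only 9 rows and 1 column of the array hold weights. This is the kind of gap the switchable dataflow described above is meant to close. These figures come from the toy model only and are not the paper's measured results.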

Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks
