Heterogeneous Systolic Array Architecture for Compact CNNs Hardware Accelerators

Rui Xu, Sheng Ma, Yaohua Wang, Yang Guo
DOI: https://doi.org/10.1109/tpds.2021.3129647
2021-01-01
Abstract: Compact convolutional neural networks have become a hot research topic. However, we find that hardware accelerators based on systolic arrays are extremely performance-inefficient when processing compact models, especially the depthwise convolutional layers in these networks. To make systolic arrays efficient for compact convolutional neural networks, we propose the heterogeneous systolic array (HeSA) architecture. It introduces heterogeneous processing elements that support multiple dataflow modes, which further exploit the data reuse opportunities of depthwise convolutional layers without changing the architecture of the naïve systolic array. By increasing the utilization rate of processing elements in the array, HeSA improves performance, throughput, and energy efficiency compared to the standard baseline. Based on our evaluation with typical workloads, HeSA improves the utilization rate of computing resources in depthwise convolutional layers by 4.5×-5.5× and achieves a 1.5×-2.2× overall speedup compared to the standard systolic array architecture. HeSA also improves on-chip data reuse and saves over 20% of energy consumption. Meanwhile, the area of HeSA is essentially unchanged compared to the baseline owing to its simple design.
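The utilization gap described in the abstract can be illustrated with a toy model. The sketch below is not the paper's evaluation method. It assumes a convolution is lowered to a matrix multiply whose K x N weight matrix is tiled onto a rows x cols array, with the reduction dimension mapped to rows and output channels to columns. Under that assumed mapping, a depthwise layer contributes only one output column per channel, so most processing elements sit idle. All function names and the 16 x 16 array size are hypothetical.

```python
import math


def tile_utilization(k, n, rows, cols):
    """Fraction of PEs doing useful work when a K x N weight matrix
    is tiled onto a rows x cols array (assumed toy mapping)."""
    tiles_k = math.ceil(k / rows)   # tiles along the reduction dimension
    tiles_n = math.ceil(n / cols)   # tiles along the output-channel dimension
    return (k * n) / (tiles_k * rows * tiles_n * cols)


def standard_conv_utilization(kh, kw, c_in, c_out, rows, cols):
    # Standard convolution: K = kh * kw * c_in, N = c_out.
    return tile_utilization(kh * kw * c_in, c_out, rows, cols)


def depthwise_conv_utilization(kh, kw, rows, cols):
    # Depthwise convolution: each channel is an independent
    # K = kh * kw, N = 1 problem under this assumed mapping.
    return tile_utilization(kh * kw, 1, rows, cols)


if __name__ == "__main__":
    # Hypothetical 16 x 16 array and 3 x 3 kernels.
    print(standard_conv_utilization(3, 3, 64, 64, 16, 16))
    print(depthwise_conv_utilization(3, 3, 16, 16))
```

In this model a 3 x 3 standard convolution with 64 input and 64 output channels fills the array completely, while a 3 x 3 depthwise layer occupies 9 of 256 processing elements. Real accelerators use other mappings and dataflows, so the numbers are illustrative only and should not be compared with the 4.5×-5.5× improvement reported above.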