FullSparse: A Sparse-Aware GEMM Accelerator with Online Sparsity Prediction

Jiangnan Yu, Yang Fan, Hanfei Wang, Yuxuan Qiao, Zheng Wu, Xiankui Xiong, Xiao Yao, Haidong Yao, Yecheng Zhang
DOI: https://doi.org/10.1145/3649153.3649180
2024-01-01
Abstract: Exploiting sparsity reduces storage and computation costs for Deep Neural Networks (DNNs) on resource-constrained devices. Although neural networks naturally exhibit sparsity through operations such as ReLU and quantization, the wide range of sparsity levels (0.2% to 99%) poses challenges for the design of computational units. In this paper, we present FullSparse, an energy-efficient GEMM accelerator designed for diverse applications that accommodates varying sparsity levels in matrix multiplication (0.2% to 99%). This paper introduces three features for nuanced sparsity support: multi-sparsity control, predictive result sparsity, and a multi-sparsity-compatible PE array. Experimental evaluations confirm that our implementation, while remaining adaptable to different sparsity levels, delivers superior computational power compared with existing designs.
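The abstract names predictive result sparsity and multi-sparsity control without describing how they work. As a minimal sketch, assuming a generic software analogue rather than FullSparse's actual hardware mechanism, the nonzero pattern of C = A·B can be bounded from the operand sparsity masks alone, and the predicted output sparsity can then drive a choice among compute modes. The function names, the mode labels, and the 0.3/0.9 thresholds are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def nonzero_mask(m: Matrix) -> Mask:
    """Boolean mask marking the nonzero entries of a matrix."""
    return [[x != 0 for x in row] for row in m]


def predict_output_mask(a: Matrix, b: Matrix) -> Mask:
    """Upper bound on the nonzero pattern of C = A @ B.

    C[i][j] can be nonzero only if some k has A[i][k] != 0 and B[k][j] != 0.
    Entries marked False are guaranteed zero; entries marked True may still
    turn out to be zero through numerical cancellation.
    """
    ma, mb = nonzero_mask(a), nonzero_mask(b)
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [any(ma[i][k] and mb[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def sparsity(mask: Mask) -> float:
    """Fraction of entries that are (predicted to be) zero."""
    total = sum(len(row) for row in mask)
    zeros = sum(not x for row in mask for x in row)
    return zeros / total


def choose_mode(s: float, low: float = 0.3, high: float = 0.9) -> str:
    """Hypothetical multi-sparsity control: pick a mode from a sparsity level.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if s < low:
        return "dense"
    if s < high:
        return "moderate-sparse"
    return "highly-sparse"


if __name__ == "__main__":
    A = [[1, 0, 0], [0, 0, 0], [0, 2, 0]]
    B = [[0, 3], [0, 0], [4, 0]]
    mask = predict_output_mask(A, B)
    s = sparsity(mask)
    print(mask, round(s, 3), choose_mode(s))
```

For the small A and B above, only C[0][1] can be nonzero, so five of six outputs are predicted zero (sparsity ≈ 0.833). Because the prediction needs only the operand masks, it can in principle run before the multiply itself, which is what makes an "online" prediction useful for selecting a datapath.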