FuseKNA: Fused Kernel Convolution based Accelerator for Deep Neural Networks
Jianxun Yang, Zhao Zhang, Zhuangzhi Liu, Jing Thou, Leibo Liu, Shaojun Wei, Shouyi Yin
DOI: https://doi.org/10.1109/HPCA51647.2021.00079
2021-01-01
Abstract: Bit-serial computation is a prevailing convolution method for accelerating varying-precision DNNs. It slices each multi-bit value into multiple 1-bit values and transforms a multiplication into multiple additions. Additions of zero bits are ineffectual, while additions of non-zero bits are repetitive, since multiple kernels are likely to have non-zero bits at the same kernel positions. Previous bit-serial accelerators remove only ineffectual additions by skipping the computation of zero bits; they cannot eliminate repetitive additions because they compute the convolution of each kernel independently. In this work, we propose a fused kernel convolution algorithm that eliminates both ineffectual and repetitive additions in bit-serial computation by exploiting bit repetition and bit sparsity in weights, for both convolutional and fully-connected layers. It unifies the convolutions of multiple kernels into the convolution of one fused kernel by first grouping additions into different patterns and then reconstructing the convolution results, minimizing the addition count. Meanwhile, memory accesses of activations and partial sums decrease because fewer convolutions are performed. We then design FuseKNA, a fused kernel convolution based accelerator with compact compute logic that fully exploits the value sparsity of activations and the bit sparsity of weights. Benchmarked on a set of mainstream DNNs, FuseKNA improves performance by 4.47×, 2.31× and 1.81×, and energy efficiency by 4.13×, 3.06× and 2.53×, over the state-of-the-art Stripes, Pragmatic and Bit-Tactical accelerators, respectively.
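The grouping and reconstruction steps described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or hardware dataflow; it is a toy model under stated assumptions: weights are unsigned integers, each kernel is flattened to a 1-D list, and an "addition" is counted as one accumulate operation. The function names `bit_serial_dot` and `fused_dot` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def bit_serial_dot(weights, acts, bits):
    """Per-kernel bit-serial dot product that skips zero weight bits.

    Assumes unsigned weights of at most `bits` bits.
    Returns (result, number of accumulate operations).
    """
    total, adds = 0, 0
    for b in range(bits):
        for w, a in zip(weights, acts):
            if (w >> b) & 1:          # zero bits are ineffectual: skip them
                total += a << b
                adds += 1
    return total, adds


def fused_dot(kernels, acts, bits):
    """Toy fused-kernel evaluation of several kernels on the same activations.

    For every bit plane, the bits that all kernels hold at one position form
    a pattern. Activations are first grouped (summed) by pattern, so each
    activation is added once per bit plane no matter how many kernels share
    the non-zero bit. Each kernel output is then reconstructed from the
    group sums whose pattern includes that kernel.
    Returns (list of results, number of accumulate operations).
    """
    num_kernels = len(kernels)
    outs = [0] * num_kernels
    adds = 0
    for b in range(bits):
        groups = defaultdict(int)
        # Step 1: group additions by the cross-kernel bit pattern.
        for p, a in enumerate(acts):
            pattern = 0
            for i, kern in enumerate(kernels):
                pattern |= ((kern[p] >> b) & 1) << i
            if pattern:               # all-zero pattern: nothing to add
                groups[pattern] += a
                adds += 1
        # Step 2: reconstruct each kernel's result from the group sums.
        for pattern, group_sum in groups.items():
            for i in range(num_kernels):
                if (pattern >> i) & 1:
                    outs[i] += group_sum << b
                    adds += 1
    return outs, adds


if __name__ == "__main__":
    # Hypothetical 3-bit kernels sharing many non-zero bit positions.
    kernels = [[3, 5, 6, 1], [3, 4, 6, 0], [2, 5, 7, 1]]
    acts = [1, 2, 3, 4]
    baseline = [bit_serial_dot(k, acts, 3) for k in kernels]
    fused_outs, fused_adds = fused_dot(kernels, acts, 3)
    print("baseline results:", [r for r, _ in baseline])
    print("baseline adds:   ", sum(n for _, n in baseline))
    print("fused results:   ", fused_outs)
    print("fused adds:      ", fused_adds)
```

In this toy model the reconstruction step has its own cost, so the fused variant only reduces the accumulate count when many positions share the same pattern; with a handful of positions, as in the demo, it may not. The addition counts it reports should not be read as reproducing the paper's measurements.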