ASLog: an Area-Efficient CNN Accelerator for Per-Channel Logarithmic Post-Training Quantization

Jiawei Xu, Jiangshan Fan, Baolin Nan, Chen Ding, Li-Rong Zheng, Zhuo Zou, Yuxiang Huan
DOI: https://doi.org/10.1109/tcsi.2023.3315299
2023-01-01
Abstract: Post-training quantization (PTQ) has proven to be an efficient model compression technique for Convolutional Neural Networks (CNNs), requiring neither re-training nor access to labeled datasets. However, it remains challenging for a CNN accelerator to realize the efficiency potential of PTQ methods. Many PTQ techniques pursue high theoretical compression and accuracy while ignoring their impact on the actual hardware implementation, so they incur more hardware overhead than benefit. This paper introduces ASLog, a PTQ-friendly CNN accelerator that explores four key designs through algorithm-hardware co-optimization: SLogII, the first practical 4-bit logarithmic PTQ pipeline; a multiplier-free arithmetic element (AE); an energy-efficient bias correction element (BCE); and a per-channel-quantization-friendly (PCF) architecture and dataflow. The proposed SLogII PTQ pipeline pushes the limit of logarithmic PTQ to 4 bits with <2.5% accuracy degradation on various image classification and face recognition tasks. Exploiting approximate computing and a novel encoding and decoding scheme, the proposed SLogII AE consumes >40% less power and area than a common 8-bit multiplier. The proposed BCE and PCF designs are the first to consider the hardware impact of the widely used per-channel quantization and bias correction techniques, enabling an efficient PTQ-friendly implementation with a small hardware overhead. ASLog is validated in a UMC 40-nm process, achieving 12.2 TOPS/W energy efficiency with a 0.80 mm$^2$ core area. It reaches 336.3 GOPS/mm$^2$ area efficiency and >500 OPs/Byte operational intensity, corresponding to improvements of over $1.85\times$ and $1.12\times$ compared with previous related works.
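To make the idea of per-channel logarithmic quantization with multiplier-free arithmetic concrete, here is a minimal Python sketch. It is not the SLogII pipeline or the ASLog AE, whose encoding details the abstract does not give. It assumes a generic base-2 scheme with 1 sign bit and 3 exponent bits, a per-channel scale equal to the largest absolute weight in the channel, and zero weights mapped to the smallest representable magnitude. The names `quantize_channel`, `dequantize_channel`, and `shift_dot` are hypothetical.

```python
import math

EXP_BITS = 3                      # assumed split: 1 sign bit + 3 exponent bits
MAX_EXP = (1 << EXP_BITS) - 1     # largest exponent code, here 7


def quantize_channel(weights):
    """Quantize one output channel to (scale, [(sign, exp), ...]).

    Each weight is approximated as sign * scale * 2**(-exp).
    """
    scale = max(abs(w) for w in weights) or 1.0   # per-channel scale
    codes = []
    for w in weights:
        sign = -1 if w < 0 else 1
        if w == 0:
            exp = MAX_EXP          # assumption: zero maps to smallest magnitude
        else:
            exp = round(-math.log2(abs(w) / scale))
            exp = min(max(exp, 0), MAX_EXP)       # clip to the 3-bit range
        codes.append((sign, exp))
    return scale, codes


def dequantize_channel(scale, codes):
    """Reconstruct floating-point weights from the log-domain codes."""
    return [sign * scale * 2.0 ** -exp for sign, exp in codes]


def shift_dot(activations, scale, codes):
    """Dot product of integer activations with log-quantized weights.

    Each product is a left shift instead of a multiplication; the
    per-channel scale is applied once, after accumulation.
    """
    acc = 0
    for a, (sign, exp) in zip(activations, codes):
        acc += sign * (a << (MAX_EXP - exp))      # shift replaces multiply
    return acc * scale / (1 << MAX_EXP)           # single rescale per channel


if __name__ == "__main__":
    scale, codes = quantize_channel([0.5, -0.25, 0.125, 1.0])
    print(dequantize_channel(scale, codes))
    print(shift_dot([4, 8, 16, 2], scale, codes))
```

Because each channel carries its own scale, the inner loop needs only shifts and additions, and the one remaining multiplication per channel happens outside it. This is the kind of hardware cost that per-channel schemes introduce, and the abstract states that the PCF architecture is designed to account for it.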