A Low-Cost Reconfigurable Nonlinear Core for Embedded DNN Applications

Yue Li, Wei Cao, Xuegong Zhou, Lingli Wang
DOI: https://doi.org/10.1109/ICFPT51103.2020.00014
2020-01-01
Abstract: Nonlinear layers (NLs) are indispensable parts of deep neural networks (DNNs). Because DNN models target different inference tasks, their NLs differ from model to model. Based on input range reduction and piecewise polynomial approximation, this paper proposes a low-cost reconfigurable nonlinear core to accelerate diverse NLs. Three types of NLs are supported: pooling operations such as max-pooling and average-pooling, classifiers such as softmax and log-softmax, and element-wise functions such as GELU, sigmoid, and swish. A function is implemented with the approximation-based method simply by configuring the data paths and the contents of the lookup tables (LUTs). The proposed core contains only two low bit-width multipliers, two adders, and one subtractor as computing units, and these units are fully utilized for all operations except pooling. A lossless bit-width reduction scheme and a coefficient rearrangement scheme are proposed to reduce the multiplier bit widths and the LUT sizes. When implemented on a Xilinx Zynq-7000 ZC706 FPGA board, the design can use a single high-speed DSP block as two independent multipliers. Compared with state-of-the-art nonlinear cores, the proposed core achieves higher precision for functions such as sigmoid and tanh, and higher throughput for functions such as softmax and log-softmax. It also supports more nonlinear operations. Experiments on recent DNNs such as BERT and EfficientNet show that the design maintains high accuracy, with less than 0.2% loss.
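The general idea of combining input range reduction with a LUT-based piecewise polynomial approximation can be illustrated with a small software sketch. The example below is not the paper's hardware design: it assumes degree-1 polynomials, uniform segments, floating-point arithmetic, and the hypothetical parameters X_MAX and SEGMENTS, and it uses sigmoid only because the abstract lists it among the supported functions. Range reduction here relies on the identity sigmoid(-x) = 1 - sigmoid(x) and on saturating the output for large inputs.

```python
import math

# Assumed parameters for illustration only; the abstract does not give these values.
X_MAX = 8.0     # inputs with |x| >= X_MAX are treated as saturated
SEGMENTS = 64   # number of LUT entries (uniform segments on [0, X_MAX))


def sigmoid(x):
    """Reference sigmoid used to fill the LUT and to measure the error."""
    return 1.0 / (1.0 + math.exp(-x))


def build_lut(f, x_max=X_MAX, segments=SEGMENTS):
    """Return one (slope, intercept) pair per uniform segment on [0, x_max)."""
    step = x_max / segments
    lut = []
    for i in range(segments):
        x0, x1 = i * step, (i + 1) * step
        slope = (f(x1) - f(x0)) / step      # line through the segment endpoints
        intercept = f(x0) - slope * x0
        lut.append((slope, intercept))
    return lut


LUT = build_lut(sigmoid)


def sigmoid_approx(x, lut=LUT, x_max=X_MAX):
    """Approximate sigmoid with range reduction plus a piecewise-linear LUT."""
    ax = abs(x)                              # range reduction: tabulate x >= 0 only
    if ax >= x_max:
        y = 1.0                              # saturation region
    else:
        idx = int(ax * len(lut) / x_max)     # segment index selects the coefficients
        slope, intercept = lut[idx]
        y = slope * ax + intercept           # one multiply and one add per evaluation
    return y if x >= 0 else 1.0 - y          # undo the reduction for negative inputs


if __name__ == "__main__":
    worst = max(abs(sigmoid_approx(i / 100) - sigmoid(i / 100))
                for i in range(-1000, 1001))
    print(f"max abs error on [-10, 10]: {worst:.2e}")
```

In this sketch, switching to another element-wise function only requires rebuilding the table with a different f and a matching reduction rule, which mirrors the abstract's point that reconfiguration amounts to changing LUT contents and data paths. Fixed-point formats, higher polynomial degrees, and the bit-width and coefficient schemes mentioned in the abstract are outside its scope.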