Abstract: The posit number system aims to be a drop-in replacement for the existing IEEE floating-point standard. Its properties, tapered precision and high dynamic range, allow a smaller posit to almost match the performance of a much larger floating-point format in representing real numbers. This becomes especially useful for error-tolerant tasks such as deep learning inference, where low latency and area are a priority. Recent research has found that the performance of deep neural network models saturates once the multipliers used for convolutions reach a certain level of accuracy. The extra hardware cost of precise arithmetic circuits for such applications therefore becomes an unnecessary overhead. This paper explores approximate posit multipliers in the convolutional layers of deep neural networks and attempts to find an ideal balance between hardware utilization and inference accuracy. Posit multiplication involves several steps, of which the mantissa multiplication step uses the most hardware resources. To mitigate this, a posit multiplier circuit using an approximate hybrid-radix Booth encoding for mantissa multiplication, together with techniques such as truncation and bit masking based on input regime size, is proposed. In addition, a novel Booth encoding control scheme that prevents unnecessary bits from switching has been devised to reduce dynamic power dissipation. Compared to existing literature, these optimizations contribute to a 23% decrease in power dissipation in the mantissa multiplication stage. Further, a novel area- and energy-efficient decoder architecture has been developed, with an 11% reduction in dynamic power dissipation and area compared to existing decoders. Overall, the proposed posit multiplier offers a 14% reduction in the power-delay product (PDP) over existing approximate posit multiplier designs. The proposed multiplier also achieves over 90% inference accuracy on deep learning models such as ResNet20, VGG-19 and DenseNet.
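
To make the terms "regime" and "tapered precision" concrete, the following is a minimal software sketch of posit decoding. It is not the decoder architecture proposed in the paper. The width `n = 8` and exponent size `es = 2` are assumptions chosen for illustration, and the names `decode_posit` and `PositFields` are hypothetical. The sketch shows that a longer regime leaves fewer fraction bits, which is the property that regime-based truncation and bit masking rely on.

```python
from fractions import Fraction
from typing import NamedTuple, Optional


class PositFields(NamedTuple):
    value: Optional[Fraction]  # exact value, or None for NaR (not a real)
    regime_len: int            # length of the run of identical regime bits
    frac_bits: int             # number of fraction bits left after regime/exponent


def decode_posit(bits: int, n: int = 8, es: int = 2) -> PositFields:
    """Decode an n-bit posit with es exponent bits (illustrative parameters)."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return PositFields(Fraction(0), 0, 0)
    if bits == 1 << (n - 1):
        return PositFields(None, 0, 0)  # NaR pattern: 100...0

    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are stored in two's complement

    body = bits & ((1 << (n - 1)) - 1)  # the n-1 bits after the sign
    i = n - 2                           # index of the first regime bit
    first = (body >> i) & 1
    run = 0
    while i >= 0 and ((body >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run      # regime value

    i -= 1                              # skip the regime terminator bit
    remaining = max(i + 1, 0)           # bits left for exponent and fraction
    tail = body & ((1 << remaining) - 1)

    exp_bits = min(es, remaining)       # exponent may be cut off by a long regime
    exp = (tail >> (remaining - exp_bits)) << (es - exp_bits) if exp_bits else 0
    frac_bits = remaining - exp_bits
    frac = tail & ((1 << frac_bits) - 1)

    scale = k * (1 << es) + exp
    value = (1 + Fraction(frac, 1 << frac_bits)) * Fraction(2) ** scale
    return PositFields(-value if sign else value, run, frac_bits)


if __name__ == "__main__":
    for pattern in (0x40, 0x44, 0x48, 0x7F, 0x01):
        print(f"{pattern:#04x}", decode_posit(pattern))
```

Under these assumed parameters, values near 1 keep three fraction bits, while the largest and smallest magnitudes keep none, so a multiplier can skip or mask mantissa bits that a long regime has already pushed out of the word.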

Low-Precision Mixed-Computation Models for Inference on Edge

Pse: Mixed Quantization Framework of Neural Networks for Efficient Deployment

Hessian-based Mixed-Precision Quantization with Transition Aware Training for Neural Networks

Low- and Mixed-Precision Inference Accelerators

Deep Learning Training on the Edge with Low-Precision Posits

AMED: Automatic Mixed-Precision Quantization for Edge Devices

Cheetah: Mixed Low-Precision Hardware & Software Co-Design Framework for DNNs on the Edge

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Neural Precision Polarization: Simplifying Neural Network Inference with Dual-Level Precision

On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks

Efficient Low-Bit Neural Network with Memristor-Based Reconfigurable Circuits

PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit

ADEPNET: A Dynamic-Precision Efficient Posit Multiplier for Neural Networks

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks

Optimizing Artificial Neural Networks to Minimize Arithmetic Errors in Stochastic Computing Implementations

Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference

Low-Precision Floating-Point Schemes for Neural Network Training