Abstract: Recent advances in deep neural networks (DNNs) have promoted low-level vision applications in real-world scenarios, e.g., image enhancement and dehazing. Nevertheless, DNN-based methods face high computational and memory requirements, especially when deployed on real-world devices with limited resources. Quantization is an effective compression technique that significantly reduces computational and memory requirements by employing low-bit parameters and bit-wise operations. However, low-bit quantization for computational imaging (Q-Imaging) remains largely unexplored and usually suffers from a significant performance drop compared with its real-valued counterparts. In this work, through empirical analysis, we identify the main factors responsible for this performance drop: the large gradient estimation error of non-differentiable weight quantization methods, and the degeneration of activation information that accompanies activation quantization. To address these issues, we introduce a differentiable quantization search (DQS) method to learn the quantized weights and an information boosting module (IBM) for network activation quantization. Our DQS method treats the discrete weights in a quantized neural network as searchable variables and uses a differentiable approach to search for them accurately. Specifically, each weight is represented as a probability distribution over a set of discrete values. During training, these probabilities are optimized, and the values with the highest probabilities are chosen to construct the desired quantized network. Moreover, our IBM rectifies the activation distribution before quantization to maximize the self-information entropy, which retains the maximum information during the quantization process.
Extensive experiments across a range of image processing tasks, including enhancement, super-resolution, denoising and dehazing, validate the effectiveness of Q-Imaging and show superior performance compared with a variety of state-of-the-art quantization methods. In particular, Q-Imaging also generalizes well when used to build a detection network for the dark object detection task.
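
The search idea described in the abstract (one probability distribution per weight over a set of discrete values, optimized during training, followed by picking the most probable value) can be illustrated with a minimal, single-weight sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the candidate grid `LEVELS`, the softmax parameterization, the choice of minimizing the expected loss under the distribution, the toy regression data, and the names `search_quantized_weight` and `toy_loss`.

```python
import math

# Hypothetical grid of candidate quantized values (an assumption, not from the paper).
LEVELS = [-1.0, -0.5, 0.0, 0.5, 1.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search_quantized_weight(loss_fn, steps=200, lr=1.0):
    """Search one discrete weight by gradient descent on its logits.

    Assumed objective: the expected loss sum_k p_k * loss(level_k),
    where p = softmax(logits). Returns the most probable level and
    the final probabilities.
    """
    logits = [0.0] * len(LEVELS)                 # start from a uniform distribution
    level_losses = [loss_fn(v) for v in LEVELS]  # loss of each candidate value
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * l for p, l in zip(probs, level_losses))
        # Analytic gradient: d(expected)/d(logit_k) = p_k * (loss_k - expected)
        logits = [z - lr * p * (l - expected)
                  for z, p, l in zip(logits, probs, level_losses)]
    probs = softmax(logits)
    best = max(range(len(LEVELS)), key=probs.__getitem__)
    return LEVELS[best], probs


# Toy data generated by y = 0.45 * x; the real-valued optimum 0.45 is not on the grid.
xs = [1.0, 2.0, 3.0]
ys = [0.45 * x for x in xs]


def toy_loss(w):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


best, probs = search_quantized_weight(toy_loss)
print("selected level:", best)
print("probabilities:", [round(p, 4) for p in probs])
```

In this toy setting the probability mass concentrates on the grid value closest to the real-valued optimum, and the final argmax plays the role of "choosing the values with the highest probabilities". A full network would hold one such distribution per weight and backpropagate through the task loss; how the paper relaxes and trains these distributions is not specified in the abstract.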

Improving the Accuracy of Neural Networks in Analog Computing-in-memory Systems by a Generalized Quantization Method

Improving the accuracy of neural networks in analog computing-in-memory systems by analog weight.

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

Quantization of Deep Neural Networks to facilitate self-correction of weights on Phase Change Memory-based analog hardware

CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory Based Neural Network Accelerators

Accuracy and Resiliency of Analog Compute-in-Memory Inference Engines

4.6-Bit Quantization for Fast and Accurate Neural Network Inference on CPUs

Accelerating Neural Network Inference by Overflow Aware Quantization

A Variation-Aware Binary Neural Network Framework for Process Resilient In-Memory Computations

On the Accuracy of Analog Neural Network Inference Accelerators

Class-based Quantization for Neural Networks

CANET: Quantized Neural Network Inference With 8-bit Carry-Aware Accumulator

A 4-Bit Integer-Only Neural Network Quantization Method Based on Shift Batch Normalization

Scale-CIM: Precision-Scalable Computing-in-Memory for Energy-Efficient Quantized Neural Networks

Learning Accurate Low-bit Quantization towards Efficient Computational Imaging

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

PR-CIM: a Variation-Aware Binary-Neural-Network Framework for Process-Resilient Computation-in-memory

Activation Redistribution Based Hybrid Asymmetric Quantization Method of Neural Networks

Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function

Residual Quantization for Low Bit-Width Neural Networks.

On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks