Abstract:Adder Neural Network (AdderNet) provides a new way for developing energy-efficient neural networks by replacing the expensive multiplications in convolution with cheaper additions (i.e.l1-norm). To achieve higher hardware efficiency, it is necessary to further study the low-bit quantization of AdderNet. Due to the limitation that the commutative law in multiplication does not hold in l1-norm, the well-established quantization methods on convolutional networks cannot be applied on AdderNets. Thus, the existing AdderNet quantization techniques propose to use only one shared scale to quantize both the weights and activations simultaneously. Admittedly, such an approach can keep the commutative law in the l1-norm quantization process, while the accuracy drop after low-bit quantization cannot be ignored. To this end, we first thoroughly analyze the difference on distributions of weights and activations in AdderNet and then propose a new quantization algorithm by redistributing the weights and the activations. Specifically, the pre-trained full-precision weights in different kernels are clustered into different groups, then the intra-group sharing and inter-group independent scales can be adopted. To further compensate the accuracy drop caused by the distribution difference, we then develop a lossless range clamp scheme for weights and a simple yet effective outliers clamp strategy for activations. Thus, the functionality of full-precision weights and the representation ability of full-precision activations can be fully preserved. The effectiveness of the proposed quantization method for AdderNet is well verified on several benchmarks, e.g., our 4-bit post-training quantized adder ResNet-18 achieves an 66.5% top-1 accuracy on the ImageNet with comparable energy efficiency, which is about 8.5% higher than that of the previous AdderNet quantization methods.

Improving Lightweight AdderNet via Distillation from l 2 to l 1 -Norm

Improving Lightweight AdderNet via Distillation From ℓ2 to ℓ1-norm

DCCD: Reducing Neural Network Redundancy Via Distillation

Towards Stable and Robust AdderNets

LaSNN: Layer-wise ANN-to-SNN Distillation for Effective and Efficient Training in Deep Spiking Neural Networks

Redistribution of Weights and Activations for AdderNet Quantization

Training Compact DNNs with l 1 / 2 Regularization

FixNorm: Dissecting Weight Decay for Training Deep Neural Networks

IDSNN: Towards High-Performance and Low-Latency SNN Training Via Initialization and Distillation.

L1 -Norm Batch Normalization for Efficient Training of Deep Neural Networks

Self-Distillation: Towards Efficient and Compact Neural Networks

SPARSE DEEP NEURAL NETWORKS USING <i>L</i><sub>1,</sub>-WEIGHT NORMALIZATION

LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models

Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search

Learning Sparse Neural Networks through L0 Regularization

Layer-Wise Training To Create Efficient Convolutional Neural Networks

Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Learning Efficient Convolutional Networks Through Network Slimming.

LDD: High-Precision Training of Deep Spiking Neural Network Transformers Guided by an Artificial Neural Network