Improving Lightweight AdderNet via Distillation from l<sub>2</sub> to l<sub>1</sub>-Norm

Minjing Dong, Xinghao Chen, Yunhe Wang, Chang Xu
DOI: https://doi.org/10.1109/tip.2023.3318940
IF: 10.6
2023-01-01
IEEE Transactions on Image Processing
Abstract: To achieve efficient inference with a hardware-friendly design, Adder Neural Networks (ANNs) have been proposed to replace expensive multiplication operations in Convolutional Neural Networks (CNNs) with cheap additions by using the l<sub>1</sub>-norm for similarity measurement instead of cosine distance. However, we observe that the gap between CNNs and ANNs widens as the number of parameters is reduced, and that existing algorithms cannot eliminate it. In this paper, we present a simple yet effective Norm-Guided Distillation (NGD) method that lets l<sub>1</sub>-norm ANNs learn superior performance from l<sub>2</sub>-norm ANNs. Although CNNs achieve accuracy similar to that of l<sub>2</sub>-norm ANNs, the clustering behavior based on l<sub>2</sub>-distance can be learned more easily by l<sub>1</sub>-norm ANNs than the cross-correlation used in CNNs. The features in l<sub>2</sub>-norm ANNs are encouraged to achieve intra-class centralization and inter-class decentralization to amplify this advantage. Furthermore, the roughly estimated gradients in vanilla ANNs are replaced with a progressive approximation from the l<sub>2</sub>-norm to the l<sub>1</sub>-norm so that a more accurate optimization can be achieved. Extensive evaluations on several benchmarks demonstrate the effectiveness of NGD on lightweight networks. For example, our method improves the ANN by 10.43% with 0.25× GhostNet on CIFAR-100 and by 3.1% with 1.0× GhostNet on ImageNet.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
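The similarity measures contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy values, and in particular the linear schedule for the exponent p are hypothetical, chosen only to show how a response based on |x - w|<sup>p</sup> moves from an l<sub>2</sub>-style measure (p = 2) to the l<sub>1</sub> adder response (p = 1). The abstract does not specify the exact form of the progressive approximation.

```python
def cross_correlation(patch, weights):
    """CNN-style response: sum of elementwise products (needs multiplications)."""
    return sum(x * w for x, w in zip(patch, weights))


def lp_response(patch, weights, p):
    """Negative sum of |x - w|**p.

    p = 1 gives the l1 adder response (in hardware this needs only
    subtraction, absolute value, and addition); p = 2 gives the negative
    squared l2 distance. The ** operator is used here for illustration only.
    """
    return -sum(abs(x - w) ** p for x, w in zip(patch, weights))


def p_schedule(step, total_steps, p_start=2.0, p_end=1.0):
    """Hypothetical linear anneal of p from p_start to p_end (an assumption)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


# Toy input patch and filter, flattened to 1-D for simplicity.
patch = [1.0, -2.0, 0.5]
weights = [0.5, -1.0, 0.5]

cc = cross_correlation(patch, weights)  # 0.5 + 2.0 + 0.25
l1 = lp_response(patch, weights, 1)     # -(0.5 + 1.0 + 0.0)
l2 = lp_response(patch, weights, 2)     # -(0.25 + 1.0 + 0.0)

print(cc, l1, l2)
print([p_schedule(s, 10) for s in (0, 5, 10)])
```

Both distance-based responses peak at 0 when the patch equals the filter, which is one way to read the abstract's claim that l<sub>2</sub>-distance behavior transfers to l<sub>1</sub>-norm ANNs more readily than cross-correlation does.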