Abstract: Binary Neural Networks~(BNNs) have been proven to be highly effective for deploying deep neural networks on mobile and embedded platforms. Most existing works focus on minimizing quantization errors, improving representation ability, or designing gradient approximations to alleviate gradient mismatch in BNNs, while leaving weight sign flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the efficiency of weight sign updates in BNNs. We observe that, for vanilla BNNs, over 50\% of the weights keep their signs unchanged during training, and these weights are not only distributed at the tails of the weight distribution but also universally present in the vicinity of zero. We refer to these weights as ``silent weights'', which slow down convergence and lead to a significant accuracy degradation. Theoretically, we reveal that this is due to the independence of the BNN gradient from the latent weight distribution. To address the issue, we propose Overcome Silent Weights~(OvSW). OvSW first employs Adaptive Gradient Scaling~(AGS) to establish a relationship between the gradient and the latent weight distribution, thereby improving the overall efficiency of weight sign updates. Additionally, we design Silence Awareness Decaying~(SAD) to automatically identify ``silent weights'' by tracking the weight flipping state, and apply an additional penalty to ``silent weights'' to facilitate their flipping. By efficiently updating weight signs, our method achieves faster convergence and state-of-the-art performance on the CIFAR10 and ImageNet1K datasets with various architectures. For example, OvSW obtains 61.6\% and 65.5\% top-1 accuracy on ImageNet1K using binarized ResNet18 and ResNet34 architectures, respectively. Code is available at \url{https://github.com/JingyangXiang/OvSW}.
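
The notion of a ``silent weight'' can be made concrete by counting sign flips of the latent weights across training steps. The following is only a minimal sketch of that bookkeeping and is not the authors' AGS or SAD implementation. The names `sign`, `FlipTracker`, and `silent_fraction` are hypothetical, and the convention that a latent weight of exactly zero binarizes to +1 is an assumption.

```python
def sign(x):
    """Binarize a latent weight: +1 for non-negative values, -1 otherwise.

    Mapping zero to +1 is an assumption made for this sketch.
    """
    return 1 if x >= 0 else -1


class FlipTracker:
    """Counts how often each latent weight changes sign across updates."""

    def __init__(self, weights):
        # Remember the initial sign of every latent weight.
        self.prev_signs = [sign(w) for w in weights]
        self.flip_counts = [0] * len(weights)

    def update(self, weights):
        """Record sign changes relative to the previously seen weights."""
        for i, w in enumerate(weights):
            s = sign(w)
            if s != self.prev_signs[i]:
                self.flip_counts[i] += 1
                self.prev_signs[i] = s

    def silent_fraction(self):
        """Fraction of weights whose sign has never flipped so far."""
        silent = sum(1 for c in self.flip_counts if c == 0)
        return silent / len(self.flip_counts)


# Toy usage with hand-written latent weights standing in for training steps.
tracker = FlipTracker([0.5, -0.01, 2.0])
tracker.update([0.4, 0.02, 1.9])    # only the second weight crosses zero
tracker.update([-0.1, 0.03, 1.8])   # now the first weight crosses zero
```

In this toy run the third weight, which sits far from zero, never flips, so one of the three weights is still silent after both updates. In a real training loop, `update` would be called on the flattened latent weights after each optimizer step.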

SATB-Nets: Training Deep Neural Networks with Segmented Asymmetric Ternary and Binary Weights

Asymmetric Ternary Networks

TBN: Convolutional Neural Network with Ternary Inputs and Binary Weights

Ternary Weight Networks

Deep Spiking Neural Networks with Binary Weights for Object Recognition

Trained Ternary Quantization

CSA-Net: An Adaptive Binary Neural Network and Application on Remote Sensing Image Classification

Sparsity-Control Ternary Weight Networks

TB-DNN: A Thin Binarized Deep Neural Network with High Accuracy

Training Binary Weight Networks via Semi-Binary Decomposition

Twin Network Augmentation: A Novel Training Strategy for Improved Spiking Neural Networks and Efficient Weight Quantization

Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network using Truncated Gaussian Approximation

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks

OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks

AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets

FATNN: Fast and Accurate Ternary Neural Networks

Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors

ALBSNN: ultra-low latency adaptive local binary spiking neural network with accuracy loss estimator

Training Compact Neural Networks with Binary Weights and Low Precision Activations

TernaryNet: faster deep model inference without GPUs for medical 3D segmentation using sparse and binary convolutions

Group Binary Weight Networks