Learning Adaptive Gradients for Binary Neural Networks

WANG Zi-wei, LU Ji-wen, ZHOU Jie
DOI: https://doi.org/10.12263/dzxb.20211084
2023-01-01
Abstract: Binary neural networks are widely employed in visual tasks because they accelerate computation and reduce storage compared with their floating-point counterparts. Because the quantizer is non-differentiable, continuous relaxation methods such as the straight-through estimator (STE) and Sigmoid have been proposed to approximate it during training. However, these methods cause: (1) gradient mismatch, due to the discrepancy between the quantizer and the relaxed function, and (2) gradient vanishing, due to activation saturation. Owing to the nature of quantization, gradient accuracy and gradient validity cannot both be achieved in binary neural networks at the same time. In this paper, we propose AdaBNN, which addresses gradient mismatch and gradient vanishing simultaneously by adaptively achieving the optimal trade-off between them. Specifically, we theoretically prove the contradiction between gradient accuracy and validity, and formulate an evaluation measure for the trade-off by comparing the norm of the relaxed gradient with its discrepancy from the true gradient. The binary neural networks are then trained effectively by changing the relaxation function according to this measure. Experiments on ImageNet show that our method increases top-1 classification accuracy by 17.1% compared with the widely adopted BNN.
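The tension between gradient mismatch and gradient vanishing described in the abstract can be illustrated with a small NumPy sketch. This is not the AdaBNN measure or training procedure, which the abstract does not specify. It assumes a sigmoid-type relaxation tanh(k·x) with a hypothetical sharpness parameter k, a hypothetical threshold eps below which a gradient is counted as vanished, and illustrative function names throughout.

```python
import numpy as np

def binarize(x):
    """Forward quantizer: sign(x), with sign(0) mapped to +1 (an assumption)."""
    return np.where(x >= 0, 1.0, -1.0)

def ste_grad(x, clip=1.0):
    """Straight-through estimator: gradient 1 where |x| <= clip, else 0."""
    return (np.abs(x) <= clip).astype(x.dtype)

def relaxed(x, k):
    """Smooth stand-in for the quantizer; k is a hypothetical sharpness knob."""
    return np.tanh(k * x)

def relaxed_grad(x, k):
    """Derivative of tanh(k * x) with respect to x."""
    return k * (1.0 - np.tanh(k * x) ** 2)

def tradeoff(x, k, eps=1e-3):
    """Return (mismatch, vanishing) for sharpness k.

    mismatch:  mean absolute gap between the quantizer and its relaxation.
    vanishing: fraction of inputs whose relaxed gradient falls below eps.
    Both definitions are illustrative, not the paper's evaluation measure.
    """
    mismatch = np.mean(np.abs(binarize(x) - relaxed(x, k)))
    vanishing = np.mean(relaxed_grad(x, k) < eps)
    return mismatch, vanishing

# Synthetic pre-activations; a sharper relaxation tracks sign(x) more closely
# but saturates on more inputs.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
for k in (1.0, 5.0, 25.0):
    m, v = tradeoff(x, k)
    print(f"k={k:5.1f}  mismatch={m:.4f}  vanishing fraction={v:.4f}")
```

Under these assumptions, raising k reduces the mismatch while increasing the share of saturated inputs, which is the kind of trade-off an adaptive choice of relaxation function would need to balance.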