Local-binarized very deep residual network for visual categorization

Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Qingming Huang
DOI: https://doi.org/10.1016/j.neucom.2020.11.041
IF: 6
2021-03-01
Neurocomputing
Abstract: Residual networks usually require many layers to achieve strong performance on complex visual categorization tasks such as pose estimation. However, increasing the number of layers places a heavy burden on training and forward inference and also leads to over-fitting. This paper proposes the local binary residual block (LBB) to improve very deep residual networks in terms of trainable parameters, FLOPs and accuracy. In each LBB, the 3×3 filters are binarized by sampling from a Bernoulli distribution under a sparsity constraint, an activation function produces the non-linear response, and the linear 1×1 filters are learned as real-valued weights. After stochastic binarized initialization, the 3×3 filters in an LBB need not be updated during training. Compared with the original model, this architecture reduces trainable parameters and FLOPs by at least 69.2% and 70.5%, respectively. The LBB is derived from three observations: 1) the activated responses of a standard k×k convolutional layer can be approximated by combining binarized k×k filters with 1×1 filters; 2) most of the computation in very deep residual networks is spent on 3×3 convolutions; and 3) 1×1 filters play an important role in cross-channel information integration. In addition, the LBB module is suitable for very deep network frameworks, including the stacked hourglass network and pyramid residual modules. Experiments are conducted on the MPII and LSP datasets for pose estimation; the CIFAR-10, CIFAR-100 and ImageNet datasets for object recognition; and the ECSSD, HKU-IS, PASCAL-S, DUT-OMRON and DUTS datasets for saliency detection. The results show that the model accelerates network training and inference with only a slight performance degradation.
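The abstract describes each LBB as fixed, sparse binary 3×3 filters, followed by an activation and learnable real-valued 1×1 filters. The Python sketch below illustrates that idea for a single layer. It rests on several assumptions that the abstract does not state. Binary weights take values in {-1, 0, +1}: each weight is non-zero with probability `sparsity`, and its sign comes from a fair Bernoulli draw, in the style of local binary CNNs. The activation is ReLU, and padding is "same". All names (`LocalBinaryLayer`, `sparse_binary_filters`, `standard_conv_params`, etc.) are hypothetical. The sketch uses plain lists so it runs without third-party libraries. It is not the authors' implementation, and it omits training, normalization and the residual connection.

```python
import random
from typing import List

# A feature map is stored as [channels][height][width] nested lists.
Tensor3 = List[List[List[float]]]


def sparse_binary_filters(num_filters: int, in_channels: int, k: int,
                          sparsity: float, seed: int = 0) -> List[Tensor3]:
    """Draw fixed k x k filters whose entries are -1, 0 or +1.

    Assumption: each weight is non-zero with probability `sparsity`, and a
    non-zero weight is +1 or -1 with equal (Bernoulli 0.5) probability.
    """
    rng = random.Random(seed)
    filters = []
    for _ in range(num_filters):
        f = [[[0.0] * k for _ in range(k)] for _ in range(in_channels)]
        for c in range(in_channels):
            for i in range(k):
                for j in range(k):
                    if rng.random() < sparsity:
                        f[c][i][j] = 1.0 if rng.random() < 0.5 else -1.0
        filters.append(f)
    return filters


def conv_same(x: Tensor3, filters: List[Tensor3]) -> Tensor3:
    """Stride-1, zero-padded cross-correlation (what CNN libraries call convolution)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(filters[0][0])
    pad = k // 2
    out = []
    for f in filters:
        plane = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for xx in range(w):
                s = 0.0
                for c in range(c_in):
                    for i in range(k):
                        for j in range(k):
                            yy, xj = y + i - pad, xx + j - pad
                            if 0 <= yy < h and 0 <= xj < w:
                                s += f[c][i][j] * x[c][yy][xj]
                plane[y][xx] = s
        out.append(plane)
    return out


def relu(x: Tensor3) -> Tensor3:
    """Element-wise ReLU, used here as the (assumed) non-linear activation."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in x]


def conv1x1(x: Tensor3, weights: List[List[float]]) -> Tensor3:
    """Real-valued 1x1 convolution: weights[q][m] mixes input channel m into output q."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wq[m] * x[m][y][xx] for m in range(len(x)))
              for xx in range(w)] for y in range(h)] for wq in weights]


class LocalBinaryLayer:
    """Hypothetical LBB-style layer: fixed binary k x k conv -> ReLU -> learnable 1x1 conv."""

    def __init__(self, in_ch: int, num_binary: int, out_ch: int,
                 k: int = 3, sparsity: float = 0.5, seed: int = 0):
        # Frozen after random initialization: never updated, so not counted as trainable.
        self.binary = sparse_binary_filters(num_binary, in_ch, k, sparsity, seed)
        # The only trainable weights in this sketch.
        rng = random.Random(seed + 1)
        self.w1x1 = [[rng.gauss(0.0, 0.1) for _ in range(num_binary)]
                     for _ in range(out_ch)]

    def forward(self, x: Tensor3) -> Tensor3:
        return conv1x1(relu(conv_same(x, self.binary)), self.w1x1)

    def trainable_params(self) -> int:
        return sum(len(row) for row in self.w1x1)


def standard_conv_params(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Weights of an ordinary k x k convolution (bias ignored), all trainable."""
    return in_ch * out_ch * k * k


if __name__ == "__main__":
    layer = LocalBinaryLayer(in_ch=2, num_binary=4, out_ch=3)
    x = [[[float(i + j) for j in range(5)] for i in range(5)] for _ in range(2)]
    y = layer.forward(x)
    print(len(y), len(y[0]), len(y[0][0]))                        # 3 5 5
    print(layer.trainable_params(), standard_conv_params(2, 3))   # 12 54
```

Under these assumptions, take a 3×3 layer with p input channels, m fixed binary filters and q outputs. A standard convolution trains 9·p·q weights, while the sketch trains only m·q. For p = q = m = 128, that is 147,456 versus 16,384, about 88.9% fewer for that single layer. A per-layer count like this ignores the rest of the network, so it is not directly comparable to the paper's whole-model figures.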
computer science, artificial intelligence
What problem does this paper attempt to address?