Progressive Channel-Shrinking Network

Jianhong Pan, Siyuan Yang, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Zhipeng Fan, Jun Liu
2023-04-01
Abstract: Currently, salience-based channel pruning makes continuous breakthroughs in network compression. In the realization, the salience mechanism is used as a metric of channel salience to guide pruning. Therefore, salience-based channel pruning can dynamically adjust the channel width at run-time, which provides a flexible pruning scheme. However, there are two problems emerging: a gating function is often needed to truncate the specific salience entries to zero, which destabilizes the forward propagation; dynamic architecture brings more cost for indexing in inference which bottlenecks the inference speed. In this paper, we propose a Progressive Channel-Shrinking (PCS) method to compress the selected salience entries at run-time instead of roughly approximating them to zero. We also propose a Running Shrinking Policy to provide a testing-static pruning scheme that can reduce the memory access cost for filter indexing. We evaluate our method on ImageNet and CIFAR10 datasets over two prevalent networks: ResNet and VGG, and demonstrate that our PCS outperforms all baselines and achieves state-of-the-art in terms of compression-performance tradeoff. Moreover, we observe a significant and practical acceleration of inference.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper targets two weaknesses of current saliency-based channel pruning methods, which adjust the channel width dynamically at runtime:

1. **Sudden pruning operation**: The saliency vector rarely contains exact zeros, so disabling a channel's output (i.e., pruning the channel) usually requires a gate or step function that quantizes or truncates some saliency entries to zero. This abrupt pruning hurts network performance: because the pruning scheme depends on the input, it keeps changing from sample to sample during training, which makes training unstable.
2. **Inefficient inference**: Because the pruning scheme is not fixed across input samples, every inference requires a large number of channel-indexing operations. These increase the memory access cost (MAC) and therefore slow down inference.

To solve these problems, the paper proposes the Progressive Channel-Shrinking (PCS) method, which has two main components:

1. **Progressive shrinking strategy**: A continuous, differentiable saliency generator avoids the back-propagation problem posed by non-differentiable gate or step functions. Instead of truncating the saliency entries of relatively low-saliency channels directly to zero, PCS shrinks them gradually, which avoids the training instability caused by sudden pruning.
2. **Running Shrinking Policy**: This policy makes the pruning scheme identical for all samples, so the disabled channels can be removed from the network after training and no indexing operations are needed during inference. Avoiding this extensive weight indexing significantly improves inference speed.
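The contrast between abrupt truncation and progressive shrinking can be illustrated with a small sketch. It assumes that saliency is a sigmoid of per-channel logits, that the set of channels to shrink is already given, and that the shrinking factor follows a hypothetical linear schedule (`shrink_factor`); the paper's actual saliency generator and shrinking rule are not reproduced here. `hard_gate` stands in for the step-function pruning that PCS is designed to replace.

```python
import math
from typing import List, Sequence, Set


def saliency(logits: Sequence[float]) -> List[float]:
    """Continuous, differentiable per-channel saliency in (0, 1) via a sigmoid (assumed form)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def hard_gate(sal: Sequence[float], keep: int) -> List[float]:
    """Step-function pruning: keep the top-`keep` entries and zero the rest at once."""
    ranked = sorted(range(len(sal)), key=lambda i: sal[i], reverse=True)
    kept = set(ranked[:keep])
    return [s if i in kept else 0.0 for i, s in enumerate(sal)]


def shrink_factor(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: 1.0 at step 0, reaching 0.0 at `total_steps`."""
    return max(0.0, 1.0 - step / total_steps)


def progressive_shrink(sal: Sequence[float], selected: Set[int],
                       step: int, total_steps: int) -> List[float]:
    """Scale only the selected low-saliency entries by a factor that decays over training."""
    f = shrink_factor(step, total_steps)
    return [s * f if i in selected else s for i, s in enumerate(sal)]


if __name__ == "__main__":
    # Two inputs whose channel-1 and channel-2 logits differ by only 0.01.
    a = [2.0, 0.51, 0.50, -1.0]
    b = [2.0, 0.50, 0.51, -1.0]
    print(hard_gate(saliency(a), keep=2))  # channels 2 and 3 zeroed
    print(hard_gate(saliency(b), keep=2))  # channels 1 and 3 zeroed: channel 1 jumps to 0
    s = saliency(a)
    for step in (0, 5, 10):
        # Channels 2 and 3 fade out over training instead of vanishing in one step.
        print(step, progressive_shrink(s, {2, 3}, step, total_steps=10))
```

Under this schedule, a shrinking entry changes by at most its saliency times `1 / total_steps` per step, whereas the hard gate moves channel 1 from about 0.62 to 0 in response to a 0.01 change in the input.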
With these two components, the paper aims for more stable training and faster inference, together with a better trade-off between compression rate and model performance. Experiments on ImageNet and CIFAR10 show that PCS outperforms existing channel pruning methods on ResNet and VGG.
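The inference-efficiency argument rests on one property: if every input shares the same pruning scheme, the disabled channels can be cut out of the weights once, after training. The minimal sketch below uses a toy two-layer fully connected block as a stand-in for convolutional filters. The weights and the fixed `mask` are hypothetical, and how PCS arrives at that mask is not reproduced here. The surviving mask values are folded into the next layer's weights, which is exact because `W2 @ (mask * h) == (W2 * mask) @ h`.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, t) for t in v]


def masked_forward(w1: Matrix, w2: Matrix, x: Sequence[float],
                   mask: Sequence[float]) -> List[float]:
    """Masked forward: compute every hidden channel, then scale it by the channel mask.

    Disabled channels (mask == 0) are still computed and then discarded.
    """
    hidden = [m * h for m, h in zip(mask, relu(matvec(w1, x)))]
    return matvec(w2, hidden)


def remove_disabled(w1: Matrix, w2: Matrix,
                    mask: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """One-off surgery after training, valid only because `mask` is shared by all inputs.

    Drops the rows of w1 and the columns of w2 for disabled channels, and folds the
    surviving mask values into w2 so no mask or index lookup remains at inference.
    """
    keep = [i for i, m in enumerate(mask) if m != 0.0]
    w1_small = [list(w1[i]) for i in keep]
    w2_small = [[row[i] * mask[i] for i in keep] for row in w2]
    return w1_small, w2_small


def static_forward(w1_small: Matrix, w2_small: Matrix,
                   x: Sequence[float]) -> List[float]:
    """Inference with smaller dense layers: no masking and no per-sample indexing."""
    return matvec(w2_small, relu(matvec(w1_small, x)))


if __name__ == "__main__":
    w1 = [[0.5, -1.0], [1.0, 1.0], [-0.3, 0.8], [2.0, 0.1]]  # 4 hidden channels, 2 inputs
    w2 = [[1.0, 0.5, -1.0, 0.2], [0.0, 1.0, 1.0, -0.5]]     # 2 outputs
    mask = [1.0, 0.0, 0.7, 0.0]  # hypothetical fixed mask: channels 1 and 3 disabled
    w1_s, w2_s = remove_disabled(w1, w2, mask)
    x = [0.4, -1.2]
    print(masked_forward(w1, w2, x, mask))
    print(static_forward(w1_s, w2_s, x))  # equal up to float rounding, half the channels
```

If the mask instead varied per input, the row and column selection in `remove_disabled` would have to be repeated as an index gather for every sample, which is the memory access cost that a test-time static scheme avoids.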