A Feature-map Discriminant Perspective for Pruning Deep Neural Networks

Zejiang Hou, Sun-Yuan Kung
DOI: https://doi.org/10.48550/arXiv.2005.13796
2020-05-28
Abstract: Network pruning has become the de facto tool to accelerate deep neural networks for mobile and edge applications. Recently, feature-map discriminant based channel pruning has shown promising results, as it aligns well with the CNN objective of differentiating multiple classes and offers better interpretability of the pruning decision. However, existing discriminant-based methods are challenged by computation inefficiency, as there is a lack of theoretical guidance on quantifying the feature-map discriminant power. In this paper, we present a new mathematical formulation to accurately and efficiently quantify the feature-map discriminativeness, which gives rise to a novel criterion, Discriminant Information (DI). We analyze the theoretical property of DI, specifically the non-decreasing property, that makes DI a valid selection criterion. DI-based pruning removes channels with minimum influence to DI value, as they contain little information regarding to the discriminant power. The versatility of DI criterion also enables an intra-layer mixed precision quantization to further compress the network. Moreover, we propose a DI-based greedy pruning algorithm and structure distillation technique to automatically decide the pruned structure that satisfies certain resource budget, which is a common requirement in reality. Extensive experiments demonstrate the effectiveness of our method: our pruned ResNet50 on ImageNet achieves 44% FLOPs reduction without any Top-1 accuracy loss compared to unpruned model.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper asks how pruning can reduce the computational cost and model size of deep neural networks (DNNs) without degrading their accuracy, so that they run efficiently on resource-constrained mobile and edge devices. Specifically, the authors focus on channel pruning in convolutional neural networks (CNNs) and propose a new way to quantify the discriminative power of feature maps so that pruning decisions are better guided.

### Problem Background

Deep CNNs have achieved remarkable success on computer vision tasks. However, these models usually have a large number of parameters and a very high computational cost, which makes them difficult to deploy on resource-constrained mobile devices or embedded systems. Researchers have proposed various model compression methods to address this. A commonly used one is **channel pruning**: removing redundant feature maps together with the convolutional filters that produce them, which reduces both the computation and the storage the model requires.

### Existing Challenges

Existing discriminant-based channel pruning methods have shown promising results, but they still face the following challenges:

1. **Low computational efficiency**: There is little theoretical guidance on how to quantify the discriminative power of feature maps, so existing methods are expensive to compute.
2. **Dependence on additional loss functions**: Some methods introduce auxiliary loss functions for retraining, which increases the computational cost and the amount of manual intervention.
3. **Poor fit to multi-class problems**: Many classic discriminant measures designed for binary classification perform poorly when applied directly to multi-class scenarios.
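The channel pruning operation described under Problem Background can be sketched on raw weight tensors. This is a minimal illustration, not the authors' code: the function name `prune_channels`, the `(out_channels, in_channels, kH, kW)` weight layout, and the hand-picked `keep` indices are all assumptions made for the example.

```python
import numpy as np

def prune_channels(w_curr, w_next, keep):
    """Keep only the output channels listed in `keep` for one conv layer.

    w_curr: (out_channels, in_channels, kH, kW) weights of the pruned layer
    w_next: (next_out, out_channels, kH, kW) weights of the following layer
    keep:   indices of the channels (feature maps) to retain (hypothetical input)
    """
    keep = np.asarray(sorted(keep))
    # Dropping a feature map removes the filter that produces it ...
    w_curr_pruned = w_curr[keep]
    # ... and the matching input slice of every filter in the next layer.
    w_next_pruned = w_next[:, keep]
    return w_curr_pruned, w_next_pruned

rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 3, 3, 3))    # layer with 8 output channels
w2 = rng.normal(size=(16, 8, 3, 3))   # next layer consumes those 8 channels
p1, p2 = prune_channels(w1, w2, keep=[0, 2, 5, 7])
print(p1.shape, p2.shape)  # (4, 3, 3, 3) (16, 4, 3, 3)
```

Removing a channel shrinks two tensors at once, which is why channel pruning reduces both computation and storage without requiring sparse kernels.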
### Core Contributions of the Paper

To address these challenges, the paper proposes a new measure of the discriminative power of feature maps, **Discriminant Information (DI)**. Its main features are:

- **Theoretical basis**: The mathematical expression of DI is derived from the perspectives of discriminant analysis and predictor learning, and it is proved to be non-decreasing, which makes it a valid selection criterion.
- **High efficiency**: DI is data-dependent, but experiments show that it is stable and robust to the input sample distribution, so only a small number of training samples is needed to estimate channel importance accurately.
- **No additional loss functions**: Unlike existing methods, DI does not require extra loss functions or constraints, which simplifies the pruning process.
- **Resource-budget awareness**: The DI-based greedy pruning algorithm automatically determines a pruned structure that meets a given resource budget (such as FLOPs), without repeated rounds of pruning and fine-tuning.

### Experimental Results

DI-based pruning performs well across several datasets (CIFAR10/100, ImageNet, CUB-200) and network architectures (VGG, ResNet, MobileNet). For example, on ImageNet the pruned ResNet50 reduces FLOPs by 44% with no loss in Top-1 accuracy relative to the unpruned model. DI also supports intra-layer mixed-precision quantization, which compresses the model further.

### Summary

The main goal of the paper is an efficient and theoretically rigorous way to quantify the discriminative power of feature maps and use it to guide channel pruning in CNNs. With DI, the authors improve both the efficiency and the effectiveness of pruning, while avoiding the auxiliary loss functions and repeated prune-and-fine-tune cycles that earlier methods rely on.
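The idea of greedily removing the least discriminative channels until a resource budget is met can be illustrated on a toy, single-layer example. The text above does not give the DI formula, so this sketch substitutes a generic Fisher-style score, trace((S_w + rho·I)⁻¹ S_b), as a stand-in. The names `discriminant_score` and `greedy_prune`, the per-channel cost list, and the synthetic data are likewise assumptions, and the paper's actual algorithm may differ.

```python
import numpy as np

def discriminant_score(x, y, rho=1e-3):
    """Stand-in score trace((S_w + rho*I)^-1 S_b); NOT the paper's DI formula."""
    d = x.shape[1]
    mean = x.mean(axis=0)
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        xc = x[y == c]
        mc = xc.mean(axis=0)
        centered = xc - mc
        s_w += centered.T @ centered
        diff = (mc - mean)[:, None]
        s_b += len(xc) * (diff @ diff.T)
    return float(np.trace(np.linalg.solve(s_w + rho * np.eye(d), s_b)))

def greedy_prune(x, y, cost_per_channel, budget):
    """Repeatedly drop the channel whose removal lowers the score least,
    until the total cost of the kept channels fits the budget."""
    kept = list(range(x.shape[1]))
    while sum(cost_per_channel[k] for k in kept) > budget and len(kept) > 1:
        # Score of the remaining set after removing each candidate channel.
        scores = [
            discriminant_score(x[:, [k for k in kept if k != j]], y)
            for j in kept
        ]
        kept.pop(int(np.argmax(scores)))
    return kept

# Synthetic per-channel features (e.g. pooled feature maps): 3 classes, 6 channels.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 100)
x = rng.normal(size=(300, 6))
x[:, 0] += 5.0 * y          # channel 0 carries class information
x[:, 3] += 5.0 * (y == 1)   # channel 3 carries complementary class information
kept = greedy_prune(x, y, cost_per_channel=[1.0] * 6, budget=2.0)
print(kept)
```

In this toy setup each channel has unit cost, so the budget is simply the number of channels to keep. A real FLOPs budget would assign each channel a cost derived from its layer's shape.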