Abstract: Neural networks can be drastically shrunk in size by removing redundant parameters. While crucial for deployment on resource-constrained hardware, compression often comes with a severe drop in accuracy and a lack of adversarial robustness. Despite recent advances, counteracting both aspects has only succeeded for moderate compression rates so far. We propose a novel method, HARP, that copes with aggressive pruning significantly better than prior work. For this, we consider the network holistically. We learn a global compression strategy that optimizes how many parameters (compression rate) and which parameters (scoring connections) to prune specific to each layer individually. Our method fine-tunes an existing model with dynamic regularization that follows a step-wise incremental function balancing the different objectives. It starts by favoring robustness before shifting focus to reaching the target compression rate and only then handles the objectives equally. The learned compression strategies allow us to maintain the pre-trained model's natural accuracy and its adversarial robustness for a reduction by 99% of the network's original size. Moreover, we observe a crucial influence of non-uniform compression across layers.

What problem does this paper attempt to address?

The problem this paper addresses is how to maintain a model's natural accuracy and adversarial robustness when a neural network is pruned aggressively (i.e., redundant parameters are removed) to fit resource-constrained hardware. Existing pruning methods often lose much of the model's accuracy and of its robustness against adversarial attacks at high compression. The paper proposes a new method, HARP (Holistic Adversarially Robust Pruning), which aims to overcome these problems with a global, non-uniform pruning strategy.

### Main Problems and Solutions

1. **Problems**:
   - **Accuracy Decline**: When a neural network is pruned aggressively, its natural accuracy drops significantly.
   - **Lack of Adversarial Robustness**: The pruned model's robustness against adversarial attacks is also weakened.
   - **Limitations of Existing Methods**: Despite recent progress, existing pruning methods maintain accuracy and robustness simultaneously only at moderate compression rates and are not effective at high compression rates.
2. **Solutions**:
   - **HARP Method**: This method considers the entire network and learns a global pruning strategy that optimizes, for each layer, both the compression rate and the parameters to prune. Specifically, it includes:
     - **Non-uniform Compression Strategy**: Allows different layers to have different compression rates instead of using the same compression rate for all layers.
     - **Dynamic Regularization**: Adjusts the balance between the objectives step by step during fine-tuning: it first focuses on robustness, then shifts focus to reaching the target compression rate, and finally weights the two equally.
     - **Learning Pruning Masks**: Determines which parameters should be pruned by introducing learnable compression quotas and connection importance scores.
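To make the non-uniform strategy concrete, the following minimal sketch compares a uniform and a non-uniform allocation that reach the same overall budget. It assumes that a compression rate means the fraction of weights kept; the function name `global_compression_rate`, the three layer sizes, and all rates are hypothetical values chosen for illustration, not figures from the paper.

```python
def global_compression_rate(layer_sizes, layer_rates):
    """Fraction of all weights kept, given the kept fraction of each layer.

    Assumption: a rate of 0.01 means 1% of that layer's weights survive.
    """
    if len(layer_sizes) != len(layer_rates):
        raise ValueError("one rate per layer is required")
    kept = sum(n * a for n, a in zip(layer_sizes, layer_rates))
    return kept / sum(layer_sizes)


# Hypothetical network: number of weights in each of three layers.
layer_sizes = [1_000, 100_000, 10_000]

# Uniform strategy: every layer keeps the same fraction.
uniform = [0.01, 0.01, 0.01]

# Non-uniform strategy: the small first layer keeps far more,
# the large middle layer keeps less.
non_uniform = [0.5, 0.005, 0.011]

print(global_compression_rate(layer_sizes, uniform))
print(global_compression_rate(layer_sizes, non_uniform))
```

In exact arithmetic both allocations keep 1,110 of 111,000 weights, i.e. 1% of the network, even though the per-layer rates differ by two orders of magnitude. This is the freedom a global strategy can exploit.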
### Formula Representation

- **Compression-control Loss Function**:
  \[ L_{\text{hw}}(\hat{\theta}, a_t) := \max\left\{\frac{\Theta_{\neq 0}}{a_t \cdot \Theta} - 1,\ 0\right\} \]
  where $\Theta_{\neq 0}$ is the number of non-zero weights currently retained, $\Theta$ is the total number of weights, and $a_t$ is the target compression rate. The loss is zero once the retained fraction $\Theta_{\neq 0} / \Theta$ is at or below $a_t$.
- **Layer Compression Rate**:
  \[ a^{(l)} = g(r^{(l)}) = (1 - a_{\min}) \cdot \text{sig}(r^{(l)}) + a_{\min} \]
  where $r^{(l)}$ is a trainable compression quota, $g$ is an activation function, and $\text{sig}$ is the sigmoid function, so that $a^{(l)}$ always lies between $a_{\min}$ and 1.
- **Binary Pruning Mask**:
  \[ M^{(l)} := \left(1_{s > P(\alpha^{(l)}, S^{(l)})}\right) \]
  where $s$ ranges over the connection importance scores $S^{(l)}$ of layer $l$, $P(\alpha^{(l)}, S^{(l)})$ is a function that determines the pruning threshold, and $\alpha^{(l)} = 1 - a^{(l)}$.

With these components, HARP maintains the model's natural accuracy and adversarial robustness under aggressive compression, including a reduction by 99% of the network's original size.
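The three formulas above can be tried out in a few lines of NumPy. This is a sketch under stated assumptions, not the authors' implementation: the threshold function $P$ is taken to be the $\alpha$-quantile of a layer's scores (the summary only says it determines the threshold), the function names `layer_rate`, `pruning_mask`, and `hw_loss` are hypothetical, and the scores are a toy array rather than learned values.

```python
import numpy as np


def layer_rate(r, a_min):
    """a = (1 - a_min) * sig(r) + a_min, which lies strictly between a_min and 1."""
    return (1.0 - a_min) / (1.0 + np.exp(-r)) + a_min


def pruning_mask(scores, a):
    """Boolean mask keeping scores strictly above the threshold P(alpha, S).

    Assumption: P is the alpha-quantile of the scores, with alpha = 1 - a.
    """
    alpha = 1.0 - a
    threshold = np.quantile(scores, alpha)
    return scores > threshold


def hw_loss(masks, a_t):
    """max(nonzero / (a_t * total) - 1, 0), counted over all layer masks."""
    nonzero = sum(int(m.sum()) for m in masks)
    total = sum(m.size for m in masks)
    return max(nonzero / (a_t * total) - 1.0, 0.0)


# Toy layer with 100 connections whose scores are simply 0..99.
scores = np.arange(100, dtype=float)

a = layer_rate(r=0.0, a_min=0.1)     # quota r = 0 gives the midpoint of (a_min, 1)
mask = pruning_mask(scores, a)       # keeps roughly a fraction `a` of the scores
loss = hw_loss([mask], a_t=0.1)      # positive while more than 10% is retained

print(a, int(mask.sum()), loss)
```

Because the loss stays positive until the retained fraction drops to the target, minimizing it pushes the quotas $r^{(l)}$ toward smaller layer rates, while `a_min` keeps any single layer from being pruned away completely.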

Holistic Adversarially Robust Pruning

Class-Aware Pruning for Efficient Neural Networks

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning.

Editorial for Pattern Recognition Letters Special Issue on Face-Based Emotion Understanding

Pruning at a Glance: Global Neural Pruning for Model Compression

Adversarial Robustness Vs. Model Compression, or Both?

AACP: Model Compression by Accurate and Automatic Channel Pruning.

Pruning in the Face of Adversaries

Quantisation and Pruning for Neural Network Compression and Regularisation

Automated Model Compression by Jointly Applied Pruning and Quantization

Anonymous Model Pruning for Compressing Deep Neural Networks

COP: Customized Deep Model Compression Via Regularized Correlation-Based Filter-Level Pruning.

Conditional Automated Channel Pruning for Deep Neural Networks

Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets

Network Automatic Pruning: Start NAP and Take a Nap

AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

Optimization based Layer-wise Magnitude-based Pruning for DNN Compression

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

Towards Hardware-Specific Automatic Compression of Neural Networks

LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch