Abstract: Network pruning is an effective approach to reducing network complexity with an acceptable compromise in performance. Existing studies achieve sparsity in neural networks via time-consuming weight training or complex searching on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing sparse sub-networks, termed "lottery jackpots", exist in pre-trained models with unexpanded width and can be found without any weight training. Our lottery jackpots are supported by both empirical and theoretical results. For example, we obtain a lottery jackpot that has only 10% of the parameters yet still reaches the performance of the original dense VGGNet-19 on CIFAR-10, without any modification to the pre-trained weights. Furthermore, we improve the efficiency of searching for lottery jackpots from two perspectives. First, we observe that the sparse masks derived from many existing pruning criteria overlap heavily with the searched mask of our lottery jackpot, and that magnitude-based pruning yields the mask most similar to ours. Following this insight, we initialize our sparse mask using magnitude-based pruning, which reduces the cost of the lottery jackpot search by at least 3× while achieving comparable or even better performance. Second, we conduct an in-depth analysis of the searching process for lottery jackpots. Our theoretical result suggests that the decrease in training loss during weight searching can be disturbed by the dependency between weights in modern networks. To mitigate this, we propose a novel short restriction method that restricts mask changes that may have a negative impact on the training loss, which leads to faster convergence and reduced oscillation when searching for lottery jackpots. Consequently, our searched lottery jackpot removes 90% of the weights in ResNet-50 while easily obtaining more than 70% top-1 accuracy on ImageNet using only 5 searching epochs.
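To make the magnitude-based mask initialization concrete, the following is a minimal sketch in plain Python. It is an illustration rather than the authors' implementation: the function names `magnitude_mask`, `apply_mask`, and `mask_overlap` are hypothetical, the weights are random stand-ins for a pre-trained layer, and the mask search itself is not shown.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    weights  : flat list of floats (stand-in for frozen pre-trained weights)
    sparsity : fraction of weights to remove, e.g. 0.9 removes 90%
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_keep = len(weights) - int(round(sparsity * len(weights)))
    # Indices sorted by descending absolute value; ties broken by index.
    order = sorted(range(len(weights)), key=lambda i: (-abs(weights[i]), i))
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Element-wise product; the original weights list is left untouched."""
    return [w * m for w, m in zip(weights, mask)]


def mask_overlap(mask_a, mask_b):
    """Fraction of positions kept by mask_a that are also kept by mask_b."""
    kept_a = sum(mask_a)
    if kept_a == 0:
        return 0.0
    both = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    return both / kept_a


# Hypothetical stand-in for one layer of pre-trained weights.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Initial mask at 90% sparsity; a search would start from this mask
# and update only the mask, never the weights.
mask = magnitude_mask(weights, sparsity=0.9)
sparse_weights = apply_mask(weights, mask)
```

Here `mask_overlap` is one simple way to quantify how similar two masks are, for instance an initial magnitude mask and a mask found by search; the overlap measure used in the paper may differ.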

Robust Binary Models by Pruning Randomly-initialized Networks

Batch Normalization Assisted Adversarial Pruning: Towards Lightweight, Sparse and Robust Models

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Shallow Binary Features Enhance the Robustness of Deep Convolutional Neural Networks

Beyond Pruning Criteria: The Dominant Role of Fine-Tuning and Adaptive Ratios in Neural Network Robustness

"Understanding Robustness Lottery": A Geometric Visual Comparative Analysis of Neural Network Pruning Approaches

Adversarial Robustness Vs. Model Compression, or Both?

The Search for Sparse, Robust Neural Networks

Pruning in the Face of Adversaries

Second Rethinking of Network Pruning in the Adversarial Setting

Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness

Adversarial Structured Neural Network Pruning

Sparse Binary Programming Method for Pruning of Randomly Initialized Neural Networks

Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models

No Free Prune: Information-Theoretic Barriers to Pruning at Initialization

Sparse DNNs with Improved Adversarial Robustness

Improving Model Robustness Against Adversarial Examples with Redundant Fully Connected Layer

Magnificent Minified Models

Lottery Jackpots Exist in Pre-trained Models

Efficient Weight Pruning using Pre-trained Lottery Jackpots

Studying the Consistency and Composability of Lottery Ticket Pruning Masks