Structurally Prune Anything: Any Architecture, Any Framework, Any Time

Xun Wang, John Rachwan, Stephan Günnemann, Bertrand Charpentier
2024-03-03
Abstract: Neural network pruning serves as a critical technique for enhancing the efficiency of deep learning models. Unlike unstructured pruning, which only sets specific parameters to zero, structured pruning eliminates entire channels, thus yielding direct computational and storage benefits. However, the diverse patterns for coupling parameters, such as residual connections and group convolutions, the diverse deep learning frameworks, and the various time stages at which pruning can be performed make existing pruning methods less adaptable to different architectures, frameworks, and pruning criteria. To address this, we introduce Structurally Prune Anything (SPA), a versatile structured pruning framework that can prune neural networks with any architecture, from any framework, and at any stage of training. SPA leverages a standardized computational graph and ONNX representation to prune diverse neural network architectures without the need for manual intervention. SPA employs a group-level importance estimation method, which groups dependent computational operators, estimates their importance, and prunes unimportant coupled channels. This enables the transfer of various existing pruning criteria into a structured group style. As a result, SPA supports pruning at any time, either before training, after training with fine-tuning, or after training without fine-tuning. In the context of the latter, we introduce Optimal Brain SPA (OBSPA), an algorithm that achieves state-of-the-art pruning results needing neither fine-tuning nor calibration data. In extensive experiments, SPA achieves pruning performance competitive with the state of the art across various architectures, from popular frameworks, and at different pruning times.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses structured pruning for neural network compression, targeting three main challenges:

1. **Adapting to different model architectures**: Existing structured pruning methods are difficult to apply directly to arbitrary neural network architectures because they usually require case-by-case analysis of each architecture.
2. **Pruning at different stages of training**: Pruning can be performed before, during, or after training. However, most methods focus on post-training pruning (usually followed by fine-tuning to recover the performance lost to pruning), with less attention given to the other two scenarios.
3. **Cross-framework generality**: Existing pruning methods are often tied to the deep learning framework they were developed in, which restricts their generality and portability across frameworks.

To address these issues, the paper proposes "Structurally Prune Anything" (SPA), which has the following features:

- **Cross-framework pruning**: SPA handles models from different deep learning frameworks (such as PyTorch and TensorFlow) and achieves cross-framework compatibility by using the ONNX format to construct standardized computation graphs.
- **Adaptability to any architecture**: SPA introduces a four-step process that automatically identifies and prunes coupled channels, so it can be applied to any neural network architecture and can convert many existing pruning criteria into structured, group-level forms.
- **Pruning at any stage**: SPA supports pruning at different stages of training: pruning before training, pruning after training with fine-tuning, and pruning after training without fine-tuning. For the last scenario, the paper proposes Optimal Brain SPA (OBSPA), a pruning algorithm that achieves state-of-the-art performance without requiring fine-tuning or calibration data.
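To make the idea of pruning coupled channels concrete, here is a minimal NumPy sketch. It is not SPA's implementation: the function names, the L1-norm criterion, and the simple summation across tensors are all assumptions chosen for illustration. It assumes convolution weights laid out as `(out_channels, in_channels, kH, kW)` and a group that has already been identified, for example two convolutions whose outputs are added by a residual connection and a third convolution that consumes the sum.

```python
import numpy as np

def group_importance(out_coupled, in_coupled):
    """Score each shared channel by summing L1 norms over every coupled tensor.

    out_coupled: weights whose OUTPUT channel (axis 0) belongs to the group.
    in_coupled:  weights whose INPUT channel (axis 1) belongs to the group.
    The L1 criterion and plain summation are illustrative assumptions.
    """
    score = np.zeros(out_coupled[0].shape[0])
    for w in out_coupled:
        score += np.abs(w).sum(axis=(1, 2, 3))
    for w in in_coupled:
        score += np.abs(w).sum(axis=(0, 2, 3))
    return score

def prune_group(out_coupled, in_coupled, keep_ratio):
    """Keep the highest-scoring channels and slice every coupled tensor alike."""
    score = group_importance(out_coupled, in_coupled)
    n_keep = max(1, int(round(keep_ratio * score.size)))
    # Indices of the n_keep most important channels, in ascending order.
    keep = np.sort(np.argsort(score)[-n_keep:])
    pruned_out = [w[keep] for w in out_coupled]    # drop output channels
    pruned_in = [w[:, keep] for w in in_coupled]   # drop matching input channels
    return keep, pruned_out, pruned_in

# Hypothetical group: conv_a and conv_b feed a residual add, conv_c consumes it.
rng = np.random.default_rng(0)
conv_a = rng.normal(size=(8, 3, 3, 3))
conv_b = rng.normal(size=(8, 3, 3, 3))
conv_c = rng.normal(size=(16, 8, 3, 3))

keep, (conv_a_p, conv_b_p), (conv_c_p,) = prune_group(
    [conv_a, conv_b], [conv_c], keep_ratio=0.5
)
print(conv_a_p.shape, conv_b_p.shape, conv_c_p.shape)
```

The point of the sketch is that the same channel indices are removed from every tensor in the group, so the pruned layers still fit together. Scoring each layer separately could select different channels on the two branches of the residual addition and leave the network with mismatched shapes.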
Experimental results show that SPA successfully prunes models from various source frameworks (such as PyTorch, TensorFlow, MXNet, and JAX) while maintaining good performance. SPA also prunes a range of neural network architectures, including AlexNet, DenseNet-121, and EfficientNet-b0, which demonstrates its wide applicability.