Abstract: Efficiency of neural network inference is undeniably important at a time when commercial use of AI models increases daily. Node pruning is the art of removing computational units such as neurons, filters, attention heads, or even entire layers to significantly reduce inference time while retaining network performance. In this work, we propose the projection of unit activations to an orthogonal subspace in which there is no redundant activity and within which we may prune nodes while simultaneously recovering the impact of lost units via linear least squares. We identify that, for effective node pruning, this subspace must be constructed using a triangular transformation matrix, a transformation which is equivalent to unnormalized Gram-Schmidt orthogonalization. We furthermore show that the order in which units are orthogonalized can be optimised to maximally reduce node activations in our subspace and thereby form a better ranking of nodes. Finally, we leverage these orthogonal subspaces to automatically determine layer-wise pruning ratios based upon the relative scale of node activations in our subspace, equivalent to cumulative variance. Our proposed method reaches state of the art when pruning ImageNet-trained VGG-16 and rivals more complex state-of-the-art methods when pruning ResNet-50 networks across a range of pruning ratios.
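
The orthogonalization and recovery steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `unnormalized_gram_schmidt` and `prune_with_lls`, the fixed left-to-right unit order, and the use of squared residual norm as the node score are all assumptions made for illustration. The sketch builds orthogonal residual activations `Z` with a unit upper-triangular matrix `U` such that `X = Z @ U`, scores each unit by the activity left over after removing what earlier units already explain, and refits the next layer's weights on the kept units by linear least squares.

```python
import numpy as np


def unnormalized_gram_schmidt(X):
    """Orthogonalize the columns of X without normalizing them.

    Returns Z (mutually orthogonal columns) and a unit upper-triangular
    matrix U such that X = Z @ U. Column i of Z is the part of unit i's
    activity that cannot be linearly decoded from units 0..i-1.
    """
    X = np.asarray(X, dtype=float)
    n_units = X.shape[1]
    Z = np.zeros_like(X)
    U = np.eye(n_units)
    for i in range(n_units):
        z = X[:, i].copy()
        for j in range(i):
            denom = Z[:, j] @ Z[:, j]
            if denom > 1e-12:  # skip directions that are already (near) zero
                U[j, i] = (Z[:, j] @ X[:, i]) / denom
                z -= U[j, i] * Z[:, j]
        Z[:, i] = z
    return Z, U


def prune_with_lls(X, W, n_keep):
    """Keep the n_keep units with the largest residual activity (assumed score)
    and refit the outgoing weights W by linear least squares."""
    Z, _ = unnormalized_gram_schmidt(X)
    scores = (Z ** 2).sum(axis=0)            # squared norm per subspace unit
    keep = np.sort(np.argsort(scores)[-n_keep:])
    target = X @ W                           # original input to the next layer
    W_new, *_ = np.linalg.lstsq(X[:, keep], target, rcond=None)
    return keep, W_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    # Hypothetical redundant unit: an exact linear mix of units 0 and 1.
    X = np.column_stack([X, 2 * X[:, 0] - X[:, 1]])
    W = rng.normal(size=(4, 2))
    keep, W_new = prune_with_lls(X, W, n_keep=3)
    print("kept units:", keep)
    print("max reconstruction error:", np.abs(X[:, keep] @ W_new - X @ W).max())
```

In this toy setup the fourth unit carries no activity of its own in the subspace, so it is the one removed, and the refit weights reproduce the original layer output. The abstract also notes that the orthogonalization order can be optimised; the sketch does not attempt that.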

What problem does this paper attempt to address?

The core problem this paper addresses is the inference efficiency of neural networks, especially given the growing demand for AI models in commercial applications. Specifically, the authors focus on node pruning, which aims to significantly reduce inference time while maintaining network performance by removing computing units (such as neurons, filters, or attention heads).

### Main Problems and Solutions

1. **Improve Inference Efficiency**:
   - **Background**: As the complexity and scale of deep-learning models keep increasing, the computational resources required to train and run them have also increased substantially. This makes it impractical to run these models on general-purpose hardware.
   - **Objective**: Reduce the computational overhead of the model through pruning so that it can run efficiently on a wider range of hardware platforms.
2. **Node Pruning Methods**:
   - **Traditional Methods**: Traditional pruning methods usually decide which nodes to remove based on an importance score derived from the weights. This can degrade the performance of the pruned model because such scores do not account for redundant information shared between nodes.
   - **New Method**: This paper proposes a new pruning method, Subspace Node Pruning (SNP). It projects the node activations into an orthogonal subspace, removes redundant activity in this subspace, and recovers the influence of the pruned nodes through Linear Least Squares (LLS).

### Key Innovation Points

1. **Subspace Construction**:
   - Use a triangular transformation matrix (equivalent to unnormalized Gram-Schmidt orthogonalization) to project the node activations into an orthogonal subspace, ensuring that there is no redundant activity in this subspace.
   - This allows the layer output to be reconstructed immediately through LLS while nodes are pruned.
2. **Importance Scoring**:
   - Propose an importance score based on non-redundant activity. By measuring and removing the information in each node that is linearly decodable from other nodes, only truly unique activity is retained.
   - Specifically, use the unnormalized ZCA transformation to re-order the nodes, thereby better capturing their unique contributions.
3. **Global Importance Measurement**:
   - Automatically determine the pruning ratio of each layer based on the cumulative variance proportion in the subspace. This accounts for global as well as local importance when pruning across the whole network.

### Experimental Results

- **Experimental Setup**: The authors conducted experiments on models such as VGG-11, VGG-16, VGG-19, and ResNet-50, and used the ImageNet dataset for verification.
- **Performance Comparison**: Compared with existing pruning methods, SNP performs well across different pruning ratios; in particular, it maintains high model performance even under large-scale pruning.
- **Retraining Effects**: After retraining, the performance of SNP improves further. Under the global variance truncation (var) strategy in particular, its performance is significantly better than with uniform pruning.

### Summary

This paper addresses the redundant-information problem in existing pruning techniques by introducing subspace node pruning, improving both the performance and the inference efficiency of the pruned model. The method applies to single-branch networks (such as VGG) and can also be extended to multi-branch networks (such as ResNet), which suggests broad applicability.
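
The cumulative-variance idea behind the Global Importance Measurement point can be sketched as follows. This is an illustrative assumption about how such a rule might look, not the paper's exact procedure: the function name `units_to_keep`, the threshold parameter `variance_kept`, and the example layer scores are all hypothetical. Each layer's subspace scores are sorted in descending order, and the layer keeps the smallest number of units whose scores sum to at least the requested fraction of that layer's total.

```python
import numpy as np


def units_to_keep(scores, variance_kept):
    """Smallest number of units whose sorted subspace scores reach
    `variance_kept` (a fraction in (0, 1]) of the layer's total score."""
    s = np.sort(np.asarray(scores, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(s) / s.sum()                 # cumulative proportion
    # Index of the first entry reaching the threshold (small float tolerance).
    first = np.searchsorted(cumulative, variance_kept - 1e-12, side="left")
    return int(min(first + 1, len(s)))


if __name__ == "__main__":
    # Hypothetical per-layer scores, e.g. squared norms of subspace activations.
    layer_scores = {
        "layer_a": [8.0, 1.0, 1.0],       # one dominant unit
        "layer_b": [1.0, 1.0, 1.0, 1.0],  # activity spread evenly
    }
    for name, scores in layer_scores.items():
        k = units_to_keep(scores, variance_kept=0.7)
        print(f"{name}: keep {k} of {len(scores)} units")
```

With a single shared threshold, a layer dominated by a few units is pruned more aggressively than a layer whose activity is spread evenly, which is how one threshold can yield different layer-wise pruning ratios.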

Subspace Node Pruning

Class-Aware Pruning for Efficient Neural Networks

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Network Pruning Spaces

Filter Pruning Via Feature Map Clustering

Efficient DNN Neuron Pruning by Minimizing Layer-wise Nonlinear Reconstruction Error

A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

Adversarial Structured Neural Network Pruning

Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity

Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery

Knapsack Pruning with Inner Distillation

Network Automatic Pruning: Start NAP and Take a Nap

Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee

When to Prune? A Policy towards Early Structural Pruning

One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget

Neural network relief: a pruning algorithm based on neural activity

Concurrent Training and Layer Pruning of Deep Neural Networks

Small Contributions, Small Networks: Efficient Neural Network Pruning Based on Relative Importance

Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures