Abstract: The rapid increase in the parameters of deep learning models has led to significant costs, challenging computational efficiency and model interpretability. In this paper, we introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique. This framework enables the simultaneous optimization of a network's weights and topology in an end-to-end process using stochastic gradient descent. Empirical results demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters. Moreover, our framework enhances neural network interpretability, not only by allowing easy extraction of feature importance directly from the pruned network but also by enabling visualization of feature symmetry and the pathways of information propagation from features to outcomes. Although the pruning strategy is learned through deep learning, it is surprisingly intuitive and understandable, focusing on selecting key representative features and exploiting data patterns to achieve extremely sparse pruning. We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.
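The abstract names the Gumbel-Softmax technique but does not spell out how it is applied. As a minimal sketch, assuming each weight gets a two-way "keep versus prune" choice with its own pair of learnable logits, a relaxed gate could be sampled as follows. The function names, the two-logit parameterization, and the temperature values are illustrative assumptions, not the paper's implementation, and a real training loop would need an automatic differentiation framework to backpropagate through the gate.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one Gumbel(0, 1) sample by inverse transform sampling."""
    # Clamp u away from 0 and 1 so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax_gate(logit_keep, logit_prune, temperature, rng):
    """Relaxed 'keep' probability in [0, 1] for one weight (hypothetical helper).

    A two-class softmax over the noisy logits; a lower temperature pushes
    the result closer to a hard 0/1 decision.
    """
    a = (logit_keep + gumbel_noise(rng)) / temperature
    b = (logit_prune + gumbel_noise(rng)) / temperature
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def masked_weights(weights, keep_logits, prune_logits, temperature, rng):
    """Scale each weight by its sampled gate (assumed masking scheme)."""
    return [
        w * gumbel_softmax_gate(k, p, temperature, rng)
        for w, k, p in zip(weights, keep_logits, prune_logits)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.8, -1.2, 0.05]
    keep_logits = [4.0, 0.0, -4.0]   # illustrative values only
    prune_logits = [-4.0, 0.0, 4.0]
    print(masked_weights(weights, keep_logits, prune_logits, 0.5, rng))
```

Because the gate is a smooth function of the logits, the weights and the keep/prune decisions can in principle be updated in the same gradient step, which is the property the abstract describes as end-to-end optimization.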

What problem does this paper attempt to address?

This paper addresses the computational cost and loss of interpretability caused by the rapid growth in the number of parameters in deep learning models. Specifically:

1. **Over-parameterization problem**: As deep learning models grow more complex, their parameter counts increase dramatically, which raises training and inference costs and reduces the interpretability of the model.
2. **Neural network pruning challenges**: Reducing the number of parameters as far as possible while maintaining model performance is an NP-hard combinatorial optimization problem. Traditional methods such as heuristic search and iterative magnitude pruning (IMP) have limitations and struggle to achieve both strong compression and high accuracy at the same time.

To solve these problems, the paper proposes a new neural network pruning framework based on the Gumbel-Softmax technique. The framework optimizes network weights and topology simultaneously through stochastic gradient descent (SGD), with the following goals:

- **Efficient compression ability**: On the MNIST dataset, only 0.15% of the original network parameters (404 weights) are retained while a relatively high classification accuracy is maintained.
- **Enhanced model interpretability**: Feature importance can be extracted directly from the pruned network, and feature symmetry and information propagation paths can be visualized to help understand the relationship between input variables and output variables.

In short, the paper aims to develop a neural network pruning method that greatly reduces model size while also improving model interpretability.
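The two headline quantities in the summary, the retained fraction of weights and per-feature importance, can both be read off a binary connection mask. The sketch below assumes the pruned network is represented as one 0/1 matrix per layer and that a feature's importance is simply the number of its surviving first-layer connections; both the representation and that importance measure are assumptions for illustration, not the paper's definitions. As a rough sanity check on the reported figures, 404 retained weights at 0.15% imply an original network on the order of 404 / 0.0015 ≈ 269,000 weights (approximate, since 0.15% is a rounded value).

```python
def compression_stats(masks):
    """Return (kept, total, kept_fraction) for a list of 0/1 layer masks.

    masks[l][i][j] == 1 means the connection from unit i to unit j
    in layer l survives pruning (assumed representation).
    """
    total = sum(len(row) for layer in masks for row in layer)
    kept = sum(sum(row) for layer in masks for row in layer)
    return kept, total, kept / total


def input_feature_importance(first_layer_mask):
    """Count surviving outgoing connections per input feature.

    This counting rule is a hypothetical proxy for importance:
    a feature with no surviving connections cannot affect the output.
    """
    return [sum(row) for row in first_layer_mask]


if __name__ == "__main__":
    # Toy network: 3 inputs -> 2 hidden units -> 1 output.
    masks = [
        [[1, 0], [0, 0], [1, 1]],  # input-to-hidden mask
        [[1], [0]],                # hidden-to-output mask
    ]
    kept, total, fraction = compression_stats(masks)
    print(kept, total, fraction)
    print(input_feature_importance(masks[0]))
```

In this toy example the second input feature has no surviving connections, so it would be reported as unused by the pruned network.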

Neural Network Pruning by Gradient Descent

Class-Aware Pruning for Efficient Neural Networks

Structured Deep Neural Network Pruning by Varying Regularization Parameters

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing

A Feature-map Discriminant Perspective for Pruning Deep Neural Networks

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

FGGP: Fixed-Rate Gradient-First Gradual Pruning

Sparse optimization guided pruning for neural networks

Pruning the Deep Neural Network by Similar Function

CGaP: Continuous Growth and Pruning for Efficient Deep Learning

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures

Pruning of Deep Spiking Neural Networks Through Gradient Rewiring

Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation

Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning

RGP: Neural Network Pruning Through Regular Graph With Edges Swapping

Efficient DNN Neuron Pruning by Minimizing Layer-wise Nonlinear Reconstruction Error

Structural Pruning in Deep Neural Networks: A Small-World Approach

Differential Evolution Based Layer-Wise Weight Pruning for Compressing Deep Neural Networks

Knapsack Pruning with Inner Distillation

RGP: Neural Network Pruning through Its Regular Graph Structure

Accelerating CNN Training by Pruning Activation Gradients