Operations of the facility planning team. Part I.

M. Fortune

Class-Aware Pruning for Efficient Neural Networks

Mengnan Jiang,Jingcun Wang,Amro Eldebiky,Xunzhao Yin,Cheng Zhuo,Ing-Chao Lin,Grace Li Zhang

DOI: https://doi.org/10.23919/date58400.2024.10546870

2024-01-01

Abstract:Deep neural networks (DNNs) have demonstrated remarkable success in various fields. However, the large number of floating-point operations (FLOPs) in DNNs poses challenges for their deployment in resource-constrained applications, e.g., edge devices. To address the problem, pruning has been introduced to reduce the computational cost in executing DNNs. Previous pruning strategies are based on weight values, gradient values and activation outputs. Different from previous pruning solutions, in this paper, we propose a class-aware pruning technique to compress DNNs, which provides a novel perspective to reduce the computational cost of DNNs. In each iteration, the neural network training is modified to facilitate the class-aware pruning. Afterwards, the importance of filters with respect to the number of classes is evaluated. The filters that are important for only a few classes are removed. The neural network is then retrained to compensate for the incurred accuracy loss. The pruning iterations end when no filter can be removed anymore, indicating that the remaining filters are very important for many classes. This pruning technique outperforms previous pruning solutions in terms of accuracy, pruning ratio and the reduction of FLOPs. Experimental results confirm that this class-aware pruning technique can significantly reduce the number of weights and FLOPs, while maintaining a high inference accuracy. Our code is available at https://github.com/HWAI-TUDa/Class-Aware-Pruning
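The filter-selection step in the abstract above can be illustrated with a minimal sketch. It assumes a per-filter, per-class importance score is already available; the abstract does not say how that score is computed, so the `scores` matrix, the `score_threshold`, and the `min_classes` cutoff are all hypothetical stand-ins rather than the authors' definitions.

```python
from typing import List


def class_counts(scores: List[List[float]], score_threshold: float) -> List[int]:
    """For each filter, count the classes whose importance score exceeds the threshold."""
    return [sum(1 for s in row if s > score_threshold) for row in scores]


def filters_to_prune(scores: List[List[float]], score_threshold: float,
                     min_classes: int) -> List[int]:
    """Return indices of filters that are important for fewer than min_classes classes."""
    counts = class_counts(scores, score_threshold)
    return [i for i, c in enumerate(counts) if c < min_classes]


# Hypothetical scores: rows are filters, columns are classes.
scores = [
    [0.9, 0.8, 0.7, 0.6],  # important for every class
    [0.9, 0.1, 0.0, 0.2],  # important for a single class
    [0.0, 0.6, 0.7, 0.1],  # important for two classes
]
print(filters_to_prune(scores, score_threshold=0.5, min_classes=2))
```

In the iterative procedure the abstract describes, a step like this would alternate with retraining until it returns an empty list.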
Structured Pruning for Efficient Convolutional Neural Networks Via Incremental Regularization

Huan Wang,Xinyi Hu,Qiming Zhang,Yuehai Wang,Lu Yu,Haoji Hu

DOI: https://doi.org/10.1109/jstsp.2019.2961233

IF: 7.695

2020-01-01

IEEE Journal of Selected Topics in Signal Processing

Abstract:Modern Convolutional Neural Networks (CNNs) are usually restricted by their massive computation and high storage. Parameter pruning is a promising approach for CNN compression and acceleration by eliminating redundant model parameters with tolerable performance degradation. Despite its effectiveness, existing regularization-based parameter pruning methods usually drive weights towards zero with large and constant regularization factors, which neglects the fragility of the expressiveness of CNNs, and thus calls for a gentler regularization scheme so that the networks can adapt during pruning. To achieve this, we propose a novel regularization-based pruning method, named IncReg, to incrementally assign different regularization factors to different weights based on their relative importance. Empirical analysis on the CIFAR-10 dataset verifies the merits of IncReg. Further extensive experiments with popular CNNs on the CIFAR-10 and ImageNet datasets show that IncReg achieves comparable or even better results than state-of-the-art methods. Moreover, to resolve the problem that column pruning cannot be directly applied to off-the-shelf deep learning libraries for acceleration, we generalize IncReg from column pruning to spatial pruning, which can equip existing structured pruning methods (such as channel pruning) for further acceleration with negligible accuracy loss. Our source code and trained models are available at: https://github.com/mingsun-tse/caffe_increg.
Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Yiqin Wang,Ming Li,Hongye Su,Lei Xie,Xiaochen Li,Weihua Xu,Xiaozhou Xu

DOI: https://doi.org/10.1109/icarcv50220.2020.9305335

2020-01-01

Abstract:Deep neural networks have proved very effective for image classification, object detection and segmentation. However, when only limited hardware is available, deploying large, high-performing models can be a problem because they are computationally expensive. To overcome the limits on power, memory and computation, channel pruning was proposed to compress models channel-wise and soon became a common approach to compressing large models. Generally, pruning is a three-stage pipeline of training, pruning and finetuning. In this work, we propose a new pruning approach that needs no finetuning. The main idea is to extract channel saliences with a squeeze-and-excitation block and push each salience toward either 0 or 1 with a sin-based function. The salience is then used as the pruning criterion. Because the criterion of our approach is an activation rather than a trainable parameter, finetuning is not necessary, which makes the pruning process more stable and less time-consuming. Experiments on a flowers dataset demonstrate that the newly designed pruning method effectively reduces model size while maintaining overall accuracy.
A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

Jiayi Yao,Ping Li,Xiatao Kang,Yuzhe Wang

DOI: https://doi.org/10.1109/ictai56018.2022.00084

2022-01-01

Abstract:Convolutional Neural Networks (CNNs) are increasingly widely used in various fields, and their computation and memory demands are also increasing significantly. Network compression has emerged to make them applicable under limited conditions such as embedded applications, and among compression techniques network pruning has received particular attention. In this paper, we encode the convolutional network to obtain the similarity of different encoding nodes, and evaluate the connectivity-power among convolutional kernels on the basis of that similarity. We then impose different levels of penalty according to the connectivity-power. Meanwhile, we propose Channel Pruning based on the Dissimilarity of Angle (DACP). First, we train a sparse model with a GL penalty and impose an angle dissimilarity (AD) constraint on the channels and filters of the convolutional network to obtain a sparser structure. The effectiveness of our method is then demonstrated experimentally. On CIFAR-10, we reduce FLOPs by 66.86% on VGG-16 with 93.31% accuracy after pruning, where FLOPs denotes the number of floating-point operations of the model. Moreover, on ResNet-32, we reduce FLOPs by 58.46%, with an accuracy of 91.76% after pruning.
A roulette wheel-based pruning method to simplify cumbersome deep neural networks

Kit Yan Chan,Ka Fai Cedric Yiu,Shan Guo,Huimin Jiang

DOI: https://doi.org/10.1007/s00521-024-09719-6

2024-05-02

Neural Computing and Applications

Abstract:Deep neural networks (DNNs) have been applied in many pattern recognition and object detection applications. DNNs generally consist of millions or even billions of parameters. These demanding computational and storage requirements impede the deployment of DNNs on resource-limited devices, such as mobile devices and microcontrollers. Simplification techniques such as pruning have commonly been used to slim DNN sizes. Pruning approaches generally quantify the importance of each component, such as a network weight. Weight values or weight gradients in training are commonly used as the importance metric. Small weights are pruned and large weights are kept. However, small weights may be connected to significant weights that affect DNN outputs, so DNN accuracy can degrade significantly after pruning. This paper proposes a roulette wheel-like pruning algorithm to simplify a trained DNN while keeping the DNN accuracy. The proposed algorithm generates a branch of pruned DNNs using a roulette wheel operator. Similar to roulette wheel selection in genetic algorithms, small weights are more likely to be pruned but can be kept; large weights are more likely to be kept but can be pruned. The slimmest DNN with the best accuracy is selected from the branch. The performance of the proposed pruning algorithm is evaluated on two deterministic datasets and four non-deterministic datasets. Experimental results show that, compared to several existing pruning approaches, the proposed pruning algorithm generates simpler DNNs while keeping DNN accuracy.

computer science, artificial intelligence
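The roulette wheel operator described in the abstract above can be sketched as follows. This is only an illustration under assumptions: the abstract does not give the selection probabilities, so the inverse-magnitude fitness `1 / (|w| + eps)` and the function name `roulette_prune` are hypothetical choices, and the generation and ranking of a whole branch of pruned networks is omitted.

```python
import random
from typing import List


def roulette_prune(weights: List[float], n_prune: int, rng: random.Random,
                   eps: float = 1e-8) -> List[float]:
    """Zero out n_prune weights, drawing each one with probability
    proportional to 1 / (|w| + eps), so small weights are likelier picks."""
    remaining = list(range(len(weights)))
    pruned = set()
    for _ in range(min(n_prune, len(weights))):
        # Assumed fitness: inverse magnitude of each weight still in play.
        fitness = [1.0 / (abs(weights[i]) + eps) for i in remaining]
        pick = rng.choices(remaining, weights=fitness, k=1)[0]
        pruned.add(pick)
        remaining.remove(pick)
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


example = [0.01, 2.0, -0.02, 1.5, 0.5]
print(roulette_prune(example, n_prune=2, rng=random.Random(0)))
```

Because the draw is stochastic, repeated calls with different seeds give different pruned candidates, which is what lets a selection step choose among them.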
Structural Pruning in Deep Neural Networks: A Small-World Approach

Gokul Krishnan,Xiaocong Du,Yu Cao

DOI: https://doi.org/10.48550/arXiv.1911.04453

2019-11-12

Abstract:Deep Neural Networks (DNNs) are usually over-parameterized, causing excessive memory and interconnection cost on the hardware platform. Existing pruning approaches remove secondary parameters at the end of training to reduce the model size; but without exploiting the intrinsic network property, they still require the full interconnection to prepare the network. Inspired by the observation that brain networks follow the Small-World model, we propose a novel structural pruning scheme, which includes (1) hierarchically trimming the network into a Small-World model before training, (2) training the network for a given dataset, and (3) optimizing the network for accuracy. The new scheme effectively reduces both the model size and the interconnection needed before training, achieving a locally clustered and globally sparse model. We demonstrate our approach on LeNet-5 for MNIST and VGG-16 for CIFAR-10, decreasing the number of parameters to 2.3% and 9.02% of the baseline model, respectively.

Machine Learning,Computer Vision and Pattern Recognition
Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

Xiaohan Ding,Guiguang Ding,Xiangxin Zhou,Yuchen Guo,Jungong Han,Ji Liu

DOI: https://doi.org/10.48550/arXiv.1909.12778

2019-10-25

Abstract:Deep Neural Networks (DNNs) are powerful but computationally expensive and memory intensive, which impedes their practical usage on resource-constrained front-end devices. DNN pruning is an approach for deep model compression, which aims at eliminating some parameters with tolerable performance degradation. In this paper, we propose a novel momentum-SGD-based optimization method to reduce the network complexity by on-the-fly pruning. Concretely, given a global compression ratio, we categorize all the parameters into two parts at each training iteration, which are updated using different rules. In this way, we gradually zero out the redundant parameters, as we update them using only the ordinary weight decay but no gradients derived from the objective function. As a departure from prior methods that require heavy human effort to tune the layer-wise sparsity ratios, prune by solving complicated non-differentiable problems or finetune the model after pruning, our method is characterized by 1) global compression that automatically finds the appropriate per-layer sparsity ratios; 2) end-to-end training; 3) no need for a time-consuming re-training process after pruning; and 4) superior capability to find better winning tickets which have won the initialization lottery.

Machine Learning,Computer Vision and Pattern Recognition
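The two-part update rule in the abstract above can be sketched for a flat parameter list. The abstract does not state how parameters are split, so ranking by `|w * grad|` is an assumption made here for illustration; `gsm_step`, `keep`, and the hyperparameter values are likewise hypothetical.

```python
from typing import List, Set


def gsm_step(w: List[float], grad: List[float], velocity: List[float], keep: int,
             lr: float = 0.1, momentum: float = 0.9,
             weight_decay: float = 1e-4) -> Set[int]:
    """One in-place momentum-SGD step in which only the `keep` most salient
    parameters receive their gradient; all others see weight decay alone."""
    # Assumed saliency: magnitude of weight times gradient.
    saliency = [abs(wi * gi) for wi, gi in zip(w, grad)]
    order = sorted(range(len(w)), key=lambda i: saliency[i], reverse=True)
    active = set(order[:keep])
    for i in range(len(w)):
        g = grad[i] if i in active else 0.0  # mask out the objective gradient
        velocity[i] = momentum * velocity[i] + weight_decay * w[i] + g
        w[i] -= lr * velocity[i]
    return active


w = [1.0, 0.5, -2.0]
grad = [0.1, 0.0, 0.3]
velocity = [0.0, 0.0, 0.0]
active = gsm_step(w, grad, velocity, keep=1)
print(active, w)
```

Parameters outside the active set shrink slowly toward zero under weight decay alone, which matches the gradual zeroing-out the abstract describes.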
DARB: A Density-Aware Regular-Block Pruning for Deep Neural Networks

Ao Ren,Tao Zhang,Yuhao Wang,Sheng Lin,Peiyan Dong,Yen-kuang Chen,Yuan Xie,Yanzhi Wang

DOI: https://doi.org/10.48550/arXiv.1911.08020

IF: 5.414

2019-11-19

Machine Learning

Abstract:The rapidly growing parameter volume of deep neural networks (DNNs) hinders artificial intelligence applications on resource-constrained devices, such as mobile and wearable devices. Neural network pruning, as one of the mainstream model compression techniques, is under extensive study to reduce the number of parameters and computations. In contrast to irregular pruning, which incurs high index storage and decoding overhead, structured pruning techniques have been proposed as a promising solution. However, prior studies on structured pruning tackle the problem mainly from the perspective of facilitating hardware implementation, without analyzing the characteristics of sparse neural networks. Neglecting the study of sparse neural networks leads to an inefficient trade-off between regularity and pruning ratio. Consequently, the potential of structurally pruning neural networks is not sufficiently mined. In this work, we examine the structural characteristics of irregularly pruned weight matrices, such as the diverse redundancy of different rows, the sensitivity of different rows to pruning, and the positional characteristics of retained weights. Guided by these insights, we first propose the novel block-max weight masking (BMWM) method, which can effectively retain the salient weights while imposing high regularity on the weight matrix. As a further optimization, we propose a density-adaptive regular-block (DARB) pruning that outperforms prior structured pruning work with high pruning ratio and decoding efficiency. Our experimental results show that DARB can achieve 13× to 25× pruning ratios, a 2.8× to 4.3× improvement over state-of-the-art counterparts on multiple neural network models and tasks. Moreover, DARB can achieve 14.3× the decoding efficiency of block pruning with a higher pruning ratio.
Dual-Grained Lightweight Strategy

Debin Liu,Xiang Bai,Ruonan Zhao,Xianjun Deng,Laurence T. Yang

DOI: https://doi.org/10.1109/tpami.2024.3437421

IF: 23.6

2024-11-08

IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract:Removing redundant parameters and computations before the model training has attracted a great interest as it can effectively reduce the storage space of the model, speed up the training and inference of the model, and save energy consumption during the running of the model. In addition, the simplification of deep neural network models can enable high-performance network models to be deployed to resource-constrained edge devices, thus promoting the development of the intelligent world. However, current pruning at initialization methods exhibit poor performance at extreme sparsity. In order to improve the performance of the model under extreme sparsity, this paper proposes a dual-grained lightweight strategy-TEDEPR. This is the first time that TEDEPR has used tensor theory in the pruning at initialization method to optimize the structure of a sparse sub-network model and improve its performance. Specifically, first, at the coarse-grained level, we represent the weight matrix or weight tensor of the model as a low-rank tensor decomposition form and use multi-step chain operations to enhance the feature extraction capability of the base module to construct a low-rank compact network model. Second, unimportant weights are pruned at a fine-grained level based on the trainability of the weights in the low-rank model before the training of the model, resulting in the final compressed model. To evaluate the superiority of TEDEPR, we conducted extensive experiments on MNIST, UCF11, CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet datasets with LeNet, LSTM, VGGNet, ResNet and Transformer architectures, and compared with state-of-the-art methods. The experimental results show that TEDEPR has higher accuracy, faster training and inference, and less storage space than other pruning at initialization methods under extreme sparsity.

computer science, artificial intelligence,engineering, electrical & electronic
CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning

Shivam Aggarwal,Kuluhan Binici,Tulika Mitra

2024-03-18

Abstract:Machine learning pipelines for classification tasks often train a universal model to achieve accuracy across a broad range of classes. However, a typical user encounters only a limited selection of classes regularly. This disparity provides an opportunity to enhance computational efficiency by tailoring models to focus on user-specific classes. Existing works rely on unstructured pruning, which introduces randomly distributed non-zero values in the model, making it unsuitable for hardware acceleration. Alternatively, some approaches employ structured pruning, such as channel pruning, but these tend to provide only minimal compression and may lead to reduced model accuracy. In this work, we propose CRISP, a novel pruning framework leveraging a hybrid structured sparsity pattern that combines both fine-grained N:M structured sparsity and coarse-grained block sparsity. Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes. CRISP achieves high accuracy with minimal memory consumption for popular models like ResNet-50, VGG-16, and MobileNetV2 on ImageNet and CIFAR-100 datasets. Moreover, CRISP delivers up to 14× reduction in latency and energy consumption compared to existing pruning methods while maintaining comparable accuracy. Our code is available at https://github.com/shivmgg/CRISP/.

Computer Vision and Pattern Recognition,Hardware Architecture,Machine Learning
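The fine-grained N:M pattern mentioned in the CRISP abstract above keeps at most N weights in every group of M consecutive weights. The sketch below shows only that masking step on a flat list; the saliency values are made up for illustration (the gradient-based class-aware score is not reproduced), and the coarse-grained block sparsity component is omitted.

```python
from typing import List


def nm_mask(saliency: List[float], n: int, m: int) -> List[int]:
    """Return a 0/1 mask that keeps the n most salient entries
    in each consecutive group of m entries."""
    if len(saliency) % m != 0:
        raise ValueError("length must be a multiple of m")
    mask = [0] * len(saliency)
    for start in range(0, len(saliency), m):
        group = range(start, start + m)
        keep = sorted(group, key=lambda i: saliency[i], reverse=True)[:n]
        for i in keep:
            mask[i] = 1
    return mask


# Hypothetical saliency scores for eight weights, pruned to a 2:4 pattern.
saliency = [0.1, 0.9, 0.3, 0.8, 0.5, 0.4, 0.7, 0.2]
print(nm_mask(saliency, n=2, m=4))
```

Every group ends up with the same number of surviving weights, which is the regularity that makes the pattern amenable to hardware acceleration.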
A Dynamic Pruning Method on Multiple Sparse Structures in Deep Neural Networks

Jie Hu,Peng Lin,Huajun Zhang,Zining Lan,Wenxin Chen,Kailiang Xie,Siyun Chen,Hao Wang,Sheng Chang

DOI: https://doi.org/10.1109/access.2023.3267469

IF: 3.9

2023-04-25

IEEE Access

Abstract:It is well known that significant computational power and a large amount of memory are required for deep neural networks, which makes them difficult to apply in resource-limited environments. So, many network compression and acceleration technologies have emerged, of which connection pruning is widely applied due to its effectiveness and convenience. A novel connection pruning method for full model capacity on multiple sparse structures is proposed in this paper. We design a simple and efficient function called Dynamic Processing Unit (DPU) for handling the evaluated weights. Our method has the following features: 1) Instead of being pruned directly or set to 0, the weights are controlled by the DPU to determine whether they will be used during subsequent forward passes of the network during the iteration of pruning training. 2) It supports the traditional multi-steps prune method as well as the end-to-end training mode that can get a compressed network in a single stage by fusing training and pruning. 3) It can learn multiple useful sparse structures, including, but not limited to, depth-wise, filter-wise, channel-wise, 2D-filter-wise, row-wise, column-wise, connection-wise and mixed sparse structures. Our method is tested on various widely-used datasets and models, such as the LeNet and the ResNet on MNIST and CIFAR-10. Importantly, it demonstrates good performance in all these cases. Some details about our method can be found at the following URL: https://github.com/hujie369/DPU

computer science, information systems,telecommunications,engineering, electrical & electronic
State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural Networks.

Yanqi Chen,Zhaofei Yu,Wei Fang,Zhengyu Ma,Tiejun Huang,Yonghong Tian

2022-01-01

Abstract:Spiking Neural Networks (SNNs) are considered a promising alternative to Artificial Neural Networks (ANNs) for their event-driven computing paradigm when deployed on energy-efficient neuromorphic hardware. Recently, deep SNNs have shown breathtaking performance improvement through cutting-edge training strategy and flexible structure, which also scales up the number of parameters and computational burdens in a single network. Inspired by the state transition of dendritic spines in the filopodial model of spinogenesis, we model different states of SNN weights, facilitating weight optimization for pruning. Furthermore, the pruning speed can be regulated by using different functions describing the growing threshold of state transition. We organize these techniques as a dynamic pruning algorithm based on nonlinear reparameterization mapping from spine size to SNN weights. Our approach yields sparse deep networks on the large-scale dataset (SEW ResNet18 on ImageNet) while maintaining state-of-the-art low performance loss ( 3% at 88.8% sparsity) compared to existing pruning methods on directly trained SNNs. Moreover, we find out pruning speed regulation while learning is crucial to avoiding disastrous performance degradation at the final stages of training, which may shed light on future work on SNN pruning.
SR-init: an Interpretable Layer Pruning Method

Hui Tang,Yao Lu,Qi Xuan

DOI: https://doi.org/10.1109/icassp49357.2023.10095306

2023-01-01

Abstract:Despite the popularization of deep neural networks (DNNs) in many fields, it is still challenging to deploy state-of-the-art models to resource-constrained devices due to high computational overhead. Model pruning provides a feasible solution to the aforementioned challenges. However, the interpretation of existing pruning criteria is always overlooked. To counter this issue, we propose a novel layer pruning method by exploring the Stochastic Re-initialization. Our SR-init method is inspired by the discovery that the accuracy drop due to stochastic re-initialization of layer parameters differs in various layers. On the basis of this observation, we come up with a layer pruning criterion, i.e., those layers that are not sensitive to stochastic re-initialization (low accuracy drop) produce less contribution to the model and could be pruned with acceptable loss. Afterward, we experimentally verify the interpretability of SR-init via feature visualization. The visual explanation demonstrates that SR-init is theoretically feasible, thus we compare it with state-of-the-art methods to further evaluate its practicability. As for ResNet56 on CIFAR-10 and CIFAR-100, SR-init achieves a great reduction in parameters (63.98% and 37.71%) with an ignorable drop in top-1 accuracy (-0.56% and 0.8%). With ResNet50 on ImageNet, we achieve a 15.59% FLOPs reduction by removing 39.29% of the parameters, with only a drop of 0.6% in top-1 accuracy. Our code is available at https://github.com/huitang-zjut/SR-init.
One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget

Nathan Hubens,Matei Mancas,Bernard Gosselin,Marius Preda,Titus Zaharia

DOI: https://doi.org/10.48550/arXiv.2107.02086

2021-07-05

Computer Vision and Pattern Recognition

Abstract:Introducing sparsity in a neural network has been an efficient way to reduce its complexity while keeping its performance almost intact. Most of the time, sparsity is introduced using a three-stage pipeline: 1) train the model to convergence, 2) prune the model according to some criterion, 3) fine-tune the pruned model to recover performance. The last two steps are often performed iteratively, leading to reasonable results but also to a time-consuming and complex process. In our work, we propose to get rid of the first step of the pipeline and to combine the two other steps in a single pruning-training cycle, allowing the model to jointly learn for the optimal weights while being pruned. We do this by introducing a novel pruning schedule, named One-Cycle Pruning, which starts pruning from the beginning of the training, and until its very end. Adopting such a schedule not only leads to better performing pruned models but also drastically reduces the training budget required to prune a model. Experiments are conducted on a variety of architectures (VGG-16 and ResNet-18) and datasets (CIFAR-10, CIFAR-100 and Caltech-101), and for relatively high sparsity values (80%, 90%, 95% of weights removed). Our results show that One-Cycle Pruning consistently outperforms commonly used pruning schedules such as One-Shot Pruning, Iterative Pruning and Automated Gradual Pruning, on a fixed training budget.
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Torsten Hoefler,Dan Alistarh,Tal Ben-Nun,Nikoli Dryden,Alexandra Peste

DOI: https://doi.org/10.48550/arXiv.2102.00554

IF: 5.414

2021-01-31

Machine Learning

Abstract:The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
Three-Stage Global Channel Pruning for Resources-Limited Platform

Yijie Chen,Rui Li,Wanli Li,Jilong Wang,Renfa Li

DOI: https://doi.org/10.1109/tnnls.2023.3292152

IF: 14.255

2023-01-01

IEEE Transactions on Neural Networks and Learning Systems

Abstract:Deep neural networks (DNNs) have demonstrated remarkable performance in many fields, and deploying them on resource-limited devices has drawn more and more attention in industry and academia. Typically, there are great challenges for intelligent networked vehicles and drones to deploy object detection tasks due to the limited memory and computing power of embedded devices. To meet these challenges, hardware-friendly model compression approaches are required to reduce model parameters and computation. Three-stage global channel pruning, which involves sparsity training, channel pruning, and fine-tuning, is very popular in the field of model compression for its hardware-friendly structural pruning and ease of implementation. However, existing methods suffer from problems such as uneven sparsity, damage to the network structure, and reduced pruning ratio due to channel protection. To solve these issues, the present article makes the following significant contributions. First, we present an element-level heatmap-guided sparsity training method to achieve even sparsity, resulting in higher pruning ratio and improved performance. Second, we propose a global channel pruning method that fuses both global and local channel importance metrics to identify unimportant channels for pruning. Third, we present a channel replacement policy (CRP) to protect layers, ensuring that the pruning ratio can be guaranteed even under high pruning rate conditions. Evaluations show that our proposed method significantly outperforms the state-of-the-art (SOTA) methods in terms of pruning efficiency, making it more suitable for deployment on resource-limited devices.

computer science, artificial intelligence, theory & methods,engineering, electrical & electronic, hardware & architecture
Efficient DNN Neuron Pruning by Minimizing Layer-wise Nonlinear Reconstruction Error

Chunhui Jiang,Guiying Li,Chao Qian,Ke Tang

DOI: https://doi.org/10.24963/ijcai.2018/318

2018-01-01

Abstract:Deep neural networks (DNNs) have achieved great success, but the applications to mobile devices are limited due to their huge model size and low inference speed. Much effort thus has been devoted to pruning DNNs. Layer-wise neuron pruning methods have shown their effectiveness, which minimize the reconstruction error of linear response with a limited number of neurons in each single layer pruning. In this paper, we propose a new layer-wise neuron pruning approach by minimizing the reconstruction error of nonlinear units, which might be more reasonable since the error before and after activation can change significantly. An iterative optimization procedure combining greedy selection with gradient descent is proposed for single layer pruning. Experimental results on benchmark DNN models show the superiority of the proposed approach. Particularly, for VGGNet, the proposed approach can compress its disk space by 13.6× and bring a speedup of 3.7×; for AlexNet, it can achieve a compression rate of 4.1× and a speedup of 2.2×, respectively.
(Pen-) Ultimate DNN Pruning

Marc Riera,Jose-Maria Arnau,Antonio Gonzalez

DOI: https://doi.org/10.48550/arXiv.1906.02535

2019-06-06

Abstract:DNN pruning reduces memory footprint and computational work of DNN-based solutions to improve performance and energy-efficiency. An effective pruning scheme should be able to systematically remove connections and/or neurons that are unnecessary or redundant, reducing the DNN size without any loss in accuracy. In this paper we show that prior pruning schemes require an extremely time-consuming iterative process that requires retraining the DNN many times to tune the pruning hyperparameters. We propose a DNN pruning scheme based on Principal Component Analysis and relative importance of each neuron's connection that automatically finds the optimized DNN in one shot without requiring hand-tuning of multiple parameters.

Machine Learning
Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity

Artem Vysogorets,Julia Kempe

DOI: https://doi.org/10.48550/arXiv.2107.02306

2023-04-08

Abstract:Neural network pruning is a fruitful area of research with surging interest in high sparsity regimes. Benchmarking in this domain heavily relies on faithful representation of the sparsity of subnetworks, which has been traditionally computed as the fraction of removed connections (direct sparsity). This definition, however, fails to recognize unpruned parameters that are detached from the input or output layers of the underlying subnetworks, potentially underestimating actual effective sparsity: the fraction of inactivated connections. While this effect might be negligible for moderately pruned networks (up to 10-100 compression rates), we find that it plays an increasing role for thinner subnetworks, greatly distorting comparison between different pruning algorithms. For example, we show that effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct counterpart, while no discrepancy is ever observed when using SynFlow for pruning [Tanaka et al., 2020]. In this work, we adopt the lens of effective sparsity to reevaluate several recent pruning algorithms on common benchmark architectures (e.g., LeNet-300-100, VGG-19, ResNet-18) and discover that their absolute and relative performance changes dramatically in this new and more appropriate framework. To aim for effective, rather than direct, sparsity, we develop a low-cost extension to most pruning algorithms. Further, equipped with effective sparsity as a reference frame, we partially reconfirm that random pruning with appropriate sparsity allocation across layers performs as well or better than more sophisticated algorithms for pruning at initialization [Su et al., 2020]. In response to this observation, using a simple analogy of pressure distribution in coupled cylinders from physics, we design novel layerwise sparsity quotas that outperform all existing baselines in the context of random pruning.

Machine Learning,Computer Vision and Pattern Recognition
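The difference between direct and effective sparsity in the abstract above can be made concrete for a small fully connected network. The sketch assumes each layer is given as a 0/1 mask with `mask[i][j] == 1` keeping the connection from unit `i` to unit `j`, and it ignores biases; the function name `sparsities` and the toy masks are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # mask[i][j] == 1 keeps the connection from unit i to unit j


def sparsities(masks: List[Mask]) -> Tuple[float, float]:
    """Return (direct sparsity, effective sparsity) for a stack of layer masks."""
    # Forward sweep: which units still receive signal from the input layer?
    reach_in = [[True] * len(masks[0])]
    for m in masks:
        prev = reach_in[-1]
        reach_in.append([any(prev[i] and m[i][j] for i in range(len(m)))
                         for j in range(len(m[0]))])
    # Backward sweep: which units can still influence the output layer?
    reach_out = [[True] * len(masks[-1][0])]
    for m in reversed(masks):
        nxt = reach_out[0]
        reach_out.insert(0, [any(m[i][j] and nxt[j] for j in range(len(m[0])))
                             for i in range(len(m))])
    total = kept = active = 0
    for layer, m in enumerate(masks):
        for i, row in enumerate(m):
            for j, bit in enumerate(row):
                total += 1
                if bit:
                    kept += 1
                    # A kept connection counts only if it lies on an input-output path.
                    if reach_in[layer][i] and reach_out[layer + 1][j]:
                        active += 1
    return 1 - kept / total, 1 - active / total


# A 2-2-1 network whose two surviving connections do not form a path.
masks = [
    [[1, 0], [0, 0]],
    [[0], [1]],
]
direct, effective = sparsities(masks)
print(direct, effective)
```

In this toy case two of six connections survive, yet neither lies on a path from input to output, so the effective sparsity is higher than the direct sparsity.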
Investigating the Effect of Network Pruning on Performance and Interpretability

Jonathan von Rad,Florian Seuffert

2024-09-29

Abstract:Deep Neural Networks (DNNs) are often over-parameterized for their tasks and can be compressed quite drastically by removing weights, a process called pruning. We investigate the impact of different pruning techniques on the classification performance and interpretability of GoogLeNet. We systematically apply unstructured and structured pruning, as well as connection sparsity (pruning of input weights) methods to the network and analyze the outcomes regarding the network's performance on the validation set of ImageNet. We also compare different retraining strategies, such as iterative pruning and one-shot pruning. We find that with sufficient retraining epochs, the performance of the networks can approximate the performance of the default GoogLeNet, and even surpass it in some cases. To assess interpretability, we employ the Mechanistic Interpretability Score (MIS) developed by Zimmermann et al. Our experiments reveal that there is no significant relationship between interpretability and pruning rate when using MIS as a measure. Additionally, we observe that networks with extremely low accuracy can still achieve high MIS scores, suggesting that the MIS may not always align with intuitive notions of interpretability, such as understanding the basis of correct decisions.

Machine Learning,Computer Vision and Pattern Recognition
