Abstract: Deep neural networks typically impose significant computational loads and memory consumption. Moreover, their large number of parameters constrains deployment on edge devices such as embedded systems. Tensor decomposition offers a clear advantage in compressing large-scale weight tensors. Nevertheless, applying low-rank decomposition directly typically leads to significant accuracy loss. This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization (VBMF) with orthogonal regularization. First, the model is over-parameterized and trained, with orthogonal regularization applied to increase its likelihood of reaching the accuracy of the original model. Second, VBMF is employed to estimate the rank of the weight tensor at each layer. Our framework is sufficiently general to apply to other convolutional neural networks and is easily adapted to incorporate other tensor decomposition methods. Experimental results show that at both high and low compression ratios, our compressed model exhibits advanced performance.

What problem does this paper attempt to address?

The paper addresses the excessive computational load and memory consumption that arise when deep neural networks (DNNs) are deployed on embedded devices. Specifically, it focuses on compressing the large-scale weight tensors in convolutional neural networks (CNNs). Existing tensor decomposition techniques can compress parameters by factors of thousands in video tasks, but in image classification, especially with Tensor Train (TT) and Tensor Ring (TR) decomposition, even a slightly larger compression ratio leads to a significant loss of accuracy.

To solve these problems, the paper proposes a model compression method that combines variational Bayesian matrix factorization (VBMF) with orthogonal regularization. The method aims to improve the performance of the compressed model through the following steps:

1. **Over-parameterized training and orthogonal regularization**: First, over-parameterize the model and train it with orthogonal regularization, so that it is more likely to reach the accuracy of the original model.
2. **VBMF to estimate the rank**: Then use VBMF to estimate the rank of the weight tensor of each layer.
3. **Low-rank training**: Finally, conduct low-rank training to obtain the compressed model.

The main contributions of the paper include:

- Proposing a framework that combines over-parameterized training and orthogonal regularization, which not only provides better initial values but also ensures orthogonality.
- Using VBMF to estimate the rank of one mode in the Tucker-2 (TK-2) decomposition, while the rank of the other mode is determined from the relationship between the input and output channels of the convolutional layer.
- Showing experimentally, on multiple DNN models, that the compressed model performs well at both high and low compression ratios.

Through these improvements, the paper addresses the limitations of existing tensor decomposition methods in CNN compression, in particular achieving efficient compression while maintaining the accuracy of the model.
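To make the quantities in the steps above concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes the common soft form of orthogonal regularization, the squared Frobenius norm of WᵀW − I, which may differ from the exact penalty the authors use. It also assumes the standard Tucker-2 layout of a convolution: a 1×1 convolution, a k×k core convolution, and another 1×1 convolution. The function names are hypothetical, and the ranks `r_in` and `r_out` are plain inputs here, since VBMF rank estimation is not implemented.

```python
def orthogonality_penalty(w):
    """Soft orthogonality penalty ||W^T W - I||_F^2 (assumed form).

    `w` is a matrix given as a list of rows. The penalty is zero exactly
    when the columns of `w` are orthonormal.
    """
    rows, cols = len(w), len(w[0])
    total = 0.0
    for i in range(cols):
        for j in range(cols):
            # Entry (i, j) of the Gram matrix W^T W.
            gram_ij = sum(w[r][i] * w[r][j] for r in range(rows))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total


def conv_param_count(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_param_count(c_in, c_out, k, r_in, r_out):
    """Weights in a Tucker-2 factorized k x k convolution (bias ignored).

    Layout: 1x1 conv (c_in -> r_in), k x k core conv (r_in -> r_out),
    1x1 conv (r_out -> c_out).
    """
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


def compression_ratio(c_in, c_out, k, r_in, r_out):
    """Original parameter count divided by the factorized count."""
    return conv_param_count(c_in, c_out, k) / tucker2_param_count(
        c_in, c_out, k, r_in, r_out
    )


# Illustrative layer: 64 -> 128 channels, 3x3 kernel, hand-picked ranks.
print(conv_param_count(64, 128, 3))             # 73728
print(tucker2_param_count(64, 128, 3, 16, 32))  # 9728
print(round(compression_ratio(64, 128, 3, 16, 32), 2))  # 7.58
```

In this illustrative layer, ranks of 16 and 32 shrink the weight count by roughly 7.6 times. Lower ranks compress further but discard more of the original weight tensor, which is why the rank-selection step matters.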

Convolutional Neural Network Compression Based on Low-Rank Decomposition

Convolutional neural networks compression with low rank and sparse tensor decompositions

Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization

Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network

Deep Convolutional Neural Network Compression Method: Tensor Ring Decomposition with Variational Bayesian Approach

On Compressing Deep Models by Low Rank and Sparse Decomposition

Low-Rank+Sparse Tensor Compression for Neural Networks

Deep neural network compression by Tucker decomposition with nonlinear response

Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection

Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination

Accelerating the Low-Rank Decomposed Models

Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning

Convolutional neural networks with low-rank regularization

On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee

Towards Efficient Tensor Decomposition-Based DNN Model Compression with Optimization Framework

Speeding-up and compression convolutional neural networks by low-rank decomposition without fine-tuning

Hybrid Tensor Decomposition in Neural Network Compression

Holistic CNN Compression Via Low-Rank Decomposition with Knowledge Transfer

Semi-tensor Product-based Tensor Decomposition for Neural Network Compression

Reduced storage direct tensor ring decomposition for convolutional neural networks compression

Iterative Deep Model Compression and Acceleration in the Frequency Domain