Abstract: Despite their high accuracy, complex neural networks demand significant computational resources, posing challenges for deployment on resource-constrained devices such as mobile phones and embedded systems. Compression algorithms have been developed to address these challenges by reducing model size and computational demands while maintaining accuracy. Among these approaches, factorization methods based on tensor decomposition are theoretically sound and effective. However, they face difficulties in selecting the appropriate rank for decomposition. This paper tackles this issue by presenting a unified framework that simultaneously applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints. Our approach includes an automatic rank search in a continuous space, efficiently identifying optimal rank configurations without the use of training data, making it computationally efficient. Combined with a subsequent fine-tuning step, our approach maintains the performance of highly compressed models on par with their original counterparts. Using various benchmark datasets, we demonstrate the efficacy of our method through a comprehensive analysis.

What problem does this paper attempt to address?

The main problem this paper addresses is how to perform decomposition and optimal rank selection simultaneously during neural network compression, so that model size and computational requirements are reduced while accuracy is maintained. Specifically, the authors propose a unified framework, ORTOS (Optimal Rank Tensor decOmpoSition), which automatically searches for the optimal rank configuration in a continuous space by using a composite compression loss function under specified rank constraints.

### Background of the Main Problem

1. **Deployment Challenges on Resource-Constrained Devices**
   - Although complex neural networks achieve high accuracy, they require a large amount of computational resources, which makes them difficult to deploy on resource-constrained devices such as mobile phones and embedded systems.
2. **Limitations of Existing Compression Algorithms**
   - Existing compression algorithms (such as pruning, quantization, knowledge distillation, and tensor decomposition) can effectively reduce model size and computational requirements.
   - Tensor decomposition methods, although theoretically sound, face difficulties in choosing the appropriate rank: the choice is complicated by non-uniqueness, and the rank selection problem is NP-hard.

### Core Contributions of the Paper

1. **Unified Framework ORTOS**
   - By introducing a composite compression loss function, it performs decomposition and optimal rank selection simultaneously under the specified rank constraints.
   - It proposes an automatic rank search that identifies the optimal rank configuration in a continuous space without using training data, which keeps it computationally efficient.
2. **Multi-step Search Strategy**
   - It adopts a multi-step search strategy that gradually refines the rank search space so that all possible rank values are covered, with the aim of reaching the maximum compression rate.
3. **Fine-Tuning Step**
   - Combined with a subsequent fine-tuning step, it keeps the performance of the highly compressed model comparable to that of the original model.

### Experimental Results

- The method was evaluated on multiple benchmark datasets (such as CIFAR-10 and ImageNet).
- For the ResNet-20 and VGG-16 models, ORTOS achieved significant compression in terms of FLOPs and number of parameters, and in some cases also improved accuracy.
- For the ResNet-18 and MobileNetV2 models, especially when using TT decomposition, ORTOS achieved state-of-the-art performance in terms of Top-1 and Top-5 accuracy and the reduction of FLOPs and number of parameters.

### Summary

This paper addresses decomposition and optimal rank selection in neural network compression by proposing the unified framework ORTOS, which achieves efficient model compression while maintaining model performance. The method is particularly relevant for applications on resource-constrained devices.
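To make the rank selection trade-off more concrete, the following is a minimal sketch for a single `m x n` weight matrix. It is not the ORTOS algorithm: the paper searches over ranks in a continuous space, whereas this sketch runs a discrete coarse-to-fine search that only mirrors the idea of gradually refining the rank search space. The function names, the form of `composite_loss` (squared truncation error plus a weighted parameter ratio), the weight `lam`, and the `grid` size are all assumptions made for illustration. Like the method described above, it needs no training data: it uses only the singular values of the weights, and the error of the best rank-`r` approximation follows from the Eckart-Young theorem.

```python
import math


def truncation_error(singular_values, rank):
    """Relative Frobenius error of the best rank-`rank` approximation.

    By the Eckart-Young theorem this depends only on the discarded
    singular values (assumed sorted in descending order).
    """
    total = sum(s * s for s in singular_values)
    tail = sum(s * s for s in singular_values[rank:])
    return math.sqrt(tail / total)


def composite_loss(singular_values, rank, m, n, lam):
    """Hypothetical composite objective (an assumption, not the paper's loss).

    Squared approximation error plus `lam` times the fraction of parameters
    kept by a rank-`rank` factorization W ~ A (m x rank) @ B (rank x n).
    """
    param_ratio = rank * (m + n) / (m * n)
    return truncation_error(singular_values, rank) ** 2 + lam * param_ratio


def multi_step_rank_search(singular_values, m, n, lam, r_min, r_max, grid=5):
    """Coarse-to-fine search over integer ranks in [r_min, r_max].

    Each step evaluates a coarse grid of candidate ranks, then narrows the
    interval around the best candidate until the grid spacing reaches 1.
    """
    lo, hi = r_min, r_max
    while True:
        step = max(1, (hi - lo) // (grid - 1))
        candidates = sorted(set(list(range(lo, hi + 1, step)) + [hi]))
        best = min(
            candidates,
            key=lambda r: composite_loss(singular_values, r, m, n, lam),
        )
        if step == 1:
            return best
        # Keep one grid cell on each side of the current best rank.
        lo, hi = max(r_min, best - step), min(r_max, best + step)


# Illustrative spectrum: geometrically decaying singular values of a 64 x 64 layer.
spectrum = [0.9 ** i for i in range(64)]
chosen_rank = multi_step_rank_search(spectrum, m=64, n=64, lam=0.1, r_min=1, r_max=64)
print("selected rank:", chosen_rank)
```

With sorted singular values this objective is convex in the rank, so narrowing the interval around the best grid point does not skip the minimizer. A larger `lam` pushes the selected rank down (more compression), a smaller one pushes it up (less approximation error). In a full pipeline the selected ranks would be followed by fine-tuning, as the summary notes.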

Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection
