Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection

Ali Aghababaei-Harandi, Massih-Reza Amini
2024-09-05
Abstract: Despite their high accuracy, complex neural networks demand significant computational resources, posing challenges for deployment on resource-constrained devices such as mobile phones and embedded systems. Compression algorithms have been developed to address these challenges by reducing model size and computational demands while maintaining accuracy. Among these approaches, factorization methods based on tensor decomposition are theoretically sound and effective. However, they face difficulties in selecting the appropriate rank for decomposition. This paper tackles this issue by presenting a unified framework that simultaneously applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints. Our approach includes an automatic rank search in a continuous space, efficiently identifying optimal rank configurations without the use of training data, making it computationally efficient. Combined with a subsequent fine-tuning step, our approach maintains the performance of highly compressed models on par with their original counterparts. Using various benchmark datasets, we demonstrate the efficacy of our method through a comprehensive analysis.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is how to perform decomposition and optimal rank selection simultaneously when compressing a neural network, so that model size and computational requirements shrink while accuracy is maintained. Specifically, the authors propose a unified framework, ORTOS (Optimal Rank Tensor decOmpoSition), which automatically searches for the optimal rank configuration in a continuous space by using a composite compression loss function under specified rank constraints.

### Background of the Main Problem

1. **Deployment Challenges on Resource-Constrained Devices**
   - Complex neural networks are highly accurate, but they require large amounts of computational resources, which makes them difficult to deploy on resource-constrained devices such as mobile phones and embedded systems.
2. **Limitations of Existing Compression Algorithms**
   - Existing compression algorithms (such as pruning, quantization, knowledge distillation, and tensor decomposition) can effectively reduce model size and computational requirements.
   - Tensor decomposition in particular is theoretically sound, but choosing the appropriate rank is difficult: the choice suffers from non-uniqueness, and the rank-selection problem is NP-hard.

### Core Contributions of the Paper

1. **Unified Framework ORTOS**
   - By introducing a composite compression loss function, it performs decomposition and optimal rank selection simultaneously under specified rank constraints.
   - It proposes an automatic rank search that identifies the optimal rank configuration in a continuous space without training data, which keeps it computationally efficient.
2. **Multi-step Search Strategy**
   - It adopts a multi-step search strategy that gradually refines the rank search space so that all possible rank values are covered, with the aim of achieving the maximum compression rate.
3. **Fine-Tuning Step**
   - Combined with a subsequent fine-tuning step, it keeps the performance of the highly compressed model comparable to that of the original model.

### Experimental Results

- The method was evaluated on multiple benchmark datasets (such as CIFAR-10 and ImageNet), demonstrating its effectiveness.
- For ResNet-20 and VGG-16, ORTOS achieved significant reductions in FLOPs and in the number of parameters, and in some cases also improved accuracy.
- For ResNet-18 and MobileNetV2, especially with TT decomposition, ORTOS achieved state-of-the-art performance in terms of Top-1 and Top-5 accuracy and the reduction of FLOPs and parameters.

### Summary

This paper addresses decomposition and optimal rank selection in neural network compression by proposing a unified framework, ORTOS, which compresses models efficiently while maintaining their performance. This makes the method relevant for applications on resource-constrained devices.
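To make the role of the rank concrete, the sketch below shows how the chosen rank determines the parameter count of a factorized layer. It is not the ORTOS procedure. It assumes a plain two-factor matrix factorization W ≈ UV of a single fully connected layer, which is simpler than the tensor formats (such as TT) evaluated in the paper. The function names and the 512 × 512 layer size are hypothetical, chosen only for illustration.

```python
def dense_params(m: int, n: int) -> int:
    """Parameters in a dense m x n weight matrix."""
    return m * n


def lowrank_params(m: int, n: int, r: int) -> int:
    """Parameters in a rank-r factorization W ~ U @ V (U is m x r, V is r x n)."""
    return r * (m + n)


def compression_ratio(m: int, n: int, r: int) -> float:
    """Dense parameter count divided by the factorized parameter count."""
    return dense_params(m, n) / lowrank_params(m, n, r)


def max_rank_for_ratio(m: int, n: int, target_ratio: float) -> int:
    """Largest rank that still reaches the target compression ratio.

    Solves r * (m + n) <= m * n / target_ratio for integer r.
    Returns 0 when no positive rank satisfies the target.
    """
    return int(m * n / (target_ratio * (m + n)))


if __name__ == "__main__":
    m, n = 512, 512  # hypothetical layer shape
    for r in (16, 64, 256):
        # rank, factorized parameter count, compression ratio
        print(r, lowrank_params(m, n, r), compression_ratio(m, n, r))
    # Upper bound on the rank if a 4x parameter reduction is required
    print(max_rank_for_ratio(m, n, 4.0))  # prints 64
```

For this square layer, the factorization only saves parameters when the rank is below 256, where the two counts are equal. A compression target therefore acts as an upper bound on the rank, while accuracy usually favors larger ranks. Rank selection has to balance these two pressures.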