Abstract: Deep learning models like Transformers and Convolutional Neural Networks (CNNs) have revolutionized various domains, but their parameter-intensive nature hampers deployment in resource-constrained settings. In this paper, we introduce a novel concept that utilizes the column space and row space of weight matrices, which allows for a substantial reduction in model parameters without compromising performance. Leveraging this paradigm, we achieve parameter-efficient deep learning models. Our approach applies to both Bottleneck and Attention layers, effectively halving the parameters while incurring only minor performance degradation. Extensive experiments conducted on the ImageNet dataset with ViT and ResNet50 demonstrate the effectiveness of our method, showcasing competitive performance when compared to traditional models. This approach not only addresses the pressing demand for parameter-efficient deep learning solutions but also holds great promise for practical deployment in real-world scenarios.

What problem does this paper attempt to address?

The problem this paper attempts to solve is: **how to substantially reduce the number of parameters in deep neural networks (such as Transformers and convolutional neural networks, CNNs) without significantly sacrificing performance, so that the models are easier to deploy in resource-constrained environments**. Specifically, the authors propose using the column space and row space of a weight matrix to obtain parameter-efficient deep learning models. The method roughly halves the parameters of bottleneck layers and attention layers while causing only a slight performance degradation.

### Main problems and solutions

1. **Problem background**:
   - Deep learning models (such as Transformers and CNNs) have achieved great success in many fields, but they usually require a large number of parameters.
   - This parameter-intensive nature makes the models difficult to deploy in resource-constrained environments, such as mobile devices or edge-computing platforms.
2. **Proposed method**:
   - Use both the column space and the row space of a weight matrix, so that one matrix is shared between two projections: it is applied as \(W\) in one and as its transpose \(W^T\) in the other.
   - The method applies to the multi-head attention (MHA) mechanism and the feed-forward network (FFN) of the Transformer, as well as to the bottleneck layers in ResNet.
   - Sharing weights in this way substantially reduces the number of parameters while largely preserving model performance.
3. **Experimental verification**:
   - Extensive experiments were carried out on the ImageNet dataset with the ViT and ResNet50 models.
   - The results show that the method reduces the number of model parameters by about half with only a minor drop in performance.
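To make the "about half" figure concrete, the following minimal sketch counts the parameters of a single FFN block with and without weight sharing. The layer sizes (768 and 3072) are assumed, ViT-Base-like values chosen for illustration, and the function name `ffn_params` is hypothetical; neither comes from the paper.

```python
def ffn_params(d_model: int, d_hidden: int, tied: bool) -> int:
    """Count FFN parameters: weight matrices plus the two bias vectors."""
    # Untied: W_1 (d_hidden x d_model) and W_2 (d_model x d_hidden).
    # Tied: a single W (d_hidden x d_model), reused as W and as W^T.
    weights = d_model * d_hidden if tied else 2 * d_model * d_hidden
    biases = d_hidden + d_model  # b_1 and b_2 are kept in both variants
    return weights + biases


# Assumed sizes for illustration only (not taken from the paper).
d_model, d_hidden = 768, 3072
standard = ffn_params(d_model, d_hidden, tied=False)
shared = ffn_params(d_model, d_hidden, tied=True)
ratio = shared / standard
print(standard, shared, round(ratio, 4))
```

Because the bias vectors are not shared, the ratio is slightly above one half rather than exactly one half.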
### Formula representation

- For the multi-head attention mechanism (MHA), the original formula is:
  \[ \hat{x}=\text{MHA}(Q, K, V)=\text{MHA}(W_q x, W_k x, W_v x) \]
  In the new method, using the column space and row space, the formula becomes:
  \[ \hat{x}=\text{MHA}(W_q x, W_k x, W_{kv}^T x) \]
  For the linear projection part:
  \[ \text{Proj}(\hat{x}, W_{proj}) = W_{proj}^T \hat{x} \]
- For the feed-forward network (FFN), the original formula is:
  \[ \text{FFN}(x)=W_2 F(W_1 x + b_1)+b_2 \]
  In the new method, using the column space and row space of a single weight matrix \(W\), the formula becomes:
  \[ \text{FFN}(x)=W^T F(W x + b_1)+b_2 \]
- For the bottleneck layers, the original formula is:
  \[ \text{Bottleneck}(x)=W_1 G(W_2 x) \]
  In the new method, using the column space and row space of a single weight matrix \(W\), the formula becomes:
  \[ \text{Bottleneck}(x)=W^T G(W x) \]

### Conclusion

This research proposes using the column space and row space of a single weight matrix to build parameter-efficient deep learning models. The experimental results show that the method largely preserves model performance while roughly halving the number of parameters, which makes it a practical option for deployment in resource-constrained environments.
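As an illustration of the FFN formulas in the Formula representation section, the sketch below implements both variants in plain Python. It is not the authors' implementation: the activation \(F\) is assumed to be GELU, the sizes are toy values, and all function names (`gelu`, `matvec`, `transpose`, `ffn_standard`, `ffn_shared`) are hypothetical helpers introduced here.

```python
import math
import random


def gelu(z: float) -> float:
    # Exact GELU via the error function; the choice of F is an assumption.
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def matvec(M, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def ffn_standard(x, W1, b1, W2, b2):
    # FFN(x) = W_2 F(W_1 x + b_1) + b_2
    hidden = [gelu(h + b) for h, b in zip(matvec(W1, x), b1)]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def ffn_shared(x, W, b1, b2):
    # FFN(x) = W^T F(W x + b_1) + b_2: the standard FFN with W_1 = W, W_2 = W^T.
    return ffn_standard(x, W, b1, transpose(W), b2)


random.seed(0)
d_model, d_hidden = 4, 16  # toy sizes for illustration
W = [[random.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
b1 = [0.0] * d_hidden
b2 = [0.0] * d_model
x = [random.gauss(0.0, 1.0) for _ in range(d_model)]

y = ffn_shared(x, W, b1, b2)
stored_weights = sum(len(row) for row in W)  # only one matrix is stored
print(len(y), stored_weights)
```

The shared variant stores one `d_hidden x d_model` matrix instead of two, yet still maps a `d_model`-dimensional input back to `d_model` dimensions. The bottleneck formula \(W^T G(W x)\) follows the same pattern without the bias terms.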

Do deep neural networks utilize the weight space efficiently?
