LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking

Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, Chengjie Wang

2024-03-07

Abstract: Deep learning models, particularly those based on transformers, often employ numerous stacked structures, which possess identical architectures and perform similar functions. While effective, this stacking paradigm leads to a substantial increase in the number of parameters, posing challenges for practical applications. In today's landscape of increasingly large models, stacking depth can even reach dozens, further exacerbating this issue. To mitigate this problem, we introduce LORS (LOw-rank Residual Structure). LORS allows stacked modules to share the majority of parameters, requiring a much smaller number of unique ones per module to match or even surpass the performance of using entirely distinct ones, thereby significantly reducing parameter usage. We validate our method by applying it to the stacked decoders of a query-based object detector, and conduct extensive experiments on the widely used MS COCO dataset. Experimental results demonstrate the effectiveness of our method, as even with a 70% reduction in the parameters of the decoder, our method still enables the model to achieve comparable or

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the parameter growth caused by stacking in deep learning models, especially Transformer-based ones. Such models rely heavily on stacked structures, that is, multiple modules with the same architecture that perform similar functions. Stacking enhances the model's capabilities, but it also increases the number of parameters dramatically, which poses challenges for training, inference, and deployment. The authors therefore propose the Low-rank Residual Structure (LORS), which aims to reduce the number of parameters in stacked structures while maintaining or improving model performance. Specifically, LORS decomposes the parameters of the stacked modules into two parts: shared parameters, which represent what the modules have in common, and private parameters, which capture module-specific features. The shared parameters are used by all modules and trained jointly, while each module owns its private parameters independently. Each stacked module therefore keeps only the parameters that capture its unique characteristics, which greatly reduces overall parameter usage. Experimental results show that even when the decoder parameters are reduced by 70%, a model using LORS still achieves performance comparable to or better than that of the original model.
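The shared/private decomposition described above can be illustrated with a small sketch. It assumes, as the method's name suggests but this summary does not spell out, that each module's private part is a low-rank product added as a residual to the shared weight, i.e. W_i = W_shared + B_i A_i. The function names, layer sizes, and rank below are hypothetical and chosen only for illustration; they are not the paper's configuration.

```python
def stacked_param_count(num_layers, d_out, d_in):
    """Parameters when every stacked layer owns a full d_out x d_in weight."""
    return num_layers * d_out * d_in


def lors_style_param_count(num_layers, d_out, d_in, rank):
    """One shared d_out x d_in weight plus, per layer, a private low-rank
    pair B (d_out x rank) and A (rank x d_in). Assumed formulation."""
    shared = d_out * d_in
    private_per_layer = rank * (d_out + d_in)
    return shared + num_layers * private_per_layer


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def effective_weight(shared, b, a):
    """Weight actually used by one layer: W_shared + B @ A (assumed form)."""
    residual = matmul(b, a)
    return [[s + r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(shared, residual)]


if __name__ == "__main__":
    # Hypothetical sizes: 6 stacked layers, 256 x 256 weights, rank 16.
    full = stacked_param_count(6, 256, 256)
    compact = lors_style_param_count(6, 256, 256, 16)
    print(f"fully private: {full}, shared + low-rank: {compact}")
    print(f"reduction: {1 - compact / full:.1%}")
```

Under this assumed form, the saving comes from paying for the full-size matrix once and adding only rank × (d_out + d_in) parameters per layer, so the benefit grows with stacking depth and shrinks as the rank increases.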
