LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking

Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, Chengjie Wang
2024-03-07
Abstract: Deep learning models, particularly those based on transformers, often employ numerous stacked structures, which possess identical architectures and perform similar functions. While effective, this stacking paradigm leads to a substantial increase in the number of parameters, posing challenges for practical applications. In today's landscape of increasingly large models, stacking depth can even reach dozens, further exacerbating this issue. To mitigate this problem, we introduce LORS (LOw-rank Residual Structure). LORS allows stacked modules to share the majority of parameters, requiring a much smaller number of unique ones per module to match or even surpass the performance of using entirely distinct ones, thereby significantly reducing parameter usage. We validate our method by applying it to the stacked decoders of a query-based object detector, and conduct extensive experiments on the widely used MS COCO dataset. Experimental results demonstrate the effectiveness of our method, as even with a 70% reduction in the parameters of the decoder, our method still enables the model to achieve comparable or
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the parameter growth caused by stacking in deep learning models, especially Transformer-based ones. Such models repeat many modules that share the same architecture and perform similar functions, so the parameter count grows with stacking depth, which poses challenges for training, inference, and deployment. The authors observe that stacking enhances a model's capabilities but at a significant cost in parameters. They therefore propose the Low-Rank Residual Structure (LORS), which aims to reduce the number of parameters in stacked structures while maintaining or improving model performance. LORS decomposes the parameters of the stacked modules into two parts: shared parameters, which represent what the modules have in common, and private parameters, which capture what is specific to each module. The shared parameters are used by all modules and trained jointly, while each module owns its private parameters independently. Each stacked module therefore retains only the parameters that capture its unique characteristics, which greatly reduces overall parameter usage. Experimental results show that even when the decoder parameters are reduced by 70%, a model using LORS can still achieve performance comparable to or even better than that of the original model.
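The shared/private split can be illustrated with a small sketch. It assumes, as the name "low-rank residual" suggests, that each module's weight is a shared matrix plus a private product of two thin factors; the paper's exact formulation may differ. All function names, the layer count, the dimensions, and the rank below are hypothetical and chosen only for illustration, not taken from the paper's experiments.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Create a rows x cols matrix of small random values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def build_lors_weights(num_layers, d_in, d_out, rank, seed=0):
    """Create one shared matrix plus a private low-rank factor pair per layer.

    Assumption: the private part of layer i is A_i (d_in x rank) times
    B_i (rank x d_out). This is an illustrative reading, not the paper's code.
    """
    rng = random.Random(seed)
    shared = rand_matrix(d_in, d_out, rng)
    private = [
        (rand_matrix(d_in, rank, rng), rand_matrix(rank, d_out, rng))
        for _ in range(num_layers)
    ]
    return shared, private


def layer_weight(shared, factors):
    """Effective weight of one layer: shared matrix plus its low-rank residual."""
    a, b = factors
    residual = matmul(a, b)
    return [
        [s + r for s, r in zip(s_row, r_row)]
        for s_row, r_row in zip(shared, residual)
    ]


def count_params(num_layers, d_in, d_out, rank):
    """Compare fully distinct per-layer weights with the shared + low-rank scheme."""
    plain = num_layers * d_in * d_out
    lors = d_in * d_out + num_layers * rank * (d_in + d_out)
    return plain, lors


# Tiny instance: six layers that share one 8x8 matrix and differ by a rank-2 residual.
shared, private = build_lors_weights(num_layers=6, d_in=8, d_out=8, rank=2)
weights = [layer_weight(shared, f) for f in private]

# Parameter accounting for a larger, still hypothetical, configuration.
plain, lors = count_params(num_layers=6, d_in=256, d_out=256, rank=16)
reduction = 1 - lors / plain
print(f"distinct: {plain}, shared + low-rank: {lors}, reduction: {reduction:.1%}")
```

In this sketch the saving comes from storing the full matrix once and paying only `rank * (d_in + d_out)` parameters per additional layer, so the reduction grows as the rank shrinks relative to the layer dimensions or as more layers are stacked.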