Adaptive Feature-based Low-Rank Compression of Large Language Models Via Bayesian Optimization

Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Kehai Chen, Min Zhang
DOI: https://doi.org/10.18653/v1/2024.findings-emnlp.240
2024-01-01
Abstract: In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank compression, a promising technique, reduces non-essential parameters by decomposing weight matrices into products of two low-rank matrices. Yet, its application in LLMs has not been extensively studied. The key to low-rank compression lies in two steps: low-rank factorization and the allocation of low-rank dimensions. To address the challenges of low-rank compression in LLMs, we conduct empirical research on the low-rank characteristics of large models and propose a low-rank compression method suitable for LLMs. This approach combines precise estimation of feature distributions through pooled covariance matrices with a Bayesian optimization strategy for allocating low-rank dimensions. Experiments on the LLaMA-2 models demonstrate that our method outperforms strong existing structured pruning and low-rank compression techniques in maintaining model performance at the same compression ratio.
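The abstract names two ingredients: factorizing each weight matrix into two low-rank matrices guided by feature statistics, and choosing the rank of each factorization. The NumPy sketch below illustrates only the first, under stated assumptions. It follows the general feature-based recipe of projecting a linear layer's outputs onto the top eigenvectors of their covariance, and it reads "pooled covariance" as the textbook pooled estimate across calibration batches (each batch centered at its own mean). Both readings are assumptions rather than the paper's exact procedure; the names `pooled_covariance`, `feature_low_rank`, and `low_rank_params` are hypothetical, and the Bayesian-optimization rank allocation is not shown. The sketch also makes the parameter arithmetic explicit: a d_out x d_in matrix holds d_out * d_in parameters, while the rank-r factorization holds r * (d_out + d_in), so factorizing saves parameters only when r < d_out * d_in / (d_out + d_in).

```python
import numpy as np


def low_rank_params(d_out, d_in, rank):
    """Parameters after replacing a d_out x d_in matrix by A (d_out x rank) @ B (rank x d_in)."""
    return rank * (d_out + d_in)


def pooled_covariance(batches):
    """Assumed reading of 'pooled covariance' (textbook estimator): center each
    batch (n_i x d) at its own mean, sum the scatter matrices, and divide by
    the pooled degrees of freedom sum(n_i - 1)."""
    d = batches[0].shape[1]
    scatter = np.zeros((d, d))
    dof = 0
    for y in batches:
        centered = y - y.mean(axis=0, keepdims=True)
        scatter += centered.T @ centered
        dof += y.shape[0] - 1
    return scatter / dof


def feature_low_rank(W, input_batches, rank):
    """Hypothetical feature-based factorization of a linear layer y = W x.

    Projects outputs onto the top-`rank` eigenvectors U of their pooled
    covariance, giving A = U (d_out x rank) and B = U^T W (rank x d_in).
    The bias b = (I - U U^T) mu re-adds the part of the mean output mu that
    the projection drops, so y ~= A @ (B @ x) + b.
    """
    outputs = [x @ W.T for x in input_batches]  # rows of x are input vectors
    cov = pooled_covariance(outputs)
    _, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :rank]              # top-`rank` principal directions
    mu = np.concatenate(outputs).mean(axis=0)   # mean output feature
    A = U
    B = U.T @ W
    b = mu - U @ (U.T @ mu)
    return A, B, b


# Usage on random data (toy calibration set of 4 batches).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
batches = [rng.standard_normal((32, 6)) for _ in range(4)]
A, B, b = feature_low_rank(W, batches, rank=3)
print(A.shape, B.shape)  # (8, 3) (3, 6)
print(low_rank_params(4096, 4096, 256), 4096 * 4096)  # 2097152 16777216
```

For a 4096 x 4096 projection (the hidden size of LLaMA-2-7B), rank 256 keeps 2,097,152 of 16,777,216 parameters, one eighth of the original, and the break-even rank is 2048. Projecting the outputs rather than truncating the SVD of W alone is the design choice that makes the factorization depend on the feature distribution, which is why an accurate covariance estimate matters.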