Data-free Weight Compress and Denoise for Large Language Models

Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin
DOI: https://doi.org/10.48550/arxiv.2402.16319
2024-01-01
Abstract: Large Language Models (LLMs) are reshaping the research landscape in artificial intelligence, particularly as model parameters scale up significantly, unlocking remarkable capabilities across various domains. Nevertheless, the scalability of model parameters is constrained by limited GPU memory and computational speed. To address these constraints, various weight compression methods have emerged, such as pruning and quantization. Given the low-rank nature of weight matrices in language models, reducing weights through matrix decomposition holds significant potential. In this paper, drawing upon the intrinsic structure of LLMs, we propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices. Notably, our method requires no additional corpus and remains orthogonal to pruning and quantization methods, so it can be combined with them. We achieve a model pruning of 80% of parameters while retaining 93.43% of the original performance. Additionally, we explore the fundamental properties of the weight matrices of LLMs that have undergone Rank-k Approximation and conduct comprehensive experiments to elucidate our hypothesis.
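The abstract does not spell out how the Joint Rank-k Approximation is computed. As a minimal sketch, assuming a standard truncated SVD as the rank-k approximation (the optimal rank-k approximation in Frobenius norm by the Eckart-Young theorem), the code below factors a weight matrix W (m x n) into A (m x k) and B (k x n) and reports the fraction of parameters kept. The `joint_rank_k_factors` helper is a hypothetical illustration of one way several matrices that read the same input could be approximated together with a shared right factor; it is an assumption, not the paper's algorithm. All function names are illustrative.

```python
import numpy as np


def rank_k_factors(W: np.ndarray, k: int):
    """Return A (m x k) and B (k x n) such that A @ B is the best rank-k
    approximation of W in Frobenius norm, computed via truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]  # fold the top-k singular values into the left factor
    B = Vt[:k, :]         # top-k right singular vectors
    return A, B


def joint_rank_k_factors(mats, k: int):
    """Hypothetical 'joint' variant (an assumption, not the paper's method):
    matrices with the same number of columns are stacked row-wise, approximated
    once, and share a single right factor B."""
    stacked = np.vstack(mats)
    A, B = rank_k_factors(stacked, k)
    # Split the stacked left factor back into one block per original matrix.
    splits = np.cumsum([m.shape[0] for m in mats])[:-1]
    return np.split(A, splits, axis=0), B


def param_ratio(m: int, n: int, k: int) -> float:
    """Fraction of parameters kept when an m x n matrix is stored as A and B."""
    return (m * k + k * n) / (m * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, true_rank = 256, 256, 16
    # Synthetic weight matrix: exactly rank 16 plus small Gaussian noise.
    W = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))
    W += 0.01 * rng.standard_normal((m, n))
    A, B = rank_k_factors(W, k=16)
    rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"relative error: {rel_err:.4f}, params kept: {param_ratio(m, n, 16):.3f}")
```

Storing A and B instead of W saves parameters only when k < mn / (m + n); for a 256 x 256 matrix that bound is k < 128, and k = 16 keeps 12.5% of the entries. Real LLM weights are not exactly low rank, so the achievable k for a given accuracy has to be measured per matrix.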
What problem does this paper attempt to address?