SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
2024-05-28
Abstract: The advancements in Large Language Models (LLMs) have been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment. Singular Value Decomposition (SVD) offers a promising solution for LLM compression. However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression loss, and the compressed weights are not updated after SVD truncation. In this work, we propose SVD-LLM, a new SVD-based LLM compression method that addresses the limitations of existing methods. SVD-LLM incorporates a truncation-aware data whitening strategy to ensure a direct mapping between singular values and compression loss. Moreover, SVD-LLM adopts a layer-wise closed-form model parameter update strategy to compensate for accuracy degradation under high compression ratios. We evaluate SVD-LLM on a total of 10 datasets and eight models from three different LLM families at four different scales. Our results demonstrate the superiority of SVD-LLM over state-of-the-art methods, especially at high model compression ratios.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses two key limitations of existing singular value decomposition (SVD)-based compression methods for large language models (LLMs):

1. **Truncating smaller singular values may lead to higher compression loss.** Existing SVD-based methods do not establish a direct relationship between singular values and compression loss. As a result, truncating the smallest singular values does not necessarily yield the smallest loss, and these methods cannot reliably decide which singular values to truncate.
2. **The compressed weights are not updated after SVD truncation.** As the compression ratio increases, more singular values must be truncated. Compensating for the resulting drop in accuracy requires updating the remaining parameters after compression, but existing SVD-based methods do not do this, so they cannot recover accuracy at high compression ratios.

To solve these problems, the paper proposes SVD-LLM, a new SVD-based LLM compression method built on two key techniques:

1. **Truncation-aware data whitening.** SVD-LLM whitens the input data so that there is a direct mapping between each singular value and the model compression loss caused by truncating it. This makes it possible to identify precisely which singular values should be truncated to minimize compression loss.
2. **Layer-by-layer closed-form model parameter update.** To compensate for the drop in accuracy at high compression ratios, SVD-LLM updates the compressed weights layer by layer using a closed-form solution.
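The whitening idea can be illustrated with a small NumPy sketch. It assumes, as a detail not stated in this summary, that the whitening matrix `S` is the Cholesky factor of the calibration Gram matrix `X X^T` (so `S S^T = X X^T`); the function names, matrix sizes, and synthetic activations are hypothetical. Under that assumption, `||W X - W' X||_F = ||(W - W') S||_F`, so truncating the singular values of `W S` beyond rank `k` costs exactly the square root of the sum of their squares. The sketch checks this identity and compares the result with plain SVD of `W`, which ignores the activations.

```python
import numpy as np


def whitened_svd_compress(W, X, k):
    """Rank-k compression of W that accounts for calibration activations X.

    Assumption (not from the summary): S is the Cholesky factor of X X^T,
    which requires X X^T to be positive definite.
    """
    S = np.linalg.cholesky(X @ X.T)          # lower-triangular, S @ S.T == X @ X.T
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    W_u = U[:, :k] * sigma[:k]               # (m, k): U_k Sigma_k
    # W_v = V_k^T S^{-1}, obtained by solving S^T W_v^T = V_k instead of inverting S
    W_v = np.linalg.solve(S.T, Vt[:k].T).T   # (k, n)
    return W_u, W_v, sigma


def plain_svd_compress(W, k):
    """Baseline: rank-k truncated SVD of W alone, ignoring the activations."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]


rng = np.random.default_rng(0)
m, n, N, k = 64, 48, 512, 16                 # out features, in features, tokens, rank
W = rng.standard_normal((m, n))
# Hypothetical calibration activations (n x N) with very uneven per-channel scales
X = rng.standard_normal((n, N)) * rng.lognormal(sigma=1.5, size=(n, 1))

W_u, W_v, sigma = whitened_svd_compress(W, X, k)
loss_whitened = np.linalg.norm(W @ X - W_u @ (W_v @ X))   # Frobenius norm
predicted = np.sqrt(np.sum(sigma[k:] ** 2))               # from truncated singular values

P_u, P_v = plain_svd_compress(W, k)
loss_plain = np.linalg.norm(W @ X - P_u @ (P_v @ X))

print(f"whitened loss {loss_whitened:.3f}, predicted {predicted:.3f}")
print(f"plain SVD loss {loss_plain:.3f}")
```

Because `S` is invertible, every rank-`k` matrix can be written as `W' S` for some rank-`k` `W'`. By the Eckart-Young theorem, the whitened truncation therefore minimizes the activation-weighted loss among all rank-`k` approximations, so plain SVD of `W` can at best match it on the same calibration data.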
Through these improvements, SVD-LLM outperforms existing methods across 10 datasets and eight models from three LLM families at four different scales, especially at high compression ratios.
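To make the second technique, the layer-by-layer closed-form update, more concrete: the summary does not spell out the update rule, so the following NumPy sketch shows one plausible least-squares formulation rather than the paper's exact method. It assumes a hypothetical setting in which the right factor `W_v` is kept fixed and the left factor `W_u` is re-solved to minimize `||W X - W_u W_v X_c||_F`, where `X` is the layer input in the original model and `X_c` is the input produced by already-compressed earlier layers (simulated here with additive noise). The function `refit_left_factor` and all data are illustrative.

```python
import numpy as np


def refit_left_factor(W, X, X_c, W_v):
    """Closed-form least-squares update of the left factor W_u.

    Hypothetical formulation: keep W_v fixed and choose W_u to minimize
    ||W X - W_u (W_v X_c)||_F, where X is the original layer input and
    X_c is the input coming from already-compressed earlier layers.
    """
    Z = W_v @ X_c                      # (k, N) low-rank features of the shifted input
    target = W @ X                     # (m, N) outputs of the original layer
    # Solve Z^T W_u^T ~= target^T in the least-squares sense
    W_u_T, *_ = np.linalg.lstsq(Z.T, target.T, rcond=None)
    return W_u_T.T                     # (m, k)


rng = np.random.default_rng(1)
m, n, N, k = 64, 48, 512, 16
W = rng.standard_normal((m, n))
X = rng.standard_normal((n, N))
# Hypothetical drift in the layer input caused by compressing earlier layers
X_c = X + 0.3 * rng.standard_normal((n, N))

# Initial factors from a plain rank-k SVD, used here only for brevity
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_u0, W_v = U[:, :k] * s[:k], Vt[:k]

W_u1 = refit_left_factor(W, X, X_c, W_v)
loss_before = np.linalg.norm(W @ X - W_u0 @ (W_v @ X_c))
loss_after = np.linalg.norm(W @ X - W_u1 @ (W_v @ X_c))
print(f"before update {loss_before:.3f}, after update {loss_after:.3f}")
```

Since the least-squares solution is optimal over all choices of `W_u`, including the initial one, this refit can only lower the loss on the calibration data. Repeating it layer by layer, with each layer fed the inputs of its compressed predecessors, is one way errors introduced by earlier layers could be partially compensated downstream.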