SparseLLM: Towards Global Pruning for Pre-trained Language Models

Guangji Bai, Yijiang Li, Chen Ling, Kibaek Kim, Liang Zhao
2024-11-01
Abstract: The transformative impact of large language models (LLMs) like LLaMA and GPT on natural language processing is countered by their prohibitive computational demands. Pruning has emerged as a pivotal compression strategy, introducing sparsity to enhance both memory and computational efficiency. Yet, traditional global pruning is impractical for LLMs due to scalability issues, while local pruning, despite its efficiency, leads to suboptimal solutions. Addressing these challenges, we propose SparseLLM, a novel framework that redefines the global pruning process into manageable, coordinated subproblems, allowing for resource-efficient optimization with global optimality. SparseLLM's approach, which conceptualizes LLMs as a chain of modular functions and leverages auxiliary variables for problem decomposition, not only facilitates a pragmatic application on LLMs but also demonstrates significant performance improvements, particularly in high-sparsity regimes where it surpasses current state-of-the-art methods.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the excessive computational requirements of large language models (LLMs) in natural language processing. Models such as LLaMA and GPT perform well on complex language benchmarks, but the computational resources they require limit how widely they can be deployed. The authors propose a new framework, SparseLLM, which introduces sparsity through global pruning to improve memory and computational efficiency.

### Main problems

1. **High consumption of computational resources**: Because of their large parameter counts, LLMs require significant computational resources to run, which makes them difficult to deploy in resource-limited environments.
2. **Limitations of traditional global pruning**: Traditional global pruning methods need to load the entire model onto a single GPU, which is impractical for modern LLMs because the models are too large.
3. **Suboptimal solutions from local pruning**: Local pruning compresses each layer separately, which reduces the demand for computational resources. However, it only minimizes each layer's local error, so overall model performance degrades, especially at high sparsity.

### Solutions

SparseLLM reformulates global pruning as several manageable, coordinated subproblems, which allows resource-efficient optimization while maintaining global optimality. Specifically:

- **Modular function chain**: Treat the LLM as a chain of modular functions, where the output of each module serves as the input of the next.
- **Auxiliary variables**: Introduce auxiliary variables to decompose the problem, so that each subproblem can be solved in a low-resource setting while the subproblems remain coordinated toward the global pruning objective.
- **Alternating optimization algorithm**: Solve the subproblems in alternation, using a closed-form solution for each one, to reach the global pruning objective efficiently.

### Experimental results

The experiments show that SparseLLM significantly outperforms existing local pruning methods, such as SparseGPT and Wanda, at high sparsity (> 60%). In particular, on large models such as OPT-66b, SparseLLM significantly reduces perplexity relative to those methods.

### Conclusion

By decomposing global pruning into coordinated subproblems, SparseLLM reduces the computational resource requirements of LLMs while maintaining high performance, making them easier to deploy in resource-constrained environments.
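The difference between the local and global pruning objectives described above can be illustrated with a toy sketch. The code below is not SparseLLM's algorithm: it uses plain magnitude pruning on a hypothetical two-layer chain (`w1`, ReLU, `w2`), and every function name is an assumption introduced for illustration. It only shows which quantity each objective measures: the local error compares each pruned layer against its dense counterpart on dense inputs, while the global error compares the end-to-end outputs of the pruned and dense chains.

```python
def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v):
    return [max(a, 0.0) for a in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def magnitude_prune(w, sparsity):
    """Return a copy of w with the smallest-magnitude entries set to zero.

    `sparsity` is the fraction of entries to remove (0.0 keeps everything).
    """
    entries = sorted(
        (abs(v), i, j) for i, row in enumerate(w) for j, v in enumerate(row)
    )
    k = int(round(sparsity * len(entries)))  # number of entries to zero out
    pruned = [list(row) for row in w]
    for _, i, j in entries[:k]:
        pruned[i][j] = 0.0
    return pruned


def chain_errors(w1, w2, x, sparsity):
    """Return (local error of layer 1, local error of layer 2, global error)."""
    p1 = magnitude_prune(w1, sparsity)
    p2 = magnitude_prune(w2, sparsity)

    # Dense reference activations and output.
    h_dense = relu(matvec(w1, x))
    y_dense = matvec(w2, h_dense)

    # Local objective: each layer is judged alone, fed the dense input.
    local1 = sq_dist(matvec(p1, x), matvec(w1, x))
    local2 = sq_dist(matvec(p2, h_dense), y_dense)

    # Global objective: errors from layer 1 propagate into layer 2.
    y_pruned = matvec(p2, relu(matvec(p1, x)))
    global_err = sq_dist(y_pruned, y_dense)
    return local1, local2, global_err


if __name__ == "__main__":
    w1 = [[2.0, -0.1], [0.3, 1.5]]
    w2 = [[1.0, -0.2]]
    x = [1.0, 2.0]
    print(chain_errors(w1, w2, x, sparsity=0.5))
```

In this toy chain the global error is not simply the sum of the two local errors, because the second layer receives perturbed activations from the first. That coupling between layers is what a purely layer-wise objective does not see.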