LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery

Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang
2023-10-31
Abstract: Large Language Models (LLMs) have transformed the landscape of artificial intelligence, but their enormous size presents significant challenges in terms of computational cost. We introduce LoRAShear, a novel, efficient approach to structurally prune LLMs and recover knowledge. Given a general LLM, LoRAShear first creates dependency graphs over LoRA modules to discover minimally removable structures and analyze the knowledge distribution. It then performs progressive structured pruning on the LoRA adaptors and enables inherent knowledge transfer to better preserve the information in the redundant structures. To recover the knowledge lost during pruning, LoRAShear studies and proposes a dynamic fine-tuning scheme with dynamic data adaptors to effectively narrow the performance gap to the full models. Numerical results demonstrate that, using only one GPU within a couple of GPU days, LoRAShear reduces the footprint of LLMs by 20% with only 1.0% performance degradation and significantly outperforms the state of the art. The source code will be available at https://github.com/microsoft/lorashear.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the high computational cost and resource consumption of large language models (LLMs) by proposing a method called LoRAShear. LoRAShear compresses LLMs through structured pruning followed by knowledge recovery, reducing model size substantially while maintaining high performance under limited resources. Specifically, LoRAShear addresses the following issues:

1. **Automatic Discovery of Minimal Removable Structures**: By analyzing dependency graphs, it automatically identifies the smallest units that can be removed without affecting the model's functionality.
2. **Knowledge Distribution Analysis**: It analyzes how knowledge is distributed across model components to determine which parts are crucial for performance, so that critical structures are not removed during pruning.
3. **Progressive Structured Pruning**: A new algorithm, LoRA Half-Space Projected Gradient (LHSPG), progressively identifies and removes redundant structures based on information from the LoRA modules, and transfers the knowledge contained in those structures to more important ones to retain as much of the original model's knowledge as possible.
4. **Dynamic Knowledge Recovery**: By dynamically selecting and fine-tuning on pre-training and instruction fine-tuning datasets, it recovers the knowledge lost during pruning.

Experimental results show that at a 20% pruning rate, LoRAShear loses only about 1% of performance compared to the full model; at a 50% pruning rate, it still retains 82% of the original model's performance, significantly outperforming existing methods.

In summary, LoRAShear targets effective compression of large language models in resource-constrained environments through structured pruning and efficient knowledge recovery, while keeping performance loss as small as possible.
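
To make the notion of a removable structure more concrete, here is a minimal NumPy sketch of structured pruning on a two-layer linear block with a LoRA update on the first layer. It is not the paper's LHSPG algorithm: the function name `prune_hidden_units`, the row-norm importance score, and the one-shot (non-progressive) removal are all assumptions chosen for illustration. What it does show is the dependency between layers: removing a hidden unit means deleting a row of the first layer, the matching row of the LoRA factor `B1`, and the matching column of the next layer together.

```python
import numpy as np

def prune_hidden_units(W1, A1, B1, W2, ratio):
    """Remove the least important hidden units of a two-layer linear block.

    Shapes (hypothetical layout): W1 (hidden, d_in), B1 (hidden, r),
    A1 (r, d_in), W2 (d_out, hidden). The effective first layer is
    W1 + B1 @ A1, as in a standard LoRA update.
    """
    # Assumed importance score: L2 norm of each row of the merged weight.
    merged = W1 + B1 @ A1
    scores = np.linalg.norm(merged, axis=1)

    # Drop the lowest-scoring fraction of hidden units.
    n_prune = int(ratio * W1.shape[0])
    keep = np.sort(np.argsort(scores)[n_prune:])

    # A hidden unit is one removable structure: its row in W1 and B1
    # and its column in W2 must be deleted together. A1 is unchanged.
    return W1[keep], B1[keep], W2[:, keep], keep


rng = np.random.default_rng(0)
hidden, d_in, d_out, r = 10, 6, 4, 2
W1 = rng.normal(size=(hidden, d_in))
A1 = rng.normal(size=(r, d_in))
B1 = rng.normal(size=(hidden, r))
W2 = rng.normal(size=(d_out, hidden))

W1_p, B1_p, W2_p, keep = prune_hidden_units(W1, A1, B1, W2, ratio=0.2)
print(W1_p.shape, B1_p.shape, W2_p.shape)
```

Slicing the tensors, rather than masking entries with zeros, is what makes the pruning structured: the resulting matrices are genuinely smaller, so the saving shows up in memory and compute without sparse kernels.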