LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment

Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu
2024-10-31
Abstract: Although large language models (LLMs) have demonstrated strong capabilities, their high demand for computation and storage hinders practical application. To this end, many model compression techniques have been proposed to increase the efficiency of LLMs. However, current research validates these methods only on limited models, datasets, metrics, etc., and still lacks a comprehensive evaluation under more general scenarios. It therefore remains unclear which model compression approach should be used in a specific case. To close this gap, we present the Large Language Model Compression Benchmark (LLMCBench), a rigorously designed benchmark with an in-depth analysis of LLM compression algorithms. We first analyze actual model production requirements and carefully design evaluation tracks and metrics. Then, we conduct extensive experiments and comparisons using multiple mainstream LLM compression approaches. Finally, we perform an in-depth analysis based on the evaluation and provide useful insights for LLM compression design. We hope LLMCBench can contribute insightful suggestions for LLM compression algorithm design and serve as a foundation for future research. Our code is available at https://github.com/AboveParadise/LLMCBench.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper "LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment" aims to address several key issues in the field of large language model (LLM) compression:

1. **Limited Scope of Performance Evaluation**:
   - Current LLM compression research is typically validated on a limited set of models, datasets, and metrics. Different compression methods may choose different baseline models and datasets to evaluate their effectiveness, so their reported results cannot be compared directly or fairly.
   - This makes it challenging to select the appropriate LLM compression method for a specific scenario.
2. **Efficiency Evaluation Metrics Remain Theoretical**:
   - Most existing LLM compression methods report only computational complexity or model storage as efficiency metrics, and lack an evaluation of broader, measured efficiency metrics such as actual acceleration and GPU memory reduction.
   - Resource consumption during the compression process itself is often overlooked.
   - Compressed LLMs need to be used in real-world scenarios, so the reliability of the compressed model is also an important property of a compression algorithm, but existing research has not fully considered it.

To bridge these gaps, the authors propose a benchmarking platform called **Large Language Model Compression Benchmark (LLMCBench)**. The platform defines multiple evaluation tracks and metrics to comprehensively evaluate current mainstream LLM compression algorithms, aiming to provide valuable insights and recommendations for the design of LLM compression algorithms and for future research.
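
The distinction between theoretical and measured efficiency can be made concrete with a small sketch. The helpers below (`median_latency`, `speedup`, `memory_reduction`) are hypothetical names chosen for illustration; they are not taken from the LLMCBench code and do not reproduce the paper's metric definitions. They only show one plausible way to turn wall-clock and memory measurements into the kind of "actual acceleration" and "memory reduction" figures the summary refers to.

```python
import time
from statistics import median
from typing import Callable


def median_latency(run: Callable[[], object], warmup: int = 2, repeats: int = 5) -> float:
    """Median wall-clock seconds of `run`, measured after a few warm-up calls."""
    for _ in range(warmup):
        run()  # warm-up calls are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return median(samples)


def speedup(baseline_seconds: float, compressed_seconds: float) -> float:
    """Measured acceleration; values above 1.0 mean the compressed model is faster."""
    if compressed_seconds <= 0:
        raise ValueError("compressed_seconds must be positive")
    return baseline_seconds / compressed_seconds


def memory_reduction(baseline_bytes: int, compressed_bytes: int) -> float:
    """Fraction of memory saved; 0.75 means the compressed model needs 75% less."""
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    return 1.0 - compressed_bytes / baseline_bytes
```

In practice `run` would wrap an inference call on the original or the compressed model, and the byte counts would come from whatever memory profiler the deployment uses; both are left as assumptions here.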