LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment

Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu

2024-10-31

Abstract: Although large language models (LLMs) have demonstrated strong intelligence, their high demand for computation and storage hinders practical application. To this end, many model compression techniques have been proposed to increase the efficiency of LLMs. However, current research validates these methods only on limited models, datasets, and metrics, and still lacks a comprehensive evaluation under more general scenarios. It therefore remains an open question which model compression approach should be used in a specific case. To close this gap, we present the Large Language Model Compression Benchmark (LLMCBench), a rigorously designed benchmark with an in-depth analysis of LLM compression algorithms. We first analyze actual model production requirements and carefully design evaluation tracks and metrics. Then, we conduct extensive experiments and comparisons using multiple mainstream LLM compression approaches. Finally, we perform an in-depth analysis based on the evaluation and provide useful insights for LLM compression design. We hope LLMCBench can contribute insightful suggestions for LLM compression algorithm design and serve as a foundation for future research. Our code is available at https://github.com/AboveParadise/LLMCBench.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper "LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment" aims to address several key issues in the field of large language model (LLM) compression:

1. **Limited Scope of Performance Evaluation**:
   - Current LLM compression research is typically validated on a limited set of models, datasets, and metrics, which makes a fair comparison between methods difficult. For example, different compression methods may choose different baseline models and datasets to evaluate their effectiveness, so their reported results cannot be compared directly.
   - As a result, it is hard to select the appropriate LLM compression method for a specific scenario.
2. **Efficiency Evaluation Metrics Remain Theoretical**:
   - Most existing LLM compression methods report only computational complexity or model storage as efficiency metrics, and lack a comprehensive evaluation of broader efficiency metrics such as actual acceleration and GPU memory reduction.
   - Resource consumption during the compression process itself is often overlooked.
   - Compressed LLMs must be used in real-world scenarios, so the reliability of the compressed model is also an important aspect of a compression algorithm, but existing research has not fully considered it.

To bridge these gaps, the authors propose a benchmarking platform called **Large Language Model Compression Benchmark (LLMCBench)**. The platform defines multiple evaluation tracks and metrics to comprehensively evaluate current mainstream LLM compression algorithms, aiming to provide valuable insights and recommendations for the design of LLM compression algorithms and for future research.
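To make the distinction between theoretical and measured efficiency concrete, the following is a minimal sketch of how "actual acceleration" and "memory reduction" could be computed from measurements. It is not taken from LLMCBench: the function names `measure_latency`, `speedup`, and `memory_reduction`, and the exact metric definitions (ratio of median latencies, fractional drop in peak bytes), are assumptions made here for illustration. The benchmark's own metric definitions may differ.

```python
import time
from statistics import median


def measure_latency(fn, warmup=2, repeats=5):
    """Return the median wall-clock time in seconds of calling fn().

    Hypothetical helper: warm-up calls are discarded so one-time setup
    costs do not skew the measurement.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def speedup(baseline_latency_s, compressed_latency_s):
    """Measured acceleration: values above 1.0 mean the compressed model is faster."""
    return baseline_latency_s / compressed_latency_s


def memory_reduction(baseline_bytes, compressed_bytes):
    """Fraction of peak memory saved by compression (assumed definition)."""
    return 1.0 - compressed_bytes / baseline_bytes
```

In practice, `fn` would wrap an inference call on the original or compressed model, and the byte counts would come from a peak-memory reading taken on the target device. Both are left out here because they depend on the framework and hardware in use.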

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

A Survey on Model Compression for Large Language Models

Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

Search for Efficient Large Language Models

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models

Aggressive Post-Training Compression on Extremely Large Language Models

The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models

Ranking LLMs by compression

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression

Compressing LLMs: The Truth is Rarely Pure and Never Simple

PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Evaluating Large Language Models for Generalization and Robustness via Data Compression

Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

New Solutions on LLM Acceleration, Optimization, and Application

OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

Compression Represents Intelligence Linearly

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks

LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators