A Comprehensive Survey of Compression Algorithms for Language Models

Seungcheol Park, Jaehyeon Choi, Sojin Lee, U Kang

2024-01-27

Abstract: How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is rapidly growing to benefit from remarkable advances of recent language models without side effects due to the gigantic size of language models, such as increased carbon emissions and expensive maintenance fees. While numerous compression algorithms have shown remarkable progress in compressing language models, it ironically becomes challenging to capture emerging trends and identify the fundamental concepts underlying them due to the excessive number of algorithms. In this paper, we survey and summarize diverse compression algorithms including pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design. We not only summarize the overall trend of diverse compression algorithms but also select representative algorithms and provide in-depth analyses of them. We discuss the value of each category of compression algorithms, and the desired properties of low-cost compression algorithms which have a significant impact due to the emergence of large language models. Finally, we introduce promising future research topics based on our survey results.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper asks how language models can be compressed without sacrificing accuracy. As language models grow, their size causes serious problems such as increased carbon emissions and high maintenance costs. The paper therefore surveys compression algorithms that reduce these costs while preserving model performance. Specifically, it aims to:

1. **Review and summarize**: Give a comprehensive review of existing language model compression algorithms, covering pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design.
2. **In-depth analysis**: Select representative algorithms and analyze them in depth.
3. **Discussion and future directions**: Discuss the value of each category of compression algorithms and the properties desired of low-cost compression algorithms, and propose future research directions.

Through these goals, the paper aims to give researchers and engineers a comprehensive guide for understanding and applying language model compression techniques, so that language models can be both efficient and accurate in practical applications.
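As a rough illustration of two of the categories named above, the following is a minimal sketch of magnitude pruning and symmetric 8-bit quantization applied to a small list of weights. It is not taken from the paper: the function names, the toy weight values, and the choice of these particular variants are assumptions made purely for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric uniform quantization to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical toy weights, chosen only for illustration.
weights = [0.8, -0.05, 0.3, -1.27, 0.02, 0.6]

pruned = magnitude_prune(weights, sparsity=0.5)   # half of the weights set to zero
codes, scale = quantize_int8(weights)             # small integers plus one scale
restored = dequantize(codes, scale)               # approximate reconstruction
```

Pruning saves space by making weights exactly zero, while quantization keeps every weight but stores it with fewer bits; in this sketch each restored weight differs from the original by at most half a quantization step (`scale / 2`).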
