Efficient Large Language Models: A Survey

Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang

2024-05-23

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding and language generation, and thus have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges. In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspectives, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper is a survey of research on the efficiency of large language models (LLMs). Because of their strong performance on tasks such as natural language understanding and generation, LLMs such as OpenAI's GPT series, Meta's LLaMA series, and Google's Gemini have had a significant impact on society. These capabilities, however, come with substantial resource demands, including large numbers of GPU hours for training and inference, which translate into high running costs.

The paper systematically reviews and organizes research on techniques that improve LLM efficiency. The authors sort the literature into three main categories: model-centric, data-centric, and framework-centric. Together these cover optimization methods such as model compression, efficient pre-training, efficient fine-tuning, inference acceleration, and efficient architecture design. The paper also discusses how data quality and structure contribute to LLM efficiency, and it reviews dedicated frameworks for LLM training, fine-tuning, inference, and serving. A figure relates LLM performance to training time and inference throughput, emphasizing the trade-off between model size and resource consumption, and uses Mistral-7B as an example of a model that achieves higher efficiency through optimization techniques. The authors also maintain a GitHub repository that is continuously updated with relevant papers as a reference for researchers and practitioners.

In summary, the paper addresses how to reduce the resource requirements of LLMs without sacrificing performance, making them more efficient and economical to run through algorithmic improvements, better data selection, and improved frameworks.
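To make the resource-demand point concrete, a back-of-the-envelope calculation helps: storing a model's weights takes roughly (number of parameters × bits per parameter ÷ 8) bytes. The sketch below is illustrative only. It assumes a round 7 billion parameters (the size class of Mistral-7B, not its exact count) and counts only weight storage, ignoring activations, the KV cache, and optimizer state. Lowering weight precision, as quantization-based compression does, shrinks this footprint proportionally. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory (GB, where 1 GB = 1e9 bytes) needed only to store the weights."""
    bytes_total = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


# Assumption: a round 7e9 parameters, roughly the size class of a 7B model.
NUM_PARAMS = 7e9

# Compare full precision, half precision, and two common quantized precisions.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the loop prints 28.0, 14.0, 7.0, and 3.5 GB for 32-, 16-, 8-, and 4-bit weights. Actual deployments need additional memory beyond these figures.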

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

A Survey on Efficient Inference for Large Language Models

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models

Efficient Multimodal Large Language Models: A Survey

A Survey on Evaluation of Large Language Models

A Survey of Large Language Models

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

History, Development, and Principles of Large Language Models-An Introductory Survey

A Comprehensive Overview of Large Language Models

Large Language Models Meet NLP: A Survey

Large Language Models: A Survey

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

A survey on large language models for recommendation

A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs)