Abstract: Large Language Models (LLMs) have emerged as powerful tools for natural language processing tasks, revolutionizing the field with their ability to understand and generate human-like text. As the demand for more sophisticated LLMs continues to grow, there is a pressing need to address the computational challenges associated with their scale and complexity. This paper presents a comprehensive survey on hardware accelerators designed to enhance the performance and energy efficiency of Large Language Models. By examining a diverse range of accelerators, including GPUs, FPGAs, and custom-designed architectures, we explore the landscape of hardware solutions tailored to meet the unique computational demands of LLMs. The survey encompasses an in-depth analysis of architecture, performance metrics, and energy efficiency considerations, providing valuable insights for researchers, engineers, and decision-makers aiming to optimize the deployment of LLMs in real-world applications.

What problem does this paper attempt to address?

The paper addresses the computational resource demands of large language models (LLMs), focusing on the hardware performance and energy efficiency required during training and inference. As models such as GPT-3 have become widely used for natural language processing tasks, they have also introduced significant computational challenges. To tackle these challenges, the paper surveys hardware accelerator designs that improve the performance and energy efficiency of LLMs. Specifically, it covers the following key points:

1. **Background and Purpose of the Survey**: As large language models grow in scale, their demand for computational resources increases dramatically. The survey analyzes how various hardware accelerators (including GPUs, FPGAs, and custom-designed architectures) improve the computational efficiency and reduce the energy consumption of LLMs.
2. **Comparison with Existing Work**: Compared with related surveys, this paper is broader and more in-depth. It is not limited to a specific type of hardware or to model compression algorithms; it covers a wide range of hardware solutions and analyzes both their techniques and their performance.
3. **Technical Details**: The paper describes several FPGA-based accelerator designs, including MNNFast, FTRANS, multi-head attention accelerators, NPE, column-balanced block pruning, DFX, and OPU, as well as CPU- and GPU-based acceleration schemes such as SoftMax reorganization, LightSeq2, and simplified Transformer networks.
4. **Performance Evaluation**: By comparing accelerators in terms of speed, energy efficiency, and other metrics, the paper shows the practical effect of these schemes. For example, some FPGA accelerators achieve speed and energy-efficiency improvements of several times over CPUs and GPUs on certain tasks.

In summary, the paper aims to provide valuable insights for researchers, engineers, and decision-makers, helping them understand and choose the most suitable hardware acceleration solutions for deploying large language models in practical applications.
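The speed and energy-efficiency comparisons mentioned above are usually reported as ratios against a baseline platform. The following is a minimal sketch of that arithmetic, not the paper's own methodology. The `Measurement` class, the function names, and every number below are hypothetical placeholders chosen for illustration; they are not results from the survey.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One benchmark run on one platform (all values here are hypothetical)."""
    name: str
    latency_s: float    # wall-clock time to process the workload, in seconds
    avg_power_w: float  # average power draw during the run, in watts
    tokens: int         # number of tokens processed in the run


def throughput(m: Measurement) -> float:
    """Tokens processed per second."""
    return m.tokens / m.latency_s


def energy_efficiency(m: Measurement) -> float:
    """Tokens processed per joule (energy = power * time)."""
    return m.tokens / (m.avg_power_w * m.latency_s)


def compare(baseline: Measurement, candidate: Measurement) -> dict:
    """Express the candidate's gains as ratios over the baseline."""
    return {
        "speedup": throughput(candidate) / throughput(baseline),
        "efficiency_gain": energy_efficiency(candidate) / energy_efficiency(baseline),
    }


# Placeholder measurements, NOT taken from the paper.
gpu = Measurement(name="gpu-baseline", latency_s=2.0, avg_power_w=250.0, tokens=1000)
fpga = Measurement(name="fpga-candidate", latency_s=1.0, avg_power_w=50.0, tokens=1000)

result = compare(gpu, fpga)
print(f"speedup: {result['speedup']:.1f}x, "
      f"energy-efficiency gain: {result['efficiency_gain']:.1f}x")
```

Reporting both ratios matters because they can diverge: a platform that is only modestly faster may still be far more energy-efficient if it draws much less power, which is the kind of trade-off such comparisons are meant to expose.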

A Survey on Hardware Accelerators for Large Language Models
