Abstract: Large Language Models (LLMs) have become powerful tools with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present significant challenges in both training and deployment, leading to substantial computational and storage costs as well as heightened energy consumption. In this paper, we review recent advancements and research directions aimed at addressing these challenges and enhancing the efficiency of LLM-based systems. We begin by discussing algorithm-level acceleration techniques focused on optimizing LLM inference speed and resource utilization. We also explore LLM-hardware co-design strategies that aim to improve system efficiency by tailoring hardware architectures to LLM requirements. Further, we delve into LLM-to-accelerator compilation approaches, which involve customizing hardware accelerators for efficient LLM deployment. Finally, as a case study in using LLMs to assist circuit design, we examine LLM-aided design methodologies for an important task, High-Level Synthesis (HLS) functional verification, and create a new dataset containing a large number of buggy and bug-free code samples, which can be essential for training LLMs to specialize in HLS verification and debugging. For each of these aspects, we begin with a detailed background study and then present several novel solutions proposed to overcome specific challenges. We then outline future research directions to drive further advancements. Through these efforts, we aim to pave the way for more efficient and scalable deployment of LLMs across a diverse range of applications.

Has LLM Reached the Scaling Ceiling Yet? Unified Insights into LLM Regularities and Constraints

Densing Law of LLMs

LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law

LLM Circuit Analyses Are Consistent Across Training and Scale

LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods

Temporal Scaling Law for Large Language Models

ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency

Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers

Eight Things to Know about Large Language Models

New Solutions on LLM Acceleration, Optimization, and Application

Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling

A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs)

Scaling Laws for Discriminative Classification in Large Language Models

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

Limits for Learning with Language Models

LLM2: Let Large Language Models Harness System 2 Reasoning

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

Large Language Models are legal but they are not: Making the case for a powerful LegalLLM

Scaling Efficient LLMs

Concept Bottleneck Large Language Models

Performance Law of Large Language Models