Abstract: Large Language Models (LLMs) have become powerful tools with exceptional capacities for comprehending and producing human-like text across a wide range of applications. However, the increasing size and complexity of LLMs present significant challenges in both training and deployment, leading to substantial computational and storage costs as well as heightened energy consumption. In this paper, we review recent advancements and research directions aimed at addressing these challenges and enhancing the efficiency of LLM-based systems. We begin by discussing algorithm-level acceleration techniques focused on optimizing LLM inference speed and resource utilization. We also explore LLM-hardware co-design strategies that aim to improve system efficiency by tailoring hardware architectures to LLM requirements. Further, we examine LLM-to-accelerator compilation approaches, which involve customizing hardware accelerators for efficient LLM deployment. Finally, as a case study in using LLMs to assist circuit design, we examine LLM-aided design methodologies for an important task, High-Level Synthesis (HLS) functional verification. We create a new dataset containing a large number of buggy and bug-free programs, which can be essential for training LLMs to specialize in HLS verification and debugging. For each of these aspects, we begin with a detailed background study, followed by several novel solutions proposed to overcome specific challenges. We then outline future research directions to drive further advancements. Through these efforts, we aim to pave the way for more efficient and scalable deployment of LLMs across a diverse range of applications.

Towards Optimizing the Costs of LLM Usage

Optimizing Numerical Estimation and Operational Efficiency in the Legal Domain through Large Language Models

SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees

Fine Tuning LLM for Enterprise: Practical Guidelines and Recommendations

Optimizing LLM Queries in Relational Workloads

"Which LLM should I use?": Evaluating LLMs for tasks performed by Undergraduate Computer Science Students

New Solutions on LLM Acceleration, Optimization, and Application

Analyzing LLM Usage in an Advanced Computing Class in India

Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference

LLMs for science: Usage for code generation and data analysis

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

LLMs as On-demand Customizable Service

LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?

A Reality check of the benefits of LLM in business

The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts

Balancing Cost and Effectiveness of Synthetic Data Generation Strategies for LLMs

OptLLM: Optimal Assignment of Queries to Large Language Models

An energy-based comparative analysis of common approaches to text classification in the Legal domain

No Size Fits All: The Perils and Pitfalls of Leveraging LLMs Vary with Company Size

Understanding LLMs: A Comprehensive Overview from Training to Inference

MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs