Abstract: Large Language Models (LLMs) have demonstrated capabilities for producing code in Hardware Description Languages (HDLs). However, most of the focus remains on their abilities to write functional code, not test code. The hardware design process consists of both design and test, so eschewing validation and verification leaves considerable potential benefit unexplored, given that a design and test framework may allow for progress towards full automation of the digital design pipeline. In this work, we perform one of the first studies exploring how an LLM can both design and test hardware modules from provided specifications. Using a suite of 8 representative benchmarks, we examined the capabilities and limitations of state-of-the-art conversational LLMs when producing Verilog for functional and verification purposes. We taped out the benchmarks on a Skywater 130nm shuttle and received the functional chip.

What problem does this paper attempt to address?

This paper evaluates the capabilities of large language models (LLMs) in hardware design and testing. Specifically, the researchers examine how LLMs can generate functional Hardware Description Language (HDL) code, such as Verilog, together with the corresponding testbenches from given specifications. This covers both the design part (generating hardware module code that implements specific functions) and the testing part (creating testbenches that verify the correctness of those modules). Through this research, the authors hope to explore the potential of LLMs in the automated digital design process, especially their capabilities in design verification and testing.

The paper uses 8 representative benchmarks to evaluate four of the latest conversational LLMs (ChatGPT-4, ChatGPT-3.5, Bard, and HuggingChat) on generating functional HDL code and testbenches. The research focuses on:

1. **Design Capability**: Evaluate the ability of LLMs to generate correct HDL code from the given specifications.
2. **Testing Capability**: Evaluate the ability of LLMs to generate effective, self-checking testbenches that can be used to verify the functionality of the generated HDL code.
3. **Interactivity**: Examine the effectiveness of tool feedback (TF), simple human feedback (SHF), medium human feedback (MHF), and advanced human feedback (AHF) in fixing errors when the code or testbenches generated by LLMs are incorrect.

Through these evaluations, the researchers aim to reveal the practical potential and limitations of current LLMs in hardware design and testing, and to provide directions for further research and development.
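The four feedback levels named above (TF, SHF, MHF, AHF) suggest an escalation loop: try the cheapest kind of feedback first and move to more human involvement only when the design still fails. The Python sketch below illustrates that idea under stated assumptions. It is not the paper's procedure: the strict escalation order, the retry budget, and the names `escalate`, `check`, and `revise` are hypothetical, and the toy functions stand in for a real simulator run and a real LLM call.

```python
from typing import Callable, List, Optional, Tuple

# Feedback levels named in the summary, ordered from least to most human effort.
# Assumption: levels are tried strictly in this order.
FEEDBACK_LEVELS = ["TF", "SHF", "MHF", "AHF"]


def escalate(
    check: Callable[[str], Optional[str]],
    revise: Callable[[str, str, str], str],
    design: str,
    tries_per_level: int = 1,
) -> Tuple[str, Optional[str], List[str]]:
    """Revise `design` until `check` passes or all levels are exhausted.

    check(design)  -> None on success, else an error message (hypothetical
                      stand-in for compiling and simulating with a testbench).
    revise(design, level, error) -> new design (hypothetical stand-in for
                      prompting an LLM with feedback of the given level).
    Returns (final design, level that succeeded or None, levels tried).
    """
    tried: List[str] = []
    error = check(design)
    if error is None:
        return design, "none", tried  # passed with no feedback at all
    for level in FEEDBACK_LEVELS:
        for _ in range(tries_per_level):
            design = revise(design, level, error)
            tried.append(level)
            error = check(design)
            if error is None:
                return design, level, tried
    return design, None, tried  # still failing after the strongest feedback


# Toy stand-ins so the sketch runs without a simulator or an LLM.
def toy_check(design: str) -> Optional[str]:
    return None if design == "fixed" else "mismatch at test 3"


def toy_revise(design: str, level: str, error: str) -> str:
    # Pretend only medium human feedback is enough to repair this design.
    return "fixed" if level == "MHF" else design


result = escalate(toy_check, toy_revise, "broken")
print(result)  # ('fixed', 'MHF', ['TF', 'SHF', 'MHF'])
```

Recording which level finally produced a passing design gives a rough measure of how much human effort a given model needed on a given benchmark, which is the kind of comparison the interactivity criterion is after.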

Evaluating LLMs for Hardware Design and Test

LLM-Aided Efficient Hardware Design Automation

Are LLMs Any Good for High-Level Synthesis?

FVEval: Understanding Language Model Capabilities in Formal Verification of Digital Hardware

Evaluating Large Language Models for Automatic Register Transfer Logic Generation via High-Level Synthesis

Benchmarking Large Language Models for Automated Verilog RTL Code Generation

C2HLSC: Can LLMs Bridge the Software-to-Hardware Design Gap?

LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation

LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites

Advanced Large Language Model (LLM)-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis

VerilogReader: LLM-Aided Hardware Test Generation

C2HLSC: Leveraging Large Language Models to Bridge the Software-to-Hardware Design Gap

VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation

UVLLM: An Automated Universal RTL Verification Framework using LLMs

LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation

AssertLLM: Generating and Evaluating Hardware Verification Assertions from Design Specifications via Multi-LLMs

VerilogEval: Evaluating Large Language Models for Verilog Code Generation

Large Language Models to Generate System-Level Test Programs Targeting Non-functional Properties

LLM-Aided Testbench Generation and Bug Detection for Finite-State Machines

Exploring and Characterizing Large Language Models For Embedded System Development and Debugging

Digital ASIC Design with Ongoing LLMs: Strategies and Prospects