Abstract: Traditionally, digital hardware designs are written in the Verilog hardware description language (HDL) and debugged manually by engineers. This can be time-consuming and error-prone for complex designs. Large Language Models (LLMs) are emerging as a potential tool to help generate fully functioning HDL code, but most works have focused on single-shot generation, i.e., generate once, run, and evaluate. That process does not leverage debugging and therefore does not adequately reflect a realistic development process. In this work we evaluate the ability of LLMs to leverage feedback from electronic design automation (EDA) tools to fix mistakes in their own generated Verilog. To accomplish this we present an open-source, highly customizable framework, AutoChip, which combines conversational LLMs with the output from Verilog compilers and simulations to iteratively generate and repair Verilog. To determine the success of these LLMs we leverage the VerilogEval benchmark set. We evaluate four state-of-the-art conversational LLMs, focusing on readily accessible commercial models. Only with GPT-4o, the most computationally complex model we evaluated, did EDA tool feedback prove consistently more effective than zero-shot prompting. In the best case we observed a 5.8% increase in the number of successful designs with a 34.2% decrease in cost over the best zero-shot results. Mixing smaller models with this larger model at the end of the feedback iterations achieved as much success as GPT-4o using feedback, at a further 41.9% lower cost (overall decrease in cost over zero-shot of 89.6%).

What problem does this paper attempt to address?

This paper asks how feedback from Electronic Design Automation (EDA) tools can be used to improve the quality of Verilog code generated by Large Language Models (LLMs). Traditionally, digital hardware designs are written in the Verilog Hardware Description Language (HDL) and manually debugged by engineers, which is a time-consuming and error-prone process. LLMs are emerging as a potential tool for generating fully functional HDL code, but most studies focus only on single-shot generation (generate once, run, and evaluate) and make no use of feedback from debugging, so they do not fully reflect a realistic development process.

To address this, the authors propose an open-source framework, AutoChip, which combines conversational LLMs with the output of Verilog compilers and simulations to iteratively generate and repair Verilog code. The main goal of the research is to evaluate whether LLMs can use feedback from EDA tools to correct errors in the Verilog code they generate themselves, and to verify experimentally how different feedback modes (such as "concise" and "full context") affect code-generation quality and cost. Specifically, the paper explores the following six research questions:

1. **RQ1**: Can feedback from hardware verification tools improve the HDL code generated by LLMs beyond zero-shot results?
2. **RQ2**: Do the number of iterations and the number of candidate responses affect the quality of the generated code and the number of correct implementations?
3. **RQ3**: What is the impact of tool-feedback-based code generation on cost?
4. **RQ4**: Does the amount of context provided in the feedback affect the proportion of successful designs?
5. **RQ5**: Are LLMs more effective at solving certain types of hardware design problems than others?
6. **RQ6**: Can mixing multiple LLMs with different capabilities during the design process improve generation quality while reducing cost?

Through these research questions, the paper aims to explore how to further automate the hardware design process, reduce the workload of designers, and improve the quality and efficiency of the generated code.
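The iterative generate-and-repair idea described above can be illustrated with a minimal Python sketch. This is not AutoChip's actual code or API: `feedback_loop`, `ToolResult`, and the stand-in `fake_generate` and `fake_evaluate` functions are hypothetical names introduced here. The way the "concise" and "full context" modes are modeled (keeping only the latest attempt versus keeping the whole conversation) is an assumption, as is the placeholder rule of always repairing the first candidate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ToolResult:
    passed: bool  # True when the design compiles and passes simulation
    log: str      # compiler or simulator output to feed back to the model


# Hypothetical interfaces: a model maps (conversation, k) to k candidate
# designs; a tool maps Verilog source text to a ToolResult.
Generate = Callable[[List[str], int], List[str]]
Evaluate = Callable[[str], ToolResult]


def feedback_loop(
    prompt: str,
    generate: Generate,
    evaluate: Evaluate,
    max_iterations: int = 3,
    candidates: int = 2,
    full_context: bool = True,
) -> Tuple[Optional[str], int]:
    """Return (passing_code, iteration), or (None, max_iterations) on failure.

    Iteration 0 corresponds to a zero-shot success (no feedback used).
    """
    conversation = [prompt]
    for iteration in range(max_iterations):
        responses = generate(conversation, candidates)
        results = [(code, evaluate(code)) for code in responses]
        for code, result in results:
            if result.passed:
                return code, iteration
        # Assumption: repair the first candidate; a real system would
        # presumably rank candidates before choosing one.
        code, result = results[0]
        feedback = f"The design failed with:\n{result.log}\nPlease fix it."
        if full_context:
            # Assumed "full context" mode: keep every earlier turn.
            conversation = conversation + [code, feedback]
        else:
            # Assumed "concise" mode: prompt plus the latest attempt only.
            conversation = [prompt, code, feedback]
    return None, max_iterations


# Stand-ins for a real LLM and a real compiler/simulator.
BUGGY = "module top(input a, input b, output y); assign y = a | b; endmodule"
FIXED = "module top(input a, input b, output y); assign y = a & b; endmodule"


def fake_generate(conversation: List[str], k: int) -> List[str]:
    # Produces the buggy design first, the fixed one once feedback is present.
    return [FIXED if len(conversation) > 1 else BUGGY] * k


def fake_evaluate(code: str) -> ToolResult:
    ok = "a & b" in code
    return ToolResult(ok, "" if ok else "testbench: 1 mismatch on output y")


if __name__ == "__main__":
    code, iteration = feedback_loop(
        "Implement a 2-input AND gate.", fake_generate, fake_evaluate
    )
    print(iteration, code)
```

In this toy run the first response fails the stand-in testbench and the second succeeds once the tool log is in the conversation. The `max_iterations`, `candidates`, and `full_context` parameters loosely correspond to the variables examined in RQ2 and RQ4.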

Can EDA Tool Feedback Improve Verilog Generation by LLMs?
