Can EDA Tool Feedback Improve Verilog Generation by LLMs?

Jason Blocklove, Shailja Thakur, Benjamin Tan, Hammond Pearce, Siddharth Garg, Ramesh Karri
2024-11-02
Abstract: Traditionally, digital hardware designs are written in the Verilog hardware description language (HDL) and debugged manually by engineers. This can be time-consuming and error-prone for complex designs. Large Language Models (LLMs) are emerging as a potential tool to help generate fully functioning HDL code, but most works have focused on generation in the single-shot capacity: i.e., run and evaluate, a process that does not leverage debugging and as such does not adequately reflect a realistic development process. In this work we evaluate the ability of LLMs to leverage feedback from electronic design automation (EDA) tools to fix mistakes in their own generated Verilog. To accomplish this we present an open-source, highly customizable framework, AutoChip, which combines conversational LLMs with the output from Verilog compilers and simulations to iteratively generate and repair Verilog. To determine the success of these LLMs we leverage the VerilogEval benchmark set. We evaluate four state-of-the-art conversational LLMs, focusing on readily accessible commercial models. EDA tool feedback proved to be consistently more effective than zero-shot prompting only with GPT-4o, the most computationally complex model we evaluated. In the best case we observed a 5.8% increase in the number of successful designs with a 34.2% decrease in cost over the best zero-shot results. Mixing smaller models with this larger model at the end of the feedback iterations resulted in equally as much success as with GPT-4o using feedback, but for an additional 41.9% less cost (overall decrease in cost over zero-shot of 89.6%).
Hardware Architecture, Artificial Intelligence, Programming Languages
What problem does this paper attempt to address?
This paper asks how feedback from electronic design automation (EDA) tools can improve the quality of Verilog code generated by large language models (LLMs). Traditionally, digital hardware designs are written in the Verilog hardware description language (HDL) and debugged manually by engineers, which is time-consuming and error-prone. LLMs are emerging as a potential tool for generating fully functional HDL code, but most studies focus on single-shot generation, that is, generating once and evaluating the result, with no debugging step. That does not reflect a realistic development process. To address this, the authors present an open-source framework, AutoChip, which combines conversational LLMs with the output of Verilog compilers and simulations to iteratively generate and repair Verilog code. The main goal is to evaluate whether LLMs can use EDA tool feedback to correct errors in their own generated Verilog, and to measure experimentally how different feedback modes (such as "concise" and "full context") affect code-generation quality and cost. Specifically, the paper explores six research questions:

1. **RQ1**: Can feedback from hardware verification tools improve LLM-generated HDL code beyond zero-shot results?
2. **RQ2**: Do the number of iterations and the number of candidate responses affect the quality of the generated code and the number of correct implementations?
3. **RQ3**: How does code generation based on tool feedback affect cost?
4. **RQ4**: Does the amount of context provided in the feedback affect the proportion of successful designs?
5. **RQ5**: Are LLMs more effective at solving certain types of hardware design problems than others?
6. **RQ6**: Can mixing multiple LLMs with different capabilities during the design process improve generation quality while reducing cost?

Through these research questions, the paper aims to explore how to further automate the hardware design process, reduce the workload of designers, and improve the quality and efficiency of the generated code.
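The iterative generate-and-repair idea, together with the model mixing asked about in RQ6, can be illustrated with a minimal Python sketch. This is not AutoChip's actual code or API: `repair_loop`, `ToolResult`, `toy_evaluate`, `small_model`, and `large_model` are hypothetical names, the "models" are hard-coded stubs rather than LLM calls, and the evaluator is a string check standing in for a real Verilog compiler and simulator. It also produces one candidate per iteration, whereas the paper studies multiple candidate responses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToolResult:
    passed: bool
    feedback: str  # stand-in for a compiler or simulator message


# Hypothetical signatures: a model maps (spec, feedback history) to Verilog text,
# and an evaluator maps Verilog text to a pass/fail result with a message.
Generator = Callable[[str, List[str]], str]
Evaluator = Callable[[str], ToolResult]


def repair_loop(spec: str, models: List[Generator], evaluate: Evaluator,
                max_iterations: int) -> Tuple[str, bool, int]:
    """Generate a design, then feed tool output back until it passes
    or the iteration budget is spent.

    models[i] is used at iteration i; the last entry is reused once the
    list runs out, so a larger model can be placed at the end.
    """
    if not models:
        raise ValueError("at least one model is required")
    history: List[str] = []
    code = ""
    for i in range(max_iterations):
        model = models[min(i, len(models) - 1)]
        code = model(spec, history)
        result = evaluate(code)
        if result.passed:
            return code, True, i + 1
        history.append(result.feedback)  # this feedback drives the next attempt
    return code, False, max_iterations


def toy_evaluate(code: str) -> ToolResult:
    # Assumption: a string check replaces real compilation and simulation.
    if "endmodule" not in code:
        return ToolResult(False, "syntax error: missing endmodule")
    if "a & b" not in code:
        return ToolResult(False, "simulation mismatch: expected y = a & b")
    return ToolResult(True, "all tests passed")


def small_model(spec: str, history: List[str]) -> str:
    # Stub: fixes the syntax error once told, but never fixes the logic.
    body = "module and_gate(input a, input b, output y); assign y = a | b;"
    return body if not history else body + " endmodule"


def large_model(spec: str, history: List[str]) -> str:
    # Stub: always returns a correct design.
    return "module and_gate(input a, input b, output y); assign y = a & b; endmodule"


if __name__ == "__main__":
    code, passed, iterations = repair_loop(
        "two-input AND gate", [small_model, small_model, large_model],
        toy_evaluate, max_iterations=4)
    print(passed, iterations)
```

In this toy run the cheaper stub handles the first two iterations and the stronger stub is called only on the third, which mirrors the cost argument behind mixing models: the expensive model is invoked only when earlier feedback rounds have not produced a passing design.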