Classification-Based Automatic HDL Code Generation Using LLMs

Wenhao Sun, Bing Li, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann
2024-07-04
Abstract: While large language models (LLMs) have demonstrated the ability to generate hardware description language (HDL) code for digital circuits, they still suffer from the hallucination problem, which leads to the generation of incorrect HDL code or misunderstanding of specifications. In this work, we introduce a human-expert-inspired method to mitigate the hallucination of LLMs and improve the performance in HDL code generation. We first let LLMs classify the type of the circuit based on the specifications. Then, according to the type of the circuit, we split the tasks into several sub-procedures, including information extraction and human-like design flow using Electronic Design Automation (EDA) tools. Besides, we also use a search method to mitigate the variation in code generation. Experimental results show that our method can significantly improve the functional correctness of the generated Verilog and reduce the hallucination of LLMs.
Hardware Architecture
What problem does this paper attempt to address?
The paper primarily addresses the hallucination problem in large language models (LLMs) when generating hardware description language (HDL) code, which involves generating incorrect HDL code or misunderstanding specifications. To tackle this challenge, the authors propose a human-expert-inspired approach to mitigate hallucinations in LLMs and enhance HDL code generation performance. The core contributions of the paper can be summarized as follows:

1. **Classification and Subtask Division**: First, LLMs classify the circuit type based on the specifications (e.g., combinational logic or sequential logic). Then, depending on the circuit type, the task is broken down into multiple subprocesses, including information extraction and human-like design processes using electronic design automation (EDA) tools.
2. **Application of Search Methods**: A search strategy is introduced to allocate test budgets to specific types of circuit generation processes (such as combinational logic COMB and sequential logic SEQU) as well as general processes (BEHAV). This method improves the efficiency of the code generation process by selecting the most promising information lists and distributing them between specific types and general processes.
3. **Experimental Results**: The proposed method's effectiveness is validated through experiments, significantly improving the functional correctness of the generated Verilog code. Specifically, on the VerilogEval-human dataset, the proposed scheme improves performance by 4.7%, 11.0%, and 14.7% in Pass@1, Pass@5, and Pass@10, respectively, compared to baseline methods. On the VerilogEval-machine dataset, there is also an improvement of over 5%.
4. **Motivation Analysis**: The paper further analyzes the limitations of existing methods, pointing out that retrieval-augmented generation (RAG) methods rely on the quality of the database, and constructing a high-quality database is costly. Additionally, using testbench feedback as a database alternative is not always feasible. Therefore, the proposed method aims to mitigate the hallucination problem in LLMs without relying on fine-tuning, manual labor, databases, or testbench feedback.
5. **Technical Details**: The technical implementation steps are detailed, from circuit classification to information list extraction to the specific type of circuit generation process. It particularly emphasizes how reducing the number of reasoning steps can lower the risk of hallucinations in LLMs.

In summary, this paper proposes a novel approach to improve LLMs' performance in HDL code generation, achieving significant results in addressing the hallucination problem.
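
The classify-then-dispatch idea and the budget split between type-specific and general flows, as summarized above, can be illustrated with a minimal Python sketch. This is not the authors' implementation. `classify_circuit`, `split_budget`, `generate_candidates`, the keyword heuristic standing in for the LLM classifier, and the 50/50 budget share are all hypothetical names and values chosen only to keep the example runnable. Only the labels COMB, SEQU, and BEHAV come from the summary.

```python
from typing import Callable, Dict, List

# Flow labels taken from the summary; everything else here is illustrative.
COMB, SEQU, BEHAV = "COMB", "SEQU", "BEHAV"

# A flow maps a specification to one candidate HDL string (hypothetical type).
Flow = Callable[[str], str]


def classify_circuit(spec: str) -> str:
    """Hypothetical stand-in for the LLM classification step.

    A real system would prompt an LLM with the specification; this keyword
    check exists only so the sketch runs without a model.
    """
    text = spec.lower()
    sequential_hints = ("clock", "clk", "flip-flop", "state machine", "register")
    return SEQU if any(hint in text for hint in sequential_hints) else COMB


def split_budget(total: int, specific_share: float = 0.5) -> Dict[str, int]:
    """Split a generation budget between the type-specific and general flows.

    The 50/50 default is an arbitrary assumption, not a value from the paper.
    """
    specific = int(total * specific_share)
    return {"specific": specific, "general": total - specific}


def generate_candidates(spec: str, total_budget: int,
                        flows: Dict[str, Flow]) -> List[str]:
    """Classify the spec, then spend the budget on the matching flows."""
    circuit_type = classify_circuit(spec)
    budget = split_budget(total_budget)
    candidates: List[str] = []
    # Type-specific flow (COMB or SEQU) chosen by the classifier.
    for _ in range(budget["specific"]):
        candidates.append(flows[circuit_type](spec))
    # General behavioral flow as the type-agnostic path.
    for _ in range(budget["general"]):
        candidates.append(flows[BEHAV](spec))
    return candidates


# Stub flows that return a label instead of calling an LLM or EDA tool.
stub_flows: Dict[str, Flow] = {
    COMB: lambda spec: "comb-candidate",
    SEQU: lambda spec: "sequ-candidate",
    BEHAV: lambda spec: "behav-candidate",
}

result = generate_candidates("A 4-bit counter with clk and reset", 4, stub_flows)
```

In a real pipeline, each entry in `stub_flows` would be replaced by a function that prompts an LLM (and, for the type-specific flows, invokes EDA tooling), and the candidates would then be filtered or ranked by the search step.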