BetterV: Controlled Verilog Generation with Discriminative Guidance

Zehua Pei, Hui-Ling Zhen, Mingxuan Yuan, Yu Huang, Bei Yu
2024-05-02
Abstract: Due to the growing complexity of modern Integrated Circuits (ICs), there is a need for automated circuit design methods. Recent years have seen rising research in hardware design language generation to facilitate the design process. In this work, we propose a Verilog generation framework, BetterV, which fine-tunes large language models (LLMs) on processed domain-specific datasets and incorporates generative discriminators for guidance on particular design demands. The Verilog modules are collected, filtered and processed from the internet to form a clean and abundant dataset. Instruct-tuning methods are specially designed to fine-tune the LLMs to understand the knowledge about Verilog. Furthermore, data are augmented to enrich the training set and also used to train a generative discriminator on a particular downstream task, which provides guidance for the LLMs to optimize the Verilog implementation. BetterV can generate syntactically and functionally correct Verilog and outperforms GPT-4 on the VerilogEval benchmark. With the help of a task-specific generative discriminator, BetterV achieves remarkable improvement on various electronic design automation (EDA) downstream tasks, including netlist node reduction for synthesis and verification runtime reduction with Boolean Satisfiability (SAT) solving.
Artificial Intelligence,Programming Languages
What problem does this paper attempt to address?
This paper asks how large language models (LLMs) can be used to automatically generate functionally correct and well-optimized Verilog code for electronic design automation (EDA). It proposes a framework named BetterV, which addresses the limitations of existing methods in three ways:

1. **Enhancing LLMs' understanding of Verilog**: Domain-specific instruction tuning helps LLMs better understand the Verilog language and its design requirements.
2. **Generating high-quality Verilog code**: Data augmentation increases the diversity and quantity of training data, reduces the risk of overfitting, and improves the quality of the generated code.
3. **Optimizing downstream tasks**: A generative discriminator guides the LLM to account for specific downstream task requirements, such as logic synthesis node reduction and verification runtime reduction, while it generates Verilog code.

### Main contributions

1. **Pioneering application**: BetterV is the first work to apply controllable text generation techniques to engineering optimization challenges, in particular the optimization of EDA downstream tasks.
2. **Task-driven method**: BetterV is the first Verilog generation method oriented toward downstream tasks. Guidance from task-specific discriminators improves training efficiency and practical value.
3. **Surpassing existing models**: Using pre-trained models with 6.7B/7B parameters and no prompt engineering strategies, BetterV generates syntactically and functionally correct Verilog code and outperforms GPT-4 on the VerilogEval benchmark.
4. **Data augmentation**: The paper proposes a data augmentation method that produces diverse Verilog specifications and implementations, addressing the scarcity of Verilog resources.
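The discriminator guidance described above can be illustrated with a small sketch. It assumes a GeDi-style reweighting, in which the base model's next-token probability is multiplied by the discriminator's probability that the continuation belongs to the desired class (for example, "synthesizes to fewer nodes") and then renormalized. The function name, the `weight` parameter, and the toy token probabilities are all hypothetical; this is not taken from BetterV's implementation.

```python
import math


def guided_next_token_probs(base_logprobs, desired_class_logprobs, weight=1.0):
    """Combine base-LM and discriminator scores for each candidate token.

    base_logprobs: token -> log P_LM(token | prefix)
    desired_class_logprobs: token -> log P(desired class | prefix + token)
    weight: hypothetical knob controlling how strongly the discriminator steers.
    """
    # Unnormalized score: log P_LM + weight * log P(desired | token)
    scores = {
        tok: lp + weight * desired_class_logprobs[tok]
        for tok, lp in base_logprobs.items()
    }
    # Renormalize with a numerically stable log-sum-exp.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {tok: math.exp(s - log_z) for tok, s in scores.items()}


# Toy example (made-up numbers): the base model is indifferent between two
# tokens, while the discriminator prefers "assign" for the desired class.
base = {"assign": math.log(0.5), "always": math.log(0.5)}
disc = {"assign": math.log(0.8), "always": math.log(0.2)}

guided = guided_next_token_probs(base, disc, weight=1.0)
unguided = guided_next_token_probs(base, disc, weight=0.0)
```

With `weight=0.0` the base distribution is returned unchanged; larger weights shift probability mass toward tokens the discriminator favors.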
### Experimental results

In terms of functional correctness, BetterV performs better than other models on the VerilogEval benchmark, especially on test cases written by humans. The specific results are shown in the following table (pass@k, in %):

| Model | VerilogEval-machine pass@1 | VerilogEval-machine pass@5 | VerilogEval-machine pass@10 | VerilogEval-human pass@1 | VerilogEval-human pass@5 | VerilogEval-human pass@10 |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| GPT-3.5 | 46.7 | 69.1 | 74.1 | 26.7 | 45.8 | 51.7 |
| GPT-4 | 60.0 | 70.6 | 73.5 | 43.5 | 55.8 | 58.9 |
| CodeLlama | 43.1 | 47.1 | 47.7 | 18.2 | 22.7 | 24.3 |
| DeepSeek | 52.2 | 55.4 | 56.8 | 30.2 | 33.9 | 34.9 |
| CodeQwen | 46.5 | 54.9 | 56.4 | 22.5 | 26.1 | 28.0 |
| ChipNeMo | 43.4 | - | - | 22.4 | - | - |
| Thakur et al. | 44.0 | 52.6 | 59.2 | 30.3 | 43.9 | 49.6 |
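For readers unfamiliar with the pass@k columns, the sketch below shows the unbiased estimator commonly used for code-generation benchmarks: for a problem with `n` sampled completions of which `c` pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). That the table uses exactly this estimator is an assumption; the function name and the sample counts are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes,
    given n samples drawn for a problem and c of them passing."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts (not from the paper): 20 samples per problem.
per_problem = [pass_at_k(20, 5, 1), pass_at_k(20, 0, 1), pass_at_k(20, 20, 1)]

# A benchmark score is the mean over problems, usually reported as a percentage.
benchmark_pass_at_1 = 100.0 * sum(per_problem) / len(per_problem)
```

Averaging the per-problem estimates, rather than pooling all samples, keeps every problem equally weighted regardless of how many completions were drawn for it.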