Abstract: Large Language Models (LLMs) have become increasingly popular for generating RTL code. However, producing error-free RTL code in a zero-shot setting remains highly challenging for even state-of-the-art LLMs, often leading to issues that require manual, iterative refinement. This additional debugging process can dramatically increase the verification workload, underscoring the need for robust, automated correction mechanisms to ensure code correctness from the start. In this work, we introduce AIvril2, a self-verifying, LLM-agnostic agentic framework aimed at enhancing RTL code generation through iterative corrections of both syntax and functional errors. Our approach leverages a collaborative multi-agent system that incorporates feedback from error logs generated by EDA tools to automatically identify and resolve design flaws. Experimental results, conducted on the VerilogEval-Human benchmark suite, demonstrate that our framework significantly improves code quality, achieving nearly a 3.4$\times$ enhancement over prior methods. In the best-case scenario, functional pass rates of 77% for Verilog and 66% for VHDL were obtained, thus substantially improving the reliability of LLM-driven RTL code generation.

What problem does this paper attempt to address?

This paper addresses the challenges of using large language models (LLMs) to generate register-transfer-level (RTL) code. Although LLMs are increasingly popular for RTL generation, producing error-free RTL code in a zero-shot setting remains difficult even for state-of-the-art models. This leads to three main problems:

1. **Syntax and functional errors**: The RTL code generated by LLMs often contains syntax and functional errors that require manual, iterative correction. This additional debugging significantly increases the verification workload.
2. **Lack of an automated verification mechanism**: Existing solutions usually lack a robust automated verification mechanism and cannot ensure that correct code is generated from the start.
3. **Insufficient multilingual support**: Many existing methods target only one RTL language (such as Verilog), which limits their use with other hardware description languages.

To address these problems, the paper proposes AIVRIL 2, a self-verifying, LLM-agnostic agentic framework that improves RTL code generation by iteratively correcting syntax and functional errors. Its main contributions are:

- **Two-stage testbench and RTL code generation pipeline**: The first stage focuses on syntax correction, and the second stage focuses on functional correction. Both stages combine LLMs with EDA tools to refine the code step by step.
- **Multi-agent system**: Three specialized LLM agents are introduced:
  - **Code Agent**: Generates the RTL code and comprehensive testbenches.
  - **Review Agent**: Interprets EDA logs to detect and correct syntax errors.
  - **Verification Agent**: Analyzes functional traces to check that the design behaves as intended.
- **Language independence**: AIVRIL 2 is orthogonal to the target RTL language and independent of the specific LLM, so it can be adapted to different scenarios and workflows.

The experimental results show that AIVRIL 2 improves code quality by nearly 3.4 times over prior methods and, in the best case, achieves functional pass rates of 77% for Verilog and 66% for VHDL, significantly enhancing the reliability and efficiency of LLM-driven RTL code generation. With these improvements, the AIVRIL 2 framework offers a flexible solution to several key problems in current automatic RTL code generation.
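The two-stage pipeline summarized above can be sketched as a pair of check-and-repair loops. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToolResult`, `run_stage`, `two_stage_refine`, and the `max_repairs` budget are hypothetical names, the agents and EDA tools are passed in as plain callables, and the RTL and its testbench are treated as a single string for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ToolResult:
    """Outcome of one EDA tool run (hypothetical shape, not from the paper)."""
    ok: bool   # True when the tool reported no errors
    log: str   # raw tool output handed back to the repairing agent


Check = Callable[[str], ToolResult]   # e.g. a compiler or simulator wrapper
Repair = Callable[[str, str], str]    # (code, log) -> revised code


def run_stage(code: str, check: Check, repair: Repair,
              max_repairs: int) -> Tuple[str, bool]:
    """Run a check; on failure, repair from the tool log and re-check."""
    result = check(code)
    repairs = 0
    while not result.ok and repairs < max_repairs:
        code = repair(code, result.log)
        repairs += 1
        result = check(code)
    return code, result.ok


def two_stage_refine(spec: str,
                     code_agent: Callable[[str], str],
                     compile_check: Check,
                     review_agent: Repair,
                     simulate: Check,
                     verification_agent: Repair,
                     max_repairs: int = 3) -> Tuple[str, str]:
    """Stage 1 fixes syntax; stage 2 fixes functional behaviour."""
    code = code_agent(spec)

    # Stage 1: syntax correction driven by compiler logs.
    code, syntax_ok = run_stage(code, compile_check, review_agent, max_repairs)
    if not syntax_ok:
        return code, "syntax_failed"

    # Stage 2: functional correction driven by simulation output.
    # Assumption: functional repairs are not re-checked for syntax here.
    code, functional_ok = run_stage(code, simulate, verification_agent,
                                    max_repairs)
    return code, ("passed" if functional_ok else "functional_failed")
```

Because the checks and repairs are injected as callables, the same loop could in principle wrap any LLM and any Verilog or VHDL toolchain, which mirrors the LLM-agnostic and language-independent framing of the summary. A fuller version would likely re-run the syntax check after each functional repair.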

EDA-Aware RTL Generation with Large Language Models

AIvril: AI-Driven RTL Generation With Verification In-The-Loop

Towards LLM-Powered Verilog RTL Assistant: Self-Verification and Self-Correction

Evaluating Large Language Models for Automatic Register Transfer Logic Generation via High-Level Synthesis

Benchmarking Large Language Models for Automated Verilog RTL Code Generation

Can EDA Tool Feedback Improve Verilog Generation by LLMs?

LAAG-RV: LLM Assisted Assertion Generation for RTL Design Verification

Advanced Large Language Model (LLM)-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis

A Multi-Expert Large Language Model Architecture for Verilog Code Generation

RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Models

VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation

AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs

VeriGen: A Large Language Model for Verilog Code Generation

Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework

LLM-Aided Efficient Hardware Design Automation

Automatic High-quality Verilog Assertion Generation through Subtask-Focused Fine-Tuned LLMs and Iterative Prompting

LLM-aided explanations of EDA synthesis errors

LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation

AssertLLM: Generating and Evaluating Hardware Verification Assertions from Design Specifications via Multi-LLMs

FVEval: Understanding Language Model Capabilities in Formal Verification of Digital Hardware