A Comparative Study of DSL Code Generation: Fine-Tuning vs. Optimized Retrieval Augmentation

Nastaran Bassamzadeh, Chhaya Methani

2024-07-03

Abstract: Natural Language to Code Generation has made significant progress in recent years with the advent of Large Language Models (LLMs). While generation for general-purpose languages like C, C++, and Python has improved significantly, LLMs struggle with custom function names in Domain Specific Languages (DSLs). This leads to higher hallucination rates and syntax errors, especially for DSLs with a high number of custom function names. Additionally, constant updates to function names add to the challenge, as LLMs need to stay up to date. In this paper, we present optimizations for using Retrieval Augmented Generation (RAG) with LLMs for DSL generation, along with an ablation study comparing these strategies. We generated a training and a test dataset with a DSL to represent automation tasks across roughly 700 APIs in the public domain. We used the training dataset to fine-tune a Codex model for this DSL. Our results showed that the fine-tuned model scored the best on the code similarity metric. With our RAG optimizations, we achieved parity on the similarity metric. The compilation rate, however, showed that both models still got the syntax wrong many times, with the RAG-based method being 2 pts better. Conversely, the hallucination rate for the RAG model lagged by 1 pt for API names and by 2 pts for API parameter keys. We conclude that an optimized RAG model can match the quality of fine-tuned models and offer advantages for new, unseen APIs.
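To make the two hallucination metrics in the abstract concrete, here is a minimal Python sketch of how such rates could be computed. It is an illustration under stated assumptions, not the paper's evaluation code: the catalog contents, the step representation (`{"api": ..., "params": ...}`), and the per-sample definition of the rate (share of generated samples containing at least one unknown API name or parameter key) are all hypothetical, and the paper's exact definition may differ.

```python
from typing import Any, Dict, List, Set

# Hypothetical API catalog (not from the paper): API name -> allowed parameter keys.
CATALOG: Dict[str, Set[str]] = {
    "mail.send": {"to", "subject", "body"},
    "calendar.create_event": {"title", "start", "end"},
}

# Assumed representation of one generated step: {"api": str, "params": {key: value}}.
Step = Dict[str, Any]


def hallucination_rates(
    workflows: List[List[Step]], catalog: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Return the share of workflows with at least one unknown API name,
    and the share with at least one unknown parameter key on a known API."""
    if not workflows:
        return {"api_name": 0.0, "param_key": 0.0}

    bad_api = 0
    bad_key = 0
    for workflow in workflows:
        # An API name counts as hallucinated if it is absent from the catalog.
        if any(step["api"] not in catalog for step in workflow):
            bad_api += 1
        # Parameter keys are only checked for APIs that exist in the catalog.
        if any(
            key not in catalog[step["api"]]
            for step in workflow
            if step["api"] in catalog
            for key in step["params"]
        ):
            bad_key += 1

    total = len(workflows)
    return {"api_name": bad_api / total, "param_key": bad_key / total}
```

Under this assumed definition, a 1 pt gap in API-name hallucination means one additional sample per hundred contains an API name that does not exist in the catalog.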

Software Engineering, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

This paper addresses how to improve the quality of generated code in natural-language-to-domain-specific-language (NL2DSL) code generation, especially for domain-specific languages (DSLs) that contain a large number of custom function names. Specifically, the paper focuses on the following aspects:

1. **Reducing the hallucination rate**: Current large language models (LLMs) are prone to hallucination when generating DSL code, that is, generating non-existent API names or parameter keys. This results in low-quality generated code that is difficult to use directly.
2. **Improving syntactic correctness**: Since DSLs usually have a strict syntactic structure, the code generated by LLMs often contains syntax errors, which affects the usability of the code.
3. **Adapting to frequently updated APIs**: As APIs are constantly updated, LLMs need to adapt quickly to new API names and functions rather than relying solely on fixed training data.
4. **Optimizing retrieval-augmented generation (RAG)**: By optimizing RAG, the paper aims to improve the quality of generated DSL code, making it comparable to that of fine-tuned models while offering better scalability and adaptability.

To address these problems, the paper compares a fine-tuned model against an optimized RAG pipeline that dynamically selects a small number of examples (few-shots) and adds API metadata to the prompt. The goal is to improve the model's adaptability to new APIs while maintaining the quality of the generated code and reducing the hallucination rate and syntax errors.
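The dynamic few-shot selection and API-metadata steps mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: token-overlap (Jaccard) similarity stands in for whatever retriever the paper actually uses, and the example pool, the DSL syntax, the `API_METADATA` entries, and the prompt layout are all hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical pool of (natural-language request, DSL program) pairs.
POOL: List[Tuple[str, str]] = [
    ("Send an email when a file is uploaded",
     "trigger(storage.on_upload) -> mail.send(to, subject)"),
    ("Create a calendar event for each new ticket",
     "trigger(tickets.on_create) -> calendar.create_event(title, start)"),
    ("Post a chat message when a ticket is closed",
     "trigger(tickets.on_close) -> chat.post_message(channel, text)"),
]

# Hypothetical API metadata: API name -> one-line signature and description.
API_METADATA: Dict[str, str] = {
    "mail.send": "mail.send(to, subject, body): send an email",
    "storage.on_upload": "storage.on_upload(): fires when a file is uploaded",
    "calendar.create_event": "calendar.create_event(title, start, end): add an event",
    "chat.post_message": "chat.post_message(channel, text): post to a channel",
}


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity; a simple stand-in for an embedding-based retriever."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_few_shots(query: str, pool: List[Tuple[str, str]], k: int = 2):
    """Return the k pool examples whose request text is most similar to the query."""
    q = tokens(query)
    return sorted(pool, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)[:k]


def build_prompt(query: str, pool, api_metadata: Dict[str, str], k: int = 2) -> str:
    """Assemble a prompt from retrieved examples plus metadata for the APIs they use."""
    shots = select_few_shots(query, pool, k)
    # Include metadata only for APIs that appear in the selected examples.
    used = [name for name in api_metadata if any(name in dsl for _, dsl in shots)]
    lines = ["# API reference"]
    lines += [api_metadata[name] for name in used]
    lines.append("# Examples")
    for request, dsl in shots:
        lines += [f"Request: {request}", f"DSL: {dsl}"]
    lines += ["# Task", f"Request: {query}", "DSL:"]
    return "\n".join(lines)
```

Because the examples and metadata are chosen at query time, a newly added API only needs an entry in the pool or metadata store to become visible to the model, which is the adaptability argument made for RAG over fine-tuning.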
