Abstract: Code large language models (LLMs) face limitations in repository-level code generation because they lack awareness of repository-level dependencies (e.g., user-defined attributes), which leads to dependency errors such as undefined-variable and no-member errors. In this work, we introduce ToolGen, an approach that integrates autocompletion tools into the code LLM generation process to address these dependencies. ToolGen comprises two main phases: Trigger Insertion and Model Fine-tuning (Offline), and Tool-integrated Code Generation (Online). During the offline phase, ToolGen augments functions within a given code corpus with a special mark token that indicates positions at which to trigger autocompletion tools. These augmented functions, along with their corresponding docstrings, are then used to fine-tune a selected code LLM. In the online phase, ToolGen iteratively generates functions by predicting tokens step-by-step using the fine-tuned LLM. Whenever a mark token is encountered, ToolGen invokes the autocompletion tool to suggest code completions and selects the most appropriate one. We conduct comprehensive experiments to evaluate ToolGen's effectiveness in repository-level code generation. To facilitate this evaluation, we create a benchmark comprising 671 real-world code repositories and introduce two new dependency-based metrics: Dependency Coverage and Static Validity Rate. The results demonstrate that ToolGen significantly improves Dependency Coverage by 31.4% to 39.1% and Static Validity Rate by 44.9% to 57.7% across three code LLMs, while maintaining competitive or improved performance in widely recognized similarity metrics such as BLEU-4, CodeBLEU, Edit Similarity, and Exact Match. On the CoderEval dataset, ToolGen achieves improvements of 40.0% and 25.0% in Pass@1 for CodeT5 and CodeLlama, respectively.
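The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The mark token spelling `<COMP>`, the `<EOS>` token, and the callables `is_dependency`, `next_token`, `suggest`, and `choose` are all hypothetical stand-ins for the dependency detector, the fine-tuned LLM, the autocompletion tool, and the selection step. Dropping the mark token from the output and stopping when the tool returns no suggestion are likewise assumptions made only for this example.

```python
from typing import Callable, List, Sequence

MARK = "<COMP>"  # hypothetical spelling; the abstract does not name the mark token
EOS = "<EOS>"    # hypothetical end-of-sequence token


def insert_triggers(
    tokens: Sequence[str],
    is_dependency: Callable[[int, str], bool],
) -> List[str]:
    """Offline-phase sketch: place MARK before every token that the
    (hypothetical) predicate flags as a repository-level dependency."""
    augmented: List[str] = []
    for index, token in enumerate(tokens):
        if is_dependency(index, token):
            augmented.append(MARK)
        augmented.append(token)
    return augmented


def generate(
    next_token: Callable[[List[str]], str],
    suggest: Callable[[List[str]], List[str]],
    choose: Callable[[List[str], List[str]], str],
    prompt: Sequence[str],
    max_steps: int = 50,
) -> List[str]:
    """Online-phase sketch: decode token by token and defer to the
    autocompletion tool whenever the model predicts MARK."""
    tokens = list(prompt)
    for _ in range(max_steps):
        token = next_token(tokens)
        if token == EOS:
            break
        if token == MARK:
            candidates = suggest(tokens)
            if not candidates:
                # Assumption: stop when the tool has nothing to offer.
                break
            # The chosen completion replaces the mark token in the output.
            tokens.append(choose(tokens, candidates))
            continue
        tokens.append(token)
    return tokens


def demo() -> List[str]:
    """Run the loop with scripted stand-ins for the model and the tool."""
    script = iter(["return", "self", ".", MARK, "(", ")", EOS])
    return generate(
        next_token=lambda context: next(script),
        suggest=lambda context: ["load_config", "save_config"],
        choose=lambda context, candidates: candidates[0],
        prompt=["def", "f", "(", "self", ")", ":"],
    )
```

In this sketch the model decides where a dependency is needed by emitting the mark token, while the tool decides which identifier is actually valid at that position, which is the division of labor the abstract describes. Calling `demo()` shows the scripted mark token being replaced by the first suggestion from the stub tool.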

A syntax-guided multi-task learning approach for Turducken-style code generation

Lyra: A Benchmark for Turducken-Style Code Generation

From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers

Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation

Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation

SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation

Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation

UniCoder: Scaling Code Large Language Model via Universal Code

StructCoder: Structure-Aware Transformer for Code Generation

Multi-task learning based pre-trained language model for code completion

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation

JumpCoder: Go Beyond Autoregressive Coder via Online Modification

CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models

CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation

Compilable Neural Code Generation with Compiler Feedback

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

GrammarT5: Grammar-Integrated Pretrained Encoder-Decoder Neural Model for Code

Seq2Seq or Seq2Tree: Generating Code Using Both Paradigms Via Mutual Learning

CodeT: Code Generation with Generated Tests