DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu

2024-02-14

Abstract: Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce a diverse instruction model (DolphCoder) with self-evaluating for code generation. It learns diverse instruction targets and combines a code evaluation objective to enhance its code generation ability. Our model achieves superior performance on the HumanEval and MBPP benchmarks, demonstrating new insights for future code instruction tuning work. Our key findings are: (1) Augmenting more diverse responses with distinct reasoning paths increases the code capability of LLMs. (2) Improving one's ability to evaluate the correctness of code solutions also enhances their ability to create it.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses two main issues in code generation:

1. **Lack of Diversity in Code Generation**: Existing instruction-tuned code models are typically trained on a single correct answer per instruction, neglecting the diversity that different solution paths provide. The authors found that increasing the diversity of responses can enhance the model's code generation capability.
2. **Insufficient Code Evaluation Capability**: Current code generation models can produce syntactically and logically reasonable code snippets, but they struggle to identify subtle errors such as mishandled edge cases and input-output format mistakes. This limitation hinders the model's ability to generate high-quality code. The authors argue that improving the model's ability to evaluate code correctness also improves its ability to generate code.

To this end, the authors propose DolphCoder, a diverse instruction model that adds a self-evaluation objective to improve both the diversity and the accuracy of generated code. DolphCoder is trained in two stages:

1. **Diversified Instruction Tuning (DIT)**: Different system prompts are used to obtain diverse code solutions from ChatGPT for the same instruction, increasing the diversity of the training responses.
2. **Multi-Objective Instruction Tuning (MOT)**: The traditional code generation task is combined with a code evaluation task, so the model learns both to write code and to judge whether a solution is correct.

With these methods, the authors report that DolphCoder significantly outperforms existing open-source code generation models on multiple benchmarks, including HumanEval and MBPP.
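The two training stages can be illustrated as a data-construction step. The following is a minimal sketch under stated assumptions, not the authors' implementation: the system prompts, the `Example` record, the function names, the evaluation prompt wording, and the `teacher` callable (a stand-in for a ChatGPT request) are all hypothetical, and this summary does not specify how the paper formats its training data.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical system prompts; the prompts used in the paper are not given here.
SYSTEM_PROMPTS = [
    "Answer concisely with code only.",
    "Explain your reasoning step by step, then write the code.",
    "Write the code first, then explain how it works.",
]


@dataclass
class Example:
    task: str    # "generate" or "evaluate"
    prompt: str  # model input
    target: str  # expected model output


def build_dit_examples(instruction: str,
                       teacher: Callable[[str, str], str]) -> List[Example]:
    """DIT sketch: query the teacher once per system prompt and keep
    each distinct response as a separate target for the same instruction."""
    examples: List[Example] = []
    seen = set()
    for system_prompt in SYSTEM_PROMPTS:
        response = teacher(system_prompt, instruction)
        if response in seen:  # skip exact duplicate responses
            continue
        seen.add(response)
        examples.append(Example("generate", instruction, response))
    return examples


def build_eval_example(instruction: str, code: str, is_correct: bool) -> Example:
    """MOT sketch: turn a (problem, candidate solution, verdict) triple
    into a code evaluation training example."""
    prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Candidate solution:\n{code}\n\n"
        "Is this solution correct?"
    )
    return Example("evaluate", prompt, "correct" if is_correct else "incorrect")


def stub_teacher(system_prompt: str, instruction: str) -> str:
    """Offline stand-in for a real teacher model; it only echoes its inputs."""
    return f"# style: {system_prompt}\n# task: {instruction}"


if __name__ == "__main__":
    generation = build_dit_examples("Add two numbers.", stub_teacher)
    evaluation = [
        build_eval_example("Add two numbers.", "def add(a, b): return a - b", False)
    ]
    # A multi-objective training set mixes both kinds of examples.
    for example in generation + evaluation:
        print(example.task, "|", example.target.splitlines()[0])
```

In this sketch the generation examples keep only the instruction as the prompt, which is an assumption; whether the system prompt is also kept at training time is a design choice this summary does not settle.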

WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with Really Good Data

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation

CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

Large Language Models as Code Executors: An Exploratory Study

Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

SelfCodeAlign: Self-Alignment for Code Generation

MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

InstructCoder: Instruction Tuning Large Language Models for Code Editing

From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers

Semi-Instruct: Bridging Natural-Instruct and Self-Instruct for Code Large Language Models

Improving Natural Language Capability of Code Large Language Model

Evaluating and Aligning CodeLLMs on Human Preference