DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu
2024-02-14
Abstract: Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce a diverse instruction model (DolphCoder) with self-evaluation for code generation. It learns diverse instruction targets and combines a code evaluation objective to enhance its code generation ability. Our model achieves superior performance on the HumanEval and MBPP benchmarks, demonstrating new insights for future code instruction tuning work. Our key findings are: (1) Augmenting more diverse responses with distinct reasoning paths increases the code capability of LLMs. (2) Improving a model's ability to evaluate the correctness of code solutions also enhances its ability to create them.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address two main issues in code generation:

1. **Lack of diversity in code generation**: Existing code generation models typically focus on producing a single correct answer, neglecting the diversity that different solution paths can bring. The authors find that increasing the diversity of responses enhances the model's code generation capability.
2. **Insufficient code evaluation capability**: Current code generation models can produce syntactically and logically plausible code snippets, but they struggle to catch subtle errors such as unhandled edge cases and incorrect input/output formats. This limitation hinders their ability to generate high-quality code. The authors argue that improving a model's ability to evaluate code correctness also enhances its ability to generate code.

To this end, the authors propose DolphCoder, a diverse instruction model with a self-evaluation objective, designed to improve both the diversity and the accuracy of generated code. DolphCoder is trained in two stages:

1. **Diversified Instruction Tuning (DIT)**: ChatGPT is prompted with different system prompts to obtain several diverse code solutions for the same instruction, and the model is fine-tuned on these varied responses.
2. **Multi-Objective Instruction Tuning (MOT)**: The model is further trained on a combination of the standard code generation task and a code evaluation task, in which it judges whether a candidate solution is correct.

With these methods, DolphCoder achieves superior performance on the HumanEval and MBPP benchmarks, outperforming existing open-source code generation models.
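The data construction behind the two training stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the system prompts, the `generate` callback standing in for a ChatGPT call, the whitespace-based deduplication, the `EVAL_TEMPLATE` wording, and the execution-based `passes_tests` labeling rule are all hypothetical choices made for this sketch. It shows one instruction fanning out into several responses (DIT) and generation examples being mixed with correct/incorrect judgment examples (MOT).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical system prompts; the paper's actual prompts are not reproduced here.
SYSTEM_PROMPTS = [
    "You are a helpful programming assistant. Answer with code only.",
    "You are an expert programmer. Explain your reasoning step by step, then write the code.",
    "You are a senior engineer. Write clean, well-commented code.",
]

# Hypothetical prompt template for the code evaluation objective.
EVAL_TEMPLATE = (
    "Problem:\n{problem}\n\nCandidate solution:\n{solution}\n\n"
    "Is the candidate solution correct? Answer 'correct' or 'incorrect'."
)


@dataclass
class Example:
    prompt: str  # model input
    target: str  # supervised output


def build_dit_data(
    instructions: Sequence[str],
    generate: Callable[[str, str], str],
    system_prompts: Sequence[str] = SYSTEM_PROMPTS,
) -> List[Example]:
    """Stage 1 (DIT): collect several responses per instruction, one per system prompt."""
    data: List[Example] = []
    for instruction in instructions:
        seen = set()
        for system_prompt in system_prompts:
            response = generate(system_prompt, instruction)
            # Assumed filter: drop responses identical up to whitespace.
            key = " ".join(response.split())
            if key in seen:
                continue
            seen.add(key)
            data.append(Example(prompt=instruction, target=response))
    return data


def passes_tests(solution: str, tests: str) -> bool:
    """Assumed labeling rule: a solution is correct if its unit tests run without error.

    Illustration only: exec() is unsafe on untrusted code and has no timeout.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(solution, namespace)  # defines the candidate function
        exec(tests, namespace)     # runs assertions against it
        return True
    except Exception:
        return False


def build_mot_data(
    generation_examples: Sequence[Example],
    labeled_solutions: Sequence[Tuple[str, str, bool]],
) -> List[Example]:
    """Stage 2 (MOT): mix code generation examples with code evaluation examples."""
    evaluation_examples = [
        Example(
            prompt=EVAL_TEMPLATE.format(problem=problem, solution=solution),
            target="correct" if ok else "incorrect",
        )
        for problem, solution, ok in labeled_solutions
    ]
    return list(generation_examples) + evaluation_examples


# --- Usage with a hypothetical stub generator standing in for ChatGPT ---
def stub_generate(system_prompt: str, instruction: str) -> str:
    if "step by step" in system_prompt:
        return "Sum the elements one by one.\ndef total(xs):\n    return sum(xs)"
    return "def total(xs):\n    return sum(xs)"


problem = "Write total(xs) that returns the sum of a list of numbers."
dit = build_dit_data([problem], stub_generate)  # third response duplicates the first

tests = "assert total([1, 2, 3]) == 6\nassert total([]) == 0"
candidates = [
    "def total(xs):\n    return sum(xs)",      # correct
    "def total(xs):\n    return sum(xs[1:])",  # bug: skips the first element
]
labeled = [(problem, c, passes_tests(c, tests)) for c in candidates]
mot = build_mot_data(dit, labeled)
print(len(dit), [e.target for e in mot[len(dit):]])  # prints: 2 ['correct', 'incorrect']
```

In a real pipeline, `generate` would call a chat-completion API, and correctness labels would come from sandboxed test execution with a timeout, since `exec` on model-generated code is unsafe and cannot interrupt infinite loops.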