PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang

2023-07-27

Abstract: Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, reinforcement learning, etc. In this paper, we propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark. Furthermore, through an extensive evaluation on CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.
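The pass@1 figure quoted in the abstract is an instance of the pass@k metric. As a minimal sketch, the commonly used unbiased estimator for one problem is 1 - C(n-c, k) / C(n, k), where n completions are sampled and c of them pass the unit tests. The function names and the `(n, c)` input format below are illustrative assumptions, and the sketch does not reproduce the paper's evaluation setup.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of sampled completions
    c: number of those completions that pass all tests
    k: budget of attempts being scored
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset contains no passing sample.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem (hypothetical format)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n for each problem, so a benchmark-level pass@1 is the mean fraction of passing samples per problem.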

Computation and Language, Artificial Intelligence, Machine Learning, Programming Languages, Software Engineering

What problem does this paper attempt to address?

The paper addresses how to improve the code generation performance of pre-trained large language models for code (Code LLMs). The authors propose a framework called RRTF (Rank Responses to align Test&Teacher Feedback). Compared with existing reinforcement learning-based methods, RRTF simplifies training by using ranking feedback instead of absolute reward values, which lets it optimize the model more efficiently.

The main contributions of the paper are:

1. A new optimization paradigm, RRTF: a data-efficient, easy-to-implement, and model-agnostic framework that can significantly improve the performance of pre-trained code generation models.
2. PanGu-Coder2, a new model obtained by applying RRTF to the StarCoder 15B model, which outperforms all previously published code generation models on the HumanEval, CoderEval, and LeetCode benchmarks.
3. Experience and findings on constructing effective training data, training models with the RRTF framework, and optimizing models for fast inference.

The experimental results show that PanGu-Coder2 outperforms existing models not only on simple programming tasks but also in real-world software development scenarios and on programming competition problems, which suggests that the approach applies across a range of programming tasks.
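To illustrate what training on ranking feedback rather than absolute reward values can look like, here is a minimal sketch of a hinge-style pairwise ranking loss. It is not the paper's exact objective: the function name, the use of per-response model scores (for example, length-normalized log-probabilities), and the integer ranks are all assumptions made for illustration.

```python
def pairwise_ranking_loss(scores: list[float], ranks: list[int]) -> float:
    """Hinge-style ranking loss over all ordered response pairs (illustrative only).

    scores: model score for each candidate response to the same prompt
            (assumed here to be something like a normalized log-probability)
    ranks:  feedback rank for each response; a higher rank means a better response
            (hypothetical encoding of test and teacher feedback)
    """
    if len(scores) != len(ranks):
        raise ValueError("scores and ranks must have the same length")
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            # Response i is ranked below response j: penalize the model
            # only if it still scores i above j.
            if ranks[i] < ranks[j]:
                loss += max(0.0, scores[i] - scores[j])
    return loss
```

Only the relative order of `ranks` enters the loss, so no calibrated reward magnitude is needed. The loss is zero whenever the model already scores the responses in the same order as the feedback.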
