PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang
2023-07-27
Abstract: Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, reinforcement learning, etc. In this paper, we propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark. Furthermore, through an extensive evaluation on CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.
Computation and Language, Artificial Intelligence, Machine Learning, Programming Languages, Software Engineering
What problem does this paper attempt to address?
The paper addresses how to improve the code generation performance of pre-trained large language models for code (Code LLMs). The authors propose a framework called RRTF (Rank Responses to align Test&Teacher Feedback). Compared with existing reinforcement learning-based methods, RRTF simplifies training by using ranking feedback instead of absolute reward values, which lets the model be optimized more efficiently.
The main contributions of the paper are:
1. A new optimization paradigm, RRTF, which is data-efficient, easy to implement, and model-agnostic, and which significantly improves the performance of pre-trained code generation models.
2. PanGu-Coder2, obtained by applying RRTF to the StarCoder 15B model, which outperforms all previously published code generation models on the HumanEval, CoderEval, and LeetCode benchmarks.
3. Experiences and findings on constructing effective training data, training models with the RRTF framework, and optimizing models for fast inference.
Experimental results show that PanGu-Coder2 performs well not only on simple programming tasks (HumanEval) but also in real-world software development scenarios (CoderEval) and on programming competition problems (LeetCode), where it outperforms existing models. This indicates that the model applies well across a range of programming tasks.
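The idea of training on ranking feedback rather than absolute reward values can be illustrated with a small sketch. The code below is a generic hinge-style pairwise ranking loss, not the loss defined in the paper: the function name `pairwise_ranking_loss`, the score values, and the use of length-normalized log-probabilities are all assumptions made for illustration. It only shows that the loss depends on the order of the feedback scores, not on their magnitudes.

```python
from itertools import combinations


def pairwise_ranking_loss(scores, log_probs):
    """Hinge-style ranking loss over all pairs of candidate responses.

    scores:    hypothetical feedback scores (e.g. derived from test results
               or a teacher model); higher means a better response.
    log_probs: the model's (assumed length-normalized) log-probability
               of each response.
    """
    if len(scores) != len(log_probs):
        raise ValueError("scores and log_probs must have the same length")

    loss = 0.0
    for i, j in combinations(range(len(scores)), 2):
        if scores[i] == scores[j]:
            continue  # tied responses carry no ranking signal
        # Order the pair so that `better` has the higher feedback score.
        better, worse = (i, j) if scores[i] > scores[j] else (j, i)
        # Penalize only when the model prefers the worse response.
        loss += max(0.0, log_probs[worse] - log_probs[better])
    return loss


# Three candidate responses to one prompt, ranked best to worst by feedback.
feedback = [3, 2, 1]
agreeing_model = [-0.5, -1.0, -2.0]     # likelihood order matches the ranking
disagreeing_model = [-2.0, -1.0, -0.5]  # likelihood order is reversed

print(pairwise_ranking_loss(feedback, agreeing_model))
print(pairwise_ranking_loss(feedback, disagreeing_model))
```

Because only the comparison `scores[i] > scores[j]` is used, rescaling the feedback scores (for example `[30, 20, 10]` instead of `[3, 2, 1]`) leaves the loss unchanged, which is the sense in which this sketch relies on ranks rather than absolute rewards.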