CodeDPO: Aligning Code Models with Self Generated and Verified Source Code

Kechi Zhang, Ge Li, Yihong Dong, Jingjing Xu, Jun Zhang, Jing Su, Yongfei Liu, Zhi Jin
2024-10-08
Abstract: Code generation models have shown significant potential for programming tasks. However, existing training methods like supervised fine-tuning face key limitations: they do not effectively teach models to prioritize correct over incorrect solutions in ambiguous situations, nor do they effectively optimize the runtime efficiency of the generated code. To address these challenges, we propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency. CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases. The underlying assumption is that test cases executable by multiple code snippets provide more reliable validation, and code that passes more tests is more likely to be correct. Through this self-validation process, our PageRank-inspired algorithm iteratively updates the ranking score of each code snippet, ultimately creating a code preference optimization dataset based on correctness and efficiency. CodeDPO is flexible and scalable, generating diverse preference optimization data without depending on external resources. Through comprehensive evaluations of five widely used benchmarks, CodeDPO demonstrates significant improvements in correctness and efficiency compared to existing methods. Our experiments prove that CodeDPO enhances the capabilities of LLMs in code generation and provides a robust foundation for conducting code preference optimization in more complex and challenging real-world scenarios.
Software Engineering
What problem does this paper attempt to address?
The paper targets two shortcomings in how existing code generation models are trained: **code correctness** and **runtime efficiency**. Specifically:

1. **Code correctness**:
   - Supervised fine-tuning (SFT) improves the overall quality of generated code, but it does not effectively teach the model to prefer a correct solution over an incorrect one in ambiguous situations.
   - As a result, the model may still produce undesirable outputs, especially on complex tasks.
2. **Runtime efficiency**:
   - Existing methods do not effectively optimize the runtime efficiency of generated code, so the generated code may perform worse than hand-written code.

To address these problems, the authors propose the **CodeDPO** framework, which integrates preference learning into code generation to improve both factors. Its main components are:

- **Self-generation and validation mechanism**: CodeDPO uses a novel dataset construction method that generates code snippets and test cases together and evaluates them against each other. It assumes that test cases executable by multiple code snippets provide more reliable validation, and that code passing more tests is more likely to be correct.
- **PageRank-inspired algorithm**: The algorithm iteratively updates the ranking score of each code snippet based on the tests it passes and how reliable those tests are, and ultimately produces a code preference optimization dataset based on correctness and efficiency.
- **Flexibility and scalability**: CodeDPO does not rely on external resources and can generate diverse preference optimization data through the self-generation and validation mechanism, which provides a foundation for more complex real-world scenarios.

Across five widely used benchmarks, the paper reports that CodeDPO significantly improves code correctness and efficiency compared to existing methods.
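
The mutual scoring idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `rank_code_and_tests` and `pick_preference_pair`, the damping factor of 0.85, the fixed iteration count, and the per-round normalization are all assumptions chosen for the example, since the exact update rule is not given here. It models only correctness; efficiency would need measured runtimes, which are omitted.

```python
from typing import List, Tuple


def rank_code_and_tests(
    passes: List[List[bool]],
    damping: float = 0.85,   # assumed value, borrowed from classic PageRank
    iterations: int = 10,    # assumed fixed number of update rounds
) -> Tuple[List[float], List[float]]:
    """Score code snippets and test cases by mutual reinforcement.

    passes[i][j] is True when code snippet i passes test case j.
    """
    n_code = len(passes)
    n_test = len(passes[0]) if passes else 0
    code_scores = [1.0] * n_code
    test_scores = [1.0] * n_test

    for _ in range(iterations):
        # A snippet gains score from every test it passes, weighted by
        # how trustworthy that test currently looks.
        new_code = [
            (1 - damping) * code_scores[i]
            + damping * sum(test_scores[j] for j in range(n_test) if passes[i][j])
            for i in range(n_code)
        ]
        # A test gains score from every snippet that passes it, weighted
        # by how trustworthy that snippet currently looks.
        new_test = [
            (1 - damping) * test_scores[j]
            + damping * sum(code_scores[i] for i in range(n_code) if passes[i][j])
            for j in range(n_test)
        ]
        # Normalize so scores stay bounded (an assumption of this sketch).
        code_total = sum(new_code) or 1.0
        test_total = sum(new_test) or 1.0
        code_scores = [s / code_total for s in new_code]
        test_scores = [s / test_total for s in new_test]

    return code_scores, test_scores


def pick_preference_pair(code_scores: List[float]) -> Tuple[int, int]:
    """Return (chosen, rejected) indices: highest and lowest scored snippets."""
    indices = range(len(code_scores))
    chosen = max(indices, key=lambda i: code_scores[i])
    rejected = min(indices, key=lambda i: code_scores[i])
    return chosen, rejected


# Hypothetical pass matrix: 3 snippets (rows) against 4 tests (columns).
example_passes = [
    [True, True, True, False],    # snippet 0 passes tests 0, 1, 2
    [True, True, False, False],   # snippet 1 passes tests 0, 1
    [False, False, False, True],  # snippet 2 passes only test 3
]
example_code_scores, example_test_scores = rank_code_and_tests(example_passes)
example_pair = pick_preference_pair(example_code_scores)
```

In this toy input, test 3 is passed only by the weakest snippet, so it ends up with less weight than tests 0 and 1, which two snippets agree on. The resulting (chosen, rejected) pair is the kind of record a preference optimization dataset would store alongside the prompt.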
### Summary

This paper aims to address the deficiencies of existing code generation models in correctness and efficiency by proposing the CodeDPO framework, thereby improving the quality and performance of the generated code.