CYCLE: Learning to Self-Refine the Code Generation

Yangruibo Ding, Marcus J. Min, Gail Kaiser, Baishakhi Ray
2024-03-28
Abstract: Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by the existing evaluations of code LMs, which focus only on the accuracy of the one-time prediction. For the cases when code LMs fail to implement the correct program, developers actually find it hard to debug and fix the faulty prediction since it is not written by the developers themselves. Unfortunately, our study reveals that code LMs cannot efficiently self-refine their faulty generations as well.
Software Engineering, Computation and Language
What problem does this paper attempt to address?
The problem this paper addresses is that, although existing pre-trained code language models (code LMs) achieve promising accuracy on one-time prediction, they are markedly weak at self-refinement. Specifically, when the code these models generate fails the test cases, they struggle to correct their own errors based on execution feedback. Developers then have difficulty debugging and fixing the generated code, especially in exploration mode, that is, when the requirements are unclear or not fully defined. The paper proposes a framework named CYCLE, which aims to strengthen the self-correction ability of code LMs by using execution feedback, thereby improving their performance in exploration mode.

### Main contributions of the paper

1. **Revealing the weaknesses of code language models**: The paper shows that existing code LMs perform poorly at understanding execution feedback and correcting their own errors.
2. **Proposing the CYCLE framework**: The framework teaches code LMs to self-correct by jointly attending to the natural language problem description, the faulty code previously generated by the model, and the execution feedback.
3. **Data collection and training strategies**: The paper designs an automated data generation method to construct a dataset specifically for self-correction training, and proposes a training strategy that helps the model learn self-correction more effectively.
4. **Experimental verification**: The paper runs extensive experiments on three popular code generation benchmarks, and the results show that CYCLE significantly improves code generation performance, especially in terms of self-correction.

### Technical details of the paper

1. **Data preparation stage**:
   - **Fine-tuning the code language model**: First, fine-tune the pre-trained code LM on verified correct code to reduce the risk of the model generating error-prone code.
   - **Prompting the code language model to expose weaknesses**: Prompt the fine-tuned model to generate code, run the test cases, and collect the faulty code together with its execution feedback to construct training samples.
2. **Learning the self-correction stage**:
   - **Aggregating information**: Design a template that combines the problem description, the faulty code, and the execution feedback into a single model input.
   - **Self-correction learning**: The model improves its self-correction ability by learning to predict the correct code solution from that input.
   - **Past-Generated Mask (PGM)**: To prevent the model from simply copying the faulty code during training, the past-generated mask technique is introduced, pushing the model to actually understand and correct the errors.
3. **Self-correction as iterative programming**:
   - **Automated workflow**: Deploy the trained model to generate code from the problem description, then automatically verify the code against the test cases and correct it, simulating the iterative programming practice of human developers.

### Conclusion

Through the CYCLE framework, the paper significantly improves the self-correction performance of code language models, especially in exploration mode, giving developers a more capable tool for generating and debugging code.
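
The "aggregating information" and "self-correction as iterative programming" steps described above can be pictured as a generate, test, refine loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the template wording, the function names (`build_prompt`, `run_tests`, `self_refine`), the `max_rounds` parameter, and the stub `toy_model` are all hypothetical, and the feedback here is just the exception type and message rather than whatever format the paper uses.

```python
from typing import Callable, Optional, Tuple

# Hypothetical template: the summary only says that the problem description,
# the faulty code, and the execution feedback are aggregated into one input.
TEMPLATE = (
    "### Problem\n{problem}\n\n"
    "### Previous attempt\n{code}\n\n"
    "### Execution feedback\n{feedback}\n\n"
    "### Refined solution\n"
)


def build_prompt(problem: str, faulty_code: str, feedback: str) -> str:
    """Aggregate the three information sources into a single model input."""
    return TEMPLATE.format(problem=problem, code=faulty_code, feedback=feedback)


def run_tests(code: str, tests: str) -> Optional[str]:
    """Execute the candidate code, then the tests, in one fresh namespace.

    Returns None if everything passes, otherwise a short feedback string.
    Note: exec() is NOT sandboxed; real use would need an isolated process.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:  # includes SyntaxError and AssertionError
        return f"{type(exc).__name__}: {exc}"
    return None


def self_refine(
    problem: str,
    tests: str,
    generate: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Generate once, then refine up to max_rounds times using test feedback."""
    code = generate(problem)
    for _ in range(max_rounds):
        feedback = run_tests(code, tests)
        if feedback is None:
            return code, True
        code = generate(build_prompt(problem, code, feedback))
    # Check the final refinement, which the loop has not yet tested.
    return code, run_tests(code, tests) is None


def toy_model(prompt: str) -> str:
    """Stand-in for a code LM: wrong at first, correct once it sees feedback."""
    if "Execution feedback" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


final_code, passed = self_refine(
    "Write add(a, b) that returns the sum of a and b.",
    "assert add(2, 3) == 5",
    toy_model,
)
```

In a real setting `generate` would wrap a call to a trained code LM. The stub is only there to make the control flow visible: the first attempt fails the assertion, the failure is fed back through the template, and the second attempt passes.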