CodeCoT and Beyond: Learning to Program and Test Like a Developer

Dong Huang, Qingwen Bu, Yuhao Qing, Heming Cui
DOI: https://doi.org/10.48550/arxiv.2308.08784
2023-01-01
Abstract: Chain-of-thought (CoT) has emerged as a groundbreaking tool in NLP, notably for its efficacy in complex reasoning tasks such as mathematical proofs. However, its application to code generation faces a distinct challenge: although the code generated with CoT reasoning is logically correct, it suffers from syntax errors (e.g., an "invalid syntax" error report) during execution, which causes CoT's pass@1 on HumanEval to be even lower than the zero-shot result. In this paper, we present Code Chain-of-Thought (CodeCoT), which integrates CoT with a self-examination process for code generation. CodeCoT begins with the LLM using CoT for initial code development to ensure that the generated code follows the correct logic flow. CodeCoT then generates test cases to check whether the code raises syntax errors during execution. Next, in a self-examination phase, the generated code is executed against these test cases in the local environment. If the local environment raises error information (e.g., an invalid syntax error), CodeCoT iteratively refines the code based on that feedback. Through this loop, CodeCoT can ensure that the generated code not only follows the logic flow of the code description but also has its syntax errors addressed by the self-examination process. Our evaluation results reveal that CodeCoT improves the effectiveness of code generation. For example, CodeCoT increases pass@1 from 75.6
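The generate, test, and refine loop described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM is replaced by a hypothetical `fake_llm` stub, the test cases are passed in directly rather than generated by the model, and the names `run_locally`, `codecot_loop`, and `max_rounds` are our own.

```python
"""Minimal sketch of a CodeCoT-style self-examination loop.

Assumptions (hypothetical, not from the paper): the model is a plain
callable, tests are given as assert statements, and the round limit is 3.
"""
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: (prompt, error feedback or None) -> source code.
Generator = Callable[[str, Optional[str]], str]


def run_locally(code: str, tests: List[str]) -> Optional[str]:
    """Execute the candidate code, then each test, in one fresh namespace.

    Returns None if everything ran cleanly; otherwise returns a short
    "ErrorType: message" string used as feedback for the next round.
    """
    namespace: dict = {}
    try:
        # compile() raises SyntaxError before any code runs.
        exec(compile(code, "<candidate>", "exec"), namespace)
        for test in tests:
            exec(compile(test, "<test>", "exec"), namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def codecot_loop(prompt: str, tests: List[str], generate: Generator,
                 max_rounds: int = 3) -> Tuple[str, Optional[str], int]:
    """Generate code, run it against the tests, and refine on error feedback.

    Returns (final code, last error or None, number of rounds used).
    """
    feedback: Optional[str] = None
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt, feedback)
        feedback = run_locally(code, tests)
        if feedback is None:
            return code, None, round_no
    return code, feedback, max_rounds


def fake_llm(prompt: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM: the first answer has a syntax error, the retry fixes it."""
    if feedback is None:
        return "def add(a, b)\n    return a + b\n"  # missing colon
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
    code, error, rounds = codecot_loop("Write add(a, b).", tests, fake_llm)
    print(rounds, error)  # 2 None
```

Here the first candidate fails to compile, the `SyntaxError` text is fed back, and the second candidate passes both tests. Running model-generated code with `exec` in the current process is only acceptable for a toy example; a practical setup would isolate execution, for instance in a separate process with a timeout.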