Leveraging Print Debugging to Improve Code Generation in Large Language Models

Xueyu Hu, Kun Kuang, Jiankai Sun, Hongxia Yang, Fei Wu
2024-01-11
Abstract: Large language models (LLMs) have made significant progress in code generation tasks, but their performance in tackling programming problems with complex data structures and algorithms remains suboptimal. To address this issue, we propose an in-context learning approach that guides LLMs to debug by using a "print debugging" method, which involves inserting print statements to trace execution and analysing the resulting logs to fix the bug. We collect a Leetcode problem dataset and evaluate our method using the Leetcode online judging system. Experiments with GPT-4 demonstrate the effectiveness of our approach, which outperforms rubber duck debugging on easy- and medium-level Leetcode problems by 1.5% and 17.9%, respectively.
Computation and Language, Software Engineering
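To make the term "print debugging" concrete, here is a minimal sketch of the technique applied by hand to a small, hypothetical LeetCode-style function (maximum subarray sum). It is not taken from the paper, and the names `max_subarray_buggy`, `max_subarray_fixed` and `run_with_trace` are invented for illustration. A print statement inside the loop traces the state after each step, and `contextlib.redirect_stdout` captures that output as a log.

```python
import contextlib
import io


# Hypothetical example for illustration; not code from the paper.
def max_subarray_buggy(nums: list[int]) -> int:
    """LeetCode-style maximum subarray sum, with a deliberate bug."""
    best = 0  # bug: 0 is too high a starting value when every number is negative
    cur = 0
    for i, x in enumerate(nums):
        cur = max(x, cur + x)  # best sum of a subarray ending at index i
        best = max(best, cur)
        # Inserted print statement: trace the loop state after every step.
        print(f"i={i} x={x} cur={cur} best={best}")
    return best


def max_subarray_fixed(nums: list[int]) -> int:
    """Same algorithm with the bug fixed (assumes `nums` is non-empty)."""
    best = nums[0]
    cur = 0
    for x in nums:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def run_with_trace(func, *args):
    """Call `func`, capturing everything it prints as a list of log lines."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args)
    return result, buffer.getvalue().splitlines()


result, log = run_with_trace(max_subarray_buggy, [-3, -1, -2])
print("\n".join(log))
print(f"returned {result}, expected {max_subarray_fixed([-3, -1, -2])}")
```

For the all-negative input, the log shows `best` stuck at 0 even though `cur` never rises above -1, which points at the initial value of `best`; starting from `nums[0]` fixes it. In the approach summarized here, the LLM, rather than a human, inserts the prints and reads the log.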
### What problem does this paper attempt to solve?

This paper aims to address the poor performance of large language models (LLMs) on programming problems that involve complex data structures and algorithms. Specifically:

1. **Background**:
   - LLMs have made significant progress in code generation, but their performance on complex programming problems (such as those involving complex data structures and algorithms) is still unsatisfactory.
   - GPT-4 reaches an accuracy of 76% on easy Leetcode problems, but only 26% and 7% on medium and hard problems, respectively.
2. **Main Objectives**:
   - Propose a method that guides LLMs to debug code through "print debugging", thereby improving their performance on complex programming problems.
   - Identify and fix errors more effectively than existing debugging methods such as rubber duck debugging.
3. **Method and Evaluation**:
   - Insert print statements into the code to track variable values during execution, then analyze the resulting logs to locate and fix errors.
   - Collect a dataset of problems from the Leetcode platform and evaluate the method with the Leetcode online judge system.
   - In experiments with GPT-4, the method outperforms rubber duck debugging on easy and medium Leetcode problems, improving accuracy by 1.5% and 17.9%, respectively.

In summary, the goal of this paper is to enhance the debugging capabilities of LLMs on complex programming problems so that they can identify and fix errors more effectively.
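The workflow summarized above (find a failing test, instrument the code with prints, run it, then fix it from the log) can be sketched as a loop. This is an illustrative reconstruction under stated assumptions, not the paper's prompts or code: a candidate program is assumed to be Python source defining `solve`, the two LLM steps are passed in as hypothetical callables (`add_prints`, `fix_from_log`) that the demo replaces with hard-coded stubs, and a local test list stands in for the Leetcode online judge.

```python
import contextlib
import io
from typing import Callable, Optional

# Assumption: a candidate program is Python source defining a function `solve`.
TestCase = tuple[tuple, object]  # (arguments, expected result)


def run_source(source: str, args: tuple) -> tuple[object, str]:
    """Execute `source`, call `solve(*args)`, and return (result, printed log)."""
    namespace: dict = {}
    buffer = io.StringIO()
    result = None
    with contextlib.redirect_stdout(buffer):
        try:
            exec(source, namespace)
            result = namespace["solve"](*args)
        except Exception as exc:  # record runtime errors in the log
            print(f"{type(exc).__name__}: {exc}")
    return result, buffer.getvalue()


def first_failing_test(source: str, tests: list[TestCase]) -> Optional[TestCase]:
    """Local stand-in for an online judge: return the first failed test, if any."""
    for args, expected in tests:
        result, _ = run_source(source, args)
        if result != expected:
            return args, expected
    return None


def print_debug_loop(
    source: str,
    tests: list[TestCase],
    add_prints: Callable[[str], str],
    fix_from_log: Callable[[str, tuple, object, str], str],
    max_rounds: int = 3,
) -> tuple[str, bool, list[str]]:
    """Hypothetical loop: find a failing test, add prints, run it, fix from the log."""
    logs: list[str] = []
    for _ in range(max_rounds):
        failing = first_failing_test(source, tests)
        if failing is None:
            return source, True, logs
        args, expected = failing
        instrumented = add_prints(source)  # LLM step 1 (stubbed below)
        _, log = run_source(instrumented, args)
        logs.append(log)
        source = fix_from_log(instrumented, args, expected, log)  # LLM step 2
    return source, first_failing_test(source, tests) is None, logs


# Demo with stubs in place of the LLM: a search-insert-position solution whose
# upper bound is off by one, so it can never return len(nums).
BUGGY = """
def solve(nums, target):
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo
"""


def stub_add_prints(source: str) -> str:
    """Hypothetical stub: insert a trace print after the midpoint computation."""
    line = "        mid = (lo + hi) // 2\n"
    return source.replace(line, line + "        print(f'lo={lo} hi={hi} mid={mid}')\n")


def stub_fix_from_log(source: str, args: tuple, expected: object, log: str) -> str:
    """Hypothetical stub: a real system would prompt an LLM with code, input and log."""
    return BUGGY.replace("len(nums) - 1", "len(nums)")


tests = [(([1, 3, 5, 6], 5), 2), (([1, 3, 5, 6], 7), 4), (([1, 3, 5, 6], 0), 0)]
final_source, passed, logs = print_debug_loop(BUGGY, tests, stub_add_prints, stub_fix_from_log)
print(passed)
print(logs[0])
```

Keeping the LLM steps as plain callables makes the control flow testable without a model. In practice, each would presumably wrap a prompt built from the code, the failing input, the expected output and, for the fix step, the captured log. Generated code run through `exec` should be confined to a sandbox.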