An Empirical Study of Code Generation Errors made by Large Language Models

Da Song, Zijie Zhou, Zhijie Wang, Yuheng Huang, Shengmai Chen, Bonan Kou, Lei Ma, Tianyi Zhang
2023-01-01
Abstract: The emergence of Large Language Models (LLMs) has revolutionized automatic code generation from natural language input. Despite their promising performance, there remains a limited understanding of the code generation errors that LLMs can produce. To bridge this gap, this study provides an in-depth analysis of the code generation errors made by three representative LLMs on the HumanEval dataset. Specifically, we employ open coding and iterative refinement to distill a comprehensive taxonomy of code generation errors intrinsic to LLMs. Based on this taxonomy, we identified two predominant categories of errors: semantic errors, indicating logical misunderstandings of the natural language input, and syntactic errors, uncovering structural misconceptions within the code. Additionally, we observed a consistent distribution of error types across the three models despite their differing success rates. Our findings reveal the challenges that current code generation LLMs encounter, shedding light on future research into error-handling and repair techniques for LLMs’ code generation.