Imperfect Code Generation: Uncovering Weaknesses in Automatic Code Generation by Large Language Models

Xiaoli Lian, Shuaisong Wang, Jieping Ma, Xin Tan, Fang Liu, Lin Shi, Cuiyun Gao, Li Zhang
DOI: https://doi.org/10.1145/3639478.3643081
2024-01-01
Abstract: Code generation has received significant attention in recent years, especially as pre-trained large language models (LLMs) for code have consistently achieved state-of-the-art performance. However, the field currently lacks a comprehensive taxonomy of the weaknesses in automatic code generation by LLMs. Without one, the community may invest excessive effort in well-known hotspots while neglecting many crucial yet unrecognized issues that deserve more attention. To bridge this gap, we conduct a systematic study of these weaknesses using three state-of-the-art LLMs across three widely used code generation datasets. Our study identifies eight types of weaknesses and assesses their prevalence for each LLM and dataset, aiming to inform and shape the trajectory of future research in the domain.