Deep Learning Based Code Generation Methods: Literature Review

Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Ge Li, Michael Lyu

DOI: https://doi.org/10.13328/j.cnki.jos.006981

2024-04-18

Abstract: This paper focuses on the code generation task, which aims to generate relevant code fragments from given natural language descriptions. During software development, developers often encounter two scenarios. In the first, they must write large amounts of repetitive, low-skill code to implement common functionality. In the second, they write code that depends on specific task requirements, which may require external resources such as documentation or other tools. Code generation has therefore received considerable attention from both academia and industry as a way to assist developers in coding. Indeed, enabling machines to understand users' requirements and write programs on their own has been one of the key concerns in software engineering. Recent advances in deep learning, especially pre-trained models, have enabled promising performance on the code generation task. In this paper, we systematically review current work on deep learning-based code generation and classify existing methods into three categories: methods based on code features, methods incorporated with retrieval, and methods incorporated with post-processing. The first category covers methods that apply deep learning algorithms to code generation based on code features; the second and third categories improve the performance of methods in the first. For each category, we systematically review, summarize, and comment on existing research results. We also summarize and analyze the corpora and popular evaluation metrics used in existing code generation work. Finally, we summarize the overall review and discuss future research directions worthy of attention.

Software Engineering

What problem does this paper attempt to address?

The paper addresses two challenges that developers face during software development: (1) writing large amounts of repetitive, low-skill code, and (2) writing code that depends on specific task requirements, which often requires consulting documentation or using other tools. To tackle these challenges, the paper focuses on the code generation task, which aims to help developers code more efficiently by using deep learning to let machines understand user requirements and automatically generate code snippets that meet them. It pays particular attention to pre-trained models, which have driven promising results on code generation in recent years. The paper classifies current deep learning-based code generation methods into three categories: methods based on code features, methods incorporated with retrieval, and methods incorporated with post-processing. It then systematically reviews, summarizes, and comments on the existing research results in each category. In addition, the paper summarizes and analyzes the corpora and evaluation metrics commonly used in code generation work. Finally, it summarizes the overall review and outlines future research directions worthy of attention.

Deep Learning Based Code Generation Methods: A Literature Review.

Deep Learning for Code Generation: a Survey

Code Generation Based on Deep Learning: a Brief Review

Deep Learning Based Code Generation from Requirements Text: Are We There Yet?

Deep Learning for Source Code Modeling and Generation

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges

A comprehensive review of State-of-The-Art methods for Java code generation from Natural Language Text

Deep Learning Based Program Generation from Requirements Text: Are We There Yet?

Deep Learning for Automated Code Generation: Challenges and Opportunities

Language To Code With Open Source Software

A Review on Code Generation with LLMs: Application and Evaluation

Natural Language to Code: How Far Are We?

Survey of Code Search Based on Deep Learning

A Survey on Large Language Models for Code Generation

Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches

The Good, the Bad, and the Missing: Neural Code Generation for Machine Learning Tasks

Investigating the Use of Natural Language Processing for Automated Code Generation

How Well Do LLMs Generate Code for Different Application Domains? Benchmark and Evaluation

Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review