A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu
2024-09-01
Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronological review of the advancements in code intelligence, encompassing over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. We follow the historical progression to trace the paradigm shifts across different research phases (e.g., from modeling code with recurrent neural networks to the era of Large Language Models). Concurrently, we highlight the major technical transitions in models, tasks, and evaluations spanning through different stages. For applications, we also observe a co-evolving shift. It spans from initial endeavors to tackling specific scenarios, through exploring a diverse array of tasks during its rapid expansion, to currently focusing on tackling increasingly complex and varied real-world challenges. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we delve into both the opportunities and challenges associated with this field, alongside elucidating our insights on the most promising research directions. An ongoing, dynamically updated project and resources associated with this survey have been released at https://github.com/QiushiSun/NCISurvey.
Software Engineering, Artificial Intelligence, Computation and Language, Programming Languages
What problem does this paper attempt to address?
The paper sets out to review and analyze progress in the field of Neural Code Intelligence. Specifically, it aims to:
1. **Systematic Review**: Provide a systematic and chronological review covering more than 50 representative models and their variants, more than 20 task categories, and over 680 related works.
2. **Technical Evolution**: Trace the paradigm shifts across research phases, such as the move from modeling code with Recurrent Neural Networks (RNNs) to the era of Large Language Models (LLMs).
3. **Technical Transformation**: Highlight the major technical transitions in models, tasks, and evaluations across these stages.
4. **Application Evolution**: Follow how applications have shifted from initial attempts at specific scenarios, through the exploration of diverse tasks, to the current focus on increasingly complex and varied real-world challenges.
5. **Cross-Domain Fusion**: Investigate the emerging synergies between code intelligence and broader machine intelligence, uncover new cross-domain opportunities, and illustrate the substantial influence of code intelligence across various domains.
6. **Opportunities and Challenges**: Examine the opportunities and challenges in the field and offer insights into the most promising research directions.
Through these objectives, the paper aims to give researchers and practitioners a comprehensive view of recent progress and development trends in Neural Code Intelligence.