From Code to Natural Language: Type-Aware Sketch-Based Seq2Seq Learning

Yuhang Deng,Hao Huang,Xu Chen,Zuopeng Liu,Sai Wu,Jifeng Xuan,Zongpeng Li
DOI: https://doi.org/10.1007/978-3-030-59410-7_25
2020-01-01
Abstract:Code comment generation aims to translate existing source code into natural language explanations. It provides an easy-to-understand description for developers who are unfamiliar with the functionality of source code. Existing approaches to code comment generation focus on summarizing multiple lines of code with a short text, but often cannot effectively explain a single line of code. In this paper, we propose an asynchronous learning model, which learns the code semantics and generates a fine-grained natural language explanation for each line of code. Different from a coarse-grained code comment generation, this fine-grained explanation can help developers better understand the functionality line-by-line. The proposed model adopts a type-aware sketch-based sequence-to-sequence learning method to generate natural language explanations for source code. This method incorporates the type of source code and the mask mechanism with the Long Short Term Memory (LSTM) network via encoding and decoding phases. We empirically compare the proposed model with state-of-the-art approaches on real data sets of source code and description in Python. Experimental results demonstrate that our model can outperform existing approaches on commonly used metrics for neural machine translation.
What problem does this paper attempt to address?