A Simple Retrieval-based Method for Code Comment Generation

Xiaoning Zhu, Chaofeng Sha, Junyu Niu
DOI: https://doi.org/10.1109/SANER53432.2022.00126
2022-01-01
Abstract: Code comments can effectively help developers comprehend programs. However, writing good comments for source code is a challenging and time-consuming task. Automatic generation of code comments is therefore a promising research direction. Recently, researchers have leveraged neural machine translation to generate comments from source code and achieved impressive results. Another line of work has exploited information retrieval (IR) techniques and shown excellent performance improvements on this task. However, current retrieval-based methods usually involve complex retrieval and editing operations that are difficult to implement. To tackle these problems, we propose kNN-Transformer, a simple end-to-end retrieval-based code comment generation method. Our method combines a simple nearest neighbor retrieval module with a powerful Transformer-based model. When generating each token, the retrieval module estimates a probability distribution based on the current translation context rather than obtaining the retrieved samples in advance. Experimental results on four widely used public datasets (two Java datasets and two Python datasets) demonstrate that our method outperforms all the baselines, and that our kNN retrieval module brings significant improvement when similar code snippets are available.
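The abstract does not spell out how the per-token retrieval distribution is computed or how it is combined with the Transformer's output. The sketch below assumes a kNN-MT-style (nearest-neighbor machine translation) formulation, which may differ from the paper's exact method. The current decoding context is a vector that is matched against a datastore of (context vector, next token) pairs. The k closest entries are turned into a distribution by a softmax over negative distances, and that distribution is linearly interpolated with the model's own next-token distribution. The functions `knn_distribution` and `interpolate`, the parameters `k`, `temperature`, and `lam`, and the toy 2-d vectors are all hypothetical, chosen for illustration only.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=4, temperature=10.0):
    """Estimate a next-token distribution from the k nearest datastore entries.

    Assumption (not confirmed by the abstract): query is the current decoding
    context vector, and datastore is a non-empty list of hypothetical
    (key_vector, target_token) pairs built from training data.
    """
    # Squared Euclidean distance from the current context to every stored key.
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    ]
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # Softmax over negative distances; shifting by the smallest distance
    # avoids underflow, and a larger temperature flattens the weights.
    d_min = neighbors[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in neighbors]
    total = sum(weights)
    probs = defaultdict(float)
    for w, (_, token) in zip(weights, neighbors):
        probs[token] += w / total  # neighbors with the same token pool their mass
    return dict(probs)


def interpolate(p_model, p_knn, lam=0.5):
    """Mix the Transformer's distribution with the retrieval distribution.

    lam is a hypothetical mixing weight: lam * p_knn + (1 - lam) * p_model.
    """
    vocab = set(p_model) | set(p_knn)
    return {
        t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_model.get(t, 0.0)
        for t in vocab
    }


# Toy example: 2-d keys stand in for real decoder hidden states.
datastore = [
    ([0.0, 0.0], "returns"),
    ([0.1, 0.0], "returns"),
    ([0.0, 0.2], "the"),
    ([5.0, 5.0], "gets"),
]
p_knn = knn_distribution([0.0, 0.0], datastore, k=3)
p_model = {"returns": 0.2, "gets": 0.5, "the": 0.3}
mixed = interpolate(p_model, p_knn, lam=0.5)
print(max(mixed, key=mixed.get))  # prints "returns"
```

In this toy run the model distribution alone would pick "gets", but two close neighbors labeled "returns" shift the mixed distribution toward "returns". This illustrates why a retrieval term computed at every decoding step can help when similar code snippets exist in the datastore, without retrieving and editing whole samples in advance.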