Abstract: Code summarization aims to generate a comment for a given block of source code, and it is normally performed by training machine learning models on existing pairs of code blocks and comments. In practice, code comments have different intentions. For example, some comments explain how a method works, while others explain why it was written. Previous work has shown that a relationship exists between a code block and the category of the comment associated with it. In this article, we investigate to what extent this relationship can be exploited to improve code summarization performance. We first classify comments into six intention categories and manually label 20,000 code-comment pairs. These categories are “what,” “why,” “how-to-use,” “how-it-is-done,” “property,” and “others.” Based on this dataset, we conduct an experiment to investigate how different state-of-the-art code summarization approaches perform on each category. We find that the performance of these approaches varies substantially across the categories. Moreover, the category on which a code summarization model performs best differs from model to model. In particular, no model achieves its best performance on “why” or “property” comments among the six categories. We design a composite approach to demonstrate that comment category prediction can boost code summarization. The approach uses code labeled with comment categories to train a classifier that infers the category for a code block. It then selects the most suitable model for each inferred category and outputs the composite results. Our composite approach outperforms approaches that do not consider comment categories, obtaining relative improvements of 8.57% in ROUGE-L and 16.34% in BLEU-4.
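The composite approach described in the abstract (predict a comment category, then route the code to the model that does best on that category) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the keyword-based classifier, the two stand-in models, and the scores are all hypothetical placeholders chosen only to make the routing logic concrete.

```python
from typing import Callable, Dict

# The six intention categories named in the abstract.
CATEGORIES = ("what", "why", "how-to-use", "how-it-is-done", "property", "others")

Summarizer = Callable[[str], str]  # code -> generated comment
Classifier = Callable[[str], str]  # code -> predicted comment category


def select_best_models(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each category to the name of the model with the highest score.

    `scores[category][model_name]` is assumed to be a held-out metric
    (for example ROUGE-L or BLEU-4) where higher is better.
    """
    return {cat: max(per_model, key=per_model.get) for cat, per_model in scores.items()}


def composite_summarize(
    code: str,
    classify: Classifier,
    models: Dict[str, Summarizer],
    best_model_for: Dict[str, str],
    default_model: str,
) -> str:
    """Predict the comment category, then delegate to that category's best model."""
    category = classify(code)
    # Fall back to a default model for categories without a routing entry.
    model_name = best_model_for.get(category, default_model)
    return models[model_name](code)


# --- Hypothetical stand-ins, for illustration only ---

def toy_classifier(code: str) -> str:
    """Keyword heuristic standing in for a trained category classifier."""
    return "property" if "return self." in code else "what"


toy_models: Dict[str, Summarizer] = {
    "model_a": lambda code: "model_a summary",
    "model_b": lambda code: "model_b summary",
}

# Placeholder numbers, NOT results reported in the paper.
toy_scores = {
    "what": {"model_a": 0.30, "model_b": 0.20},
    "property": {"model_a": 0.10, "model_b": 0.25},
}

if __name__ == "__main__":
    routing = select_best_models(toy_scores)
    print(composite_summarize(
        "def size(self): return self._n",
        toy_classifier, toy_models, routing, "model_a",
    ))
```

In a real setting, `toy_classifier` would be replaced by a classifier trained on the labeled code-category data, and each entry in `toy_models` by a trained summarization model; the per-category scores would come from a validation split rather than being hard-coded.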

Improving Code Summarization Performance with Model Fusion

Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction

On the Evaluation of Neural Code Summarization

Demystifying Code Summarization Models

Interpretation-based Code Summarization

GypSum: Learning Hybrid Representations for Code Summarization

Contextual Information Enhanced Source Code Summarization

ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization

Improved Code Summarization via a Graph Neural Network

EnCoSum: enhanced semantic features for multi-scale multi-modal source code summarization

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

Improving Automatic Source Code Summarization Via Deep Reinforcement Learning

Towards Retrieval-Based Neural Code Summarization: A Meta-Learning Approach

WheaCha: A Method for Explaining the Predictions of Code Summarization Models

Revisiting Information Retrieval and Deep Learning Approaches for Code Summarization

Low-Resources Project-Specific Code Summarization

CoCoSum: Contextual Code Summarization with Multi-Relational Graph Neural Network

Leveraging In-and-Cross Project Pseudo-Summaries for Project-Specific Code Summarization

A Syntax-Augmented and Headline-Aware Neural Text Summarization Method

Learning to Generate Structured Code Summaries From Hybrid Code Context