Semantic Similarity Loss for Neural Source Code Summarization

Chia-Yi Su, Collin McMillan
2024-06-12
Abstract: This paper presents a procedure for and evaluation of using a semantic similarity metric as a loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks as either standalone models or as part of pretrained large language models, e.g., GPT, Codex, LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one at a time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. In this paper, we extend our previous work on semantic similarity metrics to show a procedure for using semantic similarity as a loss function to alleviate these problems, and we evaluate this procedure in several settings in both metrics-driven and human studies. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just loss for each word. We also propose to combine our loss with CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report improvement in the vast majority of conditions.
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
The main aim of this paper is to address two key issues in the field of neural source code summarization when using the traditional Categorical Cross-Entropy (CCE) loss function:

1. **Word-by-word evaluation instead of whole sentence evaluation**: CCE calculates the loss for each predicted word rather than evaluating the entire generated sentence.
2. **Lack of partial scoring mechanism**: CCE requires completely correct predictions and does not give any score for predictions that are semantically similar but use different words.

To solve these problems, the authors propose a new loss function based on semantic similarity, named use-seq. This new method introduces semantic similarity as part of the loss function during training, thereby better reflecting human evaluation standards for output results. Specifically, use-seq is implemented through the following steps:

- Convert the predicted sequence into natural language form.
- Calculate the semantic similarity between the reference summary and the predicted summary.
- Broadcast the semantic similarity to each word.
- Use masking to avoid inappropriate penalties.
- Use exponential rewards to adjust the semantic similarity.
- Combine the semantic similarity score with CCE to calculate the final loss value for each predicted word.

Additionally, the paper describes a series of experiments to validate the effectiveness of the proposed method, including using different types of neural network models for the source code summarization task and comparing with existing baseline methods (such as CCE, BLEU-based loss, and SimiLe). The experimental results show that in most cases, the use of the use-seq loss function leads to performance improvements.

In summary, the core objective of this paper is to improve the model training process and final performance in neural source code summarization tasks by introducing the use-seq loss function based on semantic similarity.
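The use-seq steps listed in this summary can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation. In particular, `toy_similarity` (a Jaccard word overlap) is a hypothetical stand-in for the paper's semantic similarity metric, the mask shown here only zeroes out padding positions, and the combination rule `cce * exp(-similarity)` is one plausible reading of "exponential rewards combined with CCE". The function names, the `pad_id` parameter, and the toy vocabulary are all invented for the example.

```python
import math


def toy_similarity(reference: str, prediction: str) -> float:
    """Hypothetical stand-in for a semantic similarity metric.

    Returns the Jaccard overlap of the two word sets, a value in [0, 1].
    """
    ref, pred = set(reference.split()), set(prediction.split())
    if not ref and not pred:
        return 1.0
    return len(ref & pred) / len(ref | pred)


def per_word_cce(probs, target_ids):
    """Categorical cross-entropy per position: -log p(target word)."""
    return [-math.log(p[t]) for p, t in zip(probs, target_ids)]


def sketch_seq_loss(probs, target_ids, vocab, reference, pad_id=0):
    """Assumed sketch of a sentence-level similarity loss combined with CCE."""
    # Step 1: convert the predicted sequence into text (greedy argmax decode),
    # skipping positions where the target is padding.
    pred_ids = [max(range(len(p)), key=p.__getitem__) for p in probs]
    prediction = " ".join(
        vocab[i] for i, t in zip(pred_ids, target_ids) if t != pad_id
    )

    # Step 2: one similarity score for the whole predicted sentence.
    sim = toy_similarity(reference, prediction)

    # Step 3: broadcast the sentence-level score to every word position.
    sim_per_word = [sim] * len(target_ids)

    # Step 4 (assumption): mask padding so it contributes no loss.
    mask = [0.0 if t == pad_id else 1.0 for t in target_ids]

    # Step 5 (assumption): exponential reward, so higher similarity
    # gives a smaller multiplier on the loss.
    reward = [math.exp(-s) for s in sim_per_word]

    # Step 6 (assumption): scale each word's CCE by the reward and the mask.
    cce = per_word_cce(probs, target_ids)
    losses = [c * r * m for c, r, m in zip(cce, reward, mask)]
    return losses, sim


# Toy example: the model predicts "returns the user id"
# while the reference is "returns the user name".
vocab = ["<pad>", "returns", "the", "user", "name", "id"]
target_ids = [1, 2, 3, 4, 0]
probs = [
    [0.05, 0.80, 0.05, 0.05, 0.025, 0.025],
    [0.05, 0.05, 0.80, 0.05, 0.025, 0.025],
    [0.05, 0.05, 0.05, 0.80, 0.025, 0.025],
    [0.05, 0.05, 0.05, 0.05, 0.30, 0.50],
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],
]
losses, sim = sketch_seq_loss(probs, target_ids, vocab, "returns the user name")
print(sim, losses)
```

In this sketch a prediction that is closer to the reference shrinks the loss at every unmasked position, which is the "partial credit" that plain CCE lacks. A real training setup would need a differentiable tensor implementation and a learned sentence-similarity model in place of the word-overlap stand-in.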