Improving Code Summarization Performance with Model Fusion

Yuqing Guo, Hang Su, Hongyu Gao, Qingcheng Sun
DOI: https://doi.org/10.1109/ICID60307.2023.10396770
2023-01-01
Abstract: Source code summaries provide concise natural language descriptions that aid software maintenance. Traditional deep learning-based code summarization techniques typically use a single model of one architecture to handle all data, without considering differences in source code characteristics. This paper presents a model fusion strategy that integrates multiple model architectures. The strategy uses a gating network to allocate weights dynamically, so that different single models contribute to summary generation according to the distinct characteristics of each code snippet. It fully leverages the various embedding methods and model structures of the single models, finding an appropriate processing combination for code with different characteristics. The strategy harnesses the advantages of both the supervised learning model’s explicit knowledge learning and the pre-trained model’s deep semantic understanding. Coupled with various code feature extraction methods, it effectively generates high-quality summaries for code snippets of varying types and complexities. Two widely recognized public datasets are used to train and evaluate our method. The results show that our model fusion strategy substantially improves code summarization quality over single models, enhancing semantic alignment between source code and summaries.
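The abstract does not specify how the gating network is built or how the single models' outputs are combined. Purely as an illustration of the general idea, the Python sketch below assumes a linear gate followed by a softmax over a hypothetical code-feature vector, and fuses each single model's next-token probability distribution by a weighted average (a standard mixture-of-experts style combination). The function names, features, and gate weights are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(features: Sequence[float],
                 gate_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear gate with one row of weights per single model.

    Each model's score is the dot product of its row with the code-feature
    vector; softmax turns the scores into mixing weights that sum to 1.
    """
    scores = [sum(w * f for w, f in zip(row, features)) for row in gate_matrix]
    return softmax(scores)


def fuse_distributions(weights: Sequence[float],
                       distributions: Sequence[Sequence[float]]) -> List[float]:
    """Weighted average of per-model next-token probability distributions.

    Assumption: all models share one vocabulary, so distributions align index-wise.
    """
    fused = [0.0] * len(distributions[0])
    for w, dist in zip(weights, distributions):
        for i, p in enumerate(dist):
            fused[i] += w * p
    return fused


if __name__ == "__main__":
    # Hypothetical 2-dimensional code features for one snippet.
    features = [1.0, 0.0]
    # Hypothetical gate: model 0 favours feature 0, model 1 favours feature 1.
    gate = [[2.0, 0.0], [0.0, 2.0]]
    weights = gate_weights(features, gate)
    # Toy next-token distributions over a 3-token vocabulary from two models.
    dists = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
    print(weights, fuse_distributions(weights, dists))
```

In this toy setup the gate puts most of the weight on the first model, so the fused distribution stays close to that model's prediction while still blending in the second. A learned gate would instead be trained jointly with, or on top of, the single models' outputs.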