The Unexpected Blocking of Code Understanding in AI-Based Code Summarization: Observations and Concerns from a Study on Cross-Project Learning Performance

Sixuan Zhang, Yan Liu
DOI: https://doi.org/10.1007/978-3-031-66459-5_1
2024-01-01
Abstract: Code comments are one of the pillars of software reuse, as they assist program comprehension and reduce maintenance costs. In practice, however, comments are often missing, outdated, or mismatched with the code, and code summarization can help by generating concise comments automatically. With the development of deep learning, pre-trained models have brought notable progress in code summarization. However, these Transformer-based models prefer literal representations of code and may struggle to utilize structural features, which may affect their performance. Structural features are common in code, and real code always lives in a project. Therefore, in this work, we evaluate the models through their cross-project generalization performance, covering metrics, tendency observations, and iterative observations. The results show that current models perform surprisingly poorly on cross-project test sets: BLEU scores are over 50% lower, and ROUGE and METEOR scores are over 30% lower. Current models may rely more on project-specific features than on generic code features to understand code. We link this to the models’ performance on different code channels. We suggest that model performance on different code channels be evaluated and that implicit code channels be explored further.
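The cross-project setting described in the abstract can be illustrated with a minimal sketch. It assumes each sample is a dictionary with a hypothetical `project` field; the function names, the 20% test ratio, and the seed are illustrative choices, not the authors' actual procedure. The idea is that a random split lets test samples share a project with training samples, while a project-level split keeps every project entirely on one side, so the model is scored only on projects it has never seen.

```python
import random
from collections import defaultdict


def split_in_project(samples, test_ratio=0.2, seed=0):
    """Random split: test samples may share a project with training samples."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def split_cross_project(samples, test_ratio=0.2, seed=0):
    """Project-level split: every project lands entirely in train or in test."""
    by_project = defaultdict(list)
    for sample in samples:
        # "project" is a hypothetical field name used for this sketch.
        by_project[sample["project"]].append(sample)
    projects = sorted(by_project)
    rng = random.Random(seed)
    rng.shuffle(projects)
    n_test = max(1, int(len(projects) * test_ratio))
    train = [s for p in projects[n_test:] for s in by_project[p]]
    test = [s for p in projects[:n_test] for s in by_project[p]]
    return train, test


def relative_drop(in_project_score, cross_project_score):
    """Fraction by which a metric falls from the in-project to the cross-project test set."""
    return (in_project_score - cross_project_score) / in_project_score
```

With made-up scores of 20.0 (in-project) and 9.0 (cross-project), `relative_drop(20.0, 9.0)` is 0.55, which is the kind of "over 50% lower" gap the abstract reports for BLEU. These numbers are purely illustrative and are not taken from the study.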