Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models

Seetharam Killivalavan, Durairaj Thenmozhi
2024-10-30
Abstract: This paper explores a novel method for enhancing binary classification models that assess code comment quality, leveraging Generative Artificial Intelligence to elevate model performance. By integrating 1,437 newly generated code-comment pairs, labeled as "Useful" or "Not Useful" and sourced from various GitHub repositories, into an existing C-language dataset of 9,048 pairs, we demonstrate substantial model improvements. Using an advanced Large Language Model, our approach yields a 5.78% precision increase in the Support Vector Machine (SVM) model, improving from 0.79 to 0.8478, and a 2.17% recall boost in the Artificial Neural Network (ANN) model, rising from 0.731 to 0.7527. These results underscore Generative AI's value in advancing code comment classification models, offering significant potential for enhanced accuracy in software development and quality control. This study provides a promising outlook on the integration of generative techniques for refining machine learning models in practical software engineering settings.
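The gains quoted in the abstract are absolute differences between the before and after scores, so they read most naturally as percentage points rather than relative increases. A minimal sketch of that arithmetic follows, using only the figures stated in the abstract; the function names `point_gain` and `relative_gain` are illustrative and do not come from the paper.

```python
def point_gain(before: float, after: float) -> float:
    """Absolute change between two scores, expressed in percentage points."""
    return round((after - before) * 100, 2)


def relative_gain(before: float, after: float) -> float:
    """Change relative to the starting score, expressed as a percentage."""
    return round((after - before) / before * 100, 2)


# Figures taken from the abstract.
svm_precision_points = point_gain(0.79, 0.8478)    # 5.78
ann_recall_points = point_gain(0.731, 0.7527)      # 2.17

# Relative gains, shown only for comparison with the absolute figures.
svm_precision_relative = relative_gain(0.79, 0.8478)
ann_recall_relative = relative_gain(0.731, 0.7527)
```

The relative gains come out larger than the quoted figures, which suggests the reported 5.78% and 2.17% are absolute point differences.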
Software Engineering
What problem does this paper attempt to address?
The paper sets out to **improve the performance of models that assess code comment quality**. Specifically, the authors use Generative AI to strengthen binary classifiers that label a comment as "Useful" or "Not Useful". The paper identifies the following problems and challenges:

1. **Limitations of Manual Evaluation of Code Comments**:
   - Manually evaluating comment quality is time-consuming and subjective, and consistency is hard to maintain.
   - As software projects grow in size and complexity, manual evaluation becomes increasingly infeasible.
2. **Deficiencies of Existing Automated Methods**:
   - Existing automated classifiers address the problem only partially, leaving room for improvement in accuracy and efficiency.
   - Traditional models struggle with the semantics and context of code comments.
3. **Limitations of the Dataset**:
   - Existing datasets are limited in size and diversity, which can lead to undertrained or overfitted models.
   - The comment quality labels in a dataset may be biased, which hurts the model's ability to generalize.

To address these problems, the paper proposes a Generative AI-based method for improving code comment quality assessment models, in three steps:

- **Generate New Code-Comment Pairs**: Use a Large Language Model (LLM) to generate 1,437 new code-comment pairs, labeled "Useful" or "Not Useful", to expand the existing C-language dataset of 9,048 code-comment pairs.
- **Model Improvement**: Add the newly generated data to the training data of the existing Support Vector Machine (SVM) and Artificial Neural Network (ANN) models.
- **Performance Verification**: In the experiments, the LLM-generated data raises the precision of the SVM model from 0.79 to 0.8478 (a gain of 5.78 percentage points) and the recall of the ANN model from 0.731 to 0.7527 (a gain of 2.17 percentage points).

In conclusion, the paper uses Generative AI to address a data bottleneck in existing code comment quality assessment methods, improving the precision and recall of the models and providing more reliable tools for software development and quality control.
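The augmentation and evaluation steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the `CommentPair` type, the `augment` and `precision_recall` helpers, the duplicate-skipping rule, and the sample data are all hypothetical, and the classifier itself (SVM or ANN) is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommentPair:
    """One labeled example; the field names are illustrative."""
    code: str
    comment: str
    useful: bool  # True = "Useful", False = "Not Useful"


def augment(original, generated):
    """Return the original pairs followed by generated pairs not already seen.

    Skipping exact duplicates is an assumption, not a step the paper describes.
    """
    seen = set(original)
    merged = list(original)
    for pair in generated:
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive ("Useful") class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy stand-ins for the 9,048 original and 1,437 generated pairs.
original = [
    CommentPair("i++;", "increment i", False),
    CommentPair("free(buf);", "release buffer to avoid a leak", True),
]
generated = [
    CommentPair("free(buf);", "release buffer to avoid a leak", True),  # duplicate
    CommentPair("if (!p) return -1;", "bail out on NULL pointer", True),
]
train_set = augment(original, generated)

# Hypothetical true labels and classifier predictions for five test comments.
y_true = [True, True, False, False, True]
y_pred = [True, False, True, False, True]
precision, recall = precision_recall(y_true, y_pred)
```

Precision is the fraction of comments predicted "Useful" that really are useful, while recall is the fraction of truly useful comments the model finds. The SVM and ANN results in the paper are reported on these two different metrics, so they are not directly comparable.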