Classifying Python Code Comments Based on Supervised Learning

Jingyi Zhang, Lei Xu, Yanhui Li
DOI: https://doi.org/10.1007/978-3-030-02934-0_4
2018-01-01
Abstract: Code comments are a rich data source for understanding programmers' needs and the underlying implementation. Previous work has shown that code comments enhance the reliability and maintainability of code, and that engineers use them both to explain their own code and to help other developers understand its intent. In this paper, we study comments from 7 open source Python projects and derive a taxonomy through an iterative process. To clarify the characteristics of comments, we apply an effective, automated approach that uses supervised learning algorithms to classify code comments by their intent. We find that a common pattern holds across the different Python projects: the Summary category covers about 75% of comments. Finally, we evaluate the behavior of two supervised learning classifiers and find that, in our study, the Decision Tree classifier outperforms the Naive Bayes classifier in both accuracy and runtime.
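The classification step the abstract describes can be illustrated with a minimal sketch. The abstract does not specify the features, the full taxonomy, or the implementation, so everything below is an assumption made for illustration: a bag-of-words representation, a hand-written multinomial Naive Bayes with add-one smoothing, toy training comments that are not from the paper's dataset, and a hypothetical "Todo" category next to the "Summary" category named in the abstract. The Decision Tree classifier that the authors report as more effective is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(comment):
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", comment.lower())


class NaiveBayesCommentClassifier:
    """Multinomial Naive Bayes over bag-of-words counts (add-one smoothing)."""

    def fit(self, comments, labels):
        self.label_counts = Counter(labels)      # number of comments per label
        self.word_counts = defaultdict(Counter)  # per-label token frequencies
        self.vocabulary = set()
        for comment, label in zip(comments, labels):
            tokens = tokenize(comment)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, comment):
        tokens = tokenize(comment)
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(count / total)
            label_total = sum(self.word_counts[label].values())
            for token in tokens:
                seen = self.word_counts[label][token]
                score += math.log((seen + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written examples; "Todo" is a hypothetical category.
TRAIN = [
    ("# Return the parsed configuration as a dict", "Summary"),
    ("# Compute the checksum of the given file", "Summary"),
    ("# Open the connection and return a handle", "Summary"),
    ("# TODO: handle unicode file names", "Todo"),
    ("# TODO: remove this hack after the next release", "Todo"),
    ("# FIXME: this breaks on empty input", "Todo"),
]

classifier = NaiveBayesCommentClassifier().fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)
```

Calling `classifier.predict("# TODO: fix the parser")` on this toy data selects the hypothetical "Todo" label, while a descriptive comment such as "# Return the checksum of the file" is labeled "Summary". A real replication would need the labeled comments and the full taxonomy from the paper.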