Prioritizing Code Documentation Effort: Can We Do It Simpler but Better?

Shiran Liu, Zhaoqiang Guo, Yanhui Li, Hongmin Lu, Lin Chen, Lei Xu, Yuming Zhou, Baowen Xu
DOI: https://doi.org/10.1016/j.infsof.2021.106686
IF: 3.9
2021-01-01
Information and Software Technology
Abstract: Due to time or economic pressures, code developers are often unable to write documentation for all modules in a project. Recently, a supervised artificial neural network (ANN) approach was proposed to prioritize documentation effort “to ensure that sections of code important to program comprehension are thoroughly explained”. However, as a supervised approach, it requires labeled training data to train the prediction model, and such data may not be easy to obtain in practice. Furthermore, it is unclear whether the ANN approach generalizes, as it has only been evaluated on several small data sets collected from API libraries.

In this paper, we propose an unsupervised approach based on an improved PageRank to prioritize documentation effort. This approach identifies “important” modules based only on the dependence relationships between modules in a project. As a result, the PageRank approach does not need any training data to build the prediction model.

To evaluate the effectiveness of the PageRank approach, we conduct experiments on six additional large data sets collected from two larger libraries and four applications. The experimental results show that the PageRank approach is superior to the state-of-the-art ANN approach.

Due to its simplicity and effectiveness, we advocate that the PageRank approach be used as an easy-to-implement baseline in future research on documentation effort prioritization, and that any newly proposed approach be compared with it to demonstrate its effectiveness.
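The abstract does not describe what the "improved" PageRank changes, so the sketch below shows only plain PageRank (damping factor 0.85, with modules that have no dependencies spreading their rank uniformly). It illustrates why ranking modules by dependence relationships alone needs no labeled training data. The function name `pagerank`, the `deps` mapping, the toy module names, and the edge direction (from a module to the modules it depends on, so heavily used modules accumulate rank) are all assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List


def pagerank(
    deps: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-12,
    max_iter: int = 200,
) -> Dict[str, float]:
    """Plain (not the paper's improved) PageRank over a module dependency graph.

    deps maps each module to the modules it depends on. An edge A -> B means
    "A uses B", so rank flows toward modules that many others rely on.
    """
    # Collect every module, including ones that only appear as dependencies.
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    ordered = sorted(nodes)
    n = len(ordered)
    rank = {v: 1.0 / n for v in ordered}

    for _ in range(max_iter):
        # Modules with no outgoing dependencies ("dangling" nodes) share
        # their rank uniformly, which keeps the total rank equal to 1.
        dangling = sum(rank[v] for v in ordered if not deps.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in ordered}
        for src, targets in deps.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        diff = sum(abs(new[v] - rank[v]) for v in ordered)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical project: ui -> service -> dao, and every layer uses util.
deps = {
    "ui": ["service", "util"],
    "service": ["dao", "util"],
    "dao": ["util"],
    "util": [],
}
scores = pagerank(deps)
# Modules to document first, most important first (util first, ui last).
priority = sorted(scores, key=scores.get, reverse=True)
print(priority)
```

In this toy graph the shared `util` module ranks highest because every other module depends on it, while `ui`, which nothing depends on, ranks lowest. Documentation effort would then be spent in that order.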
What problem does this paper attempt to address?