Extracting Code-relevant Description Sentences Based on Structural Similarity

Yingkui Cao, Yanzhen Zou, Bing Xie
DOI: https://doi.org/10.1145/3361242.3362699
2019-01-01
Abstract: Software developers often need to read code snippets that are dispersed across different documentation, e.g., Q&A posts, in order to reuse APIs to complete certain tasks. These code snippets are often surrounded by lengthy context text that describes what the code does. Aligning a code snippet with its description would help code comprehension. In this paper, we propose an approach to extracting code-relevant sentences from the context text surrounding a code snippet. To quantify the relevance between a code line and a natural-language sentence, we represent both as structure trees and calculate their structural similarity. We conduct two experiments to evaluate our approach. In Experiment I, the results show that our approach achieves 83.5% precision and 80.1% recall in aligning Lucene code snippets with their corresponding comments. Compared with existing work, our approach achieves a 27.6% to 40.2% improvement in precision and a 33.8% to 39.7% improvement in recall. In Experiment II, the results show that our approach achieves 66.4% to 93.9% precision in extracting code-relevant sentences.
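The abstract does not say how the structure trees are built or how they are compared, so the following is only a minimal illustrative sketch of one possible tree-similarity measure, not the authors' method. It assumes a hypothetical tree encoding of `(label, [children])` and scores two trees by the Jaccard overlap of their parent-child label pairs. The function names and the example trees are invented for illustration.

```python
from collections import Counter

# Hypothetical encoding (an assumption, not from the paper):
# a tree is a tuple (label, [child_tree, ...]).

def edges(tree):
    """Collect (parent_label, child_label) pairs as a multiset."""
    label, children = tree
    bag = Counter()
    for child in children:
        bag[(label, child[0])] += 1
        bag += edges(child)
    return bag

def structural_similarity(a, b):
    """Jaccard overlap of parent-child edges; a stand-in measure only."""
    ea, eb = edges(a), edges(b)
    union = sum((ea | eb).values())
    if union == 0:
        # Both trees are single nodes: compare their labels directly.
        return 1.0 if a[0] == b[0] else 0.0
    return sum((ea & eb).values()) / union

# Toy trees for a code line and a sentence (invented for illustration).
code_tree = ("add", [("writer", []), ("document", [])])
sentence_tree = ("add", [("document", []), ("index", [])])
score = structural_similarity(code_tree, sentence_tree)
```

Under this stand-in measure, a sentence would be treated as code-relevant when its score against some code line exceeds a chosen threshold. The threshold and the tree construction would have to come from the paper itself.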