An Exploratory Study on Codes in Heterogeneous Software Documents

Yanzhen Zou, Yingkui Cao, Bing Xie
DOI: https://doi.org/10.1145/3275219.3275233
2018-01-01
Abstract: Different kinds of software documents, such as Bug Reports and Mailing Lists, are produced over the life cycle of a software project. These documents are closely related to the source code, but the traceability links between them are difficult to recover. In this paper, we conduct an exploratory study on the code contained in a software project's heterogeneous documents, in order to derive hints for recovering traceability from software documents to source code. We select a well-known open source software project, Lucene, as our subject and collect four kinds of its software documents: Bug Reports, Mailing Lists, Stack Overflow Q&A Documents and Blogs. On this basis, we analyze these heterogeneous documents to answer the following questions: How much code do the different kinds of documents contain? Which APIs do these documents focus on? How many documents are relevant to the same APIs? Based on the study, we give three hints for recovering traceability from heterogeneous software documents to source code.
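The abstract does not say how code fragments and API mentions were detected. Purely as an illustration, the three research questions can be approached with a simple regex-based pass like the one below. Everything in it is an assumption rather than the authors' method: the code-line heuristics, the hypothetical `KNOWN_APIS` subset (a real study would extract the API list from the Lucene source tree), the helper names, and the sample documents.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, illustrative subset of Lucene class names (assumption).
KNOWN_APIS = {"IndexWriter", "IndexSearcher", "QueryParser", "StandardAnalyzer", "TopDocs"}

# Heuristic: a whole-word CamelCase identifier with at least two capitalized parts.
IDENT = re.compile(r"\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+)\b")
# Heuristic markers that a line is code rather than prose (assumption):
# ends with ; { or }, calls a constructor, or calls a method.
CODE_LINE = re.compile(r"[;{}]\s*$|\bnew\s+[A-Z]\w*\s*\(|\w+\.\w+\(")


def has_code(text):
    """Return True if any line of the document looks like code."""
    return any(CODE_LINE.search(line) for line in text.splitlines())


def api_mentions(text):
    """Return the set of known API names mentioned in the document."""
    return {name for name in IDENT.findall(text) if name in KNOWN_APIS}


def analyze(docs):
    """docs: list of (doc_type, doc_id, text) tuples.

    Returns (code_ratio, api_freq, api_docs):
      code_ratio: doc_type -> fraction of documents containing code (Q1)
      api_freq:   API -> number of documents mentioning it (Q2)
      api_docs:   API -> set of (doc_type, doc_id) mentioning it (Q3)
    """
    totals, with_code = Counter(), Counter()
    api_freq, api_docs = Counter(), defaultdict(set)
    for doc_type, doc_id, text in docs:
        totals[doc_type] += 1
        if has_code(text):
            with_code[doc_type] += 1
        for api in api_mentions(text):
            api_freq[api] += 1
            api_docs[api].add((doc_type, doc_id))
    code_ratio = {t: with_code[t] / totals[t] for t in totals}
    return code_ratio, api_freq, api_docs


# Hypothetical sample documents, one per document type.
docs = [
    ("BugReport", "bug1", "IndexWriter throws an exception:\nwriter.addDocument(doc);"),
    ("MailList", "m1", "How do I configure the StandardAnalyzer for my IndexWriter?"),
    ("StackOverflow", "q1", "Use QueryParser:\nQuery q = new QueryParser(\"f\", analyzer).parse(s);"),
    ("Blog", "b1", "Lucene is a search library."),
]

code_ratio, api_freq, api_docs = analyze(docs)
# Number of distinct document types that mention each API.
cross_type = {api: len({t for t, _ in ds}) for api, ds in api_docs.items()}
```

In this sketch, `api_docs` groups documents of different types around the same API (here `IndexWriter` links a bug report and a mailing-list message), which is the kind of cross-document relation a traceability recovery approach could exploit. Regex heuristics like these miss code without typical syntax markers and ignore ambiguous simple names, so they only approximate the document analysis described above.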