Recovering Relationships Between Documentation and Source Code Based on the Characteristics of Software Engineering
Xiaobo Wang, Guanhui Lai, Chao Liu
DOI: https://doi.org/10.1016/j.entcs.2009.07.009
2009-01-01
Electronic Notes in Theoretical Computer Science
Abstract: Software documentation is usually written in natural language and contains much useful information. Establishing traceability links between documentation and source code can therefore be very helpful for software engineering management tasks such as requirement traceability, impact analysis, and software reuse. Currently, the recovery of traceability links is mostly based on information retrieval techniques, for instance the probabilistic model, the vector space model, and latent semantic indexing (LSI). Previous work treats both documentation and source code as plain text files, but the quality of the retrieved links can be improved by exploiting the additional structure they have as software engineering documents. In this paper, we present four enhanced strategies that improve the traditional LSI method based on the special characteristics of documentation and source code, namely source code clustering, identifier classifying, similarity thesaurus, and hierarchical structure enhancement. Experimental results show that the first three enhanced strategies can increase the precision of retrieved links by 5% to 16%, while the fourth strategy increases it by about 13%.
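For readers unfamiliar with latent semantic indexing, the following is a minimal sketch of the plain-text baseline that the abstract describes, not the authors' implementation and without any of the four enhanced strategies. It assumes whitespace tokenization, raw term counts, and NumPy's SVD; the function names `build_term_document_matrix` and `lsi_similarities` and the parameter `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_term_document_matrix(texts):
    """Return (vocabulary, matrix) with one row per term and one column per text.

    Assumption: naive lowercase whitespace tokenization and raw term counts.
    """
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for col, tokens in enumerate(tokenized):
        for word in tokens:
            matrix[index[word], col] += 1.0
    return vocab, matrix


def lsi_similarities(doc_texts, code_texts, k=2):
    """Cosine similarity of each documentation text to each source text
    in a rank-k latent space obtained by truncated SVD.

    Returns an array of shape (len(doc_texts), len(code_texts)).
    """
    texts = list(doc_texts) + list(code_texts)
    _, a = build_term_document_matrix(texts)
    # Thin SVD: a = u @ diag(s) @ vt, singular values in descending order.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))
    # Coordinates of every text in the k-dimensional latent space.
    coords = (np.diag(s[:k]) @ vt[:k, :]).T
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty texts
    unit = coords / norms
    n_docs = len(doc_texts)
    return unit[:n_docs] @ unit[n_docs:].T


if __name__ == "__main__":
    docs = ["user login password check", "report print page layout"]
    code = ["login password validate user", "print report page"]
    sims = lsi_similarities(docs, code, k=2)
    # Candidate links are the pairs whose similarity exceeds a chosen threshold.
    print(np.round(sims, 3))
```

In a sketch like this, candidate traceability links would be the documentation and source pairs whose similarity exceeds a chosen threshold. The strategies named in the abstract would change how the source texts and terms are prepared before this step.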