Mining Cohesive Domain Topics from Source Code

Bing Xie, Meng Li, Jing Jin, Junfeng Zhao, Yanzhen Zou
DOI: https://doi.org/10.1007/978-3-642-38977-1_16
2013-01-01
Abstract: Using topic models to mine domain topics from source code is a promising way to help developers comprehend the functional concerns implemented in a software system. However, not all topics mined from source code are domain topics that represent functional concerns of the software. Some topics instead represent cross-cutting concerns or other concerns, and these topics are noise when the goal is to help developers comprehend the functional concerns. In this paper, we propose an approach to filter out noise and mine Cohesive Domain Topics (CDTs) from source code. A topic is a CDT if its associated words represent a certain functional concern and its associated source code elements collaboratively implement that concern. First, we propose a series of Filtering Heuristics to remove programming-related information in source code that may introduce noise. Then, we mine raw topics from the source code using Latent Dirichlet Allocation. Finally, based on the structural relationships among the source code elements associated with a topic, we propose a novel metric called Topic Cohesion to identify CDTs among the raw topics. Experimental results on a set of open source software systems show that our approach can effectively filter out noise and obtain CDTs from source code.
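The final step of the approach, ranking raw topics by how tightly their source code elements are structurally connected, can be illustrated with a small sketch. The abstract does not give the formula for Topic Cohesion, so the code below uses a stand-in that is purely an assumption: the fraction of element pairs within a topic that share a direct structural relationship. The function names, the `threshold` parameter, and the example class names are all hypothetical.

```python
from itertools import combinations


def topic_cohesion(elements, relations):
    """Fraction of element pairs in a topic that are directly related.

    Assumed stand-in metric, NOT the paper's Topic Cohesion definition.
    `relations` is an iterable of (element, element) pairs, treated as undirected.
    """
    elements = set(elements)
    if len(elements) < 2:
        return 0.0
    linked = {frozenset(pair) for pair in relations}
    pairs = list(combinations(sorted(elements), 2))
    related = sum(1 for a, b in pairs if frozenset((a, b)) in linked)
    return related / len(pairs)


def select_cohesive_topics(topics, relations, threshold=0.5):
    """Keep topics whose assumed cohesion score reaches the (hypothetical) threshold."""
    return [name for name, elems in topics.items()
            if topic_cohesion(elems, relations) >= threshold]


# Hypothetical structural relationships (e.g. calls or references between classes).
relations = [("Parser", "Lexer"), ("Parser", "Ast"), ("Lexer", "Ast"),
             ("Logger", "Parser")]

# Hypothetical raw topics, each mapped to its associated source code elements.
topics = {
    "parsing": ["Parser", "Lexer", "Ast"],
    "logging": ["Logger", "Cache", "Mailer"],
}

cohesive = select_cohesive_topics(topics, relations)
```

In this toy input, the elements of the "parsing" topic all reference one another, while those of the "logging" topic are scattered across unrelated classes, which is the kind of cross-cutting pattern the approach treats as noise.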