Building a Domain Knowledge Base from Wikipedia: a Semi-supervised Approach

Kai Chen, Xiang Dong, Jiangang Zhu, Beijun Shen
DOI: https://doi.org/10.18293/seke2016-051
2016-01-01
Abstract: Knowledge bases are becoming indispensable to software engineering and knowledge engineering. However, existing domain knowledge bases are typically built manually and are small in scale. In this paper, we propose a semi-supervised approach to domain concept detection and software engineering knowledge base construction from Wikipedia. First, the approach selects domain-relevant tags from Stackoverflow. Then, it matches them to Wikipedia entities and expands the concept set through an improved label propagation algorithm. A rule-based method is designed to discover semantic relations, including relate, subclassOf, and equal, by analyzing the structural information of Wikipedia. A relation derivation mechanism is presented to optimize the relation set. Finally, we construct SEBase, a domain-specific knowledge base for software engineering. Experimental results show that the integrated concepts and relations are highly accurate. Compared with other knowledge bases, SEBase has the widest coverage of software engineering concepts and relations.
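The abstract does not say what makes the authors' label propagation "improved", so the sketch below shows only the classic version the concept-expansion step builds on. It is a minimal illustration under stated assumptions: Wikipedia entities form an undirected weighted graph, entities matched from Stackoverflow tags act as clamped positive seeds (score 1.0), some known off-domain entities act as clamped negative seeds (score 0.0), and every other entity repeatedly takes the weighted average of its neighbours' scores. The function names, the 0.5 starting value, the 0.5 threshold, and the toy graph are all hypothetical and are not taken from the paper or from SEBase.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: node -> {neighbour: edge weight}.
Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an undirected weighted graph from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def propagate_labels(graph: Graph, positive_seeds: Set[str],
                     negative_seeds: Set[str], max_iter: int = 100,
                     tol: float = 1e-6) -> Dict[str, float]:
    """Classic label propagation with clamped seeds (not the paper's variant)."""
    # Unlabeled nodes start at 0.5; seeds are clamped to 1.0 or 0.0.
    scores = {node: 0.5 for node in graph}
    scores.update({node: 1.0 for node in positive_seeds})
    scores.update({node: 0.0 for node in negative_seeds})
    seeds = set(positive_seeds) | set(negative_seeds)

    for _ in range(max_iter):
        new_scores = dict(scores)
        delta = 0.0
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            if node in seeds or total == 0:
                continue  # seeds never change; isolated nodes keep their score
            # Weighted average of the neighbours' scores from the last round.
            value = sum(w * scores[n] for n, w in neighbours.items()) / total
            delta = max(delta, abs(value - scores[node]))
            new_scores[node] = value
        scores = new_scores
        if delta < tol:  # stop once no score moves more than tol
            break
    return scores


def expand_concepts(graph: Graph, positive_seeds: Set[str],
                    negative_seeds: Set[str], threshold: float = 0.5) -> Set[str]:
    """Return every node whose propagated score exceeds the threshold."""
    scores = propagate_labels(graph, positive_seeds, negative_seeds)
    return {node for node, score in scores.items() if score > threshold}


if __name__ == "__main__":
    # Toy chain graph; titles and edges are illustrative only.
    g = build_graph([
        ("Stack (abstract data type)", "Queue (abstract data type)", 1.0),
        ("Queue (abstract data type)", "Queue area", 1.0),
        ("Queue area", "Traffic", 1.0),
    ])
    concepts = expand_concepts(g, {"Stack (abstract data type)"}, {"Traffic"})
    print(sorted(concepts))
    # prints ['Queue (abstract data type)', 'Stack (abstract data type)']
```

On this chain the two unlabeled nodes converge to 2/3 and 1/3, so only the entity closer to the domain seed crosses the threshold. Clamping the seeds is what keeps the propagated scores anchored to the initial tag-derived concepts rather than drifting toward a uniform value.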