MR²-KG: A Multi-Relation Multi-Rationale Knowledge Graph for Modeling Software Engineering Knowledge on Stack Overflow
Lina Gong, Haoxiang Zhang
DOI: https://doi.org/10.1109/tse.2024.3403108
IF: 7.4
2024-07-19
IEEE Transactions on Software Engineering
Abstract: Stack Overflow is a knowledge-sharing platform where users create and share informative content drawn from both inside and outside the site. Prior studies have leveraged the relations among Stack Overflow posts established through internal links to build services and applications that make knowledge more accessible. However, these studies represented such relations with a knowledge unit consisting of a question post and all of its associated answer posts. It is unknown whether this representation can comprehensively model the various complex relations among webpages, such as questions, answers, and internal and external links. In addition, the rationales behind sharing knowledge on Stack Overflow have yet to be explored across distinct user groups, such as askers, answerers, and readers who wish to learn. Thus, in this study, we first investigate the real-world characteristics of Stack Overflow knowledge by abstracting its complex representation into relations among its building blocks. We observe that a question thread contains three basic knowledge relations that combine into complex knowledge: the hierarchy relation among the answers associated with a question, the coupling relation between knowledge artifacts (i.e., question or answer posts) through internal links, and the complementary relation between Stack Overflow posts and external websites. All three basic relations are informative and can arise from different rationales when crowdsourced knowledge is shared on Stack Overflow. Our findings highlight the need for a comprehensive knowledge graph to represent real-world knowledge on Stack Overflow. Therefore, we further propose a Multi-Relation Multi-Rationale Knowledge Graph (MR²-KG), whose nodes represent questions, answers, and external webpages.
Edges in MR²-KG represent the rationales underlying the three relations (i.e., question answering, duplicate, prior, posterior, parallelism, containment, and working-example knowledge). In addition, we develop an automated approach that models the nodes and edges representing the Stack Overflow knowledge associated with a question thread. Our case study shows that this automated approach identifies edges in MR²-KG with an ROC AUC of 96% and an MCC of 89%. To further evaluate the applicability of MR²-KG, we develop an answer generator that helps developers efficiently identify answers matching their intent. Our user study on 100 real-world Java questions indicates that MR²-KG is useful. Finally, we discuss the implications of our findings for developers, researchers, and Stack Overflow moderators.
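One way to make the graph described above concrete is a small typed schema: three node kinds (question, answer, external webpage), three relation kinds whose endpoints follow the abstract's definitions, and a rationale label on every edge. This is a minimal sketch under stated assumptions, not the authors' implementation. The class and method names are hypothetical, hierarchy edges are assumed to run from a question to its answers, and the abstract lists the seven rationales without saying which rationale belongs to which relation, so the rationale attached to each example edge is purely illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    QUESTION = "question"
    ANSWER = "answer"
    EXTERNAL = "external_webpage"


class Relation(Enum):
    HIERARCHY = "hierarchy"          # assumed: question -> one of its answers
    COUPLING = "coupling"            # post <-> post through an internal link
    COMPLEMENTARY = "complementary"  # post -> external webpage


class Rationale(Enum):
    QUESTION_ANSWERING = "question answering"
    DUPLICATE = "duplicate"
    PRIOR = "prior"
    POSTERIOR = "posterior"
    PARALLELISM = "parallelism"
    CONTAINMENT = "containment"
    WORKING_EXAMPLE = "working example"


POSTS = {NodeKind.QUESTION, NodeKind.ANSWER}


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: Relation
    rationale: Rationale


@dataclass
class KnowledgeGraph:  # hypothetical name, not from the paper
    nodes: dict = field(default_factory=dict)   # node id -> NodeKind
    edges: list = field(default_factory=list)   # list of Edge

    def add_node(self, node_id: str, kind: NodeKind) -> None:
        self.nodes[node_id] = kind

    def add_edge(self, src: str, dst: str, relation: Relation,
                 rationale: Rationale) -> Edge:
        s, d = self.nodes[src], self.nodes[dst]
        # Endpoint checks mirror the three relations named in the abstract;
        # the rationale label itself is not constrained here.
        allowed = {
            Relation.HIERARCHY: s is NodeKind.QUESTION and d is NodeKind.ANSWER,
            Relation.COUPLING: s in POSTS and d in POSTS,
            Relation.COMPLEMENTARY: s in POSTS and d is NodeKind.EXTERNAL,
        }[relation]
        if not allowed:
            raise ValueError(
                f"{relation.value} edge not allowed: {s.value} -> {d.value}")
        edge = Edge(src, dst, relation, rationale)
        self.edges.append(edge)
        return edge

    def neighbors(self, node_id: str, relation: Relation = None) -> list:
        # Outgoing neighbors, optionally filtered by relation kind.
        return [e.dst for e in self.edges
                if e.src == node_id and (relation is None or e.relation is relation)]


# Usage: one question thread with an answer, a linked question, and a doc page.
kg = KnowledgeGraph()
kg.add_node("q1", NodeKind.QUESTION)
kg.add_node("a1", NodeKind.ANSWER)
kg.add_node("q2", NodeKind.QUESTION)
kg.add_node("docs", NodeKind.EXTERNAL)
kg.add_edge("q1", "a1", Relation.HIERARCHY, Rationale.QUESTION_ANSWERING)
kg.add_edge("q2", "q1", Relation.COUPLING, Rationale.DUPLICATE)            # illustrative label
kg.add_edge("a1", "docs", Relation.COMPLEMENTARY, Rationale.WORKING_EXAMPLE)  # illustrative label
```

Keeping the relation kind and the rationale as separate edge attributes lets the structural constraints (which node kinds a relation may connect) be checked independently of how rationales are later predicted, which is the part the paper automates.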
Engineering, Electrical & Electronic; Computer Science, Software Engineering