An Empirical Study on the Characteristics of Connected Knowledge Subgraphs on Stack Overflow

Tianyue Sun, Lina Gong, Jingxuan Zhang, Mingqiang Wei
DOI: https://doi.org/10.1109/qrs-c63300.2024.00019
2024-01-01
Abstract: In the Stack Overflow (SO) community, users typically employ links (both internal and external) in their posts or comments to better present the knowledge they cite. Prior studies have demonstrated that external links bring valuable information and thus enhance the quality of software engineering knowledge within the community. Users' references to internal links have now formed connected knowledge subgraphs, with question threads as nodes and internal links as edges. However, the characteristics of these subgraphs have not been investigated. It remains unclear how and why users reference internal links within the subgraphs, and the impact of these internal links on the community has yet to be explored. We therefore constructed subgraphs from internal links in the official SO data dump and analyzed their structural stability. Subsequently, we employed qualitative analysis methods to explore how and why developers reference internal links within the subgraphs. Finally, we conducted a quantitative analysis to assess the impact of the subgraphs on the community. We observed that the subgraphs are structurally stable, with a slowing expansion rate. We also found that 82.4% of links were cited without summarizing their content. Notably, 64.3% of the links without summaries are likely to hinder knowledge comprehension once they become obsolete. This is especially pronounced for links cited for “providing solutions to subproblem”, where the proportion reaches 96.9%. Our quantitative analysis revealed that question threads within the subgraphs have more recent last-updated times and a lower deletion rate. Based on our findings, we provide actionable suggestions for developers, the SO community, and researchers. For example, we encourage researchers to develop a visualization tool that renders question threads and the internal links in their “linked” lists as graphs, which would facilitate in-depth exploration of specific software engineering topics.
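Building connected knowledge subgraphs from internal links, as described above, amounts to finding the connected components of an undirected graph whose nodes are question threads. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' pipeline. It assumes internal links are available as (PostId, RelatedPostId) pairs, as in the PostLinks.xml file of the SO data dump, treats every link as an undirected edge regardless of its link type, and groups threads with union-find. The function names `load_post_links` and `connected_subgraphs` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def load_post_links(path):
    """Stream (post_id, related_post_id) pairs from a PostLinks.xml file.

    Assumption: each <row> element carries PostId and RelatedPostId attributes.
    No filtering by link type is applied here.
    """
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "row":
            yield int(elem.get("PostId")), int(elem.get("RelatedPostId"))
            elem.clear()  # free memory for large dumps


def connected_subgraphs(links):
    """Group thread IDs into connected components (largest first).

    links: iterable of (post_id, related_post_id) pairs, treated as undirected edges.
    """
    parent = {}

    def find(x):
        # Register unseen nodes, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in links:
        union(a, b)

    # Collect every node under its root; each root identifies one subgraph.
    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    sample_links = [(1, 2), (2, 3), (4, 5)]  # illustrative IDs, not real SO data
    for subgraph in connected_subgraphs(sample_links):
        print(len(subgraph), sorted(subgraph))
```

With the illustrative links above, threads 1, 2 and 3 form one subgraph and threads 4 and 5 form another. Union-find is used here because it handles a long edge list in a single streaming pass. A graph library such as NetworkX would work equally well for smaller inputs.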