Stochastic Degree Sequence Model with Edge Constraints (SDSM-EC) for Backbone Extraction

Zachary P. Neal, Jennifer Watling Neal
DOI: https://doi.org/10.1007/978-3-031-53468-3_11
2024-04-08
Abstract: It is common to use the projection of a bipartite network to measure a unipartite network of interest. For example, scientific collaboration networks are often measured using a co-authorship network, which is the projection of a bipartite author-paper network. Caution is required when interpreting the edge weights that appear in such projections. However, backbone models offer a solution by providing a formal statistical method for evaluating when an edge in a projection is statistically significantly strong. In this paper, we propose an extension to the existing Stochastic Degree Sequence Model (SDSM) that allows the null model to include edge constraints (EC) such as prohibited edges. We demonstrate the new SDSM-EC in toy data and empirical data on young children's play interactions, illustrating how it correctly omits noisy edges from the backbone.
Social and Information Networks, Other Statistics
What problem does this paper attempt to address?
The paper addresses how to extract the backbone of a network more accurately when that network is obtained by projecting a bipartite graph onto a unipartite graph. Specifically, it proposes the Stochastic Degree Sequence Model with Edge Constraints (SDSM-EC), an extension of the existing SDSM. The model allows edge constraints, such as forbidden edges, to be included in the null model, so that the statistical significance of edges in the projection can be evaluated more precisely.

### Background and Problem

Many studies measure a unipartite network of interest through the projection of a bipartite graph. For example, a scientific collaboration network can be measured through a co-authorship network, which is the projection of an author-paper bipartite graph. However, the edge weights in such a projection do not directly reflect the actual strength of the connection between nodes, because they are also shaped by the structure of the bipartite network. A method is therefore needed to identify which edges are statistically significantly stronger than expected, so that the backbone of the network can be extracted by keeping only those significant edges.

### Limitations of Existing Methods

Existing backbone extraction models, such as the Fixed Degree Sequence Model (FDSM) and the Stochastic Degree Sequence Model (SDSM), share a common limitation: they cannot handle specific edge constraints, such as forbidden edges. A forbidden edge is an edge that cannot exist in the bipartite graph. For example, in an author-paper bipartite graph, an author cannot have written a paper published before the author was born.

### Proposed Method

The paper proposes the SDSM-EC, which extends the existing SDSM by introducing edge constraints such as forbidden edges. Specifically, the SDSM-EC takes these constraints into account when generating random networks, so that the random networks resemble what could actually have occurred. As a result, when the significance of an edge in the projection is evaluated, the observed weight is compared only against outcomes that respect the constraints, and edges that are not significant are excluded more accurately.

### Experimental Verification

The paper demonstrates the SDSM-EC on toy data and on empirical data on young children's play interactions. The results show that the SDSM-EC correctly excludes non-significant edges. In particular, when forbidden edges are present, the backbone extracted by the SDSM-EC is more accurate than the one extracted by the unconstrained SDSM.

### Conclusion

The paper emphasizes the importance of using an appropriately constrained null model when extracting the backbone of a bipartite projection. In particular, when the bipartite graph contains forbidden edges, the SDSM-EC identifies significant edges more accurately and thereby improves the quality of the extracted backbone. Future research could explore other types of edge constraints and better methods of estimating the probability matrix \( Q \) to improve the efficiency and accuracy of the model.
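
The constrained null-model test described in this summary can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper: the function names are hypothetical, `estimate_q` uses a simple masked iterative proportional fitting as a stand-in for whatever estimator of \( Q \) the authors use, and the test is one-tailed (upper tail) at an assumed significance level of 0.05. The only part taken from the SDSM idea itself is that each projection weight is compared with the distribution of a sum of independent Bernoulli trials whose probabilities come from \( Q \), with forbidden cells of \( Q \) fixed at zero.

```python
import numpy as np

def estimate_q(B, allowed, iters=500):
    """Hypothetical stand-in estimator for Q (not the paper's method).

    Rescales a 0/1 mask of allowed cells until its row and column sums
    approximate the degrees of B. Forbidden cells start at 0 and stay 0.
    """
    B = np.asarray(B, dtype=float)
    Q = np.asarray(allowed, dtype=float).copy()
    row_deg, col_deg = B.sum(axis=1), B.sum(axis=0)
    for _ in range(iters):
        rs = Q.sum(axis=1, keepdims=True)
        Q *= np.divide(row_deg[:, None], rs, out=np.zeros_like(rs), where=rs > 0)
        cs = Q.sum(axis=0, keepdims=True)
        Q *= np.divide(col_deg[None, :], cs, out=np.zeros_like(cs), where=cs > 0)
    # Clipping keeps entries valid probabilities; it can distort the margins.
    return np.clip(Q, 0.0, 1.0)

def upper_tail(probs, observed):
    """P(W >= observed) where W is a sum of independent Bernoulli(probs)."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])  # add one Bernoulli trial
    return float(pmf[int(observed):].sum())

def sdsm_ec_backbone(B, allowed, alpha=0.05):
    """Keep projection edges whose weight is significantly large under Q."""
    B = np.asarray(B, dtype=int)
    Q = estimate_q(B, allowed)
    W = B @ B.T  # observed projection weights (shared artifacts per row pair)
    n = B.shape[0]
    backbone = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            # Under the null, rows i and j share artifact k with prob Q[i,k]*Q[j,k].
            if upper_tail(Q[i] * Q[j], W[i, j]) < alpha:
                backbone[i, j] = backbone[j, i] = 1
    return backbone, Q

# Toy usage: rows 0-1 may only use columns 0-1, rows 2-3 only columns 2-3.
B = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
allowed = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
backbone, Q = sdsm_ec_backbone(B, allowed)
print(backbone)
print(Q)
```

In this toy input, every edge that is not forbidden is present, so the constrained null model leaves no room for any co-occurrence to be surprising. An unconstrained model applied to the same data would spread probability over cells that can never be filled and would therefore judge the same weights against a different baseline.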