Network-adjusted covariates for community detection

Y Hu, W Wang
DOI: https://doi.org/10.1093/biomet/asae011
IF: 3.0279
2024-02-23
Biometrika
Abstract: Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e., covariates. Existing methods have shown the effectiveness of using covariates on the low-degree nodes, but rarely discuss the case where communities have significantly different density levels, i.e., multiscale networks. In this paper, we introduce a novel method that addresses this challenge by constructing network-adjusted covariates, which leverage the network connections and covariates with a node-specific weight for each node. This weight can be calculated without tuning parameters. We present novel theoretical results on the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of misspecification and multiple sparse communities. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, and it shows our method is optimal for connection intensity up to a constant factor. Our method outperforms existing approaches in simulations and a LastFM app user network. We then compare our method with others on a statistics publication citation network where 30% of nodes are isolated, and our method produces reasonable and balanced results.
statistics & probability,mathematical & computational biology,biology
What problem does this paper attempt to address?
This paper addresses community detection in multiscale networks, that is, networks whose communities have significantly different density levels. Existing methods have shown that covariates help on low-degree nodes, but they rarely discuss the multiscale case. The paper handles it by constructing network-adjusted covariates, which combine each node's network connections with its covariates through a node-specific weight. The main contributions are:

1. **A new method**: Network-adjusted covariates enable community detection in multiscale networks. The weight for each node is calculated without tuning parameters.
2. **Theoretical results**: The method is shown to be strongly consistent under degree-corrected stochastic blockmodels with covariates, even in the presence of misspecification and multiple sparse communities.
3. **Optimality analysis**: A general lower bound is established for the community detection problem when both network and covariates are present, and it shows that the method is optimal for connection intensity up to a constant factor.
4. **Experimental verification**: The method outperforms existing approaches in simulations and on a LastFM app user network. On a statistics publication citation network where 30% of nodes are isolated, it produces reasonable and balanced results.

Overall, the paper aims to improve the accuracy and robustness of community detection in multiscale networks, especially when some communities are very sparse.
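
To make the idea of a network-adjusted covariate concrete, here is a minimal Python sketch. It assumes each adjusted row is the sum of a node's neighbours' covariates plus the node's own covariates scaled by a node-specific weight. The weight formula in `node_weights` is a placeholder assumption chosen only because it needs no tuning and shrinks as degree grows; this summary does not state the paper's actual formula, and the function names are hypothetical.

```python
import math


def node_weights(adjacency):
    """Hypothetical tuning-free weights: larger for low-degree nodes.

    Assumed form (not taken from the summary above):
        alpha_i = (mean_degree / 2) / (degree_i / log(n) + 1)
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    mean_degree = sum(degrees) / n
    return [(mean_degree / 2) / (d / math.log(n) + 1) for d in degrees]


def network_adjusted_covariates(adjacency, covariates):
    """Return Y with rows y_i = sum_j A_ij * x_j + alpha_i * x_i."""
    alphas = node_weights(adjacency)
    n, p = len(covariates), len(covariates[0])
    adjusted = []
    for i in range(n):
        row = []
        for k in range(p):
            # Covariate mass contributed by the neighbours of node i.
            neighbour_sum = sum(adjacency[i][j] * covariates[j][k] for j in range(n))
            # Node i's own covariate, scaled by its weight.
            row.append(neighbour_sum + alphas[i] * covariates[i][k])
        adjusted.append(row)
    return adjusted


if __name__ == "__main__":
    # Path 0-1-2 plus an isolated node 3; two binary covariates per node.
    A = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
    X = [[1, 0], [1, 0], [0, 1], [0, 1]]
    for node, row in enumerate(network_adjusted_covariates(A, X)):
        print(node, row)
```

Under this assumed weighting, an isolated node keeps a nonzero row built purely from its own covariates, while a well-connected node is dominated by its neighbours' covariates. A clustering step (for example, spectral clustering on the rows of the adjusted matrix) would follow, but it is omitted here.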