Abstract: Identifying significant community structures in networks with incomplete data is a challenging task, as the reliability of solutions diminishes with increasing levels of missing information. However, in many empirical contexts, some information about the uncertainty in the network measurements can be estimated. In this work, we extend the recently developed Flow Stability framework, originally designed for detecting communities in time-varying networks, to address the problem of community detection in weighted, directed networks with missing links. Our approach leverages known uncertainty levels in nodes' out-degrees to enhance the robustness of community detection. Through comparisons on synthetic networks and a real-world network of messaging channels on the Telegram platform, we demonstrate that our method delivers more reliable community structures, even when a significant portion of data is missing.
### What problem does this paper attempt to solve?
This paper addresses **community detection in weighted, directed networks with missing edges**. Specifically, the authors study how to identify community structure accurately when the network data are incomplete. As the amount of missing information grows, the results of conventional community detection methods become increasingly unreliable. In many practical settings, however, the uncertainty in the network measurements can be estimated.
#### Main challenges
1. **Missing data**: In many real-world networks, such as social and communication networks, some edges cannot be observed, for example because users delete messages or because data collection is incomplete.
2. **Measurement error**: Even the data that can be collected may contain measurement errors, which leads to inaccurate estimates of node and edge statistics.
#### Solutions
To address these challenges, the authors propose **∆Flow Stability (∆FS)**, an extension of the original Flow Stability method. It makes community detection more robust by incorporating the known measurement errors in the nodes' out-degrees. The main modifications are:
- **Incorporating measurement error**: The measurement error (experimental uncertainty) of each node's out-degree is used to adjust the transition probabilities of the random walk, so that they better reflect the true structure of the network.
- **Biased teleportation**: At node \( i \), the walker teleports with probability \( \alpha_i \), which grows with the node's out-degree error \( \epsilon_i \); otherwise it follows the node's observed out-edges as in a standard random walk. The teleportation destination is chosen in proportion to the in-strength of the target node.
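To make the teleportation rule concrete, here is a minimal Monte Carlo sketch of a single walker step. It assumes the rule stated in the paper's transition matrix: teleport with probability \( \alpha_i = \epsilon_i / (\epsilon_i + s_{out}^i) \) and land on node \( j \) with probability proportional to its in-strength \( s_{in}^j \). The function name `walker_step`, the dense list-of-lists input, and the handling of nodes with \( \epsilon_i + s_{out}^i = 0 \) (always teleport) are illustrative assumptions, not the authors' code.

```python
import random

def walker_step(i, A, eps, rng=random):
    """One step of a random walker at node i with biased teleportation.

    A   : list of lists, A[i][j] = weight of the observed edge i -> j
    eps : per-node out-degree uncertainty epsilon_i (hypothetical input)
    rng : the random module or a random.Random instance
    """
    n = len(A)
    s_out = sum(A[i])                                      # out-strength of node i
    s_in = [sum(A[k][j] for k in range(n)) for j in range(n)]  # in-strengths (recomputed each call)
    # Teleportation probability; assumed to be 1 when eps_i + s_out^i = 0
    alpha = eps[i] / (eps[i] + s_out) if eps[i] + s_out > 0 else 1.0
    if rng.random() < alpha:
        # Teleport: destination chosen in proportion to in-strength
        return rng.choices(range(n), weights=s_in, k=1)[0]
    # Regular move: follow an observed out-edge in proportion to its weight
    return rng.choices(range(n), weights=A[i], k=1)[0]
```

With \( \epsilon_i = 0 \) the walker behaves like an ordinary weighted random walk, while a large \( \epsilon_i \) makes node \( i \) send its walker almost entirely through teleportation.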
#### Experimental verification
The authors evaluate ∆FS in two ways:
1. **Synthetic networks**: Networks with known community structure are generated with a Stochastic Block Model (SBM). Increasing proportions of edges are then removed, and ∆FS and the conventional Flow Stability are compared on how well they recover the original community structure.
2. **Real-world application**: The method is tested on a dataset from the Telegram messaging platform containing the message records of public channels and groups associated with the UK far right. Because users can delete messages, these data contain a large number of missing edges.
These experiments show that ∆FS detects community structure more reliably when data are missing, particularly when a large proportion of edges is missing.
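The synthetic protocol in item 1 can be sketched as follows. This is a minimal illustration assuming a directed SBM with hypothetical block sizes, edge probabilities, and Poisson-distributed weights, and assuming edges are deleted uniformly at random. The helper names `directed_sbm` and `remove_edges` are made up, and the paper's exact generator and removal scheme may differ.

```python
import numpy as np

def directed_sbm(sizes, p_in, p_out, rng):
    """Sample a directed, weighted SBM adjacency matrix (hypothetical parameters).

    An edge i -> j appears with probability p_in inside a block and p_out
    between blocks; each present edge gets weight 1 + Poisson(2) (arbitrary).
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)   # ground-truth communities
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    mask = rng.random(probs.shape) < probs
    np.fill_diagonal(mask, False)                      # no self-loops
    weights = 1 + rng.poisson(2, size=probs.shape)
    return np.where(mask, weights, 0).astype(float), labels

def remove_edges(A, frac, rng):
    """Delete a fraction `frac` of the existing edges uniformly at random."""
    A = A.copy()
    rows, cols = np.nonzero(A)
    n_remove = int(round(frac * len(rows)))
    drop = rng.choice(len(rows), size=n_remove, replace=False)
    A[rows[drop], cols[drop]] = 0.0
    return A

rng = np.random.default_rng(42)
A_true, labels = directed_sbm([20, 20, 20], p_in=0.3, p_out=0.02, rng=rng)
A_obs = remove_edges(A_true, frac=0.5, rng=rng)        # observe half of the edges
```

Community detection would then be run on `A_obs`, and the result would be compared with `labels`.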
#### Key formulas
1. **Transition matrix \( M_f \)**:
\[
M_f^{ij} = (1 - \alpha_i) \frac{A_f^{\Delta}(i, j)}{s_{out}^i} + \alpha_i \frac{s_{in}^j}{\sum_l s_{in}^l}
\]
where,
\[
s_{out}^i = \sum_j A_f^{\Delta}(i, j), \quad s_{in}^j = \sum_i A_f^{\Delta}(i, j), \quad \alpha_i = \frac{\epsilon_i}{\epsilon_i + s_{out}^i}
\]
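2. **Forward and backward covariance matrices**:
\[
S_{forw}(t) = P_f(0)\, T_f(t)\, P_f(t)^{-1}\, T_f(t)^T\, P_f(0) - p_f(0)^T p_f(0)
\]
\[
S_{back}(t) = P_b(0)\, T_b(t)\, P_b(t)^{-1}\, T_b(t)^T\, P_b(0) - p_b(0)^T p_b(0)
\]
where \( p(t) = p(0)\, T(t) \) is the row-vector distribution of the walker at time \( t \) and \( P(t) = \operatorname{diag}(p(t)) \), for the forward (\( f \)) and backward (\( b \)) processes respectively.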
3. **Evaluation metrics**: Normalized Mutual Information (NMI) and Normalized Variation of Information (NVI) are used to evaluate the quality of the detected community structures.
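The transition-matrix and covariance formulas can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The helper names are hypothetical.
- Nodes with \( \epsilon_i + s_{out}^i = 0 \) are assumed to teleport with probability 1.
- The covariance is assumed to take the Flow Stability form \( S = P(0)\,T(t)\,P(t)^{-1}\,T(t)^T P(0) - p(0)^T p(0) \), with \( p(t) = p(0)\,T(t) \) and \( P(t) = \operatorname{diag}(p(t)) \).
- Purely for illustration, \( T_f(t) \) is taken as the matrix power \( M_f^t \) on a made-up 4-node graph.

```python
import numpy as np

def delta_fs_transition(A, eps):
    """Transition matrix M_f with error-driven (biased) teleportation.

    A   : (n, n) observed weighted adjacency, A[i, j] = weight of edge i -> j
    eps : (n,) out-degree uncertainty epsilon_i of each node
    """
    A = np.asarray(A, dtype=float)
    eps = np.asarray(eps, dtype=float)
    s_out = A.sum(axis=1)                    # out-strength s_out^i
    s_in = A.sum(axis=0)                     # in-strength s_in^j
    teleport = s_in / s_in.sum()             # destination distribution, proportional to s_in
    with np.errstate(divide="ignore", invalid="ignore"):
        # alpha_i = eps_i / (eps_i + s_out^i); assumed 1 where that is 0/0
        alpha = np.where(eps + s_out > 0, eps / (eps + s_out), 1.0)
        # Row-normalized observed edges; nodes without out-edges contribute nothing
        follow = np.where(s_out[:, None] > 0, A / s_out[:, None], 0.0)
    return (1 - alpha)[:, None] * follow + alpha[:, None] * teleport[None, :]

def covariance(p0, T):
    """S = P(0) T P(t)^{-1} T^T P(0) - p(0)^T p(0), with p(t) = p(0) T (assumed form)."""
    p0 = np.asarray(p0, dtype=float)
    T = np.asarray(T, dtype=float)
    pt = p0 @ T                                  # walker distribution at time t
    inv_pt = np.where(pt > 0, 1.0 / np.where(pt > 0, pt, 1.0), 0.0)  # pseudo-inverse of P(t)
    S = np.diag(p0) @ T @ np.diag(inv_pt) @ T.T @ np.diag(p0)
    return S - np.outer(p0, p0)

# Illustrative usage on a hypothetical 4-node graph, with T_f(t) = M_f^2
A = np.array([[0, 3, 1, 0],
              [2, 0, 0, 0],
              [0, 0, 0, 4],
              [1, 0, 2, 0]], dtype=float)
eps = np.array([0.5, 2.0, 0.0, 1.0])
M = delta_fs_transition(A, eps)
T = np.linalg.matrix_power(M, 2)
p0 = np.full(4, 0.25)
S_forw = covariance(p0, T)
```

Each row of \( M_f \) sums to 1, because it mixes two probability distributions with weights \( 1 - \alpha_i \) and \( \alpha_i \). The backward matrix would be obtained in the same way from \( T_b \) and \( p_b(0) \). In Flow Stability, a partition is then typically scored by summing the entries of \( S \) over pairs of nodes that share a community.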
Together, these formulas and methods show how measurement-error information can be used to improve the accuracy of community detection in networks with missing edges.
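The two evaluation scores named among the key formulas can be computed from a pair of label lists. The following is a minimal pure-Python sketch. It assumes the common normalizations \( \mathrm{NMI} = 2I/(H_a + H_b) \) and \( \mathrm{NVI} = \mathrm{VI}/H(a, b) \), where \( \mathrm{VI} = H_a + H_b - 2I \). The paper may use different normalizations, and `nmi_nvi` is a hypothetical helper name.

```python
import math
from collections import Counter

def _entropy(counts, n):
    """Shannon entropy (natural log) of a distribution given as counts."""
    return -sum(c / n * math.log(c / n) for c in counts.values() if c > 0)

def nmi_nvi(labels_a, labels_b):
    """Return (NMI, NVI) for two partitions given as equal-length label lists."""
    n = len(labels_a)
    h_a = _entropy(Counter(labels_a), n)
    h_b = _entropy(Counter(labels_b), n)
    h_ab = _entropy(Counter(zip(labels_a, labels_b)), n)   # joint entropy H(a, b)
    mi = h_a + h_b - h_ab                                   # mutual information I(a; b)
    nmi = 2 * mi / (h_a + h_b) if h_a + h_b > 0 else 1.0
    nvi = (h_a + h_b - 2 * mi) / h_ab if h_ab > 0 else 0.0
    return nmi, nvi
```

Identical partitions, even with permuted labels, give NMI = 1 and NVI = 0. Statistically independent partitions give NMI = 0 and NVI = 1.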