Abstract: Predicting future interactions or novel links in networks is an indispensable tool across diverse domains, including genetic research, online social networks, and recommendation systems. Among the numerous techniques developed for link prediction, those leveraging the networks' community structure have proven highly effective. For example, the recently proposed MapSim predicts links based on a similarity measure derived from the code structure of the map equation, a community-detection objective function that operates on network flows. However, the standard map equation assumes complete observations and typically identifies many small modules in networks where the nodes connect through only a few links. This aspect can degrade MapSim's performance on sparse networks. To overcome this limitation, we incorporate a global regularisation method based on a Bayesian estimate of the transition rates along with three local regularisation methods. The regularised versions of the map equation compensate for incomplete observations and decrease the number of identified communities in sparse networks. The regularised methods outperform standard MapSim and several state-of-the-art embedding methods in highly sparse networks. This performance holds across multiple real-world networks with randomly removed links, simulating incomplete observations. Among the proposed regularisation methods, the global regularisation method provides the most reliable community detection and the highest link prediction performance across different network densities.
### What problem does this paper attempt to solve?
This paper addresses link prediction in sparse networks. Specifically, the authors improve MapSim, a link prediction method built on the map equation (a flow-based community-detection objective), so that it performs well when networks are sparse or incompletely observed. The problem breaks down as follows:
1. **Link prediction difficulties in sparse networks**:
- In many practical applications, such as gene interaction prediction, friend recommendation in online social networks, and retail recommendation systems, sparse networks (i.e., networks with few links between nodes) are common. Because these networks carry little structural information, link prediction becomes more difficult.
2. **Limitations of existing methods**:
- Existing community-structure-based link prediction methods (e.g., MapSim) rely on the standard map equation, which assumes complete observations, i.e., that unobserved links do not exist. Under this assumption, the map equation tends to identify many small modules in sparse networks, which reduces prediction performance.
3. **Introduction of regularization methods**:
- To overcome these problems, the authors introduce one global and three local regularization methods. These methods adjust the transition rates of random walks to compensate for incomplete observations and reduce the number of identified communities, thereby improving link prediction performance in sparse networks.
4. **Specific objectives**:
- Improve the accuracy of link prediction in sparse networks, especially in highly sparse networks.
- Improve community detection through regularization methods to make the prediction results more reliable and interpretable.
### Main contributions of the paper
- **Global regularization**: A Bayesian estimate of the transition rates allows the community structure to be identified more accurately when observations are incomplete.
- **Local regularization**: Three local regularization techniques (Common Neighbours, Mixed Markov Time, and Variable Markov Time) are proposed to restore local regularities and further improve prediction performance.
- **Experimental verification**: Experiments on multiple real-world networks with randomly removed links show that the regularized methods outperform standard MapSim and several embedding methods in highly sparse networks, with the global method performing most reliably across network densities.
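The abstract describes simulating incomplete observations by randomly removing links from real-world networks. The summary does not give the exact protocol, so the following is only a minimal sketch of one simple way to do this; the function name `remove_random_links`, the `fraction` parameter, and the choice to keep the removed links as held-out test positives are assumptions made for illustration.

```python
import random


def remove_random_links(edges, fraction, seed=0):
    """Split an edge list into observed links and randomly removed links.

    ASSUMPTION: uniform random removal of a fixed fraction of links;
    the paper's actual sampling procedure may differ.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)          # local RNG so results are reproducible
    shuffled = list(edges)             # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_removed = int(round(fraction * len(shuffled)))
    held_out = shuffled[:n_removed]    # links hidden from the model
    observed = shuffled[n_removed:]    # links the model is allowed to see
    return observed, held_out


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    observed, held_out = remove_random_links(ring, fraction=0.5, seed=42)
    print("observed:", observed)
    print("held out:", held_out)
```

A link predictor would then be fitted on `observed` only and scored on how well it ranks the `held_out` links against node pairs that were never linked.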
### Formula summary
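- **Transition rate estimation in global regularization**:
\[
\hat{t}_{uv}(W_u)=\frac{w_{uv}+\gamma_{uv}}{\sum_{v' = 1}^n \left(w_{uv'}+\gamma_{uv'}\right)}
\]
where \(w_{uv}\) is the observed link weight, \(n\) is the number of nodes, \(\gamma_{uv}=\lambda_{uv}c_{uv}\) is the prior weight, \(c_{uv}\) is the expected link weight in the continuous configuration model, and \(\lambda_{uv}=\frac{\ln n}{n}\). With \(\alpha_u=\frac{\sum_{v' = 1}^n\gamma_{uv'}}{\sum_{v' = 1}^n \left(w_{uv'}+\gamma_{uv'}\right)}\), the estimate can equivalently be written as a mixture of the observed and prior transition rates: \(\hat{t}_{uv}=(1-\alpha_u)\frac{w_{uv}}{\sum_{v'} w_{uv'}}+\alpha_u\frac{\gamma_{uv}}{\sum_{v'}\gamma_{uv'}}\).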
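To make the transition rate estimate concrete, here is a minimal pure-Python sketch that computes the regularized rates for a small dense weight matrix. The summary does not spell out \(c_{uv}\), so the code assumes the common configuration-model form \(c_{uv}=s_u^{\text{out}}s_v^{\text{in}}/\sum_{u'} s_{u'}^{\text{out}}\); that form, the function name `regularized_transition_rates`, and the requirement that every node has at least one outgoing link are assumptions for illustration, not the paper's stated implementation.

```python
import math


def regularized_transition_rates(W):
    """Return (T, alpha) for a dense weight matrix W given as a list of rows.

    T[u][v] = (w_uv + gamma_uv) / sum_v'(w_uv' + gamma_uv')
    alpha[u] = sum_v'(gamma_uv') / sum_v'(w_uv' + gamma_uv')
    """
    n = len(W)
    out_strength = [sum(row) for row in W]
    in_strength = [sum(W[u][v] for u in range(n)) for v in range(n)]
    total = sum(out_strength)
    if total == 0:
        raise ValueError("the network has no links")
    lam = math.log(n) / n  # prior strength lambda = ln(n) / n from the text

    T, alpha = [], []
    for u in range(n):
        # ASSUMPTION: configuration-model expectation c_uv = s_u * s_v / total.
        gamma = [lam * out_strength[u] * in_strength[v] / total for v in range(n)]
        denom = out_strength[u] + sum(gamma)
        if denom == 0:
            raise ValueError(f"node {u} has no outgoing links")
        T.append([(W[u][v] + gamma[v]) / denom for v in range(n)])
        alpha.append(sum(gamma) / denom)
    return T, alpha


if __name__ == "__main__":
    # Undirected path 0 - 1 - 2: nodes 0 and 2 are not directly linked.
    path = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]
    T, alpha = regularized_transition_rates(path)
    for row in T:
        print([round(x, 3) for x in row])
    print("alpha:", [round(a, 3) for a in alpha])
```

In this example the unobserved pair (0, 2) receives a small positive transition rate while each row still sums to one, which is the sense in which the prior compensates for links that may be missing. Note that this simple prior also assigns weight to self-transitions; whether those should be excluded is left open here.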
- **MapSim similarity calculation**:
\[
\text{MapSim}(M, u, v)=-\log_2(\text{rev}(m, u)\cdot\text{fwd}(m, v))
\]
where \(M\) is the network partition, \(m\) is a module in \(M\), \(\text{rev}(m, u)\) is the transition rate from node \(u\) to the root node of module \(m\), and \(\text{fwd}(m, v)\) is the transition rate from the root node of module \(m\) to node \(v\).
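The MapSim formula above reduces to a one-line computation once the two rates are known. The sketch below assumes the rates `rev_rate` and `fwd_rate` have already been obtained from a partition; the function name `mapsim_bits` and the input validation are illustrative choices, and computing the rates themselves from the map equation's code structure is not shown.

```python
import math


def mapsim_bits(rev_rate: float, fwd_rate: float) -> float:
    """Return -log2(rev * fwd) for two transition rates in (0, 1]."""
    for name, rate in (("rev_rate", rev_rate), ("fwd_rate", fwd_rate)):
        if not 0.0 < rate <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1]")
    return -math.log2(rev_rate * fwd_rate)


if __name__ == "__main__":
    # Two candidate links with hypothetical rates.
    print(mapsim_bits(0.5, 0.25))
    print(mapsim_bits(0.25, 0.125))
```

Under this formula, a larger product of rates yields a smaller value in bits, so node pairs connected by high-rate paths through the module structure score lower than pairs connected by low-rate paths.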
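With these methods, the paper improves link prediction in sparse networks: the regularized versions of MapSim outperform the standard version and several embedding methods in highly sparse networks, and the global regularization method gives the most reliable community detection across network densities.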