Abstract: In this work, we focus on the Bipartite Stochastic Block Model (BiSBM), a popular model for bipartite graphs with a community structure. We consider the high-dimensional setting where the number $n_1$ of type I nodes is far smaller than the number $n_2$ of type II nodes. The recent work of Braun and Tyagi (2022) established a sufficient and necessary condition on the sparsity level $p_{max}$ of the bipartite graph to be able to recover the latent partition of type I nodes. They proposed an iterative method that extends the one proposed by Ndaoud et al. (2022) to achieve this goal. Their method requires a good enough initialization, usually obtained by a spectral method, but empirical results showed that the refinement algorithm does not improve the performance of the spectral method by much. This suggests that the spectral method achieves exact recovery in the same regime as the refinement method. We show that this is indeed the case by providing new entrywise bounds on the eigenvectors of the similarity matrix used by the spectral method. Our analysis extends the framework of Lei (2019), which only applies to symmetric matrices with limited dependencies. As an important technical step, we also derive an improved concentration inequality for similarity matrices.
What problem does this paper attempt to address?
The paper asks whether a spectral method alone can exactly recover the latent partition of type-I nodes in a high-dimensional, sparse bipartite graph. Specifically, it studies the Bipartite Stochastic Block Model (BiSBM) when the number of type-I nodes \(n_1\) is much smaller than the number of type-II nodes \(n_2\). Its main contribution is to prove that the spectral method achieves exact recovery under the condition \(n_1 n_2 p_{\text{max}}^2 \gtrsim \log n_1\), which is the same regime as the refinement method of Braun and Tyagi (2022), who showed that this sparsity level is both sufficient and necessary.
### Background of the Paper and Problem Definition
The **Bipartite Stochastic Block Model (BiSBM)** describes community structure in a bipartite graph. Nodes are split into two types (type-I and type-II), edges only connect a type-I node to a type-II node, and the probability of such an edge depends only on the communities of its two endpoints. Model parameters include:
- The set of type-I nodes \(N_1 = [n_1]\) and the set of type-II nodes \(N_2 = [n_2]\).
- The community partition \(C_1,\ldots,C_K\) of type-I nodes and the community partition \(C'_1,\ldots,C'_L\) of type-II nodes.
- The connection probability matrix between communities \(\Pi = (\pi_{kk'})_{k\in [K], k'\in [L]} \in [0,1]^{K\times L}\).
**Problem**: In the high-dimensional sparse setting (i.e., \(n_1 \ll n_2\) and a small maximum connection probability \(p_{\text{max}}\)), can the spectral method exactly recover the community structure of type-I nodes from the bipartite graph?
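To make the model concrete, the following is a minimal sketch of sampling a biadjacency matrix from a BiSBM with NumPy. The function name `sample_bisbm`, the label vectors `z1` and `z2`, and the numerical values of \(n_1\), \(n_2\), \(p_{\text{max}}\), and \(\Pi\) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sample_bisbm(z1, z2, Pi, rng):
    """Sample an n1 x n2 biadjacency matrix A with
    A[i, j] ~ Bernoulli(Pi[z1[i], z2[j]]), independently over (i, j)."""
    P = Pi[np.ix_(z1, z2)]  # n1 x n2 matrix of edge probabilities
    return (rng.random(P.shape) < P).astype(np.int8)

rng = np.random.default_rng(0)
n1, n2, K, L = 50, 2000, 2, 2       # n1 << n2 (illustrative sizes)
z1 = rng.integers(K, size=n1)       # community labels of type-I nodes
z2 = rng.integers(L, size=n2)       # community labels of type-II nodes
p_max = 0.05                        # largest connection probability (assumed value)
Pi = p_max * np.array([[1.0, 0.3],
                       [0.3, 1.0]])  # K x L connection probability matrix
A = sample_bisbm(z1, z2, Pi, rng)
print(A.shape, A.mean())
```

Rows of `A` index type-I nodes and columns index type-II nodes, so the latent partition to recover is the one encoded by `z1`.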
### Main Contributions
1. **Exact Recovery Conditions**: The paper proves that the spectral method exactly recovers the partition of type-I nodes when \(n_1 n_2 p_{\text{max}}^2 \gtrsim \log n_1\). This is the regime in which the refinement method of Braun and Tyagi (2022) succeeds and which they showed to be necessary, so in this sense the spectral method is optimal.
2. **Extension to Similarity Matrices**: The paper extends the entrywise eigenvector bounds of Lei (2019), which only cover symmetric matrices with limited dependencies, so that they apply to similarity matrices. This extension allows for the partial removal of the "spectral gap condition", a common assumption in the analysis of spectral methods.
3. **Improved Concentration Inequality**: The paper derives an improved concentration inequality for similarity matrices, which is crucial for proving the performance of the spectral method.
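As a rough numerical illustration of the recovery condition, the sketch below evaluates the ratio \(n_1 n_2 p_{\text{max}}^2 / \log n_1\) for one parameter choice. The helper `recovery_ratio` and the parameter values are hypothetical, and the constant hidden in \(\gtrsim\) is not specified here, so the ratio only indicates how far inside the regime a setting might be.

```python
import math

def recovery_ratio(n1, n2, p_max):
    """Return n1 * n2 * p_max**2 / log(n1).

    The condition n1 * n2 * p_max**2 >~ log(n1) asks this ratio to exceed
    some constant; the constant itself is not given in this summary.
    """
    return n1 * n2 * p_max**2 / math.log(n1)

n1, n2, p_max = 100, 1_000_000, 0.001   # illustrative values with n1 << n2
ratio = recovery_ratio(n1, n2, p_max)
max_degree_type2 = n1 * p_max           # upper bound on a type-II node's expected degree
print(ratio, max_degree_type2)
```

With these values a typical type-II node has far fewer than one expected neighbour, yet the ratio is well above 1, which is the kind of high-dimensional sparse setting the condition is meant to cover.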
### Technical Details
- **Algorithm Description of the Spectral Method**: The spectral method forms the Gram matrix \(B = H(AA^{\top})\), where \(H(X)=X - \text{diag}(X)\) sets the diagonal to zero, computes its top \(r\) eigenvectors, and obtains the community partition from them.
- **Theoretical Analysis**: By introducing new entrywise concentration bounds, the paper proves that the spectral method achieves exact recovery in the high-dimensional sparse setting. Specifically, it establishes \(\ell_{2\to\infty}\) concentration bounds for the eigenvectors of \(B\) and uses them to derive the exact recovery guarantee.
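The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: selecting eigenvectors by largest absolute eigenvalue, clustering the embedded rows with a plain Lloyd-style k-means (helper `kmeans`, farthest-point initialization), and the toy parameters are all choices made here.

```python
import numpy as np

def hollow_gram(A):
    """B = H(A A^T): Gram matrix of the rows of A with its diagonal set to zero."""
    A = A.astype(float)
    B = A @ A.T
    np.fill_diagonal(B, 0.0)
    return B

def top_eigenvectors(B, r):
    """Eigenvectors of symmetric B for the r eigenvalues largest in absolute value."""
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(np.abs(vals))[::-1][:r]
    return vecs[:, idx]

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization
    (an assumed clustering step; the source does not specify one)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy BiSBM instance with K = L = 2 (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n1, n2 = 40, 4000
z1 = np.arange(n1) % 2                      # true type-I communities
z2 = np.arange(n2) % 2                      # true type-II communities
Pi = 0.1 * np.array([[1.0, 0.1], [0.1, 1.0]])
A = (rng.random((n1, n2)) < Pi[np.ix_(z1, z2)]).astype(float)

U = top_eigenvectors(hollow_gram(A), r=2)   # n1 x r spectral embedding
labels = kmeans(U, k=2)
# Accuracy up to swapping the two labels.
acc = max(np.mean(labels == z1), np.mean(labels != z1))
print(acc)
```

Zeroing the diagonal matters because the diagonal entries of \(AA^{\top}\) are the node degrees, which carry no information about pairwise similarity and can dominate the off-diagonal signal when the graph is sparse.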
### Conclusion
The paper addresses the lack of consistency guarantees for spectral clustering in high-dimensional sparse bipartite graphs. It proves that the spectral method achieves exact recovery of the type-I partition under the condition \(n_1 n_2 p_{\text{max}}^2 \gtrsim \log n_1\), the same regime in which the refinement method of Braun and Tyagi (2022) succeeds. The result explains the empirical observation that refinement adds little once the spectral method has been run.