Abstract: In this work, we focus on the Bipartite Stochastic Block Model (BiSBM), a popular model for bipartite graphs with a community structure. We consider the high-dimensional setting where the number $n_1$ of type I nodes is far smaller than the number $n_2$ of type II nodes. The recent work of Braun and Tyagi (2022) established a sufficient and necessary condition on the sparsity level $p_{max}$ of the bipartite graph to be able to recover the latent partition of type I nodes. They proposed an iterative method that extends the one proposed by Ndaoud et al. (2022) to achieve this goal. Their method requires a good enough initialization, usually obtained by a spectral method, but empirical results showed that the refinement algorithm does not improve the performance of the spectral method by much. This suggests that the spectral method achieves exact recovery in the same regime as the refinement method. We show that this is indeed the case by providing new entrywise bounds on the eigenvectors of the similarity matrix used by the spectral method. Our analysis extends the framework of Lei (2019), which only applies to symmetric matrices with limited dependencies. As an important technical step, we also derive an improved concentration inequality for similarity matrices.

What problem does this paper attempt to address?

The paper asks whether the spectral method alone can exactly recover the latent partition of type-I nodes in a high-dimensional, sparse bipartite graph. Specifically, it studies the Bipartite Stochastic Block Model (BiSBM) when the number of type-I nodes $n_1$ is much smaller than the number of type-II nodes $n_2$. The main contribution is a proof that, under the condition $n_1 n_2 p_{\text{max}}^2 \gtrsim \log n_1$, the spectral method achieves exact recovery, and that this condition is optimal.

### Background of the Paper and Problem Definition

The **Bipartite Stochastic Block Model (BiSBM)** describes community structure in a bipartite graph. Nodes are divided into two types (type-I and type-II), and an edge can only join a type-I node to a type-II node, with a probability that depends on the communities of its two endpoints. The model parameters are:

- The set of type-I nodes $N_1 = [n_1]$ and the set of type-II nodes $N_2 = [n_2]$.
- The community partition $C_1,\ldots,C_K$ of the type-I nodes and the community partition $C'_1,\ldots,C'_L$ of the type-II nodes.
- The matrix of connection probabilities between communities, $\Pi = (\pi_{kk'})_{k\in [K], k'\in [L]} \in [0,1]^{K\times L}$.

**Problem**: In the high-dimensional sparse setting (i.e., $n_1 \ll n_2$ and a small sparsity level $p_{\text{max}}$), can the spectral method exactly recover the community structure of the type-I nodes from the bipartite graph?

### Main Contributions

1. **Exact Recovery Conditions**: The paper proves that under the condition $n_1 n_2 p_{\text{max}}^2 \gtrsim \log n_1$, the spectral method exactly recovers the partition of the type-I nodes. The condition is optimal: according to the abstract, Braun and Tyagi (2022) showed that the corresponding condition on $p_{\text{max}}$ is necessary as well as sufficient, so the spectral method succeeds in the same regime as their refinement method.
2. **Extension to Similarity Matrices**: The paper extends the entrywise eigenvector bounds of Lei (2019) so that they apply to similarity matrices. This extension allows the "spectral gap condition", a common assumption in the analysis of spectral methods, to be partially removed.
3. **Improved Concentration Inequality**: The paper derives an improved concentration inequality for similarity matrices, which is a key technical step in the analysis of the spectral method.

### Technical Details

- **Algorithm Description of the Spectral Method**: The spectral method builds the Gram matrix with its diagonal set to zero, $B = H(AA^{\top})$, where $H(X)=X - \text{diag}(X)$, and partitions the type-I nodes using the first $r$ eigenvectors of $B$.
- **Theoretical Analysis**: The paper introduces new entrywise concentration bounds, namely $\ell_{2\to\infty}$ bounds on the eigenvectors, and uses them to prove that the spectral method achieves exact recovery in the high-dimensional sparse setting.

### Conclusion

The paper addresses the lack of consistency guarantees for existing spectral clustering methods on high-dimensional sparse bipartite graphs. It proves that under the condition $n_1 n_2 p_{\text{max}}^2 \gtrsim \log n_1$, the spectral method achieves exact recovery, and that this condition is optimal. The result clarifies when a spectral method alone suffices for community recovery in bipartite networks.
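
As a rough illustration of the spectral method summarized under Technical Details, the following Python sketch samples a small BiSBM, forms $B = H(AA^{\top})$, and clusters the type-I nodes from the top eigenvectors of $B$. Several things here are assumptions rather than details taken from the paper: the function names (`sample_bisbm`, `hollowed_gram`, `top_eigenvectors`, `kmeans`, `spectral_cluster_rows`) are hypothetical, "first $r$ eigenvectors" is read as the eigenvectors of the $r$ largest eigenvalues with $r = K$, the final k-means step is a common choice that the summary does not specify, and the toy parameters are far denser than the sparse regime the paper analyzes.

```python
import numpy as np

def sample_bisbm(z1, z2, Pi, rng):
    """Sample an n1 x n2 biadjacency matrix with A[i, j] ~ Bernoulli(Pi[z1[i], z2[j]])."""
    P = Pi[np.ix_(z1, z2)]  # edge probability for every (type-I, type-II) pair
    return (rng.random(P.shape) < P).astype(float)

def hollowed_gram(A):
    """Return B = H(A A^T), where H(X) = X - diag(X) zeroes the diagonal."""
    B = A @ A.T
    np.fill_diagonal(B, 0.0)
    return B

def top_eigenvectors(B, r):
    """Eigenvectors of the r largest eigenvalues of the symmetric matrix B.
    (Assumption: this is what 'first r eigenvectors' means.)"""
    _, vecs = np.linalg.eigh(B)  # eigh returns eigenvalues in ascending order
    return vecs[:, -r:]

def kmeans(X, k, rng, n_iter=100, n_init=5):
    """Plain Lloyd iterations with random restarts; keeps the lowest-cost run.
    (Assumption: the clustering step is not specified in the summary.)"""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = np.array([
                X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        cost = d2[np.arange(len(X)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

def spectral_cluster_rows(A, k, rng):
    """Cluster the type-I nodes (rows of A) from the top-k eigenvectors of H(A A^T)."""
    U = top_eigenvectors(hollowed_gram(A), k)
    return kmeans(U, k, rng)

# Toy example (illustrative parameters only): n1 << n2, two communities per side.
rng = np.random.default_rng(0)
n1, n2 = 40, 3000
z1 = np.repeat([0, 1], n1 // 2)          # true communities of type-I nodes
z2 = np.repeat([0, 1], n2 // 2)          # true communities of type-II nodes
Pi = np.array([[0.4, 0.1], [0.1, 0.4]])  # connection probabilities
A = sample_bisbm(z1, z2, Pi, rng)
labels = spectral_cluster_rows(A, 2, rng)

# Community labels are only identifiable up to a permutation of {0, 1}.
exact = np.array_equal(labels, z1) or np.array_equal(labels, 1 - z1)
print("exact recovery:", exact)
```

Zeroing the diagonal matters because the diagonal entries of $AA^{\top}$ are node degrees, which carry no information about which pairs of type-I nodes share a community. Only the type-I partition is estimated here; the type-II labels `z2` are used solely to generate the data.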

Strong Consistency Guarantees for Clustering High-Dimensional Bipartite Graphs with the Spectral Method

Analysis of spectral clustering algorithms for community detection: the general bipartite setting

Strong Consistency, Graph Laplacians, and the Stochastic Block Model

Strong Consistency of Spectral Clustering for Stochastic Block Models.

Strong Consistency of Spectral Clustering for the Sparse Degree-Corrected Hypergraph Stochastic Block Model

Fundamental Limits of Spectral Clustering in Stochastic Block Models

On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models

On consistency of constrained spectral clustering under representation-aware stochastic block model

Information-Theoretic Limits and Strong Consistency on Binary Non-uniform Hypergraph Stochastic Block Models

Community Detection in the Hypergraph SBM: Exact Recovery Given the Similarity Matrix

Spectral Recovery in the Labeled SBM

The Power of Two Matrices in Spectral Algorithms for Community Recovery

A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs

Consistency of Constrained Spectral Clustering under Graph Induced Fair Planted Partitions

Multiview Spectral Clustering with Bipartite Graph

Robust spectral clustering with rank statistics

A Spectral Method to Find Communities in Bipartite Networks

Spectral embedding and the latent geometry of multipartite networks

Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering

Spectral clustering in the Gaussian mixture block model

Community detection with the Bethe-Hessian