Fundamental Limits of Spectral Clustering in Stochastic Block Models

Anderson Ye Zhang
2024-05-10
Abstract: Spectral clustering has been widely used for community detection in network sciences. While its empirical successes are well-documented, a clear theoretical understanding is still lacking, particularly for sparse networks where degrees are much smaller than $\log n$. In this paper, we address this significant gap by demonstrating that spectral clustering offers exponentially small error rates when applied to sparse networks under Stochastic Block Models. Our analysis provides sharp characterizations of its performance, backed by matching upper and lower bounds possessing an identical exponent with the same leading constant. The key to our results is a novel truncated $\ell_2$ perturbation analysis for eigenvectors, coupled with a new analysis idea of eigenvector truncation.
Statistics Theory, Social and Information Networks, Spectral Theory
What problem does this paper attempt to address?
This paper addresses the lack of a theoretical performance analysis for spectral clustering when it is used for community detection in sparse networks. Although spectral clustering performs well in practice, its performance has not been fully understood theoretically, especially in sparse networks where the degree is much smaller than \(\log n\). The paper fills this gap by proving that spectral clustering can achieve an exponentially small error rate under the Stochastic Block Model (SBM).

### Main Issues

1. **Insufficient Theoretical Performance Analysis**: Existing research mainly focuses on dense networks, and there is a significant gap in the theoretical performance analysis of spectral clustering for sparse networks (i.e., networks with degrees much smaller than \(\log n\)).
2. **Exponential Error Bound**: Although spectral clustering performs well in practical applications, a rigorous proof has been missing, particularly on whether it can achieve an exponentially small error rate.

### Main Contributions of the Paper

1. **Proof of Exponential Error Bound**: The paper proves that in sparse networks, spectral clustering can achieve an exponentially small error rate and provides matching upper and lower bounds with the same asymptotic exponent and leading constant.
2. **Precise Performance Characterization**: Through a new truncated \(\ell_2\) perturbation analysis method and an eigenvector truncation technique, the paper provides a precise characterization of the performance of spectral clustering, including the leading constant.
3. **New Analytical Tools**: The truncated \(\ell_2\) perturbation analysis introduced in the paper may also be valuable for fine spectral perturbation analysis of other binary random matrices.

### Methods and Techniques

1. **Truncated \(\ell_2\) Perturbation Analysis**: By truncating the coordinates of the eigenvectors, the \(\ell_\infty\) norm of the eigenvectors is kept within a selected threshold \(t_0\), which overcomes the \(\log n\) factor issue in classical concentration inequalities.
2. **Eigenvector Truncation Technique**: Through the eigenvector truncation technique, the paper addresses the tail probability problem in high-dimensional data.
3. **Leave-One-Out Technique**: To decompose the tail probability of \((\tilde{A}_i - E A_i)(I - U^* U^{*T}) U\), the paper uses an improved leave-one-out technique to decouple the dependency between \(\tilde{A}_i\) and \(U\).

### Conclusion

Through rigorous mathematical analysis, the paper proves that spectral clustering can achieve an exponentially small error rate in sparse networks and provides matching upper and lower bounds. These results fill the theoretical gap and provide a solid theoretical foundation for applying spectral clustering to sparse networks.
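
### Illustrative Sketch

To make the setting concrete, the following is a minimal simulation sketch, not the procedure analyzed in the paper. It assumes a symmetric two-community SBM with within-community edge probability `p` and between-community probability `q`, and it clusters by the sign of the eigenvector for the second-largest eigenvalue of the raw adjacency matrix. The function names (`sample_sbm`, `spectral_clustering_two_blocks`, `misclustering_rate`), the sign rule, and the parameter values are all illustrative assumptions; the paper's estimator and any regularization it applies may differ.

```python
import numpy as np


def sample_sbm(n, p, q, rng):
    """Sample a symmetric two-community SBM adjacency matrix without self-loops.

    Assumption for this sketch: n is even and the communities have equal size.
    """
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    # Draw each edge once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    return adj, labels


def spectral_clustering_two_blocks(adj):
    """Cluster nodes by the sign of the eigenvector for the second-largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(adj)  # eigenvalues are returned in ascending order
    second = eigvecs[:, -2]
    return (second > 0).astype(int)


def misclustering_rate(pred, truth):
    """Fraction of misclustered nodes, minimized over the two label permutations."""
    err = np.mean(pred != truth)
    return min(err, 1.0 - err)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, labels = sample_sbm(400, 0.3, 0.05, rng)
    pred = spectral_clustering_two_blocks(adj)
    print("misclustering rate:", misclustering_rate(pred, labels))
```

The defaults above sit in a dense, well-separated regime so the sketch behaves predictably. To probe the sparse regime the paper studies, one could set `p` and `q` so that the expected degree stays far below \(\log n\) and track how the empirical misclustering rate changes.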