Exact Recovery of Community Detection in k-partite Graph Models

Zhongyang Li
DOI: https://doi.org/10.48550/arXiv.1910.04320
2020-06-04
Abstract: We study the vertex classification problem on a graph whose vertices are in $k\ (k\geq 2)$ different communities, edges are only allowed between distinct communities, and the number of vertices in different communities are not necessarily equal. The observation is a weighted adjacency matrix, perturbed by a scalar multiple of the Gaussian Orthogonal Ensemble (GOE), or Gaussian Unitary Ensemble (GUE) matrix. For the exact recovery of the maximum likelihood estimation (MLE) with various weighted adjacency matrices, we prove sharp thresholds of the intensity $\sigma$ of the Gaussian perturbation. These weighted adjacency matrices may be considered as natural models for the electric network. Surprisingly, these thresholds of $\sigma$ do not depend on whether the sample space for MLE is restricted to such classifications that the number of vertices in each group is equal to the true value. In contrast to the $\mathbb{Z}_2$-synchronization, a new complex version of the semi-definite programming (SDP) is designed to efficiently implement the community detection problem when the number of communities $k$ is greater than 2, and a common region (independent of $k$) for $\sigma$ such that SDP exactly recovers the true classification is obtained.
Probability, Data Structures and Algorithms
What problem does this paper attempt to address?
The paper addresses exact recovery of community structure in the k-partite graph model. Specifically, it considers a graph whose vertices belong to \(k\) (\(k\geq2\)) different communities, where edges are allowed only between distinct communities and the communities need not contain equal numbers of vertices. The question is under what conditions Maximum Likelihood Estimation (MLE) or Semidefinite Programming (SDP) exactly recovers the community to which each vertex belongs.

### Main Research Contents

1. **Model Description**:
   - The vertices of the graph are divided into \(k\) different communities.
   - Edges exist only between different communities.
   - The observation is a weighted adjacency matrix perturbed by a scalar multiple of a Gaussian Orthogonal Ensemble (GOE) or Gaussian Unitary Ensemble (GUE) matrix.
2. **Main Objectives**:
   - To determine the conditions under which the MLE and SDP methods exactly recover the true community structure from the observation.
   - To prove a critical value for the Gaussian perturbation strength \(\sigma\): when \(\sigma\) is below the critical value, the probability of exact recovery approaches 1 as the size of the graph tends to infinity; when \(\sigma\) is above it, the probability of exact recovery approaches 0.
3. **Key Results**:
   - For MLE, the paper gives an explicit critical value \(\sigma_c\) such that the probability of exact recovery approaches 1 when \(\sigma < \sigma_c\) and approaches 0 when \(\sigma > \sigma_c\).
   - For SDP, the paper designs a complex version of semidefinite programming that efficiently solves the community detection problem for \(k > 2\) communities, and gives a range of \(\sigma\), independent of \(k\), in which SDP exactly recovers the true community structure.

### Formula Summary

1. **Critical Value of MLE**:
   - For real-weighted adjacency matrices:
     \[ \sigma^2 < \frac{(1 - \delta)n\min_{1\leq i < j\leq k}(c_i - c_j)^2}{4\log n} \]
     When \(\sigma\) satisfies this inequality, the probability of exact recovery approaches 1.
2. **Critical Value of SDP**:
   - For complex unit matrices:
     \[ \sigma^2 < \frac{(1 - \delta)n(1 - \cos\frac{2\pi}{k})}{2\log n} \]
     When \(\sigma\) satisfies this inequality, the probability of exact recovery approaches 1.

### Conclusion

Through rigorous mathematical analysis, the paper provides a theoretical basis for exactly recovering the community structure in the k-partite graph model, with a detailed study of the critical value of the Gaussian perturbation strength \(\sigma\). These results clarify the theoretical limits of the community detection problem and can guide algorithm design in practical applications.
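
To make the model and the two inequalities in the formula summary more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's construction: the 0/1 adjacency matrix (the paper works with weighted adjacency matrices), the GOE normalization, the function names, and the sample values for \(c_i\) and \(\delta\) are all hypothetical choices made for this example.

```python
import math

import numpy as np


def k_partite_adjacency(labels):
    """0/1 adjacency: an edge exactly when two vertices are in different communities.

    Assumption: unit weights; the paper's weighted adjacency matrices are not reproduced here.
    """
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def goe(n, rng):
    """Symmetric Gaussian matrix with off-diagonal variance 1 and diagonal variance 2.

    Assumption: this common GOE normalization may differ from the paper's.
    """
    g = rng.standard_normal((n, n))
    return (g + g.T) / math.sqrt(2.0)


def observe(labels, sigma, rng):
    """Adjacency matrix perturbed by sigma times a GOE matrix."""
    return k_partite_adjacency(labels) + sigma * goe(len(labels), rng)


def real_weight_bound(n, c, delta):
    """Right-hand side of the first inequality: a bound on sigma^2."""
    gap = min(
        (c[i] - c[j]) ** 2
        for i in range(len(c))
        for j in range(i + 1, len(c))
    )
    return (1.0 - delta) * n * gap / (4.0 * math.log(n))


def complex_weight_bound(n, k, delta):
    """Right-hand side of the second inequality: a bound on sigma^2."""
    return (1.0 - delta) * n * (1.0 - math.cos(2.0 * math.pi / k)) / (2.0 * math.log(n))


# Three communities of unequal size (4, 3 and 5 vertices).
labels = np.repeat([0, 1, 2], [4, 3, 5])
rng = np.random.default_rng(0)
noisy = observe(labels, sigma=0.5, rng=rng)

n, k = len(labels), 3
print("observation shape:", noisy.shape)
print("real-weight bound on sigma^2:", real_weight_bound(n, [0.0, 1.0, 3.0], delta=0.1))
print("complex-weight bound on sigma^2:", complex_weight_bound(n, k, delta=0.1))
```

For \(k = 2\), \(1 - \cos\frac{2\pi}{k} = 2\), so the second bound reduces to \((1 - \delta)n/\log n\). Both bounds grow like \(n/\log n\), so the noise level that can be tolerated increases with the size of the graph.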