Local Algorithms for Block Models with Side Information

Elchanan Mossel, Jiaming Xu
DOI: https://doi.org/10.1145/2840728.2840749
2016-01-14
Abstract: There has been recent interest in understanding the power of local algorithms for optimization and inference problems on sparse graphs. Gamarnik and Sudan (2014) showed that local algorithms are weaker than global algorithms for finding large independent sets in sparse random regular graphs, thus refuting a conjecture by Hatami, Lovász, and Szegedy (2012). Montanari (2015) showed that local algorithms are suboptimal for finding a community with high connectivity in sparse Erdős-Rényi random graphs. For the symmetric planted partition problem (also known as community detection for the block models) on sparse graphs, a simple observation is that local algorithms cannot have non-trivial performance. In this work we consider the effect of side information on local algorithms for community detection under the binary symmetric stochastic block model. In the block model with side information, each of the n vertices is labeled + or - independently and uniformly at random; each pair of vertices is connected independently with probability a/n if both have the same label and b/n otherwise. The goal is to estimate the underlying vertex labeling given 1) the graph structure and 2) side information in the form of a vertex labeling positively correlated with the true one, in which each given vertex label is incorrect with probability α. Assuming that the ratio between in and out degree a/b is Θ(1) and the average degree (a+b)/2 = n^{o(1)}, we show that a local algorithm, namely belief propagation run on the local neighborhoods, maximizes the expected fraction of vertices labeled correctly in the following three regimes: (i) |a-b| < 2 and all 0 < α < 1/2; (ii) (a-b)^2 > C(a+b) for some constant C and all 0 < α < 1/2; (iii) all a, b, provided the probability that each given vertex label is incorrect is at most α* for some constant α* ∈ (0, 1/2).
Thus, in contrast to the case of independent sets or a single community in random graphs and to the case of symmetric block models without side information, we show that local algorithms achieve optimal performance in the above three regimes for the block model with side information. To complement our results, in the large degree limit a → ∞, we give a formula for the expected fraction of vertices labeled correctly by local belief propagation, in terms of a fixed point of a recursion derived from the density evolution analysis with Gaussian approximations.
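The model and the local algorithm described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the function names (sample_block_model, edge_llr, local_bp, accuracy) are hypothetical, and the message update is the standard log-likelihood-ratio recursion for a two-community tree model, which is an assumption here because the abstract does not spell out the recursion. Running depth rounds of message passing makes each estimate depend only on the vertex's depth-hop neighborhood, which is the sense in which the algorithm is local.

```python
import math
import random


def sample_block_model(n, a, b, alpha, rng):
    """Sample true labels, adjacency lists, and noisy side information."""
    labels = [rng.choice((1, -1)) for _ in range(n)]
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability a/n within a community, b/n across.
            p = a / n if labels[i] == labels[j] else b / n
            if rng.random() < p:
                neighbors[i].append(j)
                neighbors[j].append(i)
    # Each side-information label is flipped with probability alpha.
    side = [s if rng.random() >= alpha else -s for s in labels]
    return labels, neighbors, side


def edge_llr(x, a, b):
    """Map a neighbor's LLR x to its contribution across one edge.

    Computes log((a*e^x + b) / (b*e^x + a)), avoiding overflow for large |x|.
    """
    if x > 0:
        e = math.exp(-x)
        return math.log((a + b * e) / (b + a * e))
    e = math.exp(x)
    return math.log((a * e + b) / (b * e + a))


def local_bp(neighbors, side, a, b, alpha, depth):
    """Estimate labels from depth-hop neighborhoods (assumes depth >= 1)."""
    n = len(neighbors)
    h = math.log((1 - alpha) / alpha)  # LLR carried by one side label
    prior = [h * s for s in side]
    # msg[(i, j)] is the LLR that vertex i sends to its neighbor j.
    msg = {(i, j): prior[i] for i in range(n) for j in neighbors[i]}
    for _ in range(depth - 1):
        incoming = [
            sum(edge_llr(msg[(k, i)], a, b) for k in neighbors[i])
            for i in range(n)
        ]
        # Exclude the recipient's own message from what it receives back.
        msg = {
            (i, j): prior[i] + incoming[i] - edge_llr(msg[(j, i)], a, b)
            for (i, j) in msg
        }
    belief = [
        prior[i] + sum(edge_llr(msg[(k, i)], a, b) for k in neighbors[i])
        for i in range(n)
    ]
    return [1 if x >= 0 else -1 for x in belief]


def accuracy(estimate, labels):
    """Fraction of vertices whose estimated label matches the truth."""
    return sum(e == s for e, s in zip(estimate, labels)) / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    labels, neighbors, side = sample_block_model(300, 12.0, 2.0, 0.3, rng)
    estimate = local_bp(neighbors, side, 12.0, 2.0, 0.3, depth=3)
    print("side information alone:", accuracy(side, labels))
    print("local BP:", accuracy(estimate, labels))
```

The parameter values in the usage lines (n = 300, a = 12, b = 2, α = 0.3, depth 3) are illustrative choices, not values taken from the paper. When a = b the edges carry no information, so the sketch returns the side information unchanged.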