Degree Distribution Identifiability of Stochastic Kronecker Graphs

Daniel Alabi, Dimitris Kalimeris
2023-09-30
Abstract: Large-scale analysis of the distributions of the network graphs observed in naturally-occurring phenomena has revealed that the degrees of such graphs follow a power-law or lognormal distribution. Seshadhri, Pinar, and Kolda (J. ACM, 2013) proved that stochastic Kronecker graph (SKG) models cannot generate graphs with degree distribution that follows a power-law or lognormal distribution. As a result, variants of the SKG model have been proposed to generate graphs which approximately follow degree distributions, without any significant oscillations. However, all existing solutions either require significant additional parameterization or have no provable guarantees on the degree distribution.

In this work, we present statistical and computational identifiability notions which imply the separation of SKG models. Specifically, we prove that SKG models in different identifiability classes can be separated by the existence of isolated vertices and connected components in their corresponding generated graphs. This could explain the large (i.e., $>50\%$) fraction of isolated vertices in some popular graph generation benchmarks.

We present and analyze an efficient algorithm that can get rid of oscillations in the degree distribution by mixing seeds of relative prime dimensions. For an initial $2\times 1$ and $2\times 2$ seed, a crucial subroutine of this algorithm solves a degree-2 and degree-4 optimization problem in the variables of the initial seed, respectively. We generalize this approach to solving optimization problems for $m\times n$ seeds, for any $m, n\in\mathbb{N}$.

The use of $3\times 3$ seeds alone cannot get rid of significant oscillations. We prove that such seeds result in degree distribution that is bounded above by an exponential tail and thus cannot result in a power-law or lognormal.
Data Structures and Algorithms, Social and Information Networks
What problem does this paper attempt to address?
The paper addresses how to generate graphs whose degree distributions follow a power-law or lognormal distribution, specifically with stochastic Kronecker graph (SKG) models. Existing SKG models cannot produce such degree distributions; their degree distributions exhibit significant oscillations instead. Variants of the SKG model have been proposed to reduce these oscillations, but they either require significant additional parameterization or offer no provable guarantees on the degree distribution.

The main contributions of the paper are as follows:

1. **Define statistical and computational identifiability**: The authors introduce notions of statistical and computational identifiability that separate SKG models into classes. Models in different identifiability classes can be separated by the existence of isolated vertices and connected components in the graphs they generate.
2. **Propose the RPSKG algorithm**: The Relative Prime Stochastic Kronecker Graph (RPSKG) algorithm removes the oscillations in the degree distribution without introducing additional parameters, by mixing seeds of relatively prime dimensions such as 2×2 and 3×3 seed matrices.
3. **Theoretical and experimental verification**: The paper supports the RPSKG algorithm with theoretical analysis and experiments, which show that it generates graphs with lognormal degree distributions without significant oscillations.

### Overview of the main content of the paper

#### 1. Introduction

- **Background**: Graph models are widely used in fields such as biological networks, social networks, and communication networks. However, copyright, legal, and privacy issues make it difficult to share actual large-scale graph data.
  Therefore, it has become important to study how to generate graphs that faithfully reflect real-world graph structure.
- **Problem**: Existing SKG models cannot generate degree distributions that follow a power-law or lognormal distribution; their degree distributions exhibit significant oscillations instead. This limits their effectiveness in practical applications.

#### 2. Related work

- **Stochastic Kronecker Graph**: The SKG model proposed by Leskovec et al. captures many characteristics of real-world social networks, but its degree distribution oscillates.
- **Noisy Stochastic Kronecker Graph**: The Noisy Stochastic Kronecker Graph (NSKG) model attempts to reduce the oscillations in the degree distribution by introducing noise, but requires additional parameterization.
- **Multiplicative Attribute Graph**: The Multiplicative Attribute Graph (MAG) model generates graphs by combining node attributes, but its degree distribution also oscillates.

#### 3. Existence of isolated vertices and connected components

- **Isolated vertices**: The authors prove that SKG models in different identifiability classes can be distinguished by the existence of isolated vertices in the generated graphs.
- **Connected components**: Similarly, the existence of connected components can be used to distinguish SKG models in different identifiability classes.

#### 4. Relative prime Stochastic Kronecker Graph

- **Optimization problem**: The RPSKG algorithm generates its 3×3 seed matrices through a subroutine that solves a degree-2 or degree-4 optimization problem in the variables of the initial seed (for an initial 2×1 or 2×2 seed, respectively).
- **Algorithm implementation**: The RPSKG algorithm mixes 2×2 and 3×3 seed matrices to generate graphs without significant oscillations.

#### 5. Experimental verification

- **Experimental results**: Experiments confirm that the RPSKG algorithm generates graphs with lognormal degree distributions without significant oscillations.
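The idea of mixing 2×2 and 3×3 seeds from Section 4 can be illustrated with a small sketch. It is not the authors' RPSKG algorithm: the seed values below are hypothetical placeholders (they are not produced by the paper's optimization subroutine), and the helper names `expected_out_degrees` and `distinct_levels` are introduced here only for illustration. The sketch assumes the edge-probability matrix is a Kronecker product of the chosen seeds and counts how many distinct expected-degree levels the vertices fall into.

```python
from itertools import product
from math import prod

# Hypothetical seed matrices, chosen only for illustration; they are NOT
# derived with the paper's optimization subroutine.
SEED_2X2 = [[0.9, 0.5],
            [0.5, 0.2]]
SEED_3X3 = [[0.8, 0.5, 0.3],
            [0.5, 0.4, 0.2],
            [0.3, 0.2, 0.1]]


def row_sums(seed):
    """Sum of each row of a seed matrix."""
    return [sum(row) for row in seed]


def expected_out_degrees(seeds):
    """Expected out-degree of every vertex when the edge-probability matrix
    is the Kronecker product of `seeds` (self-loops are counted).

    Uses the identity that a row sum of a Kronecker product equals the
    product of the corresponding row sums of its factors, so the full
    matrix never has to be built.
    """
    return [prod(combo) for combo in product(*(row_sums(s) for s in seeds))]


def distinct_levels(degrees, ndigits=9):
    """Number of distinct expected-degree values, up to rounding noise."""
    return len({round(d, ndigits) for d in degrees})


# Six copies of the 2x2 seed: 2**6 = 64 vertices.
pure = expected_out_degrees([SEED_2X2] * 6)
# Three 2x2 seeds mixed with two 3x3 seeds: 2**3 * 3**2 = 72 vertices.
mixed = expected_out_degrees([SEED_2X2] * 3 + [SEED_3X3] * 2)

print(len(pure), distinct_levels(pure))
print(len(mixed), distinct_levels(mixed))
```

With these placeholder seeds, the pure 2×2 construction spreads its 64 vertices over only 7 expected-degree levels, while the mixed construction spreads 72 vertices over 24 levels. This is one intuition for why mixing seed dimensions could smooth the degree distribution; the paper's own argument may proceed differently.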
#### 6. Conclusion

- **Main contributions**: The paper defines statistical and computational identifiability, proposes the RPSKG algorithm, and supports its effectiveness with theoretical analysis and experiments.
- **Future work**: Further research on separating SKG models by other graph properties, and on applying these methods to a wider range of graph generation models.

### Key formulas

- **Kronecker product**: \[ A\otimes B=\begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix} \]
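As a minimal sketch of how the Kronecker product above is used in an SKG, the following pure-Python code builds the k-th Kronecker power of a seed and samples a directed graph in which edge (u, v) appears independently with the probability stored in entry (u, v). The seed values and the function names `kron`, `kron_power`, and `sample_skg` are assumptions made for this example; they do not come from the paper.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def kron_power(seed, k):
    """k-th Kronecker power of `seed`; the 0-th power is the 1x1 matrix [1]."""
    result = [[1.0]]
    for _ in range(k):
        result = kron(result, seed)
    return result


def sample_skg(seed, k, rng):
    """Sample a directed adjacency matrix: edge (u, v) is present
    independently with probability P[u][v], where P = kron_power(seed, k)."""
    probs = kron_power(seed, k)
    return [[1 if rng.random() < p else 0 for p in row] for row in probs]


# Hypothetical 2x2 seed; entries must lie in [0, 1].
seed = [[0.9, 0.5],
        [0.5, 0.2]]
adjacency = sample_skg(seed, 3, random.Random(0))  # 2**3 = 8 vertices
print(len(adjacency), sum(map(sum, adjacency)))    # vertex count, edge count
```

Materializing the full probability matrix costs quadratic time and memory in the number of vertices, so this direct approach is only suitable for small k.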