Reducing Oversmoothing through Informed Weight Initialization in Graph Neural Networks

Dimitrios Kelesis, Dimitris Fotakis, Georgios Paliouras
2024-10-31
Abstract: In this work, we generalize the ideas of Kaiming initialization to Graph Neural Networks (GNNs) and propose a new scheme (G-Init) that reduces oversmoothing, leading to very good results in node and graph classification tasks. GNNs are commonly initialized using methods designed for other types of Neural Networks, overlooking the underlying graph topology. We analyze theoretically the variance of signals flowing forward and gradients flowing backward in the class of convolutional GNNs. We then simplify our analysis to the case of the GCN and propose a new initialization method. Our results indicate that the new method (G-Init) reduces oversmoothing in deep GNNs, facilitating their effective use. Experimental validation supports our theoretical findings, demonstrating the advantages of deep networks in scenarios with no feature information for unlabeled nodes (i.e., the "cold start" scenario).
Machine Learning
What problem does this paper attempt to address?
This paper addresses the oversmoothing problem in graph neural networks (GNNs): as the number of layers increases, node representations become too similar, the initial information is lost, and model performance declines. The authors propose a new weight initialization method (G-Init) that aims to stabilize the variance of signals and gradients as they flow through the model, thereby reducing oversmoothing.

### Main problems

1. **Oversmoothing problem**: As the number of GNN layers increases, node representations gradually converge, so the model can no longer distinguish different nodes effectively, which hurts performance on classification tasks.
2. **Limitations of existing initialization methods**: Traditional weight initialization methods (such as Kaiming initialization) are designed for other types of neural networks and do not account for the graph structure, so they are less effective in GNNs.

### Solutions

The authors address these problems in three steps:

1. **Theoretical analysis**: The authors generalize the initialization method proposed by He et al. to convolutional GNNs and analyze how the variance of forward-propagated signals and backward-propagated gradients changes from layer to layer.
2. **New initialization method (G-Init)**: Based on the theoretical analysis, the authors propose a new initialization method, specialized to the GCN model. It controls the variance by adjusting the standard deviation of the weight matrix in order to prevent oversmoothing.
3. **Experimental verification**: Experiments on multiple datasets verify the effectiveness of G-Init, especially in deeper GNNs.
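The oversmoothing effect described above can be illustrated with a small NumPy sketch. This is not an experiment from the paper: it assumes a hypothetical 6-node cycle graph, random 4-dimensional features, and plain repeated neighborhood averaging with a row-normalized adjacency matrix (self-loops added, no weights or nonlinearities). The helper names `row_normalized_adjacency` and `node_spread` are made up for illustration.

```python
import numpy as np

def row_normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1}(A + I): each node averages itself and its neighbors."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)

def node_spread(x: np.ndarray) -> float:
    """Mean (over feature columns) of the standard deviation across nodes."""
    return float(x.std(axis=0).mean())

# Assumed toy graph: an undirected cycle on 6 nodes.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = 1.0
    adj[(i + 1) % n, i] = 1.0

rng = np.random.default_rng(0)
x0 = rng.normal(size=(n, 4))        # random initial node features
p = row_normalized_adjacency(adj)

# Apply 30 propagation steps and track how far apart the nodes remain.
x = x0.copy()
spreads = [node_spread(x)]
for _ in range(30):
    x = p @ x
    spreads.append(node_spread(x))
```

Printing `spreads` shows the node representations collapsing toward a common value as depth grows, which is the behavior that the initialization scheme is meant to counteract in trained, weighted networks.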
### Key formulas

- **Upper bound of forward-propagation variance**:
  \[ \operatorname{Var}[y_i^{(l)}] \leq n_l \cdot \left(d_i + \mathbb{1}(\beta \neq 0) + \mathbb{1}(\gamma \neq 0)\right) \times \left(\frac{\alpha^2}{2 d_i^2}\operatorname{Var}[y_i^{(l-1)}] + \frac{\gamma^2}{2}\cdot\operatorname{Var}[y_i^{(l-2)}] + j(\alpha, \beta)\right) \times \left(\delta^2 \operatorname{Var}[w_l] + \epsilon^2\right) \]
  where $n_l$ is the dimension of the weight matrix, $d_i$ is the degree of node $i$, $\alpha, \beta, \gamma, \delta, \epsilon$ are model parameters, and $j(\alpha, \beta)$ is the function defined in Lemma 3.
- **Upper bound of backward-propagation variance**:
  \[ \operatorname{Var}[\Delta x_i^{(l)}] \leq m_w \cdot \left(\frac{\alpha^2}{d_i^2}\operatorname{Var}[\Delta x_i^{(l+1)}] + q(\alpha)\right) \]
  where
  \[ m_w = \frac{1}{2 n_l \left(d_i + \mathbb{1}(\gamma \neq 0)\right)} \cdot \left(\delta^2 \operatorname{Var}[w_l] + \epsilon^2\right) \]
- **Standard deviation of G-Init initialization**:
  \[ \sigma = \sqrt{\frac{2 d_i}{n_l}} \]

Through these formulas and methods, G-Init effectively reduces the oversmoothing phenomenon and improves the performance of GNNs in node classification and graph classification tasks.
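To make the last formula concrete, here is a minimal NumPy sketch that draws a weight matrix with standard deviation $\sqrt{2 d / n_l}$ and compares it with the Kaiming value $\sqrt{2 / n_l}$. Several choices are assumptions rather than details from the summary: $n_l$ is taken to be the fan-in of the layer, the per-node degree $d_i$ is replaced by a single representative degree `degree` (for example, the mean degree of the graph), and the function names `kaiming_std`, `g_init_std`, and `g_init_weights` are hypothetical.

```python
import numpy as np

def kaiming_std(fan: int) -> float:
    """Kaiming (He) standard deviation for ReLU layers: sqrt(2 / fan)."""
    return float(np.sqrt(2.0 / fan))

def g_init_std(fan: int, degree: float) -> float:
    """Standard deviation from the summary's formula: sqrt(2 * d / n_l).

    Assumption: `fan` plays the role of n_l and `degree` is one
    representative degree used for the whole (shared) weight matrix.
    """
    return float(np.sqrt(2.0 * degree / fan))

def g_init_weights(fan_in: int, fan_out: int, degree: float, seed: int = 0) -> np.ndarray:
    """Sample a (fan_in, fan_out) weight matrix from N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    sigma = g_init_std(fan_in, degree)
    return rng.normal(loc=0.0, scale=sigma, size=(fan_in, fan_out))

# Example with assumed sizes: a 64 -> 64 layer on a graph of mean degree 4.
W = g_init_weights(fan_in=64, fan_out=64, degree=4.0)
```

Under these assumptions the G-Init standard deviation is the Kaiming value scaled by $\sqrt{d}$, so the two coincide when `degree` is 1, and larger degrees yield proportionally larger initial weights that offset the shrinkage caused by neighborhood averaging.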