Model Degradation Hinders Deep Graph Neural Networks

Wentao Zhang,Zeang Sheng,Ziqi Yin,Yuezihan Jiang,Yikuan Xia,Jun Gao,Zhi Yang,Bin Cui

DOI: https://doi.org/10.1145/3534678.3539374

2022-06-09

Abstract: Graph Neural Networks (GNNs) have achieved great success in various graph mining tasks. However, drastic performance degradation is always observed when a GNN is stacked with many layers. As a result, most GNNs only have shallow architectures, which limits their expressive power and exploitation of deep neighborhoods. Most recent studies attribute the performance degradation of deep GNNs to the \textit{over-smoothing} issue. In this paper, we disentangle the conventional graph convolution operation into two independent operations: \textit{Propagation} (\textbf{P}) and \textit{Transformation} (\textbf{T}). Following this, the depth of a GNN can be split into the propagation depth ($D_p$) and the transformation depth ($D_t$). Through extensive experiments, we find that the major cause for the performance degradation of deep GNNs is the \textit{model degradation} issue caused by large $D_t$ rather than the \textit{over-smoothing} issue mainly caused by large $D_p$. Further, we present \textit{Adaptive Initial Residual} (AIR), a plug-and-play module compatible with all kinds of GNN architectures, to alleviate the \textit{model degradation} issue and the \textit{over-smoothing} issue simultaneously. Experimental results on six real-world datasets demonstrate that GNNs equipped with AIR outperform most GNNs with shallow architectures owing to the benefits of both large $D_p$ and $D_t$, while the time costs associated with AIR can be ignored.
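To make the split between propagation depth and transformation depth concrete, the following is a minimal NumPy sketch, not the authors' code. It assumes one possible disentangled arrangement (all propagation steps first, then all transformation layers) and a symmetrically normalized adjacency matrix with self-loops; the function names, the toy graph, and the random weights are hypothetical and chosen only for illustration.

```python
import numpy as np

def normalized_adjacency(adj):
    """Assumed propagation matrix: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(a_norm, x, d_p):
    """P: parameter-free neighborhood smoothing, applied d_p times."""
    for _ in range(d_p):
        x = a_norm @ x
    return x

def transform(x, weights):
    """T: len(weights) dense layers, ReLU between layers, none after the last."""
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

# Toy 4-node path graph with 3 input features per node (illustrative data).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = rng.normal(size=(4, 3))

d_p, d_t = 5, 2                                  # the two depths are chosen independently
weights = [rng.normal(size=(3, 8)), rng.normal(size=(8, 2))]
assert len(weights) == d_t

a_norm = normalized_adjacency(adj)
smoothed = propagate(a_norm, features, d_p)      # depth D_p, no parameters
logits = transform(smoothed, weights)            # depth D_t, all parameters
```

In this arrangement, raising `d_p` adds no parameters, while raising `d_t` adds one weight matrix per layer, which is why the two depths can be varied separately when studying where the degradation comes from.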

Machine Learning

What problem does this paper attempt to address?

This paper addresses the sharp drop in performance that graph neural networks (GNNs) suffer as the number of layers increases. Although GNNs have achieved great success in various graph mining tasks, their performance declines significantly when many layers are stacked. This limits the expressive power of GNNs and their ability to exploit deep neighborhoods. Most existing studies attribute the decline to the over-smoothing problem. Through experiments, however, this paper finds that the main cause is the model degradation problem rather than over-smoothing. Model degradation refers to the phenomenon in which both training accuracy and test accuracy decline as the number of network layers increases. To address it, the authors propose a plug-and-play module named Adaptive Initial Residual (AIR), which is compatible with various GNN architectures and alleviates model degradation and over-smoothing simultaneously. Experimental results on six real-world datasets show that GNNs equipped with AIR outperform most shallow-architecture GNNs, and that the time cost of AIR is negligible.
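The summary does not spell out how AIR is computed, so the sketch below only illustrates the general idea suggested by its name: a learned, per-node gate that mixes each node's initial representation back into its current one. The gate parameterization (a sigmoid over the concatenated representations, with a hypothetical weight vector `gate_w`) is an assumption made for this example and should not be read as the authors' definition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_initial_residual(h0, h, gate_w):
    """Blend the initial representation h0 into the current representation h.

    h0, h  : arrays of shape (num_nodes, dim)
    gate_w : array of shape (2 * dim, 1), hypothetical learnable gate weights
    """
    # One gate value in (0, 1) per node, computed from both representations.
    alpha = sigmoid(np.concatenate([h0, h], axis=1) @ gate_w)
    # Convex combination: alpha near 1 keeps h0, alpha near 0 keeps h.
    return alpha * h0 + (1.0 - alpha) * h

# Illustrative data: 4 nodes, 3-dimensional representations.
rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 3))        # representation before any deep layers
h = rng.normal(size=(4, 3))         # representation after several layers
gate_w = np.zeros((6, 1))           # untrained gate: alpha is 0.5 for every node

mixed = gated_initial_residual(h0, h, gate_w)
```

Because the output is a convex combination, every layer retains a direct path to the initial representation, which is the usual motivation for initial residual connections in deep networks.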

NGAT: Attention in Breadth and Depth Exploration for Semi-Supervised Graph Representation Learning

Evaluating Deep Graph Neural Networks

Beyond Over-smoothing: Uncovering the Trainability Challenges in Deep Graph Neural Networks

Understanding and Resolving Performance Degradation in Graph Convolutional Networks

Deep Graph Neural Networks via Flexible Subgraph Aggregation

GNNs, You Can Be Stronger, Deeper and Faster

Adaptive Depth Graph Attention Networks

Deeper-GXX: Deepening Arbitrary GNNs

DeGNN: Improving Graph Neural Networks with Graph Decomposition

Beyond smoothness: A general optimization framework for graph neural networks with negative Laplacian regularization

Universal Deep GNNs: Rethinking Residual Connection in GNNs from a Path Decomposition Perspective for Preventing the Over-smoothing

Another Perspective of Over-Smoothing: Alleviating Semantic Over-Smoothing in Deep GNNs

Graph Elimination Networks

SkipNode: On Alleviating Performance Degradation for Deep Graph Convolutional Networks

Adaptive Multi-Channel Deep Graph Neural Networks

Interpreting Deep Graph Convolutional Networks with Spectrum Perspective

Graph Neural Aggregation-diffusion with Metastability

Deep Graph Neural Networks via Posteriori-Sampling-based Node-Adaptive Residual Module

DEGNN: Dual Experts Graph Neural Network Handling Both Edge and Node Feature Noise

Decoupling the Depth and Scope of Graph Neural Networks