Model Degradation Hinders Deep Graph Neural Networks

Wentao Zhang, Zeang Sheng, Ziqi Yin, Yuezihan Jiang, Yikuan Xia, Jun Gao, Zhi Yang, Bin Cui
DOI: https://doi.org/10.1145/3534678.3539374
2022-06-09
Abstract: Graph Neural Networks (GNNs) have achieved great success in various graph mining tasks. However, drastic performance degradation is always observed when a GNN is stacked with many layers. As a result, most GNNs only have shallow architectures, which limits their expressive power and exploitation of deep neighborhoods. Most recent studies attribute the performance degradation of deep GNNs to the \textit{over-smoothing} issue. In this paper, we disentangle the conventional graph convolution operation into two independent operations: \textit{Propagation} (\textbf{P}) and \textit{Transformation} (\textbf{T}). Following this, the depth of a GNN can be split into the propagation depth ($D_p$) and the transformation depth ($D_t$). Through extensive experiments, we find that the major cause for the performance degradation of deep GNNs is the \textit{model degradation} issue caused by large $D_t$ rather than the \textit{over-smoothing} issue mainly caused by large $D_p$. Further, we present \textit{Adaptive Initial Residual} (AIR), a plug-and-play module compatible with all kinds of GNN architectures, to alleviate the \textit{model degradation} issue and the \textit{over-smoothing} issue simultaneously. Experimental results on six real-world datasets demonstrate that GNNs equipped with AIR outperform most GNNs with shallow architectures owing to the benefits of both large $D_p$ and $D_t$, while the time costs associated with AIR can be ignored.
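The split into Propagation and Transformation described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the dense-list matrix representation, the symmetric normalization with self-loops, and the ReLU activation are all assumptions chosen to keep the example self-contained. The point is only that $D_p$ (how many times features are smoothed over the graph) and $D_t$ (how many learnable layers are applied) can be set independently.

```python
import math

def normalize_adjacency(adj):
    """Assumed normalization: D^-1/2 (A + I) D^-1/2 on a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def propagate(a_norm, x, d_p):
    """P: smooth node features over the graph d_p times (no learnable weights)."""
    for _ in range(d_p):
        x = matmul(a_norm, x)
    return x

def transform(x, weights):
    """T: apply len(weights) dense layers, ReLU between them; D_t = len(weights)."""
    for idx, w in enumerate(weights):
        x = matmul(x, w)
        if idx < len(weights) - 1:
            x = [[max(0.0, v) for v in row] for row in x]
    return x

# Hypothetical 3-node path graph 0 - 1 - 2 with one feature per node.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0], [0.0], [0.0]]
a_norm = normalize_adjacency(adj)

# D_p = 4 propagation steps followed by D_t = 2 transformation layers.
smoothed = propagate(a_norm, features, d_p=4)
output = transform(smoothed, weights=[[[1.0, -1.0]], [[1.0], [1.0]]])
```

With a large `d_p` and no transformation, the features of nodes 0 and 2 in this toy graph become nearly identical, which is the over-smoothing effect tied to $D_p$. The abstract's claim is that the degradation seen in practice comes mainly from a large $D_t$, that is, from the length of the `weights` list in this sketch.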
Machine Learning
What problem does this paper attempt to address?
This paper addresses the sharp decline in performance that deep Graph Neural Networks (GNNs) suffer as the number of layers increases. Although GNNs have achieved great success in various graph mining tasks, their performance drops significantly when many layers are stacked. This limits the expressive power of GNNs and their ability to exploit deep neighborhoods. Most existing studies attribute the decline to the over-smoothing problem. Through experiments, however, this paper finds that the main cause is the model degradation problem, which stems from a large transformation depth, rather than over-smoothing, which stems mainly from a large propagation depth. Model degradation refers to the phenomenon in which both training accuracy and test accuracy decline as the number of network layers increases. To address it, the authors propose a plug-and-play module named Adaptive Initial Residual (AIR). The module is compatible with various GNN architectures and alleviates the model degradation and over-smoothing problems simultaneously. Experimental results on six real-world datasets show that GNNs equipped with AIR outperform most shallow-architecture GNNs, and that the time cost of AIR is negligible.
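The text above names AIR but does not give its formula, so the following is only a rough sketch of what an initial residual with an adaptive, per-node mixing weight might look like. The sigmoid gate, the function name `adaptive_initial_residual`, and the `gate_weights` parameter are assumptions made for illustration and should not be read as the authors' definition.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def adaptive_initial_residual(h0, h, gate_weights):
    """Blend each node's current features with its initial features.

    h0, h: lists of per-node feature rows with the same shape.
    gate_weights: hypothetical learnable vector of length 2 * feature_dim,
    scored against the concatenation [initial | current] of each node.
    """
    blended = []
    for init_row, cur_row in zip(h0, h):
        score = sum(w * v for w, v in zip(gate_weights, init_row + cur_row))
        alpha = sigmoid(score)  # per-node weight on the initial features
        blended.append([alpha * a + (1.0 - alpha) * b
                        for a, b in zip(init_row, cur_row)])
    return blended

# With all-zero gate weights, alpha is 0.5 and the result is a plain average.
h_initial = [[1.0, 0.0]]
h_current = [[0.0, 1.0]]
mixed = adaptive_initial_residual(h_initial, h_current, [0.0, 0.0, 0.0, 0.0])
```

In a sketch like this, the residual path lets every layer fall back on the initial features, which is the usual motivation for residual connections when depth hurts training; the gate lets that fallback differ from node to node.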