GSSA: Pay attention to graph feature importance for GCN via statistical self-attention

Jin Zheng, Yang Wang, Wanjun Xu, Zilu Gan, Ping Li, Jiancheng Lv
DOI: https://doi.org/10.1016/j.neucom.2020.07.098
IF: 6
2020-12-01
Neurocomputing
Abstract: The graph convolutional network (GCN) has proven to be an effective framework for graph-based semi-supervised learning applications. The core operation block of a GCN is the convolutional layer, which enables the network to construct node embeddings by fusing both the attributes of nodes and the relationships between nodes. Different features or feature interactions inherently have different influences on the convolutional layers. However, few studies in the GCN community have examined the impact of feature importance. In this work, we attempt to augment the convolutional layers in GCNs with statistical attention-based feature importance by modeling the latent interactions of features, which is complementary to standard GCNs and needs only simple statistical calculations rather than heavy training. To this end, we treat the feature input of each convolutional layer as a separate multi-layer heterogeneous graph, and propose the Graph Statistical Self-Attention (GSSA) method to automatically learn the hierarchical structure of feature importance. More specifically, we propose two modules in GSSA: Channel-wise Self-Attention (CSA), which captures the dependencies between feature channels, and Mean-based Self-Attention (MSA), which reweights similarities among features. GSSA can be applied to each graph convolutional layer in a "plug and play" way across a wide range of GCN variants. To the best of our knowledge, this is the first implementation that optimizes GCNs from the feature importance perspective. Extensive experiments demonstrate that GSSA substantially improves popular existing baselines in semi-supervised node classification tasks. We further employ multiple qualitative evaluations to gain deeper insight into our method.
computer science, artificial intelligence
What problem does this paper attempt to address?