Attention is all you need for boosting graph convolutional neural network

Yinwei Wu

2024-03-10

Abstract: Graph Convolutional Neural Networks (GCNs) possess strong capabilities for processing graph data in non-grid domains. They can capture the topological structure and node features of a graph and integrate them into the nodes' final representations. GCNs have been extensively studied in various fields, such as recommendation systems, social networks, and protein molecular structures. With the increasing application of graph neural networks, research has focused on improving their performance while compressing their size. In this work, a plug-in module named Graph Knowledge Enhancement and Distillation Module (GKEDM) is proposed. GKEDM can enhance node representations and improve the performance of GCNs by extracting and aggregating graph information via a multi-head attention mechanism. Furthermore, GKEDM can serve as an auxiliary transferor for knowledge distillation. With a specially designed attention distillation method, GKEDM can distill the knowledge of large teacher models into high-performance and compact student models. Experiments on multiple datasets demonstrate that GKEDM can significantly improve the performance of various GCNs with minimal overhead. Furthermore, it can efficiently transfer distilled knowledge from large teacher networks to small student networks via attention distillation.

Machine Learning, Graphics, Social and Information Networks

What problem does this paper attempt to address?

The paper addresses two goals for Graph Convolutional Neural Networks (GCNs): improving their performance and reducing their size. The authors propose a plug-in module called Graph Knowledge Enhancement and Distillation Module (GKEDM), which uses a multi-head attention mechanism to extract and aggregate graph information, enhancing node representations and improving the performance of GCNs. GKEDM also serves as an auxiliary transferor for knowledge distillation: with a specially designed attention distillation method, it transfers knowledge from large teacher models to compact, high-performance student models. Experimental results on multiple datasets demonstrate that GKEDM can significantly improve the performance of various GCNs with minimal overhead and can efficiently transfer knowledge from large teacher networks to small student networks. The paper also discusses the over-smoothing problem, related work on knowledge distillation in GCNs, and the application of attention mechanisms in graph neural networks.
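The two ideas summarized above, attention-based aggregation over graph neighborhoods and distillation of attention maps, can be illustrated with a small sketch. This is not the authors' GKEDM implementation. The function names `neighbor_attention` and `attention_distillation_loss` are hypothetical, the learned query/key/value projections are omitted for brevity, and the use of a KL divergence between teacher and student attention maps is an assumption, since the summary does not specify the loss.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def neighbor_attention(h, neighbors, num_heads):
    """Multi-head attention restricted to each node's graph neighborhood.

    h: list of node feature vectors, e.g. the output of a GCN backbone.
    neighbors: dict mapping every node index to its neighbor indices
               (each list is assumed to include the node itself).
    Assumption: queries, keys and values are plain feature slices;
    a real module would apply learned projections first.
    """
    dim = len(h[0])
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    d = dim // num_heads
    out = [[0.0] * dim for _ in h]
    attn = [[None] * len(h) for _ in range(num_heads)]
    for k in range(num_heads):
        lo, hi = k * d, (k + 1) * d
        for i, nbrs in neighbors.items():
            # Scaled dot-product scores between node i and its neighbors.
            scores = [
                sum(h[i][c] * h[j][c] for c in range(lo, hi)) / math.sqrt(d)
                for j in nbrs
            ]
            w = softmax(scores)
            attn[k][i] = w
            # Aggregate neighbor features with the attention weights.
            for c in range(lo, hi):
                out[i][c] = sum(wj * h[j][c] for wj, j in zip(w, nbrs))
    return out, attn

def attention_distillation_loss(teacher_attn, student_attn, eps=1e-12):
    """Mean KL(teacher || student) over all heads and nodes (assumed loss)."""
    total, count = 0.0, 0
    for t_head, s_head in zip(teacher_attn, student_attn):
        for t, s in zip(t_head, s_head):
            total += sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(t, s))
            count += 1
    return total / count

# Toy graph with three nodes; the teacher has wider features than the student.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
teacher_h = [[1.0, 0.0, 0.5, 0.2], [0.9, 0.1, 0.4, 0.3], [0.0, 1.0, 0.1, 0.8]]
student_h = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9]]

_, t_attn = neighbor_attention(teacher_h, neighbors, num_heads=2)
enhanced, s_attn = neighbor_attention(student_h, neighbors, num_heads=2)
loss = attention_distillation_loss(t_attn, s_attn)
```

In this sketch the attention maps have one weight per (head, node, neighbor), so their shape depends on the graph and the number of heads rather than on the hidden width. That is why a wide teacher and a narrow student can be compared directly here, provided both use the same number of heads.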

Attention-Based Relational Graph Convolutional Network for Knowledge Graph Reasoning

High-order graph attention network

DAGCN: Dual Attention Graph Convolutional Networks

Self-attention empowered graph convolutional network for structure learning and node embedding

Reliable Data Distillation on Graph Convolutional Network

Improving Dynamic Graph Convolutional Network with Fine-Grained Attention Mechanism

Boosting Graph Neural Networks via Adaptive Knowledge Distillation

A gated graph attention network based on dual graph convolution for node embedding

Permutohedral-GCN: Graph Convolutional Networks with Global Attention

Edge Attention-based Multi-Relational Graph Convolutional Networks

Multi-scale Graph Convolutional Networks with Self-Attention

Distilling Knowledge from Graph Convolutional Networks

Knowledge Graph Reasoning Based on Attention GCN

Enhanced Scalable Graph Neural Network via Knowledge Distillation

An In-depth Analysis of Graph Neural Networks for Semi-supervised Learning

Graph Adaptive Attention Network with Cross-Entropy

DeGNN: Improving Graph Neural Networks with Graph Decomposition

Enhancing Graph Neural Networks by a High-quality Aggregation of Beneficial Information

DropEdge: Towards Deep Graph Convolutional Networks on Node Classification