CNN2GNN: How to Bridge CNN with GNN

Ziheng Jiao, Hongyuan Zhang, Xuelong Li
2024-04-23
Abstract: Although the convolutional neural network (CNN) has achieved excellent performance in vision tasks by extracting the intra-sample representation, it incurs a high training cost because it stacks numerous convolutional layers. Recently, graph neural networks (GNNs), as bilinear models, have succeeded in exploring the underlying topological relationships among graph data with only a few graph neural layers. Unfortunately, a GNN cannot be applied directly to non-graph data because no graph structure is available, and it has high inference latency in large-scale scenarios. Inspired by these complementary strengths and weaknesses, we discuss a natural question: how can these two heterogeneous networks be bridged? In this paper, we propose a novel CNN2GNN framework that unifies CNN and GNN via distillation. First, to overcome the limitations of GNN, a differentiable sparse graph learning module is designed as the head of the network to dynamically learn the graph for inductive learning. Then, a response-based distillation is introduced to transfer knowledge from the CNN to the GNN and bridge these two heterogeneous networks. Notably, because it simultaneously extracts the intra-sample representation of a single instance and the topological relationships across the dataset, the distilled "boosted" two-layer GNN achieves much higher performance on Mini-ImageNet than CNNs containing dozens of layers, such as ResNet152.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to combine convolutional neural networks (CNNs) with graph neural networks (GNNs). CNNs are good at extracting features within individual samples, but stacking many layers makes them costly to train. GNNs, in contrast, can capture topological relationships in graph data with only a few layers, but they cannot be applied directly to non-graph data and have high inference latency at large scale. The paper proposes a framework called CNN2GNN, which uses a differentiable sparse graph learning module to dynamically learn a graph structure for non-graph data, and uses response-based distillation to transfer knowledge from the CNN to the GNN, thereby bridging the two heterogeneous networks. As a result, the distilled two-layer GNN outperforms ResNet152, a CNN with dozens of layers, on Mini-ImageNet.
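The two ingredients named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the graph learning module, so `sparse_graph` is an assumed stand-in that keeps each sample's k nearest neighbours and normalises the weights with a softmax. `gnn_forward` and `distillation_loss` are likewise hypothetical names, and the temperature-scaled KL divergence is the common form of response-based distillation rather than one confirmed by the paper.

```python
import numpy as np

def log_softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def sparse_graph(features, k):
    """Assumed stand-in for a sparse graph learner: each row keeps its
    k nearest neighbours, weighted by a softmax over negative squared distance."""
    sq = (features ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)              # a node is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]       # k smallest distances per row
    mask = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    scores = np.where(mask, -dist, -np.inf)     # masked entries get zero weight
    return np.exp(log_softmax(scores, axis=1))  # each row sums to 1

def gnn_forward(adj, x, w1, w2):
    """Two-layer graph convolution on a row-normalised graph with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    hidden = np.maximum(a_hat @ x @ w1, 0.0)    # ReLU
    return a_hat @ hidden @ w2                  # class logits per sample

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions."""
    log_p_t = log_softmax(teacher_logits / temperature)
    log_p_s = log_softmax(student_logits / temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=1).mean()
    return temperature ** 2 * kl

# Toy usage: random features stand in for images, random logits for a CNN teacher.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
teacher_logits = rng.normal(size=(8, 5))
adj = sparse_graph(x, k=3)
student_logits = gnn_forward(adj, x, rng.normal(size=(16, 32)), rng.normal(size=(32, 5)))
loss = distillation_loss(student_logits, teacher_logits)
```

In a trainable version the hard top-k selection would need a differentiable relaxation, and the loss would be minimised with respect to the GNN weights and the graph module while the CNN teacher stays fixed. Both points are assumptions about a typical setup, not details taken from the abstract.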