Jiale Yan, Hiroaki Ito, Ángel López García-Arias, Yasuyuki Okoshi, Hikari Otsuka, Kazushi Kawamura, Thiem Van Chu, Masato Motomura
Abstract: The Strong Lottery Ticket Hypothesis (SLTH) demonstrates the existence of high-performing subnetworks within a randomly initialized model, discoverable through pruning a convolutional neural network (CNN) without any weight training. A recent study, called Untrained GNNs Tickets (UGT), expanded SLTH from CNNs to shallow graph neural networks (GNNs). However, discrepancies persist when comparing baseline models with learned dense weights. Additionally, there remains an unexplored area in applying SLTH to deeper GNNs, which, despite delivering improved accuracy with additional layers, suffer from excessive memory requirements. To address these challenges, this work utilizes Multicoated Supermasks (M-Sup), a scalar pruning mask method, and implements it in GNNs by proposing a strategy for setting its pruning thresholds adaptively. In the context of deep GNNs, this research uncovers the existence of untrained recurrent networks, which exhibit performance on par with their trained feed-forward counterparts. This paper also introduces the Multi-Stage Folding and Unshared Masks methods to expand the search space in terms of both architecture and parameters. Through the evaluation of various datasets, including the Open Graph Benchmark (OGB), this work establishes a triple-win scenario for SLTH-based GNNs: by achieving high sparsity, competitive performance, and high memory efficiency with up to 98.7% reduction, it demonstrates suitability for energy-efficient graph processing.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper aims to address several key challenges in Graph Neural Networks (GNNs), particularly in the application of the Strong Lottery Ticket Hypothesis (SLTH). Specifically, the paper focuses on the following issues:
1. **Performance Gap Between SLTH Methods and Dense-Weight Baselines in Shallow GNNs**:
- Although Untrained GNNs Tickets (UGT) show better accuracy than Edge-Popup in shallow GNNs (such as GCN, GAT, and GIN), there is still a performance gap compared to baseline models trained with dense weights at different sparsity levels. The researchers pose the question: Is it possible to design a better SLTH method to maintain high accuracy in shallow GNNs?
2. **Performance and Memory Consumption of Deep GNNs**:
- For deep GNNs, UGT extends SLTH to models with more layers, but the accuracy of these deeper models falls short of the two-layer models. Meanwhile, ResGCNs gain accuracy from additional layers at the cost of higher memory consumption. The researchers raise two questions: First, is it feasible to apply SLTH to deep GNNs? Second, can model efficiency be further improved by reducing model size without sacrificing accuracy?
### Solutions
To address the above challenges, the paper introduces the following methods and techniques:
1. **Multicoated Supermasks (M-Sup)**:
- M-Sup is a scalar pruning mask method. The paper implements it in GNNs with a strategy that sets its pruning thresholds adaptively, in order to discover high-performing untrained subnetworks. Experiments show that M-Sup maintains high accuracy at sparsity levels up to 90%, outperforming single-coated supermasks (S-Sup).
2. **Multi-Stage Folding (MSF) and Unshared Masks**:
- These methods expand the search space in terms of both architecture and parameters. With Unshared Masks and MSF, memory consumption can be significantly reduced while maintaining high accuracy.
3. **Untrained Recurrent Graph Subnetworks**:
- The study finds that untrained recurrent subnetworks exist in deep GNNs, with performance on par with their trained feed-forward counterparts. This opens new possibilities for efficient, low-power graph processing.
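To make the "scalar pruning mask" idea behind M-Sup more concrete, here is a minimal sketch in plain Python. It assumes that a multicoated supermask is the sum of several binary top-k masks ("coats"), each built from the same score vector at a different sparsity level, and that the random weights stay frozen. The function names, the toy scores, and the hand-picked sparsity levels are illustrative assumptions; in particular, this sketch does not implement the paper's adaptive threshold strategy.

```python
import random


def topk_mask(scores, sparsity):
    """Binary mask keeping the highest-scoring (1 - sparsity) fraction of entries."""
    n_keep = int(round((1.0 - sparsity) * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]


def multicoated_supermask(scores, sparsities):
    """Sum one binary mask per coat into an integer-valued (scalar) mask."""
    coats = [topk_mask(scores, s) for s in sparsities]
    return [sum(column) for column in zip(*coats)]


def apply_mask(weights, mask):
    """Scale each frozen random weight by its mask value (0 prunes the weight)."""
    return [w * m for w, m in zip(weights, mask)]


# Assumption: frozen random sign weights and toy scores stand in for a real layer.
rng = random.Random(0)
weights = [rng.choice([-1.0, 1.0]) for _ in range(10)]
scores = [float(i) for i in range(10)]

# Two coats with hand-picked sparsities (hypothetical values, not from the paper).
mask = multicoated_supermask(scores, sparsities=[0.5, 0.8])
effective = apply_mask(weights, mask)
print(mask)  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
```

In this sketch the overall weight sparsity is set by the least sparse coat (50% here), while the sparser coats only raise the magnitude of the highest-scoring weights. A single-coated supermask corresponds to passing one sparsity level.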
### Experimental Results
- **Shallow GNNs**:
- On node-level tasks (such as Cora, Citeseer, and PubMed datasets) and graph-level tasks (such as OGBG-Molhiv and OGBG-Molbace datasets), M-Sup outperforms S-Sup at different sparsity levels and maintains high accuracy even at high sparsity.
- **Deep GNNs**:
- For the 28-layer ResGCN+ on the OGBN-Arxiv dataset and the 7-layer DyResGEN on the OGBG-Molhiv dataset, the combination of M-Sup and MSF not only improves accuracy but also significantly reduces memory consumption. For example, memory consumption falls by 97% for ResGCN+ on OGBN-Arxiv and by 98% for DyResGEN on OGBG-Molhiv.
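As a rough intuition for why storing masks instead of trained weights saves memory, the following back-of-envelope sketch compares the two. It assumes 32-bit dense weights, random weights that can be regenerated from a single stored seed, and a fixed number of bits per mask entry. These assumptions and all names are illustrative; the sketch is not the paper's memory accounting and is not meant to reproduce its reported figures.

```python
def dense_model_bits(n_weights, bits_per_weight=32):
    """Storage for a trained dense model: every weight is kept explicitly."""
    return n_weights * bits_per_weight


def masked_model_bits(n_weights, bits_per_mask_entry=1, seed_bits=32):
    """Storage for an untrained subnetwork: one mask entry per weight plus a seed.

    Assumption: the frozen random weights are regenerated from the seed,
    so only the mask and the seed need to be stored.
    """
    return n_weights * bits_per_mask_entry + seed_bits


def memory_reduction(n_weights, bits_per_mask_entry=1):
    """Fraction of dense-model storage saved by the masked representation."""
    dense = dense_model_bits(n_weights)
    masked = masked_model_bits(n_weights, bits_per_mask_entry)
    return 1.0 - masked / dense


n = 1_000_000  # hypothetical parameter count
print(f"binary mask:      {memory_reduction(n, 1):.2%}")
print(f"2-bit scalar mask: {memory_reduction(n, 2):.2%}")
```

Under these assumptions a binary mask alone saves roughly 31/32 of the storage, and a wider scalar mask saves less per weight. Further savings would have to come from storing fewer mask entries or weights overall, for example through sharing across layers.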
### Conclusion
By introducing M-Sup, MSF, and Unshared Masks, the paper addresses the challenges of applying SLTH to both shallow and deep GNNs, achieving a triple win of high sparsity, competitive accuracy, and high memory efficiency. This offers a path toward efficient, low-power graph processing.