Abstract: The Strong Lottery Ticket Hypothesis (SLTH) demonstrates the existence of high-performing subnetworks within a randomly initialized model, discoverable through pruning a convolutional neural network (CNN) without any weight training. A recent study, called Untrained GNNs Tickets (UGT), expanded SLTH from CNNs to shallow graph neural networks (GNNs). However, discrepancies persist when comparing baseline models with learned dense weights. Additionally, there remains an unexplored area in applying SLTH to deeper GNNs, which, despite delivering improved accuracy with additional layers, suffer from excessive memory requirements. To address these challenges, this work utilizes Multicoated Supermasks (M-Sup), a scalar pruning mask method, and implements it in GNNs by proposing a strategy for setting its pruning thresholds adaptively. In the context of deep GNNs, this research uncovers the existence of untrained recurrent networks, which exhibit performance on par with their trained feed-forward counterparts. This paper also introduces the Multi-Stage Folding and Unshared Masks methods to expand the search space in terms of both architecture and parameters. Through the evaluation of various datasets, including the Open Graph Benchmark (OGB), this work establishes a triple-win scenario for SLTH-based GNNs: by achieving high sparsity, competitive performance, and high memory efficiency with up to 98.7% reduction, it demonstrates suitability for energy-efficient graph processing.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses several key challenges in applying the Strong Lottery Ticket Hypothesis (SLTH) to Graph Neural Networks (GNNs). Specifically, it focuses on the following issues:

1. **Performance Gap Between SLTH-Based Shallow GNNs and Models Trained with Dense Weights**:
   - Although Untrained GNNs Tickets (UGT) achieves better accuracy than Edge-Popup in shallow GNNs (such as GCN, GAT, and GIN), a performance gap remains at various sparsity levels relative to baseline models trained with dense weights. The researchers ask: is it possible to design a better SLTH method that maintains high accuracy in shallow GNNs?
2. **Performance and Memory Consumption of Deep GNNs**:
   - UGT extends SLTH to deeper GNNs by increasing the number of layers, but the accuracy of these deeper models does not reach that of two-layer models. Meanwhile, ResGCNs improve accuracy by adding layers but consume more memory. The researchers raise two questions: first, is it feasible to apply SLTH to deep GNNs? Second, can model efficiency be further improved by reducing model size without sacrificing accuracy?

### Solutions

To address these challenges, the paper introduces the following methods and techniques:

1. **Multicoated Supermasks (M-Sup)**:
   - M-Sup is a scalar pruning mask method. The paper applies it to GNNs with adaptively set pruning thresholds in order to discover untrained, high-performing subnetworks. Experiments show that M-Sup maintains high accuracy at sparsity levels up to 90%, outperforming the single-coated supermask (S-Sup).
2. **Multi-Stage Folding (MSF) and Unshared Masks**:
   - These methods expand the search space in terms of both architecture and parameters. Combining Unshared Masks with MSF significantly reduces memory consumption while maintaining high accuracy.
3. **Untrained Recurrent Graph Subnetworks**:
   - The study finds that untrained recurrent subnetworks exist within deep GNNs and perform on par with their trained feed-forward counterparts. This opens new possibilities for efficient, low-power graph processing.

### Experimental Results

- **Shallow GNNs**:
  - On node-level tasks (such as the Cora, Citeseer, and PubMed datasets) and graph-level tasks (such as the OGBG-Molhiv and OGBG-Molbace datasets), M-Sup outperforms S-Sup across sparsity levels and maintains high accuracy even at high sparsity.
- **Deep GNNs**:
  - For the 28-layer ResGCN+ on the OGBN-Arxiv dataset and the 7-layer DyResGEN on the OGBG-Molhiv dataset, the combination of M-Sup and MSF not only improves accuracy but also significantly reduces memory consumption. For example, ResGCN+ reduces memory consumption by 97% on OGBN-Arxiv, while DyResGEN reduces memory consumption by 98% on OGBG-Molhiv.

### Conclusion

By introducing M-Sup, MSF, and Unshared Masks, the paper addresses the challenges of applying SLTH to both shallow and deep GNNs, achieving a triple win of high sparsity, high accuracy, and high memory efficiency. This provides new solutions for efficient, low-power graph processing.
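
The idea behind a multicoated supermask can be illustrated with a minimal NumPy sketch. It assumes that each "coat" is a binary mask keeping the highest-scoring weights at one sparsity level, and that the coats are summed into a single integer-valued mask applied to frozen random weights. The function name `multicoated_mask`, the sparsity levels, and the random stand-in scores are hypothetical choices for illustration; the paper's adaptive threshold strategy and score learning are not reproduced here.

```python
import numpy as np


def multicoated_mask(scores, sparsities):
    """Sum of binary masks, one per sparsity level, all derived from the same scores.

    Illustrative assumption: each threshold is the quantile of `scores`
    at the given sparsity, so a fraction `s` of entries falls below it.
    """
    mask = np.zeros(scores.shape, dtype=np.int64)
    for s in sparsities:
        threshold = np.quantile(scores, s)
        # Entries at or above the threshold survive this coat.
        mask += (scores >= threshold).astype(np.int64)
    return mask


rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 8))  # frozen random weights, never trained
scores = rng.random((4, 8))            # stand-in for learned importance scores

# Hypothetical sparsity levels; the lowest one sets the overall sparsity.
mask = multicoated_mask(scores, sparsities=[0.5, 0.75, 0.9])
effective_weights = weights * mask     # pruned where mask == 0, scaled elsewhere
```

In this sketch the overall sparsity is set by the lowest sparsity level (here 0.5), while weights that also survive the stricter coats receive larger integer scale factors. That is what makes the mask scalar-valued rather than binary.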

Multicoated and Folded Graph Neural Networks with Strong Lottery Tickets

Rethinking Graph Lottery Tickets: Graph Sparsity Matters

Graph Structure Learning Via Lottery Hypothesis at Scale.

NGAT: Attention in Breadth and Depth Exploration for Semi-Supervised Graph Representation Learning

Fast Track to Winning Tickets: Repowering One-Shot Pruning for Graph Neural Networks

Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks

You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets

Exploring Lottery Ticket Hypothesis in Spiking Neural Networks

Early-Bird GCNs: Graph-Network Co-Optimization Towards More Efficient GCN Training and Inference via Drawing Early-Bird Lottery Tickets

Adversarial Erasing with Pruned Elements: Towards Better Graph Lottery Ticket

Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets

Pursing the Sparse Limitation of Spiking Deep Learning Structures

Gaining the Sparse Rewards by Exploring Lottery Tickets in Spiking Neural Network

Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets

Graph Neural Networks Inspired by Classical Iterative Algorithms

Graph Unfolding Networks

Efficient Weight Pruning using Pre-trained Lottery Jackpots

Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking Neural Network

Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting

Deep Graph Neural Networks via Flexible Subgraph Aggregation

Adaptive Depth Graph Attention Networks