Abstract: Fully exploiting the learning capacity of neural networks requires overparameterized dense networks. On the other hand, directly training sparse neural networks typically results in unsatisfactory performance. The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity. Concretely, it claims that a randomly initialized network contains winning tickets, which can be found by iterative magnitude pruning and which preserve promising trainability (that is, they are in a trainable condition). In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark. We then approach the problem from a complementary direction to articulate the Dual Lottery Ticket Hypothesis (DLTH): randomly selected subnetworks from a randomly initialized dense network can be transformed into a trainable condition and achieve admirable performance compared with LTH. In other words, random tickets in a given lottery pool can be transformed into winning tickets. Specifically, using uniform-randomly selected subnetworks to represent the general case, we propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH. Concretely, we introduce a regularization term to borrow learning capacity and realize information extrusion from the weights that will be masked. After finishing the transformation of the randomly selected subnetworks, we conduct regular finetuning to evaluate the model under fair comparisons with LTH and other strong baselines. Extensive experiments on several public datasets and comparisons with competitive approaches validate our DLTH as well as the effectiveness of the proposed model RST. We expect our work to pave the way for new research directions in sparse network training.
Our code is available at https://github.com/yueb17/DLTH.
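
The transformation step described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes the regularization term is an L2 penalty on the to-be-masked weights whose coefficient grows gradually during transformation (the abstract does not state its form), and the names `random_mask`, `toy_grad`, and `transform`, the scalar toy loss, and all hyperparameters are hypothetical choices made only for illustration.

```python
import random


def random_mask(n, sparsity, rng):
    """Uniform-randomly pick which weights are kept (1) or masked (0)."""
    n_masked = int(round(n * sparsity))
    masked = set(rng.sample(range(n), n_masked))
    return [0 if i in masked else 1 for i in range(n)]


def toy_grad(w, target):
    """Gradient of the toy task loss 0.5 * (sum(w) - target) ** 2."""
    residual = sum(w) - target
    return [residual] * len(w)


def transform(w, mask, target, steps=500, lr=0.05, lam_step=0.01):
    """Gradient descent on task loss + 0.5 * lam * sum of squared masked weights.

    Assumption: lam grows linearly each step, so the masked weights are pushed
    toward zero gradually while the kept weights take over the task.
    """
    w = list(w)
    lam = 0.0
    for _ in range(steps):
        lam += lam_step
        g = toy_grad(w, target)
        for i in range(len(w)):
            # The penalty gradient applies only to weights that will be masked.
            penalty = lam * w[i] if mask[i] == 0 else 0.0
            w[i] -= lr * (g[i] + penalty)
    return w


rng = random.Random(0)
dense = [rng.uniform(-1.0, 1.0) for _ in range(8)]
mask = random_mask(len(dense), sparsity=0.5, rng=rng)
transformed = transform(dense, mask, target=3.0)

# Applying the mask after transformation should change the output very little.
sparse = [wi * mi for wi, mi in zip(transformed, mask)]
print("largest masked weight:",
      max(abs(wi) for wi, mi in zip(transformed, mask) if mi == 0))
print("task residual after pruning:", abs(sum(sparse) - 3.0))
```

In this toy setting the masked weights shrink toward zero before they are removed, so pruning barely changes the model's output. In the setting the abstract describes, regular finetuning of the resulting sparse subnetwork would follow.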

Dual Lottery Ticket Hypothesis
