Lottery Ticket Hypothesis for Attention Mechanism in Residual Convolutional Neural Network

Zhongzhan Huang, Senwei Liang, Mingfu Liang, Wei He, Haizhao Yang, Liang Lin
DOI: https://doi.org/10.1109/icme57554.2024.10688205
2024-01-01
Abstract: Recently, many plug-and-play self-attention modules (SAMs) have been proposed to enhance model performance by exploiting the internal information of deep convolutional neural networks. However, most SAMs are, by default, attached individually to every block of the backbone, so the extra computational cost and the number of added parameters grow with network depth. To address this issue, we first empirically and theoretically explore the Lottery Ticket Hypothesis for Self-attention Networks (LTH4SA): a full self-attention network contains a subnetwork with sparse self-attention connections that can (1) accelerate inference, (2) reduce the number of added parameters, and (3) maintain accuracy. Furthermore, we propose a simple yet effective policy-gradient-based baseline method to search for the ticket, i.e., a connection scheme that satisfies the three conditions above. Extensive experiments on widely used benchmarks and popular self-attention networks show the effectiveness of our method. In addition, our experiments show that the searched ticket transfers to other vision tasks, e.g., crowd counting and segmentation. Code: https://github.com/gbup-group/EAN-efficient-attention-network.
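The abstract does not spell out the search procedure, so the following is only a minimal sketch of how a policy-gradient search over sparse connection schemes could look, not the authors' implementation. It uses a plain REINFORCE update on independent Bernoulli gates, one per backbone block. The function names, the choice of "useful" blocks, and the toy reward (a stand-in for validation accuracy minus a penalty per attached module) are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(mask, useful_blocks, sparsity_weight=0.8):
    """Hypothetical reward: accuracy proxy minus a cost for every kept module.

    In a real search the accuracy term would come from evaluating the
    network with attention attached only where mask[i] == 1.
    """
    acc_proxy = sum(1.0 for i in useful_blocks if mask[i]) / len(useful_blocks)
    cost = sparsity_weight * sum(mask) / len(mask)
    return acc_proxy - cost


def search_ticket(num_blocks, useful_blocks, steps=5000, lr=0.1, seed=0):
    """REINFORCE over Bernoulli gates; returns a 0/1 connection scheme."""
    rng = random.Random(seed)
    logits = [0.0] * num_blocks  # start with a 50% keep probability per block
    baseline = 0.0               # moving-average baseline to reduce variance
    for _ in range(steps):
        probs = [sigmoid(z) for z in logits]
        mask = [1 if rng.random() < p else 0 for p in probs]
        reward = toy_reward(mask, useful_blocks)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # d/dz log Bernoulli(m; sigmoid(z)) = m - sigmoid(z)
        logits = [z + lr * advantage * (m - p)
                  for z, m, p in zip(logits, mask, probs)]
    return [1 if z > 0 else 0 for z in logits]


if __name__ == "__main__":
    # Assumed toy setting: 8 blocks, attention only helps at blocks 1, 4, 6.
    print(search_ticket(num_blocks=8, useful_blocks=[1, 4, 6]))
```

In this toy setting the penalty term plays the role of conditions (1) and (2) in the abstract, and the accuracy proxy plays the role of condition (3); the gates for blocks that do not help the reward are pushed toward zero.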