SpikingMiniLM: Energy-Efficient Spiking Transformer for Natural Language Understanding

Jiayu Zhang, Jiangrong Shen, Zeke Wang, Qinghai Guo, Rui Yan, Gang Pan, Huajin Tang
DOI: https://doi.org/10.1007/s11432-024-4101-6
2024-01-01
Abstract: In the era of large-scale pretrained models, artificial neural networks (ANNs) have excelled in natural language understanding (NLU) tasks. However, their success often requires substantial computational resources and energy. To address this, we explore the potential of spiking neural networks (SNNs) in NLU, a promising avenue with demonstrated advantages, including reduced power consumption and improved efficiency due to their event-driven characteristics. We propose SpikingMiniLM, a novel spiking Transformer model tailored for natural language understanding. We first introduce a multi-step encoding method to convert text embeddings into spike trains. We then redesign the attention mechanism and residual connections so that the model operates in a purely spike-based paradigm without any normalization technique. To facilitate stable and fast convergence, we propose a general parameter initialization method grounded in the stable firing rate principle. Furthermore, we apply ANN-to-SNN knowledge distillation to overcome the challenges of pretraining SNNs. Our approach achieves a macro-average score of 75.5 on the dev sets of the GLUE benchmark, retaining 98% of the performance of the teacher model MiniLMv2. Our smaller model also achieves performance similar to BERT MINI with fewer parameters and much lower energy consumption, underscoring its competitiveness and resource efficiency in NLU tasks.
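To make the idea of converting embeddings into spike trains concrete, the following is a minimal sketch of a generic direct encoding with leaky integrate-and-fire (LIF) neurons. It is not the paper's multi-step encoding method, whose details the abstract does not give. The function names `lif_encode` and `firing_rate`, the hard reset, and the values of `num_steps`, `threshold`, and `decay` are all illustrative assumptions.

```python
def lif_encode(embedding, num_steps=4, threshold=1.0, decay=0.5):
    """Encode a real-valued vector as binary spike trains (generic LIF sketch).

    Assumption: each embedding value is injected as a constant input current
    at every time step. This is a common direct-encoding scheme, not
    necessarily the encoding used by SpikingMiniLM.
    Returns a list of `num_steps` lists, each holding one 0/1 spike per neuron.
    """
    membrane = [0.0] * len(embedding)  # one membrane potential per dimension
    spike_train = []
    for _ in range(num_steps):
        spikes = []
        for i, current in enumerate(embedding):
            # Leaky integration: decay the old potential, add the input current.
            membrane[i] = decay * membrane[i] + current
            if membrane[i] >= threshold:
                spikes.append(1)
                membrane[i] = 0.0  # hard reset after firing (assumed)
            else:
                spikes.append(0)
        spike_train.append(spikes)
    return spike_train


def firing_rate(spike_train):
    """Fraction of (time step, neuron) slots that contain a spike."""
    total_spikes = sum(sum(step) for step in spike_train)
    return total_spikes / (len(spike_train) * len(spike_train[0]))


if __name__ == "__main__":
    toy_embedding = [1.2, 0.6, -0.3, 0.0]  # hypothetical 4-dimensional embedding
    train = lif_encode(toy_embedding)
    for t, step in enumerate(train):
        print(f"t={t}: {step}")
    print("firing rate:", firing_rate(train))
```

In this toy setting, a large positive input fires at every step, a moderate input fires only after its potential has accumulated over several steps, and non-positive inputs never fire. The average firing rate is the kind of quantity that a stable-firing-rate initialization would aim to keep in a reasonable range from layer to layer.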