SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation

Changze Lv, Tianlong Li, Jianhan Xu, Chenxi Gu, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang
2023-01-01
Abstract: Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way. However, the network architectures of existing SNNs for language tasks are still simplistic and relatively shallow, and deep architectures have not been fully explored, resulting in a significant performance gap compared to mainstream transformer-based networks such as BERT. To this end, we improve a recently-proposed spiking Transformer (i.e., Spikformer) to make it possible to process language tasks and propose a two-stage knowledge distillation method for training it, which combines pre-training by distilling knowledge from BERT with a large collection of unlabelled texts and fine-tuning with task-specific instances via knowledge distillation again from the BERT fine-tuned on the same training examples. Through extensive experimentation, we show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve comparable results to BERTs on text classification tasks for both English and Chinese with much less energy consumption. Our code is available at https://github.com/Lvchangze/SpikeBERT.
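The abstract does not spell out the loss used in either distillation stage. As a rough illustration of the second (task-specific) stage only, the sketch below shows a generic soft-label distillation objective of the kind commonly used for text classifiers: a temperature-softened KL term between teacher and student logits, mixed with a cross-entropy term on the gold label. This is an assumed, textbook-style formulation, not necessarily the loss SpikeBERT uses; the names `distillation_loss`, `temperature`, and `alpha` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; assumes q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic logit distillation (an assumption, not the paper's exact loss).

    alpha weights the soft-label term (student vs. teacher distribution);
    1 - alpha weights the hard-label cross-entropy on the gold class.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1 - alpha) * hard_term


# Hypothetical two-class example: the teacher (e.g. a fine-tuned BERT)
# and the student (e.g. a spiking model) each produce one logit per class.
loss = distillation_loss([0.2, 1.1], [-0.5, 2.0], label=1)
```

In this form, a student whose logits match the teacher's incurs no soft-label penalty, so the remaining loss comes only from the gold-label term.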