Abstract: Towards energy-efficient artificial intelligence similar to the human brain, bio-inspired spiking neural networks (SNNs) offer biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models have exhibited promising generalization capability, which makes exploring more general spike-driven models a valuable problem. However, the binary spikes in existing SNNs fail to encode adequate semantic information, which poses technical challenges for generalization. This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones. In contrast to previous spikes with {0,1} levels, we propose a more general spike formulation with bi-directional, elastic amplitude, and elastic frequency encoding, while still maintaining the addition nature of SNNs. Within a single time step, the spike is enhanced by direction and amplitude information; for spike frequency, we design a strategy to control the spike firing rate. We plug this elastic bi-spiking mechanism into language modeling and name the result SpikeLM. This is the first time that general language tasks have been handled with fully spike-driven models, which achieve much higher accuracy than previously possible. SpikeLM also greatly narrows the performance gap between SNNs and ANNs in language modeling. Our code is available at https://github.com/Xingrun-Xing/SpikeLM.
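The spike formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `bi_spike` and `firing_rate`, the symmetric threshold rule, and the single shared `amplitude` value are all assumptions made for illustration. The sketch shows spikes that carry a direction (positive or negative), a shared amplitude instead of a fixed level of 1, and a firing rate that changes with the threshold.

```python
from typing import List


def bi_spike(potentials: List[float], threshold: float, amplitude: float) -> List[float]:
    """Map membrane potentials to spikes in {-amplitude, 0, +amplitude}.

    Hypothetical rule (an assumption, not taken from the paper):
    fire +amplitude when v >= threshold, -amplitude when v <= -threshold,
    and stay silent otherwise.
    """
    spikes = []
    for v in potentials:
        if v >= threshold:
            spikes.append(amplitude)
        elif v <= -threshold:
            spikes.append(-amplitude)
        else:
            spikes.append(0.0)
    return spikes


def firing_rate(spikes: List[float]) -> float:
    """Fraction of positions that emitted a non-zero spike."""
    return sum(1 for s in spikes if s != 0.0) / len(spikes)


potentials = [0.9, -0.2, -1.4, 0.05, 0.6, -0.7]

# A lower threshold lets more neurons fire (higher firing rate).
low = bi_spike(potentials, threshold=0.5, amplitude=0.8)
# low == [0.8, 0.0, -0.8, 0.0, 0.8, -0.8]

# A higher threshold suppresses spikes (lower firing rate, sparser activity).
high = bi_spike(potentials, threshold=1.0, amplitude=0.8)
# high == [0.0, 0.0, -0.8, 0.0, 0.0, 0.0]

print(firing_rate(low), firing_rate(high))
```

In this toy setting the threshold acts as the knob that trades sparsity against information: raising it leaves fewer non-zero spikes, which is one plausible way to read "controlling the spike firing rate".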

What problem does this paper attempt to address?

The paper asks how a fully spike-driven model (an SNN, or spiking neural network) can handle general language modeling tasks while approaching the performance of conventional artificial neural networks (ANNs). It focuses on two challenges that existing SNNs face on language tasks:

1. **Insufficient information representation ability**: Because existing SNNs use binary spikes ({0, 1}), they have difficulty encoding sufficient semantic information, which limits their application and generalization ability in complex language tasks.
2. **Optimization difficulties**: Large-scale language models require stable gradient calculations, but the neuron dynamics in SNNs are non-differentiable, which makes optimization difficult.

To overcome these challenges, the paper proposes a new framework, SpikeLM, which introduces an Elastic Bi-Spiking Mechanism. By extending spikes with direction, frequency, and amplitude information, the mechanism improves both the representation ability of spike signals and the optimization stability of the model. This enables SpikeLM to achieve higher accuracy than previous SNNs on a variety of language tasks and to significantly narrow the performance gap between SNNs and ANNs.

### Main contributions:

- **Proposing SpikeLM**: This is the first fully spiking-based language model that can handle both discriminative and generative language tasks, significantly expanding the range of language tasks to which SNNs can be applied.
- **Elastic Bi-Spiking Mechanism**: This mechanism retains the additive characteristics of SNNs and also makes the spike firing rate controllable, giving a better balance between performance and energy efficiency.
- **Theoretical proof**: Through a theoretical analysis based on dynamical isometry, the paper shows that the Elastic Bi-Spiking function offers better training stability than the traditional ReLU function, which supports the performance of SpikeLM on general language tasks.

### Experimental results:

The paper reports experiments on multiple standard datasets, including the GLUE benchmark, text summarization (XSUM and CNN-DailyMail), and machine translation (the WMT16 English-Romanian pair). The results show that SpikeLM achieves significant performance improvements on these tasks. On the GLUE benchmark in particular, the performance of SpikeLM is close to, or even exceeds, that of some traditional ANN models while maintaining low energy consumption.

Overall, through its spike-encoding mechanism, the paper addresses key limitations of SNNs in language modeling and suggests a direction for developing more efficient, lower-energy artificial intelligence systems.
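The "additive characteristics" mentioned in the contributions can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `spike_driven_sum` and `dense_sum`, the representation of spikes as signs in {-1, 0, 1}, and the single scalar `amplitude` per layer are all hypothetical. The point is that when inputs are signed spikes, a weighted sum needs only additions and subtractions of weights, with one scaling by the amplitude at the end.

```python
from typing import List


def spike_driven_sum(weights: List[float], signs: List[int], amplitude: float) -> float:
    """Accumulate weights selected by signed spikes, then scale once.

    signs holds values in {-1, 0, 1} (assumed encoding). The loop performs
    no per-element multiplication: a +1 spike adds the weight, a -1 spike
    subtracts it, and a 0 (no spike) does no work at all.
    """
    acc = 0.0
    for w, s in zip(weights, signs):
        if s == 1:
            acc += w
        elif s == -1:
            acc -= w
    # Single scaling by the shared amplitude (assumed to be one scalar).
    return amplitude * acc


def dense_sum(weights: List[float], signs: List[int], amplitude: float) -> float:
    """Reference: ordinary multiply-accumulate over the same inputs."""
    return sum(w * (amplitude * s) for w, s in zip(weights, signs))


weights = [0.5, -1.0, 2.0, 0.25]
signs = [1, 0, -1, 1]
amplitude = 0.5

additive = spike_driven_sum(weights, signs, amplitude)
reference = dense_sum(weights, signs, amplitude)
print(additive, reference)  # both -0.625
```

Because silent positions are skipped entirely, the work in this sketch scales with the number of spikes rather than the input length, which is why a lower firing rate would translate into fewer operations.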

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
