SPikE-SSM: A Sparse, Precise, and Efficient Spiking State Space Model for Long Sequences Learning

Yan Zhong, Ruoyu Zhao, Chao Wang, Qinghai Guo, Jianguo Zhang, Zhichao Lu, Luziwei Leng
2024-10-07
Abstract: Spiking neural networks (SNNs) provide an energy-efficient solution by utilizing the spike-based and sparse nature of biological systems. Since the advent of Transformers, SNNs have struggled to compete with artificial networks on long sequential tasks, until the recent emergence of state space models (SSMs), which offer superior computational efficiency and modeling capability. However, applying the highly capable SSMs to SNNs for long sequences learning poses three major challenges: (1) The membrane potential is determined by the past spiking history of the neuron, leading to reduced efficiency for sequence modeling in parallel computing scenarios. (2) Complex dynamics of biological spiking neurons are crucial for functionality but challenging to simulate and exploit effectively in large networks. (3) It is arduous to maintain high sparsity while achieving high accuracy for spiking neurons without resorting to dense computing, as utilized in artificial neuron-based SSMs. To address them, we propose a sparse, precise and efficient spiking SSM framework, termed SPikE-SSM. For (1), we propose a boundary compression strategy (PMBC) to accelerate the inference of the spiking neuron model, enabling parallel processing for long sequence learning. For (2), we propose a novel and concise neuron model incorporating reset-refractory mechanism to leverage the inherent temporal dimension for dynamic computing with biological interpretability. For (3), we hierarchically integrate the proposed neuron model to the original SSM block, and enhance the dynamics of SPikE-SSM by incorporating trainable thresholds and refractory magnitudes to balance accuracy and sparsity. Extensive experiments verify the effectiveness and robustness of SPikE-SSM on the long range arena benchmarks and large language dataset WikiText-103, showing the potential of dynamic spiking neurons in efficient long sequence learning.
Neural and Evolutionary Computing, Artificial Intelligence
What problem does this paper attempt to address?
The paper targets three challenges that arise when state space models (SSMs) are applied to spiking neural networks (SNNs) for long-sequence learning:

1. **Membrane potential depends on past spiking history**: The membrane potential of a spiking neuron is determined by its own past spikes, so each time step depends on the previous one. This reduces the efficiency of sequence modeling in parallel computing scenarios.
2. **Complex dynamics of biological spiking neurons are hard to simulate effectively**: These dynamics are crucial for functionality, but they are difficult to simulate and exploit efficiently in large-scale networks.
3. **Achieving high accuracy while maintaining high sparsity**: Spiking neurons struggle to stay sparse and accurate at the same time without falling back on the dense computing that artificial-neuron-based SSMs rely on.

To address these challenges, the authors propose a framework named SPikE-SSM, which has the following features:

- **Boundary compression strategy (PMBC)**: Accelerates the inference of the spiking neuron model, enabling parallel processing in long-sequence learning.
- **Reset-refractory neuron model**: A novel and concise neuron model that incorporates a reset-refractory mechanism, uses the inherent temporal dimension for dynamic computing, and is biologically interpretable.
- **Hierarchical integration into SSM blocks**: The proposed neuron model is hierarchically integrated into the original SSM block, and the dynamics of SPikE-SSM are enhanced with trainable thresholds and refractory magnitudes to balance accuracy and sparsity.
Through these methods, SPikE-SSM aims to improve the efficiency and accuracy of long-sequence learning. The authors evaluate it on the Long Range Arena (LRA) benchmarks and on the large language dataset WikiText-103.
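To make the first and third challenges concrete, the following is a minimal sketch of a generic leaky integrate-and-fire neuron with a soft reset and a one-step refractory penalty. It is not the neuron model or the PMBC strategy from the paper. The function names (`simulate_lif`, `firing_rate`), the update rule, and all parameter values are assumptions chosen for illustration, and the threshold and refractory magnitude are fixed scalars here, whereas the paper describes them as trainable.

```python
from typing import List


def simulate_lif(inputs: List[float], threshold: float = 0.75,
                 leak: float = 0.5, refractory: float = 0.25) -> List[int]:
    """Generic leaky integrate-and-fire neuron (illustrative assumption,
    not the neuron model proposed in the paper)."""
    u = 0.0  # membrane potential
    r = 0.0  # refractory penalty left by the previous step's spike
    spikes = []
    for x in inputs:
        # The update uses the previous potential and the previous spike
        # (through the reset and r), so step t cannot be computed before
        # step t-1 has finished. This is the sequential dependency.
        u = leak * u + x - r
        s = 1 if u >= threshold else 0
        u -= threshold * s   # soft reset: subtract the threshold on a spike
        r = refractory * s   # a spike suppresses the next step's potential
        spikes.append(s)
    return spikes


def firing_rate(spikes: List[int]) -> float:
    """Fraction of time steps with a spike (lower means sparser)."""
    return sum(spikes) / len(spikes)


inputs = [0.5] * 8  # constant input current over 8 time steps

baseline = simulate_lif(inputs)                       # [0, 1, 0, 0, 1, 0, 0, 1]
no_refractory = simulate_lif(inputs, refractory=0.0)  # [0, 1, 0, 1, 0, 1, 0, 1]
high_threshold = simulate_lif(inputs, threshold=1.0)  # [0, 0, 0, 0, 0, 0, 0, 0]

print(firing_rate(baseline), firing_rate(no_refractory), firing_rate(high_threshold))
```

In this toy setting, raising the threshold or the refractory magnitude lowers the firing rate, which is the kind of sparsity control that trainable thresholds and refractory magnitudes would give a network. The loop also shows why a naive implementation has to step through the sequence one element at a time.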