Abstract: Spiking Neural Networks (SNNs) have attracted significant attention in recent years due to their distinctive low-power characteristics. Meanwhile, Transformer models, known for their powerful self-attention mechanisms and parallel processing capabilities, have demonstrated exceptional performance across various domains, including natural language processing and computer vision. Despite the significant advantages of both SNNs and Transformers, directly combining the low-power benefits of SNNs with the high performance of Transformers remains challenging. Specifically, while the sparse computing mode of SNNs reduces energy consumption, traditional attention mechanisms depend on dense matrix computations and complex softmax operations, which are difficult to execute efficiently in low-power scenarios. Given the tremendous success of Transformers in deep learning, exploring the integration of SNNs and Transformers to harness the strengths of both is a necessary step. In this paper, we propose a novel model architecture, the Spike Aggregation Transformer (SAFormer), that integrates the low-power characteristics of SNNs with the high-performance advantages of Transformer models. The core contribution of SAFormer is the Spike Aggregated Self-Attention (SASA) mechanism, which significantly simplifies the computation by calculating attention weights using only the spike query and key matrices, thereby effectively reducing energy consumption. Additionally, we introduce a Depthwise Convolution Module (DWC) to enhance feature extraction, further improving overall accuracy. Our evaluation demonstrates that SAFormer outperforms state-of-the-art SNNs in both accuracy and energy consumption, highlighting its advantages for low-power, high-performance computing.
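The abstract names a Depthwise Convolution Module (DWC) but does not describe its layers. For readers unfamiliar with the underlying operation, here is a minimal NumPy sketch of a generic depthwise convolution (one filter per channel, no mixing across channels). It is not the paper's DWC module; the function name `depthwise_conv2d`, the odd square kernel, and the zero "same" padding are assumptions chosen for illustration.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Generic depthwise 2-D convolution (cross-correlation, as in DL frameworks).

    Illustrative sketch only, not the paper's DWC module.
    x:       input feature map of shape (C, H, W)
    kernels: one k x k filter per channel, shape (C, k, k), with k odd
    Returns an array of shape (C, H, W) using zero "same" padding.
    """
    c, h, w = x.shape
    kc, k, k2 = kernels.shape
    assert kc == c and k == k2 and k % 2 == 1, "need one odd-sized square kernel per channel"
    pad = k // 2
    # Zero-pad only the spatial axes so the output keeps the input's H x W.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, w), dtype=np.result_type(x, kernels))
    for ch in range(c):  # each channel is filtered by its own kernel, independently
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * kernels[ch])
    return out
```

The appeal of the depthwise form is cost: a standard k×k convolution mapping C channels to C channels has C²k² weights, whereas the depthwise version has only Ck², at the price of no cross-channel mixing, which is typically left to a separate pointwise (1×1) or linear layer.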


### What problem does this paper attempt to solve?

This paper addresses how to effectively combine the low power consumption of Spiking Neural Networks (SNNs) with the high performance of the Transformer model. SNNs consume less energy because of their sparse computing mode, but the traditional self-attention mechanism relies on dense matrix operations and complex softmax operations, which are difficult to execute efficiently in low-power scenarios. In addition, existing methods often struggle to maintain or improve model performance while reducing energy consumption.

To address these problems, the paper proposes a new model architecture, the **Spike Aggregation Transformer (SAFormer)**, built around the **Spike Aggregated Self-Attention (SASA)** mechanism. SASA significantly simplifies the computation and reduces energy consumption by calculating attention weights from only the spike query and key matrices. In addition, a Depthwise Convolution Module (DWC) is introduced to enhance feature extraction and further improve overall accuracy.

### Main contributions

1. **Proposing the SASA mechanism**: SASA avoids operations involving the value matrix and instead exploits the diverse features generated by the key matrix, which reduces energy consumption. Through the aggregation matrix, SASA increases the expressiveness of the attention map and thereby improves the overall performance of the model.
2. **Developing the low-power, high-performance SAFormer framework**: because the aggregation matrix can be designed to be small, the network offers an efficient solution for resource-constrained devices.
3. **Experimental verification**: extensive experiments on the CIFAR-10, CIFAR-100, DVS128-Gesture, and CIFAR10-DVS datasets show that the proposed architecture outperforms or matches existing state-of-the-art SNNs in both accuracy and energy consumption.

### Formula summary

- **LIF neuron model**:

\[ H[t]=V[t-1]+\frac{1}{\tau}\bigl(X[t]-(V[t-1]-V_{\text{reset}})\bigr) \]
\[ S[t]=\Theta(H[t]-V_{\text{th}}) \]
\[ V[t]=H[t](1-S[t])+V_{\text{reset}}S[t] \]

where \(H[t]\) and \(V[t]\) are the membrane potential before and after firing at time step \(t\), \(X[t]\) is the input, \(S[t]\) is the output spike, \(\tau\) is the membrane time constant, \(V_{\text{th}}\) is the firing threshold, \(V_{\text{reset}}\) is the reset potential, and \(\Theta\) is the Heaviside step function.

- **SASA mechanism**:

\[ QF = XW_Q,\quad KF = XW_K \]
\[ Q = SN(BN(AG(QF))),\quad K = SN(BN(AG(KF))) \]
\[ SASA'(Q, K)=SN(\text{SUM}_c(Q\otimes K)) \]
\[ SASA(Q, K)=BN(\text{Linear}(SN(KD\oplus SASA'(Q, K)))) \]

where \(SN(\cdot)\) denotes a spiking neuron layer and \(BN(\cdot)\) batch normalization.

Through these improvements, SAFormer performs well on both static image and neuromorphic datasets while substantially reducing energy consumption.
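The LIF equations in the formula summary can be simulated directly. Below is a minimal NumPy sketch of the three update steps (charge, fire, hard reset) over T time steps. The defaults τ = 2, V_th = 1, V_reset = 0, the convention Θ(0) = 1, and the name `lif_forward` are assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def lif_forward(x_seq, tau=2.0, v_th=1.0, v_reset=0.0):
    """Simulate a layer of LIF neurons over T time steps with hard reset.

    x_seq: input currents of shape (T, ...); returns 0/1 spikes of the same shape.
    Per step (defaults and Theta(0) = 1 are illustrative assumptions):
        H[t] = V[t-1] + (X[t] - (V[t-1] - V_reset)) / tau
        S[t] = Theta(H[t] - V_th)
        V[t] = H[t] * (1 - S[t]) + V_reset * S[t]
    """
    x_seq = np.asarray(x_seq, dtype=float)
    v = np.full(x_seq.shape[1:], v_reset)  # V[-1]: start at the reset potential
    spikes = np.zeros_like(x_seq)
    for t in range(x_seq.shape[0]):
        h = v + (x_seq[t] - (v - v_reset)) / tau  # charge (leaky integration)
        s = (h >= v_th).astype(float)             # fire: Heaviside step on H - V_th
        v = h * (1.0 - s) + v_reset * s           # hard reset wherever a spike fired
        spikes[t] = s
    return spikes
```

For example, a constant input of 1.5 charges the membrane to 0.75 and then 1.125, so the neuron fires every second step, while a constant input of 1.0 only approaches the threshold and never fires. This is the forward pass only; training an SNN with this neuron additionally needs a surrogate gradient for Θ, which the sketch omits.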
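To make the energy argument concrete, the sketch below computes only the inner term SASA'(Q, K) = SN(SUM_c(Q ⊗ K)) from the formula summary. The summary does not define its operators, so this is a hedged reading: it assumes ⊗ is the element-wise product, SUM_c sums over the channel axis, and SN is a stateless threshold (the hypothetical `spike_fn` with an illustrative `v_th`). The aggregation AG, the KD term, and the BN/Linear output stage are deliberately left out because their definitions are not given here.

```python
import numpy as np


def spike_fn(x, v_th=1.0):
    # Hypothetical single-step spiking neuron SN(.): Heaviside threshold, no membrane state.
    return (x >= v_th).astype(np.int8)


def sasa_prime(q_spikes, k_spikes, v_th=1.0):
    """Sketch of SASA'(Q, K) = SN(SUM_c(Q ⊗ K)) under the stated assumptions.

    q_spikes, k_spikes: binary {0, 1} arrays of shape (N, C) (tokens x channels).
    Assumes ⊗ is the element-wise product and SUM_c sums over the channel axis.
    For binary inputs the product is a logical AND, so this step needs only
    AND operations and integer additions: no multiplications and no softmax.
    """
    q = np.asarray(q_spikes, dtype=bool)
    k = np.asarray(k_spikes, dtype=bool)
    coincidences = np.logical_and(q, k)                # Q ⊗ K for binary spikes
    scores = coincidences.sum(axis=-1, keepdims=True)  # SUM_c -> (N, 1) integer counts
    return spike_fn(scores, v_th)                      # binary per-token attention spikes
```

Under these assumptions, each score is simply the number of channels in which a token's query and key both spike, so the step replaces floating-point multiply-accumulates and a softmax with AND operations and integer additions, which is the kind of saving the abstract attributes to SASA.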
