Combining Aggregated Attention and Transformer Architecture for Accurate and Efficient Performance of Spiking Neural Networks

Hangming Zhang, Alexander Sboev, Roman Rybka, Qiang Yu
2024-12-18
Abstract: Spiking Neural Networks (SNNs) have attracted significant attention in recent years due to their distinctive low-power characteristics. Meanwhile, Transformer models, known for their powerful self-attention mechanisms and parallel processing capabilities, have demonstrated exceptional performance across various domains, including natural language processing and computer vision. Despite the significant advantages of both SNNs and Transformers, directly combining the low-power benefits of SNNs with the high performance of Transformers remains challenging. Specifically, while the sparse computing mode of SNNs contributes to reduced energy consumption, traditional attention mechanisms depend on dense matrix computations and complex softmax operations. This reliance makes them difficult to execute efficiently in low-power scenarios. Given the tremendous success of Transformers in deep learning, exploring the integration of SNNs and Transformers is a necessary step toward harnessing the strengths of both. In this paper, we propose a novel model architecture, Spike Aggregation Transformer (SAFormer), that integrates the low-power characteristics of SNNs with the high-performance advantages of Transformer models. The core contribution of SAFormer lies in the design of the Spike Aggregated Self-Attention (SASA) mechanism, which significantly simplifies the computation process by calculating attention weights using only the spike matrices query and key, thereby effectively reducing energy consumption. Additionally, we introduce a Depthwise Convolution Module (DWC) to enhance the feature extraction capabilities, further improving overall accuracy. Our experiments show that SAFormer outperforms state-of-the-art SNNs in both accuracy and energy consumption, highlighting its significant advantages in low-power and high-performance computing.
Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper aims to solve the problem of how to effectively combine the low-power characteristics of Spiking Neural Networks (SNNs) with the high-performance advantages of the Transformer model. Specifically, SNNs have lower energy consumption due to their sparse computing mode, but the traditional self-attention mechanism relies on dense matrix operations and complex softmax operations, which make it difficult to execute effectively in low-power scenarios. In addition, traditional methods often find it difficult to maintain or improve model performance while reducing energy consumption.

To solve these problems, the paper proposes a new model architecture, the **Spike Aggregation Transformer (SAFormer)**, and introduces the **Spike Aggregated Self-Attention (SASA)** mechanism. The SASA mechanism significantly simplifies the calculation process and effectively reduces energy consumption by calculating the attention weights using only the spike matrices query and key. In addition, a Depthwise Convolution Module (DWC) is introduced to enhance the feature extraction ability and further improve the overall accuracy.

### Main contributions

1. **Propose the SASA mechanism**: Avoid operations involving the value matrix, utilize the diverse features generated by the key matrix, and effectively reduce energy consumption. Through the aggregation matrix, SASA improves the expressiveness of the attention map, thereby enhancing the overall performance of the model.
2. **Develop the low-power, high-performance SAFormer framework**: Since the aggregation matrix can be designed to be smaller, this network provides an efficient solution for resource-constrained devices.
3. **Experimental verification**: Extensive experiments on the CIFAR-10, CIFAR-100, DVS128-Gesture and CIFAR10-DVS datasets show that the proposed architecture outperforms or matches the existing state-of-the-art SNNs in terms of accuracy and energy consumption.

### Formula summary

- **LIF neuron model**:

\[ H[t] = V[t-1] + \frac{1}{\tau}\left(X[t] - (V[t-1] - V_{\text{reset}})\right) \]

\[ S[t] = \Theta(H[t] - V_{\text{th}}) \]

\[ V[t] = H[t](1 - S[t]) + V_{\text{reset}} S[t] \]

- **SASA mechanism**:

\[ QF = XW_Q,\quad KF = XW_K \]

\[ Q = \text{SN}(\text{BN}(\text{AG}(QF))),\quad K = \text{SN}(\text{BN}(\text{AG}(KF))) \]

\[ \text{SASA}'(Q, K) = \text{SN}(\text{SUM}_c(Q \otimes K)) \]

\[ \text{SASA}(Q, K) = \text{BN}(\text{Linear}(\text{SN}(KD \oplus \text{SASA}'(Q, K)))) \]

Through these improvements, SAFormer not only performs well on static images and neuromorphic datasets, but also achieves significant optimization in terms of energy consumption.
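As a rough illustration of the formula summary, the sketch below steps a single LIF neuron through the three update equations and evaluates the first SASA stage, SASA'(Q, K), on small binary spike matrices. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`lif_step`, `lif_run`, `sasa_prime`), the default values for tau, V_th and V_reset, the convention Theta(0) = 1, the reading of ⊗ as an element-wise product and of SUM_c as a sum over the channel dimension, and the use of a single-step threshold in place of the spiking neuron SN are all illustrative choices. The AG, BN, Linear and KD terms are not modelled.

```python
def lif_step(v_prev, x, tau=2.0, v_th=1.0, v_reset=0.0):
    """One LIF update following the three equations in the formula summary."""
    # Charge: H[t] = V[t-1] + (1/tau) * (X[t] - (V[t-1] - V_reset))
    h = v_prev + (x - (v_prev - v_reset)) / tau
    # Fire: S[t] = Theta(H[t] - V_th); Theta(0) = 1 is an assumed convention
    s = 1 if h >= v_th else 0
    # Reset: V[t] = H[t] * (1 - S[t]) + V_reset * S[t]
    v = h * (1 - s) + v_reset * s
    return s, v


def lif_run(inputs, **params):
    """Feed a sequence of input currents through one neuron; return its spikes."""
    # Assumed initial state: the membrane starts at the reset potential.
    v = params.get("v_reset", 0.0)
    spikes = []
    for x in inputs:
        s, v = lif_step(v, x, **params)
        spikes.append(s)
    return spikes


def sasa_prime(q, k, v_th=1.0):
    """Sketch of SN(SUM_c(Q * K)) for binary N x C spike matrices.

    Q and K are lists of N token rows with C channels each. The element-wise
    product keeps only channels where both spike, the channel sum counts them,
    and a single-step threshold stands in for the spiking neuron (assumption).
    """
    out = []
    for q_row, k_row in zip(q, k):
        overlap = sum(qi * ki for qi, ki in zip(q_row, k_row))
        out.append(1 if overlap >= v_th else 0)
    return out


if __name__ == "__main__":
    # Constant input of 1.5 with tau=2, V_th=1, V_reset=0
    print(lif_run([1.5, 1.5, 1.5, 1.5], tau=2.0, v_th=1.0, v_reset=0.0))
    # Two tokens, three channels
    print(sasa_prime([[1, 0, 1], [0, 1, 0]], [[1, 1, 0], [1, 0, 1]]))
```

Because Q and K are binary, the element-wise product reduces to a logical AND and the channel sum to a count. This step therefore needs no multiplications, no softmax and no value matrix, which is the source of the energy saving the summary describes.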